# How to Use Text Rank

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alvaro-francisco-gil/text-rank/blob/main/examples/using_text_rank.ipynb)


## Install the Library

In [1]:
!pip install git+https://github.com/alvaro-francisco-gil/text-rank.git

Collecting git+https://github.com/alvaro-francisco-gil/text-rank.git
  Cloning https://github.com/alvaro-francisco-gil/text-rank.git to c:\users\alvar\appdata\local\temp\pip-req-build-zfufp3pd
  Resolved https://github.com/alvaro-francisco-gil/text-rank.git to commit 362e4434bfc3f613f90c04ec50c01e4ccbcae6f1
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'


  Running command git clone --filter=blob:none --quiet https://github.com/alvaro-francisco-gil/text-rank.git 'C:\Users\alvar\AppData\Local\Temp\pip-req-build-zfufp3pd'


## Basic Usage

In [2]:
from text_rank import TextRankKeywordExtractor

extractor = TextRankKeywordExtractor(window_size=5)

In [3]:
text = """
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence 
concerned with the interactions between computers and human language. It is used to apply algorithms to identify 
and extract the natural language rules such that unstructured language data is converted into a form that computers 
can understand.
"""

keywords = extractor.extract_keywords(text, top_n=10)
for word, score in keywords:
    print(f"{word}: {score:.4f}")

language: 0.1190
computers: 0.0784
natural: 0.0774
science: 0.0560
computer: 0.0558
artificial: 0.0557
linguistics: 0.0555
intelligence: 0.0555
interactions: 0.0551
rules: 0.0548
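
To give a feel for where scores like these come from, here is a minimal, dependency-free sketch of the general TextRank idea: link words that occur within a sliding window of each other, then run a weighted PageRank over that graph. It is not the library's implementation. The function names `cooccurrence_graph` and `textrank_scores`, the plain regex tokenizer, and the interpretation of `window_size` as "number of consecutive tokens in a window" are all assumptions made for illustration, and no stop-word or part-of-speech filtering is applied.

```python
import re
from collections import defaultdict


def cooccurrence_graph(text, window_size=5):
    """Build an undirected weighted graph linking words that share a window.

    Hypothetical helper: each token is linked to the next `window_size - 1`
    tokens, and repeated co-occurrences increase the edge weight.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window_size]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank_scores(graph, damping=0.85, iterations=50):
    """Score nodes with weighted PageRank computed by power iteration."""
    n = len(graph)
    scores = {word: 1.0 / n for word in graph}
    # Total edge weight leaving each node, used to normalise contributions.
    total_weight = {word: sum(nbrs.values()) for word, nbrs in graph.items()}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) / n
            + damping
            * sum(scores[nbr] * w / total_weight[nbr] for nbr, w in graph[word].items())
            for word in graph
        }
    return scores


if __name__ == "__main__":
    sample = "natural language processing helps computers understand human language"
    ranked = sorted(textrank_scores(cooccurrence_graph(sample)).items(),
                    key=lambda item: item[1], reverse=True)
    for word, score in ranked[:3]:
        print(f"{word}: {score:.4f}")
```

Because every node in the graph has at least one edge, the scores always sum to 1, so each value can be read as a share of the total importance.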


## Exporting the Co-occurrence Graph

In [5]:
# Build and export the graph
graph = extractor.build_cooccurrence_graph(text)
extractor.export_pajek(graph, '../data/example_graph.net')
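
For readers unfamiliar with Pajek, the sketch below shows the general layout of a `.net` file: a `*Vertices` section listing 1-based ids with quoted labels, followed by an `*Edges` section of `source target weight` triples. The helper `to_pajek` and its dictionary input are hypothetical, and the exact file written by `export_pajek` may differ in details such as vertex ordering or weight formatting.

```python
def to_pajek(edges):
    """Render undirected weighted edges {(u, v): weight} as Pajek .net text.

    Hypothetical helper for illustration only; not part of text_rank.
    """
    labels = sorted({node for edge in edges for node in edge})
    # Pajek vertex ids start at 1, not 0.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    lines += [f"{index[u]} {index[v]} {w}" for (u, v), w in sorted(edges.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_pajek({("language", "natural"): 2, ("computers", "language"): 1}), end="")
    # *Vertices 3
    # 1 "computers"
    # 2 "language"
    # 3 "natural"
    # *Edges
    # 1 2 1
    # 2 3 2
```

A file in this layout can be opened in Pajek itself or in other network-analysis tools that read the format.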

## Analyzing Text Files

In [6]:
from text_rank.utils import analyze_text_file

# Analyze a single file (all keywords)
# Forward slashes avoid accidental escapes such as "\t" in the path string
keywords = analyze_text_file('../data/text_examples/._C-41.txt.tagged')

In [7]:
# Print only the first five keywords
for i, (word, score) in enumerate(keywords):
    print(f"{word}: {score:.4f}")
    if i == 4:
        break

thou: 0.0144
thy: 0.0137
good: 0.0098
thee: 0.0083
man: 0.0075


## Exporting Multiple Files to Pajek

In [None]:
import os

# Get all text files from data/text_examples directory
text_files_dir = os.path.join('..', 'data', 'text_examples')
file_paths = [
    os.path.join(text_files_dir, f) 
    for f in os.listdir(text_files_dir) 
    if f.endswith('.txt')
]

In [None]:
from text_rank.utils import export_multiple_graphs_to_pajek

In [None]:
# Export the co-occurrence graphs of the listed files in Pajek format
result = export_multiple_graphs_to_pajek(
    file_paths=file_paths,
    output_dir=text_files_dir,
    window_size=5,
    single_file=True
)