# How to Use Text Rank

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alvaro-francisco-gil/text-rank/blob/main/examples/using_text_rank.ipynb)


## Install the Library

In [1]:
# !pip install git+https://github.com/alvaro-francisco-gil/text-rank.git

## Basic Usage

In [2]:
from text_rank import TextRankKeywordExtractor

extractor = TextRankKeywordExtractor(window_size=5)

In [3]:
text = """
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence 
concerned with the interactions between computers and human language. It is used to apply algorithms to identify 
and extract the natural language rules such that unstructured language data is converted into a form that computers 
can understand.
"""

keywords = extractor.extract_keywords(text, top_n=10)
for word, score in keywords:
    print(f"{word}: {score:.4f}")

language: 0.1190
computers: 0.0784
natural: 0.0774
science: 0.0560
computer: 0.0558
artificial: 0.0557
linguistics: 0.0555
intelligence: 0.0555
interactions: 0.0551
rules: 0.0548
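
To give some intuition for where scores like these come from, here is a minimal, standard-library sketch of the general TextRank idea: link words that appear within a sliding window of each other, then run a PageRank-style iteration over that graph. It is not the code inside `TextRankKeywordExtractor`. The function names, the tiny stopword list, and the damping and iteration defaults are all assumptions made for illustration, so the scores it produces will not match the output above.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS_SKETCH = {"a", "an", "and", "the", "of", "is", "it", "to", "with", "that", "into", "such"}


def tokenize_sketch(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS_SKETCH]


def build_graph_sketch(tokens, window_size=5):
    """Build an undirected weighted graph linking each token to the
    (window_size - 1) tokens that follow it."""
    graph = {}
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window_size]:
            if other == word:
                continue  # no self-loops
            graph.setdefault(word, {}).setdefault(other, 0.0)
            graph.setdefault(other, {}).setdefault(word, 0.0)
            graph[word][other] += 1.0
            graph[other][word] += 1.0
    return graph


def rank_words_sketch(graph, damping=0.85, iterations=50):
    """Run a fixed number of weighted PageRank iterations over the graph."""
    if not graph:
        return {}
    n = len(graph)
    totals = {node: sum(neighbors.values()) for node, neighbors in graph.items()}
    scores = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            # Each neighbor passes on a share of its score proportional to the edge weight.
            incoming = sum(scores[nb] * w / totals[nb] for nb, w in neighbors.items())
            updated[node] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


sample = "Natural language processing studies the interactions between computers and human language."
ranked = rank_words_sketch(build_graph_sketch(tokenize_sketch(sample), window_size=5))
for word, score in sorted(ranked.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(f"{word}: {score:.4f}")
```

Because every edge is stored in both directions, no node is left without outgoing weight, and the scores keep summing to 1 across iterations.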


## Exporting the Co-occurrence Graph

In [12]:
# Build and export the graph
graph = extractor.build_cooccurrence_graph(text)
extractor.export_pajek(graph, '../data/natural_language_example_graph.net')
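
For readers who have not seen a Pajek `.net` file, the sketch below writes a small weighted graph in the basic layout that format uses: a `*Vertices` section with 1-based ids and quoted labels, followed by an `*Edges` section with one `source target weight` triple per line. The helper `to_pajek_sketch` and its dictionary input are hypothetical. The file produced by `export_pajek` may differ in detail, for example in vertex ordering or in how weights are formatted.

```python
def to_pajek_sketch(edges):
    """Serialize {(word_a, word_b): weight} as Pajek .net text (illustrative only)."""
    labels = sorted({word for pair in edges for word in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}  # Pajek ids start at 1

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(edges.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


example_edges = {("language", "natural"): 2, ("computers", "language"): 1}
print(to_pajek_sketch(example_edges))
```

A file in this layout can be opened in Pajek or in other network tools that accept `.net` input.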

## Analyzing Text Files

In [5]:
import os

file_path = os.path.join("..", "data", "text_examples", "C-41.txt.final")

In [6]:
from text_rank.utils import analyze_text_file

keywords = analyze_text_file(file_path)

In [7]:
for i, (word, score) in enumerate(keywords):
    print(f"{word}: {score:.4f}")
    if i == 4:
        break

resource: 0.0435
system: 0.0421
video: 0.0320
utilization: 0.0269
application: 0.0216


## Exporting Multiple Files to Pajek

In [9]:
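# Collect every regular file in the data/text_examples directory
# (os.listdir also returns subdirectories, so those are filtered out)
text_files_dir = os.path.join('..', 'data', 'text_examples')
file_paths = [
    os.path.join(text_files_dir, f)
    for f in os.listdir(text_files_dir)
    if os.path.isfile(os.path.join(text_files_dir, f))
]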

In [11]:
from text_rank.utils import export_multiple_graphs_to_pajek

result = export_multiple_graphs_to_pajek(
    file_paths=file_paths,
    output_dir=r'../data',
    window_size=5,
    single_file=True
)