## Word Frequency (Counter)

In [1]:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from collections import Counter
import nltk

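nltk.download("punkt")
nltk.download("punkt_tab")  # newer NLTK releases (3.9+) load this resource for sent_tokenize/word_tokenize
nltk.download("stopwords")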

def summarize_word_frequency(text, n):
    """Return the n highest-scoring sentences, ordered by score (highest first)."""
    sentences = sent_tokenize(text)
    stop_words = set(stopwords.words("english"))

    def content_words(s):
        # Lowercased alphanumeric tokens that are not stop words;
        # punctuation and tokens containing non-alphanumeric characters fail isalnum()
        return [w.lower() for w in word_tokenize(s) if w.isalnum() and w.lower() not in stop_words]

    # Document-wide frequency of each content word
    word_freq = Counter(content_words(text))

    # Score each sentence by summing the document frequencies of its content words
    # (a plain sum, so longer sentences tend to score higher)
    sentence_scores = {}
    for sentence in sentences:
        sentence_scores[sentence] = sum(word_freq[w] for w in content_words(sentence))

    summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
    return " ".join(summary_sentences)

[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
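
`summarize_word_frequency` returns its sentences in score order, so the summary printed in cell [5] does not follow the source order, unlike the TF-IDF summarizer. The sketch below is a dependency-free variant that keeps the same scoring rule but restores document order. It is an assumption, not the notebook's method: `split_sentences` is a naive regex splitter standing in for `sent_tokenize`, `content_words` uses a regex tokenizer instead of `word_tokenize`, and `STOP_WORDS` is a tiny hypothetical list in place of NLTK's English stop words.

```python
import re
from collections import Counter

# Hypothetical mini stop-word list; the notebook uses NLTK's English list instead.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "are", "to", "in", "it", "on"}


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence):
    # Lowercased alphanumeric runs that are not stop words.
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOP_WORDS]


def summarize_word_frequency_ordered(text, n):
    sentences = split_sentences(text)
    # Document-wide frequency of each content word
    word_freq = Counter(w for s in sentences for w in content_words(s))
    # Same scoring rule as the notebook: sum of document-wide word frequencies
    scores = [sum(word_freq[w] for w in content_words(s)) for s in sentences]
    # Rank indices by score (ties keep document order), then restore document order
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n]
    return " ".join(sentences[i] for i in sorted(top))
```

On the toy text "Cats are great. Dogs bark loudly. Cats and dogs play. The sky is blue.", the two top sentences are "Cats and dogs play." (score 5) and "Dogs bark loudly." (score 4); ranking by score alone would print them in that order, while this variant returns "Dogs bark loudly. Cats and dogs play."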


## TF-IDF

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import sent_tokenize
import numpy as np

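def summarize_tfidf_similarity(text, n):
    """Return the n most central sentences, kept in their original order."""
    sentences = sent_tokenize(text)
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf_matrix = vectorizer.fit_transform(sentences)  # one L2-normalized row per sentence

    # Rows are L2-normalized, so each dot product is a cosine similarity between sentences
    similarity_matrix = tfidf_matrix @ tfidf_matrix.T
    # Centrality score: total similarity of each sentence to every sentence (itself included)
    scores = np.asarray(similarity_matrix.sum(axis=1)).ravel()

    # Indices of the n highest-scoring sentences (stable sort keeps ties in document order),
    # then restored to document order
    top_sentence_indices = np.argsort(-scores, kind="stable")[:n]
    summary_sentences = [sentences[i] for i in sorted(top_sentence_indices)]
    return " ".join(summary_sentences)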

## Implementation

In [3]:
def summarize_text(text, n, method="word_frequency"):
    if method == "word_frequency":
        return summarize_word_frequency(text, n)
    elif method == "tfidf_similarity":
        return summarize_tfidf_similarity(text, n)
    else:
        raise ValueError(f"Invalid method {method!r}. Choose 'word_frequency' or 'tfidf_similarity'.")
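
The `if`/`elif` chain in `summarize_text` works, but each new method needs another branch and a manual update to the error message. One common alternative is a registry that maps method names to functions. The sketch below is an assumption rather than the notebook's design: `SUMMARIZERS`, `register`, `summarize_text_registry`, and the placeholder `first_n` summarizer are hypothetical names, and only `first_n` is registered so the block runs on its own.

```python
from typing import Callable, Dict

Summarizer = Callable[[str, int], str]

# Hypothetical registry: method name -> summarizer(text, n)
SUMMARIZERS: Dict[str, Summarizer] = {}


def register(name):
    """Decorator that stores a summarizer in SUMMARIZERS under `name`."""
    def wrap(fn):
        SUMMARIZERS[name] = fn
        return fn
    return wrap


@register("first_n")
def first_n(text, n):
    # Hypothetical placeholder: keep the first n period-terminated chunks
    parts = [p.strip() + "." for p in text.split(".") if p.strip()]
    return " ".join(parts[:n])


def summarize_text_registry(text, n, method="first_n"):
    try:
        summarizer = SUMMARIZERS[method]
    except KeyError:
        # The error message lists whatever is registered, so it never goes stale
        choices = ", ".join(repr(m) for m in sorted(SUMMARIZERS))
        raise ValueError(f"Invalid method {method!r}. Choose one of: {choices}.") from None
    return summarizer(text, n)
```

In the notebook, decorating `summarize_word_frequency` with `@register("word_frequency")` and `summarize_tfidf_similarity` with `@register("tfidf_similarity")` would make both available without touching the dispatcher.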

In [4]:
Wikipedia_text = """
Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.

High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."

Various subfields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception, and support for robotics.[a] General intelligence—the ability to complete any task performed by a human on an at least equal level—is among the field's long-term goals. To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.

Artificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest vastly increased after 2012 when deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture, and by the early 2020s hundreds of billions of dollars were being invested in AI (known as the "AI boom"). The widespread use of AI in the 21st century exposed several unintended consequences and harms in the present and raised concerns about its risks and long-term effects in the future, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.
"""
print(Wikipedia_text)

In [5]:
counter_summary = summarize_text(Wikipedia_text, 5, method="word_frequency")
print(counter_summary)

However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics. This growth accelerated further after 2017 with the transformer architecture, and by the early 2020s hundreds of billions 

In [6]:
tf_summary = summarize_text(Wikipedia_text, 5, method="tfidf_similarity")
print(tf_summary)


Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." Various subfields of AI research are centered around particular goals and the use of particular tools. Artificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism, followed by periods of disappointment and loss of funding, known as AI winters.


## Evaluation

In [7]:
%pip install rouge-score

Note: you may need to restart the kernel to use updated packages.


In [8]:
from rouge_score import rouge_scorer

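def compute_rouge_score(predicted_summary, reference_summary):
    # ROUGE-1/ROUGE-2: unigram/bigram overlap; ROUGE-L: longest common subsequence
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    # RougeScorer.score expects (target, prediction), so the reference comes first
    scores = scorer.score(reference_summary, predicted_summary)
    return scores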
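
The sketch below reimplements ROUGE-N and ROUGE-L in plain Python to show what the precision, recall, and F-measure fields mean. It is a simplification: it tokenizes on whitespace after lowercasing and does no stemming, whereas `rouge_score` uses its own tokenizer (plus Porter stemming with `use_stemmer=True`), so scores will differ somewhat. It also explains a pattern in the evaluations below: every word of an extracted sentence occurs in the reference (here the full source text), so ROUGE-1 precision is 1.0 for both summaries, while reordering sentences, as the word-frequency summarizer does, lowers ROUGE-L.

```python
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, pred_total, ref_total):
    # Precision over the prediction, recall over the reference, F1 as their harmonic mean
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def rouge_n(pred, ref, n=1):
    p, r = ngrams(pred.lower().split(), n), ngrams(ref.lower().split(), n)
    overlap = sum((p & r).values())  # clipped n-gram matches
    return prf(overlap, sum(p.values()), sum(r.values()))


def lcs_length(a, b):
    # Dynamic programming over prefixes, keeping one row at a time: O(len(a) * len(b))
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(pred, ref):
    a, b = pred.lower().split(), ref.lower().split()
    return prf(lcs_length(a, b), len(a), len(b))
```

For example, `rouge_n("sat the cat", "the cat sat", 1)` gives precision 1.0 because all three words match, but `rouge_l` on the same pair gives 2/3, since the longest common subsequence is only "the cat".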

In [9]:
from nltk.translate.bleu_score import sentence_bleu

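def compute_bleu_score(predicted_summary, reference_summary):
    # Whitespace tokenization: punctuation stays attached to the neighboring word
    reference_tokens = reference_summary.split()
    predicted_tokens = predicted_summary.split()
    # Default weights: uniform over 1- to 4-grams, with no smoothing
    bleu_score = sentence_bleu([reference_tokens], predicted_tokens)
    return bleu_score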
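
A plain-Python sketch of single-reference sentence BLEU can help read the BLEU numbers below. It is a simplified reimplementation, not NLTK's code: uniform weights over 1- to 4-grams, modified (clipped) n-gram precision, the standard brevity penalty, no smoothing, and the same `split()` tokenization as `compute_bleu_score`. Because `evaluate_summary` is called with the full source text as the reference, the brevity penalty probably dominates: a summary about half as long as its reference gets `exp(1 - 2) ≈ 0.37` even with perfect n-gram precision, which is close to the word-frequency summary's BLEU of 0.356 (its ROUGE-1 recall of about 0.50 suggests it is roughly half the reference's length).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(reference, hypothesis, max_n=4):
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        total = sum(hyp_ngrams.values())
        # Modified precision: each hypothesis n-gram counts at most as often as in the reference
        clipped = sum((hyp_ngrams & ngram_counts(ref, n)).values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram order zeroes the geometric mean
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: 1 if the hypothesis is longer than the reference, else exp(1 - r/c)
    c, r = len(hyp), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

For instance, a hypothesis made of the first four tokens of an eight-token reference has every n-gram precision equal to 1, yet scores `exp(-1) ≈ 0.368` purely from the brevity penalty.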

In [10]:
def evaluate_summary(predicted_summary, reference_summary):
    # ROUGE score
    rouge_scores = compute_rouge_score(predicted_summary, reference_summary)

    # BLEU score
    bleu_score = compute_bleu_score(predicted_summary, reference_summary)

    print("ROUGE Scores:", rouge_scores)
    print("BLEU Score:", bleu_score)

In [11]:
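# The reference is the full source text, not a human-written summary, so ROUGE-1 precision
# is 1.0 for any extracted summary and recall mostly reflects summary length
evaluate_summary(counter_summary, Wikipedia_text)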

ROUGE Scores: {'rouge1': Score(precision=1.0, recall=0.5012853470437018, fmeasure=0.6678082191780822), 'rouge2': Score(precision=0.979381443298969, recall=0.4896907216494845, fmeasure=0.6529209621993127), 'rougeL': Score(precision=0.6820512820512821, recall=0.34190231362467866, fmeasure=0.45547945205479456)}
BLEU Score: 0.3557936886611093


In [12]:
# Same reference as the word-frequency evaluation: the full source text
evaluate_summary(tf_summary, Wikipedia_text)

ROUGE Scores: {'rouge1': Score(precision=1.0, recall=0.36503856041131105, fmeasure=0.5348399246704332), 'rouge2': Score(precision=0.9858156028368794, recall=0.3582474226804124, fmeasure=0.5255198487712665), 'rougeL': Score(precision=1.0, recall=0.36503856041131105, fmeasure=0.5348399246704332)}
BLEU Score: 0.1860789223111018
