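The first cell mounts Google Drive so that the notebook can read files stored there. The second cell imports numpy, the os module, and sent_tokenize from the nltk (Natural Language Toolkit) package, then downloads the 'punkt' tokenizer data that sent_tokenize relies on.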

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
from nltk import sent_tokenize
import os
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

The code opens the file 'glove.6B.100d.txt' with the utf8 encoding and assigns the open file handle to the variable 'glove_file'. The file holds pre-trained GloVe embeddings: each line contains a word followed by its 100-dimensional vector.

In [None]:
# Open the GloVe embeddings file
glove_file = open('/content/drive/MyDrive/Final NLP/glove.6B.100d.txt', encoding = 'utf8')

This code reads the embedding file line by line and builds a dictionary called "model" in which each word is a key and its vector of floating point values is the value. Each line is split on whitespace: the first token is the word, and the remaining tokens are converted from strings to floating point numbers and stored as a numpy array.

In [None]:
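# Initialize the model: word -> embedding vector
model = {}
for line in glove_file:
    split_line = line.split()
    word = split_line[0]
    embedding = np.array([float(val) for val in split_line[1:]])
    model[word] = embedding
# Release the file handle once all lines have been read
glove_file.close()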
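
As a hedged alternative to the cell above, the loading step can be wrapped in a small function so that it can be tried on a few lines without the full embeddings file. The function name `load_glove` and the sample vectors are assumptions made for illustration; they are not part of the original notebook.

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe-format lines ("word v1 v2 ...") into a dict of vectors."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = np.array([float(v) for v in parts[1:]], dtype=np.float32)
    return embeddings

# Tiny in-memory stand-in for the real file (made-up 3-dimensional vectors)
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
demo_model = load_glove(sample)
print(sorted(demo_model))

# With the real file, a `with` block closes the handle automatically:
# with open(path_to_glove_file, encoding="utf8") as f:
#     model = load_glove(f)
```

Storing the vectors as float32 roughly halves memory use compared with numpy's default float64, which may matter for a vocabulary of this size.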

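The code performs extractive summarization on a given text using GloVe: it selects existing sentences from the text rather than generating new ones. It first tokenizes the text into individual sentences. It then builds a vector for each sentence by summing the vectors of the words found in the model and dividing by the number of whitespace-separated words in the sentence (plus 0.001, which avoids division by zero). Words missing from the model contribute nothing to the sum but still count in the denominator. Next, it computes the cosine similarity between every pair of distinct sentences, that is, the dot product of the two vectors divided by the product of their norms. Each sentence is scored by the sum of its similarities to all other sentences, and the k highest-scoring sentences are selected and put back in their original order. The selected sentences are concatenated without a separator and returned as a single string.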

In [None]:
# Extractive summarization using GloVe
def summarization(text, k):
    # Split the text into sentences
    sentences = sent_tokenize(text)
    # Find the vector representation of each sentence
    sentence_vectors = []
    for sentence in sentences:
        words = sentence.split()
        known = [model[word] for word in words if word in model]
        if known:
            # Sum of known word vectors over the word count (+0.001 as smoothing)
            vector = sum(known)/(len(words)+0.001)
        else:
            # No word of the sentence is in the model: use a zero vector
            vector = np.zeros(100)
        sentence_vectors.append(vector)
    # Calculate the cosine similarity matrix (the diagonal stays 0)
    sim_mat = np.zeros([len(sentences),len(sentences)])
    for i in range(len(sentences)):
        for j in range(len(sentences)):
            if i != j:
                denom = np.linalg.norm(sentence_vectors[i])*np.linalg.norm(sentence_vectors[j])
                # Skip pairs involving a zero vector to avoid dividing by zero
                if denom > 0:
                    sim_mat[i][j] = np.dot(sentence_vectors[i],sentence_vectors[j])/denom
    # Score each sentence by its total similarity to the other sentences
    ranking = np.zeros(len(sentences))
    for i in range(len(sentences)):
        ranking[i] = np.sum(sim_mat[i])
    sorted_rankings = np.argsort(-1*ranking)
    # Select the top K sentences and restore their original order
    top_k_sentences = sorted_rankings[:k]
    top_k_sentences.sort()
    summary = ""
    for index in top_k_sentences:
        summary += sentences[index]
    return summary
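
The pairwise similarity and ranking steps can also be written without explicit loops. The following is a minimal sketch, assuming the sentence vectors are already available as rows of an array; the helper name `cosine_similarity_matrix`, the `eps` guard, and the toy 2-dimensional vectors are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def cosine_similarity_matrix(vectors, eps=1e-8):
    """Pairwise cosine similarity between rows, with the diagonal set to 0."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Dividing by at least eps keeps all-zero rows at zero instead of NaN
    X_unit = X / np.maximum(norms, eps)
    sim = X_unit @ X_unit.T
    np.fill_diagonal(sim, 0.0)  # a sentence is not compared with itself
    return sim

# Toy "sentence vectors": the third points between the first two, the last is all zeros
demo_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
demo_sim = cosine_similarity_matrix(demo_vectors)

# Rank rows by total similarity to the others, highest first
demo_ranking = np.argsort(-demo_sim.sum(axis=1))
print(demo_ranking[0])
```

In this toy input the third vector is the most similar to the rest, so it ranks first, which mirrors how the function above favours sentences that are close to many other sentences.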


The code defines the input text inline as a string and stores it in the variable 'text'. It then calls the summarization function with k = 3 and prints the resulting three-sentence summary.

In [None]:
# Define the input text
text = """Dougie Freedman is on the verge of agreeing a new two-year deal to remain at Nottingham Forest. Freedman has stabilised Forest since he replaced cult hero Stuart Pearce and the club's owners are pleased with the job he has done at the City Ground. Dougie Freedman is set to sign a new deal at Nottingham Forest . Freedman has impressed at the City Ground since replacing Stuart Pearce in February . They made an audacious attempt on the play-off places when Freedman replaced Pearce but have tailed off in recent weeks. That has not prevented Forest's ownership making moves to secure Freedman on a contract for the next two seasons."""

# Print the summary for the text
k = 3
summary = summarization(text, k)
print(summary)

Freedman has stabilised Forest since he replaced cult hero Stuart Pearce and the club's owners are pleased with the job he has done at the City Ground.Dougie Freedman is set to sign a new deal at Nottingham Forest .That has not prevented Forest's ownership making moves to secure Freedman on a contract for the next two seasons.


In [None]:
from nltk.tokenize import word_tokenize
# Tokenize the summary into words
sentences = word_tokenize(summary)

# Tokenize the full text into words
tok_text = word_tokenize(text)

In [None]:
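from nltk.translate.bleu_score import sentence_bleu
# sentence_bleu(references, hypothesis): here the summary tokens act as the
# single reference and the full text tokens act as the hypothesis
bleu_score = sentence_bleu([sentences], tok_text)
print(bleu_score)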

0.4663508209086473
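
To make the BLEU number easier to interpret, the sketch below computes the clipped ("modified") n-gram precision that BLEU is built on. It is a simplified illustration only: it leaves out the brevity penalty and the combination of several n-gram orders, so it is not expected to reproduce the value printed above. The function name `modified_precision` and the toy sentences are assumptions made for the example.

```python
from collections import Counter

def modified_precision(reference, hypothesis, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    hyp_counts = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0

ref_tokens = "the cat sat on the mat".split()
hyp_tokens = "the the the cat".split()

p1 = modified_precision(ref_tokens, hyp_tokens, n=1)  # "the" is credited only twice
p2 = modified_precision(ref_tokens, hyp_tokens, n=2)  # only ("the", "cat") matches
print(p1, p2)
```

Clipping is what stops a hypothesis from scoring well simply by repeating a frequent word.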


In [None]:
# Calculate ROUGE scores: get_scores(hypothesis, reference)
# Uncomment the next line on first run to install the package
# !pip install rouge
from rouge import Rouge
rouge = Rouge()
scores = rouge.get_scores(summary, text)
print(scores)

[{'rouge-1': {'r': 0.6666666666666666, 'p': 1.0, 'f': 0.7999999952000001}, 'rouge-2': {'r': 0.5894736842105263, 'p': 0.9824561403508771, 'f': 0.7368421005756578}, 'rouge-l': {'r': 0.6666666666666666, 'p': 1.0, 'f': 0.7999999952000001}}]
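
The ROUGE-1 fields 'r', 'p', and 'f' stand for recall, precision, and F-score over unigrams. A minimal count-based sketch of that calculation follows; it assumes whitespace tokenization, and the rouge package may tokenize and count differently, so the numbers need not match the output above. The function name `rouge_1` and the toy strings are assumptions made for the example.

```python
from collections import Counter

def rouge_1(hypothesis, reference):
    """Unigram recall, precision, and F1 between two whitespace-tokenized strings."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    # Multiset intersection counts each shared word at most min(hyp, ref) times
    overlap = sum((hyp & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"r": recall, "p": precision, "f": f1}

# The hypothesis is copied verbatim from the reference, as in extractive summarization
demo_scores = rouge_1("the cat sat", "the cat sat on the mat")
print(demo_scores)
```

Because an extractive summary reuses words from the source, precision measured against the source tends to be at or near 1.0, while recall reflects how much of the source the summary covers.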
