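This notebook demonstrates extractive text summarization: five sumy summarizers (LexRank, KL, TextRank, SumBasic and LSA) are applied to the test split of the ILC dataset, and their output is scored against the reference summaries with ROUGE.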
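
As a rough illustration of what "extractive" means, the sketch below scores each sentence by the average document-wide frequency of its words and returns the best sentences verbatim, in their original order. It is a toy written for this note, not the algorithm behind any sumy summarizer; the function name `naive_extractive_summary` and the regex-based sentence splitting are assumptions made for the example.

```python
import re
from collections import Counter


def naive_extractive_summary(text, sentences_count=1):
    """Return the highest-scoring sentences of `text`, in document order."""
    # Naive split after ., ! or ? followed by whitespace (illustration only).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentences_count])
    return " ".join(sentences[i] for i in chosen)


example = "The court dismissed the appeal. The appeal was filed late. Costs were not awarded."
print(naive_extractive_summary(example, 1))
```

No new text is generated: the summary is always a subset of the input sentences, which is the property shared by all five summarizers used in this notebook.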

In [None]:
!pip install rouge sumy rouge_score datasets -qq

In [2]:
import os
import pandas as pd
import sys
import nltk
from rouge import Rouge
from datasets import load_dataset
from sumy.nlp.stemmers import Stemmer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.kl import KLSummarizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer
from sumy.summarizers.sum_basic import SumBasicSummarizer
from sumy.summarizers.text_rank import TextRankSummarizer

In [None]:
dataset = load_dataset("d0r1h/ILC", split='test')

In [5]:
dataset

Dataset({
    features: ['Title', 'Summary', 'Case'],
    num_rows: 1015
})

In [20]:
df = pd.DataFrame(dataset)

df.sample(5)

index,Title,Summary,Case
883,Mere pendency of proceedings before the author...,Mere pendency of proceedings before the State ...,W.P.No.157021IN THE HIGH COURT OF JUDICATURE A...
372,Suit for passing off can continue even when th...,Section 124 of the Trade Marks Act does not pr...,CM428 2021 IN THE HIGH COURT OF DELHI AT NEW D...
794,Breath Analyser test should be conducted to th...,Reports of the DGMS (Air) do not recommend hea...,1 2 & 15 IN THE HIGH COURT OF DELHI AT NEW DEL...
349,"When there are disputes inter se parties, whic...","Disputes inter se parties, which could not be ...",ARB.P. 787 2021 IN THE HIGH COURT OF DELHI AT...
586,Amit Sahni V/s Commissioner of Police and Ors.,While appreciating the existence of the right ...,IN THE CIVIL APPELLATE JURISDICTION CIVIL APPE...


In [17]:
nltk.download("punkt")
rouge = Rouge()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [18]:
print(sys.getrecursionlimit())
sys.setrecursionlimit(10000)
print(sys.getrecursionlimit())

1000
10000
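
The notebook does not state why the recursion limit is raised. A plausible reason, stated here as an assumption, is that the `rouge` package reconstructs the longest common subsequence recursively for ROUGE-L, which can exceed Python's default limit on long documents. If a global change is undesirable, a sketch like the following raises the limit only for the duration of a block; the helper name `recursion_limit` is hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set the interpreter's recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        # Always restore the earlier limit, even if the block raised.
        sys.setrecursionlimit(previous)


before = sys.getrecursionlimit()
with recursion_limit(before + 9000):
    inside = sys.getrecursionlimit()
after = sys.getrecursionlimit()
print(before, inside, after)
```

The scoring calls later in the notebook could then be wrapped in `with recursion_limit(10000):` instead of changing the limit for the whole session.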


In [19]:
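def summarize(text, summarizer, sentences_count):
    """Return the `sentences_count` sentences chosen by `summarizer`, joined by spaces."""
    # Parse the raw text into a sumy document (sentence and word tokenization).
    document = PlaintextParser(text, Tokenizer("en")).document
    # A sumy summarizer is callable and returns the selected sentences.
    selected = summarizer(document, sentences_count)
    return " ".join(str(sentence) for sentence in selected)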

In [21]:
df["LexRankSummary"] = df["Case"].map(
    lambda x: summarize(x, LexRankSummarizer(), 3)
)
df["KLSummary"] = df["Case"].map(
    lambda x: summarize(x, KLSummarizer(), 3)
)
df["TextRankSummary"] = df["Case"].map(
    lambda x: summarize(x, TextRankSummarizer(), 3)
)
df["SumBasicSummary"] = df["Case"].map(
    lambda x: summarize(x, SumBasicSummarizer(), 3)
)
df["LsaSummary"] = df["Case"].map(
    lambda x: summarize(x, LsaSummarizer(), 3)
)

In [22]:
df[['Summary','LexRankSummary','KLSummary','TextRankSummary','SumBasicSummary','LsaSummary']].sample(2)

index,Summary,LexRankSummary,KLSummary,TextRankSummary,SumBasicSummary,LsaSummary
361,A determination has to be made as to whether o...,Since the decree was not satisfied the respond...,He draws attention of the Court to the Settlem...,Since the decree was not satisfied the respond...,The application is disposed of. CM(M) 770 2021...,CM(M) 770 2021 SHREE VARDHMAN INFRAHOME PVT LT...
994,Relegating an employee from a particular depar...,Since now he stands qualified for appointment ...,July 2020 passed by the first respondent releg...,The petitioner had participated in Administrat...,The same is however disputed by the respondent...,From that day onwards the respondent should be...


In [23]:
data_path = "/content/drive/MyDrive/Working | Project/ILC/"

df[['Summary','LexRankSummary','KLSummary','TextRankSummary','SumBasicSummary','LsaSummary']].to_csv(data_path + "Extractiveprediction3.csv", index=False, header=True)

In [24]:
df = pd.read_csv("/content/drive/MyDrive/Working | Project/ILC/Extractiveprediction3.csv")

In [25]:
rouge = Rouge()

In [26]:
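def RougeScore(ModelScore, ModelSummary):
    """Average ROUGE of `ModelSummary` against the reference summaries in `df`.

    `ModelScore` is an unused label, kept so that existing calls keep working.
    """
    standard_summary = df["Summary"]
    # get_scores takes the hypotheses first, then the references.
    scores = rouge.get_scores(ModelSummary, standard_summary, avg=True)
    # Rename rows by key, not by position, so labels cannot be misassigned.
    ModelDF = pd.DataFrame(scores).rename(
        index={"r": "recall", "p": "precision", "f": "f-measure"}
    )
    return ModelDF.loc[["recall", "precision", "f-measure"]]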
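
To make the three rows of each score table concrete, here is a minimal sketch of ROUGE-1 using the common clipped-unigram-count definition. It is an illustration only: the `rouge` package may differ in details such as tokenization and how repeated words are counted, and the function name `rouge_1` is made up for this example.

```python
from collections import Counter


def rouge_1(hypothesis, reference):
    """Unigram-overlap precision, recall and F-measure (whitespace tokens)."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((hyp & ref).values())
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f-measure": f_measure}


scores = rouge_1(
    "the appeal was dismissed",
    "the court dismissed the appeal with costs",
)
print(scores)
```

Here three of the four hypothesis words also occur in the seven-word reference, so precision is 3/4, recall is 3/7 and the F-measure is their harmonic mean, 6/11.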

In [27]:
LexRouge = RougeScore("LexRouge", df["LexRankSummary"])

In [29]:
TextRankRouge = RougeScore("TextRankRouge", df["TextRankSummary"])

In [28]:
SumBasicRouge = RougeScore("SumBasicRouge", df["SumBasicSummary"])

In [30]:
LsaRouge = RougeScore("LsaRouge", df["LsaSummary"])

In [31]:
KLRouge = RougeScore("KLRouge", df["KLSummary"])

In [37]:
score_path = "/content/drive/MyDrive/Working | Project/ILC/score/"

TextRankRouge.to_csv(score_path + "TextRankRouge.csv", header=True, index=True)
LexRouge.to_csv(score_path + "LexRouge.csv", header=True, index=True)
SumBasicRouge.to_csv(score_path + "SumBasicRouge.csv", header=True, index=True)
LsaRouge.to_csv(score_path + "LsaRouge.csv", header=True, index=True)
KLRouge.to_csv(score_path + "KLRouge.csv", header=True, index=True)

In [32]:
LexRouge

metric,rouge-1,rouge-2,rouge-l
recall,0.256096,0.126502,0.231894
precision,0.529033,0.299393,0.481559
f-measure,0.332126,0.169408,0.301292


In [33]:
TextRankRouge

metric,rouge-1,rouge-2,rouge-l
recall,0.287039,0.146202,0.257997
precision,0.475492,0.270229,0.430109
f-measure,0.34631,0.181632,0.312055


In [34]:
SumBasicRouge

metric,rouge-1,rouge-2,rouge-l
recall,0.095403,0.035682,0.090126
precision,0.593369,0.297393,0.565127
f-measure,0.158502,0.061162,0.149902


In [35]:
LsaRouge

metric,rouge-1,rouge-2,rouge-l
recall,0.140865,0.046396,0.131172
precision,0.489725,0.227685,0.458704
f-measure,0.212129,0.073927,0.197843


In [36]:
KLRouge

metric,rouge-1,rouge-2,rouge-l
recall,0.133956,0.061535,0.122868
precision,0.603072,0.346679,0.55599
f-measure,0.213815,0.101621,0.196277
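
The five score tables are easier to compare side by side. The sketch below copies the f-measure rows reported above into a plain dictionary (the short keys such as "LSA" and "KL" are labels chosen for this example) and ranks the summarizers; it uses no data beyond the numbers already shown.

```python
# F-measure rows copied from the score tables in this notebook.
f_measure = {
    "LexRank":  {"rouge-1": 0.332126, "rouge-2": 0.169408, "rouge-l": 0.301292},
    "TextRank": {"rouge-1": 0.34631,  "rouge-2": 0.181632, "rouge-l": 0.312055},
    "SumBasic": {"rouge-1": 0.158502, "rouge-2": 0.061162, "rouge-l": 0.149902},
    "LSA":      {"rouge-1": 0.212129, "rouge-2": 0.073927, "rouge-l": 0.197843},
    "KL":       {"rouge-1": 0.213815, "rouge-2": 0.101621, "rouge-l": 0.196277},
}
metrics = ["rouge-1", "rouge-2", "rouge-l"]

# Best summarizer for each metric.
best = {m: max(f_measure, key=lambda name: f_measure[name][m]) for m in metrics}

# Full ranking by ROUGE-1 f-measure, highest first.
ranking = sorted(f_measure, key=lambda name: f_measure[name]["rouge-1"], reverse=True)

for name in ranking:
    row = "  ".join(f"{f_measure[name][m]:.4f}" for m in metrics)
    print(f"{name:<9} {row}")
print(best)
```

On these numbers TextRank has the highest f-measure on all three metrics, followed by LexRank, while SumBasic has the lowest.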
