# EXTRACTIVE TEXT SUMMARIZATION

In [1]:
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# 1. Importing The Provided CSV File

In [2]:
df1 = pd.read_csv("./news.csv")
df1.head()

Unnamed: 0,title,content,published_at,source,topic
0,BTS: RM is reminded of Bon Voyage as he travel...,"After reaching his hotel in the city, RM revea...",2022-07-30T07:00:00Z,2,13
1,RM recalls wondering if he 'made right decisio...,RM aka Kim Namjoon was the first member to joi...,2022-12-22T15:57:55Z,2,13
2,BTS: J-Hope and RM go bonkers at Billie Eilish...,"Billie Eilish's concert was held in Seoul, Sou...",2022-08-16T07:00:00Z,1,7
3,"BTS: J-Hope proudly states he raised Jungkook,...",BTS ARMY y'all would be missing the members a ...,2022-12-18T13:08:40Z,1,7
4,BTS: Jin aka Kim Seokjin takes us through the ...,BTS member Kim Seokjin aka Jin has the capacit...,2022-11-21T08:00:00Z,1,8


# 2. Discovering The Irregularities In The Data

In [3]:
df1.shape

(810, 5)

## 2.1 Randomly selectingout the row, to see what are the things that needs to be pre-processed

In [4]:
df1['content'].iloc[401]

'BTS consists of members Jin, Suga, J-Hope, RM, Jimin, V, and Jungkook&mdash;co-writes and co-produces much of their own material. Originally a hip hop group, their musical style has evolved to incorporate a wide range of genres; their lyrics have often discussed mental health, the troubles of school-age youth and coming of age, loss, the journey towards self-love, and individualism. Their work also frequently references literature, philosophy and psychological concepts, and includes an alternate universe storyline.\n\nBTS in the beginning:\n\nAfter launching in 2013 with their single album 2 Cool 4 Skool, BTS respectively released their first Korean-language studio album, Dark &amp; Wild, and Japanese-language studio album, Wake Up, in 2014. The group&#39;s second Korean studio album, Wings (2016), was their first to sell one million copies in South Korea. By 2017, BTS had crossed into the global music market, leading the Korean wave into the United States and breaking several sales r

## 2.2 Realized that columns other than published_at, source, topic are useless, and would contribute nothing, rather than increasing dimensionality. So decided to drop them

In [5]:
df1.drop(['published_at', 'source', 'topic'], axis=1)

Unnamed: 0,title,content
0,BTS: RM is reminded of Bon Voyage as he travel...,"After reaching his hotel in the city, RM revea..."
1,RM recalls wondering if he 'made right decisio...,RM aka Kim Namjoon was the first member to joi...
2,BTS: J-Hope and RM go bonkers at Billie Eilish...,"Billie Eilish's concert was held in Seoul, Sou..."
3,"BTS: J-Hope proudly states he raised Jungkook,...",BTS ARMY y'all would be missing the members a ...
4,BTS: Jin aka Kim Seokjin takes us through the ...,BTS member Kim Seokjin aka Jin has the capacit...
...,...,...
805,BTS’ SUGA’s Suchwita Ep 2 Teaser OUT: Top 3 so...,BTS has conquered the world with their group r...
806,BTS ARMY celebrate 700 days of Jin's special s...,Today marks 700 days since BTS' worldwide hand...
807,"BTS: 'I am not a baby,' says Jungkook as an AR...",BTS' youngest member Jungkook came online on W...
808,BTS' Jin shares 1st pics after joining militar...,BTS' eldest member Jin has shared pictures and...


In [6]:
df1.drop(['published_at', 'source', 'topic'], axis=1, inplace=True)
df1.head()

Unnamed: 0,title,content
0,BTS: RM is reminded of Bon Voyage as he travel...,"After reaching his hotel in the city, RM revea..."
1,RM recalls wondering if he 'made right decisio...,RM aka Kim Namjoon was the first member to joi...
2,BTS: J-Hope and RM go bonkers at Billie Eilish...,"Billie Eilish's concert was held in Seoul, Sou..."
3,"BTS: J-Hope proudly states he raised Jungkook,...",BTS ARMY y'all would be missing the members a ...
4,BTS: Jin aka Kim Seokjin takes us through the ...,BTS member Kim Seokjin aka Jin has the capacit...


## 2.3 Checking if there are any null values in the dataframe

In [7]:
df1.isna().sum()

title      0
content    4
dtype: int64

## 2.4 As there are null values and are very less (4 <<< 810), its better to drop them

In [8]:
df1.dropna(axis=0, inplace=True)
df1.head()

Unnamed: 0,title,content
0,BTS: RM is reminded of Bon Voyage as he travel...,"After reaching his hotel in the city, RM revea..."
1,RM recalls wondering if he 'made right decisio...,RM aka Kim Namjoon was the first member to joi...
2,BTS: J-Hope and RM go bonkers at Billie Eilish...,"Billie Eilish's concert was held in Seoul, Sou..."
3,"BTS: J-Hope proudly states he raised Jungkook,...",BTS ARMY y'all would be missing the members a ...
4,BTS: Jin aka Kim Seokjin takes us through the ...,BTS member Kim Seokjin aka Jin has the capacit...


In [9]:
df1.shape

(806, 2)

In [10]:
df1['content'].iloc[0]

'After reaching his hotel in the city, RM revealed that his stay would be for four days and added that he would step out for dinner. As he sat at a roadside open-air restaurant, RM feasted on beer, burgers and fries. He said, "I\'m starving right now. I\'m out to grab some food. It\'s much quieter than I expected and feels like a rural town. I like the familiar atmosphere." RM attended Art Basel and explained on camera the details of the art fair. He also gave a glimpse as he had noodles and beer which was followed by soup noodles and wrap. Showing the pattern of a ping pong table, RM said, "The table looks like our (BTS) symbol." He also spoke about the art pieces as he viewed them. After that, RM took a tram to visit the Foundation Beyeler, a museum. He later took a walk through the city. On his third day, RM visited the Kunstmuseum Basel, the Vitra Design Museum and the gallery. As he walked around, RM showed a chair to his fans and said, "I have breaking news for you guys. Coldplay

### In the above data, we can see that there are many contractions, puntuations

In [11]:
df1['content'].iloc[365]

'BTS&rsquo; Jin is good at a lot of things, singing, being &lsquo;worldwide handsome&rsquo;, cooking for Bangtan, and now even making alcohol  The second week of his own program, Drunken Truth, which starred well-known chef Baek Jong Won and saw a guest appearance from actor Kim Nam Gil, unfolds in unique ways \n\nDrunken Truth Episode 3\nAs Jin and Baek Jong Won reach a spot to take care of the final steps of his first try at making makgeolli (rice wine), the two show another attempt at their flourishing synergy  Jin is known to be good with elders as he can joke around and that is clearly visible in the show  The two head out with their own bottles of self-made alcohol, their destination is a traditional market  Being a chef, it is Baek Jong Won&rsquo;s territory who is recognized by the vendors which consist of mostly older women and is asked for photos, while funnily enough, they are unaware of BTS member Jin  A neck-and-neck competition between them shows the difference in prefere

### In the above data, we can see there are certain HTML entities like &rsquo. Also we can see '\n' many a times continously

In [12]:
text = df1['content'].iloc[369]
text

'2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills  While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark&rsquo;s New Year&rsquo;s Rockin&rsquo; Eve where he did a live stage of 3 songs, solo track &lsquo;= (Equal Sign)&rsquo;, &lsquo;Chicken Noodle Soup&rsquo; his collaboration track with Becky G, and BTS&rsquo; &lsquo;Butter&rsquo; (Holiday Remix)  He officially became only the second South Korean soloist to perform at the event, following PSY  This was also J-Hope&rsquo;s third time at Dick Clark&rsquo;s New Year&rsquo;s Rockin&rsquo; Eve after group stages with BTS in 2017 and 2019 \n\nJ-Hope&rsquo;s live\n\nAfter returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broa

### The data above, is filled with empty spaces, '\t', '\n' characters, HTML entities, punctuations and contractions. And all these has to be removed before text summarization process

In [13]:
df1['content'].iloc[450]

'Actor Son Ye-jin, of Crash Landing on You fame, had once revealed that she wanted to treat BTS members RM, Jin, Suga, J-Hope, Jimin, V and Jungkook to a meal  At the 2018 Korean Popular Culture &amp; Arts Awards, Son Ye-jin was asked if she will buy a meal for anyone in the audience  (Also Read | BTS ARMY says they are \'getting deals\' for the band after convincing singer Pink Sweat$ for a collaboration)In a video, shared by a fan account on YouTube, she had said, "After the drama, there are so many people asking me to buy food  So, I\'m trying not to meet people " The host of the event, asked her, "Is there anyone here that you want to buy a meal for?" After thinking for a moment, she turned around, smiled and replied, "BTS "Amid hooting, Son Ye-jin was seen laughing  While RM too laughed, Jin flashed finger hearts, Suga clapped and bowed his head and J-Hope smiled at the actor\'s response  Jimin smiled, V made fists and Jungkook bowed his head However, what Son Ye-jin next said lef

## 2.5 There are HTML Tags and Entities, which needs to be cleaned from data

In [14]:
from bs4 import BeautifulSoup

In [15]:
def removeHTMLTagsAndEntities(s):
    return BeautifulSoup(BeautifulSoup(s, "lxml").text, "html.parser")

In [16]:
removeHTMLTagsAndEntities(df1['content'].iloc[365])

BTS’ Jin is good at a lot of things, singing, being ‘worldwide handsome’, cooking for Bangtan, and now even making alcohol  The second week of his own program, Drunken Truth, which starred well-known chef Baek Jong Won and saw a guest appearance from actor Kim Nam Gil, unfolds in unique ways 

Drunken Truth Episode 3
As Jin and Baek Jong Won reach a spot to take care of the final steps of his first try at making makgeolli (rice wine), the two show another attempt at their flourishing synergy  Jin is known to be good with elders as he can joke around and that is clearly visible in the show  The two head out with their own bottles of self-made alcohol, their destination is a traditional market  Being a chef, it is Baek Jong Won’s territory who is recognized by the vendors which consist of mostly older women and is asked for photos, while funnily enough, they are unaware of BTS member Jin  A neck-and-neck competition between them shows the difference in preferences between the older and y

In [17]:
text = removeHTMLTagsAndEntities(text)
text

2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills  While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark’s New Year’s Rockin’ Eve where he did a live stage of 3 songs, solo track ‘= (Equal Sign)’, ‘Chicken Noodle Soup’ his collaboration track with Becky G, and BTS’ ‘Butter’ (Holiday Remix)  He officially became only the second South Korean soloist to perform at the event, following PSY  This was also J-Hope’s third time at Dick Clark’s New Year’s Rockin’ Eve after group stages with BTS in 2017 and 2019 

J-Hope’s live

After returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broadcast to speak with his fans  As they congratulated him on his sparkling performance, being the

## 2.6 There are many spaces, new line characters and also some entire data block with not even a single fullstop/period

### Fullstops are important for us, because we are going to tokenize the sentences using fullstops

In [18]:
import re

In [19]:
def addingFullstops(s):
    s = re.sub("\xa0",' ', str(s))
    s = re.sub("(\\n)+", '.', str(s))
    s = re.sub("(\\t)+", '', str(s))
    s = re.sub("   ", '', str(s))
    s = re.sub("(  )",'.', str(s))
    return s

In [20]:
text = addingFullstops(text)
text

'2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills.While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark’s New Year’s Rockin’ Eve where he did a live stage of 3 songs, solo track ‘= (Equal Sign)’, ‘Chicken Noodle Soup’ his collaboration track with Becky G, and BTS’ ‘Butter’ (Holiday Remix).He officially became only the second South Korean soloist to perform at the event, following PSY.This was also J-Hope’s third time at Dick Clark’s New Year’s Rockin’ Eve after group stages with BTS in 2017 and 2019 .J-Hope’s live.After returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broadcast to speak with his fans.As they congratulated him on his sparkling performance, being the perf

### Even after cleaning, there are some repeated fullstops without any sentence. Pre-processing it

In [21]:
from nltk.tokenize import sent_tokenize

In [22]:
def addFullStops(s):
    modified_sentences = []
    sentences = s.split(".")
    for i in sentences:
        i = i.strip()
        if len(i) != 0:
            i += '.'
            modified_sentences.append(i)
    return " ". join(modified_sentences)

In [23]:
text = addFullStops(text)
text

'2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills. While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark’s New Year’s Rockin’ Eve where he did a live stage of 3 songs, solo track ‘= (Equal Sign)’, ‘Chicken Noodle Soup’ his collaboration track with Becky G, and BTS’ ‘Butter’ (Holiday Remix). He officially became only the second South Korean soloist to perform at the event, following PSY. This was also J-Hope’s third time at Dick Clark’s New Year’s Rockin’ Eve after group stages with BTS in 2017 and 2019. J-Hope’s live. After returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broadcast to speak with his fans. As they congratulated him on his sparkling performance, being the

## 2.7 For certain words, I saw contractions were used, so decided to replace them with actual word

In [24]:
import contractions

In [25]:
def removingContractions(s):
    words = []
    for i in s.split(" "):
        words.append(contractions.fix(i))
    s = " ".join(words)
    return s

In [26]:
removingContractions("I'd")

'I would'

In [27]:
text = removingContractions(text)
text

'2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills. While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark’s New Year’s Rockin’ Eve where he did a live stage of 3 songs, solo track ‘= (Equal Sign)’, ‘Chicken Noodle Soup’ his collaboration track with Becky G, and BTS’ ‘Butter’ (Holiday Remix). He officially became only the second South Korean soloist to perform at the event, following PSY. This was also J-Hope’s third time at Dick Clark’s New Year’s Rockin’ Eve after group stages with BTS in 2017 and 2019. J-Hope’s live. After returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broadcast to speak with his fans. As they congratulated him on his sparkling performance, being the

## 2.8 Till here combining all the process into a single function

In [28]:
def initial_preprocessing(s):
    s = removeHTMLTagsAndEntities(s)
    s = addingFullstops(s)
    s = removingContractions(s)
    return s

In [29]:
initial_preprocessing(df1['content'].iloc[369])

'2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills.While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark’s New Year’s Rockin’ Eve where he did a live stage of 3 songs, solo track ‘= (Equal Sign)’, ‘Chicken Noodle Soup’ his collaboration track with Becky G, and BTS’ ‘Butter’ (Holiday Remix).He officially became only the second South Korean soloist to perform at the event, following PSY.This was also J-Hope’s third time at Dick Clark’s New Year’s Rockin’ Eve after group stages with BTS in 2017 and 2019 .J-Hope’s live.After returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broadcast to speak with his fans.As they congratulated him on his sparkling performance, being the perf

In [30]:
initial_preprocessing(df1['content'].iloc[0])

'After reaching his hotel in the city, RM revealed that his stay would be for four days and added that he would step out for dinner. As he sat at a roadside open-air restaurant, RM feasted on beer, burgers and fries. He said, "I am starving right now. I am out to grab some food. It is much quieter than I expected and feels like a rural town. I like the familiar atmosphere." RM attended Art Basel and explained on camera the details of the art fair. He also gave a glimpse as he had noodles and beer which was followed by soup noodles and wrap. Showing the pattern of a ping pong table, RM said, "The table looks like our (BTS) symbol." He also spoke about the art pieces as he viewed them. After that, RM took a tram to visit the Foundation Beyeler, a museum. He later took a walk through the city. On his third day, RM visited the Kunstmuseum Basel, the Vitra Design Museum and the gallery. As he walked around, RM showed a chair to his fans and said, "I have breaking news for you guys. Coldplay

## 2.9 Removing the punctuations and Stopwords and coverting entire sentence into lower case

In [31]:
from nltk.corpus import stopwords

In [32]:
engStopwords = stopwords.words("english")
print(engStopwords)

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

In [33]:
def removePunctuations(s):
    words = []
    s = re.sub(r"[<>()|&©ø\[\]\'\",;?~*!]", ' ', str(s))
    for i in s.split(" "):
        if i not in engStopwords:
            words.append(i)
    s = " ".join(words)
    return s

In [34]:
removePunctuations(text)

'2022 came celebratory end BTS  especially member J-Hope  New York  making another fabulous display skills. While away members  seemed enjoyed fullest solo performance Dick Clark’s New Year’s Rockin’ Eve live stage 3 songs  solo track ‘=  Equal Sign ’  ‘Chicken Noodle Soup’ collaboration track Becky G  BTS’ ‘Butter’  Holiday Remix . He officially became second South Korean soloist perform event  following PSY. This also J-Hope’s third time Dick Clark’s New Year’s Rockin’ Eve group stages BTS 2017 2019. J-Hope’s live. After returning Times Square performed penultimate act alongside multiple singers around world spoke Ryan Seacrest  host show  J-Hope back hotel room turned live broadcast speak fans. As congratulated sparkling performance  perfectionist  J-Hope expressed sadness unable use voice fullest talked slipping incident due rainy weather rehearsal stage. J-Hope BTS members’ New Year wishes. As soon turned 12  New Year began South Korea  BTS members Jungkook  Jimin  V took fan comm

## 2.10 Trying to combine all of these pre-processing in a single function 

In [35]:
def preprocessingRow(s):
    s = BeautifulSoup(BeautifulSoup(s, "lxml").text, "html.parser")
    s = re.sub("\xa0",' ', str(s))
    s = re.sub("(\\n)+", '.', str(s))
    s = re.sub("(\\t)+", '', str(s))
    s = re.sub("   ", '', str(s))
    s = re.sub("(  )",'.', str(s))
    modified_sentences = []
    sentences = s.split(".")
    for i in sentences:
        i = i.strip()
        if len(i) != 0:
            i += '.'
            modified_sentences.append(i)
    s = " ". join(modified_sentences)
    words = []
    for i in s.split(" "):
        words.append(contractions.fix(i))
    s = " ".join(words)
    words = []
    s = re.sub(r"[<>()|&©ø\[\]\'\",;?~*!]", ' ', str(s)).lower()
    for i in s.split(" "):
        if i not in engStopwords and i not in words:
            words.append(i)
    s = " ".join(words)
    s = re.sub(' +', ' ', s)
    return s

In [36]:
preprocessingRow(df1['content'].iloc[369])

'2022 came celebratory end bts especially member j-hope new york making another fabulous display skills. away members seemed enjoyed fullest solo performance dick clark’s year’s rockin’ eve live stage 3 songs track ‘= equal sign ’ ‘chicken noodle soup’ collaboration becky g bts’ ‘butter’ holiday remix . officially became second south korean soloist perform event following psy. also j-hope’s third time group stages 2017 2019. live. returning times square performed penultimate act alongside multiple singers around world spoke ryan seacrest host show back hotel room turned broadcast speak fans. congratulated sparkling perfectionist expressed sadness unable use voice talked slipping incident due rainy weather rehearsal stage. members’ year wishes. soon 12 began korea jungkook jimin v took fan community platform weverse share wishes fans well discuss plans coming year. kept brief wishing successful happy ahead wrote big letter feelings seemingly bottled long release music meeting composers.

### Although, I was not able to figure out how to remove a link because, I wasn't able to find any regular expression for removing it. I did research on link removal fuctions through BeautifulSoup but wasnt able to do it.

### The above function will work perfectly when provided row by row input but at that time we cannot utilize the multiprocessing functions of NLP. Hence creating same function, but this time we shall pass an entire column rather than a row to function.

In [37]:
from tqdm.auto import tqdm

In [38]:
def preprocessingColumn(col):
    for row in tqdm(col, total=col.shape[0]):
        cleanedData = preprocessingRow(row)
        yield cleanedData

In [39]:
preprocessingColumn(df1['content']) # will result in a Generator object

<generator object preprocessingColumn at 0x00000179CE7E7DD0>

## 2.11 Storing cleaned data in the dataframe

In [40]:
cleanedContent = preprocessingColumn(df1['content'])

### Using spacy to fasten the process of Data Pre-processing

In [41]:
import spacy
nlp = spacy.load('en_core_web_sm', disable=['ner', 'parser'])

### Perorming Data Pre-processing on the Dataframe's content column

In [42]:
cleanedText = [str(doc) for doc in nlp.pipe(cleanedContent, batch_size=5000, n_process=-1)]

  0%|          | 0/806 [00:00<?, ?it/s]

In [43]:
cleanedText[0]

'reaching hotel city rm revealed stay would four days added step dinner. sat roadside open-air restaurant feasted beer burgers fries. said starving right now. grab food. much quieter expected feels like rural town. familiar atmosphere. attended art basel explained camera details fair. also gave glimpse noodles followed soup wrap. showing pattern ping pong table looks bts symbol. spoke pieces viewed them. took tram visit foundation beyeler museum. later walk city. third day visited kunstmuseum vitra design museum gallery. walked around showed chair fans breaking news guys. coldplay chris martin made displayed see give call. amazing. next lucerne hiked mount rigi. recalling previous remember crossing bridge buying souvenirs. reminded bon voyage reality show featuring members jin suga j-hope jimin v jungkook. speaking rode ssb train boat mountain track road cable cars planning go ride again. travel switzerland ended tinguely. flew paris attend pinault collection musee orsay. went centre g

In [44]:
cleanedText[369]

'2022 came celebratory end bts especially member j-hope new york making another fabulous display skills. away members seemed enjoyed fullest solo performance dick clark’s year’s rockin’ eve live stage 3 songs track ‘= equal sign ’ ‘chicken noodle soup’ collaboration becky g bts’ ‘butter’ holiday remix . officially became second south korean soloist perform event following psy. also j-hope’s third time group stages 2017 2019. live. returning times square performed penultimate act alongside multiple singers around world spoke ryan seacrest host show back hotel room turned broadcast speak fans. congratulated sparkling perfectionist expressed sadness unable use voice talked slipping incident due rainy weather rehearsal stage. members’ year wishes. soon 12 began korea jungkook jimin v took fan community platform weverse share wishes fans well discuss plans coming year. kept brief wishing successful happy ahead wrote big letter feelings seemingly bottled long release music meeting composers.

In [45]:
len(cleanedText)

806

## 2.12 Checking for any errors in the process of data cleaning, such that out code might have skipped processing an entire row

In [46]:
flag = 1
for i in range(len(cleanedText)):
    if len(cleanedText[i]) == 0:
        print(f"Error at: {i}")
        flag = 0
if flag == 1:
    print("No error, in process of Data pre-processing")

No error, in process of Data pre-processing


# 3. Trying To Generate Summary For Any One Random Row

In [47]:
from gensim.models import Word2Vec
from scipy import spatial
import networkx as nx

In [48]:
text = initial_preprocessing(df1['content'].iloc[405])
text

'BTS’ oldest member Jin went solo after J-Hope and made his official debut with a single album. The Astronaut has been meaningful for multiple reasons for the group as well as the fans who have welcomed the release with warm words and moist eyes..Music charts.The song which came as a result of Jin working with Coldplay for the second time following ‘My Universe’, debuted on the Billboard Hot100 chart at No. 51. He also tied with PSY for a record on United Kingdom’s Official Official Singles Chart for grabbing the 61st spot in the week after its release. His latest achievement comes with selling 1,024,382 copies of The Astronaut according to the numbers released by Circle Chart (earlier known as Gaon Chart)..Third million-seller.The BTS member is only the third soloist in the history of the music chart to have recorded over a million copies sold of his album. Jin follows EXO member Baekhyun and trot-ballad singer Lim Young Woong on the list. Becoming a million seller is a massive feat f

In [49]:
sentences = [i.strip()+"." for i in text.split(".") if len(i) != 0]
sentences

['BTS’ oldest member Jin went solo after J-Hope and made his official debut with a single album.',
 'The Astronaut has been meaningful for multiple reasons for the group as well as the fans who have welcomed the release with warm words and moist eyes.',
 'Music charts.',
 'The song which came as a result of Jin working with Coldplay for the second time following ‘My Universe’, debuted on the Billboard Hot100 chart at No.',
 '51.',
 'He also tied with PSY for a record on United Kingdom’s Official Official Singles Chart for grabbing the 61st spot in the week after its release.',
 'His latest achievement comes with selling 1,024,382 copies of The Astronaut according to the numbers released by Circle Chart (earlier known as Gaon Chart).',
 'Third million-seller.',
 'The BTS member is only the third soloist in the history of the music chart to have recorded over a million copies sold of his album.',
 'Jin follows EXO member Baekhyun and trot-ballad singer Lim Young Woong on the list.',
 'Be

In [50]:
sentence_tokens = []
for sentence in sentences:
    for word in preprocessingRow(sentence).split(" "):
        if word not in sentence_tokens:
            sentence_tokens.append(word)

In [51]:
w2v = Word2Vec(sentence_tokens, vector_size=1, min_count = 1, epochs = 1000)
sentence_embeddings = [[w2v.wv[word][0] for word in words] for words in sentence_tokens]
max_len = max([len(tokens) for tokens in sentence_tokens])
sentence_embeddings = [np.pad(embedding,(0,max_len-len(embedding)),'constant') for embedding in sentence_embeddings]

In [52]:
similarity_matrix = np.zeros([len(sentence_tokens), len(sentence_tokens)])
for i,row_embedding in enumerate(sentence_embeddings):
    for j,column_embedding in enumerate(sentence_embeddings):
        similarity_matrix[i][j]=1-spatial.distance.cosine(row_embedding,column_embedding)

In [53]:
nx_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(nx_graph)

In [54]:
total_summary_sentences = len(sentences) // 4
top_sentence={sentence:scores[index] for index,sentence in enumerate(sentences)}
top=dict(sorted(top_sentence.items(), key=lambda x: x[1], reverse=True)[:total_summary_sentences])

In [55]:
summary = ""
for sentence in sentences:
    if sentence in top.keys():
        summary += sentence
print(summary)

The Astronaut has been meaningful for multiple reasons for the group as well as the fans who have welcomed the release with warm words and moist eyes.Music charts.His latest achievement comes with selling 1,024,382 copies of The Astronaut according to the numbers released by Circle Chart (earlier known as Gaon Chart).Becoming a million seller is a massive feat for the BTS member who is expected to enlist for his mandatory military service soon.


## 3.1 For below text:

In [56]:
text

'BTS’ oldest member Jin went solo after J-Hope and made his official debut with a single album. The Astronaut has been meaningful for multiple reasons for the group as well as the fans who have welcomed the release with warm words and moist eyes..Music charts.The song which came as a result of Jin working with Coldplay for the second time following ‘My Universe’, debuted on the Billboard Hot100 chart at No. 51. He also tied with PSY for a record on United Kingdom’s Official Official Singles Chart for grabbing the 61st spot in the week after its release. His latest achievement comes with selling 1,024,382 copies of The Astronaut according to the numbers released by Circle Chart (earlier known as Gaon Chart)..Third million-seller.The BTS member is only the third soloist in the history of the music chart to have recorded over a million copies sold of his album. Jin follows EXO member Baekhyun and trot-ballad singer Lim Young Woong on the list. Becoming a million seller is a massive feat f

## 3.2 We got summary as:

In [57]:
summary

'The Astronaut has been meaningful for multiple reasons for the group as well as the fans who have welcomed the release with warm words and moist eyes.Music charts.His latest achievement comes with selling 1,024,382 copies of The Astronaut according to the numbers released by Circle Chart (earlier known as Gaon Chart).Becoming a million seller is a massive feat for the BTS member who is expected to enlist for his mandatory military service soon.'

# 4. Time To Implement It On Actual Dataframe

In [58]:
def summarizerRow(s):
    # Applying only the necessary Pre-processing
    text = initial_preprocessing(s)
    # print("1")
    
    # Generating the tokens
    sentences = [i.strip()+"." for i in text.split(".") if len(i) != 0]
    sentence_tokens = [[word for word in preprocessingRow(sentence).split(" ")] for sentence in sentences]
    # print("2")
    
    # Calculating the embedding for each word
    w2v = Word2Vec(sentence_tokens, vector_size=1, min_count = 1, epochs = 1000)
    sentence_embeddings = [[w2v.wv[word][0] for word in words] for words in sentence_tokens]
    max_len = max([len(tokens) for tokens in sentence_tokens])
    sentence_embeddings = [np.pad(embedding,(0,max_len-len(embedding)),'constant') for embedding in sentence_embeddings]
    # print("3")
    
    # Developing the similarity matrix based on cosine similarity
    similarity_matrix = np.zeros([len(sentence_tokens), len(sentence_tokens)])
    for i,row_embedding in enumerate(sentence_embeddings):
        for j,column_embedding in enumerate(sentence_embeddings):
            similarity_matrix[i][j]=1-spatial.distance.cosine(row_embedding,column_embedding)
    # print("4")
    
    # Implementing the TextRank
    nx_graph = nx.from_numpy_array(similarity_matrix)
    try:
        scores = nx.pagerank(nx_graph,  max_iter=100000, tol=1.0e-2)

        # Sorting out the importing sentences
        total_summary_sentences = len(sentences) // 3
        top_sentence={sentence:scores[index] for index,sentence in enumerate(sentences)}
        top=dict(sorted(top_sentence.items(), key=lambda x: x[1], reverse=True)[:total_summary_sentences])

        # Joining the most important sentences
        summary = ""
        for sentence in sentences:
            if sentence in top.keys():
                summary += sentence + " "
        # print("5")
        return summary
    except:
        summary = ""
        # print("6")
        return summary

In [59]:
print(summarizerRow(df1['content'].iloc[402]))

He said the reason why he wanted to write a song for them is because he believes that ARMYs are the reason why the group exists. The songwriting process to creating the melody, Jungkook spilled all kinds of details. The so-teok (sausage tteok) and fried chicken looked so good. He also did not let the tasks overcome his love for SUGA’s song ‘That That’ as he danced to it as well. He also mentioned doing a V-Live there, which was what led fans to know which member will be putting up what kind of vlog. 


## 4.1 Developing a similar fuction column-wise

In [60]:
def summarizerColumn(col):
    for row in tqdm(col, total=col.shape[0]):
        summary = summarizerRow(row)
        yield summary

In [61]:
summaryGenerator = summarizerColumn(df1['content'])

In [62]:
summary = [str(doc) for doc in nlp.pipe(summaryGenerator, batch_size=5000, n_process=-1)]

  0%|          | 0/806 [00:00<?, ?it/s]

In [63]:
summary[0]

'As he sat at a roadside open-air restaurant, RM feasted on beer, burgers and fries. " RM attended Art Basel and explained on camera the details of the art fair. He also gave a glimpse as he had noodles and beer which was followed by soup noodles and wrap. After that, RM took a tram to visit the Foundation Beyeler, a museum. As he walked around, RM showed a chair to his fans and said, "I have breaking news for you guys. Coldplay\'s Chris Martin made a chair and it is displayed in the Vitra Design Museum. " RM next visited Lucerne and hiked to Mount Rigi. " RM\'s travel in Switzerland ended with a visit to the Museum Tinguely. '

In [64]:
df1['content'].shape

(806,)

In [65]:
len(summary)

806

In [66]:
df1['content'].iloc[804]

'BTS\' eldest member Jin has shared pictures and a message for fans for the first time after he joined the South Korean military  Taking to Weverse on Wednesday Jin posted his pictures including selfies  In a photo, Jin is seen in his uniform as he stood with his arms on his sides  The singer also wore a mask  (Also Read | BTS’ Jin has a special message for fans: \'I may not be by your side, but…\')In a selfie, Jin looked at the camera giving fans a closer glimpse of his face  He also flashed the victory sign in another picture  Sharing the pictures, Jin wrote, "I\'m enjoying my life  I\'m posting pictures after getting permission from the military  ARMY, be happy and take care "Jin\'s message and photos left the BTS ARMY emotional  A person wrote on Twitter, "Even though he must be soo tired but still took permission from there & came to update us about himself & telling us to be happy & be well  I\'m crying  I love you so much Jin ""Jin is proud of all the armys who waited until he p

In [67]:
summary[804]

'Taking to Weverse on Wednesday Jin posted his pictures including selfies. In a photo, Jin is seen in his uniform as he stood with his arms on his sides. He also flashed the victory sign in another picture. ARMY, be happy and take care "Jin\'s message and photos left the BTS ARMY emotional. I love you so much Jin ""Jin is proud of all the armys who waited until he posts. Let us continue waiting and not spreading pics that are not posted by Jin," read a comment. In the video Jin had said, “Hello everyone, this is Jin of BTS. I may not be by your side, but I will go looking for you soon, if you just wait a little. '

# 5. Creating a New Dataframe For Storing Only The Required Results

In [68]:
df2 = pd.DataFrame()

In [69]:
df2['Original Content'] = df1['content'].apply(initial_preprocessing)
df2.head()

Unnamed: 0,Original Content
0,"After reaching his hotel in the city, RM revea..."
1,RM aka Kim Namjoon was the first member to joi...
2,"Billie Eilish's concert was held in Seoul, Sou..."
3,BTS ARMY you all would be missing the members ...
4,BTS member Kim Seokjin aka Jin has the capacit...


In [70]:
df2.shape

(806, 1)

In [71]:
df2['Original Content'].iloc[369]

'2022 came to a celebratory end for BTS, especially member J-Hope, who was in New York, making another fabulous display of his skills.While away from the members, he seemed to have enjoyed it to the fullest with a solo performance at Dick Clark’s New Year’s Rockin’ Eve where he did a live stage of 3 songs, solo track ‘= (Equal Sign)’, ‘Chicken Noodle Soup’ his collaboration track with Becky G, and BTS’ ‘Butter’ (Holiday Remix).He officially became only the second South Korean soloist to perform at the event, following PSY.This was also J-Hope’s third time at Dick Clark’s New Year’s Rockin’ Eve after group stages with BTS in 2017 and 2019 .J-Hope’s live.After returning from Times Square where he performed as the penultimate act alongside multiple other singers from around the world and spoke to Ryan Seacrest, the host of the show, J-Hope was back at his hotel room and turned on a live broadcast to speak with his fans.As they congratulated him on his sparkling performance, being the perf

In [72]:
df2['New Content'] = summary
df2.head()

Unnamed: 0,Original Content,New Content
0,"After reaching his hotel in the city, RM revea...","As he sat at a roadside open-air restaurant, R..."
1,RM aka Kim Namjoon was the first member to joi...,RM aka Kim Namjoon was the first member to joi...
2,"Billie Eilish's concert was held in Seoul, Sou...",They really enjoyed the concert as the audienc...
3,BTS ARMY you all would be missing the members ...,The boys are going to complete their projects ...
4,BTS member Kim Seokjin aka Jin has the capacit...,"Some days back, we saw the Sea of Jin concept ..."


In [73]:
df2[df2['New Content'] == '']

Unnamed: 0,Original Content,New Content
229,BTS' fame is such that there is no doubt that ...,


### This is the part, where I faced a major problem, as it was unexpected. For data in some rows like the one above (229th row) the vectors generated were very long. Due to this, I was continously getting error as "Power Iteration failed". At last, I increased maximum interations to 1 Lakh (this is obviously veryyyyy high). But still, error persisted.

### Even, I rechecked everything from Data Pre-processing till the point, but I found no more pre-processing of data could be done to reduce the vectors. Eventually, I had to add try-except block in the function, so that if summary for a particular data cannot be generated, just replace it with empty string.

### Here, after thinking of each and every possibility, I decided to drop the row as its just 1 row.

In [74]:
df2.drop(df2[df2['New Content'] == ''].index[0], axis=0, inplace=True)

In [75]:
df2[df2['New Content'] == '']

Unnamed: 0,Original Content,New Content


In [76]:
df2.isna().sum()

Original Content    0
New Content         0
dtype: int64

In [77]:
df2.shape

(805, 2)

In [78]:
df2[df2['New Content'].isna()]

Unnamed: 0,Original Content,New Content


## 5.1 Saving my work till here, into a CSV file

In [79]:
try:
    df2.to_csv('./Summarized News.csv', index=False)
    print("Your file has been created !!!")
except:
    print("Failed to create fail. Might be due to permission related errors.")

Your file has been created !!!


# 6. Finally Working On Summarized File

In [80]:
df3 = pd.read_csv('./Summarized News.csv')
df3.head()

Unnamed: 0,Original Content,New Content
0,"After reaching his hotel in the city, RM revea...","As he sat at a roadside open-air restaurant, R..."
1,RM aka Kim Namjoon was the first member to joi...,RM aka Kim Namjoon was the first member to joi...
2,"Billie Eilish's concert was held in Seoul, Sou...",They really enjoyed the concert as the audienc...
3,BTS ARMY you all would be missing the members ...,The boys are going to complete their projects ...
4,BTS member Kim Seokjin aka Jin has the capacit...,"Some days back, we saw the Sea of Jin concept ..."


In [81]:
df3.isna().sum()

Original Content    0
New Content         0
dtype: int64

## 6.1 Discovering the lines that were removed

In [82]:
content = df3['Original Content'].iloc[370]
content

"On January 23, JYP Entertainment released new concept images for Stray Kids’ second fan meeting ‘SKZ’s Chocolate Factory’.Each member looks amazing in the soft pastel clothes and pretty accessories as they work in the chocolate factory.The fan meeting will be held on February 12 and 13.On February 13th, an offline fan meeting and an online paid live broadcast on the Beyond Live platform will be held at the same time, and precious memories will be made with domestic and foreign fans.Tickets for offline performances were pre-purchased for the fan club from 8:00 pm to 11:59 pm on January 17th for members of the 2nd period of the official fan club STAY, and all seats were sold out at the same time as they opened.Thanks to such enthusiastic support, JYP Entertainment opened additional seats available for viewing at 8 pm on January 19th, and this also sold out quickly, realizing the power of Stray Kids' tickets.This fan meeting is the first in about a year since the first official fan meeti

In [83]:
summary = df3['New Content'].iloc[370]
summary

"Tickets for offline performances were pre-purchased for the fan club from 8:00 pm to 11:59 pm on January 17th for members of the 2nd period of the official fan club STAY, and all seats were sold out at the same time as they opened. Thanks to such enthusiastic support, JYP Entertainment opened additional seats available for viewing at 8 pm on January 19th, and this also sold out quickly, realizing the power of Stray Kids' tickets. At the '2021 The Fact Music Awards', '2021 Asian Artist Awards', and '2021 Mnet Asian Music Awards' respectively, they won the Artist of the Year Award, Performance Award of the Year, and Worldwide Fans' Choice Top Ten respectively. "

### Showing the total number of lines in the actual content

In [84]:
content_split = []
for i in content.split("."):
    i = i.strip()
    if len(i) != 0 and i != " ":
        content_split.append(i)
len(content_split)

10

### Showing the total number of lines in the summary generated for content

In [85]:
summary_split = []
for i in summary.split("."):
    i = i.strip()
    if len(i) != 0 and i != " ":
        summary_split.append(i)
len(summary_split)

3

### Showing the number of lines removed

In [86]:
removed_lines = []
for i in content_split:
    if i not in summary_split:
        removed_lines.append(i)
len(removed_lines)

7

## 6.2 Combining all above process into a function

In [87]:
def removed_lines(content, summary):
    content_split = []
    for i in content.split("."):
        i = i.strip()
        if len(i) != 0 and i != " ":
            content_split.append(i)
    
    summary_split = []      
    for i in summary.split("."):
        i = i.strip()
        if len(i) != 0 and i != " ":
            summary_split.append(i)
    
    removed_line = []
    for i in content_split:
        if i not in summary_split:
            removed_line.append(i)
            
    removedLines = ". ".join(removed_line) 
    return removedLines

In [88]:
removed_line = df3.apply(lambda x: removed_lines(x['Original Content'], x['New Content']), axis=1)

In [89]:
removed_line

0      After reaching his hotel in the city, RM revea...
1      The group released their debut single album 2 ...
2      Billie Eilish's concert was held in Seoul, Sou...
3      BTS ARMY you all would be missing the members ...
4      BTS member Kim Seokjin aka Jin has the capacit...
                             ...                        
800    BTS has conquered the world with their group r...
801    Since it was released, the meaningful song rec...
802    After checking out his live video, BTS' Jungko...
803    BTS' eldest member Jin has shared pictures and...
804    After a lot of teasing, Benny Blanco’s collabo...
Length: 805, dtype: object

In [90]:
len(removed_line)

805

In [91]:
df3.shape

(805, 2)

## 6.3 Creating a seperated column for storing the removed lines

In [92]:
df3['Removed Lines'] = removed_line
df3.head()

Unnamed: 0,Original Content,New Content,Removed Lines
0,"After reaching his hotel in the city, RM revea...","As he sat at a roadside open-air restaurant, R...","After reaching his hotel in the city, RM revea..."
1,RM aka Kim Namjoon was the first member to joi...,RM aka Kim Namjoon was the first member to joi...,The group released their debut single album 2 ...
2,"Billie Eilish's concert was held in Seoul, Sou...",They really enjoyed the concert as the audienc...,"Billie Eilish's concert was held in Seoul, Sou..."
3,BTS ARMY you all would be missing the members ...,The boys are going to complete their projects ...,BTS ARMY you all would be missing the members ...
4,BTS member Kim Seokjin aka Jin has the capacit...,"Some days back, we saw the Sea of Jin concept ...",BTS member Kim Seokjin aka Jin has the capacit...


In [93]:
len(df3['Original Content'].iloc[0].split("."))-len(df3['New Content'].iloc[0].split("."))

17

In [94]:
len(df3['Removed Lines'].iloc[0].split("."))

17

In [95]:
from sentence_transformers import SentenceTransformer, util

In [96]:
model = SentenceTransformer('all-MiniLM-L6-v2')

In [97]:
originalC = df3['Original Content'].iloc[1]
newC = df3['New Content'].iloc[1]

In [98]:
en_1 = model.encode(originalC)
en_2 = model.encode(newC)
print(type(en_1))

<class 'numpy.ndarray'>


## 6.3 Using Cosine Similarity to estimate the similarity of summary and the original content

In [99]:
result = util.cos_sim(en_1, en_2)

print(float(result))
print(type(result))

0.7904312014579773
<class 'torch.Tensor'>


In [100]:
result_float = result.item()
print(result_float)
print(type(result_float))

0.7904312014579773
<class 'float'>


In [101]:
def metrics(originalC, newC):
    en_1 = model.encode(originalC)
    en_2 = model.encode(newC)
    result = util.cos_sim(en_1, en_2)
    # print(float(result))
    return float(result)

In [102]:
cosine = df3.apply(lambda x: metrics(x['Original Content'], x['New Content']), axis=1)

In [103]:
cosine.shape

(805,)

In [104]:
df3['Cosine Similarity'] = cosine

In [105]:
df3.head()

Unnamed: 0,Original Content,New Content,Removed Lines,Cosine Similarity
0,"After reaching his hotel in the city, RM revea...","As he sat at a roadside open-air restaurant, R...","After reaching his hotel in the city, RM revea...",0.796752
1,RM aka Kim Namjoon was the first member to joi...,RM aka Kim Namjoon was the first member to joi...,The group released their debut single album 2 ...,0.790431
2,"Billie Eilish's concert was held in Seoul, Sou...",They really enjoyed the concert as the audienc...,"Billie Eilish's concert was held in Seoul, Sou...",0.840863
3,BTS ARMY you all would be missing the members ...,The boys are going to complete their projects ...,BTS ARMY you all would be missing the members ...,0.709553
4,BTS member Kim Seokjin aka Jin has the capacit...,"Some days back, we saw the Sea of Jin concept ...",BTS member Kim Seokjin aka Jin has the capacit...,0.719757


In [106]:
min(cosine)

0.36477231979370117

In [107]:
max(cosine)

0.9571455717086792

In [108]:
np.mean(cosine)

0.7548249448678508

In [109]:
np.median(cosine)

0.7781685590744019

## 6.4 Using Sementic Similarity to estimate how sementically our generated summary is close to original content

In [110]:
nlp1 = spacy.load("en_core_web_lg")

In [111]:
doc1 = nlp(originalC)
doc2 = nlp(newC)
print(doc1.similarity(doc2)) 

0.9229698009279503


In [112]:
def nlptextsimilarity(originalC, newC):
    doc1 = nlp(originalC)
    doc2 = nlp(newC)
    return doc1.similarity(doc2) 

In [113]:
nlptextsimi = df3.apply(lambda x: nlptextsimilarity(x['Original Content'], x['New Content']), axis=1)

In [114]:
df3['NLP Text Similarity'] = nlptextsimi

In [115]:
df3.head()

Unnamed: 0,Original Content,New Content,Removed Lines,Cosine Similarity,NLP Text Similarity
0,"After reaching his hotel in the city, RM revea...","As he sat at a roadside open-air restaurant, R...","After reaching his hotel in the city, RM revea...",0.796752,0.947731
1,RM aka Kim Namjoon was the first member to joi...,RM aka Kim Namjoon was the first member to joi...,The group released their debut single album 2 ...,0.790431,0.92297
2,"Billie Eilish's concert was held in Seoul, Sou...",They really enjoyed the concert as the audienc...,"Billie Eilish's concert was held in Seoul, Sou...",0.840863,0.938945
3,BTS ARMY you all would be missing the members ...,The boys are going to complete their projects ...,BTS ARMY you all would be missing the members ...,0.709553,0.923807
4,BTS member Kim Seokjin aka Jin has the capacit...,"Some days back, we saw the Sea of Jin concept ...",BTS member Kim Seokjin aka Jin has the capacit...,0.719757,0.909008


In [116]:
min(nlptextsimi)

0.5577889218703898

In [117]:
max(nlptextsimi)

0.9906746332886787

In [118]:
np.mean(nlptextsimi)

0.923961025354796

In [119]:
np.median(nlptextsimi)

0.9330589202860173

## 6.5 As there was quite a difference in the median of metrics, I decided to provide Harmonic Mean of both the similaritiy

In [120]:
def harmonicMean(x, y):
    return (2*x*y)/(x+y)

In [121]:
harmean = df3.apply(lambda x: harmonicMean(x['Cosine Similarity'], x['NLP Text Similarity']), axis=1)

In [122]:
df3['Harmonic Mean'] = harmean

In [123]:
df3.head()

Unnamed: 0,Original Content,New Content,Removed Lines,Cosine Similarity,NLP Text Similarity,Harmonic Mean
0,"After reaching his hotel in the city, RM revea...","As he sat at a roadside open-air restaurant, R...","After reaching his hotel in the city, RM revea...",0.796752,0.947731,0.865708
1,RM aka Kim Namjoon was the first member to joi...,RM aka Kim Namjoon was the first member to joi...,The group released their debut single album 2 ...,0.790431,0.92297,0.851574
2,"Billie Eilish's concert was held in Seoul, Sou...",They really enjoyed the concert as the audienc...,"Billie Eilish's concert was held in Seoul, Sou...",0.840863,0.938945,0.887202
3,BTS ARMY you all would be missing the members ...,The boys are going to complete their projects ...,BTS ARMY you all would be missing the members ...,0.709553,0.923807,0.802628
4,BTS member Kim Seokjin aka Jin has the capacit...,"Some days back, we saw the Sea of Jin concept ...",BTS member Kim Seokjin aka Jin has the capacit...,0.719757,0.909008,0.803388


# 7. Storing The Result In a Seperate Dataframe

In [124]:
df3.to_csv('./Final Result.csv')

In [125]:
min(harmean)

0.49884301558564753

In [126]:
max(harmean)

0.9623399165870538

In [127]:
np.mean(harmean)

0.8266253043667523

In [128]:
np.median(harmean)

0.8415012837304162