In [1]:
import numpy as np
import pandas as pd
import nltk
nltk.download('punkt') # one time execution
import re

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:
# assumption: the CSV is UTF-8 encoded; 'unicode-escape' mangles non-ASCII characters
df = pd.read_csv("news.csv", encoding="utf-8")
df.head()

Unnamed: 0,title,content,published_at,source,topic
0,BTS: RM is reminded of Bon Voyage as he travel...,"After reaching his hotel in the city, RM revea...",2022-07-30T07:00:00Z,2,13
1,RM recalls wondering if he 'made right decisio...,RM aka Kim Namjoon was the first member to joi...,2022-12-22T15:57:55Z,2,13
2,BTS: J-Hope and RM go bonkers at Billie Eilish...,"Billie Eilish's concert was held in Seoul, Sou...",2022-08-16T07:00:00Z,1,7
3,"BTS: J-Hope proudly states he raised Jungkook,...",BTS ARMY y'all would be missing the members a ...,2022-12-18T13:08:40Z,1,7
4,BTS: Jin aka Kim Seokjin takes us through the ...,BTS member Kim Seokjin aka Jin has the capacit...,2022-11-21T08:00:00Z,1,8


In [3]:
text = df['content'][0]
text

'After reaching his hotel in the city, RM revealed that his stay would be for four days and added that he would step out for dinner. As he sat at a roadside open-air restaurant, RM feasted on beer, burgers and fries. He said, "I\'m starving right now. I\'m out to grab some food. It\'s much quieter than I expected and feels like a rural town. I like the familiar atmosphere." RM attended Art Basel and explained on camera the details of the art fair. He also gave a glimpse as he had noodles and beer which was followed by soup noodles and wrap. Showing the pattern of a ping pong table, RM said, "The table looks like our (BTS) symbol." He also spoke about the art pieces as he viewed them. After that, RM took a tram to visit the Foundation Beyeler, a museum. He later took a walk through the city. On his third day, RM visited the Kunstmuseum Basel, the Vitra Design Museum and the gallery. As he walked around, RM showed a chair to his fans and said, "I have breaking news for you guys. Coldplay

In [4]:
from nltk.tokenize import sent_tokenize
sentences = sent_tokenize(text)

In [5]:
len(sentences)

25

In [7]:
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove*.zip

--2023-01-24 14:11:45--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2023-01-24 14:11:45--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2023-01-24 14:11:45--  https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


202

In [12]:
# Extract word vectors
word_embeddings = {}
# each line holds a word followed by its 100 vector components
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        word_embeddings[word] = coefs

In [13]:
len(word_embeddings)

400000

In [6]:
# remove punctuations, numbers and special characters
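# regex=True is passed explicitly: newer pandas treats the pattern as a literal string by default
clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ", regex=True)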

# make alphabets lowercase
clean_sentences = [s.lower() for s in clean_sentences]


In [7]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [8]:
from nltk.corpus import stopwords
stop_words = stopwords.words('english')

In [9]:
# function to remove stopwords
def remove_stopwords(sen):
    sen_new = " ".join([i for i in sen if i not in stop_words])
    return sen_new

In [10]:
# remove stopwords from the sentences
clean_sentences = [remove_stopwords(r.split()) for r in clean_sentences]
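
The cleaning steps above (strip non-letters, lowercase, drop stopwords) can be checked on a single sentence. This is a minimal sketch using only the standard library; `clean_sentence` and `TOY_STOP_WORDS` are hypothetical names, and the tiny stopword set stands in for NLTK's full English list.

```python
import re

# Hypothetical, tiny stopword set for illustration only; the notebook
# uses NLTK's full English stopword list instead.
TOY_STOP_WORDS = {"the", "a", "is", "it", "than", "i", "s", "and"}

def clean_sentence(sentence, stop_words=TOY_STOP_WORDS):
    """Replace non-letters with spaces, lowercase, and drop stopwords."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", sentence).lower()
    return " ".join(w for w in letters_only.split() if w not in stop_words)

cleaned = clean_sentence("It's much quieter than I expected!")
print(cleaned)
```

Note that the apostrophe in "It's" becomes a space, so the leftover "s" only disappears because it is in the stopword set.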

In [14]:

sentence_vectors = []
for i in clean_sentences:
  if len(i) != 0:
    v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()])/(len(i.split())+0.001)
  else:
    v = np.zeros((100,))
  sentence_vectors.append(v)

In [15]:
len(sentence_vectors)

25
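
To make the averaging step concrete, here is a small sketch with made-up 3-dimensional embeddings in place of the 100-dimensional GloVe vectors. `toy_embeddings` and `sentence_vector` are hypothetical names; the `+ 0.001` term mirrors the cell above.

```python
import numpy as np

# Hypothetical 3-d embeddings; the notebook uses 100-d GloVe vectors.
toy_embeddings = {
    "art": np.array([1.0, 0.0, 0.0]),
    "fair": np.array([0.0, 1.0, 0.0]),
}

def sentence_vector(cleaned_sentence, embeddings, dim):
    """Average the word vectors; unknown words contribute zeros."""
    words = cleaned_sentence.split()
    if not words:
        return np.zeros(dim)
    total = sum(embeddings.get(w, np.zeros(dim)) for w in words)
    return total / (len(words) + 0.001)  # same smoothing term as the notebook

vec = sentence_vector("art fair basel", toy_embeddings, dim=3)
print(vec)
```

Because "basel" is missing from the toy vocabulary it adds a zero vector but still counts in the denominator, so out-of-vocabulary words pull the average toward zero.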

In [16]:
# similarity matrix
sim_mat = np.zeros([len(sentences), len(sentences)])


In [17]:
from sklearn.metrics.pairwise import cosine_similarity


In [18]:
for i in range(len(sentences)):
  for j in range(len(sentences)):

    if i != j:
      sim_mat[i][j] = cosine_similarity(sentence_vectors[i].reshape(1,100), sentence_vectors[j].reshape(1,100))[0,0]
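
The nested loop calls `cosine_similarity` once per sentence pair. As a possible alternative, the same matrix can be computed in one step by normalising the rows and taking a matrix product. The sketch below uses NumPy only; `cosine_similarity_matrix` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between rows, with a zeroed diagonal."""
    mat = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows stay zero instead of dividing by 0
    unit = mat / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # the loop above skips the i == j entries
    return sim

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_sim = cosine_similarity_matrix(toy_vectors)
print(toy_sim.round(3))
```

scikit-learn's `cosine_similarity` also accepts the whole matrix of sentence vectors at once, so stacking `sentence_vectors` with `np.vstack` and zeroing the diagonal afterwards should give the same result without the loop.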

In [19]:
import networkx as nx

#Ranking lines using PageRank Algorithm
nx_graph = nx.from_numpy_array(sim_mat)
scores = nx.pagerank(nx_graph)
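
For intuition about what the ranking step computes, here is a simplified power-iteration sketch of weighted PageRank in plain Python. It is not networkx's implementation, and `pagerank_power_iteration` is a hypothetical name; the damping value of 0.85 matches the default `alpha` of `nx.pagerank`.

```python
def pagerank_power_iteration(weights, damping=0.85, tol=1e-8, max_iter=100):
    """Simplified PageRank on a non-negative weight matrix (list of lists)."""
    n = len(weights)
    row_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if row_sums[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / row_sums[i]
                for i in range(n) if row_sums[i] > 0
            )
            new_scores.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        if sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol:
            return new_scores
        scores = new_scores
    return scores

# Toy similarity matrix: sentence 0 is strongly similar to both others.
toy_weights = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
]
toy_scores = pagerank_power_iteration(toy_weights)
print(toy_scores)
```

A sentence that is similar to many other sentences receives more rank, which is why the most "central" sentences end up at the top of the summary.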

In [20]:
ranked_sentences = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)

In [21]:
# Specify number of sentences to form the summary
sn = int(len(sentences)*(0.7))
# Generate summary
for i in range(sn):
  print(ranked_sentences[i][1])

After reaching his hotel in the city, RM revealed that his stay would be for four days and added that he would step out for dinner.
Recalling his previous visit to Lucerne, RM added, "I remember the day of crossing that bridge and buying souvenirs."
He later took a walk through the city.
As he walked around, RM showed a chair to his fans and said, "I have breaking news for you guys.
He also spoke about the art pieces as he viewed them.
RM attended Art Basel and explained on camera the details of the art fair.
It's much quieter than I expected and feels like a rural town.
If you see this Chris, give me a call.
Showing the pattern of a ping pong table, RM said, "The table looks like our (BTS) symbol."
RM's travel in Switzerland ended with a visit to the Museum Tinguely.
Speaking to the camera, RM said, "I rode the SSB train to Lucerne, rode a boat, rode the mountain train, walked down the track road, rode the cable cars, and now I'm on a boat planning to go ride the SSB again."
RM's vlog

In [26]:
def Summary(text,summary_text_percent):
  """Return [text, summary_text, removed_lines] for the given fraction of sentences to keep."""
  sentences = sent_tokenize(text)
  # remove punctuations, numbers and special characters
  # (regex=True is explicit: newer pandas treats the pattern as a literal by default)
  clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ", regex=True)

  # make alphabets lowercase
  clean_sentences = [s.lower() for s in clean_sentences]

  # remove stopwords from the sentences
  # (remove_stopwords reads the module-level stop_words list defined earlier)
  clean_sentences = [remove_stopwords(r.split()) for r in clean_sentences]

  #Making sentences vectors
  sentence_vectors = []
  for i in clean_sentences:
    if len(i) != 0:
      v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()])/(len(i.split())+0.001)
    else:
      v = np.zeros((100,))
    sentence_vectors.append(v)

  # similarity matrix
  sim_mat = np.zeros([len(sentences), len(sentences)])
  for i in range(len(sentences)):
    for j in range(len(sentences)):
      if i != j:
        sim_mat[i][j] = cosine_similarity(sentence_vectors[i].reshape(1,100), sentence_vectors[j].reshape(1,100))[0,0]

  #Ranking lines using PageRank Algorithm
  nx_graph = nx.from_numpy_array(sim_mat)
  scores = nx.pagerank(nx_graph)
  ranked_sentences = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)

  # Specify number of sentences to form the summary
  sn = int(len(sentences)*(summary_text_percent))

  # Generate summary: the top sn sentences are kept, the rest are reported as removed
  summary_text = ''
  for i in range(sn):
    summary_text+=ranked_sentences[i][1]
  removed_lines=''
  for i in range(sn,len(ranked_sentences)):
    removed_lines+=ranked_sentences[i][1]

  return [text,summary_text,removed_lines]
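
The function above concatenates the selected sentences in rank order and without a separator. One possible variant, sketched below, keeps the same top fraction but restores the original sentence order and joins with spaces. `ordered_summary` is a hypothetical helper; `scores` is assumed to map sentence index to score, as the dictionary returned by `nx.pagerank` does here.

```python
def ordered_summary(sentences, scores, fraction):
    """Keep the top-scoring fraction of sentences, in original order."""
    keep = int(len(sentences) * fraction)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:keep])
    kept = " ".join(s for i, s in enumerate(sentences) if i in top)
    dropped = " ".join(s for i, s in enumerate(sentences) if i not in top)
    return kept, dropped

# Toy data: scores are made up for illustration.
toy_sentences = ["A first.", "B second.", "C third.", "D fourth."]
toy_scores = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
kept, dropped = ordered_summary(toy_sentences, toy_scores, 0.5)
print(kept)
print(dropped)
```

Whether document order or rank order reads better depends on the text; for news articles, document order usually preserves the narrative.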

In [27]:
#Example

text1 = df['content'][1]
[text1,summary_text,removed_lines] = Summary(text1,0.6)


In [28]:
print(summary_text)

"To be honest, one decision that I had often thought about was my choice to become a part of a boy band.I often wondered whether I made the right decision by joining a boy band.I often think about what it would have been like if I continued my studies or became something other than a musician."At the time, BTS, was treated like a complete outsider in the Korean hip-hop community.I was constantly thinking about how I would be able to overcome that perception and how to define music or hip-hop," he added.Earlier this month, RM released his first full-length solo album Indigo.In my journey with BTS, I drifted further and further away from that world and was tormented by the thought that the people that I liked - and the people who enjoyed the same music as I - did not have any love for me.In the late 2000s, musicians like Zico, Changmo, and Giriboy were the people that I started out with.Three years later, he released his second mixtape, Mono.That film visualized many of the ideas t

In [29]:
print(removed_lines)

The group released their debut single album 2 Cool 4 Skool on June 12, 2013.Recently, I watched Everything Everywhere All At Once.In an interview with Hypebeast, RM said, "This is the most difficult question to answer truthfully.RM aka Kim Namjoon was the first member to join BTS.RM released his first solo mixtape in 2015.Apart from RM, BTS also features Jin, Suga, J-Hope, Jimin, V, and Jungkook.RM has collaborated with artists such as Wale, Younha, Warren G, Gaeko, Krizz Kaliko, MFBTY, Fall Out Boy, Primary, Lil Nas X, Erykah Badu, and Anderson .Paak.That stressed me out.
