# Using the knkarthick/bart-large-xsum-samsum model and KeyBERT to summarize text

### 1. Import the necessary dependencies and load the summarization pipeline.

In [1]:
import pandas as pd

from transformers import pipeline

summarizer = pipeline("summarization", model="knkarthick/bart-large-xsum-samsum")



### 2. Import the CSV file, concatenate the text column into a single string, then split the text into sentences.

After importing the CSV file as a DataFrame, I extract the text column and concatenate its rows into a single string. Once the text is in that form, I use NLTK's 'sent_tokenize' to split it into sentences.

In [78]:
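# Load the transcript and give its two columns descriptive names.
transcript_df = pd.read_csv('transcript.csv')
transcript_df.columns = ['person_name','text']
# Join every row of the text column into one string.
# Note: sep='' inserts nothing between rows, so adjacent rows run together.
text = transcript_df.text.str.cat(sep='')
text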

"And first one in June this year.And it's been a month since we had the last one, but just to reiterate.We didn't review of the SO  assessment needed by PERSON B. PERSON B went to portfolio council afterwards to present it and get whitelisting. We got that from Denmark and Finland during the portfolio Council meeting, but Sweden and Norway had to go back to to secure that. But it came rather quickly. So today we have full whitelisting log, SO from Portfolio Council as well, and the implementation is processing.Today.Good. Justin, adoption platform, that's a recommendation also led by PERSON B.So basically PERSON B, I think the stage is yours for a presentation and the discussion.Yep, I have also invited earlier, but he notified that he'll be a few minutes late due to some.Some errands that he has to run.Alright, let me share.Let me share my screen.Let me know when you can see it.Yep.Yep.OK.Ohh really is also here. Hi.OK, good. So today we're talking about the customer digital adoption 
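The output above shows rows fused together without a space (for example `year.And`), which follows from the empty separator passed to `str.cat`. Below is a minimal sketch of that effect in plain Python. It uses `str.join` as a stand-in for `Series.str.cat`, and the two rows are made up for illustration, not taken from the transcript file.

```python
# Hypothetical rows standing in for the 'text' column of the transcript.
rows = ["This is the first row.", "And this is the second row."]

# Empty separator: rows run together, as in the output above.
fused = "".join(rows)

# Single-space separator: sentence boundaries stay visible.
spaced = " ".join(rows)

print(fused)
print(spaced)
```

Whether a space or a newline is the better separator depends on how the downstream sentence splitter treats it; this sketch only shows the difference in the joined string.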

In [79]:
from nltk import tokenize

# sent_tokenize relies on NLTK's Punkt tokenizer data, which must be downloaded beforehand.
sentences = tokenize.sent_tokenize(text)

### 3. Divide the text into chunks, then summarize each chunk.

In [81]:
chunks = [sentences[x:x+10] for x in range(0, len(sentences), 10)]
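The slicing pattern above can be tried on a small list to see how the final, shorter chunk is handled. This is a minimal sketch; `chunk_list` and the demo sentences are hypothetical names introduced here for illustration.

```python
def chunk_list(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]


# 23 made-up sentences, so the last chunk is shorter than the others.
sentences_demo = [f"Sentence {n}." for n in range(1, 24)]
chunks_demo = chunk_list(sentences_demo, 10)

print([len(chunk) for chunk in chunks_demo])  # [10, 10, 3]
```

The last chunk simply holds whatever is left over, so its length can be anywhere from 1 to `size`.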

In [84]:
# Join each chunk's sentences into one string, without hard-coding the number of chunks.
chunks = [" ".join(chunk) for chunk in chunks]

In [89]:
# Summarize the first 16 chunks in place.
# Any chunk at index 16 or later is left as raw text.
for i in range(16):
    chunks[i] = summarizer(chunks[i], max_length=130, min_length=30)
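One way to avoid the fixed loop bound and keep the input text separate from the results is to map a summarizing function over the chunks. The sketch below is an assumption-laden illustration: `summarize_all` is a hypothetical helper, and `stand_in_summarizer` is a fake that only mimics the return shape of the real pipeline (a list holding one dict with a `summary_text` key) so the example runs without downloading the model.

```python
def summarize_all(texts, summarize_fn, **kwargs):
    """Apply summarize_fn to every text and return the results as a new list."""
    return [summarize_fn(text, **kwargs) for text in texts]


def stand_in_summarizer(text, max_length=130, min_length=30):
    # Hypothetical stand-in: truncates instead of summarizing.
    # It only reproduces the shape of the pipeline's return value.
    return [{"summary_text": text[:max_length]}]


chunk_texts = ["first chunk of text", "second chunk of text", "third"]
results = summarize_all(chunk_texts, stand_in_summarizer, max_length=5, min_length=1)

print(len(results))  # 3
```

In the notebook, the real `summarizer` would take the place of `stand_in_summarizer`, and `chunk_texts` would be the joined chunks; the original strings then remain available next to their summaries.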

In [90]:
chunks[0]

[{'summary_text': "We had a meeting today. And first one in June this year. And it's been a month since we had the last one, but just to reiterate."}]
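As the output above shows, each result is a list containing one dict with a `summary_text` key. A minimal sketch of pulling the plain strings out of a list of such results follows. `extract_summary_texts` is a hypothetical helper, the demo data is made up, and the function assumes every entry is a pipeline result rather than a raw string.

```python
def extract_summary_texts(results):
    """Collect the 'summary_text' value from each result's dicts."""
    return [item["summary_text"] for result in results for item in result]


# Made-up results with the same shape as the summarizer output.
results_demo = [
    [{"summary_text": "We had a meeting today."}],
    [{"summary_text": "Pricing was discussed."}],
]

summary_texts = extract_summary_texts(results_demo)
combined_summary = " ".join(summary_texts)

print(combined_summary)  # We had a meeting today. Pricing was discussed.
```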

In [91]:
chunks

[[{'summary_text': "We had a meeting today. And first one in June this year. And it's been a month since we had the last one, but just to reiterate."}],
 [{'summary_text': 'Umm and initial scope we have looked at Ishop cloud Portal which is ecommerce and asset management as a life cycle management applications and they will use pretty much the same thing.'}],
 [{'summary_text': 'The market and the digital option solutions is growing rapidly and companies are investing heavily to make sure that the products and services are. are usable.'}],
 [{'summary_text': 'Umm, we had a similar product called PRODUCT D, but we eliminated it because of the pricing model they have.Umm.'}],
 [{'summary_text': 'I have a question on the pricing. I know PRODUCT C is very pricey, but it sounds like you are also really impressed by what it offers us.'}],
 [{'summary_text': "It is part of the North Star architecture. It's part of ITECH as well. So it needs to be in here."}],
 [{'summary_text': "The major sel