> In this notebook, we explore the summarization of TED Talk transcripts with two main approaches: <ul>
    <li> Keyword Weight (<a href='#keyword-weight'>Click to view</a>) </li>
    <li> TextRank (<a href='#textrank'>Click to view</a>) </li> </ul>
  The end product of both approaches is a set of key sentences from a transcript, which forms an extractive summary of that transcript. <br>
  Due to knowledge and time constraints, we did not explore abstractive summarization, which could produce more "short and sweet" summaries.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import ast, pickle
from functions.preprocessing import preprocessing

import warnings
warnings.filterwarnings('ignore')

import spacy
nlp = spacy.load("en_core_web_lg")

In [2]:
df = pickle.load(open('data/pickle/summary_df.p','rb'))

## Pre-work: the code below does not have to be run, except for cells that define functions ##

In [2]:
df = pickle.load(open('data/pickle/filtered_talks.p','rb'))
df = df.reset_index(drop=True)
df.url = df.url.str.strip()

In [18]:
import contractions, re
def sentence_tokenize(text):
    fixed_contractions = contractions.fix(text)
    removed_parenthesis = re.sub("[\\(].*?[\\)]","",string=fixed_contractions) 
    doc = nlp(removed_parenthesis)
    return [sent.text for sent in doc.sents]

In [4]:
df['sents'] = df.transcript.map(sentence_tokenize)

<a id='keyword-weight'></a>
<h1>Keyword Weight Approach</h1>

> This approach uses the different types of keywords in a transcript to calculate a score for each sentence, as a measure of the sentence's importance to the transcript. The types of keywords are: <ul>
    <li>Words in the title</li>
    <li>Topic keywords in the top 3 likely topics of the transcript (30 in total) </li>
    <li>Top 15 TF-IDF words in the transcript</li> </ul>
  For scoring, we use one common formula: <b>Score of the sentence = sum-product of the unique keywords it contains and their respective weights.</b> For the keyword weights, we tried three different sub-approaches:<ol>
    <li>All keywords carry a weight of 1.</li>
    <li>The weight of a keyword = the number of keyword types it occurs in. E.g. if "Apple" appears in both the title and the topic keywords, it has a weight of 2.</li>
    <li>The weight of a keyword = (base score + fixed score if it appears in the title) * TF-IDF score * topic keyword score (the highest score if it appears in more than one of the top 3 topics). The TF-IDF and topic factors are applied only when the keyword appears in the respective list. We use multiplication rather than addition because the scores are all small decimals, so multiplying keeps less important words from standing out.</li></ol>
    A fourth sub-approach does not change the keyword weights; instead, it penalizes each sentence by its length on top of sub-approach 3.
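
The weighting in sub-approach 3 can be illustrated with a small, self-contained sketch. All words and scores below are invented for illustration, the helper names `keyword_weight` and `sentence_score` are hypothetical, and matching is simplified to a lowercase substring test; the notebook's own scoring functions remain the reference.

```python
# Toy illustration of the sub-approach 3 weighting; all values are invented.
title_words = {"creativity"}
tfidf_scores = {"creativity": 0.5, "education": 0.4}
topic_scores = {"education": 0.2, "school": 0.1}

BASE_SCORE = 0.01   # every keyword starts from this value
TITLE_BONUS = 0.02  # added when the keyword appears in the title


def keyword_weight(word):
    weight = BASE_SCORE
    if word in title_words:
        weight += TITLE_BONUS
    if word in tfidf_scores:
        weight *= tfidf_scores[word]
    if word in topic_scores:
        weight *= topic_scores[word]
    return weight


def sentence_score(sentence, keywords):
    # Sum the weights of the unique keywords found in the sentence.
    lowered = sentence.lower()
    return sum(keyword_weight(w) for w in keywords if w in lowered)


keywords = title_words | set(tfidf_scores) | set(topic_scores)
weights = {w: keyword_weight(w) for w in sorted(keywords)}
score = sentence_score("Creativity matters as much as literacy in education.", keywords)
print(weights, score)
```

In this toy case "school" (one factor) ends up with a larger weight than "education" (two factors), because every extra factor is a number below 1. That property of the multiplicative scheme is worth keeping in mind when reading the results.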

In [5]:
title_tokens = pickle.load(open('data/pickle/title_tokens.p','rb'))
transcript_tokens = pickle.load(open('data/pickle/transcript_tokens.p','rb'))

title_tokens = title_tokens.reset_index(drop=True)
transcript_tokens =transcript_tokens.reset_index(drop=True)

In [6]:
df['title_tokens'] = title_tokens

In [7]:
from functions.tfidf_vectorizer_functions import preprocessor, tokenizer
vectorizer = pickle.load(open('model/tfidf_vectorizer.p','rb'))

In [9]:
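def get_top_tfidf_dict(idx):
    # print progress every 100 transcripts
    if idx % 100 == 0:
        print(idx, end=',')
    # join the tokens with the "{@}" separator before vectorizing
    transcript_str = "{@}".join(transcript_tokens[idx])
    tfidf_matrix = vectorizer.transform([transcript_str])
    # note: scikit-learn 1.0+ provides get_feature_names_out(); get_feature_names() was removed in 1.2
    feature_array = np.array(vectorizer.get_feature_names())
    # sort (feature index, score) pairs by descending TF-IDF score
    tfidf_idx_score =  sorted([(i,score) for i,score in enumerate(tfidf_matrix.toarray().flatten())],key=lambda x:x[1],reverse=True)
    # keep the 15 highest-scoring words
    top_words_dict = {feature_array[i]:score for i,score in tfidf_idx_score[:15]}
    
    return top_words_dict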

In [10]:
df['tfidf_dict'] = df.index.map(get_top_tfidf_dict)

0,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,

In [11]:
topicWordMatrix = pd.read_csv('data/output/topicWordMatrix.csv')
docTop3TopicMatrix = pd.read_csv('data/output/docTop3TopicMatrix.csv')

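# for each topic index, parse that topic's column of "(score, word)" strings into tuples
top3TopicKeywords = docTop3TopicMatrix.applymap(lambda topic_idx: [ast.literal_eval(cell) for cell in topicWordMatrix.iloc[:,topic_idx]])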

In [12]:
def get_topic_keywords_dict(idx):
    topic_keywords_list = top3TopicKeywords.iloc[idx,:]
    topic_keywords_dict = {}
    for keyword_list in topic_keywords_list:
        for (score,w) in keyword_list:
            if w not in topic_keywords_dict or score > topic_keywords_dict[w]:
                topic_keywords_dict[w] = score
    return topic_keywords_dict

In [13]:
df['topic_keywords_dict'] = df.index.map(lambda x: get_topic_keywords_dict(x))

In [15]:
len(df), len(transcript_tokens), len(top3TopicKeywords), len(title_tokens)

(2435, 2435, 2435, 2435)

In [16]:
pickle.dump(df,open('data/pickle/summary_df.p','wb'))

## Sub-approach 1: All keywords carry the same weight of 1 ##

> Looking at the result (top 15 sentences) for the first talk in the corpus, a few sentences, such as those at indices 0, 4, 7 and 8, partly capture the main ideas of the talk. However, they cover at most the talk's point about problems with the education system.

In [3]:
def sent_scoring1(idx): # present vs absent 
    row = df.iloc[idx,:]
    sents_score_dict = {}
    keywords = set(row['title_tokens']) | set(row['tfidf_dict'].keys()) | set(row['topic_keywords_dict'].keys())
    for i, sent in enumerate(row['sents']):
        sents_score_dict[i] = 0
        for keyword in keywords:
            if keyword in sent or keyword.lower() in sent or keyword.title() in sent:
                sents_score_dict[i] +=1
    sents_score_list = sorted(zip(sents_score_dict.keys(),sents_score_dict.values()), key=lambda x:x[1],reverse=True)           
    
    return sents_score_list
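
One detail of the matching above is worth illustrating: `keyword in sent` is a substring test, so a keyword such as "art" also matches inside "start". The sketch below contrasts this with a whole-word match. The helper `contains_word` is hypothetical and is not used anywhere in this notebook; switching to it would change the results shown below.

```python
import re


def contains_word(keyword, sentence):
    # Match the keyword only as a whole word, ignoring case.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


sentence = "We start to educate them progressively."
substring_hit = "art" in sentence                # substring test, as in sent_scoring1
whole_word_hit = contains_word("art", sentence)  # whole-word test
print(substring_hit, whole_word_hit)
```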


In [18]:
print("Result from the first talk:")
for i, s in enumerate([df.sents[0][x[0]] for x in sent_scoring1(0)[:15]]):
    print("{}: {}".format(i,s))

Result from the first talk:
0: And the consequence is that many highly-talented, brilliant, creative people think they are not, because the thing they were good at at school was not valued, or was actually stigmatized.
1: There is not an education system on the planet that teaches dance everyday to children the way we teach them mathematics.
2: No, she is good at some things, but if she is cooking, she is dealing with people on the phone, she is talking to the kids, she is painting the ceiling, she is doing open-heart surgery over here.
3: If you think of it, children starting school this year will be retiring in 2065.
4: And we are now running national education systems where mistakes are the worst thing you can make.
5: what happens is, as children grow up, we start to educate them progressively from the waist up.
6: if you look at the output, who really succeeds by this, who does everything that they should, who gets all the brownie points, who are the winners — I think you would ha

## Sub-approach 2: Weight = No. of categories of keywords one belongs to ##

> The result of sub-approach two is similar to that of sub-approach one.

In [4]:
def sent_scoring2(idx): # weight = number of keyword types the keyword appears in
    row = df.iloc[idx,:]
    sents_score_dict = {}    
    
    keywords = set(row['title_tokens']) | set(row['tfidf_dict'].keys()) | set(row['topic_keywords_dict'].keys())
    keyword_weight_dict = {}
    for keyword in keywords:
        keyword_weight_dict[keyword] = 0
        if keyword in set(row['title_tokens']):
            keyword_weight_dict[keyword] +=1 
        if keyword in set(row['tfidf_dict'].keys()):
            keyword_weight_dict[keyword] +=1 
        if keyword in set(row['topic_keywords_dict'].keys()):
            keyword_weight_dict[keyword] +=1 

    for i, sent in enumerate(row['sents']):
        sents_score_dict[i] = 0
        for keyword in keywords:
            if keyword in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
            elif keyword.lower() in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
            elif keyword.title() in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]

    sents_score_list = sorted(zip(sents_score_dict.keys(),sents_score_dict.values()), key=lambda x:x[1],reverse=True)           
    
    return sents_score_list


In [20]:
print("Result from the first talk:")
for i, s in enumerate([df.sents[0][x[0]] for x in sent_scoring2(0)[:15]]):
    print("{}: {}".format(i,s))

Result from the first talk:
0: And the consequence is that many highly-talented, brilliant, creative people think they are not, because the thing they were good at at school was not valued, or was actually stigmatized.
1: If you think of it, children starting school this year will be retiring in 2065.
2: There is not an education system on the planet that teaches dance everyday to children the way we teach them mathematics.
3: So you were probably steered benignly away from things at school when you were a kid, things you liked, on the grounds that you would never get a job doing that.
4: No, she is good at some things, but if she is cooking, she is dealing with people on the phone, she is talking to the kids, she is painting the ceiling, she is doing open-heart surgery over here.
5: And we are now running national education systems where mistakes are the worst thing you can make.
6: what happens is, as children grow up, we start to educate them progressively from the waist up.
7: if y

## Sub-approach 3: Weight = (base score + score from title) * tfidf score * topic keyword score ##

> The result of sub-approach 3 is very different from that of the first two sub-approaches, with many new sentences emerging. The quality of its top 15 sentences is higher than that of the previous two sub-approaches. Sentences at indices 0, 3, 4 and 7 capture the idea about creativity, and those at 6, 8 and 10 capture the same idea as the first two sub-approaches about the problem with our education system.

In [5]:
def sent_scoring3(idx): # weight = (base score + title score) * tfidf score * topic keyword score
    row = df.iloc[idx,:]
    sents_score_dict = {}    
    
    keywords = set(row['title_tokens']) | set(row['tfidf_dict'].keys()) | set(row['topic_keywords_dict'].keys())
    keyword_weight_dict = {}
    for keyword in keywords:
        keyword_weight_dict[keyword] = 0.01 #base score
        if keyword in set(row['title_tokens']):
            keyword_weight_dict[keyword] += 0.02 # score from title
        if keyword in set(row['tfidf_dict'].keys()):
            keyword_weight_dict[keyword] *= row['tfidf_dict'][keyword]
        if keyword in set(row['topic_keywords_dict'].keys()):
            keyword_weight_dict[keyword] *= row['topic_keywords_dict'][keyword]

    for i, sent in enumerate(row['sents']):
        sents_score_dict[i] = 0
        for keyword in keywords:
            if keyword in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
            elif keyword.lower() in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
            elif keyword.title() in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]

    sents_score_list = sorted(zip(sents_score_dict.keys(),sents_score_dict.values()), key=lambda x:x[1],reverse=True)           
    
    return sents_score_list


In [7]:
print("Result from the first talk:")
for i, s in enumerate([df.sents[0][x[0]] for x in sent_scoring3(0)[:15]]):
    print("{}: {}".format(i,s))

Result from the first talk:
0: In fact, creativity — which I define as the process of having original ideas that have value — more often than not comes about through the interaction of different disciplinary ways of seeing things.
1: One is the extraordinary evidence of human creativity in all of the presentations that we have had and in all of the people here.
2: and I want to talk about creativity.
3: My contention is that creativity now is as important in education as literacy, and we should treat it with the same status.
4: I believe this passionately, that we do not grow into creativity, we grow out of it.
5: And the second is academic ability, which has really come to dominate our view of intelligence, because the universities designed the system in their image.
6: If you think of it, the whole system of public education around the world is a protracted process of university entrance.
7: I do not mean to say that being wrong is the same thing as being creative.
8: if you look at 

## Sub-approach 4: Normalize the score by the sentence length with a factor of 0.01 ##

> Sub-approach 4 penalizes sentence length to such an extent that short sentences are prioritized too much. We also tried a factor of 0.001 and the results do not differ much, which is expected: the factor rescales every sentence's score by the same constant, so it cannot change the ranking. As such, the length penalty does not work well in this case.
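
To see why the choice of factor has so little effect, the sketch below divides some invented keyword scores by `len(text) * factor`, as `sent_scoring4` does, and compares the rankings for two factors. The scores and the helper `ranking` are hypothetical and serve only to illustrate the arithmetic.

```python
# Invented (keyword score, sentence text) pairs for illustration only.
sentences = {
    "short": (0.02, "Was that wrong?"),
    "medium": (0.05, "We need to radically rethink our view of intelligence."),
    "long": (0.09, "My contention is that creativity now is as important in education "
                   "as literacy, and we should treat it with the same status."),
}


def ranking(factor):
    # Divide each score by (character length * factor), as in sent_scoring4.
    normalized = {
        key: score / (len(text) * factor)
        for key, (score, text) in sentences.items()
    }
    return sorted(normalized, key=normalized.get, reverse=True)


same_order = ranking(0.01) == ranking(0.001)
print(ranking(0.01), same_order)
```

The short sentence ranks first even though it has the lowest raw score, which mirrors the behaviour observed for this sub-approach.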

In [6]:
def sent_scoring4(idx): # sub-approach 3 weights, with the score divided by sentence length
    row = df.iloc[idx,:]
    sents_score_dict = {}    
    
    keywords = set(row['title_tokens']) | set(row['tfidf_dict'].keys()) | set(row['topic_keywords_dict'].keys())
    keyword_weight_dict = {}
    for keyword in keywords:
        keyword_weight_dict[keyword] = 0.01 # base score
        if keyword in set(row['title_tokens']):
            keyword_weight_dict[keyword] += 0.02 # score from title
        if keyword in set(row['tfidf_dict'].keys()):
            keyword_weight_dict[keyword] *= row['tfidf_dict'][keyword]
        if keyword in set(row['topic_keywords_dict'].keys()):
            keyword_weight_dict[keyword] *= row['topic_keywords_dict'][keyword]

    for i, sent in enumerate(row['sents']):
        sents_score_dict[i] = 0
        for keyword in keywords:
            if keyword in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
            elif keyword.lower() in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
            elif keyword.title() in sent:
                sents_score_dict[i] += keyword_weight_dict[keyword]
        sents_score_dict[i] = sents_score_dict[i]/(len(sent) * 0.01)
    sents_score_list = sorted(zip(sents_score_dict.keys(),sents_score_dict.values()), key=lambda x:x[1],reverse=True)           
    
    return sents_score_list


In [8]:
print("Result from the first talk:")
for i, s in enumerate([df.sents[0][x[0]] for x in sent_scoring4(0)[:15]]):
    print("{}: {}".format(i,s))

Result from the first talk:
0: and I want to talk about creativity.
1: I believe this passionately, that we do not grow into creativity, we grow out of it.
2: My contention is that creativity now is as important in education as literacy, and we should treat it with the same status.
3: One is the extraordinary evidence of human creativity in all of the presentations that we have had and in all of the people here.
4: In fact, creativity — which I define as the process of having original ideas that have value — more often than not comes about through the interaction of different disciplinary ways of seeing things.
5: I think now they would say she had ADHD.
6: Was that wrong?"
7: and you say you work in education
8: People who had to move to think.
9: " Who had to move to think.
10: We were sitting there
11: We need to radically rethink our view of intelligence.
12: People who could not sit still.
13: I do not mean to say that being wrong is the same thing as being creative.
14: Our educa

<a id='textrank'></a>
# Text Rank Approach #

> This approach leverages the existing TextRank algorithm. It is a graph-based ranking algorithm in which each sentence is a vertex, and the importance of a sentence is the sum of the weighted “votes” cast by the other sentences, where the weight is the similarity between the two sentences. The algorithm builds on PageRank, ranking sentences instead of web pages. <br><br> Initially, we tried to apply the algorithm to all the sentences of a transcript, but this failed because of the very large number of sentences each transcript contains. Thus, we first filter the sentences with Keyword Weight sub-approach 3 and then apply the algorithm to the filtered set.<br><br> Judged on the first talk, the results were not as good as those of Keyword Weight sub-approach 3 alone. 
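
The voting idea can be sketched with a plain-Python power iteration over a tiny similarity matrix. This is a simplified illustration under assumed values (the matrix, the damping of 0.85 and the helper `textrank_scores` are all hypothetical); the notebook itself relies on `nx.pagerank`, whose implementation differs in its details.

```python
def textrank_scores(similarity, damping=0.85, iterations=100):
    """Simplified weighted PageRank over a symmetric similarity matrix."""
    n = len(similarity)
    # Total edge weight attached to each sentence, used to normalize its votes.
    out_weight = [sum(row) for row in similarity]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each sentence j votes for i in proportion to their similarity.
            votes = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * votes)
        scores = new_scores
    return scores


# Invented similarities: sentences 0 and 1 are close, sentence 2 is an outlier.
similarity = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.3],
    [0.2, 0.3, 0.0],
]
scores = textrank_scores(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

The sentence with the largest total similarity to the others receives the highest score, which is the sense in which the other sentences "vote" for it.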

In [9]:
import networkx as nx
from itertools import combinations,product

def build_similarity_matrix(sentences):
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    docs = [nlp(doc) for doc in sentences] 
    
    # combinations() yields each unordered pair (idx1 < idx2) exactly once
    for idx1,idx2 in combinations(range(len(docs)),r=2):
        score = docs[idx1].similarity(docs[idx2])
        similarity_matrix[idx1][idx2] = score
        similarity_matrix[idx2][idx1] = score
    return similarity_matrix

In [10]:
def textRank_sents(sents):
    sentence_similarity_matrix = build_similarity_matrix(sents)
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph,max_iter=500)

    # pair each sentence with its PageRank score, highest score first
    ranked_sentence = sorted([(s,scores[i],) for i,s in enumerate(sents)], key=lambda x:x[1], reverse=True)     
    return ranked_sentence

## Sub-approach 1: Use all the sentences ##

> When all the sentences in a transcript are used, the large number of sentences can cause the pagerank algorithm in the NetworkX library to fail to converge, even with 500 max iterations (the default is 100). As such, we decided to filter the sentences before feeding them into the PageRank algorithm. 

In [34]:
print("Result from the first talk:")
for i, s in enumerate(textRank_sents(df.sents[0])[:15]):
    print("{}: {}".format(i,s))

Result from the first talk:


PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 500 iterations')

## Sub-approach 2: Use the top 100 sentences selected using Keyword Weight - 3 ##

In [13]:
selected_sents = [df.sents[0][x[0]] for x in sent_scoring3(0)][:100]

In [14]:
print("Result from the first talk:")
for i, s in enumerate(textRank_sents(selected_sents)[:15]):
    print("{}: {}".format(i,s))

Result from the first talk:
0: ('if you look at the output, who really succeeds by this, who does everything that they should, who gets all the brownie points, who are the winners — I think you would have to conclude the whole purpose of public education throughout the world is to produce university professors.', 0.010423259616860902)
1: ('there is something curious about professors in my experience — not all of them, but typically, they live in their heads.', 0.010373245902220336)
2: ('And I like university professors, but you know, we should not hold them up as the high-water mark of all human achievement.', 0.010369232863169061)
3: ('So you were probably steered benignly away from things at school when you were a kid, things you liked, on the grounds that you would never get a job doing that.', 0.0103587623833243)
4: ('But they are rather curious, and I say this out of affection for them.', 0.01033855187805182)
5: ('And she is exceptional, but I think she is not, so to speak, except

# Evaluation of approaches and sub-approaches #

> So far, we have used the first TED Talk, <i>Do schools kill creativity</i>, to judge the performance of the approaches and sub-approaches, and that judgment is based on human reading rather than a standard quantitative measure. Moreover, we understand that different people may hold different opinions about the same summary. While it may be impossible to create a summary that satisfies everyone, we tried to create one that caters to as many people as possible. <br><br>
    As such, we randomly selected 10 talks; four members of our team each watched them and individually selected the top 15 sentences that they believed should be included in the summary. We also used a free online summarizer, autosummarizer.com. Summaries were likewise generated from each sub-approach of each approach. We then used 3 measures to compare them: common sentence count, cosine similarity and ROUGE-2.<br><br>
    Due to some confusion, one member used the wrong talk URL for her personal and autosummarizer evaluation, so we use the remaining 9 talks.<br><br>
    From the results, we observed that the top cosine similarity values differ minimally. The common sentence count shows only that, on average, a model-generated summary shares fewer than two sentences with an evaluator's summary. These two metrics are thus not very useful for choosing the best approach. <br><br>
Looking at the ROUGE-2 metrics, there is a clear trade-off between Precision and Recall. The top 2 approaches by ROUGE-2 F1 score are Keyword Weight sub-approach 2 and TextRank with Keyword Weight sub-approach 3. We believe that Recall is more important than Precision, because lower Precision indicates more noise whereas lower Recall indicates missing main points. Of these two, TextRank with Keyword Weight sub-approach 3 has the higher Recall, so it is our final approach.
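
To make the Precision/Recall trade-off concrete, here is a simplified sketch of ROUGE-2 based on clipped bigram overlap with whitespace tokenization. It is not the implementation in the `rouge` package used below, the helper names `bigrams` and `rouge2` are hypothetical, and the two example strings are invented.

```python
from collections import Counter


def bigrams(text):
    # Count the adjacent word pairs in a lowercased, whitespace-split text.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate, reference):
    """Return (recall, precision, f1) from clipped bigram overlap."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


reference = "schools should treat creativity like literacy"
candidate = "we think schools should treat creativity with care"
recall, precision, f1 = rouge2(candidate, reference)
print(recall, precision, f1)
```

Here the candidate recovers 3 of the 5 reference bigrams (Recall 0.6) but only 3 of its own 7 bigrams are in the reference (Precision about 0.43). A longer candidate tends to raise Recall while lowering Precision.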
    

In [15]:
eval_urls = ['https://www.ted.com/talks/gabby_giffords_and_mark_kelly_be_passionate_be_courageous_be_your_best',
 'https://www.ted.com/talks/annmarie_thomas_squishy_circuits',
 'https://www.ted.com/talks/bel_pesce_5_ways_to_kill_your_dreams',
 'https://www.ted.com/talks/benjamin_barber_why_mayors_should_rule_the_world',
 'https://www.ted.com/talks/nick_bostrom_on_our_biggest_problems',
 'https://www.ted.com/talks/richard_dawkins_on_militant_atheism',
 'https://www.ted.com/talks/elif_shafak_the_politics_of_fiction',
 'https://www.ted.com/talks/roxane_gay_confessions_of_a_bad_feminist',
 'https://www.ted.com/talks/mena_trott_tours_her_blog_world']

#  'https://www.ted.com/talks/kristen_ashburn_s_heart_rending_pictures_of_aids' is ignored due to the confusion

In [16]:
xiaowei = pd.read_csv('data/input/summary/xiaowei.csv',encoding='iso-8859-1')
may = pd.read_csv('data/input/summary/may.csv',encoding='iso-8859-1')
suyee = pd.read_csv('data/input/summary/suyee.csv',encoding='iso-8859-1')
autosummarizer =  pd.read_csv('data/input/summary/autosummarizer.csv',encoding='iso-8859-1')
wende = pd.read_csv('data/input/summary/wende.csv',encoding='iso-8859-1')

suyee.url = suyee.url.str.strip()
xiaowei.url= xiaowei.url.str.strip()
autosummarizer.url = autosummarizer.url.str.strip()
may.url = may.url.str.strip()
may = may.iloc[:,:2]
wende.url = wende.url.str.strip()

In [19]:
model_output = df[df.url.isin(eval_urls)][['transcript','url']].copy() # copy so new columns are not set on a slice of df
model_output['sents'] = model_output.transcript.apply(sentence_tokenize)
model_output['kw_1'] =  model_output.index.map(sent_scoring1)
model_output['kw_2'] =  model_output.index.map(sent_scoring2)
model_output['kw_3'] =  model_output.index.map(sent_scoring3)
model_output['kw_4'] =  model_output.index.map(sent_scoring4)

In [20]:
evaluator_df = pd.DataFrame()
evaluator_df['url'] = model_output.url

evaluator_df = evaluator_df.merge(xiaowei,on='url')
evaluator_df.rename({'summary':'xiaowei'},axis=1, inplace=True)

evaluator_df = evaluator_df.merge(wende,on='url')
evaluator_df.rename({'summary':'wende'},axis=1, inplace=True)

evaluator_df = evaluator_df.merge(may,on='url')
evaluator_df.rename({'summary':'may'},axis=1, inplace=True)

evaluator_df = evaluator_df.merge(suyee,on='url')
evaluator_df.rename({'summary':'suyee'},axis=1, inplace=True)

evaluator_df = evaluator_df.merge(autosummarizer,on='url')
evaluator_df.rename({'summary':'autosummarizer'},axis=1, inplace=True)

evaluator_df.iloc[:,1:] = evaluator_df.iloc[:,1:].applymap(sentence_tokenize)

In [21]:
def extract_sents(score_list, sents, num=10):
    return [sents[idx] for (idx,score) in score_list[:num]]

In [22]:
model_output['textrank-kw1'] = model_output.apply(lambda x:extract_sents(x['kw_1'],x['sents'],num=50),axis=1)
model_output['textrank-kw1'] = model_output['textrank-kw1'].map(textRank_sents)
model_output['textrank-kw1'] = model_output['textrank-kw1'].map(lambda x:[s[0] for s in x[:10]])

model_output['textrank-kw2'] = model_output.apply(lambda x:extract_sents(x['kw_2'],x['sents'],num=50),axis=1)
model_output['textrank-kw2'] = model_output['textrank-kw2'].map(textRank_sents)
model_output['textrank-kw2'] = model_output['textrank-kw2'].map(lambda x:[s[0] for s in x[:10]])

model_output['textrank-kw3'] = model_output.apply(lambda x:extract_sents(x['kw_3'],x['sents'],num=50),axis=1)
model_output['textrank-kw3'] = model_output['textrank-kw3'].map(textRank_sents)
model_output['textrank-kw3'] = model_output['textrank-kw3'].map(lambda x:[s[0] for s in x[:10]])

model_output['textrank-kw4'] = model_output.apply(lambda x:extract_sents(x['kw_4'],x['sents'],num=50),axis=1)
model_output['textrank-kw4'] = model_output['textrank-kw4'].map(textRank_sents)
model_output['textrank-kw4'] = model_output['textrank-kw4'].map(lambda x:[s[0] for s in x[:10]])

In [23]:
model_output['kw_1'] = model_output.apply(lambda x:extract_sents(x['kw_1'],x['sents']),axis=1)
model_output['kw_2'] = model_output.apply(lambda x:extract_sents(x['kw_2'],x['sents']),axis=1)
model_output['kw_3'] = model_output.apply(lambda x:extract_sents(x['kw_3'],x['sents']),axis=1)
model_output['kw_4'] = model_output.apply(lambda x:extract_sents(x['kw_4'],x['sents']),axis=1)

In [24]:
def evaluate(evaluator, approach, method):
    eval_result = []
    model_result = model_output[approach].tolist()
    evaluator_result = evaluator_df[evaluator].tolist()
    for idx in range(len(model_result)):
        eval_result.append(method(model_result[idx],evaluator_result[idx]))
    return eval_result

In [25]:
pickle.dump(model_output,open('data/pickle/summarizer_model_output.p','wb'))

In [16]:
model_output = pickle.load(open('data/pickle/summarizer_model_output.p','rb'))

## Common Sentence Count ##

In [26]:
def count_same_sent(t1,t2):
    sents1 = [t.strip() for t in t1]
    sents2 = [t.strip() for t in t2]
    return len(set(sents1) & set(sents2))

In [27]:
common_sent_count_matrix = np.zeros((5,8,9)) # evaluator x approach x talk

for i, e in enumerate(evaluator_df.columns[1:]):
    for j, a in enumerate(model_output.columns[3:]):
#         print(e,a, evaluate(e,a,count_same_sent))
        common_sent_count_matrix[i][j] = evaluate(e,a,count_same_sent)
#     break

In [28]:
print("Number of mean common sentences for the approaches: ")
print("KW-1, KW-2, KW-3, KW-4, Textrank-kw1, Textrank-kw2, Textrank-kw3, Textrank-kw4")
np.mean(common_sent_count_matrix,axis=(0,2)) # axis = (0,2) regardless of talk and evaluator

Number of mean common sentences for the approaches: 
KW-1, KW-2, KW-3, KW-4, Textrank-kw1, Textrank-kw2, Textrank-kw3, Textrank-kw4


array([1.68888889, 1.73333333, 1.6       , 1.31111111, 1.4       ,
       1.46666667, 1.35555556, 1.55555556])

## Cosine similarity ##

In [29]:
def cosine_similarity(t1,t2):
    doc1 = nlp(' '.join(t1))
    doc2 = nlp(' '.join(t2))
    return doc1.similarity(doc2)
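
As a rough illustration of this measure, the sketch below computes the cosine similarity between averaged word vectors, on the assumption that `Doc.similarity` behaves this way for a model with static vectors such as `en_core_web_lg`. The 2-d vectors and the helpers `mean_vector` and `cosine` are invented for the example.

```python
import math


def mean_vector(vectors):
    # Average the word vectors component-wise to get one document vector.
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented 2-d "word vectors": both summaries share common words and differ in one.
common = [[1.0, 0.2], [0.9, 0.1], [1.0, 0.0]]
summary_a = common + [[0.0, 1.0]]
summary_b = common + [[1.0, 1.0]]
similarity = cosine(mean_vector(summary_a), mean_vector(summary_b))
distinct_word_similarity = cosine([0.0, 1.0], [1.0, 1.0])
print(similarity, distinct_word_similarity)
```

In this toy case the two words that differ are only moderately similar, yet the averaged vectors are almost identical. Averaging over many shared words may be one reason why the scores reported below cluster so closely together.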

In [30]:
cosine_similarity_matrix = np.zeros((5,8,9)) # evaluator x approach x talk

for i, e in enumerate(evaluator_df.columns[1:]):
    for j, a in enumerate(model_output.columns[3:]):
        cosine_similarity_matrix[i][j] = evaluate(e,a,cosine_similarity)

In [31]:
print("Mean cosine similarity for the approaches: ")
print("KW-1, KW-2, KW-3, KW-4, Textrank-kw1, Textrank-kw2, Textrank-kw3, Textrank-kw4")
np.mean(cosine_similarity_matrix,axis=(0,2))

Mean cosine similarity for the approaches: 
KW-1, KW-2, KW-3, KW-4, Textrank-kw1, Textrank-kw2, Textrank-kw3, Textrank-kw4


array([0.99149228, 0.9905352 , 0.98603891, 0.90557505, 0.99107434,
       0.99173125, 0.99171895, 0.91376191])

## ROUGE-2 Metrics ##

In [32]:
from rouge.metrics import rouge_n_summary_level

def compute_rouge2(t1,t2):
    doc1 = ' '.join(t1)
    doc2 = ' '.join(t2)

    score = rouge_n_summary_level(doc1,doc2,2)
    return score

In [33]:
rouge2_matrix = np.zeros((5,8,9,3)) # evaluator x approach x talk x metrics (recall - precision - f1)

for i, e in enumerate(evaluator_df.columns[1:]):
    for j, a in enumerate(model_output.columns[3:]):
        rouge2_matrix[i][j] = evaluate(e,a,compute_rouge2)

In [34]:
print("Mean Rouge-2 metric for the approaches: (Each row is an approach, the columns are Recall, Precision and F1)")
rouge2_matrix.mean(axis=(0,2))

Mean Rouge-2 metric for the approaches: (Each row is an approach, the columns are Recall, Precision and F1)


array([[0.80684015, 0.70085532, 0.7350154 ],
       [0.78627783, 0.71936748, 0.73719144],
       [0.65442494, 0.78583881, 0.69561985],
       [0.32633929, 0.81840495, 0.44051703],
       [0.81984058, 0.68168641, 0.72868632],
       [0.8168782 , 0.69227431, 0.73490255],
       [0.81090111, 0.70109097, 0.73838135],
       [0.71448321, 0.66901297, 0.66165103]])

In [35]:
eval_df = pd.DataFrame()
eval_df['common_sent_count'] = np.mean(common_sent_count_matrix,axis=(0,2)).T
eval_df['cosine_similarity'] = np.mean(cosine_similarity_matrix,axis=(0,2)).T
eval_df['ROUGE-2-Recall'] = rouge2_matrix.mean(axis=(0,2)).T[0]
eval_df['ROUGE-2-Precision'] = rouge2_matrix.mean(axis=(0,2)).T[1]
eval_df['ROUGE-2-F1'] = rouge2_matrix.mean(axis=(0,2)).T[2]
eval_df.index = ['Keyword Weight-1', 'Keyword Weight-2','Keyword Weight-3','Keyword Weight-4','TextRank-kw1','TextRank-kw2',
                 'TextRank-kw3','TextRank-kw4']
eval_df

| Approach | common_sent_count | cosine_similarity | ROUGE-2-Recall | ROUGE-2-Precision | ROUGE-2-F1 |
|---|---|---|---|---|---|
| Keyword Weight-1 | 1.688889 | 0.991492 | 0.80684 | 0.700855 | 0.735015 |
| Keyword Weight-2 | 1.733333 | 0.990535 | 0.786278 | 0.719367 | 0.737191 |
| Keyword Weight-3 | 1.6 | 0.986039 | 0.654425 | 0.785839 | 0.69562 |
| Keyword Weight-4 | 1.311111 | 0.905575 | 0.326339 | 0.818405 | 0.440517 |
| TextRank-kw1 | 1.4 | 0.991074 | 0.819841 | 0.681686 | 0.728686 |
| TextRank-kw2 | 1.466667 | 0.991731 | 0.816878 | 0.692274 | 0.734903 |
| TextRank-kw3 | 1.355556 | 0.991719 | 0.810901 | 0.701091 | 0.738381 |
| TextRank-kw4 | 1.555556 | 0.913762 | 0.714483 | 0.669013 | 0.661651 |


# Example usage #

In [41]:
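idx = 0
transcript_sents = df.sents[idx]
scores = sent_scoring3(idx)
# extract_sents keeps only the top 10 sentences by default;
# pass num=50 to reproduce the filtering used in the evaluation above
candidate_sents = extract_sents(scores,transcript_sents)
ranked_sents = textRank_sents(candidate_sents)
for i, s in enumerate([x[0] for x in ranked_sents[:10]]):
    print("{}. {}".format(i+1,s))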

1. if you look at the output, who really succeeds by this, who does everything that they should, who gets all the brownie points, who are the winners — I think you would have to conclude the whole purpose of public education throughout the world is to produce university professors.
2. In fact, creativity — which I define as the process of having original ideas that have value — more often than not comes about through the interaction of different disciplinary ways of seeing things.
3. My contention is that creativity now is as important in education as literacy, and we should treat it with the same status.
4. I do not mean to say that being wrong is the same thing as being creative.
5. I believe this passionately, that we do not grow into creativity, we grow out of it.
6. One is the extraordinary evidence of human creativity in all of the presentations that we have had and in all of the people here.
7. And the second is academic ability, which has really come to dominate our view of int