## Summary Generation Using Gensim's `summarize` Function
### The data is read in the initial phase and passed to Gensim's `summarize` function. Summaries are then generated for several values of its input parameters.

## Initial Phase
### Importing Libraries and Reading Data 

In [41]:
import gensim
import logging
import numpy
import pandas
from gensim.summarization import summarize, keywords
import re

In [7]:
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

In [8]:
df = pandas.read_csv('Downloads/tennis_articles_v4.csv')

### Concatenating the article text and removing apostrophes

In [4]:
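# Concatenate every article into one string (no separator between articles).
sentences = "".join(df['article_text'])
# Remove apostrophes, so "I'm" becomes "Im".
sentences = re.sub("'", "", sentences)
sentences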
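
Because the articles are concatenated with no separator, the last sentence of one article runs into the first sentence of the next, which is why the summaries below contain fused text such as `net.So`. A possible variant is sketched here, assuming each row of `article_text` holds one complete article. The names `join_articles` and `sample` are hypothetical and not part of the original notebook, and the summaries shown in this document were produced without this change.

```python
import re

def join_articles(texts, sep=" "):
    """Concatenate article strings with a separator, then drop apostrophes."""
    combined = sep.join(t.strip() for t in texts)
    return re.sub("'", "", combined)

# Hypothetical two-article sample standing in for df['article_text'].
sample = ["I'm across the net.", "So I'm not the one."]

print(join_articles(sample))          # articles stay separated by a space
print(join_articles(sample, sep=""))  # mirrors the cell above: 'net.So'
```

With a space separator, a sentence splitter has a chance of seeing the boundary between two articles; with an empty separator it does not.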

## Results
### Result Obtained from Gensim Summary

In [37]:
print('summary:')
print(summarize(sentences))

2019-10-28 12:18:13,738 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2019-10-28 12:18:13,746 : INFO : built Dictionary(507 unique tokens: [u'coach', u'forget', u'celebr', u'focus', u'month']...) from 106 documents (total 1071 corpus positions)
2019-10-28 12:18:13,753 : INFO : Building graph
2019-10-28 12:18:13,754 : INFO : Filling graph
2019-10-28 12:18:13,795 : INFO : Removing unreachable nodes of graph
2019-10-28 12:18:13,797 : INFO : Pagerank graph
2019-10-28 12:18:13,810 : INFO : Sorting pagerank scores


summary:
When Im on the courts or when Im on the court playing, Im a competitor and I want to beat every single person whether theyre in the locker room or across the net.So Im not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
There are so many other things that were interested in, that we do.BASEL, Switzerland (AP), Roger Federer advanced to the 14th Swiss Indoors final of his career by beating seventh-seeded Daniil Medvedev 6-1, 6-4 on Saturday.
Seeking a ninth title at his hometown event, and a 99th overall, Federer will play 93th-ranked Marius Copil on Sunday.
Federer dominated the 20th-ranked Medvedev and had his first match-point chance to break serve again at 5-1.
Speaking at the Swiss Indoors tournament where he will play in Sundays final against Romanian qualifier Marius Copil, the world number three said that given the impossibly short time frame to make a decision, he opted out of any c
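
The log lines above ("Building graph", "Pagerank graph", "Sorting pagerank scores") indicate a TextRank-style pipeline: sentences become graph nodes, edges carry a similarity weight, and PageRank scores decide which sentences are kept. The following is a minimal pure-Python sketch of that idea, not gensim's implementation. It assumes the word-overlap similarity from the original TextRank paper and a naive regex sentence splitter, and every function name in it (`split_sentences`, `similarity`, `textrank`, `toy_summarize`) is hypothetical.

```python
import math
import re

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def similarity(a, b):
    # Word-overlap similarity, normalised by the log of each sentence length.
    wa = set(re.findall(r'\w+', a.lower()))
    wb = set(re.findall(r'\w+', b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) + log(1) == 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def textrank(sentences, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over the sentence graph.
    n = len(sentences)
    if n == 0:
        return []
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

def toy_summarize(text, ratio=0.2):
    # Keep the top int(n * ratio) sentences, returned in document order.
    sentences = split_sentences(text)
    keep = int(len(sentences) * ratio)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]

# Hypothetical sample text, not taken from the tennis dataset.
text = ("Federer won the match. Federer will play the final. "
        "The weather was nice. Federer won the final match. "
        "Tickets sold out.")
print(toy_summarize(text, ratio=0.4))
```

Sentences that share vocabulary with many others accumulate score, so the summary tends to favour the dominant topic of the text.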

### Shorter summary obtained by setting ratio to 0.1 (default: 0.2)

In [38]:
print('summary:')
print(summarize(sentences, ratio=0.1))

2019-10-28 12:18:20,762 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2019-10-28 12:18:20,769 : INFO : built Dictionary(507 unique tokens: [u'coach', u'forget', u'celebr', u'focus', u'month']...) from 106 documents (total 1071 corpus positions)
2019-10-28 12:18:20,774 : INFO : Building graph
2019-10-28 12:18:20,776 : INFO : Filling graph
2019-10-28 12:18:20,808 : INFO : Removing unreachable nodes of graph
2019-10-28 12:18:20,809 : INFO : Pagerank graph
2019-10-28 12:18:20,828 : INFO : Sorting pagerank scores


summary:
When Im on the courts or when Im on the court playing, Im a competitor and I want to beat every single person whether theyre in the locker room or across the net.So Im not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
There are so many other things that were interested in, that we do.BASEL, Switzerland (AP), Roger Federer advanced to the 14th Swiss Indoors final of his career by beating seventh-seeded Daniil Medvedev 6-1, 6-4 on Saturday.
Speaking at the Swiss Indoors tournament where he will play in Sundays final against Romanian qualifier Marius Copil, the world number three said that given the impossibly short time frame to make a decision, he opted out of any commitment.
This was designed for the future generation of players." Argentina and Britain received wild cards to the new-look event, and will compete along with the four 2018 semi-finalists and the 12 teams who win qualifying ro

### Summary returned as a list of sentences (split=True) with ratio 0.1

In [39]:
print('summary:')
print(summarize(sentences, ratio=0.1, split=True))

2019-10-28 12:18:24,359 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2019-10-28 12:18:24,364 : INFO : built Dictionary(507 unique tokens: [u'coach', u'forget', u'celebr', u'focus', u'month']...) from 106 documents (total 1071 corpus positions)
2019-10-28 12:18:24,368 : INFO : Building graph
2019-10-28 12:18:24,369 : INFO : Filling graph
2019-10-28 12:18:24,404 : INFO : Removing unreachable nodes of graph
2019-10-28 12:18:24,407 : INFO : Pagerank graph
2019-10-28 12:18:24,434 : INFO : Sorting pagerank scores


summary:
['When Im on the courts or when Im on the court playing, Im a competitor and I want to beat every single person whether theyre in the locker room or across the net.So Im not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.', 'There are so many other things that were interested in, that we do.BASEL, Switzerland (AP), Roger Federer advanced to the 14th Swiss Indoors final of his career by beating seventh-seeded Daniil Medvedev 6-1, 6-4 on Saturday.', 'Speaking at the Swiss Indoors tournament where he will play in Sundays final against Romanian qualifier Marius Copil, the world number three said that given the impossibly short time frame to make a decision, he opted out of any commitment.', 'This was designed for the future generation of players." Argentina and Britain received wild cards to the new-look event, and will compete along with the four 2018 semi-finalists and the 12 teams who win qu
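
With `split=True` the result is a Python list with one element per selected sentence, rather than a single newline-separated string. The sketch below shows two common ways to consume such a list. `summary_sentences` is a hypothetical stand-in with made-up short sentences, used so the example runs without gensim or the dataset.

```python
# Hypothetical stand-in for the list returned by summarize(..., split=True).
summary_sentences = [
    "Federer advanced to the final.",
    "He will play Marius Copil on Sunday.",
]

# Iterate when the sentences are needed one at a time.
for position, sentence in enumerate(summary_sentences, start=1):
    print("%d. %s" % (position, sentence))

# Join with a space to rebuild a single paragraph.
paragraph = " ".join(summary_sentences)
print(paragraph)
```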

### A more concise summary with ratio as 0.01

In [40]:
print('summary:')
print(summarize(sentences, ratio=0.01, split=True))

2019-10-28 12:18:27,184 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2019-10-28 12:18:27,191 : INFO : built Dictionary(507 unique tokens: [u'coach', u'forget', u'celebr', u'focus', u'month']...) from 106 documents (total 1071 corpus positions)
2019-10-28 12:18:27,195 : INFO : Building graph
2019-10-28 12:18:27,196 : INFO : Filling graph
2019-10-28 12:18:27,227 : INFO : Removing unreachable nodes of graph
2019-10-28 12:18:27,229 : INFO : Pagerank graph
2019-10-28 12:18:27,266 : INFO : Sorting pagerank scores


summary:
['"I dont like being under that kind of pressure," Federer said of the deadline Kosmos handed him.Kei Nishikori will try to end his long losing streak in ATP finals and Kevin Anderson will go for his second title of the year at the Erste Bank Open on Sunday.']
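
The single-element list above is consistent with the ratio being applied to the number of detected sentences: the log reports 106 units ("documents"), and 1% of 106 truncates to 1. The sketch below works through that arithmetic under the assumption that gensim 3.x keeps the top `int(n * ratio)` ranked sentences. The helper `sentences_kept` is hypothetical, introduced only for illustration.

```python
def sentences_kept(total_sentences, ratio):
    """Assumed rule: keep the top int(total * ratio) ranked sentences."""
    return int(total_sentences * ratio)

total = 106  # sentence count reported in the log lines above

for ratio in (0.2, 0.1, 0.01):
    print(ratio, sentences_kept(total, ratio))
```

Under this assumption, the default ratio of 0.2 keeps 21 sentences, 0.1 keeps 10, and 0.01 keeps 1.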


### Keywords extracted by Gensim from the same text

In [42]:
print(keywords(sentences))

federer
federation
federers
player
tennis players
finals
finally
anderson
nadal
nadals
nishikori
event
events
serve
indoors final
cup
playing
play
played
different
courts
court
weeks
week
time
times
point
points
like
copil
competitive
competition
titles
tour
tours
tournament
tournaments
career
seed
round
rounds
atps
atp
atmosphere
zverev
zverevs
world
beat
beating
losing
lose
open
opening
spaniard
match
matches
kosmos
ninth title
slam
draw
win
winning
nina
looks
look
masters
huge
davenport
roger
doesnt
moments
happy
storylines
youre
doubts
doubt
davis
winner
defending
new
major
big
qualifying
qualifier
qualify
quarter
november
maybe
received wild
del
year
years
thats
footballer
gerard
martin
second
said
