## Assignment 5
by Charlie Mei cm3947

Select any one article from your Webhose dataset, and write a Python program (in a Jupyter Notebook) to perform the following operations on the body of the article:

- Extract and print subject-verb-object (SVO) relations from each sentence.
- Apply TextRank to rank and select key phrases, and print the result.
- Apply LexRank to produce an extractive summary of 5 sentences.

In [1]:
# A toolkit of all functions learnt in class so far
import nlp_toolkit

from urllib import request
from bs4 import BeautifulSoup

from nltk import sent_tokenize

import spacy
from spacy.util import minibatch, compounding
from spacy.pipeline import SentenceSegmenter
from spacy.lang.en.stop_words import STOP_WORDS

from sumy.parsers.plaintext import PlaintextParser
from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

### Cleaning text data from URL

In [2]:
url = 'https://www.stuff.co.nz/entertainment/tv-radio/300026661/13-reasons-why-the-popular-netflix-shows-creator-teases-chance-of-a-hopeful-ending'

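# Download the page
html = request.urlopen(url).read()
# Parse it and collect every text node (not used in the cells below)
soup = BeautifulSoup(html, 'html.parser')
data = soup.find_all(string=True)
# Extract the body text with the class toolkit helper
text = nlp_toolkit.text_from_html(html)
# Preview the first 1000 characters
print(text[:1000])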

National World Business Climate Change Sport Entertainment Life & Style Homed Travel Motoring Stuff Nation Play Stuff Quizzes Politics Premium Well & Good Food & Wine Parenting Rugby Farming Technology Opinion Auckland Wellington Canterbury Waikato Bay of Plenty Taranaki Manawatu Nelson Marlborough Timaru Otago Southland Careers Advertising Contact Privacy © 2020 Stuff Limited Entertainment TV & Radio 13 Reasons Why: The popular Netflix show's creator teases chance of a hopeful ending 14:49, Jun 03 2020 Facebook Twitter Whats App Reddit Email NETFLIX The final season of 13 Reasons Why is out. The controversial 13  Reasons Why is returning for its fourth and final season on Netflix from Friday and creator Brian Yorkey has indicated there will be a hopeful ending. Adapted from Jay Asher's 2007 novel, the show was released on Netflix in 2017 and began with the first season focused on the death of Hannah Baker, a 17-year-old American high school student who
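
`text_from_html` comes from the class toolkit, so its implementation is not shown here. As a rough sketch of what a helper like it might do, assuming it keeps every text node outside `<script>`, `<style>` and `<head>`, the standard-library parser below is enough. `VisibleTextParser`, `visible_text` and the `SKIPPED_TAGS` list are hypothetical names, not part of `nlp_toolkit`.

```python
from html.parser import HTMLParser

# Assumed list of container tags whose contents a reader never sees.
SKIPPED_TAGS = {"script", "style", "head", "title", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that sits outside every skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><nav>Home</nav><p>Hello <b>world</b>.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_text(page))
```

A filter like this keeps menus and footers, which is consistent with the navigation links at the start of the output above. Restricting extraction to the element that wraps the article body would be one way to drop them. The `html` value in the cell above is bytes, so it would need decoding (for example `html.decode('utf-8')`, assuming the page is UTF-8) before being passed to this sketch.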


### Extracting SVO relations from each sentence

In [4]:
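# Load spaCy's small English pipeline
nlp = spacy.load('en_core_web_sm')

# Parse the article and extract (subject, verb, object) triples
tok = nlp(text)
svos = nlp_toolkit.findSVOs(tok)
svos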

[('suicide', 'set', 'tone'),
 ('netflix', 'edited', 'scene'),
 ('show', 'include', 'depression'),
 ('they', 'explore', 'themes'),
 ('cast', 'hold', 'tears'),
 ('foundation', 'urges', 'parents'),
 ('netflix', 'debuts', 'season'),
 ('we', 'follow', 'that'),
 ('yorkey', 'told', 'weekly'),
 ('covers', 'attracted', 'criticism'),
 ('glorifies', 'puts', 'people'),
 ('we', 'end', 'series'),
 ('it', 'earned', 'hope'),
 ('yorkey', 'told', 'weekly'),
 ('we', 'infuse', 'it'),
 ('ability', 'survive', 'moments'),
 ('ability', 'survive', 'ability'),
 ('we', 'achieved', 'that'),
 ('she', 'avoided', 'life'),
 ('robinson', 'found', 'dead'),
 ('speech', 'flouts', 'rules'),
 ('deal', 'mean', 'quiz'),
 ('harry', 'close', 'charity')]
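
`findSVOs` is a toolkit function whose code is not shown. A minimal sketch of the idea, assuming it walks the dependency parse and pairs each verb's subject children with its object children, could look like the following. The `Token` class is only a stand-in that mimics the `text`, `pos_`, `dep_` and `children` attributes of a spaCy token so the example runs without a model; `find_svos` and the two label sets are hypothetical and far simpler than a full implementation (no conjunctions, passives or negation handling).

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Stand-in for a spaCy token: text, POS tag, dependency label, children."""
    text: str
    pos_: str
    dep_: str
    children: list = field(default_factory=list)


# Dependency labels treated as subjects and objects (an assumed, minimal set).
SUBJECT_DEPS = {"nsubj", "nsubjpass"}
OBJECT_DEPS = {"dobj", "attr", "oprd"}


def find_svos(tokens):
    """Return lowercased (subject, verb, object) triples for each verb."""
    triples = []
    for tok in tokens:
        if tok.pos_ != "VERB":
            continue
        children = list(tok.children)
        subjects = [c for c in children if c.dep_ in SUBJECT_DEPS]
        objects = [c for c in children if c.dep_ in OBJECT_DEPS]
        # Emit one triple per subject/object pair attached to this verb.
        for subj in subjects:
            for obj in objects:
                triples.append((subj.text.lower(), tok.text.lower(), obj.text.lower()))
    return triples


# Hand-built parse of "Netflix edited the scene", using spaCy-style labels.
netflix = Token("Netflix", "PROPN", "nsubj")
the = Token("the", "DET", "det")
scene = Token("scene", "NOUN", "dobj", [the])
edited = Token("edited", "VERB", "ROOT", [netflix, scene])

print(find_svos([netflix, edited, the, scene]))
# [('netflix', 'edited', 'scene')]
```

Because the function only reads those four attributes, it should also accept a parsed spaCy `Doc` in place of the hand-built list.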

### Applying TextRank

In [5]:
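# Build the keyword graph from nouns, proper nouns and adpositions (window size 8)
keyextractor = nlp_toolkit.TextRank4Keyword()
keyextractor.analyze(text, candidate_pos=['NOUN', 'PROPN', 'ADP'], window_size=8)
# Print the top-ranked keywords with their scores
keyextractor.get_keywords(20)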

season - 3.4231083734700563
Netflix - 3.260140734440743
series - 2.803306912490849
suicide - 2.749014050157208
Stuff - 2.4637979443009472
death - 2.0421209565412233
life - 1.9947319913216641
Entertainment - 1.9766971899081658
hope - 1.9306290719787125
Hannah - 1.8108652138618
Yorkey - 1.8025878973880944
Careers - 1.6027340917248196
Life - 1.4700500809308923
ending - 1.4406588958037738
Nation - 1.2714430964469017
Neighbourly - 1.2709033061909092
Death - 1.2644049827550856
Notices - 1.2610329251252157
Coupons - 1.2604468685313464
Baker - 1.2492870731489245
charity - 1.1947775877015232
Advertising - 1.1816943928705603
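
For readers without the toolkit, the ranking step behind a class like `TextRank4Keyword` can be sketched in plain Python. This is an illustration under assumptions, not the toolkit's code: it takes a list of words that has already been filtered by part of speech, links words that co-occur within a sliding window, and runs an unnormalised PageRank update with damping 0.85. The function name `textrank_keywords` and its defaults are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(words, window_size=4, damping=0.85, iterations=50, top_n=5):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    # Edge weight = number of times two different words share a window.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window_size]:
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    # Total edge weight leaving each word, used to normalise its votes.
    out_weight = {w: sum(nbrs.values()) for w, nbrs in weights.items()}

    scores = {w: 1.0 for w in weights}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(wt / out_weight[nb] * scores[nb] for nb, wt in nbrs.items())
            for w, nbrs in weights.items()
        }

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy input: candidate words in reading order (already POS-filtered).
candidates = ["season", "Netflix", "creator", "ending", "Netflix", "season",
              "death", "series", "Netflix", "season", "hope"]
for word, score in textrank_keywords(candidates, window_size=3):
    print(f"{word} - {score:.3f}")
```

With this update rule the scores sum to the number of distinct words rather than to 1, so individual scores above 1 are expected.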


### LexRank Extractive Summary

In [6]:
# Produce a 5-sentence extractive summary of the article
textsummary = nlp_toolkit.TextSummary(text, num_sents=5)
textsummary.output()

'The controversial 13  Reasons Why is returning for its fourth and final season on Netflix from Friday and creator Brian Yorkey has indicated there will be a hopeful ending.NETFLIX Season four of 13 Reasons Why is out on Netflix on Friday.READ MORE: * 13 Reasons Why is coming to an end, and the cast can’t hold back tears * After Life: Ricky Gervais’ Netflix show speaks to Covid-19 life like nothing else * Mental Health Foundation urges parents to school up on suicide, as Netflix debuts a new season of 13 Reasons Why  "Our North Star has always been that the inciting incident of the whole series is Hannah’s death and the tapes she leaves behind, and so we want to follow that to its logical conclusion and I think and hope that’s what we do in season 4," Yorkey told Entertainment Weekly ."The series was born in darkness and as is often pointed out, it is a dark series, but we have always tried to infuse it with hope and with humor where we can, and we wanted to end on a note of hope that 
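
`TextSummary` belongs to the class toolkit, and the imports at the top suggest it relies on sumy's `LexRankSummarizer`. To show the idea without either dependency, here is a simplified LexRank-style sketch in plain Python. It assumes sentences are already split, uses TF-IDF cosine similarity with a fixed threshold to build the sentence graph, and scores sentences by power iteration. The name `lexrank_summary`, the regex tokenizer and the default threshold are illustrative choices, not taken from the toolkit or from sumy.

```python
import math
import re
from collections import Counter


def lexrank_summary(sentences, num_sents=2, threshold=0.1, damping=0.85, iterations=50):
    """Pick the most central sentences and return them in document order."""
    n = len(sentences)
    if n == 0:
        return []
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]

    # Smoothed inverse document frequency, treating each sentence as a document.
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / count) + 1.0 for w, count in doc_freq.items()}
    vectors = [{w: tf * idf[w] for w, tf in Counter(toks).items()} for toks in tokenized]

    def cosine(a, b):
        dot = sum(weight * b[w] for w, weight in a.items() if w in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    # Link two different sentences when their similarity exceeds the threshold.
    adj = [[1.0 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]

    # PageRank-style power iteration over the sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(adj[j][i] / degree[j] * scores[j]
                                  for j in range(n) if degree[j])
                  for i in range(n)]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sents]
    return [sentences[i] for i in sorted(top)]


demo = [
    "The show ends with hope.",
    "The show ends on Netflix.",
    "Netflix releases the show on Friday.",
    "Bananas are yellow.",
]
print(lexrank_summary(demo, num_sents=2))
```

Sentences that share vocabulary with many others collect the highest scores, so an unrelated sentence such as the last one in `demo` is left out of the summary.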