<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#The-dataset..." data-toc-modified-id="The-dataset...-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>The dataset...</a></span></li><li><span><a href="#Import-required-modules" data-toc-modified-id="Import-required-modules-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Import required modules</a></span></li></ul></div>

In [1]:
# API key: keep this private (for example in an environment variable) rather than pasting it into the notebook

## The dataset...

In this notebook, we won't be working with a pre-existing dataset; we'll be building one ourselves! We will use an API and some webscraping to create a pandas DataFrame containing online articles and the text content that we want to mine. If that sounds unfamiliar, fear not: I'll explain as we go along.

In [38]:
# Below I've commented out the install commands, as I have already installed these packages
# Uncomment them and run this cell once before you begin

# !pip install pandas
# !pip install newsapi-python
# !pip install nltk
# !pip install requests
# !pip install beautifulsoup4
# !pip install lxml
# datetime and re are part of the Python standard library, so they need no installation

## Import required modules

The `newsapi` package requires a free account at https://newsapi.org/. Once you have signed up, you can generate your API key, which you will need to search for news articles.
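Since the API key is a secret, one common option is to keep it out of the notebook and read it from an environment variable. The sketch below is only an illustration: `load_api_key` is a hypothetical helper and `NEWSAPI_KEY` is an assumed variable name, not something NewsAPI itself prescribes.

```python
import os


def load_api_key(env=None, var_name="NEWSAPI_KEY"):
    """Return the API key stored in an environment variable.

    NEWSAPI_KEY is an assumed name chosen for this sketch.
    """
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your NewsAPI key")
    return key


# Demonstration with a stand-in mapping instead of the real environment
print(load_api_key({"NEWSAPI_KEY": "your-key-here"}))
```

The returned string could then be passed as the `api_key` argument when creating the client.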

In [39]:
import nltk
from nltk import word_tokenize
# nltk stands for Natural Language Toolkit and is useful for text-mining

from newsapi import NewsApiClient
# NewsAPI package allows us to use their API which returns search results for news articles

import pandas as pd
# includes useful functions for manipulating data 

import re
# re is for regular expressions, which we use later 

import datetime as dt
# supplies classes for manipulating dates and times

import requests
# allows us to send HTTP requests using Python - we will need this for webscraping!

from bs4 import BeautifulSoup
# contains functions that help us pull content from webpages and save that info into something more readable
# ...again, important for webscraping!

## Webscraping and APIs...

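Webscraping refers to the extraction of data from a website, usually by downloading a page's HTML and pulling out the parts we want. An API (application programming interface), by contrast, lets us request data from a service in a structured format. We'll use NewsAPI to find articles and webscraping to collect their text.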
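To make the idea of scraping concrete, here is a minimal sketch that pulls paragraph text out of a small HTML string. It uses Python's built-in `html.parser` so it runs without any installs; that is a stand-in for BeautifulSoup, which this notebook uses on real pages. `ParagraphCollector`, `extract_paragraphs` and `SAMPLE_HTML` are made-up names for illustration.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text found inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start a fresh buffer whenever a paragraph opens
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        # Store the collected text when the paragraph closes
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text nodes only count while we are inside a paragraph
        if self._in_p:
            self._buffer.append(data)


# A tiny made-up page standing in for a downloaded article
SAMPLE_HTML = """
<html><body>
<h1>Example headline</h1>
<p>First paragraph of the <b>article</b>.</p>
<p>Second paragraph.</p>
</body></html>
"""


def extract_paragraphs(html):
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(extract_paragraphs(SAMPLE_HTML))
```

The headline is skipped because only `<p>` elements are collected, which is the same choice the scraping function later in this notebook makes.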

In [5]:
# Initialise the client; NEWSAPI_KEY is an assumed environment variable name holding your own key
import os
newsapi = NewsApiClient(api_key=os.environ['NEWSAPI_KEY'])

In [6]:
# Instance of the NewsApiClient class; its three methods are get_top_headlines, get_everything and get_sources
newsapi

<newsapi.newsapi_client.NewsApiClient at 0x7fbcda4d95b0>

In [7]:
top = newsapi.get_top_headlines(sources='bbc-news')

In [8]:
top.keys()

dict_keys(['status', 'totalResults', 'articles'])

In [9]:
headlines = []
source = []
date = []
url = []

for i in top.get("articles"):
    # Want: title, source name and date (publishedAt)
    headlines.append(i.get("title"))
    source_dict = i.get("source")
    source.append(source_dict.get("Name"))
    date.append(i.get("publishedAt"))

# Only the last expression in a cell is displayed, so this shows the dates
date

['2022-12-05T13:52:23.5536744Z',
 '2022-12-05T12:52:20.8966665Z',
 '2022-12-05T12:37:22.3970926Z',
 '2022-12-05T12:22:34.3492563Z',
 '2022-12-05T12:22:22.615995Z',
 '2022-12-05T10:07:22.3964855Z',
 '2022-12-05T09:52:22.4439306Z',
 '2022-12-05T08:52:20.8082457Z',
 '2022-12-05T06:07:23.2919233Z',
 '2022-12-05T04:22:21.5466845Z']

In [10]:
d = {'Headlines': headlines, 'Source': source, 'Date': date}
df = pd.DataFrame(data=d)
df

Unnamed: 0,Headlines,Source,Date
0,"The US' 2,000-year-old mystery mounds",BBC News,2022-12-05T13:52:23.5536744Z
1,Prince Harry says 'it's a dirty game' in new N...,BBC News,2022-12-05T12:52:20.8966665Z
2,China Covid: Xi's face-saving exit from his si...,BBC News,2022-12-05T12:37:22.3970926Z
3,US midterms: Georgia Senate run-off looms afte...,BBC News,2022-12-05T12:22:34.3492563Z
4,Kennedy Center Honours: Julia Roberts turns Ge...,BBC News,2022-12-05T12:22:22.615995Z
5,Explosions hit two military airfields in Russi...,BBC News,2022-12-05T10:07:22.3964855Z
6,Oxford word of the year 2022 revealed as 'gobl...,BBC News,2022-12-05T09:52:22.4439306Z
7,Tasmanian tiger: Remains of last thylacine fou...,BBC News,2022-12-05T08:52:20.8082457Z
8,Haiti: Inside the capital city taken hostage b...,BBC News,2022-12-05T06:07:23.2919233Z
9,Ukraine war: Oil prices rise as cap on Russian...,BBC News,2022-12-05T04:22:21.5466845Z


In [11]:
df['Date'] = df['Date'].str.extract(r'(\d{4}-\d{2}-\d{2})')
df

Unnamed: 0,Headlines,Source,Date
0,"The US' 2,000-year-old mystery mounds",BBC News,2022-12-05
1,Prince Harry says 'it's a dirty game' in new N...,BBC News,2022-12-05
2,China Covid: Xi's face-saving exit from his si...,BBC News,2022-12-05
3,US midterms: Georgia Senate run-off looms afte...,BBC News,2022-12-05
4,Kennedy Center Honours: Julia Roberts turns Ge...,BBC News,2022-12-05
5,Explosions hit two military airfields in Russi...,BBC News,2022-12-05
6,Oxford word of the year 2022 revealed as 'gobl...,BBC News,2022-12-05
7,Tasmanian tiger: Remains of last thylacine fou...,BBC News,2022-12-05
8,Haiti: Inside the capital city taken hostage b...,BBC News,2022-12-05
9,Ukraine war: Oil prices rise as cap on Russian...,BBC News,2022-12-05
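The regular expression above leaves the dates as plain strings. If real date objects are wanted (for sorting or finding the weekday, say), one possible approach is to combine the same pattern with the `datetime` module imported earlier. This is a sketch under that assumption; `published_date` is a hypothetical helper name.

```python
import datetime as dt
import re

# Same pattern as in the notebook: four digits, two digits, two digits
DATE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})")


def published_date(timestamp):
    """Return the calendar date from an ISO-style timestamp, or None if absent."""
    match = DATE_PATTERN.search(timestamp or "")
    if match is None:
        return None
    return dt.date.fromisoformat(match.group(1))


# Timestamp copied from the output above
sample = "2022-12-05T13:52:23.5536744Z"
day = published_date(sample)
print(day, day.strftime("%A"))
```

Extracting the date part first sidesteps the seven-digit fractional seconds in these timestamps, which not every Python version parses directly.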


In [12]:
# Not restricting by date - I can't do this without a paid plan
# On the free plan I can only get articles from 31st October onwards

womens = newsapi.get_everything(q="women super league AND WSL", language='en')

In [13]:
womens

{'status': 'ok',
 'totalResults': 0,
 'articles': [{'title': 'Women’s Super League: talking points from the weekend’s action',
   'author': 'Suzanne Wrack, Sophie Downey and Sarah Rendell',
   'source': {'Id': None, 'Name': 'The Guardian'},
   'publishedAt': '2022-11-21T12:01:31Z',
   'url': 'https://www.theguardian.com/football/2022/nov/21/womens-super-league-talking-points-from-the-weekends-action'},
  {'title': 'Leicester’s Ashleigh Plumptre: ‘I love everything about being in Nigeria’',
   'author': 'Ella Braidwood',
   'source': {'Id': None, 'Name': 'The Guardian'},
   'publishedAt': '2022-11-17T11:33:58Z',
   'url': 'https://www.theguardian.com/football/2022/nov/17/leicester-ashleigh-plumptre-nigeria'},
  {'title': 'Shaw stars for Man City in WSL win at Everton',
   'author': None,
   'source': {'Id': 'bbc-news', 'Name': 'BBC News'},
   'publishedAt': '2022-11-19T17:01:41Z',
   'url': 'https://www.bbc.co.uk/sport/football/63606404'},
  {'title': 'Man City win keeps pressure on WSL

In [14]:
headlines = []
author = []
source = []
date = []
url = []

for i in womens.get("articles"):
    headlines.append(i.get("title"))
    author.append(i.get("author"))
    source_dict = i.get("source")
    source.append(source_dict.get("Name"))
    date.append(i.get("publishedAt"))
    url.append(i.get("url"))

url

['https://www.theguardian.com/football/2022/nov/21/womens-super-league-talking-points-from-the-weekends-action',
 'https://www.theguardian.com/football/2022/nov/17/leicester-ashleigh-plumptre-nigeria',
 'https://www.bbc.co.uk/sport/football/63606404',
 'https://www.bbc.co.uk/sport/football/63771047',
 'https://www.bbc.co.uk/sport/football/63612799',
 'https://www.bbc.co.uk/sport/football/63765147',
 'https://www.bbc.co.uk/sport/football/63697262',
 'https://www.bbc.co.uk/sport/football/63606405',
 'https://www.bbc.co.uk/sport/football/63765142',
 'https://www.bbc.co.uk/sport/football/63443688',
 'https://www.skysports.com/football/news/35730/12746755/polly-bancroft-exclusive-manchester-united-women-ahead-of-curve-with-new-head-of-womens-football-role',
 'https://www.bbc.co.uk/sport/football/63691567',
 'https://www.bbc.co.uk/sport/football/62784258',
 'https://www.skysports.com/football/news/11095/12748880/khadija-bunny-shaw-how-the-strikers-ruthless-scoring-streak-is-helping-mancheste

In [15]:
d = {'Headlines': headlines, 'Source': source, 'Author': author, 'Date': date, 'Link': url}
df = pd.DataFrame(data=d)
df['Date'] = df['Date'].str.extract(r'(\d{4}-\d{2}-\d{2})')
df

Unnamed: 0,Headlines,Source,Author,Date,Link
0,Women’s Super League: talking points from the ...,The Guardian,"Suzanne Wrack, Sophie Downey and Sarah Rendell",2022-11-21,https://www.theguardian.com/football/2022/nov/...
1,Leicester’s Ashleigh Plumptre: ‘I love everyth...,The Guardian,Ella Braidwood,2022-11-17,https://www.theguardian.com/football/2022/nov/...
2,Shaw stars for Man City in WSL win at Everton,BBC News,,2022-11-19,https://www.bbc.co.uk/sport/football/63606404
3,Man City win keeps pressure on WSL top three,BBC News,,2022-12-04,https://www.bbc.co.uk/sport/football/63771047
4,Liverpool claim last-gasp WSL draw at Brighton,BBC News,,2022-11-20,https://www.bbc.co.uk/sport/football/63612799
...,...,...,...,...,...
69,"Sunderland Women fall to 3-0 defeat, but leave...",SB Nation,Rich Speight,2022-11-28,https://rokerreport.sbnation.com/2022/11/28/23...
70,Lasses Fan Focus: We chat to the MCWFC OSC to ...,SB Nation,CharlottePatterson,2022-11-27,https://rokerreport.sbnation.com/2022/11/27/23...
71,Wubben-Moy and FA Womens Boss pressing Govt fo...,Just Arsenal News,Admin Pat,2022-11-10,https://www.justarsenal.com/wubben-moy-and-the...
72,On This Day (22 November 2009): Nobbs complete...,SB Nation,Rich Speight,2022-11-22,https://rokerreport.sbnation.com/2022/11/22/23...
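As an alternative to keeping five parallel lists in step, the same table can be built from one dictionary per article. The sketch below is only one way to do it: `flatten_articles` is a hypothetical helper, and the key casing (`'Name'`) follows the output shown in this notebook, so check it against your own response.

```python
def flatten_articles(response):
    """Turn an API response dict into a list of flat row dictionaries."""
    rows = []
    for article in response.get("articles") or []:
        # Guard against a missing or empty source entry
        source = article.get("source") or {}
        rows.append({
            "Headlines": article.get("title"),
            "Source": source.get("Name"),
            "Author": article.get("author"),
            "Date": article.get("publishedAt"),
            "Link": article.get("url"),
        })
    return rows


# One article copied from the response printed earlier in this notebook
sample_response = {
    "status": "ok",
    "articles": [
        {
            "title": "Shaw stars for Man City in WSL win at Everton",
            "author": None,
            "source": {"Id": "bbc-news", "Name": "BBC News"},
            "publishedAt": "2022-11-19T17:01:41Z",
            "url": "https://www.bbc.co.uk/sport/football/63606404",
        }
    ],
}

rows = flatten_articles(sample_response)
print(rows[0]["Headlines"], "|", rows[0]["Source"])
```

Because pandas accepts a list of dictionaries, `pd.DataFrame(rows)` would give the same columns as the cell above.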


In [16]:
headings = []
text = []

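def get_page(url):
    """Download one article, collect the text of its <p> tags and append it to the global `text` list."""

    print("URL:", url)

    # A timeout stops one slow site from hanging the whole loop
    response = requests.get(url, timeout=10)
    # The 'lxml' parser needs the lxml package to be installed
    soup = BeautifulSoup(response.text, features='lxml')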
#     head = soup.find('h1', {"id": "main-heading"}) or soup.find('div', {'class': 'article-headline__text b-reith-sans-font b-font-weight-300'}) or soup.find('title')
#     heading = head.string
#     headings.append(heading)
#     print("HEADING", heading)
    
    blocks = []
#     for block in soup.find_all('div', {"data-component": "text-block"}) or soup.find_all('div', {"class": "body-text-card b-reith-sans-font"}) or soup.find_all('div', {"class": "ssrcss-1n5sg88-StyledSummary elwf6ac3"}) or soup.find_all('h3', {"class": "lx-stream-post__header-title gel-great-primer-bold qa-post-title gs-u-mt0 gs-u-mb-"}):
    for block in soup.find_all('p'):
        blocks.append(block.getText())
    print("BLOCKS", blocks)
    
    text.append(blocks)
   
    return text
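Collecting every `<p>` tag tends to pick up more than the article itself: empty strings, stray whitespace, non-breaking spaces and cookie banners. A small clean-up step before text-mining might look like the sketch below. `clean_blocks` is a hypothetical helper, and the banner prefixes are assumptions based on the kind of text these sites return, not a complete list.

```python
def clean_blocks(blocks, boilerplate_prefixes=("We use cookies", "By choosing I Accept")):
    """Tidy a list of scraped paragraph strings."""
    cleaned = []
    for block in blocks:
        # Turn non-breaking spaces into ordinary ones, then collapse whitespace runs
        text = " ".join(block.replace("\xa0", " ").split())
        if not text:
            continue  # skip paragraphs that were empty or only whitespace
        if text.startswith(boilerplate_prefixes):
            continue  # skip cookie-banner style boilerplate
        cleaned.append(text)
    return cleaned


# Short strings in the style of the scraped output
sample_blocks = [
    "\n\nLFC\n\n",
    "",
    "Yeah, it’s been good. \xa0We have Leah and Rafa",
    "We use cookies and other tracking technologies",
]
print(clean_blocks(sample_blocks))
```

Applying this to each inner list of `text` would leave the original scraped data untouched while giving a tidier version to analyse.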

In [17]:
for i in url:
    text = get_page(i)
    print("TEXT LENGTH:", len(text))
    print("")

URL: https://www.theguardian.com/football/2022/nov/21/womens-super-league-talking-points-from-the-weekends-action
BLOCKS ['Arsenal’s winning run was ended in dramatic style at the Emirates while Brighton and Liverpool also produced a thriller', 'Manchester United’s second win against one of Arsenal, Chelsea and Manchester City, and first away win against any of the traditional top three, was significant. The 3-2 defeat of Arsenal at the Emirates spoke to a real shift in the resilience of Marc Skinner’s side. Having taken the lead, United’s collapse early in the second half, conceding twice, looked to be following a familiar pattern. Instead, in front of an impressive away end, United roared back to equalise and then win it in injury time. Skinner called it a full Manchester United performance. “Our job is to bring the women under that same banner of: we might go down but we’re never beaten,” he said. They were helped by some poor defending for Katie Zelem’s lethal set-pieces, which pro

BLOCKS ['', "Last updated on 19 November 202219 November 2022.From the section Women's Football", 'Khadija Shaw is the "focal point" of the Manchester City team, said manager Gareth Taylor after the forward bagged her eighth goal of the Women\'s Super League campaign City\'s victory over Everton.', 'Julie Blakstad opened the scoring but it was a good piece of work from Shaw, who picked up a looped ball from Leila Ouahabi as Everton goalkeeper Emily Ramsey came out of her goal to meet Shaw, leaving an empty net for the Jamaican to square the ball to Blakstad who slotted home.', '"She does so much more than score goals, the way she takes the hits as well as giving the hits out," said Taylor.', '"She\'s also very strong and she\'s looking a lot fitter. She\'s found a rhythm now in knowing she\'s the focal point of the team."', 'Everton equalised from a corner when a scramble in the box saw former Toffees keeper Sandy MacIver try to punch the ball away, but it fell straight to Rikke Seveck

BLOCKS ['', "Last updated on 20 November 202220 November 2022.From the section Women's Footballcomments32", 'Rachel Furness headed a 92nd-minute equaliser as Liverpool ended their six-match losing run in a thriller at Brighton.', "England newcomer Katie Robinson's stunning long-range strike for the hosts was the pick of the goals as Elisabeth Terland and Danielle Carter also netted after Missy Bo Kearns' opener for Liverpool during the first half.", "Shanice van de Sanden's half-time introduction proved a turning point for Liverpool, nodding in with 14 minutes remaining and setting up Furness' last-gasp leveller.", 'In the 42nd match of the campaign, the first top-flight game of 2022-23 to end without a victory leaves Liverpool in 10th place, three points above bottom side Leicester.', 'Brighton are three points further clear in ninth and remain unbeaten in two matches under interim head coach Amy Merricks.', "With only one more goal than lowest WSL scorers Leicester at the start of th

BLOCKS ['', "Last updated on 20 November 202220 November 2022.From the section Women's Football", "Elsie, nine, had travelled up from near Brighton with her parents Rob and Vicky and big brother Olly. She clutched a sign asking her favourite player Lauren James for her shirt, and couldn't wait to watch her first football match.", 'Speaking to BBC Sport outside Stamford Bridge before Chelsea and Tottenham\'s Women\'s Super League match on Sunday, Vicky said: "We watched all the women\'s matches through the Euros, they\'re both into football a lot.', '"I just think it\'s a great day out, the women\'s game is really escalating and we want to be a part of that. We thought it would be a great occasion.', '"We just went to the [club] museum, saw the pictures of all the Champions League matches - we\'ll definitely be coming for more whether here or at their usual playing ground."', "What Chelsea - and English women's football at large - will hope for is that fans like Vicky and Elsie will com

BLOCKS ['', "Last updated on 3 December 20223 December 2022.From the section Women's Football", "Chelsea made light work of bottom side Leicester to maintain their impressive Women's Super League title defence at the King Power Stadium.", 'The Blues were five goals up at the break thanks to Guro Reiten, Fran Kirby, Jessie Fleming, Niamh Charles and Sam Kerr.', 'Fleming then grabbed her second and substitute Beth England headed home number seven.', 'Kirby added an eighth on a ruthless afternoon for the leaders.', 'It took just four minutes for Chelsea to take the lead through Reiten, who slid the ball into the bottom corner after clever link-up play with Kirby.', 'Reiten then got the first of four assists eight minutes later, slotting the ball into Fleming and leaving her with a simple finish from close range.', 'Fully in control, Chelsea began to exert themselves further, scoring three times in the final six minutes of the opening half.', 'Kirby got in on the act with a classy outside-

BLOCKS ['\n\n\n\nFootball\n                            \n', '"This club know how to do football," says Manchester United\'s new Head of Women’s Football, Polly Bancroft, in an exclusive interview with Sky Sports; watch Arsenal vs Man Utd live on Sky Sports Main Event on Saturday; kick-off 5.30pm ', "By Lynsey Hooper, Sky Sports' lead WSL reporter ", 'Saturday 19 November 2022 07:16, UK', "The appointment of Polly Bancroft as Head of Women’s Football at Manchester United has positioned her as one of the most powerful executives within the English women's game. ", 'In this unique, brand-new role, Bancroft will be responsible for helping take United to the very top of European football.', "Bancroft started the process to become Manchester United's first Head of Women's Football much like any other job application.", '"I saw the advert!" she told Sky Sports in an exclusive interview. "I was immediately interested in the scale of the role and the opportunity that this brand has, both in ter

BLOCKS ['', "Last updated on 24 November 202224 November 2022.From the section Women's Football", "Sanne Troelsgaard headed home an 89th-minute equaliser as Reading held Liverpool to continue the Reds' winless run in the Women's Super League.", "In a thrilling encounter Katie Stengel gave visitors Liverpool the lead before Tia Primmer's leveller on half-time.", 'Natasha Dowie put Reading in front, but Stengal and Rhiannon Roberts struck to put the Reds 3-2 up.', 'But Troelsgaard denied Liverpool a first WSL win since the opening day as her towering header was deflected in.', 'A draw saw both struggling sides stretch away from bottom club Leicester. Liverpool, in 10th place, are now five points clear of the Foxes, while second-bottom Reading have a four-point advantage.', 'But they will feel like they could have won this match, having each gone in front in the second half.', "The Reds dominated the opening exchanges and deservedly went ahead when Stengel headed in Melissa Lawley's cross

BLOCKS ['\nScheuer is understood to have visited the club this week\n', "Former Bayern Munich manager Jens Scheuer has emerged as a leading candidate for Women's Super League club Brighton & Hove Albion's vacant managerial post, Telegraph Sport can reveal.", 'Multiple sources have confirmed that Brighton have spoken to Scheuer about the possibility of the 44-year-old replacing former England head coach Hope Powell, who left the club just over a month ago after five years at the helm.', 'After multiple rounds of interviews, although a final decision has not yet been made, talks are said to be fairly\xa0advanced, and Scheuer is understood to have visited the club this week.', "The WSL side's search for a new manager has been extensive, with several candidates from across Europe also having been interviewed, including the current Linkopings head coach and former Finland manager Andree Jeglertz, who previously won the Uefa Women's Cup with Umea, while Brighton's interim head coach Amy Merr

BLOCKS ['\n\n\n\nFootball\n                            \n', 'Plus: Beating Lyon on her 27th birthday; the test her Dad made her take to become a goalkeeper; how she has improved since joining Arsenal; watch Arsenal vs Man Utd live on Sky Sports Premier League, Football and Main Event from 5.15pm; kick-off 5.30pm', '\n          Senior football journalist\n      ', 'Saturday 19 November 2022 18:59, UK', 'Please use Chrome browser for a more accessible video player', '', "It's hard not to enjoy any time spent with Manuela Zinsberger. One of Arsenal's bubbliest characters - across both men and women's teams - she speaks with passion, personality and expression on any question put to her.", "To top things off, she is having one of the best spells of her career, stretching back to the start of last season. Thirteen clean sheets in the Women's Super League saw her clinch the Golden Glove for the 2021/22 campaign, before setting a new league record when she kept 10 successive clean sheets this

BLOCKS ['\n\nLFC\n\n', "\n                                    Visit Match Centre to follow live coverage of Liverpool FC Women's Barclays Women's Super League meeting with Brighton & Hove Albion Women.\n                                ", 'Open the menu to access Match Centre now.']
TEXT LENGTH: 18

URL: https://www.telegraph.co.uk/football/2022/11/28/grass-roots-football-craves-euro-boost/
BLOCKS ['\nEngland’s win led to a flood of interest, but below the elite level, the game is battling\n', 'England’s Euros triumph at Wembley this summer provided a once-in-a generation chance to grow the women’s game. Four months on, the signs are that interest in women’s and girls’ football is continuing to soar, amid a flurry of record attendances across the pyramid. However, scratch beneath the surface and it is clear more needs to be done, with many issues still to be resolved, not least around facilities and accessibility.', 'As the Government embarks on a review of the entire women’s game, chai

BLOCKS ["This menu is keyboard accessible. To open a menu item's submenu, press the space bar. To close a submenu press the escape key.", 'Plenty is at stake as we look ahead to this weekend’s Women’s Super League fixture against bottom-of-the-table Leicester City.', 'Our Gunners head to the King Power Stadium in second place in the league, a game in hand over Chelsea but narrowly trailing Manchester United on goal difference. Victory on Sunday would be our fourteenth straight win in the league, breaking our own record that was set last weekend.\xa0', '\n\n\n\n\n\n\n', "It's been a disappointing start to the season for this weekend's opponents, who currently\xa0sit in last place and have yet to record a win or draw this season. However, with the exception of a 4-0 loss away to Man City, the Foxes have only lost narrowly to their league opponents on each occasion.\xa0", 'Their first victory of the campaign looked certain against Reading last Sunday\xa0and Natasha Flint’s goal would have

BLOCKS []
TEXT LENGTH: 22

URL: https://legacy.liverpoolfc.com/news/women/457189-brighton-3-3-lfc-women-furness-stoppage-time-goal-earns-dramatic-draw
BLOCKS ['\nSam Williams\n\n@SamWilIiams\n\n', '\n                                    Rachel Furness’ stoppage-time header salvaged a 3-3 draw for Liverpool FC Women at Brighton & Hove Albion Women.\n                                ', 'Missy Bo Kearns put the Reds ahead in Sunday’s Barclays Women’s Super League contest at Broadfield Stadium, but Brighton responded strongly and led 3-1 at the break.', 'Half-time substitute Shanice van de Sanden pulled one back for Matt Beard’s side and, as the clock ticked past 90 minutes, Taylor Hinds hit the crossbar from distance before Van de Sanden turned provider for Furness.', 'The Dutch forward surged down the right and sent in an excellent cross, which a stooping Furness planted home firmly to snatch an important point for the visitors.', 'Having absorbed an early spell of Brighton pressure, it wa

BLOCKS ['\nChris Shaw\n\n@__ChrisShaw\n\n', '\n                                    An impressive Liverpool FC Women performance delivered a 2-0 victory over West Ham United on Sunday afternoon.\n                                ', 'Goals from Ceri Holland and Katie Stengel highlighted a dominant Reds display during a first half at Prenton Park in which they created a host of chances.', 'A more even contest ensued past the break but Matt Beard’s charges saw it out to bag three points in the Barclays Women’s Super League.', 'It was an especially notable day for Reds captain Niamh Fahey too, the centre-back reaching the milestone of 100 appearances for the club.', 'Unbeaten in their last three matches coming into the game, Liverpool built on that recent momentum by taking a third-minute lead.', 'Holland pounced on hesitancy inside the West Ham penalty area from Gilly Flaherty’s forward pass and beat the outrushing goalkeeper to the ball, which she prodded high inside the left post for 1-0.

BLOCKS ['We use cookies and other tracking technologies to improve your browsing experience on our site, show personalized content and targeted ads, analyze site traffic, and understand where our audiences come from. To learn more or opt-out, read our Cookie Policy. Please also read our Privacy Notice and Terms of Use, which became effective December 20, 2019.', 'By choosing I Accept, you consent to our use of cookies and other tracking technologies.', 'Filed under:', 'Can Arsenal take advantage of Manchester United and Chelsea playing each other? ', 'Arsenal Women go to the King Power Stadium on Sunday to face Leicester City in the WSL. Arsenal know that their closest rivals, Manchester United and Chelsea, play each other on Sunday evening, and victory could give Arsenal a points gap with at least one time, with all 3 on 15 points. ', 'Leicester sacked first team manager Lydia Bedford on Friday afternoon, with Willie Kirk, the director of football, taking charge for the rest of the se

BLOCKS ['By Pa Sport Staff ', ' Published:  08:18, 7 November 2022   |  Updated:  09:52, 7 November 2022   ', '', ' 17', 'View  comments', '', "Chelsea hope manager Emma Hayes could return to the touchline for her side's next fixture against Tottenham on November 20.", 'Hayes, 46, is recovering from a hysterectomy and announced on October 13 that she would be taking temporary leave from her post.', "But following Chelsea's emphatic 3-1 Women's Super League win at Manchester United on Sunday evening, general manager Paul Green believes she is closing in on a comeback.", "Chelsea hope manager Emma Hayes could return to the touchline for her side's next fixture", 'The Chelsea women - who have been without Hayes - will on Tottenham on November 20', "Speaking to Sky Sports, Green, who has overseen first-team affairs alongside Hayes' assistant Denise Reddy, said: 'We go into the international break hopefully ready to welcome Emma Hayes back to the touchline for our next game, if everything g

BLOCKS ['30,000 ticket sold for Arsenal’s WSL clash with Manchester United at Emirates By Michelle', 'Arsenal Women will face Manchester United at Emirates Stadium on 19th November, in the next game of their winning WSL campaign.\xa0 This is their first game back after the international break.', 'Our Gunners are currently top of the Women’s Super League and top of their UEFA Women’s Champions League group.\xa0 Oh and they just happened to break another WSL record with 14 consecutive wins across 2021/22 season and the current 2022/23 season.', 'Arsenal are level on points with 2nd place Chelsea, who are of course the current WSL champions having pipped Arsenal at the post by one point to win the trophy last season.\xa0 BUT our Gunners have a better goal difference AND still have a game in hand over Chelsea.', 'Arsenal had an easy 4-0 win away at Leicester on Sunday afternoon and Chelsea defeated the League leaders Manchester United away 3-1 on Sunday evening, moving them up to 2nd spot 

BLOCKS ['We use cookies and other tracking technologies to improve your browsing experience on our site, show personalized content and targeted ads, analyze site traffic, and understand where our audiences come from. To learn more or opt-out, read our Cookie Policy. Please also read our Privacy Notice and Terms of Use, which became effective December 20, 2019.', 'By choosing I Accept, you consent to our use of cookies and other tracking technologies.', 'Filed under:', 'Arsenal host Manchester United at the Emirates—the biggest test so far this season', 'Arsenal Women host Manchester United on Saturday at the Emirates. Manchester United were top of the WSL until losing to Chelsea 3-1 two weeks ago, but this still represents Arsenal’s toughest test so far in the domestic campaign. ', 'Arsenal went top of the league with a 4-0 win against Leicester before the international break, and are ahead of Chelsea on goal difference, with Chelsea having played a game more. But matches after interna

BLOCKS ['Match Review: Comfortable 4-0 win to Arsenal Women away against Leicester By Michelle', 'Arsenal continued their perfect start to their Women’s Super League campaign by comfortably beating bottom of the league Leicester at the King Power Stadium and move back to the top of the table.', 'Our Gunners dominated from the start, with near misses in the first two minutes before Frida Maanum opened the scoring 13 minutes into the game with her fourth goal in five games.', 'In the driving rain, Caitlin Foord tapped home Beth Mead’s magnificently placed cross taking the Gunners to 2-0 at 22 minutes before Steph Catley scored directly from a corner to add a third for Arsenal before half-time.', 'A super strike from Stina Blackstenius in the second half sealed a sixth win in six WSL games for Arsenal Women – extending their consecutive wins to 14, in a remarkable winning streak dating streak crossing last season and this.', 'A rare Mead miss late on didn’t trouble Arsenal as they had eas

BLOCKS ['Confirmed Arsenal Team to face Manchester United Women at Emirates By Michelle', 'So today is the day!\xa0 Our Gunners are back together and back for their first WSL match after the international break!\xa0 And what a match this is going to be.\xa0 A real ‘clash of the titans’ in the Women’s Super League 2022-23 calendar.', 'We`ve looked at Players, Stats & Facts across the teams and we’ve listened to what boss Jonas Eidevall has had to say in his pre Manchester United Presser.', 'My starting eleven prediction is: Zinsberger, Weinrother, Catley, Beattie, Walti, Nobbs (C), McCabe, Maanum, Mead, Blackstenius, Foord.', 'Yes, that’s right.\xa0 Nobbs to skipper Arsenal once again as Kim Little recovers.\xa0 I’m not sure about Beattie starting, that may be Wubben-Moy.\xa0 It depends if she’s really fully on form after injury.\xa0 Take a look at Jonas’ confirmed team below.', 'Anyway, enough of my rambling.\xa0 Now, with only an hour to go until kick-off it’s time to get comfy and se

BLOCKS ['Will Leah Williamson & Lina Hurtig join Arsenal Women to take on Everton? by Michelle', 'After losing to Manchester United in their last Women’s Super League match, Arsenal are back in WSL action this Saturday as they take on Everton.\xa0 The big question is: who will Jonas Eidevall have available to draw on for his starting X1.', 'Arsenal have been suffering deeply with a raft of injured players in recent weeks.\xa0 We are currently still counting out Captain Kim Little with her knee injury, as she is not expected to return until into the New Year.\xa0 Our beloved Beth Mead is obviously out much longer term, with her significant ACL injury.\xa0 We are also counting out Rafaelle Souza (metatarsal bone injury) and Teyah Goldie (another victim of an ACL injury) with no update issued by Arsenal.', 'So with 6 players currently injured and the four above not expected to return in the short term, could Leah Williamson and Lina Hurtig we match-fit and ready to get back in the fray?\x

BLOCKS ['Frida Maanum voted Arsenal Women’s October Player of the Month By Michelle', 'Frida Maanum finished Arsenal’s poll with 45 per cent of the votes cast, followed by Jordan Nobbs in 2nd place and Lotte Wubben-Moy in 3rd.', 'Congratulations to Frida, who has had an exceptional start to the 2022/23 season.', 'Arsenal’s young midfielder (she’s only 23 years old) picked up her first start of the season in the Champions League away against Lyon in October, after spending the start of the season on the bench and delivered an outstanding individual performance our Gunners came out as 5-1 winners against the current champions – and, since that game, she’s remained a key part of Arsenal’s starting XI, knocking Miedema onto the bench and she remains in fine form.', 'Maanum opened the scoring against the French giants and also provided an assist for Beth Mead as she picked up the Player of the Match award, before going on to score our second goal in our 2-0 victory over Liverpool.', 'This w

BLOCKS ['Arsenal Women boss Eidevall confirms return of Leah Williamson & Rafaelle Souza, ready for Everton clash By Michelle', 'Jonas Eidevall held his pre-match press conference today ahead of Arsenal’s WSL match against Everton on Saturday at Meadow Park, kick-off 2pm UK.', 'Arsenal have been devastated by injuries over recent weeks, Eidevall being reduced to a choice of 15 first team players after Beth Mead’s ACL injury.\xa0 Eidevall discussed player returns from injury, with some very exciting news!', 'Good news on the injury front?', 'Yeah, it’s been good. \xa0We have Leah and Rafa, who are able to go back into the matchday squad here tomorrow, so that’s obviously pleasing for us. \xa0We’re starting to get players back from injuries and improving numbers again in the squad, so looking forward to that.', 'Will Leah and Rafa will both start tomorrow?', 'They’re in the matchday squad. \xa0So starting XI, everyone will know tomorrow. ', 'How much of a boost does their return have on 

BLOCKS ["Gab Marcotti and Craig Burley praise Olivier Giroud after he became France's all-time top men's scorer. (2:17)", "BIRKENHEAD, England -- Liverpool move up to ninth in the Women's Super League with their first win of the season since opening day, a 2-0 win over West Ham on a wintery Sunday afternoon on The Wirral.", "A miscommunication between West Ham midfielders Kate Longhurst and Honoka Hayashi gifted Liverpool's Ceri Holland an early goal when she picked up the ball after the pair had gotten in each other's way in the box. Under little pressure, Holland let a shot fly into the far side of goalkeeper Mackenzie Arnold's goal in the third minute.", "Struggling without the ball, the Hammers continued to tangle themselves into knots at the back and were soon chasing a two-goal deficit after Liverpool's Katie Stengel claimed her ninth goal of the season with a low strike in the 20 minute.", 'Despite second half adjustments that both strengthened their defence and helped their att

BLOCKS ['We use cookies and other tracking technologies to improve your browsing experience on our site, show personalized content and targeted ads, analyze site traffic, and understand where our audiences come from. To learn more or opt-out, read our Cookie Policy. Please also read our Privacy Notice and Terms of Use, which became effective December 20, 2019.', 'By choosing I Accept, you consent to our use of cookies and other tracking technologies.', 'Filed under:', 'Get your Friday morning headlines here.', 'If you buy something from an SB Nation link, Vox Media may earn a commission. See our ethics statement.', 'The Men’s side won’t play a match until December 22, but the Manchester City Women are back in WSL action tomorrow away to Everton. Sky Blue News is here with all the latest to get you ready.', 'Haaland has had a sensational start to life at City, scoring 23 goals in 18 matches which includes 18 in 13 Premier League outings. The attacker will be resting during the World Cup

BLOCKS ['By Kathryn Batte For Mailonline ', ' Published:  11:39, 26 November 2022   |  Updated:  11:45, 26 November 2022   ', '', ' 17', 'View  comments', '', 'It has been just over a year since Newcastle United became the richest club in the world. On day one, there was talk of the trophies, the Champions League and big-money transfers.\xa0', 'But also on the agenda, was a commitment to supporting the women’s team. Chief executive Amanda Staveley outlined how the new owners were ‘wholeheartedly committed to women’s and girls football.’\xa0', 'Amid the controversy of the Saudi-backed takeover, it was perhaps an easy win. A PR stunt, some may say. But as Sportsmail speaks to women’s head coach Becky Langley at the club’s Academy training centre, it is clear this is no vanity project.', "Newcastle United's Saudi-backed takeover has led to new investment in the side's women's team, with Amanda Staveley particularly interested in their success", '‘When the owners first came in, everyone wa

BLOCKS ['Becky Thompson and Sophie Lawson reveal the forward lines of their WSL all star teams. (1:30)', "LONDON -- WSL league leaders Arsenal saw their 14-match winning streak come to an end at home to Manchester United after a pair of late strikes from the visitors. In flying form coming into the match, Arsenal failed to find their better football and went in at the break a goal down thanks to Ella Toone's late first half effort.", '- Report: Arsenal 2-3 Manchester United | WSL table | Upcoming fixtures', "Not on the back foot for long, the hosts equalised early in the second half thanks to Frida Maanum's deflected drive before Laura Wienroither gave them the lead. Lively throughout the match, Untied refused to be beaten and found their own equaliser through Millie Turner at a late free-kick before Alessia Russo completed the breathless turnaround with her own header at set piece in the dying seconds of regular time.", 'JUMP TO: Player ratings | Best/worst performers | Highlights and

BLOCKS ['We use cookies and other tracking technologies to improve your browsing experience on our site, show personalized content and targeted ads, analyze site traffic, and understand where our audiences come from. To learn more or opt-out, read our Cookie Policy. Please also read our Privacy Notice and Terms of Use, which became effective December 20, 2019.', 'By choosing I Accept, you consent to our use of cookies and other tracking technologies.', 'Filed under:', 'Sunderland’s defeat to Man City on Sunday in the Conti Cup wasn’t a significant result, but it felt symbolic, like the passing of a torch between generations.', 'While our owners schmoosed with Sheikhs and sunned themselves in Dubai, the past and the future of English women’s football in the north east and Manchester clashed.', 'Sunderland Women did our club proud against the big guns of Man City, working tirelessly to ensure that a routine home Continental Cup win for the WSL side, didn’t turn into a drubbing. ', 'The d

BLOCKS ['We use cookies and other tracking technologies to improve your browsing experience on our site, show personalized content and targeted ads, analyze site traffic, and understand where our audiences come from. To learn more or opt-out, read our Cookie Policy. Please also read our Privacy Notice and Terms of Use, which became effective December 20, 2019.', 'By choosing I Accept, you consent to our use of cookies and other tracking technologies.', 'Filed under:', 'Sunderland AFC beat the Gunners twice in two days, with Jordan Nobbs and Darren Bent taking the plaudits. ', 'When Sunderland have both of our men’s and women’s teams in the top tier of English football, you know all is right with the world.', 'In November 2009, the Lads were having a very decent season indeed, and after a 1-0 win over Arsenal at the Stadium of Light, Steve Bruce’s side sat eighth in the Premier League. ', 'Darren Bent had grabbed the goal, supported in a hard-working midfield by the likes of Steed Malbr

In [18]:
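# Note: this loop has no effect. "".join(i) builds a string, but the result is never stored,
# so 'text' is unchanged. The list comprehension in the next cell does the actual joining.
for i in text:
    "".join(i)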

In [19]:
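# Join each article's list of text blocks into a single string
# (the separator is an empty string, so nothing is inserted between blocks)
text = ["".join(i) for i in text]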
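Because the separator in the cell above is an empty string, the last word of one block runs straight into the first word of the next. The sketch below compares that with joining on a single space. The `blocks` list is a made-up sample rather than real scraped output, and the space-separated version is only a suggested alternative, not what this notebook does.

```python
# Hypothetical sample: one article scraped as a list of text blocks
blocks = ["Shaw scored twice for City.", "", "  Everton had few chances.  "]

# Joining on an empty string fuses neighbouring blocks together
fused = "".join(blocks)

# Joining on a space, after trimming each block and dropping empty ones,
# keeps a word boundary between blocks
spaced = " ".join(block.strip() for block in blocks if block.strip())

print(repr(fused))
print(repr(spaced))
```

Keeping a space between blocks matters later on, because a tokeniser treats two fused words as a single token.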

In [20]:
df['Content'] = text

In [21]:
df

Unnamed: 0,Headlines,Source,Author,Date,Link,Content
0,Women’s Super League: talking points from the ...,The Guardian,"Suzanne Wrack, Sophie Downey and Sarah Rendell",2022-11-21,https://www.theguardian.com/football/2022/nov/...,Arsenal’s winning run was ended in dramatic st...
1,Leicester’s Ashleigh Plumptre: ‘I love everyth...,The Guardian,Ella Braidwood,2022-11-17,https://www.theguardian.com/football/2022/nov/...,The defender on playing for the club in her he...
2,Shaw stars for Man City in WSL win at Everton,BBC News,,2022-11-19,https://www.bbc.co.uk/sport/football/63606404,Last updated on 19 November 202219 November 20...
3,Man City win keeps pressure on WSL top three,BBC News,,2022-12-04,https://www.bbc.co.uk/sport/football/63771047,Last updated on 4 December 20224 December 2022...
4,Liverpool claim last-gasp WSL draw at Brighton,BBC News,,2022-11-20,https://www.bbc.co.uk/sport/football/63612799,Last updated on 20 November 202220 November 20...
...,...,...,...,...,...,...
69,"Sunderland Women fall to 3-0 defeat, but leave...",SB Nation,Rich Speight,2022-11-28,https://rokerreport.sbnation.com/2022/11/28/23...,We use cookies and other tracking technologies...
70,Lasses Fan Focus: We chat to the MCWFC OSC to ...,SB Nation,CharlottePatterson,2022-11-27,https://rokerreport.sbnation.com/2022/11/27/23...,We use cookies and other tracking technologies...
71,Wubben-Moy and FA Womens Boss pressing Govt fo...,Just Arsenal News,Admin Pat,2022-11-10,https://www.justarsenal.com/wubben-moy-and-the...,Arsenal & Lioness Wubben-Moy and FA Director o...
72,On This Day (22 November 2009): Nobbs complete...,SB Nation,Rich Speight,2022-11-22,https://rokerreport.sbnation.com/2022/11/22/23...,We use cookies and other tracking technologies...


In [22]:
# First I'll create a new column called 'tokenised_words'

# df['tokenised_words'] = df.apply(lambda row: nltk.word_tokenize(row['Content']), axis = 1)
# df['tokenised_words'] = df['Content'] = df.apply(lambda row: nltk.word_tokenize(row['Content']), axis = 1)
# (the second version is a chained assignment: it would also overwrite 'Content' with the token lists, so avoid it)
# apply - applies a function along an axis of the DataFrame; axis = 1 means the function is called once per row
# lambda - an anonymous (unnamed) function that can take any number of arguments
# here the lambda receives each row and passes that row's 'Content' value to nltk.word_tokenize
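To make the row-wise `apply` described in the comments above concrete, here is a minimal sketch on a two-row DataFrame. The `demo` frame and the `simple_tokenise` helper are made up for illustration. The helper is a regular-expression stand-in, not `nltk.word_tokenize`, so the example runs without downloading any NLTK data, and its tokens will not always agree with NLTK's.

```python
import re

import pandas as pd


def simple_tokenise(text):
    """Stand-in tokeniser (NOT nltk.word_tokenize): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical miniature version of the articles DataFrame
demo = pd.DataFrame(
    {"Content": ["Arsenal’s winning run was ended.", "Man City win at Everton"]}
)

# Row-wise: with axis=1 the lambda receives each row and picks out 'Content'
row_wise = demo.apply(lambda row: simple_tokenise(row["Content"]), axis=1)

# Column-wise: apply the function directly to the 'Content' Series
column_wise = demo["Content"].apply(simple_tokenise)

print(row_wise.tolist())
```

Both calls produce one list of tokens per article. Applying the function to the single column is usually the simpler choice when only one column is needed.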


In [28]:
df

Unnamed: 0,Headlines,Source,Author,Date,Link,Content,Quotes,tokenised_words
0,Women’s Super League: talking points from the ...,The Guardian,"Suzanne Wrack, Sophie Downey and Sarah Rendell",2022-11-21,https://www.theguardian.com/football/2022/nov/...,Arsenal’s winning run was ended in dramatic st...,,"[Arsenal, ’, s, winning, run, was, ended, in, ..."
1,Leicester’s Ashleigh Plumptre: ‘I love everyth...,The Guardian,Ella Braidwood,2022-11-17,https://www.theguardian.com/football/2022/nov/...,The defender on playing for the club in her he...,,"[The, defender, on, playing, for, the, club, i..."
2,Shaw stars for Man City in WSL win at Everton,BBC News,,2022-11-19,https://www.bbc.co.uk/sport/football/63606404,Last updated on 19 November 202219 November 20...,focal point,"[Last, updated, on, 19, November, 202219, Nove..."
3,Man City win keeps pressure on WSL top three,BBC News,,2022-12-04,https://www.bbc.co.uk/sport/football/63771047,Last updated on 4 December 20224 December 2022...,explosive,"[Last, updated, on, 4, December, 20224, Decemb..."
4,Liverpool claim last-gasp WSL draw at Brighton,BBC News,,2022-11-20,https://www.bbc.co.uk/sport/football/63612799,Last updated on 20 November 202220 November 20...,explosive,"[Last, updated, on, 20, November, 202220, Nove..."
...,...,...,...,...,...,...,...,...
69,"Sunderland Women fall to 3-0 defeat, but leave...",SB Nation,Rich Speight,2022-11-28,https://rokerreport.sbnation.com/2022/11/28/23...,We use cookies and other tracking technologies...,,"[We, use, cookies, and, other, tracking, techn..."
70,Lasses Fan Focus: We chat to the MCWFC OSC to ...,SB Nation,CharlottePatterson,2022-11-27,https://rokerreport.sbnation.com/2022/11/27/23...,We use cookies and other tracking technologies...,,"[We, use, cookies, and, other, tracking, techn..."
71,Wubben-Moy and FA Womens Boss pressing Govt fo...,Just Arsenal News,Admin Pat,2022-11-10,https://www.justarsenal.com/wubben-moy-and-the...,Arsenal & Lioness Wubben-Moy and FA Director o...,,"[Arsenal, &, Lioness, Wubben-Moy, and, FA, Dir..."
72,On This Day (22 November 2009): Nobbs complete...,SB Nation,Rich Speight,2022-11-22,https://rokerreport.sbnation.com/2022/11/22/23...,We use cookies and other tracking technologies...,,"[We, use, cookies, and, other, tracking, techn..."


In [27]:
df['tokenised_words'] = df.apply(lambda row: nltk.word_tokenize(row['Content']), axis = 1)

In [34]:
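# Pull out the first passage enclosed in straight double quotes ("...") from each article;
# text quoted with curly quotes (“...” or ‘...’) is not matched by this pattern
df['Quotes'] = df['Content'].str.extract(r'"([^"]*)"')
# Pull out the first match of either "<text with no full stop> she said"
# or "says <text up to and including the next full stop>"
df['She_said'] = df['Content'].str.extract(r'([^.]* she said|says [^.]*\.)')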
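The two patterns in the cell above are easier to follow on a short string. The sketch below uses Python's `re` module directly, on made-up sample sentences rather than real article text. `re.search` returns the first match, which is also what `str.extract` keeps for each row.

```python
import re

quote_pattern = r'"([^"]*)"'
she_said_pattern = r'([^.]* she said|says [^.]*\.)'

# Hypothetical sample text, not taken from the scraped articles
sample = 'The striker was delighted. "It was a waste of time," she said. More text.'
says_sample = "The chief executive says the club wants more. Another sentence."
curly_sample = "She called the display “explosive” afterwards."

# First passage between straight double quotes
first_quote = re.search(quote_pattern, sample).group(1)

# First branch of the alternation: text without a full stop, ending in " she said"
she_said = re.search(she_said_pattern, sample).group(1)

# Second branch: "says " followed by everything up to the next full stop
says = re.search(she_said_pattern, says_sample).group(1)

# Curly quotes are different characters, so the quote pattern finds nothing
curly = re.search(quote_pattern, curly_sample)

print(first_quote)
print(she_said.strip())
print(says)
print(curly)
```

The `|` splits the whole group into two alternatives, which is why some rows in the table below end in "she said" while others start with "says".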

In [35]:
df[df.She_said.notnull()]

Unnamed: 0,Headlines,Source,Author,Date,Link,Content,Quotes,tokenised_words,She_said
1,Leicester’s Ashleigh Plumptre: ‘I love everyth...,The Guardian,Ella Braidwood,2022-11-17,https://www.theguardian.com/football/2022/nov/...,The defender on playing for the club in her he...,,"[The, defender, on, playing, for, the, club, i...",says of learning about her sister’s experiences.
6,Bumper weekend shows WSL must grow to keep up,BBC News,,2022-11-20,https://www.bbc.co.uk/sport/football/63697262,Last updated on 20 November 202220 November 20...,We watched all the women's matches through the...,"[Last, updated, on, 20, November, 202220, Nove...",Speaking before their 3-1 defeat to Aston Vill...
10,Will Bancroft appointment give Man Utd Women c...,Sky Sports,"Lynsey Hooper, Sky Sports' lead WSL reporter",2022-11-16,https://www.skysports.com/football/news/35730/...,\n\n\n\nFootball\n ...,"This club know how to do football,","[Football, '', This, club, know, how, to, do, ...",says Manchester United's new Head of Women’s F...
13,How Shaw's unique upbringing has created a rut...,Sky Sports,Laura Hunter,2022-11-17,https://www.skysports.com/football/news/11095/...,\n\n\n\nFootball\n ...,"I could be the one to change football in Jamaica,","[Football, Striker, Khadija, Shaw, is, from, h...","""It was almost like a waste of time,"" she said"
16,"Zinsberger: Becoming a goalkeeper, Arsenal's t...",Sky Sports,Charlotte Marsh,2022-11-18,https://www.skysports.com/football/news/35730/...,\n\n\n\nFootball\n ...,"At the moment I got the Golden Glove, we figur...","[Football, Plus, :, Beating, Lyon, on, her, 27...","When we arrived there, the boys started pract..."
18,Grass-roots football craves more than a Euro b...,Telegraph.co.uk,"Tom Garry, Molly McElwee",2022-11-28,https://www.telegraph.co.uk/football/2022/11/2...,"\nEngland’s win led to a flood of interest, bu...",,"[England, ’, s, win, led, to, a, flood, of, in...",says women’s football tickets are “too cheap” ...
20,Exclusive: Arsenal want women's side to play e...,Telegraph.co.uk,Tom Garry,2022-11-19,https://www.telegraph.co.uk/football/2022/11/1...,\nChief executive Vinai Venkatesham says the c...,sense of responsibility,"[Chief, executive, Vinai, Venkatesham, says, t...",says the club wants to be at 'the forefront of...
24,Sweetman-Kirk demands change on diversity in w...,Sky Sports,Dev Trehan & Anton Toloui,2022-11-15,https://www.skysports.com/football/news/11095/...,\n\n\n\nFootball\n ...,probably a bit ill-informed and a bit uneducated.,"[Football, Courtney, Sweetman-Kirk, challenges...","says football must show ""a genuine desire for ..."
34,Arsenal Debate: How should the Women handle th...,Just Arsenal News,Michelle,2022-11-22,https://www.justarsenal.com/womens-super-leagu...,Women’s Super League: with growing crowds come...,,"[Women, ’, s, Super, League, :, with, growing,...","Reading boss Kelly Chambers, speaking before t..."
56,Ex Arsenal forward Nikita Parris ‘can’t wait’ ...,Just Arsenal News,Michelle,2022-11-19,https://www.justarsenal.com/ex-arsenal-forward...,Ex Arsenal forward Nikita Parris ‘can’t wait’ ...,,"[Ex, Arsenal, forward, Nikita, Parris, ‘, can,...",says the Reds are eager for the opportunity to...
