# 1. Introduction <a id='Introduction'></a>

<b>What is the Premier League?</b>

The Premier League (often referred to as the English Premier League or the EPL outside England) is the top level of the English football league system. In 2019, as Manchester City and Liverpool contested a thrilling title race, a cumulative global audience of 3.2 billion for all programming watched the action, <a href="https://www.premierleague.com/news/1280062">an increase of six per cent on the previous season.</a> This rise in viewership numbers, combined with an increasingly engaged fanbase, has translated into a greater interest in Premier League related games and interactive content.

<b>What is the Fantasy Premier League?</b>

Fantasy football is a game in which participants assemble an imaginary team of real life footballers and score points based on those players' actual statistical performance or their perceived contribution on the field of play. The way the game works is simple: you pick eleven players, and whenever these players perform well in the live matches, they get points. A goal for a striker, for example, is worth 3 points, and a clean sheet (no goals conceded) is worth 6 points.

The Fantasy Premier League (abbreviated to FPL), in particular, is the world's largest fantasy league with over 6 million players. With attractive prizes (top FPL players in each region earn a ticket to watch their favourite football team, while the overall first place wins a cash prize), the FPL fanbase are often more engaged and fanatical about football.

<b>Reddit and the Premier League</b>

Reddit is a social news aggregation, web content rating, and discussion website. Posts are organized by subject into user-created boards called "subreddits", which cover a variety of topics including news, science, movies, and of course, football.

/r/PremierLeague/ is one such subreddit, where fans of the English Premier League aggregate and discuss the game. The subreddit is 8 years old and has 108,0000 members as of 2019.

/r/FantasyPL/ is a subreddit dedicated to the fantasy version of the EPL. The subreddit is also 8 years old, and surprisingly has more members than /r/PremierLeague, at 177,000.

<b>Preamble:</b>

<u>This project will be addressed from the perspective from the marketing team of the Fantasy Premier League (FantasyPL) App.</u>

The goals of this project will be outlined in the following section.

# 2. Problem Statement <a id='Problem Statement'></a>

As managers of the FPL app, we want to evaluate: 
<li>What kind of people surf the FPL reddit, and what kind of people surf the EPL reddit? Are there overlaps (same username) who surf both reddits, and what's the extent of this overlap?
<li> Overall sentiment for EPL and FPL reddits, is one more positive than the other?
<li> What type of content are posted in the EPL and FPL reddits respectively? What features differentiates an EPL post from an FPL post?
<br>
<br>
To frame the problem from a data science perspective, this is both an inference problem and a prediction problem:
<li> First, we need to create a classification model with sufficient predictive power that differentiates between the two reddits. We will apply a few different models here to evaluate differences in features selected and predictive power. Intuitively, the higher the predictive power, the better the model is at distinguishing between the two subreddits, and therefore the features it selects are better indicators of difference.
<li> Second, we need to find what features are characteristic of each reddit.
<li> Third, are there differences in what teams each subreddit discuss? E.g., is Manchester United popular in EPL, but unpopular in FPL? Are there differences in the type of comments made in each subreddit?</li>
    
Using this information, we should gather information that supports a targeted marketing campaign - i.e. how to appeal to the broader spectrum of people who watch EPL but don't play FPL. <b>If I want to create a targeted marketing campaign on r/EPL, how should I go about doing it?</b>

# 3. Table of Contents <a id='Table of Contents'></a>

- <a href='#Introduction'>1. Introduction</a>
- <a href='#Problem_Statement'>2. Problem Statement</a>
- <a href='#Table of Contents'>3. Table of Contents</a>
- <a href='#Data Imports'>4. Library Imports</a>

- <a href='#Data Gathering'>5. Data Gathering</a>
    - <a href='#Sample25'>5.1 Sampling 25 Posts from Reddit</a>    
    - <a href='#DataProper'>5.2 Data Gathering Proper</a>
- <a href='#Data Dictionary'>6. Data Dictionary</a>           
- <a href='#Shape Missingness'>7. Evaluating Shape and Missingness</a>    
    - <a href='#Overview FPL'>7.1. Overview of FPL data</a>  
    - <a href='#Overview EPL'>7.2. Overview of EPL data</a>  
    - <a href='#Measuring Missingness'>7.3 Measuring Missingness</a>  
    - <a href='#Analysis Shape Missingness'>7.4 Analysis of Shape and Missingness</a>
- <a href='#Cleaning and Preprocessing'>8. Data Cleaning and Preprocessing</a>  
    - <a href='#Cleaning Functions'>8.1 Cleaning and Encoding Functions</a>    
    - <a href='#Filling Missing'>8.2 Filling Missing Values</a>    
        - <a href='#Extract OCR'>8.2.1 Filling using OCR from Images</a>    
        - <a href='#Extract Tweet'>8.2.2 Filling using Tweet data</a>    
    - <a href='#Cleaning Text'>8.3. Cleaning, Tokenizing, and Lemmatizing Text</a>    
- <a href='#Sentiment Analysis'>8.4 Sentiment Analysis</a>  
    - <a href='#Cleaning Functions'>8.1 Cleaning and Encoding Functions</a>    

# 4. Library Imports <a id='Data Imports'></a>

In [1]:
#General Imports
import requests
import pandas as pd
import numpy as np
import time
import random
import math
from collections import namedtuple, Counter
import scipy.stats as stats
from scipy.stats import norm
import datetime

#Scraping Imports
import requests
from bs4 import BeautifulSoup

#NLP Imports
import spacy
import re
from textblob import TextBlob, Word, Blobber
from textblob.classifiers import NaiveBayesClassifier
from textblob.taggers import NLTKTagger
import nltk
from nltk.stem import WordNetLemmatizer 
from nltk import pos_tag, word_tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


#Plotting/Graphs
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

#OCR imports
from PIL import Image
import pytesseract
from io import BytesIO
#This line won't work unless you install tesseract OCR software
#Change the path to where you saved the program if you have it
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

#Modelling Imports
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
#Lets you view all columns in a dataframe when there are a LARGE number of columns
pd.set_option('display.max_columns', None)
#Fix for pandas truncating long strings (e.g. urls)
#If you don't run this, pandas will cut short your long urls and this breaks code that require urls
pd.set_option("display.max_colwidth", 10000)

# 5. Data Gathering <a id='Data Gathering'></a>

### 5.1 Sampling 25 posts from Reddit <a id='Sample25'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

Before any serious crawling begins, there is a need to evaluate what the raw data looks like, what fields are necessary, what can be dropped, and any potential gaps in the data.

In [2]:
#Get the first 25 posts from the PremierLeague subreddit
url = 'https://www.reddit.com/r/PremierLeague/.json'
res = requests.get(url, headers={'User-agent': 'Ponyman1'})                
post_dict = res.json()
posts = [p['data'] for p in post_dict['data']['children']]
EPL_sample = pd.DataFrame(posts)
EPL_sample.head(3)

Unnamed: 0,all_awardings,allow_live_comments,approved_at_utc,approved_by,archived,author,author_flair_background_color,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullname,author_patreon_flair,awarders,banned_at_utc,banned_by,can_gild,can_mod_post,category,clicked,content_categories,contest_mode,created,created_utc,discussion_type,distinguished,domain,downs,edited,gilded,gildings,hidden,hide_score,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,likes,link_flair_background_color,link_flair_css_class,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media,media_embed,media_metadata,media_only,mod_note,mod_reason_by,mod_reason_title,mod_reports,name,no_follow,num_comments,num_crossposts,num_reports,over_18,parent_whitelist_status,permalink,pinned,post_hint,preview,pwls,quarantine,removal_reason,report_reasons,saved,score,secure_media,secure_media_embed,selftext,selftext_html,send_replies,spoiler,steward_reports,stickied,subreddit,subreddit_id,subreddit_name_prefixed,subreddit_subscribers,subreddit_type,suggested_sort,thumbnail,thumbnail_height,thumbnail_width,title,total_awards_received,ups,url,user_reports,view_count,visited,whitelist_status,wls
0,[],True,,,False,pumkinhat,,,[],,,,text,t2_rk9wt,False,[],,,False,False,,False,,False,1566045000.0,1566016000.0,,,self.PremierLeague,0,False,0,{},False,False,crhhzv,False,False,False,False,True,True,False,,#7193ff,,"[{'e': 'text', 't': 'Discussion'}]",9f833a4a-8ee9-11e8-9ddc-0e618b2b6a52,Discussion,light,richtext,False,,{},,False,,,,[],t3_crhhzv,False,92,0,,False,all_ads,/r/PremierLeague/comments/crhhzv/sticky_thread_who_should_i_root_for/,False,,,6,False,,,False,49,,{},"As recently there have been quite a growth in these kind of posts, I'm going to create a this thread and forward users here.\n\n--\n\nPlease use this thread for all ""**Who should I root for/follow**"" related questions.\n\nCheck out the previous [Who should I root for thread?](https://www.reddit.com/r/PremierLeague/comments/c02cu9/sticky_thread_who_should_i_root_for/)\n\nAlso check out [r/Soccer's Premier League Team Preview Megathread](https://www.reddit.com/r/PremierLeague/comments/cjdilc/rsoccers_premier_league_team_preview_megathread/)","&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt;p&gt;As recently there have been quite a growth in these kind of posts, I&amp;#39;m going to create a this thread and forward users here.&lt;/p&gt;\n\n&lt;h2&gt;&lt;/h2&gt;\n\n&lt;p&gt;Please use this thread for all &amp;quot;&lt;strong&gt;Who should I root for/follow&lt;/strong&gt;&amp;quot; related questions.&lt;/p&gt;\n\n&lt;p&gt;Check out the previous &lt;a href=""https://www.reddit.com/r/PremierLeague/comments/c02cu9/sticky_thread_who_should_i_root_for/""&gt;Who should I root for thread?&lt;/a&gt;&lt;/p&gt;\n\n&lt;p&gt;Also check out &lt;a href=""https://www.reddit.com/r/PremierLeague/comments/cjdilc/rsoccers_premier_league_team_preview_megathread/""&gt;r/Soccer&amp;#39;s Premier League Team Preview Megathread&lt;/a&gt;&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",False,False,[],True,PremierLeague,t5_2scup,r/PremierLeague,109707,public,,self,,,Sticky Thread - Who should I root for? (August/September),0,49,https://www.reddit.com/r/PremierLeague/comments/crhhzv/sticky_thread_who_should_i_root_for/,[],,False,all_ads,6
1,[],True,,,False,Sofishticated_,,ava,"[{'a': ':ava:', 'e': 'emoji', 'u': 'https://emoji.redditmedia.com/hndntrmkiq131_t5_2scup/ava'}, {'e': 'text', 't': ' Aston Villa'}]",12a59dbc-8460-11e9-ba38-0ef0b9539158,:ava: Aston Villa,dark,richtext,t2_13ejpc,False,[],,,False,False,,False,,False,1566399000.0,1566370000.0,,moderator,self.PremierLeague,0,False,0,{},False,False,ctd3bt,False,False,False,False,True,True,False,,#7193ff,,"[{'e': 'text', 't': 'Moderator Post'}]",19119140-8da4-11e9-8b74-0e0bd0a9a746,Moderator Post,light,richtext,False,,{},,False,,,,[],t3_ctd3bt,False,10,0,,False,all_ads,/r/PremierLeague/comments/ctd3bt/new_rule_in_place/,False,,,6,False,,,False,81,,{},"Hello everyone,\n\nAs the 2019/20 season gets underway, the mod team has been noticing a startlingly large amount of posts either seeking out or advertising streams or torrents. I want to remind everyone that both of these mediums are illegal in the majority of the world, including the United States: the country where Reddit is located.\n\nIn the past /r/PremierLeague has never had a distinct policy on streams or torrents, and we have decided to implement one today.\n\n**Rule #7** - Do not post or advertise illegal streams or torrents.\n\nWe understand that many users on this subreddit struggle to afford football packages from their country's providers, but unfortunately, if you are going to be watching football from your living room, we recommend your nation's legal sources of football.\n\nAnd if you are dead set on searching for illegal streams or torrents, /r/PremierLeague is **not** the location to do it. Reddit has a history of administrating subreddits that foster these types of posts, and we have no interest in that.\n\nThanks, and if you have any questions feel free to ask them below.","&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt;p&gt;Hello everyone,&lt;/p&gt;\n\n&lt;p&gt;As the 2019/20 season gets underway, the mod team has been noticing a startlingly large amount of posts either seeking out or advertising streams or torrents. I want to remind everyone that both of these mediums are illegal in the majority of the world, including the United States: the country where Reddit is located.&lt;/p&gt;\n\n&lt;p&gt;In the past &lt;a href=""/r/PremierLeague""&gt;/r/PremierLeague&lt;/a&gt; has never had a distinct policy on streams or torrents, and we have decided to implement one today.&lt;/p&gt;\n\n&lt;p&gt;&lt;strong&gt;Rule #7&lt;/strong&gt; - Do not post or advertise illegal streams or torrents.&lt;/p&gt;\n\n&lt;p&gt;We understand that many users on this subreddit struggle to afford football packages from their country&amp;#39;s providers, but unfortunately, if you are going to be watching football from your living room, we recommend your nation&amp;#39;s legal sources of football.&lt;/p&gt;\n\n&lt;p&gt;And if you are dead set on searching for illegal streams or torrents, &lt;a href=""/r/PremierLeague""&gt;/r/PremierLeague&lt;/a&gt; is &lt;strong&gt;not&lt;/strong&gt; the location to do it. Reddit has a history of administrating subreddits that foster these types of posts, and we have no interest in that.&lt;/p&gt;\n\n&lt;p&gt;Thanks, and if you have any questions feel free to ask them below.&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",True,False,[],True,PremierLeague,t5_2scup,r/PremierLeague,109707,public,,self,,,New Rule in Place,0,81,https://www.reddit.com/r/PremierLeague/comments/ctd3bt/new_rule_in_place/,[],,False,all_ads,6
2,[],False,,,False,gibba97,,,[],,,,text,t2_wzajn,False,[],,,False,False,,False,,False,1571686000.0,1571658000.0,,,self.PremierLeague,0,False,0,{},False,False,dkz6tb,False,False,False,False,True,True,False,,#7193ff,,"[{'e': 'text', 't': 'Question'}]",86fb4972-8ee9-11e8-ab9f-0efe03519900,Question,light,richtext,False,,{},,False,,,,[],t3_dkz6tb,False,25,0,,False,all_ads,/r/PremierLeague/comments/dkz6tb/i_have_the_opportunity_to_interview_micah/,False,,,6,False,,,False,143,,{},"My work is affiliated with City Football Group and as a part of that, Micah Richards and Manchester City are visiting our workplace showcasing some of their silverware. \n\nI have the opportunity to sit down with Micah for a 15 minute interview. Do any of you have any interesting questions you could suggest asking? \n\nAs I work in a corporate environment, i’m planning to ask about the managers he played under, Mancini and Pellegrini, and what he found special about their leadership. Things along this line would be nice but then again, I have full reigns on asking anything! \n\nThanks :)","&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt;p&gt;My work is affiliated with City Football Group and as a part of that, Micah Richards and Manchester City are visiting our workplace showcasing some of their silverware. &lt;/p&gt;\n\n&lt;p&gt;I have the opportunity to sit down with Micah for a 15 minute interview. Do any of you have any interesting questions you could suggest asking? &lt;/p&gt;\n\n&lt;p&gt;As I work in a corporate environment, i’m planning to ask about the managers he played under, Mancini and Pellegrini, and what he found special about their leadership. Things along this line would be nice but then again, I have full reigns on asking anything! &lt;/p&gt;\n\n&lt;p&gt;Thanks :)&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",True,False,[],False,PremierLeague,t5_2scup,r/PremierLeague,109707,public,,self,,,"I have the opportunity to interview Micah Richards later this week, what questions would you ask him?",0,143,https://www.reddit.com/r/PremierLeague/comments/dkz6tb/i_have_the_opportunity_to_interview_micah/,[],,False,all_ads,6


In [3]:
#Get the first 25 posts from the FantasyPL subreddit
url = 'https://www.reddit.com/r/fantasyPL/.json'
res = requests.get(url, headers={'User-agent': 'Ponyman1'})                
post_dict = res.json()
posts = [p['data'] for p in post_dict['data']['children']]
FPL_sample = pd.DataFrame(posts)
FPL_sample

Unnamed: 0,all_awardings,allow_live_comments,approved_at_utc,approved_by,archived,author,author_flair_background_color,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullname,author_patreon_flair,awarders,banned_at_utc,banned_by,can_gild,can_mod_post,category,clicked,content_categories,contest_mode,created,created_utc,discussion_type,distinguished,domain,downs,edited,gilded,gildings,hidden,hide_score,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,likes,link_flair_background_color,link_flair_css_class,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media,media_embed,media_metadata,media_only,mod_note,mod_reason_by,mod_reason_title,mod_reports,name,no_follow,num_comments,num_crossposts,num_reports,over_18,parent_whitelist_status,permalink,pinned,post_hint,preview,pwls,quarantine,removal_reason,report_reasons,saved,score,secure_media,secure_media_embed,selftext,selftext_html,send_replies,spoiler,steward_reports,stickied,subreddit,subreddit_id,subreddit_name_prefixed,subreddit_subscribers,subreddit_type,suggested_sort,thumbnail,thumbnail_height,thumbnail_width,title,total_awards_received,ups,url,user_reports,view_count,visited,whitelist_status,wls
0,[],True,,,False,FPLModerator,,,[],,13.0,dark,text,t2_10ke71,False,[],,,False,False,,False,,False,1571510000.0,1571481000.0,,,self.FantasyPL,0,1.57161e+09,0,{},False,False,dk1zo0,False,False,False,False,True,True,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dk1zo0,False,7341,0,,False,all_ads,/r/FantasyPL/comments/dk1zo0/game_week_9_20192020_rant_discussion_thread/,False,self,"{'images': [{'source': {'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?auto=webp&amp;s=06a28537f2c3e1129a4ee52c8f011663b94a885b', 'width': 1200, 'height': 1200}, 'resolutions': [{'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=cc46eaedffd0d2cfeaa31cacd44ef937cdf0f3d3', 'width': 108, 'height': 108}, {'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?width=216&amp;crop=smart&amp;auto=webp&amp;s=51b8520e47a974b898aa42bfff0853a772320230', 'width': 216, 'height': 216}, {'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?width=320&amp;crop=smart&amp;auto=webp&amp;s=55d0d132cd2cc436ff886a4537045d8782f8e292', 'width': 320, 'height': 320}, {'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=4d43d5e33d31ffe8bb84912d14bd3e4d92afb728', 'width': 640, 'height': 640}, {'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?width=960&amp;crop=smart&amp;auto=webp&amp;s=10bcf7cfdb0e05987ea7af87d7ffe1fb60546192', 'width': 960, 'height': 960}, {'url': 'https://external-preview.redd.it/EpXJe4So7G6a7yGQZyDCPU7hIwKytSuvQ-_zJW9tM50.jpg?width=1080&amp;crop=smart&amp;auto=webp&amp;s=7c9fb04a52f096487d19f92d46aee682c090deb8', 'width': 1080, 'height': 1080}], 'variants': {}, 'id': 'hEYsV8boSQLHPW1sA5uGMuzPZfw75L3a8ffuetCRaGg'}], 'enabled': False}",6,False,,,False,108,,{},"This is the place to moan and discuss about every single thing that happened in games and with your team. If your player didn't start or saw a red card, or you picked the wrong player, captain, or other, this is the place to share all your rants, memes and outbursts (and your score). We have included all relevant information about the current gameweek - lineups, bonus, and predicted averages, etc.\n\nFrom all of the mod team - good luck!\n\n___\n\n#THREADS\n\n* **Captain Poll**: [This week has decided that Abraham is the #1 captain.](https://www.strawpoll.me/18808999/r)\n\n* **RMT Thread**: [Can be found here](https://redd.it/djx3pg)\n\n* **How did ___ Play?**:[Can be found here](https://redd.it/dk467h)\n\n#* **Live Chat Stream**: [Can be found here](https://www.reddit-stream.com/r/FantasyPL/comments/k1zo0/game_week_9_20192020_rant_discussion_thread/?)\n\n___\n\n#LINEUP THREADS\n\nHome Team| Lineup Thread | v | Away Team | Lineup Thread |\n:--|:--|:--|:--|:--\nEverton | [Click](https://redd.it/dk1zqy) | v | West Ham| [Click](https://redd.it/dk1zma)|\nAston Villa | [Click](https://redd.it/dk3h6d) | v | Brighton| [Click](https://redd.it/dk3ja9)|\nBournemouth | [Click](https://redd.it/dk3ijq) | v | Norwich| [Click](https://redd.it/dk3hxp)|\nChelsea | [Click](https://redd.it/dk3hm6) | v | Newcastle| [Click](https://redd.it/dk3phu)|\nLeicester | [Click](https://redd.it/dk3mfg) | v | Burnley| [Click](https://redd.it/dk3ijj)|\nTottenham | [Click](https://redd.it/dk3h8o) | v | Watford| [Click](https://redd.it/dk3s16)|\nWolverhampton | [Click]() | v | Southampton| [Click](https://redd.it/dk437d)|\nC.Palace | [Click](https://redd.it/dk5cj8) | v | Man.City| [Click](https://redd.it/dk59td)|\nMan.Utd | [Click]() | v | Liverpool | [Click]()|\nSheffield | [Click]() | v | Arsenal| [Click]()|\n___\n\n#BONUS POINTS\n\n*[Anewpla](http://anewpla.net/fpl/live/) or [FPL Alerts](http://fplalerts.com/) will provide live bonus updates.*\n\n\nMatch| (3) Bonus | (2) Bonus | (1) Bonus | \n:--|:--|:--|:--\nEverton v West Ham| Sidibé | Bernard | Keane, Roberto, Pickford \nAston Villa v Brighton| Grealish | Targett | Groß\nBournemouth v Norwich | Steve Cook | Aké, Rico | \nChelsea v Newcastle| Alonso, Hudson-Odoi | | Zouma \nLeicester v Burnley| Barnes | Vardy, Tielemans | \nTottenham v Watford| Janmaat | Doucouré | Alli\nWolverhampton v Southampton| Ings | Jiménez | Højbjerg\nPalace v Man.City | Cancelo, David Silva | | Jesus\nMan.Utd v Liverpool | Rashford, Robertson | | Lallana, James\n___\n\n#This is NOT an RMT thread. Please do not post questions about your team.\n\n#WARNING: Posting any fake reports about goals, injuries or players being benched will result in a BAN","&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt;p&gt;This is the place to moan and discuss about every single thing that happened in games and with your team. If your player didn&amp;#39;t start or saw a red card, or you picked the wrong player, captain, or other, this is the place to share all your rants, memes and outbursts (and your score). We have included all relevant information about the current gameweek - lineups, bonus, and predicted averages, etc.&lt;/p&gt;\n\n&lt;p&gt;From all of the mod team - good luck!&lt;/p&gt;\n\n&lt;hr/&gt;\n\n&lt;h1&gt;THREADS&lt;/h1&gt;\n\n&lt;ul&gt;\n&lt;li&gt;&lt;p&gt;&lt;strong&gt;Captain Poll&lt;/strong&gt;: &lt;a href=""https://www.strawpoll.me/18808999/r""&gt;This week has decided that Abraham is the #1 captain.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;strong&gt;RMT Thread&lt;/strong&gt;: &lt;a href=""https://redd.it/djx3pg""&gt;Can be found here&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;strong&gt;How did ___ Play?&lt;/strong&gt;:&lt;a href=""https://redd.it/dk467h""&gt;Can be found here&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;/ul&gt;\n\n&lt;h1&gt;* &lt;strong&gt;Live Chat Stream&lt;/strong&gt;: &lt;a href=""https://www.reddit-stream.com/r/FantasyPL/comments/k1zo0/game_week_9_20192020_rant_discussion_thread/?""&gt;Can be found here&lt;/a&gt;&lt;/h1&gt;\n\n&lt;hr/&gt;\n\n&lt;h1&gt;LINEUP THREADS&lt;/h1&gt;\n\n&lt;table&gt;&lt;thead&gt;\n&lt;tr&gt;\n&lt;th align=""left""&gt;Home Team&lt;/th&gt;\n&lt;th align=""left""&gt;Lineup Thread&lt;/th&gt;\n&lt;th align=""left""&gt;v&lt;/th&gt;\n&lt;th align=""left""&gt;Away Team&lt;/th&gt;\n&lt;th align=""left""&gt;Lineup Thread&lt;/th&gt;\n&lt;/tr&gt;\n&lt;/thead&gt;&lt;tbody&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Everton&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk1zqy""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;West Ham&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk1zma""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Aston Villa&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3h6d""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Brighton&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3ja9""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Bournemouth&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3ijq""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Norwich&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3hxp""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Chelsea&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3hm6""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Newcastle&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3phu""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Leicester&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3mfg""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Burnley&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3ijj""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Tottenham&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3h8o""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Watford&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk3s16""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Wolverhampton&lt;/td&gt;\n&lt;td align=""left""&gt;[Click]()&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Southampton&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk437d""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;C.Palace&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk5cj8""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Man.City&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;a href=""https://redd.it/dk59td""&gt;Click&lt;/a&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Man.Utd&lt;/td&gt;\n&lt;td align=""left""&gt;[Click]()&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Liverpool&lt;/td&gt;\n&lt;td align=""left""&gt;[Click]()&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Sheffield&lt;/td&gt;\n&lt;td align=""left""&gt;[Click]()&lt;/td&gt;\n&lt;td align=""left""&gt;v&lt;/td&gt;\n&lt;td align=""left""&gt;Arsenal&lt;/td&gt;\n&lt;td align=""left""&gt;[Click]()&lt;/td&gt;\n&lt;/tr&gt;\n&lt;/tbody&gt;&lt;/table&gt;\n\n&lt;hr/&gt;\n\n&lt;h1&gt;BONUS POINTS&lt;/h1&gt;\n\n&lt;p&gt;&lt;em&gt;&lt;a href=""http://anewpla.net/fpl/live/""&gt;Anewpla&lt;/a&gt; or &lt;a href=""http://fplalerts.com/""&gt;FPL Alerts&lt;/a&gt; will provide live bonus updates.&lt;/em&gt;&lt;/p&gt;\n\n&lt;table&gt;&lt;thead&gt;\n&lt;tr&gt;\n&lt;th align=""left""&gt;Match&lt;/th&gt;\n&lt;th align=""left""&gt;(3) Bonus&lt;/th&gt;\n&lt;th align=""left""&gt;(2) Bonus&lt;/th&gt;\n&lt;th align=""left""&gt;(1) Bonus&lt;/th&gt;\n&lt;/tr&gt;\n&lt;/thead&gt;&lt;tbody&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Everton v West Ham&lt;/td&gt;\n&lt;td align=""left""&gt;Sidibé&lt;/td&gt;\n&lt;td align=""left""&gt;Bernard&lt;/td&gt;\n&lt;td align=""left""&gt;Keane, Roberto, Pickford&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Aston Villa v Brighton&lt;/td&gt;\n&lt;td align=""left""&gt;Grealish&lt;/td&gt;\n&lt;td align=""left""&gt;Targett&lt;/td&gt;\n&lt;td align=""left""&gt;Groß&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Bournemouth v Norwich&lt;/td&gt;\n&lt;td align=""left""&gt;Steve Cook&lt;/td&gt;\n&lt;td align=""left""&gt;Aké, Rico&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Chelsea v Newcastle&lt;/td&gt;\n&lt;td align=""left""&gt;Alonso, Hudson-Odoi&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;Zouma&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Leicester v Burnley&lt;/td&gt;\n&lt;td align=""left""&gt;Barnes&lt;/td&gt;\n&lt;td align=""left""&gt;Vardy, Tielemans&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Tottenham v Watford&lt;/td&gt;\n&lt;td align=""left""&gt;Janmaat&lt;/td&gt;\n&lt;td align=""left""&gt;Doucouré&lt;/td&gt;\n&lt;td align=""left""&gt;Alli&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Wolverhampton v Southampton&lt;/td&gt;\n&lt;td align=""left""&gt;Ings&lt;/td&gt;\n&lt;td align=""left""&gt;Jiménez&lt;/td&gt;\n&lt;td align=""left""&gt;Højbjerg&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Palace v Man.City&lt;/td&gt;\n&lt;td align=""left""&gt;Cancelo, David Silva&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;Jesus&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr&gt;\n&lt;td align=""left""&gt;Man.Utd v Liverpool&lt;/td&gt;\n&lt;td align=""left""&gt;Rashford, Robertson&lt;/td&gt;\n&lt;td align=""left""&gt;&lt;/td&gt;\n&lt;td align=""left""&gt;Lallana, James&lt;/td&gt;\n&lt;/tr&gt;\n&lt;/tbody&gt;&lt;/table&gt;\n\n&lt;hr/&gt;\n\n&lt;h1&gt;This is NOT an RMT thread. Please do not post questions about your team.&lt;/h1&gt;\n\n&lt;h1&gt;WARNING: Posting any fake reports about goals, injuries or players being benched will result in a BAN&lt;/h1&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",False,False,[],True,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,new,self,,,GAME WEEK 9 (2019/2020) RANT &amp; DISCUSSION THREAD,0,108,https://www.reddit.com/r/FantasyPL/comments/dk1zo0/game_week_9_20192020_rant_discussion_thread/,[],,False,all_ads,6
1,[],True,,,False,FPLModerator,,,[],,13.0,dark,text,t2_10ke71,False,[],,,False,False,,False,,False,1571523000.0,1571494000.0,,,self.FantasyPL,0,False,0,{},False,False,dk467h,False,False,False,False,True,True,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dk467h,False,408,0,,False,all_ads,/r/FantasyPL/comments/dk467h/how_did_play_gameweek_9_20192020/,False,,,6,False,,,False,47,,{},"Please search the thread for players. Any duplicate posts will be removed.\n\n**If you guys could report the double posts as spam, that would help the mods a ton in removing the clutter. Thanks!**\n\n\n[The RANT thread is here](https://old.reddit.com/r/FantasyPL/comments/dk1zo0/game_week_9_20192020_rant_discussion_thread/)\n\n[The RMT thread is here](https://redd.it/djx3pg)","&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt;p&gt;Please search the thread for players. Any duplicate posts will be removed.&lt;/p&gt;\n\n&lt;p&gt;&lt;strong&gt;If you guys could report the double posts as spam, that would help the mods a ton in removing the clutter. Thanks!&lt;/strong&gt;&lt;/p&gt;\n\n&lt;p&gt;&lt;a href=""https://old.reddit.com/r/FantasyPL/comments/dk1zo0/game_week_9_20192020_rant_discussion_thread/""&gt;The RANT thread is here&lt;/a&gt;&lt;/p&gt;\n\n&lt;p&gt;&lt;a href=""https://redd.it/djx3pg""&gt;The RMT thread is here&lt;/a&gt;&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",True,False,[],True,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,self,,,How did ___ Play? Gameweek 9 (2019/2020),0,47,https://www.reddit.com/r/FantasyPL/comments/dk467h/how_did_play_gameweek_9_20192020/,[],,False,all_ads,6
2,[],True,,,False,Ak_Ibrahim,,,[],,2.0,dark,text,t2_ar7lvxr,False,[],,,False,False,,False,,False,1571721000.0,1571692000.0,,,i.redd.it,0,False,0,{},False,False,dl6vwu,False,False,False,True,True,False,False,,,stat,[],1cbec5fc-757c-11e7-be6e-0e34476a8e7a,Statistics,dark,text,False,,{},,False,,,,[],t3_dl6vwu,False,80,0,,False,all_ads,/r/FantasyPL/comments/dl6vwu/john_lundstram_is_the_top_scoring_defender_in_the/,False,image,"{'images': [{'source': {'url': 'https://preview.redd.it/jksvmiwukyt31.jpg?auto=webp&amp;s=e5ab771b9844a15b1321f5f400bb715e26d76502', 'width': 910, 'height': 419}, 'resolutions': [{'url': 'https://preview.redd.it/jksvmiwukyt31.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=076b4a2822ec141e796e8b81243fecc7766c01c0', 'width': 108, 'height': 49}, {'url': 'https://preview.redd.it/jksvmiwukyt31.jpg?width=216&amp;crop=smart&amp;auto=webp&amp;s=3b690e909548670195382c0d751d8d2729522f62', 'width': 216, 'height': 99}, {'url': 'https://preview.redd.it/jksvmiwukyt31.jpg?width=320&amp;crop=smart&amp;auto=webp&amp;s=f7c32f6d41422e69cd544b580779ea5910f04fd3', 'width': 320, 'height': 147}, {'url': 'https://preview.redd.it/jksvmiwukyt31.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=d7cef58cab9a4088d8fc46623d0195d3a11e6cbe', 'width': 640, 'height': 294}], 'variants': {}, 'id': 'mhL9RrOQhl5Wcpc94IsfhGqRN6rBHZXP96gQIrDkuZs'}], 'enabled': True}",6,False,,,False,445,,{},,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://b.thumbs.redditmedia.com/887X2AhS2GYsGUYUlFsobCqTPaJpPxKA3jWA8gC2KkY.jpg,64.0,140.0,John Lundstram is the top scoring defender in the game,0,445,https://i.redd.it/jksvmiwukyt31.jpg,[],,False,all_ads,6
3,[],False,,,False,JLane1996,,points,[],,3.0,dark,text,t2_1x2kyanz,False,[],,,False,False,,False,,False,1571694000.0,1571666000.0,,,i.redd.it,0,False,0,{},False,False,dl0ryk,False,False,False,True,True,False,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dl0ryk,False,41,0,,False,all_ads,/r/FantasyPL/comments/dl0ryk/a_very_merry_christmas_coming_up_for_everton/,False,image,"{'images': [{'source': {'url': 'https://preview.redd.it/qh6v6a82fwt31.jpg?auto=webp&amp;s=c586453056ca51f0f9cee526848cf5c6274af2c8', 'width': 828, 'height': 361}, 'resolutions': [{'url': 'https://preview.redd.it/qh6v6a82fwt31.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=a1ae6d92595d82a8ed9536399e1830126cf6fe4a', 'width': 108, 'height': 47}, {'url': 'https://preview.redd.it/qh6v6a82fwt31.jpg?width=216&amp;crop=smart&amp;auto=webp&amp;s=d707f22e7953ba155383fe2bb8da87612d5496ef', 'width': 216, 'height': 94}, {'url': 'https://preview.redd.it/qh6v6a82fwt31.jpg?width=320&amp;crop=smart&amp;auto=webp&amp;s=46ec9d80629bd3f229d0e2c2548f23353c212c57', 'width': 320, 'height': 139}, {'url': 'https://preview.redd.it/qh6v6a82fwt31.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=e5e2d12c06204de4b125f0f3c7c291384f701893', 'width': 640, 'height': 279}], 'variants': {}, 'id': 'b6FUj_f4ohixKsKyjE8LSIFrc55B7eyt3eNbsYGj99Q'}], 'enabled': True}",6,False,,,False,835,,{},,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://a.thumbs.redditmedia.com/Mp4Vm2lUmGuV7wAU19tBOyY4YU0nM1-XEbTEaFBWtV4.jpg,61.0,140.0,A very merry Christmas coming up for Everton...,0,835,https://i.redd.it/qh6v6a82fwt31.jpg,[],,False,all_ads,6
4,[],True,,,False,FPLFeeker,,,[],,14.0,dark,text,t2_y2vms,False,[],,,False,False,,False,,False,1571710000.0,1571681000.0,,,i.redd.it,0,False,0,{},False,False,dl4b2p,False,False,False,True,True,False,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dl4b2p,False,65,0,,False,all_ads,/r/FantasyPL/comments/dl4b2p/arsenal_xi_auba_saka_start_and_lacazette_on_bench/,False,image,"{'images': [{'source': {'url': 'https://preview.redd.it/52vhqweioxt31.png?auto=webp&amp;s=5d16d7df491e549bde69e3d7bbae8ec7e9b86a5d', 'width': 680, 'height': 680}, 'resolutions': [{'url': 'https://preview.redd.it/52vhqweioxt31.png?width=108&amp;crop=smart&amp;auto=webp&amp;s=c8bc94aa680ab7a57082eab84282fdaf4aced279', 'width': 108, 'height': 108}, {'url': 'https://preview.redd.it/52vhqweioxt31.png?width=216&amp;crop=smart&amp;auto=webp&amp;s=d36bda11562be5d46fd52a51ae85702def36e3f6', 'width': 216, 'height': 216}, {'url': 'https://preview.redd.it/52vhqweioxt31.png?width=320&amp;crop=smart&amp;auto=webp&amp;s=2437b5e487074f67ede5c945f573dee1e6f20301', 'width': 320, 'height': 320}, {'url': 'https://preview.redd.it/52vhqweioxt31.png?width=640&amp;crop=smart&amp;auto=webp&amp;s=23fe6040af068b10f4b65869f95f851693ce1cbc', 'width': 640, 'height': 640}], 'variants': {}, 'id': 'fay_-t5pd9_SEPq_CyM948p68c7GjPp_vjed-gjK15s'}], 'enabled': True}",6,False,,,False,69,,{},,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://b.thumbs.redditmedia.com/SxkiJHVg2jcLIIKPRxc901JBu5TBAovjv5pSlefMahM.jpg,140.0,140.0,"Arsenal XI - Auba, Saka start and Lacazette on bench",0,69,https://i.redd.it/52vhqweioxt31.png,[],,False,all_ads,6
5,[],False,,,False,3amz,,,[],,3.0,dark,text,t2_81r7i,False,[],,,False,False,,False,,False,1571726000.0,1571697000.0,,,i.redd.it,0,False,0,{},False,False,dl83np,False,False,False,True,True,False,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dl83np,False,22,0,,False,all_ads,/r/FantasyPL/comments/dl83np/all_fpl_bonus_points_have_been_added_so_heres_the/,False,image,"{'images': [{'source': {'url': 'https://preview.redd.it/y1sx8mgd0zt31.jpg?auto=webp&amp;s=c33f743913c4ce77ac9eed157e4a32fd03a8ed1c', 'width': 1073, 'height': 1073}, 'resolutions': [{'url': 'https://preview.redd.it/y1sx8mgd0zt31.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=f1b30c65a60330df8b228ca033320089850de5b7', 'width': 108, 'height': 108}, {'url': 'https://preview.redd.it/y1sx8mgd0zt31.jpg?width=216&amp;crop=smart&amp;auto=webp&amp;s=2c5b58d390f2233cf49038567233217797c9fb9e', 'width': 216, 'height': 216}, {'url': 'https://preview.redd.it/y1sx8mgd0zt31.jpg?width=320&amp;crop=smart&amp;auto=webp&amp;s=2d014aa3018379dd3667a443dc5aa85e2924fe2e', 'width': 320, 'height': 320}, {'url': 'https://preview.redd.it/y1sx8mgd0zt31.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=249b61445477d508306c1dc4a78292d471ef86a0', 'width': 640, 'height': 640}, {'url': 'https://preview.redd.it/y1sx8mgd0zt31.jpg?width=960&amp;crop=smart&amp;auto=webp&amp;s=0b2a0ec1a874589932c652f4d0be8021f6c25150', 'width': 960, 'height': 960}], 'variants': {}, 'id': 'ew--HC5O6E8g5MqP94Eps6djrpTA1WVo_ZepcJU6gj8'}], 'enabled': True}",6,False,,,False,28,,{},,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://b.thumbs.redditmedia.com/mLW28JefLjURhm3rVkZTXJ3gio_lx1jJ88Milauv9ec.jpg,140.0,140.0,"All FPL bonus points have been added, so here’s the final Dream Team for GW9.",0,28,https://i.redd.it/y1sx8mgd0zt31.jpg,[],,False,all_ads,6
6,[],True,,,False,Sad_Weed,,,[],,22.0,dark,text,t2_14asau,False,[],,,False,False,,False,,False,1571730000.0,1571702000.0,,,i.redd.it,0,False,0,{},False,True,dl9727,False,False,False,True,True,False,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dl9727,False,3,0,,False,all_ads,/r/FantasyPL/comments/dl9727/also_consider_lundstram_as_a_potential_starter/,False,image,"{'images': [{'source': {'url': 'https://preview.redd.it/v3pv564jezt31.jpg?auto=webp&amp;s=281e748c9eeb3e388592be34ee98da5b789f6426', 'width': 750, 'height': 1012}, 'resolutions': [{'url': 'https://preview.redd.it/v3pv564jezt31.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=472ba5d52b6f94556eab7323a7f0c8c444133d5c', 'width': 108, 'height': 145}, {'url': 'https://preview.redd.it/v3pv564jezt31.jpg?width=216&amp;crop=smart&amp;auto=webp&amp;s=387cb3a9e507b2944cae2006873b23ebdb9eb1e6', 'width': 216, 'height': 291}, {'url': 'https://preview.redd.it/v3pv564jezt31.jpg?width=320&amp;crop=smart&amp;auto=webp&amp;s=b833a45a9fc73ac780feff84e1ef12ca2944e602', 'width': 320, 'height': 431}, {'url': 'https://preview.redd.it/v3pv564jezt31.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=27d19530dcbec5bf59e8eedb17cfb1e6f5814321', 'width': 640, 'height': 863}], 'variants': {}, 'id': 'iLloaRx8ELPL7fqrv1v3TR4zD2Cz3o2F-Ob0Pvgf2EM'}], 'enabled': True}",6,False,,,False,20,,{},,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://b.thumbs.redditmedia.com/6c_1A-soEXtRWmiVWWtR1qi4qpBBzZOqGvgc6rG0KEA.jpg,140.0,140.0,Also consider Lundstram as a potential starter,0,20,https://i.redd.it/v3pv564jezt31.jpg,[],,False,all_ads,6
7,[],False,,,False,strawberrygenius7,,,[],,48.0,dark,text,t2_37lcnpex,False,[],,,False,False,,False,,False,1571724000.0,1571695000.0,,,twitter.com,0,False,0,{},False,False,dl7lx7,False,False,False,False,True,False,False,,,new,[],,News,dark,text,False,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BenDinnery/status/1186397405467611136', 'author_name': 'Ben Dinnery', 'height': 433, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt; ', 'author_url': 'https://twitter.com/BenDinnery', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","{'content': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt; ', 'width': 350, 'scrolling': False, 'height': 433}",,False,,,,[],t3_dl7lx7,False,5,0,,False,all_ads,/r/FantasyPL/comments/dl7lx7/maddison_is_still_struggling_with_an_ankle/,False,link,"{'images': [{'source': {'url': 'https://external-preview.redd.it/tNN9Hw2FG4WjMWEqedU_nTYerI4Dl1y1dFA7esy_iYk.jpg?auto=webp&amp;s=0d47ca746baa137130e9597cacc1b3de1435c090', 'width': 140, 'height': 89}, 'resolutions': [{'url': 'https://external-preview.redd.it/tNN9Hw2FG4WjMWEqedU_nTYerI4Dl1y1dFA7esy_iYk.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=127168861570a844e3d08410d147ca5c1a0d8a28', 'width': 108, 'height': 68}], 'variants': {}, 'id': 'tE-1XxmzjJVL7K98kZOcoihH4uQgRaNqPsbf0nuHB3A'}], 'enabled': False}",6,False,,,False,25,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BenDinnery/status/1186397405467611136', 'author_name': 'Ben Dinnery', 'height': 433, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt; ', 'author_url': 'https://twitter.com/BenDinnery', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","{'content': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt; ', 'width': 350, 'scrolling': False, 'media_domain_url': 'https://www.redditmedia.com/mediaembed/dl7lx7', 'height': 433}",,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://b.thumbs.redditmedia.com/88YibJdgskmqybkIaBypcK3aNVMx6MM4PAIz3QD3_3M.jpg,89.0,140.0,"Maddison is still struggling with an ankle problem. ""He did really well to put himself out there [vs Burnley]. He’s hardly trained,"" said Brendan Rodgers. ""He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well."" [Ben Dinnery]",0,25,https://twitter.com/BenDinnery/status/1186397405467611136,[],,False,all_ads,6
8,[],True,,,False,QuickyGaming,,,[],,1.0,dark,text,t2_27qhn0,False,[],,,False,False,,False,,False,1571731000.0,1571702000.0,,,self.FantasyPL,0,1.5717e+09,0,{},False,True,dl996t,False,False,False,False,True,True,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dl996t,False,5,0,,False,all_ads,/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,False,,,6,False,,,False,16,,{},"These calculations will be really long, there's no real purpose in it other than to be a filler so this post doesn't get removed. Also the bonus points might have been different but I'm not sure how to calculate them.\n\nNote: Lundstram is currently on 45 points in his current, miscategorized state.\n\nGW 1: BOU (A) - 2 points for playing 77 minutes, 1 bonus point = 3 points\n\nGW 2: CRY (H) - 2 points for playing 90 minutes, 5 points for 1 goal scored, 1 point for a clean sheet, -1 point for a yellow card, 3 bonus points = 10 points\n\nGW 3: LEI (H) - 2 points for playing 90 minutes, -1 point for a yellow card = 1 point\n\nGW 4: CHE (A) - 2 points for playing 90 minutes = 2 points\n\nGW 5: SOU (H) - 2 points for playing 77 minutes = 2 points\n\nGW 6: EVE (A) - 2 points for playing 90 minutes, 3 points for 1 assist, 1 point for a clean sheet, 3 bonus points = 9 points\n\nGW 7: LIV (H) - 2 points for playing 90 minutes = 2 points\n\nGW 8: WAT (A) - 2 points for playing 90 minutes, 1 point for a clean sheet, -1 point for a yellow card = 2 points\n\nGW 9: ARS (H) - 2 points for playing 90 minutes, 1 point for a clean sheet = 3 points\n\n3 + 10 + 1 + 2 + 2 + 9 + 2 + 2 + 3 = 34 points.\n\n34 points would still make him one of the best midfield bench fodders available.\n\n**TL:DR Lundstram would have had 34 points if he was categorized as a midfielder.**","&lt;!-- SC_OFF --&gt;&lt;div class=""md""&gt;&lt;p&gt;These calculations will be really long, there&amp;#39;s no real purpose in it other than to be a filler so this post doesn&amp;#39;t get removed. Also the bonus points might have been different but I&amp;#39;m not sure how to calculate them.&lt;/p&gt;\n\n&lt;p&gt;Note: Lundstram is currently on 45 points in his current, miscategorized state.&lt;/p&gt;\n\n&lt;p&gt;GW 1: BOU (A) - 2 points for playing 77 minutes, 1 bonus point = 3 points&lt;/p&gt;\n\n&lt;p&gt;GW 2: CRY (H) - 2 points for playing 90 minutes, 5 points for 1 goal scored, 1 point for a clean sheet, -1 point for a yellow card, 3 bonus points = 10 points&lt;/p&gt;\n\n&lt;p&gt;GW 3: LEI (H) - 2 points for playing 90 minutes, -1 point for a yellow card = 1 point&lt;/p&gt;\n\n&lt;p&gt;GW 4: CHE (A) - 2 points for playing 90 minutes = 2 points&lt;/p&gt;\n\n&lt;p&gt;GW 5: SOU (H) - 2 points for playing 77 minutes = 2 points&lt;/p&gt;\n\n&lt;p&gt;GW 6: EVE (A) - 2 points for playing 90 minutes, 3 points for 1 assist, 1 point for a clean sheet, 3 bonus points = 9 points&lt;/p&gt;\n\n&lt;p&gt;GW 7: LIV (H) - 2 points for playing 90 minutes = 2 points&lt;/p&gt;\n\n&lt;p&gt;GW 8: WAT (A) - 2 points for playing 90 minutes, 1 point for a clean sheet, -1 point for a yellow card = 2 points&lt;/p&gt;\n\n&lt;p&gt;GW 9: ARS (H) - 2 points for playing 90 minutes, 1 point for a clean sheet = 3 points&lt;/p&gt;\n\n&lt;p&gt;3 + 10 + 1 + 2 + 2 + 9 + 2 + 2 + 3 = 34 points.&lt;/p&gt;\n\n&lt;p&gt;34 points would still make him one of the best midfield bench fodders available.&lt;/p&gt;\n\n&lt;p&gt;&lt;strong&gt;TL:DR Lundstram would have had 34 points if he was categorized as a midfielder.&lt;/strong&gt;&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;",True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,self,,,Lundstram would have scored points 34 if he was correctly categorized as a midfielder.,0,16,https://www.reddit.com/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,[],,False,all_ads,6
9,[],True,,,False,cguinnesstout,,points,[],,26.0,dark,text,t2_1u8ws31r,False,[],,,False,False,,False,,False,1571710000.0,1571681000.0,,,i.redd.it,0,False,0,{},False,False,dl4bne,False,False,False,True,True,False,False,,,,[],,,dark,text,False,,{},,False,,,,[],t3_dl4bne,False,33,0,,False,all_ads,/r/FantasyPL/comments/dl4bne/sheffield_utd_xi/,False,image,"{'images': [{'source': {'url': 'https://preview.redd.it/c9voahepoxt31.jpg?auto=webp&amp;s=309bd1b1a365a23acaeb02ce1231ca5e28663ff5', 'width': 800, 'height': 800}, 'resolutions': [{'url': 'https://preview.redd.it/c9voahepoxt31.jpg?width=108&amp;crop=smart&amp;auto=webp&amp;s=4fa11ea1281d075494be4d52b0658f6a0f84634a', 'width': 108, 'height': 108}, {'url': 'https://preview.redd.it/c9voahepoxt31.jpg?width=216&amp;crop=smart&amp;auto=webp&amp;s=32cd2b0d471e7664988cb09145d33a739fdbded0', 'width': 216, 'height': 216}, {'url': 'https://preview.redd.it/c9voahepoxt31.jpg?width=320&amp;crop=smart&amp;auto=webp&amp;s=5770e6730289b0b691b276f2d1b78f67302704b6', 'width': 320, 'height': 320}, {'url': 'https://preview.redd.it/c9voahepoxt31.jpg?width=640&amp;crop=smart&amp;auto=webp&amp;s=9051d3ae2afd1f2ea76891523d72239dfd140e3a', 'width': 640, 'height': 640}], 'variants': {}, 'id': 'snTL6s7gIuJeuWSyFw8Cz5XQzMfvQjE5apP9cIrEE_E'}], 'enabled': True}",6,False,,,False,42,,{},,,True,False,[],False,FantasyPL,t5_2snvr,r/FantasyPL,177754,public,,https://b.thumbs.redditmedia.com/wRDFiaFkt5a2-nVp-8XEb1wNYNvHbDc8iHFkS5reWUY.jpg,140.0,140.0,Sheffield Utd XI,0,42,https://i.redd.it/c9voahepoxt31.jpg,[],,False,all_ads,6


In [4]:
#Check if the columns tally - are there any features that are in the EPL subreddit but not in the FPL subreddit?
print('FPL columns: {} \nEPL columns: {}'.format(len(FPL_sample.columns), len(EPL_sample.columns)))

FPL columns: 103 
EPL columns: 103


In [5]:
#Find difference in features; are the differences important?
[n for n in EPL_sample.columns if n not in FPL_sample.columns]

[]

<b>Issues Observed</b>

There are a few issues with the data as it currently is:

<li>Redundant columns - there are over 100! Only the following fields will be kept: 'subreddit_name_prefixed', 'author', 'title', 'selftext', 'domain', 'link_flair_text', 'created', 'media', 'media_embed', 'url', 'permalink', 'num_comments', 'score', 'ups'. These are the fields with data relevant to our analysis, and will be further elaborated on in the data dictionary.</li>
<br>

<li> Missing comments; will have to create a separate process to extract comments from each post </li>
<br>

<li> Some posts don't have text (null values) as they're links to websites or image posts. There are two ways to resolves this:
    for links, there are excerpts we can use in place of selftext, and for images, we can use OCR.</li>
 <br>   
    
<li> Column numbers don't tally for the two subreddits, but the two/three additional columns are not relevant to our analysis so we will drop this (crosspost_parent, crosspost_parent_list, media_metadata). The additional columns (e.g. media_metadata) depend on the content posted in linkposts.

### 5.2 Data Gathering Proper  <a id='DataProper'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

<font color="red"><b>LONG RUNTIME WARNING - the code below takes 15 - 30 minutes to run depending on loop settings!
    
Set comment_count to 1 to pull only 1 comment if testing code, as this cuts down on run time.</b></font>

In [None]:
#This code takes quite some time to run - between 15-30 minutes depending on the scope.
#Args: total_loops - number of pages to crawl (if each page has 100, run total_loops = 10)
#      comment_count - number of comments per post to extract
#      filename - suffix to use when saving intermediate files to local drive
    
def crawl_reddit(link, total_loops=1, comment_count=5, filename = 'subreddit'):
    
    #Set of fake names to use under 'user-agent' to access reddit
    fake_name = ['Horse', 'Pony', 'Chihuahua', 'Donkey', 'Rabbit', 'Chicken', 'Duck']
    
    #Fields to keep in the output dataframe
    fields = ['subreddit_name_prefixed', 'author', 'title', 'selftext', 'domain', 'link_flair_text', 'created', 
              'media', 'media_embed', 'url', 'permalink', 'num_comments', 'score', 'ups']
    next_link = ""
    
    for i in range(total_loops):
        #Code for first loop is different, because it initiates a new dataframe and creates new variables.
        if i == 0:
            print('Starting crawl of page {}'.format(i+1))
            url = link
            res = requests.get(url, headers={'User-agent': 
                                             fake_name[np.random.randint(6)]+str(np.random.randint(10))})
            post_dict = res.json()
            next_link = post_dict['data']['after']   #retrieves 'after' key from json needed to get next n posts
            posts = [p['data'] for p in post_dict['data']['children']]
            df = pd.DataFrame(posts)[fields]
            df['comments'] = '' #initialize an empty series to put values into
           
            #Extracts top n comments from each post
            for count, post in enumerate(df['permalink']):
                res = requests.get('https://reddit.com'+post+'.json', headers={'User-agent': 
                                                                               fake_name[np.random.randint(6)]+str(np.random.randint(10))})
                comment_dict = res.json()
                comments = []

                #if number of comments is less than comment_count, return number of comments, otherwise return comment_count

                for comment in range(0, len(comment_dict[1]['data']['children'])-1 if len(comment_dict[1]['data']['children'])-1 < comment_count-1 else comment_count-1):
                    comments += [comment_dict[1]['data']['children'][comment]['data']['body']]
                    df.at[count, 'comments'] = comments
                                  
        else:
            #Breaks loop if there is no 'after' key in the json (i.e. no more posts)
            if next_link is None:
                print('Crawl terminated after {} loops: no more posts!'.format(i+1))
                return df
                break
            
            #Runs if it's not the first page. Creates a new placeholder dataframe to append to the first post dataframe
            else:
                print('Starting crawl of page {}'.format(i+1))
                url = link + '&after=' + next_link
                res = requests.get(url, headers={'User-agent': fake_name[np.random.randint(6)]+str(np.random.randint(10))})
                post_dict = res.json()
                next_link = post_dict['data']['after']
                posts = [p['data'] for p in post_dict['data']['children']]
                df_new = pd.DataFrame(posts)[fields]
                
                #Extract top n comments from each post
                for count, post in enumerate(df_new['permalink']):
                    res = requests.get('https://reddit.com'+post+'.json', headers={'User-agent': fake_name[np.random.randint(6)]+str(np.random.randint(10))})
                    comment_dict = res.json()
                    comments = []

                    #if number of comments is less than comment_count, return number of comments, otherwise return comment_count

                    for comment in range(0, len(comment_dict[1]['data']['children'])-1 if len(comment_dict[1]['data']['children'])-1 < comment_count-1 else comment_count-1):
                        comments += [comment_dict[1]['data']['children'][comment]['data']['body']]
                        df_new.at[count, 'comments'] = comments          

                df = df.append(df_new, ignore_index = True)
        
        #Saves a temporary file in the raw_data folder in case the function terminates or your computer explodes
        df.to_csv(r".\raw_data\reddit_"+ filename + str(i+1)+".csv")
        print('file saved!')
        
    return df

In [7]:
df_fpl = crawl_reddit('https://www.reddit.com/r/FantasyPL/new.json?limit=100', total_loops = 10, 
                      comment_count = 10, filename = 'FPL')

Starting crawl of page 1
file saved!
Starting crawl of page 2
file saved!
Starting crawl of page 3
file saved!
Starting crawl of page 4
file saved!
Starting crawl of page 5
file saved!
Starting crawl of page 6
file saved!
Starting crawl of page 7
file saved!
Starting crawl of page 8
file saved!
Starting crawl of page 9
file saved!
Starting crawl of page 10
file saved!


In [8]:
df_epl = crawl_reddit('https://www.reddit.com/r/PremierLeague/new.json?limit=100', total_loops = 10, 
                      comment_count = 10, filename = 'EPL')

Starting crawl of page 1
file saved!
Starting crawl of page 2
file saved!
Starting crawl of page 3
file saved!
Starting crawl of page 4
file saved!
Starting crawl of page 5
file saved!
Starting crawl of page 6
file saved!
Starting crawl of page 7
file saved!
Starting crawl of page 8
file saved!
Starting crawl of page 9
file saved!
Starting crawl of page 10
file saved!


In [9]:
#Save files to local drive for cleaning/EDA:
df_fpl.to_csv(r".\datasets\reddit_FPL_1_v2.csv")
df_epl.to_csv(r".\datasets\reddit_EPL_1_v2.csv")

# 6. Data Dictionary  <a id='Data Dictionary'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

There are over 100 columns in the data dump the Reddit API provides.
The dictionary below covers the relevant fields extracted from the API, and the engineered fields which will be covered in later sections.

|Feature|Type|Description|Analysis|
|---|---|---|---|
|subreddit_name_prefixed|string/object|Name of the subreddit| Used as target for classification|
|author|string/object|User name of who made the main post| |
|title|string/object|Title of the post| To be included in bag of words|
|selftext|string/object|A self post is a text post, instead of a link post. A link post directs to an external link. A self post is nothing but the text you enter.| To be included in bag of words|
|domain|string/object|For link posts, domain captures the website the link belongs to. For self posts, the domain captures the subreddit the post was made in. |Used to differentiate between link post and self post. Self posts have a domain beginning with 'self'|
|link_flair_text|string/object|What category the post falls under - differs from subreddit to subreddit|Can run a barplot to get a sense of categories in the subreddit|
|created|float|When the post was made|Will be mapped to return date only; remove time data|
|media|string/object|For link posts, the type of link attached (e.g. Twitter, Facebook, news site)|Can extract tweet content from this field using a regex|
|media_embed|string/object|Description of link post| |
|url|string/object|External link, for link posts|Run through OCR if image|
|permalink|string/object|Permanent link to post|Used to loop through post for comments in crawler|
|num_comments|int64|Number of comments made on the post||
|score|int64|Number of upvotes - number of downvotes||
|ups|int64|Number of upvotes| |
|comments|string/object|List of up to 10 comments extracted from the comments made in the post| Engineered field - To be included in bag of words|

# 7. Evaluating Shape and Missingness  <a id='Shape Missingness'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [60]:
df_fpl = pd.read_csv(r".\datasets\reddit_FPL_1.csv")

In [61]:
df_epl = pd.read_csv(r".\datasets\reddit_EPL_1.csv")

###   7.1 Quick overview of FPL dataframe  <a id='Overview FPL'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [62]:
df_fpl.head(3)

Unnamed: 0.1,Unnamed: 0,subreddit_name_prefixed,author,title,selftext,domain,link_flair_text,created,media,media_embed,url,permalink,num_comments,score,ups,comments
0,0,r/FantasyPL,DoTheRax,Average of GW 9 is higher than the average of GW 8.,"Can someone please explain to me how this gw's average is higher than the previous gw? Despite highly owned players like Mane and Abraham exploding. Also despite returns from Burnley defenders, Mount, Lundstram, a cheeky TAA bonus in the previous gameweek, we have a higher gw average this week with literally no one who is significantly owned returning/exploding except a 5, 6, and 8 from Sterling, Lundstram and Robbo. I really don't understand the maths behind this.",self.FantasyPL,,1571732000.0,,{},https://www.reddit.com/r/FantasyPL/comments/dl9liy/average_of_gw_9_is_higher_than_the_average_of_gw_8/,/r/FantasyPL/comments/dl9liy/average_of_gw_9_is_higher_than_the_average_of_gw_8/,10,2,2,"['The difference is only one point though, both weekends weren’t great.', 'Aguero and salah owners got screwed over. Especially the ones who C and VC them', ""You can see the players that contributed to the average the most using this https://www.livefpl.net/Overall. It looks like a case of consistent 2s and 3s with some prominent clean sheets: Everton ( Digne, Mina) City( Ederson) Chelsea ( Azpi, Tomori) , Bourenmouth ( Ake, Rico) and Shiefield ( Lundy). Add to that Robertson's return and it adds up quickly."", 'Aguero and Salah owners missing out and the subsequent bench fodder being subbed in. Last week everyone expected to play, played. This week, those 2 were expected and the bench subs made up for it in a small way.', ""I got 53 :-)\n\nIf only I'd captained CHO ahead of Tammy, what was I thinking?""]"
1,1,r/FantasyPL,QuickyGaming,Lundstram would have scored points 34 if he was correctly categorized as a midfielder.,"These calculations will be really long, there's no real purpose in it other than to be a filler so this post doesn't get removed. Also the bonus points might have been different but I'm not sure how to calculate them.\n\nNote: Lundstram is currently on 45 points in his current, miscategorized state.\n\nGW 1: BOU (A) - 2 points for playing 77 minutes, 1 bonus point = 3 points\n\nGW 2: CRY (H) - 2 points for playing 90 minutes, 5 points for 1 goal scored, 1 point for a clean sheet, -1 point for a yellow card, 3 bonus points = 10 points\n\nGW 3: LEI (H) - 2 points for playing 90 minutes, -1 point for a yellow card = 1 point\n\nGW 4: CHE (A) - 2 points for playing 90 minutes = 2 points\n\nGW 5: SOU (H) - 2 points for playing 77 minutes = 2 points\n\nGW 6: EVE (A) - 2 points for playing 90 minutes, 3 points for 1 assist, 1 point for a clean sheet, 3 bonus points = 9 points\n\nGW 7: LIV (H) - 2 points for playing 90 minutes = 2 points\n\nGW 8: WAT (A) - 2 points for playing 90 minutes, 1 point for a clean sheet, -1 point for a yellow card = 2 points\n\nGW 9: ARS (H) - 2 points for playing 90 minutes, 1 point for a clean sheet = 3 points\n\n3 + 10 + 1 + 2 + 2 + 9 + 2 + 2 + 3 = 34 points.\n\n34 points would still make him one of the best midfield bench fodders available.\n\n**TL:DR Lundstram would have had 34 points if he was categorized as a midfielder.**",self.FantasyPL,,1571731000.0,,{},https://www.reddit.com/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,5,16,16,"['I\'m an idiot, I realized I just put ""points 34"" instead of ""34 points"".', 'If you need to write fodder anyway maybe put how many points he has as is so we can compare?\n\nEdit: 45, cool - thought the gap would be bigger!']"
2,2,r/FantasyPL,Sad_Weed,Also consider Lundstram as a potential starter,,i.redd.it,,1571730000.0,,{},https://i.redd.it/v3pv564jezt31.jpg,/r/FantasyPL/comments/dl9727/also_consider_lundstram_as_a_potential_starter/,3,19,19,"['You mean attacking midfielder Lundstram?', 'He came off the bench to bail me out when Aguero was benched!']"


In [63]:
df_fpl.shape

(990, 16)

In [64]:
df_fpl.dtypes

Unnamed: 0                   int64
subreddit_name_prefixed     object
author                      object
title                       object
selftext                    object
domain                      object
link_flair_text             object
created                    float64
media                       object
media_embed                 object
url                         object
permalink                   object
num_comments                 int64
score                        int64
ups                          int64
comments                    object
dtype: object

### 7.2 Quick overview of EPL dataframe  <a id='Overview ePL'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [65]:
df_epl.head(3)

Unnamed: 0.1,Unnamed: 0,subreddit_name_prefixed,author,title,selftext,domain,link_flair_text,created,media,media_embed,url,permalink,num_comments,score,ups,comments
0,0,r/PremierLeague,AngelrentiriaChivas,Lil peep wearing premier league clothing!,,i.redd.it,Discussion,1571734000.0,,{},https://i.redd.it/c9wtp9h7pzt31.jpg,/r/PremierLeague/comments/dl9z8s/lil_peep_wearing_premier_league_clothing/,2,0,0,['Weird post']
1,1,r/PremierLeague,misomiso82,NOOB question - why are Leicester city doing so well this season so far - is it the fixture list or are they playing well?,"Have only been following a little yet am just a little surprised to see them so high. Just wandering if it's been a good fixture list or if they have a new player, or are doing something tactically etc.\n\n&amp;#x200B;\n\nTy for any info",self.PremierLeague,Question,1571723000.0,,{},https://www.reddit.com/r/PremierLeague/comments/dl7kbt/noob_question_why_are_leicester_city_doing_so/,/r/PremierLeague/comments/dl7kbt/noob_question_why_are_leicester_city_doing_so/,8,1,1,"['Just playing good football', 'cuz 3 of so called Top 6 are shit atm. another one may fall under this category as the season continues.', 'Their fixture list has been quite tough actually. They’ve played four of the big six plus Wolves and Sheffield so far. I wouldn’t say this is too big of a shock, they’ve been playing incredibly well and have a squad that looks t be good enough to sustain this form, especially with Manchester United, Tottenham, and Arsenal struggling so far this season.', ""They have a very good team! Vardy, Maddison, Tielemans are a good trio and they are solid defensively. They've already played 4 of the 'big 6' in Liverpool, Chelsea, Man Utd and Tottenham in only 9 games, and Wolves are no pushovers too. If anything, they've had a fairly tough run"", 'Every club in the Premier League has 11 good players.... The team that wins the League has 22 good players so they can deal with the inevitable injuries. Time will tell.', 'They are good.', 'They have a young hungry and talented squad with some old quality keeping things ticking, and Brendan is a good manager']"
2,2,r/PremierLeague,Wuz314159,"What if, and hear me out.... Arsène wasn't the problem at Arsenal?",,self.PremierLeague,Discussion,1571720000.0,,{},https://www.reddit.com/r/PremierLeague/comments/dl6vjb/what_if_and_hear_me_out_arsène_wasnt_the_problem/,/r/PremierLeague/comments/dl6vjb/what_if_and_hear_me_out_arsène_wasnt_the_problem/,5,39,39,"['I honestly never thought he was.', 'I feel like it\'s a mix. Look at early career Arsene and his defence was prime. Honestly some of the best. Tony Adams still most Epl clean sheets to this date, Sol was a monster, Viera was a boss on midfield. In some way everything people laugh at Arsenal for today they were not in the past. Now after the Emirates Stadium he worked with academy and creating quality teams without big money when it became much more important. 08/09 being the Pinnacle but most players left in following seasons. I feel like he kept a lot of loyalty in the academy players because they all bled the red. Wilshire, Walcott, Ramsey, Diaby come to mind. It his later seasons he signed bigger names but at that point we had fallen behind the curve. Now Arsenal is basically playing catch-up but without a distinguished identity like under Wegner. Arsenal often lost ""trying to pass it into the goal"". I don\'t think u wld ever associate that with Arsenal now. Just my 2 pence. Love to hear other views', ""Also I understand the Brighton flair. My mum's team there."", ""My opinion, he somewhat got Arsenal to win titles and trophies and etc. He didn't seem that bad of a manager, nor never was the problem at arsenal.""]"


In [66]:
df_epl.shape

(997, 16)

In [67]:
df_epl.dtypes

Unnamed: 0                   int64
subreddit_name_prefixed     object
author                      object
title                       object
selftext                    object
domain                      object
link_flair_text             object
created                    float64
media                       object
media_embed                 object
url                         object
permalink                   object
num_comments                 int64
score                        int64
ups                          int64
comments                    object
dtype: object

#### 7.2.1 Filling empty cells with NaN 

> Some of the fields are empty, but not filled with NaN. I will replace any cells which = ' ' with np.nan to properly evaluate missingness. Importantly, the empty selftext fields do not show up as NaN, so running isnull() would give the wrong impression that all of EPL's posts are self text posts!

In [68]:
df_epl = df_epl.applymap(lambda x: np.nan if x == '' else x)
df_fpl = df_fpl.applymap(lambda x: np.nan if x == '' else x)

### 7.3 Measuring 'missingness'  <a id='Measuring Missingness'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [69]:
pd.DataFrame(df_fpl.isnull().sum(), columns = ['FPL']).transpose()

Unnamed: 0.1,Unnamed: 0,subreddit_name_prefixed,author,title,selftext,domain,link_flair_text,created,media,media_embed,url,permalink,num_comments,score,ups,comments
FPL,0,0,0,0,537,0,420,0,801,0,0,0,0,0,0,75


In [70]:
print(f'There are {df_fpl.isnull().sum().sum()} missing values!')

There are 1833 missing values!


In [71]:
pd.DataFrame(df_epl.isnull().sum(), columns = ['EPL']).transpose()

Unnamed: 0.1,Unnamed: 0,subreddit_name_prefixed,author,title,selftext,domain,link_flair_text,created,media,media_embed,url,permalink,num_comments,score,ups,comments
EPL,0,0,0,0,489,0,285,0,958,0,0,0,0,0,0,295


In [72]:
print(f'There are {df_epl.isnull().sum().sum()} missing values!')

There are 2027 missing values!


### 7.4 Analysis of Shape and  Missingness  <a id='Analysis Shape Missingness'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

<b>Overall Shape</b>

The number of posts from each subreddit extracted are close - 992 for FPL, and 996 for EPL.
This is a sufficient sample size for my analysis.

<b>Missingness</b>

For the majority of the columns, the missing values do not matter and can be filled with 0.
This applies to link_flair_text (implies the post doesn't have a flair), media (implies there's no media attachment), and comments (implies there are no comments for the post).

<b>Self posts vs Link posts</b>

The concern here is the selftext field - almost half the posts aren't self posts! These are link posts - which are basically a link to a website or an image. We could pull only self text posts for our analysis, but that could skew or bias our data.

Instead, we will fill these missing values with excerpts from the link, either through OCR or the link description in the following section.

# 8. Data Cleaning and Preprocessing  <a id='Cleaning and Preprocessing'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

### 8.1 Data Cleaning and Encoding Functions  <a id='Cleaning Functions'></a>

In [73]:
#Removes duplicates and prints out number of duplicates removed
def remove_duplicates(my_df):
    before_d = len(my_df)
    my_df.drop_duplicates(inplace=True)
    after_d = len(my_df)
    print(str(before_d-after_d) + " duplicates were removed!")

In [74]:
#Converts image to text string
def ocr_core(filename):
    #Try/except handles 404 and other similar errors
    try:
        response = requests.get(filename)
        img = Image.open(BytesIO(response.content))
        text = pytesseract.image_to_string(img) 
        return text
    except:
        return np.nan

In [106]:
#Dictionary below captures html tags and formatting that I want to remove
#Creates a pattern to apply in regex
rep = {'"ltr"': "", '&gt': "", '&lt' : "", '&amp;#39' : "", 'br&gt' : " ", "p&gt" : "", ";quot" : "",
      "&amp" : "", "amp;": "", "mdash" : "",  ';' : ""}
rep = dict((re.escape(k), v) for k, v in rep.items()) 
pattern = re.compile("|".join(rep.keys())) #Creates pattern based on rep dictionary

#Extracts tweet content from media column in dataframe (which is in a bastardized html format)
def extract_tweet(tweet):
    #Try/except in case of formatting problems in the raw data.
    try:
        twit_string = pattern.sub(lambda x: rep[re.escape(x.group(0))], tweet) #gets rid of html tags
        twit_string = re.search('dir=(.*)blockquote', twit_string) #finds main content of tweet based on html class
        twit_string = re.sub(r'(?s)(a href)(.*?)(\/a)', " ", twit_string.group(1)) #gets rid of embedded links
        return(twit_string)
    
    except:
        return np.nan

### 8.2 Filling Missing Values <a id='Filling Missing'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [75]:
remove_duplicates(df_fpl)

0 duplicates were removed!


In [76]:
remove_duplicates(df_epl)

0 duplicates were removed!


Changing datetime data in 'created' column to date only:

In [77]:
df_fpl['created'] = df_fpl['created'].apply(lambda x: datetime.datetime.utcfromtimestamp(x).date())

In [78]:
df_epl['created'] = df_epl['created'].apply(lambda x: datetime.datetime.utcfromtimestamp(x).date())

#### 8.2.1 Extracting image content using OCR for link posts <a id='Extract OCR'></a> 

<font color="red"><b>LONG RUNTIME WARNING - the code below takes up to 15 minutes to run depending on the number of images!
Pytesseract requires additional installations outside of a pip install. Guide to installation <a href='https://stackoverflow.com/questions/50951955/pytesseract-tesseractnotfound-error-tesseract-is-not-installed-or-its-not-i'>HERE</a></b></font>

In [79]:
#Convert all picture (png and jpg) attachments to text; return results to new column in dataframe
df_epl['ocr'] = [ocr_core(picture) if ('.png' in picture or '.jpg' in picture) else np.nan for picture in df_epl['url']]
df_fpl['ocr'] = [ocr_core(picture) if ('.png' in picture or '.jpg' in picture) else np.nan for picture in df_fpl['url']]

In [80]:
#Checking OCR output
df_epl[df_epl['ocr'].notnull()][['url', 'ocr']]

Unnamed: 0,url,ocr
0,https://i.redd.it/c9wtp9h7pzt31.jpg,
20,https://i.redd.it/uzrw1m09hjt31.jpg,"4 i i ‘ aa a AT\ni f ; Se\nZ | ty\nj =,\nveh %\nra ee\n° e ;\na - F\n="
21,https://i.redd.it/uqqe58fh0jt31.png,DECISION\nx NO GOAL y\n\nroared\n\n \n\nVQAV\n\n338 likes\n\nunofficial_saints Player of the Season so far...\n#SaintsFC #wemarchon
22,https://i.redd.it/m9tec2vfbit31.jpg,
27,https://i.redd.it/yzw4ttg0xbt31.png,Byejaynie | |\n\nun\n\n12\n\n14\n\nver\n\n@ worcesterciy\nG acer\n\n@ taceste cy\n@cresco\n\nMY conta reace\n@ aumiey\n\nG weston\n\n Toteshom Hora\nOG seurenout\n® Wotverhamston\nQ vores vie\n@ srerearrs\n\n© evoron a sbin\n\n Astonvita\n\nBp vevcosteu\n\n$ souhonen\n\n24\n\n16\n\n15\n\ncy\n\ncy\n\ncy\n\n2\n\n2\n\na\n\na\n\n10\n\nOpponent\nQ wonchester urd 4)\n\n& crystal Pooce 1)\n@ sreticta ura a)\nB aumiey es)\n\nGB Newcastle Utd (+)\n@ manchester city 4)\n@ tecester city 4)\nB ser00 (4)\n\n@H woriera es)\n\nGB Norwich city\n| southampton (+)\nEB wvernoal\n\nG arsenal ry\n\nEE aston vita ya)\n\n© sriahton &tbion (Hy\n\n@ ccretse0 (4)\n\n® Wolverhampton (4)\n\n \n\nBEB - B/S\n\nGacm\nGB veri ciy\nB woes\n\nGB wrest Har\n6 seumemouth (4)\n\n& Totenhom Hotspur (4)
32,https://i.redd.it/wz0xhhy6l8t31.jpg,Manchester United )\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\nLiverpool\nuf] DOMESTIC TITLE 20\n7 12\n8 LEAGUE CUP be)\n9 CHAMPIONS LEAGUE 3\n)L 3 UEFA CUP 1\nOo CUP WINNERS’ CUP 1\n16 COMMUNITY SHIELD 7 |\n| SUPER CUP Oo\nrt UEFA SUPER CUP 1\nfo) CLUB WORLD CUP 1\n\n \n\nINTERCONTINENTAL CUP
42,https://i.redd.it/qpg89gcg0ss31.jpg,Natt) INJURED As SPAIN DRAW IN\n1/0) 3)
46,https://i.redd.it/oxkxbsgeols31.jpg,PL XI\n\nOne player from each team\n\n§ Sa SEL)\noS\n©} )\n\na
62,https://i.redd.it/ujadxjxsnsr31.jpg,23:57 7 a >\n\n@ oddschecker.com (4\n\noddschecker\n\n \n\n+ Marco Silva\n\n+ Ole Gunnar Solskjaer\n\n5/1\n\n+ Mauricio Pochettino\n\nN\niS\nmy\n\n+ Steve Bruce\n\n+ Ralph Hasenhuttl\n\n+ Quique Sanchez Flores\n\n+ Roy Hodgson\n\n+ Daniel Farke\n\n+ Unai Emery\n\nHARE RA ees\nx\n=\n\nBet £10 Get £30\nT&Cs apply\n\n \n\n€A s6 t& ® @6\n\nHome Tips Free Bets Casino App\n\n< Q @
77,https://i.redd.it/417kh3fjqar31.jpg,ipicas\nPOINTS\nPere\nEy MAURICIO POCHETTINO\n7) PEP GUARDIOLA\nitt) ARSENE WENGER\n\naN aca\n(SINCE 96/10/2015)


In [81]:
#Number of entries created:
df_epl['ocr'].notnull().sum()

28

In [82]:
#Checking OCR output
df_fpl[df_fpl['ocr'].notnull()][['url', 'ocr']]

Unnamed: 0,url,ocr
2,https://i.redd.it/v3pv564jezt31.jpg,\ Sheffield United @\n@SheffieldUnited\n\n \n\nOur defence ()-)\n7 conceded (joint least in the #PL) &\n4 clean sheets (joint most) ©\n\n5 conceded in open play (least\namount in the @premierleague ) @\n\nHSUFC @
5,https://i.redd.it/y1sx8mgd0zt31.jpg,Pickford\n\n8\nAlonso O'Connell UTE Cas Steve Co...\n14 aed 11 9\ncoc ae [eYol dats David Sil... Bernard\n13 11 _—_———] 11 10\nIngs
8,https://i.redd.it/jksvmiwukyt31.jpg,"Statistics\n\n \n\n \n\n \n\n \n\n \n\n \n\nView Sorted by\nDefenders Y | | Total points v\nPlayer cost Sel Form Pts,\n. Lundstram\neG t SHU DEF 46 32.4% 43 45\n- ‘LEI DEF 7 a 7\n5 Alexander-Amold .\nx t UV DEF 72 30.6% 45 42\nRobertson 3\ni f uv DEF 69 19.4% 60 aa"
10,https://i.redd.it/c9voahepoxt31.jpg,Sa NU ce\n\nPOEMS Rare M1\nhau &\n\nmae\nMCGOLDRICK\n\n \n \n\nENDERSON\n\n \n\nSeem ima 4 Omelet fel ih rel BRE cILAM ¢- 1 Cod\n\nPs See if a _ era A
11,https://i.redd.it/52vhqweioxt31.png,"|\n\nFl\n\n|\n\n \n \n\noe\nCl = anand So\n‘CEBRLLOS, RI igh\nul L careTTE ‘a\nA rE )\n@aRS"
14,https://i.redd.it/qh6v6a82fwt31.jpg,Sun 1 Dec 16:30 GMT\n\nWed 4 Dec 20:15 GMT\n\nSat 7 Dec 12:30 GMT\n\nSun 15 Dec 14:00 GMT\n\nSat 21 Dec 12:30 GMT\n\n14\n\n15\n\n16\n\n17\n\n18\n\nLeicester (A)\nLiverpool (A)\nChelsea (H)\n\nMan Utd (A)\n\nArsenal (H)
27,https://i.redd.it/af6e0gfv3rt31.jpg,
30,https://i.redd.it/2au0zjdkuqt31.jpg,Premier ® League\n\nLé\nwe\nmw ft ital\n\nForm Gw9 Total Price TSB\n65 9pts 42pts £59 1.9%\n\nInfluence Creativity Threat ICT Index\n189.8 107.3 330.0 62.6\n\nThis Season\n\nGW OPP\n\nBUR (A) 3-0\n\nLIV (H) 1-2\n\nBHA (A) 0-2\nMUN (H) 1-1\nSHU (A) 0-1\nBOU (H) 1-3\nTOT (A) 2-1\nCHE (H) 1-4\n\nWOL (A) 1-1
34,https://i.redd.it/iedhbmboipt31.png,_ DE GEA\nBN eISSN\n_ LINDELOF\n\n~ MAGUIRE\n\n~ TUANZEBE\n\n_ MCTOMINAY\n_ FRED\n\n~ JAMES\n\n~ ANDREAS\ni YOUNG ©\n\n~ RASHFORD\n\n \n \n \n \n \n \n\nEe\nx\n\n>\n°°\nisi
35,https://i.redd.it/ekz39lw6ipt31.jpg,"&\nSTARTING Xa\n\nALISSON\nALEXANDER-ARNOLD\nMATIP\n\nVAN DIJK\nROBERTSON\nFABINHO\nHENDERSON\nWIJNALDUM\nMANE\n\nORIGI\nFIRMINO\n\n‘ADRIAN, LOVREN, MILNER,\nKEITA, OXLADE-CHAMBERLAIN,\nGOMEZ, LALLANA"


In [83]:
#Number of entries created:
df_fpl['ocr'].notnull().sum()

179

> Some of the output is gibberish - mostly because several of the images are tables/charts and cannot be adequately converted into a comprehensible string.

> It is also interesting to note that the FPL reddit makes far more image posts than the EPL reddit.

#### 8.2.2 Extracting twitter content for link posts <a id='Extract Tweet'></a> 

In [84]:
#Create new columns for tweet content if 'twitter' tag is in media field
df_fpl['tweet_text'] = [extract_tweet(text) if 'https://twitter' in text else np.nan for text in df_fpl['media'].astype(str)]
df_epl['tweet_text'] = [extract_tweet(text) if 'https://twitter' in text else np.nan for text in df_epl['media'].astype(str)]

In [85]:
df_epl[df_epl['tweet_text'].notnull()][['media', 'tweet_text']]

Unnamed: 0,media,tweet_text
14,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/City_Chief/status/1185969839308967937', 'author_name': 'City Chief', 'height': 527, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;📊| Liverpool (17) have failed to equal Manchester City&amp;#39;s record for most consecutive (18) wins in English top flight history.&lt;a href=""https://twitter.com/hashtag/olesatthewheel?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#olesatthewheel&lt;/a&gt; &lt;br&gt; &lt;a href=""https://t.co/Snq6KhgwcV""&gt;pic.twitter.com/Snq6KhgwcV&lt;/a&gt;&lt;/p&gt;&amp;mdash; City Chief (@City_Chief) &lt;a href=""https://twitter.com/City_Chief/status/1185969839308967937?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/City_Chief', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",📊| Liverpool (17) have failed to equal Manchester Citys record for most consecutive (18) wins in English top flight history. / City Chief (@City_Chief) /
76,"{'oembed': {'provider_url': 'https://twitter.com', 'url': 'https://twitter.com/SheffieldUnited/status/1181645436509130752', 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;CALL UP &lt;br&gt;&lt;br&gt;Blades number 1 &lt;a href=""https://twitter.com/deanhenderson?ref_src=twsrc%5Etfw""&gt;@deanhenderson&lt;/a&gt; receives call up to &lt;a href=""https://twitter.com/England?ref_src=twsrc%5Etfw""&gt;@England&lt;/a&gt;&lt;br&gt; &lt;br&gt;Well done Deano &lt;a href=""https://t.co/C948o74ugo""&gt;pic.twitter.com/C948o74ugo&lt;/a&gt;&lt;/p&gt;&amp;mdash; Sheffield United (@SheffieldUnited) &lt;a href=""https://twitter.com/SheffieldUnited/status/1181645436509130752?ref_src=twsrc%5Etfw""&gt;October 8, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_name': 'Sheffield United', 'height': 545, 'width': 350, 'version': '1.0', 'author_url': 'https://twitter.com/SheffieldUnited', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}, 'type': 'twitter.com'}",CALL UP Blades number 1 receives call up to Well done Deano / Sheffield United (@SheffieldUnited) /
197,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BookieInsidrs/status/1173184951778533379', 'author_name': 'Bookie Insiders', 'height': 625, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""und"" dir=""ltr""&gt;Goals + Assists (Top 5 Leagues)&lt;br&gt;&lt;br&gt;🇫🇮 Pukki 8&lt;br&gt;🇦🇷Agüero 8&lt;br&gt;🏴\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007fAbraham 7&lt;br&gt;🇵🇱 Lewy 7&lt;br&gt;🏴\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007f Sancho 6 &lt;br&gt;🇪🇸 Alcácer 6&lt;br&gt;🇫🇷 Benzema 5&lt;br&gt;🇳🇱 Depay 5&lt;br&gt;🇳🇬 Osimhen 5&lt;br&gt;🇫🇷 Dembele 5&lt;br&gt;🇧🇪 de Bruyne 5&lt;br&gt;🏴\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007fSterling 5&lt;br&gt;🇪🇬 Salah 5&lt;br&gt;🇸🇳 Mane 5&lt;br&gt;🇩🇪 Werner 5 &lt;a href=""https://t.co/0Q9GTsdLpb""&gt;pic.twitter.com/0Q9GTsdLpb&lt;/a&gt;&lt;/p&gt;&amp;mdash; Bookie Insiders (@BookieInsidrs) &lt;a href=""https://twitter.com/BookieInsidrs/status/1173184951778533379?ref_src=twsrc%5Etfw""&gt;September 15, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/BookieInsidrs', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",Goals + Assists (Top 5 Leagues) 🇫🇮 Pukki 8 🇦🇷Agüero 8 🏴\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007fAbraham 7 🇵🇱 Lewy 7 🏴\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007f Sancho 6 🇪🇸 Alcácer 6 🇫🇷 Benzema 5 🇳🇱 Depay 5 🇳🇬 Osimhen 5 🇫🇷 Dembele 5 🇧🇪 de Bruyne 5 🏴\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007fSterling 5 🇪🇬 Salah 5 🇸🇳 Mane 5 🇩🇪 Werner 5 / Bookie Insiders (@BookieInsidrs) /
209,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/OptaJoe/status/1172427209367150605', 'author_name': 'OptaJoe', 'height': 526, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;13/09 - On this day in 1978, the first ever all-English European Cup tie takes place, with league champions Nottingham Forest beating European Cup holders Liverpool 2-0 at the City Ground. Clash. &lt;a href=""https://t.co/vhIBd4oQp6""&gt;pic.twitter.com/vhIBd4oQp6&lt;/a&gt;&lt;/p&gt;&amp;mdash; OptaJoe (@OptaJoe) &lt;a href=""https://twitter.com/OptaJoe/status/1172427209367150605?ref_src=twsrc%5Etfw""&gt;September 13, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/OptaJoe', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","13/09 - On this day in 1978, the first ever all-English European Cup tie takes place, with league champions Nottingham Forest beating European Cup holders Liverpool 2-0 at the City Ground. Clash. / OptaJoe (@OptaJoe) /"
247,"{'oembed': {'provider_url': 'https://twitter.com', 'url': 'https://twitter.com/ChrisRWhiting/status/1164872556094087169', 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;And here&amp;#39;s (most of) the Premier League&amp;#39;s... &lt;a href=""https://twitter.com/hashtag/AFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#AFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/AVFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#AVFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/AFCB?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#AFCB&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/BHAFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#BHAFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/BurnleyFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#BurnleyFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/CPFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#CPFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/EvertonFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#EvertonFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/LFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/MCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#MCFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/MUFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#MUFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/NUFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#NUFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/NCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#NCFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/SUFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#SUFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/SaintsFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#SaintsFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/THFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#THFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/WHUFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#WHUFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/Wolves?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#Wolves&lt;/a&gt; &lt;a href=""https://t.co/lY6rPbABjN""&gt;pic.twitter.com/lY6rPbABjN&lt;/a&gt;&lt;/p&gt;&amp;mdash; Chris Whiting (@ChrisRWhiting) &lt;a href=""https://twitter.com/ChrisRWhiting/status/1164872556094087169?ref_src=twsrc%5Etfw""&gt;August 23, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_name': 'Chris Whiting', 'height': 767, 'width': 350, 'version': '1.0', 'author_url': 'https://twitter.com/ChrisRWhiting', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}, 'type': 'twitter.com'}",And heres (most of) the Premier Leagues... / Chris Whiting (@ChrisRWhiting) /
262,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/mia_huthi/status/1167724550857138176', 'author_name': 'ميا الحوثية', 'height': 430, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Mauricio &lt;a href=""https://twitter.com/hashtag/Pochettino?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#Pochettino&lt;/a&gt;: Tottenham know my commitment, I&amp;#39;ve rejected plenty of offers&lt;br&gt;&lt;br&gt;Mauricio Pochettino says he has showed commitment by staying at Tottenham despite numerous job offers &lt;a href=""https://t.co/F6cMSZjtH1""&gt;pic.twitter.com/F6cMSZjtH1&lt;/a&gt;&lt;/p&gt;&amp;mdash; ميا الحوثية (@mia_huthi) &lt;a href=""https://twitter.com/mia_huthi/status/1167724550857138176?ref_src=twsrc%5Etfw""&gt;August 31, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/mia_huthi', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Mauricio : Tottenham know my commitment, Ive rejected plenty of offers Mauricio Pochettino says he has showed commitment by staying at Tottenham despite numerous job offers / ميا الحوثية (@mia_huthi) /"
265,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/Everton/status/1167459460559900673', 'author_name': 'Everton', 'height': None, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;🔵 | All the best to Kevin Mirallas as he joins Royal Antwerp after 7️⃣ seasons with the Blues.&lt;br&gt;&lt;br&gt;Good luck, Kev. &lt;a href=""https://t.co/CbOXoKyiRY""&gt;pic.twitter.com/CbOXoKyiRY&lt;/a&gt;&lt;/p&gt;&amp;mdash; Everton (@Everton) &lt;a href=""https://twitter.com/Everton/status/1167459460559900673?ref_src=twsrc%5Etfw""&gt;August 30, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/Everton', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","🔵 | All the best to Kevin Mirallas as he joins Royal Antwerp after 7️⃣ seasons with the Blues. Good luck, Kev. / Everton (@Everton) /"
287,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/FabrizioRomano/status/1166456319009808384', 'author_name': 'Fabrizio Romano', 'height': 148, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Alexis Sanchez to Inter, here we go! Done deal with Man United ⚫️🔵 &lt;a href=""https://twitter.com/hashtag/transfers?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#transfers&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/MUFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#MUFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/Inter?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#Inter&lt;/a&gt;&lt;/p&gt;&amp;mdash; Fabrizio Romano (@FabrizioRomano) &lt;a href=""https://twitter.com/FabrizioRomano/status/1166456319009808384?ref_src=twsrc%5Etfw""&gt;August 27, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/FabrizioRomano', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Alexis Sanchez to Inter, here we go! Done deal with Man United ⚫️🔵 / Fabrizio Romano (@FabrizioRomano) /"
421,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/IkerCasillas/status/1160893314868142082', 'author_name': 'Iker Casillas', 'height': 128, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""und"" dir=""ltr""&gt;4-4-2&lt;/p&gt;&amp;mdash; Iker Casillas (@IkerCasillas) &lt;a href=""https://twitter.com/IkerCasillas/status/1160893314868142082?ref_src=twsrc%5Etfw""&gt;August 12, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/IkerCasillas', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",4-4-2/ Iker Casillas (@IkerCasillas) /
481,"{'oembed': {'provider_url': 'https://twitter.com', 'url': 'https://twitter.com/Arsenal/status/1159540036737687554', 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Welcome to The Arsenal, &lt;a href=""https://twitter.com/DavidLuiz_4?ref_src=twsrc%5Etfw""&gt;@DavidLuiz_4&lt;/a&gt; 😄&lt;br&gt;&lt;br&gt;🇧🇷 &lt;a href=""https://twitter.com/hashtag/BemvindoDavid?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#BemvindoDavid&lt;/a&gt;&lt;/p&gt;&amp;mdash; Arsenal (@Arsenal) &lt;a href=""https://twitter.com/Arsenal/status/1159540036737687554?ref_src=twsrc%5Etfw""&gt;August 8, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_name': 'Arsenal', 'height': None, 'width': 350, 'version': '1.0', 'author_url': 'https://twitter.com/Arsenal', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}, 'type': 'twitter.com'}","Welcome to The Arsenal, 😄 🇧🇷 / Arsenal (@Arsenal) /"


In [86]:
df_fpl[df_fpl['tweet_text'].notnull()][['media', 'tweet_text']]

Unnamed: 0,media,tweet_text
6,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BenDinnery/status/1186397405467611136', 'author_name': 'Ben Dinnery', 'height': 433, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/BenDinnery', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Maddison is still struggling with an ankle problem. He did really well to put himself out there [vs Burnley]. He’s hardly trained, said BR. He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well. / Ben Dinnery (@BenDinnery) /"
13,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/afcbournemouth/status/1186235758555758593', 'author_name': 'AFC Bournemouth', 'height': 127, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Congratulations, &lt;a href=""https://twitter.com/AaronRamsdale98?ref_src=twsrc%5Etfw""&gt;@AaronRamsdale98&lt;/a&gt; ❤️🖤&lt;/p&gt;&amp;mdash; AFC Bournemouth (@afcbournemouth) &lt;a href=""https://twitter.com/afcbournemouth/status/1186235758555758593?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/afcbournemouth', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Congratulations, ❤️🖤/ AFC Bournemouth (@afcbournemouth) fcbournemouth/status/1186235758555758593?ref_src=twsrc%5Etfw""October 21, 2019/a/"
21,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/premierleague/status/1186038650741710848', 'author_name': 'Premier League', 'height': 385, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Since the start of 2017/18 season, &lt;a href=""https://twitter.com/LFC?ref_src=twsrc%5Etfw""&gt;@LFC&lt;/a&gt;&amp;#39;s Andrew Robertson has registered 19 assists in the &lt;a href=""https://twitter.com/hashtag/PL?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#PL&lt;/a&gt; - more than any other defender... &lt;a href=""https://t.co/SQbFKWMZth""&gt;pic.twitter.com/SQbFKWMZth&lt;/a&gt;&lt;/p&gt;&amp;mdash; Premier League (@premierleague) &lt;a href=""https://twitter.com/premierleague/status/1186038650741710848?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/premierleague', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Since the start of 2017/18 season, s Andrew Robertson has registered 19 assists in the - more than any other defender... / Premier League (@premierleague) /"
29,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BenDinnery/status/1185996561714692096', 'author_name': 'Ben Dinnery', 'height': 223, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Salah is hoping to return in the UCL. “Mo was not ready, that’s how it is,” Klopp confirmed. “He couldn’t train with the team; I don’t know where it came from that everybody said he will play. There was pretty much no chance for today, maybe for Wednesday, we have to see.” &lt;a href=""https://twitter.com/hashtag/LFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LFC&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1185996561714692096?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/BenDinnery', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Salah is hoping to return in the UCL. “Mo was not ready, that’s how it is,” Klopp confirmed. “He couldn’t train with the team I don’t know where it came from that everybody said he will play. There was pretty much no chance for today, maybe for Wednesday, we have to see.” / Ben Dinnery (@BenDinnery) /"
37,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/utdreport/status/1185924288324558848', 'author_name': 'utdreport', 'height': 147, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Mohamed Salah has not arrived with the Liverpool team at Old Trafford &lt;a href=""https://twitter.com/hashtag/mulive?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#mulive&lt;/a&gt; [sky]&lt;/p&gt;&amp;mdash; utdreport (@utdreport) &lt;a href=""https://twitter.com/utdreport/status/1185924288324558848?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/utdreport', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",Mohamed Salah has not arrived with the Liverpool team at Old Trafford [sky]/ utdreport (@utdreport) /
38,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/RobDawsonESPN/status/1185923626102677504', 'author_name': 'Rob Dawson', 'height': 147, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;De Gea is starting for United today. Martial only fit enough for the bench.&lt;/p&gt;&amp;mdash; Rob Dawson (@RobDawsonESPN) &lt;a href=""https://twitter.com/RobDawsonESPN/status/1185923626102677504?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/RobDawsonESPN', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",De Gea is starting for United today. Martial only fit enough for the bench./ Rob Dawson (@RobDawsonESPN) /
40,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/PremLeaguePanel/status/1185900144765689857', 'author_name': 'Premier League Panel', 'height': 488, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;He was already a creative hub for them but Graham Potter’s possession-based philosophy is taking Pascal Gross’ game to the next level.&lt;br&gt;&lt;br&gt;He’s now creating a big chance every 107 minutes in the Premier League this season&lt;br&gt;&lt;br&gt;2017/18 - 190 minutes&lt;br&gt;2018/19 - 216 minutes&lt;br&gt;&lt;br&gt;Thriving &lt;a href=""https://twitter.com/hashtag/bhafc?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#bhafc&lt;/a&gt; &lt;a href=""https://t.co/tsOZWsthSN""&gt;pic.twitter.com/tsOZWsthSN&lt;/a&gt;&lt;/p&gt;&amp;mdash; Premier League Panel (@PremLeaguePanel) &lt;a href=""https://twitter.com/PremLeaguePanel/status/1185900144765689857?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/PremLeaguePanel', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",He was already a creative hub for them but Graham Potter’s possession-based philosophy is taking Pascal Gross’ game to the next level. He’s now creating a big chance every 107 minutes in the Premier League this season 2017/18 - 190 minutes 2018/19 - 216 minutes Thriving / Premier League Panel (@PremLeaguePanel) /
41,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/FPLTIPZ/status/1185902202663112704', 'author_name': 'FPLTIPZ', 'height': 0, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Hearing Salah not in the squad. Wasn’t spotted at the team hotel&lt;br&gt;&lt;br&gt;😔😔😔&lt;/p&gt;&amp;mdash; FPLTIPZ (@FPLTIPZ) &lt;a href=""https://twitter.com/FPLTIPZ/status/1185902202663112704?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/FPLTIPZ', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}",Hearing Salah not in the squad. Wasn’t spotted at the team hotel 😔😔😔/ FPLTIPZ (@FPLTIPZ) /
44,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/MiguelDelaney/status/1185870655474020354', 'author_name': 'Miguel Delaney', 'height': 166, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Told De Gea in with a chance of starting today, as well as Martial. Solskjaer has also been considering a back three.&lt;/p&gt;&amp;mdash; Miguel Delaney (@MiguelDelaney) &lt;a href=""https://twitter.com/MiguelDelaney/status/1185870655474020354?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/MiguelDelaney', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","Told De Gea in with a chance of starting today, as well as Martial. Solskjaer has also been considering a back three./ Miguel Delaney (@MiguelDelaney) /"
45,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/FFScout/status/1185874384826843136', 'author_name': 'Fantasy Football Scout', 'height': 336, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;🗣️ Hodgson: &amp;quot;Martin Kelly was in yesterday&amp;#39;s training session, he felt a slight twinge in his groin. As with Guaita, I&amp;#39;m not sure, I&amp;#39;m hoping it won&amp;#39;t be a long-term injury.&amp;quot;&lt;a href=""https://twitter.com/hashtag/FFScout?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#FFScout&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/FPL?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#FPL&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/GW9?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#GW9&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/CPFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#CPFC&lt;/a&gt; &lt;a href=""https://twitter.com/hashtag/FantasyPL?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#FantasyPL&lt;/a&gt;&lt;/p&gt;&amp;mdash; Fantasy Football Scout (@FFScout) &lt;a href=""https://twitter.com/FFScout/status/1185874384826843136?ref_src=twsrc%5Etfw""&gt;October 20, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/FFScout', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","🗣️ Hodgson: Martin Kelly was in yesterdays training session, he felt a slight twinge in his groin. As with Guaita, Im not sure, Im hoping it wont be a long-term injury. / Fantasy Football Scout (@FFScout) /"


#### 8.2.3 Filling missing values

In [87]:
#fill NA for r/fpl with 'ocr' values
df_fpl['selftext'] = df_fpl['selftext'].fillna(df_fpl['ocr'])

In [88]:
#fill NA for r/fpl with 'tweet_text' values
df_fpl['selftext'] = df_fpl['selftext'].fillna(df_fpl['tweet_text'])

In [89]:
df_fpl['selftext'].isnull().sum()

227

In [90]:
#fill NA for r/fpl with 'ocr' values
df_epl['selftext'] = df_epl['selftext'].fillna(df_epl['ocr'])

In [91]:
#fill NA for r/fpl with 'tweet_text' values
df_epl['selftext'] = df_epl['selftext'].fillna(df_epl['tweet_text'])

In [92]:
df_epl['selftext'].isnull().sum()

434

> Over 400 missing values were filled - the rest of the link posts are links to external websites, which may be problematic to crawl and derive values from. For these, we will just use the post title.

### 8.3 Cleaning, Tokenizing, and Lemmatizing Text  <a id='Cleaning Text'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [93]:
lemmatizer = WordNetLemmatizer() 
#
def clean_text(text):
    #Regex to leave only emojis, text, and numbers
    cleaned_text = re.sub(r":-?[()]|([^A-Za-z\U00010000-\U0010ffff])", 
                          lambda x: " " if x.group(1) else x.group(), text.lower())
    
    
    return cleaned_text

In [94]:
#Clean SELFTEXT
#Run regex to clean text, then tokenize, then lemmatize selftext field
df_fpl['selftext'] = df_fpl['selftext'].astype(str).apply(lambda row: ' '.join([lemmatizer.lemmatize(x) for x in nltk.word_tokenize(clean_text(row))]))
df_epl['selftext'] = df_epl['selftext'].astype(str).apply(lambda row: ' '.join([lemmatizer.lemmatize(x) for x in nltk.word_tokenize(clean_text(row))]))

In [95]:
#Clean COMMENTS
#Run regex to clean text, then tokenize, then lemmatize selftext field
df_fpl['comments'] = df_fpl['comments'].astype(str).apply(lambda row: ' '.join([lemmatizer.lemmatize(x) for x in nltk.word_tokenize(clean_text(row))]))
df_epl['comments'] = df_epl['comments'].astype(str).apply(lambda row: ' '.join([lemmatizer.lemmatize(x) for x in nltk.word_tokenize(clean_text(row))]))

In [96]:
#Clean TITLE
#Run regex to clean text, then tokenize, then lemmatize selftext field
df_fpl['title'] = df_fpl['title'].astype(str).apply(lambda row: ' '.join([lemmatizer.lemmatize(x) for x in nltk.word_tokenize(clean_text(row))]))
df_epl['title'] = df_epl['title'].astype(str).apply(lambda row: ' '.join([lemmatizer.lemmatize(x) for x in nltk.word_tokenize(clean_text(row))]))

In [108]:
df_combined = df_fpl.append(df_epl, ignore_index = True).drop('Unnamed: 0', axis = 1)
df_combined = df_combined.fillna('').replace('nan', '')
#Creates a new column with all text fields combined
df_combined['combined'] = df_combined['title'] + df_combined['selftext'] + df_combined['comments'] 
df_combined

Unnamed: 0,subreddit_name_prefixed,author,title,selftext,domain,link_flair_text,created,media,media_embed,url,permalink,num_comments,score,ups,comments,ocr,tweet_text,combined
0,r/FantasyPL,DoTheRax,average of gw is higher than the average of gw,can someone please explain to me how this gw s average is higher than the previous gw despite highly owned player like mane and abraham exploding also despite return from burnley defender mount lundstram a cheeky taa bonus in the previous gameweek we have a higher gw average this week with literally no one who is significantly owned returning exploding except a and from sterling lundstram and robbo i really don t understand the math behind this,self.FantasyPL,,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl9liy/average_of_gw_9_is_higher_than_the_average_of_gw_8/,/r/FantasyPL/comments/dl9liy/average_of_gw_9_is_higher_than_the_average_of_gw_8/,10,2,2,the difference is only one point though both weekend weren t great aguero and salah owner got screwed over especially the one who c and vc them you can see the player that contributed to the average the most using this http www livefpl net overall it look like a case of consistent s and s with some prominent clean sheet everton digne mina city ederson chelsea azpi tomori bourenmouth ake rico and shiefield lundy add to that robertson s return and it add up quickly aguero and salah owner missing out and the subsequent bench fodder being subbed in last week everyone expected to play played this week those were expected and the bench sub made up for it in a small way i got : - ) n nif only i d captained cho ahead of tammy what wa i thinking,,,average of gw is higher than the average of gwcan someone please explain to me how this gw s average is higher than the previous gw despite highly owned player like mane and abraham exploding also despite return from burnley defender mount lundstram a cheeky taa bonus in the previous gameweek we have a higher gw average this week with literally no one who is significantly owned returning exploding except a and from sterling lundstram and robbo i really don t understand the math behind thisthe difference is only one point though both weekend weren t great aguero and salah owner got screwed over especially the one who c and vc them you can see the player that contributed to the average the most using this http www livefpl net overall it look like a case of consistent s and s with some prominent clean sheet everton digne mina city ederson chelsea azpi tomori bourenmouth ake rico and shiefield lundy add to that robertson s return and it add up quickly aguero and salah owner missing out and the subsequent bench fodder being subbed in last week everyone expected to play played this week those were expected and the bench sub made up for it in a small way i got : - ) n nif only i d captained cho ahead of tammy what wa i thinking
1,r/FantasyPL,QuickyGaming,lundstram would have scored point if he wa correctly categorized a a midfielder,these calculation will be really long there s no real purpose in it other than to be a filler so this post doesn t get removed also the bonus point might have been different but i m not sure how to calculate them note lundstram is currently on point in his current miscategorized state gw bou a point for playing minute bonus point point gw cry h point for playing minute point for goal scored point for a clean sheet point for a yellow card bonus point point gw lei h point for playing minute point for a yellow card point gw che a point for playing minute point gw sou h point for playing minute point gw eve a point for playing minute point for assist point for a clean sheet bonus point point gw liv h point for playing minute point gw wat a point for playing minute point for a clean sheet point for a yellow card point gw ar h point for playing minute point for a clean sheet point point point would still make him one of the best midfield bench fodder available tl dr lundstram would have had point if he wa categorized a a midfielder,self.FantasyPL,,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,5,16,16,i m an idiot i realized i just put point instead of point if you need to write fodder anyway maybe put how many point he ha a is so we can compare n nedit cool thought the gap would be bigger,,,lundstram would have scored point if he wa correctly categorized a a midfielderthese calculation will be really long there s no real purpose in it other than to be a filler so this post doesn t get removed also the bonus point might have been different but i m not sure how to calculate them note lundstram is currently on point in his current miscategorized state gw bou a point for playing minute bonus point point gw cry h point for playing minute point for goal scored point for a clean sheet point for a yellow card bonus point point gw lei h point for playing minute point for a yellow card point gw che a point for playing minute point gw sou h point for playing minute point gw eve a point for playing minute point for assist point for a clean sheet bonus point point gw liv h point for playing minute point gw wat a point for playing minute point for a clean sheet point for a yellow card point gw ar h point for playing minute point for a clean sheet point point point would still make him one of the best midfield bench fodder available tl dr lundstram would have had point if he wa categorized a a midfielderi m an idiot i realized i just put point instead of point if you need to write fodder anyway maybe put how many point he ha a is so we can compare n nedit cool thought the gap would be bigger
2,r/FantasyPL,Sad_Weed,also consider lundstram a a potential starter,sheffield united sheffieldunited our defence conceded joint least in the pl clean sheet joint most conceded in open play least amount in the premierleague hsufc,i.redd.it,,2019-10-22,,{},https://i.redd.it/v3pv564jezt31.jpg,/r/FantasyPL/comments/dl9727/also_consider_lundstram_as_a_potential_starter/,3,19,19,you mean attacking midfielder lundstram he came off the bench to bail me out when aguero wa benched,\ Sheffield United @\n@SheffieldUnited\n\n \n\nOur defence ()-)\n7 conceded (joint least in the #PL) &\n4 clean sheets (joint most) ©\n\n5 conceded in open play (least\namount in the @premierleague ) @\n\nHSUFC @,,also consider lundstram a a potential startersheffield united sheffieldunited our defence conceded joint least in the pl clean sheet joint most conceded in open play least amount in the premierleague hsufcyou mean attacking midfielder lundstram he came off the bench to bail me out when aguero wa benched
3,r/FantasyPL,FMLFPL,fml fpl ep on to gw trusting the process,,fmlfpl.libsyn.com,Podcast,2019-10-22,"{'type': 'fmlfpl.libsyn.com', 'oembed': {'provider_url': 'https://www.libsyn.com', 'description': ""It's a little bit of frustration and darkness on today's episode but mostly it's just a simple job with Alon's fitness and the return to normal podding."", 'title': 'Ep. 213 - On to GW10 Trusting the Process', 'type': 'rich', 'author_name': 'FML FPL', 'height': 90, 'width': 600, 'html': '&lt;iframe class=""embedly-embed"" src=""https://cdn.embedly.com/widgets/media.html?src=%2F%2Fhtml5-player.libsyn.com%2Fembed%2Fepisode%2Fid%2F11731382%2Fheight%2F90%2Ftheme%2Fcustom%2Fthumbnail%2Fyes%2Fdirection%2Fforward%2Frender-playlist%2Fno%2Fcustom-color%2F499dff%2F&amp;url=http%3A%2F%2Ffmlfpl.libsyn.com%2Fep-213-on-to-gw10-trusting-the-process&amp;image=http%3A%2F%2Fstatic.libsyn.com%2Fp%2Fassets%2Fplatform%2Fwebsuite%2Fstitcher.png&amp;key=2aa3c4d5f3de4f5b9120b660ad850dc9&amp;type=text%2Fhtml&amp;schema=libsyn"" width=""600"" height=""90"" scrolling=""no"" frameborder=""0"" allow=""autoplay; fullscreen"" allowfullscreen=""true""&gt;&lt;/iframe&gt;', 'thumbnail_width': 300, 'version': '1.0', 'provider_name': 'Libsyn', 'thumbnail_url': 'https://i.embed.ly/1/image?url=http%3A%2F%2Fstatic.libsyn.com%2Fp%2Fassets%2Fplatform%2Fwebsuite%2Fstitcher.png&amp;key=b1e305db91cf4aa5a86b732cc9fffceb', 'thumbnail_height': 300, 'author_url': 'http://fmlfpl.com/'}}","{'content': '&lt;iframe class=""embedly-embed"" src=""https://cdn.embedly.com/widgets/media.html?src=%2F%2Fhtml5-player.libsyn.com%2Fembed%2Fepisode%2Fid%2F11731382%2Fheight%2F90%2Ftheme%2Fcustom%2Fthumbnail%2Fyes%2Fdirection%2Fforward%2Frender-playlist%2Fno%2Fcustom-color%2F499dff%2F&amp;url=http%3A%2F%2Ffmlfpl.libsyn.com%2Fep-213-on-to-gw10-trusting-the-process&amp;image=http%3A%2F%2Fstatic.libsyn.com%2Fp%2Fassets%2Fplatform%2Fwebsuite%2Fstitcher.png&amp;key=2aa3c4d5f3de4f5b9120b660ad850dc9&amp;type=text%2Fhtml&amp;schema=libsyn"" width=""600"" height=""90"" scrolling=""no"" frameborder=""0"" allow=""autoplay; fullscreen"" allowfullscreen=""true""&gt;&lt;/iframe&gt;', 'width': 600, 'scrolling': False, 'height': 90}",http://fmlfpl.libsyn.com/ep-213-on-to-gw10-trusting-the-process,/r/FantasyPL/comments/dl8njz/fml_fpl_ep_213_on_to_gw10_trusting_the_process/,3,10,10,it s a little bit of frustration and darkness on today s episode but mostly it s just a simple job with alon s fitness and the return to normal podding n n how d we do pre shu v ar in gw n top topic n listener question n gw captaincy amp our transfer n anus slap outro,,,fml fpl ep on to gw trusting the processit s a little bit of frustration and darkness on today s episode but mostly it s just a simple job with alon s fitness and the return to normal podding n n how d we do pre shu v ar in gw n top topic n listener question n gw captaincy amp our transfer n anus slap outro
4,r/FantasyPL,kayeloo,gw bookie review clean sheet amp goal scoring odds,amp x b amp x b amp x b http i redd it rqutepdh zt png amp x b http i redd it qtr yrlj zt png major favorite lost clean sheet but what can i say lad about goalscorers tremendously autumnal gameweek clean sheet probability http www reddit com r fantasypl comment dip b clean sheet probability gw http www reddit com r fantasypl comment dip b clean sheet probability gw goal scorer http www reddit com r fantasypl comment dju tq bookie goalscorer odds gw http www reddit com r fantasypl comment dju tq bookie goalscorer odds gw,self.FantasyPL,Statistics,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl8e20/gw9_bookies_review_clean_sheet_goal_scoring_odds/,/r/FantasyPL/comments/dl8e20/gw9_bookies_review_clean_sheet_goal_scoring_odds/,3,17,17,this is fine lundstram never tell me the odds,,,gw bookie review clean sheet amp goal scoring oddsamp x b amp x b amp x b http i redd it rqutepdh zt png amp x b http i redd it qtr yrlj zt png major favorite lost clean sheet but what can i say lad about goalscorers tremendously autumnal gameweek clean sheet probability http www reddit com r fantasypl comment dip b clean sheet probability gw http www reddit com r fantasypl comment dip b clean sheet probability gw goal scorer http www reddit com r fantasypl comment dju tq bookie goalscorer odds gw http www reddit com r fantasypl comment dju tq bookie goalscorer odds gwthis is fine lundstram never tell me the odds
5,r/FantasyPL,3amz,all fpl bonus point have been added so here s the final dream team for gw,pickford alonso o connell ute ca steve co aed coc ae eyol dat david sil bernard ings,i.redd.it,,2019-10-22,,{},https://i.redd.it/y1sx8mgd0zt31.jpg,/r/FantasyPL/comments/dl83np/all_fpl_bonus_points_have_been_added_so_heres_the/,22,28,28,the total cost of this team is m lmao would be interesting to see the combined ownership of that lot it s here lad the new template i remember seeing a post on here about picking jamaat for this gw wish i did summarises why this season ha been so frustrating for many imo what an absurd best xi so tough to predict i have none of these people no this is not how you are supposed to play the game http i kym cdn com photo image newsfeed gif oh i see these great lei fuxtures many got in soy vardy maddison but it s barnes who s the one to get great sum this week up couldn t pay me to have a single one of those in my team,Pickford\n\n8\nAlonso O'Connell UTE Cas Steve Co...\n14 aed 11 9\ncoc ae [eYol dats David Sil... Bernard\n13 11 _—_———] 11 10\nIngs,,all fpl bonus point have been added so here s the final dream team for gwpickford alonso o connell ute ca steve co aed coc ae eyol dat david sil bernard ingsthe total cost of this team is m lmao would be interesting to see the combined ownership of that lot it s here lad the new template i remember seeing a post on here about picking jamaat for this gw wish i did summarises why this season ha been so frustrating for many imo what an absurd best xi so tough to predict i have none of these people no this is not how you are supposed to play the game http i kym cdn com photo image newsfeed gif oh i see these great lei fuxtures many got in soy vardy maddison but it s barnes who s the one to get great sum this week up couldn t pay me to have a single one of those in my team
6,r/FantasyPL,strawberrygenius7,maddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said brendan rodgers he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnery,maddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said br he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnery bendinnery,twitter.com,News,2019-10-22,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BenDinnery/status/1186397405467611136', 'author_name': 'Ben Dinnery', 'height': 433, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/BenDinnery', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","{'content': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'width': 350, 'scrolling': False, 'height': 433}",https://twitter.com/BenDinnery/status/1186397405467611136,/r/FantasyPL/comments/dl7lx7/maddison_is_still_struggling_with_an_ankle/,5,26,26,why the fuck did you played him leave the man rest and get lundstram off my bench jesus n nhe better play on friday dont fuck u up give it a few day rest lad back in training thursday ready to play on friday eve wtf no after aguero a nd fail on my wc,,"Maddison is still struggling with an ankle problem. He did really well to put himself out there [vs Burnley]. He’s hardly trained, said BR. He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well. / Ben Dinnery (@BenDinnery) /",maddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said brendan rodgers he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnerymaddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said br he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnery bendinnerywhy the fuck did you played him leave the man rest and get lundstram off my bench jesus n nhe better play on friday dont fuck u up give it a few day rest lad back in training thursday ready to play on friday eve wtf no after aguero a nd fail on my wc
7,r/FantasyPL,fromdowntownn,otamendi and walker back in training for the ucl game v atalanta,,bbc.co.uk,News,2019-10-22,,{},https://www.bbc.co.uk/sport/football/49874408,/r/FantasyPL/comments/dl7lu7/otamendi_and_walker_back_in_training_for_the_ucl/,5,19,19,otamendi is back time to sell ederson,,,otamendi and walker back in training for the ucl game v atalantaotamendi is back time to sell ederson
8,r/FantasyPL,Ak_Ibrahim,john lundstram is the top scoring defender in the game,statistic view sorted by defender y total point v player cost sel form pt lundstram eg t shu def lei def a alexander amold x t uv def robertson i f uv def aa,i.redd.it,Statistics,2019-10-22,,{},https://i.redd.it/jksvmiwukyt31.jpg,/r/FantasyPL/comments/dl6vwu/john_lundstram_is_the_top_scoring_defender_in_the/,80,443,443,what s a king to a god before gw the sheffield united fan forum were saying he shouldn t start literally the worst advice i have ever received from any fan forum fucking benched him for his highest return ffs have to say for anyone faffing around looking for a good m midfield option with saka or whoever you don t fucking need one when you ve got lundstram acting like one when counting a a defender who is unlikely to concede more than goal a game too often will often get clean sheet and could pop up with an attacking return every so often so save m get a nakamba or dendoncker shove him last on your bench and just put this guy first on your bench unless of course you plan to start him fpl tower putting him a a m defender must be the stupidest decision they ve made shame for sheffield united s other defensive option a they all seem great too and we re unlikely to consider them apart from the odd run where a double up might be on if they keep up this defensive form soon to be sir john lundstram if he keep this up lundy you fucking beauty nice to see him on my bench every week : ( benched him and rico ffs not owning him rn feel a lot like not owning dougherty at all last season super fun love fpl super fun lundstram my master,"Statistics\n\n \n\n \n\n \n\n \n\n \n\n \n\nView Sorted by\nDefenders Y | | Total points v\nPlayer cost Sel Form Pts,\n. Lundstram\neG t SHU DEF 46 32.4% 43 45\n- ‘LEI DEF 7 a 7\n5 Alexander-Amold .\nx t UV DEF 72 30.6% 45 42\nRobertson 3\ni f uv DEF 69 19.4% 60 aa",,john lundstram is the top scoring defender in the gamestatistic view sorted by defender y total point v player cost sel form pt lundstram eg t shu def lei def a alexander amold x t uv def robertson i f uv def aawhat s a king to a god before gw the sheffield united fan forum were saying he shouldn t start literally the worst advice i have ever received from any fan forum fucking benched him for his highest return ffs have to say for anyone faffing around looking for a good m midfield option with saka or whoever you don t fucking need one when you ve got lundstram acting like one when counting a a defender who is unlikely to concede more than goal a game too often will often get clean sheet and could pop up with an attacking return every so often so save m get a nakamba or dendoncker shove him last on your bench and just put this guy first on your bench unless of course you plan to start him fpl tower putting him a a m defender must be the stupidest decision they ve made shame for sheffield united s other defensive option a they all seem great too and we re unlikely to consider them apart from the odd run where a double up might be on if they keep up this defensive form soon to be sir john lundstram if he keep this up lundy you fucking beauty nice to see him on my bench every week : ( benched him and rico ffs not owning him rn feel a lot like not owning dougherty at all last season super fun love fpl super fun lundstram my master
9,r/FantasyPL,superstoreman,top net transfer in and out,most net transfer in name net transfer change ownership de bruyne vardy david silva hudson odoi man aubameyang abraham jim nez grealish pereira most net transfer out name net transfer change ownership pukki son salah adri n kane walker mahrez otamendi ag ero zinchenko,self.FantasyPL,Statistics,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl57v1/top_10_net_transfers_in_and_out_21102019_22102019/,/r/FantasyPL/comments/dl57v1/top_10_net_transfers_in_and_out_21102019_22102019/,23,20,20,mane could overtake salah in ownership this week man might be tempted to jump back on salah if he drop a little more for the latest price change prediction see http www fplstatistics co uk or http www fantasyfootballfix com price everyone is abandoning the son ship surprised still have kane i m looking at the net transfer this gameweek and it seems like player who haven t risen yet such a aubameyang david silva and jimenez have been transferred in more than abraham but http fplstatistics com ha them all at while abraham is gt they are also all owned le than him could someone explain this to me i am guessing that it is net transfer in a different time period or something how likely is aguero to drop this week,,,top net transfer in and outmost net transfer in name net transfer change ownership de bruyne vardy david silva hudson odoi man aubameyang abraham jim nez grealish pereira most net transfer out name net transfer change ownership pukki son salah adri n kane walker mahrez otamendi ag ero zinchenkomane could overtake salah in ownership this week man might be tempted to jump back on salah if he drop a little more for the latest price change prediction see http www fplstatistics co uk or http www fantasyfootballfix com price everyone is abandoning the son ship surprised still have kane i m looking at the net transfer this gameweek and it seems like player who haven t risen yet such a aubameyang david silva and jimenez have been transferred in more than abraham but http fplstatistics com ha them all at while abraham is gt they are also all owned le than him could someone explain this to me i am guessing that it is net transfer in a different time period or something how likely is aguero to drop this week


In [109]:
df_combined['subreddit_name_prefixed'] = df_combined['subreddit_name_prefixed'].apply(lambda x: 0 if x == 'r/FantasyPL' else 1)
df_combined

Unnamed: 0,subreddit_name_prefixed,author,title,selftext,domain,link_flair_text,created,media,media_embed,url,permalink,num_comments,score,ups,comments,ocr,tweet_text,combined
0,0,DoTheRax,average of gw is higher than the average of gw,can someone please explain to me how this gw s average is higher than the previous gw despite highly owned player like mane and abraham exploding also despite return from burnley defender mount lundstram a cheeky taa bonus in the previous gameweek we have a higher gw average this week with literally no one who is significantly owned returning exploding except a and from sterling lundstram and robbo i really don t understand the math behind this,self.FantasyPL,,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl9liy/average_of_gw_9_is_higher_than_the_average_of_gw_8/,/r/FantasyPL/comments/dl9liy/average_of_gw_9_is_higher_than_the_average_of_gw_8/,10,2,2,the difference is only one point though both weekend weren t great aguero and salah owner got screwed over especially the one who c and vc them you can see the player that contributed to the average the most using this http www livefpl net overall it look like a case of consistent s and s with some prominent clean sheet everton digne mina city ederson chelsea azpi tomori bourenmouth ake rico and shiefield lundy add to that robertson s return and it add up quickly aguero and salah owner missing out and the subsequent bench fodder being subbed in last week everyone expected to play played this week those were expected and the bench sub made up for it in a small way i got : - ) n nif only i d captained cho ahead of tammy what wa i thinking,,,average of gw is higher than the average of gwcan someone please explain to me how this gw s average is higher than the previous gw despite highly owned player like mane and abraham exploding also despite return from burnley defender mount lundstram a cheeky taa bonus in the previous gameweek we have a higher gw average this week with literally no one who is significantly owned returning exploding except a and from sterling lundstram and robbo i really don t understand the math behind thisthe difference is only one point though both weekend weren t great aguero and salah owner got screwed over especially the one who c and vc them you can see the player that contributed to the average the most using this http www livefpl net overall it look like a case of consistent s and s with some prominent clean sheet everton digne mina city ederson chelsea azpi tomori bourenmouth ake rico and shiefield lundy add to that robertson s return and it add up quickly aguero and salah owner missing out and the subsequent bench fodder being subbed in last week everyone expected to play played this week those were expected and the bench sub made up for it in a small way i got : - ) n nif only i d captained cho ahead of tammy what wa i thinking
1,0,QuickyGaming,lundstram would have scored point if he wa correctly categorized a a midfielder,these calculation will be really long there s no real purpose in it other than to be a filler so this post doesn t get removed also the bonus point might have been different but i m not sure how to calculate them note lundstram is currently on point in his current miscategorized state gw bou a point for playing minute bonus point point gw cry h point for playing minute point for goal scored point for a clean sheet point for a yellow card bonus point point gw lei h point for playing minute point for a yellow card point gw che a point for playing minute point gw sou h point for playing minute point gw eve a point for playing minute point for assist point for a clean sheet bonus point point gw liv h point for playing minute point gw wat a point for playing minute point for a clean sheet point for a yellow card point gw ar h point for playing minute point for a clean sheet point point point would still make him one of the best midfield bench fodder available tl dr lundstram would have had point if he wa categorized a a midfielder,self.FantasyPL,,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,/r/FantasyPL/comments/dl996t/lundstram_would_have_scored_points_34_if_he_was/,5,16,16,i m an idiot i realized i just put point instead of point if you need to write fodder anyway maybe put how many point he ha a is so we can compare n nedit cool thought the gap would be bigger,,,lundstram would have scored point if he wa correctly categorized a a midfielderthese calculation will be really long there s no real purpose in it other than to be a filler so this post doesn t get removed also the bonus point might have been different but i m not sure how to calculate them note lundstram is currently on point in his current miscategorized state gw bou a point for playing minute bonus point point gw cry h point for playing minute point for goal scored point for a clean sheet point for a yellow card bonus point point gw lei h point for playing minute point for a yellow card point gw che a point for playing minute point gw sou h point for playing minute point gw eve a point for playing minute point for assist point for a clean sheet bonus point point gw liv h point for playing minute point gw wat a point for playing minute point for a clean sheet point for a yellow card point gw ar h point for playing minute point for a clean sheet point point point would still make him one of the best midfield bench fodder available tl dr lundstram would have had point if he wa categorized a a midfielderi m an idiot i realized i just put point instead of point if you need to write fodder anyway maybe put how many point he ha a is so we can compare n nedit cool thought the gap would be bigger
2,0,Sad_Weed,also consider lundstram a a potential starter,sheffield united sheffieldunited our defence conceded joint least in the pl clean sheet joint most conceded in open play least amount in the premierleague hsufc,i.redd.it,,2019-10-22,,{},https://i.redd.it/v3pv564jezt31.jpg,/r/FantasyPL/comments/dl9727/also_consider_lundstram_as_a_potential_starter/,3,19,19,you mean attacking midfielder lundstram he came off the bench to bail me out when aguero wa benched,\ Sheffield United @\n@SheffieldUnited\n\n \n\nOur defence ()-)\n7 conceded (joint least in the #PL) &\n4 clean sheets (joint most) ©\n\n5 conceded in open play (least\namount in the @premierleague ) @\n\nHSUFC @,,also consider lundstram a a potential startersheffield united sheffieldunited our defence conceded joint least in the pl clean sheet joint most conceded in open play least amount in the premierleague hsufcyou mean attacking midfielder lundstram he came off the bench to bail me out when aguero wa benched
3,0,FMLFPL,fml fpl ep on to gw trusting the process,,fmlfpl.libsyn.com,Podcast,2019-10-22,"{'type': 'fmlfpl.libsyn.com', 'oembed': {'provider_url': 'https://www.libsyn.com', 'description': ""It's a little bit of frustration and darkness on today's episode but mostly it's just a simple job with Alon's fitness and the return to normal podding."", 'title': 'Ep. 213 - On to GW10 Trusting the Process', 'type': 'rich', 'author_name': 'FML FPL', 'height': 90, 'width': 600, 'html': '&lt;iframe class=""embedly-embed"" src=""https://cdn.embedly.com/widgets/media.html?src=%2F%2Fhtml5-player.libsyn.com%2Fembed%2Fepisode%2Fid%2F11731382%2Fheight%2F90%2Ftheme%2Fcustom%2Fthumbnail%2Fyes%2Fdirection%2Fforward%2Frender-playlist%2Fno%2Fcustom-color%2F499dff%2F&amp;url=http%3A%2F%2Ffmlfpl.libsyn.com%2Fep-213-on-to-gw10-trusting-the-process&amp;image=http%3A%2F%2Fstatic.libsyn.com%2Fp%2Fassets%2Fplatform%2Fwebsuite%2Fstitcher.png&amp;key=2aa3c4d5f3de4f5b9120b660ad850dc9&amp;type=text%2Fhtml&amp;schema=libsyn"" width=""600"" height=""90"" scrolling=""no"" frameborder=""0"" allow=""autoplay; fullscreen"" allowfullscreen=""true""&gt;&lt;/iframe&gt;', 'thumbnail_width': 300, 'version': '1.0', 'provider_name': 'Libsyn', 'thumbnail_url': 'https://i.embed.ly/1/image?url=http%3A%2F%2Fstatic.libsyn.com%2Fp%2Fassets%2Fplatform%2Fwebsuite%2Fstitcher.png&amp;key=b1e305db91cf4aa5a86b732cc9fffceb', 'thumbnail_height': 300, 'author_url': 'http://fmlfpl.com/'}}","{'content': '&lt;iframe class=""embedly-embed"" src=""https://cdn.embedly.com/widgets/media.html?src=%2F%2Fhtml5-player.libsyn.com%2Fembed%2Fepisode%2Fid%2F11731382%2Fheight%2F90%2Ftheme%2Fcustom%2Fthumbnail%2Fyes%2Fdirection%2Fforward%2Frender-playlist%2Fno%2Fcustom-color%2F499dff%2F&amp;url=http%3A%2F%2Ffmlfpl.libsyn.com%2Fep-213-on-to-gw10-trusting-the-process&amp;image=http%3A%2F%2Fstatic.libsyn.com%2Fp%2Fassets%2Fplatform%2Fwebsuite%2Fstitcher.png&amp;key=2aa3c4d5f3de4f5b9120b660ad850dc9&amp;type=text%2Fhtml&amp;schema=libsyn"" width=""600"" height=""90"" scrolling=""no"" frameborder=""0"" allow=""autoplay; fullscreen"" allowfullscreen=""true""&gt;&lt;/iframe&gt;', 'width': 600, 'scrolling': False, 'height': 90}",http://fmlfpl.libsyn.com/ep-213-on-to-gw10-trusting-the-process,/r/FantasyPL/comments/dl8njz/fml_fpl_ep_213_on_to_gw10_trusting_the_process/,3,10,10,it s a little bit of frustration and darkness on today s episode but mostly it s just a simple job with alon s fitness and the return to normal podding n n how d we do pre shu v ar in gw n top topic n listener question n gw captaincy amp our transfer n anus slap outro,,,fml fpl ep on to gw trusting the processit s a little bit of frustration and darkness on today s episode but mostly it s just a simple job with alon s fitness and the return to normal podding n n how d we do pre shu v ar in gw n top topic n listener question n gw captaincy amp our transfer n anus slap outro
4,0,kayeloo,gw bookie review clean sheet amp goal scoring odds,amp x b amp x b amp x b http i redd it rqutepdh zt png amp x b http i redd it qtr yrlj zt png major favorite lost clean sheet but what can i say lad about goalscorers tremendously autumnal gameweek clean sheet probability http www reddit com r fantasypl comment dip b clean sheet probability gw http www reddit com r fantasypl comment dip b clean sheet probability gw goal scorer http www reddit com r fantasypl comment dju tq bookie goalscorer odds gw http www reddit com r fantasypl comment dju tq bookie goalscorer odds gw,self.FantasyPL,Statistics,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl8e20/gw9_bookies_review_clean_sheet_goal_scoring_odds/,/r/FantasyPL/comments/dl8e20/gw9_bookies_review_clean_sheet_goal_scoring_odds/,3,17,17,this is fine lundstram never tell me the odds,,,gw bookie review clean sheet amp goal scoring oddsamp x b amp x b amp x b http i redd it rqutepdh zt png amp x b http i redd it qtr yrlj zt png major favorite lost clean sheet but what can i say lad about goalscorers tremendously autumnal gameweek clean sheet probability http www reddit com r fantasypl comment dip b clean sheet probability gw http www reddit com r fantasypl comment dip b clean sheet probability gw goal scorer http www reddit com r fantasypl comment dju tq bookie goalscorer odds gw http www reddit com r fantasypl comment dju tq bookie goalscorer odds gwthis is fine lundstram never tell me the odds
5,0,3amz,all fpl bonus point have been added so here s the final dream team for gw,pickford alonso o connell ute ca steve co aed coc ae eyol dat david sil bernard ings,i.redd.it,,2019-10-22,,{},https://i.redd.it/y1sx8mgd0zt31.jpg,/r/FantasyPL/comments/dl83np/all_fpl_bonus_points_have_been_added_so_heres_the/,22,28,28,the total cost of this team is m lmao would be interesting to see the combined ownership of that lot it s here lad the new template i remember seeing a post on here about picking jamaat for this gw wish i did summarises why this season ha been so frustrating for many imo what an absurd best xi so tough to predict i have none of these people no this is not how you are supposed to play the game http i kym cdn com photo image newsfeed gif oh i see these great lei fuxtures many got in soy vardy maddison but it s barnes who s the one to get great sum this week up couldn t pay me to have a single one of those in my team,Pickford\n\n8\nAlonso O'Connell UTE Cas Steve Co...\n14 aed 11 9\ncoc ae [eYol dats David Sil... Bernard\n13 11 _—_———] 11 10\nIngs,,all fpl bonus point have been added so here s the final dream team for gwpickford alonso o connell ute ca steve co aed coc ae eyol dat david sil bernard ingsthe total cost of this team is m lmao would be interesting to see the combined ownership of that lot it s here lad the new template i remember seeing a post on here about picking jamaat for this gw wish i did summarises why this season ha been so frustrating for many imo what an absurd best xi so tough to predict i have none of these people no this is not how you are supposed to play the game http i kym cdn com photo image newsfeed gif oh i see these great lei fuxtures many got in soy vardy maddison but it s barnes who s the one to get great sum this week up couldn t pay me to have a single one of those in my team
6,0,strawberrygenius7,maddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said brendan rodgers he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnery,maddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said br he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnery bendinnery,twitter.com,News,2019-10-22,"{'type': 'twitter.com', 'oembed': {'provider_url': 'https://twitter.com', 'version': '1.0', 'url': 'https://twitter.com/BenDinnery/status/1186397405467611136', 'author_name': 'Ben Dinnery', 'height': 433, 'width': 350, 'html': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'author_url': 'https://twitter.com/BenDinnery', 'provider_name': 'Twitter', 'cache_age': 3153600000, 'type': 'rich'}}","{'content': '&lt;blockquote class=""twitter-video""&gt;&lt;p lang=""en"" dir=""ltr""&gt;Maddison is still struggling with an ankle problem. &amp;quot;He did really well to put himself out there [vs Burnley]. He’s hardly trained,&amp;quot; said BR. &amp;quot;He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well.&amp;quot; &lt;a href=""https://twitter.com/hashtag/LCFC?src=hash&amp;amp;ref_src=twsrc%5Etfw""&gt;#LCFC&lt;/a&gt; &lt;a href=""https://t.co/dpxhxlDuda""&gt;pic.twitter.com/dpxhxlDuda&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ben Dinnery (@BenDinnery) &lt;a href=""https://twitter.com/BenDinnery/status/1186397405467611136?ref_src=twsrc%5Etfw""&gt;October 21, 2019&lt;/a&gt;&lt;/blockquote&gt;\n&lt;script async src=""https://platform.twitter.com/widgets.js"" charset=""utf-8""&gt;&lt;/script&gt;\n', 'width': 350, 'scrolling': False, 'height': 433}",https://twitter.com/BenDinnery/status/1186397405467611136,/r/FantasyPL/comments/dl7lx7/maddison_is_still_struggling_with_an_ankle/,5,26,26,why the fuck did you played him leave the man rest and get lundstram off my bench jesus n nhe better play on friday dont fuck u up give it a few day rest lad back in training thursday ready to play on friday eve wtf no after aguero a nd fail on my wc,,"Maddison is still struggling with an ankle problem. He did really well to put himself out there [vs Burnley]. He’s hardly trained, said BR. He could easily have not played, but he wanted to try it. We felt that if he could give us an hour, he’d have done really well. / Ben Dinnery (@BenDinnery) /",maddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said brendan rodgers he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnerymaddison is still struggling with an ankle problem he did really well to put himself out there v burnley he s hardly trained said br he could easily have not played but he wanted to try it we felt that if he could give u an hour he d have done really well ben dinnery bendinnerywhy the fuck did you played him leave the man rest and get lundstram off my bench jesus n nhe better play on friday dont fuck u up give it a few day rest lad back in training thursday ready to play on friday eve wtf no after aguero a nd fail on my wc
7,0,fromdowntownn,otamendi and walker back in training for the ucl game v atalanta,,bbc.co.uk,News,2019-10-22,,{},https://www.bbc.co.uk/sport/football/49874408,/r/FantasyPL/comments/dl7lu7/otamendi_and_walker_back_in_training_for_the_ucl/,5,19,19,otamendi is back time to sell ederson,,,otamendi and walker back in training for the ucl game v atalantaotamendi is back time to sell ederson
8,0,Ak_Ibrahim,john lundstram is the top scoring defender in the game,statistic view sorted by defender y total point v player cost sel form pt lundstram eg t shu def lei def a alexander amold x t uv def robertson i f uv def aa,i.redd.it,Statistics,2019-10-22,,{},https://i.redd.it/jksvmiwukyt31.jpg,/r/FantasyPL/comments/dl6vwu/john_lundstram_is_the_top_scoring_defender_in_the/,80,443,443,what s a king to a god before gw the sheffield united fan forum were saying he shouldn t start literally the worst advice i have ever received from any fan forum fucking benched him for his highest return ffs have to say for anyone faffing around looking for a good m midfield option with saka or whoever you don t fucking need one when you ve got lundstram acting like one when counting a a defender who is unlikely to concede more than goal a game too often will often get clean sheet and could pop up with an attacking return every so often so save m get a nakamba or dendoncker shove him last on your bench and just put this guy first on your bench unless of course you plan to start him fpl tower putting him a a m defender must be the stupidest decision they ve made shame for sheffield united s other defensive option a they all seem great too and we re unlikely to consider them apart from the odd run where a double up might be on if they keep up this defensive form soon to be sir john lundstram if he keep this up lundy you fucking beauty nice to see him on my bench every week : ( benched him and rico ffs not owning him rn feel a lot like not owning dougherty at all last season super fun love fpl super fun lundstram my master,"Statistics\n\n \n\n \n\n \n\n \n\n \n\n \n\nView Sorted by\nDefenders Y | | Total points v\nPlayer cost Sel Form Pts,\n. Lundstram\neG t SHU DEF 46 32.4% 43 45\n- ‘LEI DEF 7 a 7\n5 Alexander-Amold .\nx t UV DEF 72 30.6% 45 42\nRobertson 3\ni f uv DEF 69 19.4% 60 aa",,john lundstram is the top scoring defender in the gamestatistic view sorted by defender y total point v player cost sel form pt lundstram eg t shu def lei def a alexander amold x t uv def robertson i f uv def aawhat s a king to a god before gw the sheffield united fan forum were saying he shouldn t start literally the worst advice i have ever received from any fan forum fucking benched him for his highest return ffs have to say for anyone faffing around looking for a good m midfield option with saka or whoever you don t fucking need one when you ve got lundstram acting like one when counting a a defender who is unlikely to concede more than goal a game too often will often get clean sheet and could pop up with an attacking return every so often so save m get a nakamba or dendoncker shove him last on your bench and just put this guy first on your bench unless of course you plan to start him fpl tower putting him a a m defender must be the stupidest decision they ve made shame for sheffield united s other defensive option a they all seem great too and we re unlikely to consider them apart from the odd run where a double up might be on if they keep up this defensive form soon to be sir john lundstram if he keep this up lundy you fucking beauty nice to see him on my bench every week : ( benched him and rico ffs not owning him rn feel a lot like not owning dougherty at all last season super fun love fpl super fun lundstram my master
9,0,superstoreman,top net transfer in and out,most net transfer in name net transfer change ownership de bruyne vardy david silva hudson odoi man aubameyang abraham jim nez grealish pereira most net transfer out name net transfer change ownership pukki son salah adri n kane walker mahrez otamendi ag ero zinchenko,self.FantasyPL,Statistics,2019-10-22,,{},https://www.reddit.com/r/FantasyPL/comments/dl57v1/top_10_net_transfers_in_and_out_21102019_22102019/,/r/FantasyPL/comments/dl57v1/top_10_net_transfers_in_and_out_21102019_22102019/,23,20,20,mane could overtake salah in ownership this week man might be tempted to jump back on salah if he drop a little more for the latest price change prediction see http www fplstatistics co uk or http www fantasyfootballfix com price everyone is abandoning the son ship surprised still have kane i m looking at the net transfer this gameweek and it seems like player who haven t risen yet such a aubameyang david silva and jimenez have been transferred in more than abraham but http fplstatistics com ha them all at while abraham is gt they are also all owned le than him could someone explain this to me i am guessing that it is net transfer in a different time period or something how likely is aguero to drop this week,,,top net transfer in and outmost net transfer in name net transfer change ownership de bruyne vardy david silva hudson odoi man aubameyang abraham jim nez grealish pereira most net transfer out name net transfer change ownership pukki son salah adri n kane walker mahrez otamendi ag ero zinchenkomane could overtake salah in ownership this week man might be tempted to jump back on salah if he drop a little more for the latest price change prediction see http www fplstatistics co uk or http www fantasyfootballfix com price everyone is abandoning the son ship surprised still have kane i m looking at the net transfer this gameweek and it seems like player who haven t risen yet such a aubameyang david silva and jimenez have been transferred in more than abraham but http fplstatistics com ha them all at while abraham is gt they are also all owned le than him could someone explain this to me i am guessing that it is net transfer in a different time period or something how likely is aguero to drop this week


### 8.4 Sentiment Analysis  <a id='Sentiment Analysis'></a> 
<div align="right"><a href='#Table of Contents'>Back to Table of Contents</a></div>

In [121]:
analyser = SentimentIntensityAnalyzer()

In [122]:
df_combined['title_sentiment_neg'] = [analyser.polarity_scores(x)['neg'] for x in df_combined['title']]
df_combined['title_sentiment_neu'] = [analyser.polarity_scores(x)['neu'] for x in df_combined['title']]
df_combined['title_sentiment_pos'] = [analyser.polarity_scores(x)['pos'] for x in df_combined['title']]

In [123]:
df_combined['selftext_sentiment_neg'] = [analyser.polarity_scores(x)['neg'] for x in df_combined['selftext']]
df_combined['selftext_sentiment_neu'] = [analyser.polarity_scores(x)['neu'] for x in df_combined['selftext']]
df_combined['selftext_sentiment_pos'] = [analyser.polarity_scores(x)['pos'] for x in df_combined['selftext']]

In [124]:
df_combined['comments_sentiment_neg'] = [analyser.polarity_scores(x)['neg'] for x in df_combined['comments']]
df_combined['comments_sentiment_neu'] = [analyser.polarity_scores(x)['neu'] for x in df_combined['comments']]
df_combined['comments_sentiment_pos'] = [analyser.polarity_scores(x)['pos'] for x in df_combined['comments']]

In [125]:
df_combined['combined_sentiment_neg'] = [analyser.polarity_scores(x)['neg'] for x in df_combined['combined']]
df_combined['combined_sentiment_neu'] = [analyser.polarity_scores(x)['neu'] for x in df_combined['combined']]
df_combined['combined_sentiment_pos'] = [analyser.polarity_scores(x)['pos'] for x in df_combined['combined']]

### 8.5 Extracting Word Counts

In [131]:
df_combined['title_word_count'] = df_combined['title'].apply(lambda x: int(len(x.split())))
df_combined['selftext_word_count'] = df_combined['selftext'].apply(lambda x: int(len(x.split())))
df_combined['comments_word_count'] = df_combined['comments'].apply(lambda x: int(len(x.split())))
df_combined['combined_word_count'] = df_combined['combined'].apply(lambda x: int(len(x.split())))

### 8.6 Export 

In [132]:
#Fill all NAs before exporting
df_combined = df_combined.fillna('')

In [133]:
df_combined.to_csv(r".\datasets\reddit_cleaned_2.csv")

### Contined in Project3Reddit_EDA