# DSI Capstone Checkin 3: Jared Delora-Ellefson

## Problem Title: Tracking Conspiracies on Reddit and Twitter during the 2016 Election

### Identify conspiracy trends in the political discussions occurring on Reddit between June 2015 and December 2016. This period runs from just before Donald Trump declared his candidacy for the presidency to one month after the 2016 US presidential election.  

#### This project will focus on building an entity recognition model to track conspiratorial talk on Reddit. The use of terms associated with these conspiracies will be tracked over the course of the election in an effort to discover when they first appeared. A number of Twitter users, including President Donald Trump, talked about these conspiracies on Twitter during the election. The dates on which these users began tweeting about each conspiracy will be compared with the dates on which Reddit users began discussing it, in an effort to discover any correlations.

> **The fundamental question to be answered: Did the conspiracies that plagued the 2016 election begin with chatter on social media? How much of the growth in chatter on Reddit was due to political figures and misinformation websites tweeting about them?**
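
The timing comparison described above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the helper names (`week_start`, `weekly_counts`, `lead_in_days`) are hypothetical, and the dates are made up purely to show the mechanics.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week containing date d."""
    return d - timedelta(days=d.weekday())


def weekly_counts(dates):
    """Count mentions per week, keyed by the Monday that starts the week."""
    return Counter(week_start(d) for d in dates)


def lead_in_days(reddit_dates, twitter_dates):
    """Days by which the first Reddit mention precedes the first tweet.

    Positive: Reddit came first. Negative: Twitter came first.
    """
    return (min(twitter_dates) - min(reddit_dates)).days


# Made-up mention dates, for illustration only (not project data).
reddit_mentions = [date(2016, 11, 2), date(2016, 11, 3), date(2016, 11, 9)]
twitter_mentions = [date(2016, 11, 6), date(2016, 11, 10)]

print(weekly_counts(reddit_mentions))
print(lead_in_days(reddit_mentions, twitter_mentions))
```

Weekly buckets are an assumption chosen to smooth out day-to-day noise; daily or monthly buckets would work the same way with a different key function.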

<img src="./images/milestones3.png" alt="doggo" width="1500"/>

<img src="./images/example.png" alt="doggo" width="1500"/>

**Four conspiracy theories that were popular during the election have been proven false. These are:**  
- Seth Rich Murder
- Pizzagate + John Podesta Pedophilia Accusations
- Hillary Clinton Mishandling of classified materials
- BlueLivesMatter (Originating in 2014, but according to the Robert Mueller investigation this topic was pushed by Russian Intelligence to sow division among the American population)  
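
One simple way to flag text that touches these four topics is keyword matching. The sketch below assumes a small, hand-picked vocabulary; the term lists and the `tag_conspiracies` helper are illustrative assumptions, not the project's final term set.

```python
import re

# Illustrative keyword lists (assumptions, not the project's final vocabulary).
CONSPIRACY_TERMS = {
    "seth_rich": ["seth rich"],
    "pizzagate": ["pizzagate", "comet ping pong"],
    "clinton_classified": ["clinton email", "clinton emails", "private server"],
    "blue_lives_matter": ["bluelivesmatter", "blue lives matter"],
}

# One case-insensitive pattern per theory; \b keeps matches on word boundaries.
PATTERNS = {
    label: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for label, terms in CONSPIRACY_TERMS.items()
}


def tag_conspiracies(text):
    """Return the sorted list of theory labels whose terms appear in text."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))


print(tag_conspiracies("Who killed Seth Rich? #Pizzagate is real"))
```

Plain keyword matching will miss misspellings and coded language, so it is best treated as a baseline to compare against the entity recognition model.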

[Dissecting Trump's Most Rabid Online Following](https://fivethirtyeight.com/features/dissecting-trumps-most-rabid-online-following/)  
FiveThirtyEight.com has done some work looking at Reddit behavior surrounding Donald Trump. The article above identifies a number of subreddits as particularly toxic and explains why.  
    
    
[Online Political Discourse in the Trump Era, Rishab Nithyanand et al.](https://arxiv.org/pdf/1711.05303.pdf)    
  
The paper above studies incivility in political discourse before and after the 2016 US presidential election. Its abstract reads, in part:  

**"We identify general trends in the (in)civility and complexity of political discussions occurring on Reddit between January 2007 and
May 2017 – a period spanning both terms of Barack Obama’s presidency and the first 100 days of Donald Trump’s presidency."**

This paper used a number of models, including a spaCy entity recognition model, to investigate trends in civil discourse. It identifies a number of subreddits whose rhetoric grew particularly uncivil.
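
For a sense of what entity recognition with spaCy looks like, here is a minimal rule-based sketch. It assumes spaCy v3 is installed and uses a blank English pipeline with an `entity_ruler` component as a stand-in for a trained model; the labels, patterns, and the `find_entities` helper are hypothetical and are not taken from the paper.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components (assumes spaCy v3).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical labels and token patterns, chosen only for illustration.
ruler.add_patterns([
    {"label": "SETH_RICH", "pattern": [{"LOWER": "seth"}, {"LOWER": "rich"}]},
    {"label": "PIZZAGATE", "pattern": [{"LOWER": "pizzagate"}]},
])


def find_entities(text):
    """Return (text, label) pairs for every rule-matched entity in text."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


print(find_entities("People kept posting about Seth Rich and Pizzagate."))
```

A trained statistical model could later replace or supplement the rules; the rule-based version is mainly useful for bootstrapping labels.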
  
**These subreddits have been shown to spread misinformation and to have discussed the conspiracies noted above:**  
- r/the_donald
- r/Republican
- r/Conservative
- r/uncensorednews
- r/TheRedPill

**These five subreddits will be used for the investigation since they have already been identified as particularly uncivil during the election.**
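
Restricting a comment dump to these five subreddits might look like the sketch below. It assumes the Reddit data is stored as newline-delimited JSON with the standard Reddit comment fields `subreddit`, `body`, and `created_utc`; the storage format is an assumption, and `filter_comments` and the sample records are hypothetical.

```python
import json
from datetime import datetime, timezone

# Lowercased names of the five subreddits listed above.
TARGET_SUBREDDITS = {
    "the_donald",
    "republican",
    "conservative",
    "uncensorednews",
    "theredpill",
}


def filter_comments(lines):
    """Yield (timestamp, subreddit, body) for comments in the target subreddits."""
    for line in lines:
        comment = json.loads(line)
        if comment["subreddit"].lower() in TARGET_SUBREDDITS:
            posted = datetime.fromtimestamp(int(comment["created_utc"]), tz=timezone.utc)
            yield posted, comment["subreddit"], comment["body"]


# Made-up records, for illustration only (not project data).
sample = [
    '{"subreddit": "The_Donald", "body": "example comment", "created_utc": 1478563200}',
    '{"subreddit": "aww", "body": "cute dog", "created_utc": 1478563200}',
]
for posted, subreddit, body in filter_comments(sample):
    print(posted.date(), subreddit, body)
```

Because the function is a generator, it can be fed an open file handle directly and will process a large dump one line at a time.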


Twitter handles associated with misinformation, known to have discussed the conspiracies identified above:  
- @realdonaldtrump  
- @realrogerstone1  
- @donaldtrumpjr   
- @BreitbartNews   
- @infowarsmedia  
- @zerohedge

All of these Twitter handles have been shown to have spread the conspiracies noted above.

**Data Sources:**  
- 2015-2016 Reddit Data Pulled ✅   
- Twitter data for handles ✅  

<img src="./images/CapstoneOverview.png" alt="doggo" width="1000"/>

In [7]:
# Initial import of Donald Trump's tweets from June 1, 2015 (just before he declared his candidacy) to December 30, 2016.
import pandas as pd

In [8]:
df = pd.read_csv('./data/trumptweets.csv')

In [9]:
df.shape

(7497, 7)

In [10]:
df.head()

| | source | text | created_at | retweet_count | favorite_count | is_retweet | id_str |
|---|---|---|---|---|---|---|---|
| 0 | Twitter for Android | Russians are playing @CNN and @NBCNews for suc... | 12-30-2016 22:18:18 | 23213 | 84254 | False | 814958820980039681 |
| 1 | Twitter for iPhone | Join @AmerIcan32 founded by Hall of Fame legen... | 12-30-2016 19:46:55 | 7366 | 25336 | False | 814920722208296960 |
| 2 | Twitter for Android | Great move on delay (by V. Putin) - I always k... | 12-30-2016 19:41:33 | 34415 | 97669 | False | 814919370711461890 |
| 3 | Twitter for iPhone | My Administration will follow two simple rules... | 12-29-2016 14:54:21 | 11330 | 45609 | False | 814484710025994241 |
| 4 | Twitter for iPhone | 'Economists say Trump delivered hope' https://... | 12-28-2016 22:06:28 | 13919 | 51857 | False | 814231064847728640 |


In [11]:
# Parse the timestamp strings into pandas datetimes.
df['created_at'] = pd.to_datetime(df['created_at'])

In [12]:
# Sort the tweets chronologically, oldest first.
df.sort_values(by='created_at', ascending=True, inplace=True)

In [13]:
# Renumber the rows from 0 so the index matches the chronological order.
df.reset_index(drop=True, inplace=True)

In [14]:
df.head()

| | source | text | created_at | retweet_count | favorite_count | is_retweet | id_str |
|---|---|---|---|---|---|---|---|
| 0 | Twitter for Android | @yankzpat: HEY! I hope to meet @realDonaldTrum... | 2015-06-01 09:54:51 | 10 | 36 | False | 605311537255956480 |
| 1 | Twitter for Android | I will be on @foxandfriends at 7:00 A.M. ENJOY! | 2015-06-01 10:23:13 | 26 | 91 | False | 605318676955439104 |
| 2 | Twitter for Android | @pacsgirl36: @realDonaldTrump run !!! We need ... | 2015-06-01 11:29:13 | 13 | 51 | False | 605335287150034945 |
| 3 | Twitter for Android | @jkellywwip: @realDonaldTrump killed it on @fo... | 2015-06-01 11:29:21 | 6 | 25 | False | 605335318192091136 |
| 4 | Twitter for Android | @aalucero: @pacsgirl36 I luv Donald Trump in ... | 2015-06-01 12:47:42 | 11 | 29 | False | 605355038488313856 |


In [15]:
df.text[4]

'@aalucero: @pacsgirl36  I luv Donald Trump in his sleep he is leaps &amp; bounds over what we have now - I have no doubt he luvs America!'
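
The tweet text above still contains an HTML entity (`&amp;`) and a doubled space. A minimal cleanup sketch using only the standard library is shown below; `clean_tweet` is a hypothetical helper name, and whether this cleanup belongs in the project's pipeline is an assumption.

```python
import html


def clean_tweet(text):
    """Decode HTML entities and collapse runs of whitespace to single spaces."""
    decoded = html.unescape(text)
    return " ".join(decoded.split())


raw = "@aalucero: @pacsgirl36  I luv Donald Trump in his sleep he is leaps &amp; bounds"
print(clean_tweet(raw))
```

To apply it to the whole column, something like `df['text'] = df['text'].map(clean_tweet)` should work, assuming every value in `text` is a string.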