# All the President's Moods

During his tenure as US President, Donald Trump maintained a steady presence on Twitter. Because politicians' words can affect the behavior of other people, including market participants, there is a sizable literature quantifying the aggregate effect of political speech on market behavior.

Unlike other politicians, Trump tweeted both passionately and prolifically. This means Twitter gives us a large volume of text to work with, spanning a distribution of "sentiment."

In this assignment:
1. (2 points) Load a JSON file of all the president's tweets from August 30 to November 5, 2020. After loading them, put the following information into a dataframe:
    - The full text of the Tweet. (string)
    - Any users mentioned in the Tweet. (string)
    - The timestamp. (datetime)
    - The date (YYYY-MM-DD) of the Tweet. (date)
    - Retweets (numeric)
    - Favorites (numeric)
    - Whether the Tweet was censored, meaning zero retweets (binary/boolean).
    
   __Hint: list comprehensions can be your best friend!__
   
2. (2 points) Create a "clean_text" column by doing the following to "full_text":
    - removing links, which are strings that start with "http"
    - removing hashtags, which are strings that start with "#"
    - removing mentions, which are strings that start with "@"
    - converting text to lower case.
    - removing punctuation.
    
   Now report:
    - The top 10 most common words (omitting stop words).
    - The top 5 Twitter accounts that are mentioned.
    - The top 5 most liked ("favorited") tweets.
    - The top 5 hashtags (hint: use a regular expression).
3. (4 points) Graph the daily volume of tweets over time, annotating any major events.
4. (6 points) Run the tweets through sentiment analysis, and report:
    - The 20 most "negative" and "positive" words.
    - A graph of the sentiment over time.
    
5. (6 points) Trump and the Stock Market
    - Read in the time series dataset of daily S&P 500 closing prices, then merge it with Trump's daily sentiment scores. __Graph both series (EXTRA CREDIT GRAPH)__, calculate their correlation, and interpret what you see.

    - Since Trump often comments on the stock market, but the stock market may also react to his tweets, the direction of causality is not clear. Describe in detail how you would answer the question: "Did President Trump's tweet sentiment influence the stock market?" Specifically describe:
        - The data you would need in addition to these data sources.
        - The way you would read in the data and manipulate it.
        - The sort of analysis or statistics you would calculate to answer the question.
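
The cleaning steps in item 2 can be sketched roughly as follows. This is only one possible reading of the instructions, not a reference solution: the helper names `clean_tweet` and `top_hashtags` are hypothetical, and the sketch assumes that "removing punctuation" means ASCII punctuation only (typographic quotes such as ‘ and “ are left in place).

```python
import re
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Apply the item-2 cleaning steps to a single tweet (hypothetical helper)."""
    text = re.sub(r"http\S+", " ", text)   # links: tokens starting with "http"
    text = re.sub(r"#\w+", " ", text)      # hashtags: tokens starting with "#"
    text = re.sub(r"@\w+", " ", text)      # mentions: tokens starting with "@"
    text = text.lower()                    # lower case
    text = text.translate(_PUNCT_TABLE)    # strip ASCII punctuation
    return " ".join(text.split())          # collapse leftover whitespace


def top_hashtags(texts, n=5):
    """Count hashtags case-insensitively with a regular expression (hypothetical helper)."""
    tags = [tag.lower() for t in texts for tag in re.findall(r"#(\w+)", t)]
    return Counter(tags).most_common(n)
```

On a dataframe, the first helper could be applied with something like `df["clean_text"] = df["full_text"].apply(clean_tweet)`. Hashtags have to be counted from `full_text` rather than `clean_text`, since the cleaning step removes them.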
    

# Homework 1: Luca Torresani
## All the President's Moods

In [1]:
import pandas as pd
import json 
import matplotlib.pyplot as plt
import matplotlib
import re
import string
from nltk.corpus import stopwords
import regex
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [2]:
# Load a JSON file of all the president's tweets from August 30 to November 5th.
dir_ = "/c/Users/Utente/Desktop/HW1"
tweets_data = pd.read_json(r'/Users/Utente/Desktop/HW1/Trump_tweets.json')
# Drop repeated tweets (same timestamp and same text), keeping the first copy.
tweets_data = tweets_data.drop_duplicates(subset=['created_at', 'full_text'], keep='first')

In [4]:
type(tweets_data)

pandas.core.frame.DataFrame

In [7]:
tweets_data

Unnamed: 0,created_at,id,id_str,full_text,truncated,display_text_range,entities,source,in_reply_to_status_id,in_reply_to_status_id_str,...,possibly_sensitive,lang,quoted_status_id,quoted_status_id_str,quoted_status_permalink,quoted_status,extended_entities,withheld_scope,withheld_copyright,withheld_in_countries
0,2020-11-05 15:37:40+00:00,1324375334653988864,1324375334653988864,Fmr NV AG Laxalt: ‘No Question‘ Trump Would Ha...,False,"[0, 140]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,,,,,,,,
1,2020-11-05 15:09:19+00:00,1324368202139357186,1324368202139357184,ANY VOTE THAT CAME IN AFTER ELECTION DAY WILL ...,False,"[0, 61]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,,en,,,,,,,,
2,2020-11-05 14:12:37+00:00,1324353932022480896,1324353932022480896,STOP THE COUNT!,False,"[0, 15]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,,en,,,,,,,,
3,2020-11-05 00:01:07+00:00,1324139647111409667,1324139647111409664,"Detroit Absentee Ballot Counting Chaos, Blocke...",False,"[0, 112]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,,,,,,,,
4,2020-11-05 00:00:05+00:00,1324139387546984449,1324139387546984448,Demands Arise for PA Attorney General to ‘Step...,False,"[0, 96]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1784,2020-08-30 10:37:39+00:00,1300019849540886528,1300019849540886528,GREAT PATRIOTS! https://t.co/BWGxVoBTmI,False,"[0, 15]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,1.299899e+18,1.299899e+18,"{'url': 'https://t.co/BWGxVoBTmI', 'expanded':...",{'created_at': 'Sun Aug 30 02:38:46 +0000 2020...,,,,
1785,2020-08-30 10:36:14+00:00,1300019490177069060,1300019490177069056,Disgraceful Anarchists. We are watching them c...,False,"[0, 102]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,1.299902e+18,1.299902e+18,"{'url': 'https://t.co/IvuIh6cRz5', 'expanded':...",,,,,
1786,2020-08-30 10:31:53+00:00,1300018396130611200,1300018396130611200,Democrat “Leadership” has no clue. Request hel...,False,"[0, 70]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,1.299935e+18,1.299935e+18,"{'url': 'https://t.co/Jifo9JwTD0', 'expanded':...",{'created_at': 'Sun Aug 30 04:59:03 +0000 2020...,,,,
1787,2020-08-30 10:28:46+00:00,1300017613377097730,1300017613377097728,ANTIFA is a Radical Left group that only wants...,False,"[0, 104]","{'hashtags': [], 'symbols': [{'text': 'FOOLS',...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0.0,en,1.299933e+18,1.299933e+18,"{'url': 'https://t.co/Be8avd2wPL', 'expanded':...",{'created_at': 'Sun Aug 30 04:51:06 +0000 2020...,,,,


In [None]:
#After loading them, put the following information into a dataframe:
#The full text of the Tweet. (string)
#Any users mentioned in the Tweet. (string)
#The timestamp. (datetime)
#The date (YYYY-MM-DD) of the Tweet. (date)
# Inspect the "entities" column, which holds the hashtags and user mentions.
tweets_data.loc[:, "entities"]
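
The columns listed in the comments above could be assembled roughly as in the sketch below. It makes several assumptions: each `entities` value is a dict whose `user_mentions` list holds dicts with a `screen_name` key, and the retweet and favorite counts live in columns named `retweet_count` and `favorite_count` (these fall inside the part of the column listing that the output above elides). The helper name `build_frame` is hypothetical, and the `raw` frame is a toy stand-in for `tweets_data` with made-up counts and a made-up mention.

```python
import pandas as pd

# Toy stand-in for tweets_data; the counts and the mention are invented for illustration.
raw = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2020-11-05 14:12:37+00:00", "2020-08-30 10:37:39+00:00"]
    ),
    "full_text": ["STOP THE COUNT!", "GREAT PATRIOTS! https://t.co/BWGxVoBTmI"],
    "entities": [
        {"hashtags": [], "symbols": [], "user_mentions": []},
        {"hashtags": [], "symbols": [],
         "user_mentions": [{"screen_name": "example_user"}]},
    ],
    "retweet_count": [0, 12],    # assumed column name
    "favorite_count": [5, 30],   # assumed column name
})


def build_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Collect the requested fields into a tidy dataframe (hypothetical helper)."""
    return pd.DataFrame({
        "full_text": raw["full_text"],
        # One comma-separated string of screen names per tweet, via a list comprehension.
        "mentions": [
            ", ".join(m["screen_name"] for m in e["user_mentions"])
            for e in raw["entities"]
        ],
        "timestamp": raw["created_at"],
        "date": raw["created_at"].dt.date,
        "retweets": raw["retweet_count"],
        "favorites": raw["favorite_count"],
        # The assignment defines "censored" as having zero retweets.
        "censored": raw["retweet_count"] == 0,
    })


tweets = build_frame(raw)
```

Joining the mentions into one string matches the "(string)" type the assignment asks for; keeping them as a list per tweet would make the later "top 5 accounts" count easier, so either choice is defensible.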