# Trump's tone to Congress

We're going to reproduce [Trump Sounds a Different Tone in First Address to Congress](https://www.nytimes.com/interactive/2017/02/28/upshot/trump-sounds-different-tone-in-first-address-to-congress.html) from the Upshot.

**Datasource 1:** The [NRC Emotion Lexicon](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm), a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were made manually through crowdsourcing.

**Datasource 2:** A database of [Trump speeches](https://github.com/PedramNavid/trump_speeches), one speech per file. There are a lot of GitHub repositories of Trump speeches, but this one is better than the vast majority.

**Datasource 3:** State of the Union addresses taken from [this repo's data directory](https://github.com/m-aleem/SOTU-Analyzer). I also cheated and pasted Trump's SOTU-y address in.
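
As a rough sketch of how the lexicon from Datasource 1 might be loaded: the word-level download is, as far as I know, a tab-separated file with one word/emotion/flag triple per line and no header row. The column names below are my own, the sample rows are made up for illustration (they are not copied from the real file), and an in-memory string stands in for the file so the cell runs on its own.

```python
import io
import pandas as pd

# Illustrative rows only, NOT real lexicon entries: word <TAB> emotion <TAB> 0/1 flag
sample = io.StringIO(
    "sunshine\tjoy\t1\n"
    "sunshine\tfear\t0\n"
    "monster\tjoy\t0\n"
    "monster\tfear\t1\n"
)

# Assumed layout: no header row, so we supply our own (hypothetical) column names
lexicon_long = pd.read_csv(sample, sep="\t", names=["word", "emotion", "association"])

# Reshape to one row per word and one column per emotion
lexicon_df = lexicon_long.pivot(index="word", columns="emotion", values="association")

# Words flagged for fear
fear_words = lexicon_df[lexicon_df["fear"] == 1].index.tolist()
print(fear_words)
```

With the real file you would pass its path to `pd.read_csv` instead of `sample`.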

In [25]:
import pandas as pd
%matplotlib inline

## Reading in Trump's speeches

### Get a list of all of the files

In [26]:
import glob

filenames = glob.glob("trump_speeches-master/data/speech*")
filenames[:5]

['trump_speeches-master/data/speech_0.txt',
 'trump_speeches-master/data/speech_1.txt',
 'trump_speeches-master/data/speech_10.txt',
 'trump_speeches-master/data/speech_11.txt',
 'trump_speeches-master/data/speech_12.txt']
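
Notice that `speech_10.txt` comes right after `speech_1.txt`: filenames are strings, so they compare character by character, and `glob` itself does not promise any particular order. If numeric order matters, a sketch like this would sort by the number in the filename. The `speech_number` helper is hypothetical, and the list is typed in by hand as example names so the cell runs without the data folder.

```python
import re

def speech_number(filename):
    """Pull the integer out of a name like 'speech_12.txt' (hypothetical helper)."""
    match = re.search(r"speech_(\d+)", filename)
    return int(match.group(1))

# Example names only, typed in by hand
example_filenames = [
    "trump_speeches-master/data/speech_0.txt",
    "trump_speeches-master/data/speech_1.txt",
    "trump_speeches-master/data/speech_10.txt",
    "trump_speeches-master/data/speech_2.txt",
]

# Sort by the number rather than by the raw string
sorted_filenames = sorted(example_filenames, key=speech_number)
print(sorted_filenames)
```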

### Read them all in individually

In [27]:
speeches = [open(filename).read() for filename in filenames]
len(speeches)

56

### Create a dataframe out of the results

Instead of passing a list of dictionaries to `pd.DataFrame`, we pass a single dictionary of lists: each key becomes a column name, and each list fills that column.

In [28]:
speeches_df = pd.DataFrame({
    'text': speeches,
    'filename': filenames
})
speeches_df.head(3)

| | filename | text |
|---|---|---|
| 0 | trump_speeches-master/data/speech_0.txt | Remarks Announcing Candidacy for President in ... |
| 1 | trump_speeches-master/data/speech_1.txt | Remarks at the AIPAC Policy Conference in Wash... |
| 2 | trump_speeches-master/data/speech_10.txt | Remarks at the Washington County Fair Park in ... |
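
To make the two ways of building a dataframe concrete, here is a small sketch with toy values (the filenames and texts are placeholders, not the real data). Both constructions should give the same table.

```python
import pandas as pd

# Toy stand-ins for the real filenames and texts
toy_filenames = ["a.txt", "b.txt"]
toy_texts = ["first speech", "second speech"]

# Dictionary of lists: one key per column
from_columns = pd.DataFrame({"filename": toy_filenames, "text": toy_texts})

# List of dictionaries: one dictionary per row
from_rows = pd.DataFrame([
    {"filename": "a.txt", "text": "first speech"},
    {"filename": "b.txt", "text": "second speech"},
])

# Compare the two results
print(from_columns.equals(from_rows))
```

The dictionary-of-lists form is handy when the data already lives in parallel lists, as it does here.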


### Splitting out the title and content of the speech

Each value in the "text" column starts with the title of the speech on the first line, followed by the speech itself. Like this:

In [34]:
speeches_df.loc[0]['text'][:200]

"Remarks Announcing Candidacy for President in New York City\nTrump: Wow. Whoa. That is some group of people. Thousands.So nice, thank you very much. That's really nice. Thank you. It's great to be at T"

We're going to split those out into multiple columns, then delete the original column so we don't get mixed up later.

In [35]:
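# Split on the first newline only: the title is the first line, everything after it is the speech
speeches_df['name'] = speeches_df['text'].apply(lambda value: value.split("\n", 1)[0])
speeches_df['content'] = speeches_df['text'].apply(lambda value: value.split("\n", 1)[1])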
del speeches_df['text']
speeches_df.head(2)

| | filename | name | content |
|---|---|---|---|
| 0 | trump_speeches-master/data/speech_0.txt | Remarks Announcing Candidacy for President in ... | Trump: Wow. Whoa. That is some group of people... |
| 1 | trump_speeches-master/data/speech_1.txt | Remarks at the AIPAC Policy Conference in Wash... | Good evening. Thank you very much. I speak to... |
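
One thing to watch when splitting on newlines: with a plain `split("\n")`, element `[1]` is only the second line, so a speech that itself contains line breaks would be cut short. A small sketch with a made-up string shows the difference, using Python's built-in `str.partition`, which splits at the first separator only.

```python
# Made-up text: a title line followed by a two-line body
raw = "Some Title\nFirst paragraph.\nSecond paragraph."

# A plain split keeps only the second line when we take element [1]
second_line_only = raw.split("\n")[1]

# partition() splits at the first "\n" only and returns (before, separator, after)
title, _, body = raw.partition("\n")

print(second_line_only)
print(title)
print(body)
```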


# How does Trump sound?

Let's analyze by counting words.

We would use the code below to count all of his words. **Do we really want all of them?**

```python
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
matrix = vec.fit_transform(speeches_df['content'])
vocab = vec.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
wordcount_df = pd.DataFrame(matrix.toarray(), columns=vocab)
wordcount_df.head()
```

# Reading in the SOTU addresses

Pretty much the same thing as what we did with Trump!

In [36]:
# Get the filenames
filenames = glob.glob("SOTU/*.txt")
# Read them in
contents = [open(filename).read() for filename in filenames]
# Create a dataframe from the results
sotu_df = pd.DataFrame({
    'content': contents,
    'filename': filenames
})
sotu_df.head(3)

| | content | filename |
|---|---|---|
| 0 | Gentlemen of the Congress:\n\nIn pursuance of ... | SOTU/1913.txt |
| 1 | GENTLEMEN OF THE CONGRESS:\n\nThe session upon... | SOTU/1914.txt |
| 2 | GENTLEMEN OF THE CONGRESS:\n\nSince I last had... | SOTU/1915.txt |


### Add a column for the name 

We don't have a name for these, so we'll just use the filename.

In [39]:
sotu_df['name'] = sotu_df['filename']
sotu_df.head()

| | content | filename | name |
|---|---|---|---|
| 0 | Gentlemen of the Congress:\n\nIn pursuance of ... | SOTU/1913.txt | SOTU/1913.txt |
| 1 | GENTLEMEN OF THE CONGRESS:\n\nThe session upon... | SOTU/1914.txt | SOTU/1914.txt |
| 2 | GENTLEMEN OF THE CONGRESS:\n\nSince I last had... | SOTU/1915.txt | SOTU/1915.txt |
| 3 | GENTLEMEN OF THE CONGRESS:\n\nIn fulfilling at... | SOTU/1916.txt | SOTU/1916.txt |
| 4 | Gentlemen of the Congress:\n\nEight months hav... | SOTU/1917.txt | SOTU/1917.txt |
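
If the full path feels clunky as a name, one option (an assumption on my part, not something the rest of the notebook relies on) is to keep just the year from the filename with `pathlib`. The toy dataframe below stands in for `sotu_df`.

```python
from pathlib import Path
import pandas as pd

# Toy stand-in for sotu_df with only the filename column
toy_sotu_df = pd.DataFrame({"filename": ["SOTU/1913.txt", "SOTU/1914.txt"]})

# Path(...).stem drops the folder and the extension: "SOTU/1913.txt" -> "1913"
toy_sotu_df["name"] = toy_sotu_df["filename"].apply(lambda f: Path(f).stem)
print(toy_sotu_df)
```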


# How do State of the Union addresses sound?

Let's analyze by counting words.

# Comparing SOTU vs Trump
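
One way to set up the comparison, assuming both dataframes end up with `name` and `content` columns, is to label each row with where it came from and stack the two tables. This is only a sketch: the toy dataframes and the `source` column are placeholders I made up, not part of the original data.

```python
import pandas as pd

# Toy stand-ins for speeches_df and sotu_df (only the shared columns)
toy_trump = pd.DataFrame({"name": ["Rally remarks"], "content": ["some campaign text"]})
toy_sotu = pd.DataFrame({"name": ["SOTU/1913.txt"], "content": ["some address text"]})

# Label each row with its source, then stack the two dataframes
toy_trump["source"] = "trump"
toy_sotu["source"] = "sotu"
combined = pd.concat([toy_trump, toy_sotu], ignore_index=True)

print(combined["source"].value_counts())
```

With everything in one dataframe, the same word-counting code can run once and the results can be grouped by `source`.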