# a7 - Tweeter Sentiment

In this assignment you will write a program to perform simple [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis) of Twitter data&mdash;that is, determining the "attitude" or "emotion" (e.g., how "positive", "negative", "joyful", etc.) of tweets made by a particular Twitter user. Sentiment analysis is a fascinating field: researchers have shown that the "mood" of Twitter communication [reflects biological rhythms](http://www.nytimes.com/2011/09/30/science/30twitter.html) and can even be used to [predict the stock market](http://arxiv.org/pdf/1010.3003&embedded=true). The particular analysis you'll be performing is inspired by an investigation of [personal vs. organizational tweets](http://varianceexplained.org/r/trump-tweets/) (which has become less amusing over time).

You will be implementing a Python program that performs this analysis on **real data** taken directly from a Twitter user's timeline. In the end, your script will produce output similar to the following:

```
EMOTION       % WORDS  EXAMPLE WORDS                     HASHTAGS
positive      6.16%    learn, faculty, happy             #accesstoinfoday, #indigenouspeoplesday, #idealistfair
trust         3.08%    school, faculty, happy            #indigenouspeoplesday, #diversity
anticipation  2.53%    happy, top, ready                 #indigenouspeoplesday, #informatics, #info340
joy           1.76%    happy, peace, deal                #indigenouspeoplesday, #accesstoinfoday
surprise      0.99%    deal, award, surprised            #suzzallolibrary, #nobrainer
negative      0.88%    fall, rejection, outstanding        
sadness       0.55%    fall, rejection, problem            
disgust       0.44%    rejection, weird, finally           
fear          0.44%    rejection, surprise, problem        
anger         0.33%    rejection, disaster, involvement  #mlis
```

(The "hashtags" column is optional extra credit.)

Fill in the code cells below as specified. Note that cells may use variables and functions defined in previous cells; we should be able to use the `Kernel > Restart & Clear All` menu item followed by `Cell > Run All` to execute your entire notebook and see the correct output.

## The Data
You'll be working with two different pieces of data for this assignment.

First, you'll be loading tweet data taken directly from [Twitter's API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline). You can find an example of this tweet data in the **`uw_ischool_sample.py`** file inside the `data/` folder. The below cell will import this data as a variable `SAMPLE_TWEETS` from the provided _module_ file:

In [1]:
# import from uw_ischool_sample file in the `data/` package (folder)
from data.uw_ischool_sample import SAMPLE_TWEETS

The data is represented as one giant **list of dictionaries**: the **list** contains a sequence of **dictionaries**, where each dictionary represents a tweet. Each dictionary contains many different _values_, some of which may themselves be dictionaries.

Print out the first three elements from the `SAMPLE_TWEETS` list to see what information can be found. The most relevant value is the `"text"` of the tweet.
- The Twitter API actually provides a lot more information about each tweet; I've stripped it down to only the most important properties for readability. Each dictionary is a proper subset of the full data you'd get from Twitter.
- Because of the source of the sentiment data, your analysis will be biased and only support English-language speakers. Nevertheless, Twitter is an international community so you may encounter non-English characters and words. You'll be working with real-world data and it will be messy!
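As a sketch of how to reach into this nested structure, the snippet below uses one hand-written, heavily trimmed tweet dictionary (a hypothetical example, not real API output) and pulls out the text and the nested hashtag entries. The `entities` → `hashtags` → `text` path is the same one this notebook's `analyze_tweets()` relies on later.

```python
# A hypothetical, trimmed-down tweet dictionary (not real Twitter output);
# real tweets carry many more keys.
tweet = {
    "text": "We'll be at the #IdealistFair this evening!",
    "entities": {
        "hashtags": [{"text": "IdealistFair"}],
    },
}

# Top-level value: look it up directly by key
text = tweet["text"]

# Nested values: each lookup returns another dict or list to index into
hashtags = ["#" + tag["text"].lower() for tag in tweet["entities"]["hashtags"]]

print(text)      # We'll be at the #IdealistFair this evening!
print(hashtags)  # ['#idealistfair']
```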

In [2]:
[d['text'] for d in SAMPLE_TWEETS[:3]]

['RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v',
 "We'll be at the #IdealistFair this evening on the Seattle U. campus. Come and learn about our graduate programs: https://t.co/et1HrQshmr",
 'RT @iYouthUW: iYouth Tips for 1st\xa0Years https://t.co/K4SCIEhJ8k https://t.co/p4lbC6Jb5o']

The second piece of data you'll be working with is a set of **word-sentiments**&mdash;a list of English-language words and what emotions (e.g., "joy", "anger") [are associated with them](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm).

- The [`nltk`](https://github.com/nltk/nltk/wiki/Sentiment-Analysis) library you used in a previous assignment does support sentiment analysis. However, for practice and extensibility you'll be doing a more "manual" analysis using the provided data file for this assignment.

`import` the word sentiments as a variable **`SENTIMENTS`** from the **`data.sentiments_nrc`** module. You should also import the `EMOTIONS` variable provided by the same module: this is a _list_ of possible emotions. You can inspect the variables (e.g., print them out) to confirm that you have imported them.

In [3]:
from data.sentiments_nrc import SENTIMENTS
from data.sentiments_nrc import EMOTIONS

In [4]:
EMOTIONS

['positive',
 'negative',
 'anger',
 'anticipation',
 'disgust',
 'fear',
 'joy',
 'sadness',
 'surprise',
 'trust']

In [5]:
SENTIMENTS

{'abacus': {'trust': 1},
 'abandon': {'sadness': 1, 'fear': 1, 'negative': 1},
 'abandoned': {'sadness': 1, 'negative': 1, 'anger': 1, 'fear': 1},
 'abandonment': {'surprise': 1,
  'sadness': 1,
  'negative': 1,
  'anger': 1,
  'fear': 1},
 'abba': {'positive': 1},
 'abbot': {'trust': 1},
 'abduction': {'sadness': 1, 'surprise': 1, 'fear': 1, 'negative': 1},
 'aberrant': {'negative': 1},
 'aberration': {'disgust': 1, 'negative': 1},
 'abhor': {'disgust': 1, 'negative': 1, 'anger': 1, 'fear': 1},
 'abhorrent': {'disgust': 1, 'negative': 1, 'anger': 1, 'fear': 1},
 'ability': {'positive': 1},
 'abject': {'disgust': 1, 'negative': 1},
 'abnormal': {'disgust': 1, 'negative': 1},
 'abolish': {'anger': 1, 'negative': 1},
 'abolition': {'negative': 1},
 'abominable': {'disgust': 1, 'fear': 1, 'negative': 1},
 'abomination': {'disgust': 1, 'negative': 1, 'anger': 1, 'fear': 1},
 'abort': {'negative': 1},
 'abortion': {'sadness': 1, 'disgust': 1, 'fear': 1, 'negative': 1},
 'abortive': {'sadnes

## Text Sentiment
All of the sentiment analysis is based on the individual _words_ in the text. Thus you will need to determine which words in a tweet have which sentiments.

- Note that the assignment explicitly does _not_ tell you what to name functions, what arguments they should take or values they should return: your task is to determine appropriate functions and arguments from the (guided) requirements! Use multiple functions for clarity, give them all informative names, and include a **doc string** to explain what each does.

Define a function that takes a tweet's text (a string) and splits it into a list of individual words.

- You should **not** use the `nltk` library to tokenize words. Instead, split the text using the [regular expression](https://www.regular-expressions.info/) **`"\W+"`** as the separator (rather than just a blank space). You can do this with the [re.split()](https://docs.python.org/3/library/re.html#re.split) function from the `re` module. This separator causes your splitting to exclude punctuation and provides a reasonable (but not perfect!) list of words to consider.

- All of the words in the sentiment dictionary are _lower case_, so you'll need to **map** your resulting words to be lower case. You will also need to **filter** out any words that have 1 letter or fewer. Your function must use a **list comprehension** to do this.
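To see why the length filter matters, here is what `re.split()` with the `\W+` separator produces on the test string below; the values in the comments follow from Python's `re` semantics. Writing the pattern as a raw string (`r"\W+"`) keeps Python from treating the backslash as an escape.

```python
import re

text = "Amazingly, I prefer a #rainy day to #sunshine."

# Split on runs of non-word characters; note the trailing '' after the final '.'
pieces = re.split(r"\W+", text)
print(pieces)
# ['Amazingly', 'I', 'prefer', 'a', 'rainy', 'day', 'to', 'sunshine', '']

# Lower-case each piece and keep only those longer than one character
words = [piece.lower() for piece in pieces if len(piece) > 1]
print(words)
# ['amazingly', 'prefer', 'rainy', 'day', 'to', 'sunshine']

# Apostrophes are non-word characters too, so contractions are split apart
print(re.split(r"\W+", "We'll be there"))
# ['We', 'll', 'be', 'there']
```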

The string `"Amazingly, I prefer a #rainy day to #sunshine."` should produce a list with 6 lower-case words in it.

In [17]:
'''
import re

def lowertext(t):
    lowert = t.lower()
    return lowert

def split_text(text):
    individualwords = re.split('\W+',text)
    doc = []
    for i in individualwords:
        lower = lowertext(i)
        if len(lower)>1:
            doc.append(lower)         
    return doc
split_text('Amazingly, I prefer a #rainy day to #sunshine.')    
'''

In [7]:
import re

def text_to_word(string):
    r"""Split `string` into a list of lower-case words.

    Uses the regular expression \W+ as the separator (so punctuation is dropped)
    and keeps only words longer than one character.
    """
    return [word.lower() for word in re.split(r'\W+', string) if len(word) > 1]
text_to_word("Amazingly, I prefer a #rainy day to #sunshine.")

['amazingly', 'prefer', 'rainy', 'day', 'to', 'sunshine']

Define a function that **filters** a list of the words to get only those words that contain a specific emotion. Your function must use a **list comprehension** to do this.
- You can determine whether a word has a particular emotion by looking it up in the imported `SENTIMENTS` variable. Use the word as the "key" to find the dictionary of emotions for that word, and then use the emotion as the key to _that_ dictionary to determine if the word has it. 
    
    Do **not** use the `in` operator or a loop to search the list of keys one by one. Instead, use bracket notation or the `get()` method to "look up" a key directly (this is more efficient and is why the emotions are nested dictionaries in the first place!).
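The lookup pattern can be sketched on a tiny, hypothetical lexicon with the same nested shape as `SENTIMENTS` (the words and emotions here are made up for illustration). Chaining `.get()` with an empty-dict default handles words that are missing from the lexicon without any `in` check.

```python
# A tiny hypothetical lexicon with the same nested shape as SENTIMENTS
lexicon = {
    "sunshine": {"joy": 1, "positive": 1},
    "rainy": {"sadness": 1},
}

# Direct lookup of the word, then of the emotion, with bracket notation
print(lexicon["sunshine"]["joy"])              # 1

# .get() returns None (or a given default) instead of raising KeyError
print(lexicon["rainy"].get("joy"))             # None
print(lexicon.get("umbrella", {}).get("joy"))  # None: unknown word -> empty dict

# Filtering with a list comprehension and chained .get() lookups
words = ["amazingly", "rainy", "day", "sunshine"]
joyful = [w for w in words if lexicon.get(w, {}).get("joy") == 1]
print(joyful)                                  # ['sunshine']
```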
    
For testing, the `"positive"` words extracted from `"Amazingly, I prefer a #rainy day to #sunshine."` are `["amazingly", "prefer", "sunshine"]`.

In [18]:
'''
def emotion_word(sentence,emotion):
    wordlist = split_text(sentence)
    sentimentlist =[]
    for i in wordlist:
        if SENTIMENTS.get(i) is not None and SENTIMENTS.get(i).get(emotion)==1:
            sentimentlist.append(i)
    #print(positivelist)
    return sentimentlist
emotion_word("Amazingly, I prefer a #rainy day to #sunshine.","positive")
'''


In [8]:
def word_to_emotion(string, emotion):
    """Return the words in `string` that have `emotion` according to SENTIMENTS."""
    # .get(word, {}) gives an empty dict for words missing from SENTIMENTS,
    # which avoids "'NoneType' object has no attribute 'get'"
    return [word for word in text_to_word(string)
            if SENTIMENTS.get(word, {}).get(emotion) == 1]
word_to_emotion("Amazingly, I prefer a #rainy day to #sunshine.", 'positive')

['amazingly', 'prefer', 'sunshine']

Define a function that determines which words from a list have _each_ emotion (i.e., the "emotional" words), producing a dictionary of that information. For example, the words extracted from `"Amazingly, I prefer a #rainy day to #sunshine."` should produce a dictionary that looks like:

```
{
 'anger': [],
 'anticipation': [],
 'disgust': [],
 'fear': [],
 'joy': ['amazingly', 'sunshine'],
 'negative': [],
 'positive': ['amazingly', 'prefer', 'sunshine'],
 'sadness': ['rainy'],
 'surprise': ['amazingly'],
 'trust': ['prefer']
}
```
    
(Note the empty lists for emotions that have no matching words).
    
- You can use the imported `EMOTIONS` variable to have a list of emotions to iterate through.
- Use the function you defined in the previous step to help you out!
- Using a [dictionary comprehension](https://www.smallsurething.com/list-dict-and-set-comprehensions-by-example/) is a nice way to do this, but is not required.
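A minimal sketch of the dictionary-comprehension approach, using hypothetical stand-ins for `EMOTIONS` and `SENTIMENTS` (a three-emotion list and a two-word lexicon) so it runs on its own; the helper names are illustrative, not required.

```python
# Hypothetical stand-ins for EMOTIONS and SENTIMENTS, for illustration only
emotions = ["positive", "negative", "joy"]
lexicon = {
    "amazingly": {"positive": 1, "joy": 1},
    "prefer": {"positive": 1},
}

def words_with_emotion(words, emotion):
    """Return the words that the lexicon tags with `emotion`."""
    return [w for w in words if lexicon.get(w, {}).get(emotion) == 1]

def emotional_words(words):
    """Map every emotion to its matching words (an empty list if none match)."""
    return {emotion: words_with_emotion(words, emotion) for emotion in emotions}

print(emotional_words(["amazingly", "prefer", "day"]))
# {'positive': ['amazingly', 'prefer'], 'negative': [], 'joy': ['amazingly']}
```

Because the comprehension iterates over the emotion list, every emotion gets a key, which is what produces the empty lists described above.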

In [19]:
'''
emodic = {}
def emotionlist(text):
    wordlist = split_text(text)
    for i in EMOTIONS:
        #Build the keyset for the emodic
        emodic[i]=[]
    for i in wordlist:
        #Iterate the wordlist, check if the sentiments have the key of i
        if SENTIMENTS.get(i) is not None:
            #for each i, we iterate the keyset of key i in Sentiments
            for j in SENTIMENTS.get(i).keys():
                #Add each word i to the corresponding list in emodic
                emodic.get(j).append(i)
    return emodic
emotionlist("Amazingly, I prefer a #rainy day to #sunshine.")
'''


In [9]:
def emotions_in_word(string):
    """Map every emotion in EMOTIONS to the words in `string` that have it."""
    return {emo: word_to_emotion(string, emo) for emo in EMOTIONS}
emotions_in_word("Amazingly, I prefer a #rainy day to #sunshine.")

{'positive': ['amazingly', 'prefer', 'sunshine'],
 'negative': [],
 'anger': [],
 'anticipation': [],
 'disgust': [],
 'fear': [],
 'joy': ['amazingly', 'sunshine'],
 'sadness': ['rainy'],
 'surprise': ['amazingly'],
 'trust': ['prefer']}

Define a function that gets a list of the "most common" words in a list. This should be a new list containing each distinct word from the original list, in descending order by how many times that word appears in the original list.

- You can determine the frequency (number of occurrences) of a word with a similar process to what you did with digits in the previous assignment.
- You should use the `sorted()` function to [sort](https://wiki.python.org/moin/HowTo/Sorting) the individual words. This function takes a **`key`** argument which should be passed a [_callback function_](https://wiki.python.org/moin/HowTo/Sorting#Key_Functions) that returns a "transformed" value that you wish to sort by (e.g., which element in a tuple). An anonymous lambda function works well for this.

You can test this function with any list of "words" with repeated entries: `['a','b','c','c','c','a']` for example.
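One way to sketch this, returning only the words as the description asks (the notebook's own `word_frequency()` below returns `(word, count)` pairs instead). The function name `most_common` is hypothetical. The standard library's `collections.Counter.most_common()` gives the same ordering and is shown as an alternative.

```python
from collections import Counter

def most_common(words):
    """Return the distinct words in `words`, most frequent first."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    # Sort the distinct words by their count; reverse=True puts the largest first.
    # Python's sort is stable, so tied words keep their first-seen order.
    return sorted(counts, key=lambda word: counts[word], reverse=True)

print(most_common(['a', 'b', 'c', 'c', 'c', 'a']))  # ['c', 'a', 'b']

# Alternative: Counter.most_common() returns (word, count) pairs, largest first
print([w for w, n in Counter(['a', 'b', 'c', 'c', 'c', 'a']).most_common()])  # ['c', 'a', 'b']
```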


In [None]:
'''
from collections import Counter
import operator

def word_count(t):
    my_string = t
    my_dict = {}
    for item in my_string:
        if item in my_dict:
            my_dict[item] += 1
        else:
            my_dict[item] = 1
    return my_dict
word_count(['aba','baa','cdd','cee','cee','aba'])

def word_frequency(original_list):
    newlist = Counter()
    for i in original_list:
        newlist[i]+=1
    return newlist
word_frequency(['aba','baa','cdd','cee','cee','aba'])  

l = dict(word_frequency(['aba','baa','cdd','cee','cee','aba']))
l.values()

sorted_l = dict(sorted(l.items(), key=lambda kv: kv[1],reverse=True))
sorted_l
'''


In [10]:
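def word_frequency(alist):
    """Return (word, count) pairs for the distinct words in `alist`, most common first."""
    counts = {}
    for item in alist:
        counts[item] = counts.get(item, 0) + 1
    # sort by the count (second element of each pair), largest first;
    # the sort is stable, so ties keep the order in which words first appeared
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

word_frequency(['a','b','c','c','c','a'])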

[('c', 3), ('a', 2), ('b', 1)]

## Tweet Statistics
Once you are able to determine the sentiment of an individual string of text (e.g., a single tweet's content), you can analyze an entire set of tweets from the user's timeline. Note that I am _not_ telling you how to break this process up into individual functions&mdash;you should be able to determine that on your own by this point!

Define a function (e.g., `analyze_tweets()`) that takes as an argument a **list** of tweet data (with the same structure as the imported `SAMPLE_TWEETS` variable), and _returns_ the data of interest to display in a table like the one at the very top of the notebook. In particular, you'll need to produce the following information **for each emotion**:

1. The percentage of words _across all tweets_ that have that emotion
2. The most common words _across all tweets_ that have that emotion (in order!)

(Think carefully: should this data be stored in a _list_ or a _dictionary_?)
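One reasonable choice is a dictionary keyed by emotion, since the display step needs to look each emotion up by name. The sketch below assumes that choice and uses made-up per-tweet results and a made-up total word count; the keys `"percent"` and `"words"` are illustrative names, not required ones.

```python
# Hypothetical per-tweet results, shaped like the emotion dictionaries above
tweet_emotions = [
    {"positive": ["happy"], "joy": ["happy"]},
    {"positive": ["learn", "happy"], "joy": []},
]
total_words = 20  # hypothetical total word count across the tweets

stats = {}
for emotion in ["positive", "joy"]:
    # Gather this emotion's words from every tweet into one list
    words = [w for emotions in tweet_emotions for w in emotions[emotion]]
    stats[emotion] = {"percent": len(words) / total_words, "words": words}

print(stats["positive"])  # {'percent': 0.15, 'words': ['happy', 'learn', 'happy']}
```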

Some tips for this task:

- You can and should create some "helper" functions to break up this task even further. You can define those functions in the same notebook cell or you may insert additional cells.

- You'll need to use your previous functions to get the _list of words_ and _dictionary of emotional words_ for each tweet. I recommend you assign the results of these methods as **new values** of the respective tweet dictionary (so each tweet dictionary would gain a `words` key, for example).

- In order to get the percentage of emotional words, divide the number of words that have that emotion by the total number of words _across all the tweets_. Counting how many total words are in the tweet set is a **reducing** operation: you should use the `reduce()` function for this.

- For each emotion, you'll need to get a list of the words _across all the tweets_ that have that emotion (in order to determine how many there are for the percentage, as well as which are most common). This is another **reducing** operation; you should use the `reduce()` function to _add up_ all of these words (alternatively, the built-in `sum()` function can be used here).
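Both reducing operations can be sketched with `functools.reduce` on a few hypothetical per-tweet word lists. Passing an initial value (`0` or `[]`) keeps `reduce()` from failing with "reduce() of empty sequence with no initial value" when there are no tweets.

```python
from functools import reduce
import operator

# Hypothetical word lists, one per tweet
word_lists = [["happy", "day"], ["learn", "more", "today"], []]

# Total word count: fold the lists into a running integer total
total = reduce(lambda count, words: count + len(words), word_lists, 0)
print(total)  # 5

# All words in one list: fold with list concatenation, starting from []
all_words = reduce(operator.add, word_lists, [])
print(all_words)  # ['happy', 'day', 'learn', 'more', 'today']

# The same concatenation with the built-in sum(), using [] as the start value
print(sum(word_lists, []) == all_words)  # True
```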

You can test your function by passing in the `SAMPLE_TWEETS` variable as an argument and checking if your returned data has the same numbers as in the table at the top of the page. Note that only the first 3 most common words are listed in the example (and may be in a different order in the case of frequency ties!).

In [None]:
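def getAllWords(t):
    """Return one flat list of the words in every text in the list `t`."""
    allwords = []  # local, so repeated calls don't keep appending to one list
    for i in t:
        allwords.extend(text_to_word(i))  # text_to_word() is defined above
    return allwords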


In [None]:
sample_list = []
for i in SAMPLE_TWEETS:
    sample_list.append(i['text'])
sample_list

getAllWords(sample_list)

In [None]:
len(getAllWords(sample_list))

In [None]:
emomo = []
def getEmotions(ListofWords):   
    for i in ListofWords:
        if i in SENTIMENTS.keys():
            emomo.append(i)
    return emomo
getEmotions(getAllWords(sample_list))
             

In [None]:
len(emomo)

In [None]:
emotionset = set()
def analyze_tweets(listofwords):
    
    emotiondic = {}
    emotionstat = {}
    totalwords = len(listofwords)
    for i in emomo:
        for j in SENTIMENTS.get(i):
            emotionset.add(j)
    # build the emotion percentage
    for i in emotionset:
        emotionstat[i] = 0
        for j in emomo:
            if SENTIMENTS.get(j).get(i) is not None:
                emotionstat[i]+=1
    for i in emotionset:
        emotiondic[i] = []
        for j in emomo:
            if SENTIMENTS.get(j).get(i) is not None and j not in emotiondic[i]:
                emotiondic[i].append(j)
    for i in emotionstat.keys():
        emotionstat[i] = '{percent:.2%}'.format(percent=emotionstat[i]/totalwords)
    return emotionstat,emotiondic
analyze_tweets(getAllWords(sample_list))

In [None]:
for i in analyze_tweets(getAllWords(sample_list))[0].keys():
    print(i)
for k in analyze_tweets(getAllWords(sample_list))[0].values():
    print(k)
for j in analyze_tweets(getAllWords(sample_list))[1].values():
    print(j[:3])

In [11]:
from functools import reduce

def count_total(text_list):
    """Return the total number of words across all the texts in `text_list`."""
    word_lists = [text_to_word(item) for item in text_list]
    # reducing operation: add up the word counts; the initial value 0 also
    # covers an empty list of tweets
    return reduce(lambda total, words: total + len(words), word_lists, 0)

def emotions_in_hashtag(string, hashtag):
    """Map each emotion to the list `hashtag` if `string` contains a word
    with that emotion, or to an empty list otherwise."""
    return {emo: (hashtag if word_to_emotion(string, emo) else []) for emo in EMOTIONS}

def get_table(text_list, hashtag_list):
    """Return {emotion: [fraction of words, top-3 words, top-3 hashtags]}.

    `text_list` holds the text of each tweet and `hashtag_list` the matching
    list of '#'-prefixed hashtags for each tweet.
    """
    ans = {emo: [] for emo in EMOTIONS}       # emotional words, across all tweets
    ans_hash = {emo: [] for emo in EMOTIONS}  # hashtags of tweets with that emotion
    for text, hashtags in zip(text_list, hashtag_list):
        dict_emotion = emotions_in_word(text)
        dict_emotion_hashtag = emotions_in_hashtag(text, hashtags)
        for emo in EMOTIONS:
            ans[emo] += dict_emotion[emo]
            ans_hash[emo] += dict_emotion_hashtag[emo]

    table_dict = {}
    total_words = count_total(text_list)
    for emo in EMOTIONS:
        word_freq = word_frequency(ans[emo])
        hash_freq = word_frequency(ans_hash[emo])
        # Build both columns fresh for every emotion, so an emotion with no
        # words or hashtags gets '' instead of the previous emotion's values
        eg_words = ', '.join(word for word, count in word_freq[:3])
        eg_hashtags = ', '.join(tag for tag, count in hash_freq[:3])
        fraction = len(ans[emo]) / total_words if total_words else 0.0
        table_dict[emo] = [fraction, eg_words, eg_hashtags]
    return table_dict


def analyze_tweets(tweets):
    """Analyze a list of tweet dictionaries shaped like SAMPLE_TWEETS.

    Returns a list of (emotion, [fraction of words, top-3 words, top-3 hashtags])
    tuples, sorted by fraction of words in descending order.
    """
    tweets_list = [tweet['text'] for tweet in tweets]
    hashtags = [['#' + tag['text'].lower() for tag in tweet['entities']['hashtags']]
                for tweet in tweets]
    table_dict = get_table(tweets_list, hashtags)
    return sorted(table_dict.items(), key=lambda item: item[1][0], reverse=True)
analyze_tweets(SAMPLE_TWEETS)

[('positive',
  [0.061606160616061605,
   'learn, faculty, happy',
   '#accesstoinfoday, #indigenouspeoplesday, #idealistfair']),
 ('trust',
  [0.030803080308030802,
   'school, faculty, happy',
   '#indigenouspeoplesday, #diversity']),
 ('anticipation',
  [0.025302530253025302,
   'happy, top, ready',
   '#indigenouspeoplesday, #informatics, #info340']),
 ('joy',
  [0.0176017601760176,
   'happy, peace, deal',
   '#indigenouspeoplesday, #accesstoinfoday']),
 ('surprise',
  [0.009900990099009901,
   'deal, award, surprised',
   '#suzzallolibrary, #nobrainer']),
 ('negative', [0.0088008800880088, 'fall, rejection, outstanding', '']),
 ('sadness', [0.005500550055005501, 'fall, rejection, problem', '']),
 ('disgust', [0.0044004400440044, 'rejection, weird, finally', '']),
 ('fear', [0.0044004400440044, 'rejection, surprise, problem', '']),

### Displaying Tweets

Once you've analyzed the tweets, you will need to _display_ that information as a printed table (as in the example at the top of the page).

Define another function to display this table (your function should take as an argument the data structure returned from your "analysis" function).

This function will need to print out the table. Using the [string formatting](https://docs.python.org/3/library/string.html#format-examples) language (via the **`.format()`** string method) makes it possible to have equally sized "columns" of data. For more examples, [this tutorial](https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3) is pretty good (check out the "Padding Variable Substitutions" section).

A few notes about formatting this output:

- For your reference, the example table at the top of the page uses `14` characters for the first column, `11` characters for the second, `35` for the third, and the "remainder" for the fourth. You are not required to match these numbers.

- The percentage should be formatted with two decimals of precision (e.g., `1.23%`).

- Both the example sentiment words and the hashtags (if you include them) should be output as a _comma-separated list_ with spaces between items and no square brackets. The `join()` string method is good for converting lists to formatted strings. Both lists should also be limited to the 3 most common items for space.

- Make sure to include `#` in front of the hashtags!

- You should **sort** the emotions by percentage of words (descending), as in the example at the top.

In [20]:
'''
def showEmotionData():
    print('{0:<14}\t{1:<11}\t{2:<35}'.format("EMOTION", "% WORDS","EXAMPLE WORDS"))
    for i in analyze_tweets(getAllWords(sample_list))[1].values():
        print('{0:<14}\t{1:<11}\t{2:<35}'.format(i[:3], "% WORDS","EXAMPLE WORDS")) 

showEmotionData()
'''

In [12]:
def print_table(table):
    """Print the rows returned by analyze_tweets() as a fixed-width table."""
    print("{:13s} {:8s} {:34} {:40s}".format('EMOTION', '% WORDS', 'EXAMPLE WORDS', 'HASHTAGS'))
    for emotion, (fraction, words, hashtags) in table:
        # fraction * 100 shown with 2 decimals, e.g. 0.0616 -> ' 6.16%'
        print("{:12s} {:5.2f}%    {:34} {:40s}".format(emotion, fraction * 100, words, hashtags))
print_table(analyze_tweets(SAMPLE_TWEETS))

EMOTION       % WORDS  EXAMPLE WORDS                      HASHTAGS                                
positive      6.16%    learn, faculty, happy              #accesstoinfoday, #indigenouspeoplesday, #idealistfair
trust         3.08%    school, faculty, happy             #indigenouspeoplesday, #diversity       
anticipation  2.53%    happy, top, ready                  #indigenouspeoplesday, #informatics, #info340
joy           1.76%    happy, peace, deal                 #indigenouspeoplesday, #accesstoinfoday 
surprise      0.99%    deal, award, surprised             #suzzallolibrary, #nobrainer            
negative      0.88%    fall, rejection, outstanding
sadness       0.55%    fall, rejection, problem
disgust       0.44%    rejection, weird, finally
fear          0.44%    rejection, surprise, problem
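As a small, self-contained illustration of the padding and percentage formatting described in the notes above, the sketch below formats one hard-coded row (values copied from the first row of the sample table at the top). The widths 14, 9 and 34 are chosen here only so the row lines up with that sample; other widths work as well.

```python
# Hypothetical row values, copied from the first line of the sample table
emotion, fraction = "positive", 0.061606160616061605
words = ["learn", "faculty", "happy", "school"]
hashtags = ["#accesstoinfoday", "#indigenouspeoplesday", "#idealistfair"]

# {:<N} left-aligns in a field N characters wide; {:.2%} multiplies by 100 and adds '%'
row = "{:<14}{:<9}{:<34}{}".format(
    emotion,
    "{:.2%}".format(fraction),
    ", ".join(words[:3]),      # only the 3 most common, comma-separated
    ", ".join(hashtags[:3]),
)
print(row)
# positive      6.16%    learn, faculty, happy             #accesstoinfoday, #indigenouspeoplesday, #idealistfair
```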

## Getting Live Data
This is all good and well, but the real payoff is to be able to see the sentiments of tweets taken directly from the Twitter feed of real users!

Define _another_ function that takes in a Twitter username as an argument and then returns a list of dictionaries representing the tweets made by that user.

Normally you would fetch this data by sending a request directly to the web service's API (e.g., to the [statuses/user_timeline](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline) endpoint provided by the Twitter API at `https://api.twitter.com/1.1/statuses/user_timeline`). However, Twitter includes access controls so that only registered developers are allowed to send requests. While it is possible to register as a developer and access Twitter [directly through Python](https://python-twitter.readthedocs.io/en/latest/), this adds an extra level of complexity to the assignment.

Instead, I've set up a [proxy](https://en.wikipedia.org/wiki/Proxy) that has all the access keys specified which you can use to search Twitter. This proxy is available at:

**<https://faculty.washington.edu/joelross/proxy/twitter/timeline/>**

Send a request to _that_ url instead of `https://api.twitter.com/1.1/statuses/user_timeline`, and it will redirect your request with the proper authentication to Twitter, and then give you back whatever JSON Twitter's API responded with.

- You specify the same request parameters as you would when accessing Twitter directly. The request takes a `screen_name` request parameter to which you can assign the given username. You can also specify the `count` parameter if you want to get more results back (up to 200); see the [documentation](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline) for details and other options you are welcome to use (just document them in your function's **docstring**).

- **WARNING:** The proxy I have set up is **rate-limited** so that it can only accept 900 requests every 15 minutes. If all 40 students are working rapidly on the assignment at the same time, you may find yourself needing to wait a few minutes and try again. You are alternatively welcome to set up your own developer account and API keys; just make sure you don't put the keys under version control and upload them to GitHub!

You can download the timeline data from Twitter using the [requests](http://docs.python-requests.org/en/master/user/quickstart/) module: send a `GET` request for the [statuses/user_timeline](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline) data (through the proxy above rather than to Twitter directly), and then use the `.json()` method to extract the JSON response as a Python _list_ or _dictionary_ value you can work with.
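With `requests`, request parameters are usually passed as a dictionary, e.g. `requests.get(url, params={...})`, rather than pasted into the URL by hand; the library URL-encodes them into the query string. The sketch below uses the standard library's `urllib.parse.urlencode` (so it runs without network access) to show the kind of query string that produces. It only builds the URL and does not contact the proxy; the parameter values are examples.

```python
from urllib.parse import urlencode

# The proxy URL from above; the parameter values are just examples
base_url = "https://faculty.washington.edu/joelross/proxy/twitter/timeline/"
params = {"screen_name": "UW_iSchool", "count": 50}

# Build the query string that a params={...} dictionary is encoded into
url = base_url + "?" + urlencode(params)
print(url)
# https://faculty.washington.edu/joelross/proxy/twitter/timeline/?screen_name=UW_iSchool&count=50
```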

In [21]:

'''import re
import requests
from functools import reduce
def download(username):
    screen_name = {'screen_name': username}
    r = requests.get('https://faculty.washington.edu/joelross/proxy/twitter/timeline', params=screen_name)
    return r.json()
    '''

In [13]:
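import requests

def download_tweets(UserName, count):
    """Download recent tweets from a user's timeline via the course proxy.

    Sends a GET request with the `screen_name` (the username) and `count`
    (number of tweets, up to 200) parameters and returns the parsed JSON:
    a list of tweet dictionaries shaped like SAMPLE_TWEETS.
    """
    response = requests.get('https://faculty.washington.edu/joelross/proxy/twitter/timeline',
                            params={'screen_name': UserName, 'count': count})
    response.raise_for_status()  # raise an exception on HTTP error responses
    return response.json()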

Define one last "main" function that will [prompt the user](https://docs.python.org/3/library/functions.html#input) for a Twitter username. The function should then call your "download" function to fetch the tweets, and pass the returned tweet data into your "analyze" and "show" functions in order to display your sentiment analysis of the user's timeline. 

**ADDITIONALLY**, `if` the user specifies `SAMPLE` (all caps) as the username, the function should instead show the analysis for the `SAMPLE_TWEETS` (this will help us out with grading).
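Keeping the `SAMPLE` check in a small helper that receives the download function as an argument makes the branch easy to check without reaching Twitter. This is one possible structure, not a required one: `choose_tweets`, `fake_sample` and `fake_download` are hypothetical names, and `main()` would then only handle the `input()` prompt and the printing.

```python
def choose_tweets(username, sample_tweets, download):
    """Return `sample_tweets` for the username 'SAMPLE', else download(username).

    `download` is any function that takes a username and returns tweet data;
    it is passed in so the choice can be tested without network access.
    """
    if username == "SAMPLE":  # exact, case-sensitive match
        return sample_tweets
    return download(username)

# Hypothetical stand-ins for SAMPLE_TWEETS and a download function
fake_sample = [{"text": "sample tweet"}]

def fake_download(name):
    return [{"text": "tweet from " + name}]

print(choose_tweets("SAMPLE", fake_sample, fake_download))  # [{'text': 'sample tweet'}]
print(choose_tweets("uwcse", fake_sample, fake_download))   # [{'text': 'tweet from uwcse'}]
```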

In [15]:
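def main():
    """Prompt for a Twitter username and print the sentiment table for its timeline.

    Entering SAMPLE (all caps) analyzes the provided SAMPLE_TWEETS instead.
    """
    UserName = input('Please input user name: ')
    if UserName == 'SAMPLE':
        print_table(analyze_tweets(SAMPLE_TWEETS))
    else:
        count = input('Please input number of tweets to be analyzed: ')
        tweets = download_tweets(UserName, count)
        print_table(analyze_tweets(tweets))
main()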

Please input user name: UW_iSchool
Please input number of tweets to be analyzed: 50
EMOTION       % WORDS  EXAMPLE WORDS                      HASHTAGS                                
positive      6.39%    working, technology, talk          #accesstoinformation, #da2i, #ischoolcapstone
trust         3.65%    team, good, present                #da2i, #ischoolcapstone, #accesstoinformation
joy           2.23%    good, present, create              #accesstoinformation, #da2i, #comicbooks
anticipation  1.93%    good, present, engaged             #ischoolcapstone, #accesstoinformation, #da2i
surprise      1.42%    good, present, lucky               #da2i, #datascience, #accesstoinformation
negative      0.81%    compulsion, hate, combat           #facebook                               
fear          0.71%    hate, combat, defense              #ischoolcapstone, #accesstoinformation, #facebook
sadness       0.51%    music, hate, missing               #als                                    


Use your main function to try analyzing the timelines of different users and comparing their results. Are the current sentiments of the [iSchool](https://twitter.com/uw_ischool) and [CSE](https://twitter.com/uwcse) different in interesting ways?

In [22]:
main()

Please input user name: uwcse
Please input number of tweets to be analyzed: 50
EMOTION       % WORDS  EXAMPLE WORDS                      HASHTAGS                                
positive      7.41%    learn, received, supporting        #uwallen, #ai, #cg                      
trust         4.29%    team, supporting, award            #uwallen, #ai, #cg                      
joy           2.34%    award, present, outstanding        #uwallen, #ai, #cg                      
anticipation  2.34%    award, present, explore            #uwallen, #ai, #realitylablecture       
negative      1.17%    late, outstanding, superficial     #uwallen, #realitylablecture, #ar       
surprise      1.17%    award, present, exciting           #uwallen, #cg                           
sadness       0.49%    late, interested, missing          #uwallen                                
fear          0.39%    challenge, fight, graduation       #realitylablecture, #ar, #vr            
disgust       0.29%    interes

In [None]:
'''The commented-out code above (kept inside triple-quoted strings) records my initial attempts and is kept only for reference; it does not need to be run.'''