# a7 - Tweeter Sentiment

In this assignment you will write a program to perform simple [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis) of Twitter data&mdash;that is, determining the "attitude" or "emotion" (e.g., how "positive", "negative", "joyful", etc.) of tweets made by a particular Twitter user. Sentiment analysis is a fascinating field: researchers have shown that the "mood" of Twitter communication [reflects biological rhythms](http://www.nytimes.com/2011/09/30/science/30twitter.html) and can even be used to [predict the stock market](http://arxiv.org/pdf/1010.3003&embedded=true). The particular analysis you'll be performing is inspired by an investigation of [personal vs. organizational tweets](http://varianceexplained.org/r/trump-tweets/) (which has become less amusing over time).

You will be implementing a Python program that performs this analysis on **real data** taken directly from a Twitter user's timeline. In the end, your script will produce output similar to the following:

```
EMOTION       % WORDS  EXAMPLE WORDS                     HASHTAGS
positive      6.16%    learn, faculty, happy             #accesstoinfoday, #indigenouspeoplesday, #idealistfair
trust         3.08%    school, faculty, happy            #indigenouspeoplesday, #diversity
anticipation  2.53%    happy, top, ready                 #indigenouspeoplesday, #informatics, #info340
joy           1.76%    happy, peace, deal                #indigenouspeoplesday, #accesstoinfoday
surprise      0.99%    deal, award, surprised            #suzzallolibrary, #nobrainer
negative      0.88%    fall, rejection, outstanding        
sadness       0.55%    fall, rejection, problem            
disgust       0.44%    rejection, weird, finally           
fear          0.44%    rejection, surprise, problem        
anger         0.33%    rejection, disaster, involvement  #mlis
```

(The "hashtags" column is optional extra credit).

Fill in the code cells below as specified. Note that cells may utilize variables and functions defined in previous cells; we should be able to use the `Kernel > Restart & Clear All` menu item followed by `Cell > Run All` to execute your entire notebook and see the correct output.

## The Data
You'll be working with two different pieces of data for this assignment.

First, you'll be loading tweet data taken directly from [Twitter's API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline). You can find an example of this tweet data in the **`uw_ischool_sample.py`** file inside the `data/` folder. The below cell will import this data as a variable `SAMPLE_TWEETS` from the provided _module_ file:

In [1]:
# import from uw_ischool_sample file in the `data/` package (folder)
from data.uw_ischool_sample import SAMPLE_TWEETS

The data is represented as one giant **list of dictionaries**: the **list** contains a sequence of **dictionaries**, where each dictionary represents a tweet. Each dictionary contains many different _values_, some of which may themselves be dictionaries.

Print out the first three elements from the `SAMPLE_TWEETS` list to see what information can be found. The most relevant value is the `"text"` of the tweet.
- The Twitter API actually provides a lot more information about each tweet; I've stripped it down to only the most important properties for readability. Each dictionary is a proper subset of the full data you'd get from Twitter.
- Because of the source of the sentiment data, your analysis will be biased and only support English-language speakers. Nevertheless, Twitter is an international community so you may encounter non-English characters and words. You'll be working with real-world data and it will be messy!
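
To make the nesting concrete, here is a small sketch that walks a single tweet dictionary. The `tweet` value is copied from the first entry of the sample data; the variable names are only illustrative.

```python
# One tweet in the same shape as the entries of SAMPLE_TWEETS
# (values copied from the first sample tweet).
tweet = {
    'created_at': 'Mon Oct 10 18:39:51 +0000 2016',
    'retweet_count': 9,
    'entities': {'hashtags': [{'indices': [20, 41], 'text': 'IndigenousPeoplesDay'}]},
    'user': {'screen_name': 'UW_iSchool'},
    'text': 'RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v',
}

text = tweet['text']                                   # top-level value
screen_name = tweet['user']['screen_name']             # value inside a nested dictionary
first_tag = tweet['entities']['hashtags'][0]['text']   # dict -> dict -> list -> dict

print(screen_name)  # UW_iSchool
print(first_tag)    # IndigenousPeoplesDay
```

Each pair of square brackets steps one level deeper, so reading the chain left to right mirrors the structure of the data.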

In [2]:
print(SAMPLE_TWEETS[:3])

[{'created_at': 'Mon Oct 10 18:39:51 +0000 2016', 'retweet_count': 9, 'entities': {'hashtags': [{'indices': [20, 41], 'text': 'IndigenousPeoplesDay'}]}, 'user': {'screen_name': 'UW_iSchool'}, 'text': 'RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v'}, {'created_at': 'Mon Oct 10 18:00:00 +0000 2016', 'retweet_count': 0, 'entities': {'hashtags': [{'indices': [16, 29], 'text': 'IdealistFair'}]}, 'user': {'screen_name': 'UW_iSchool'}, 'text': "We'll be at the #IdealistFair this evening on the Seattle U. campus. Come and learn about our graduate programs: https://t.co/et1HrQshmr"}, {'created_at': 'Mon Oct 10 15:10:36 +0000 2016', 'retweet_count': 1, 'entities': {'hashtags': []}, 'user': {'screen_name': 'UW_iSchool'}, 'text': 'RT @iYouthUW: iYouth Tips for 1st\xa0Years https://t.co/K4SCIEhJ8k https://t.co/p4lbC6Jb5o'}]


The second piece of data you'll be working with is a set of **word-sentiments**&mdash;a list of English-language words and what emotions (e.g., "joy", "anger") [are associated with them](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm).

- The [`nltk`](https://github.com/nltk/nltk/wiki/Sentiment-Analysis) library you used in a previous assignment does support sentiment analysis. However, for practice and extendability you'll be doing a more "manual" analysis using the provided data file for this assignment.

`import` the word sentiments as a variable **`SENTIMENTS`** from the **`data.sentiments_nrc`** module. You should also import the `EMOTIONS` variable provided by the same module: this is a _list_ of possible emotions. You can inspect the variables (e.g., print them out) to confirm that you have imported them.

In [3]:
from data.sentiments_nrc import SENTIMENTS
from data.sentiments_nrc import EMOTIONS

In [4]:
# inspect variables 
print(EMOTIONS)
#SENTIMENTS works well and I will not show the output since its output is too huge 
#print(SENTIMENTS)

['positive', 'negative', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'trust']


## Text Sentiment
All of the sentiment analysis is based on the individual _words_ in the text. Thus you will need to determine which words in a tweet have which sentiments.

- Note that the assignment explicitly does _not_ tell you what to name functions, what arguments they should take or values they should return: your task is to determine appropriate functions and arguments from the (guided) requirements! Use multiple functions for clarity, give them all informative names, and include a **doc string** to explain what each does.

Define a function that takes a tweet's text (a string) and splits it up into a list of individual words.

- You should **not** use the `nltk` library to tokenize words. Instead, your analysis should split up the text using the [regular expression](https://www.regular-expressions.info/) **`"\W+"`** as the separator (rather than just a blank space). You can do this by using the [re.split()](https://docs.python.org/3/library/re.html#re.split) function (from the `re` module). This separator will cause your splitting to exclude punctuation and provide a reasonable (but not perfect!) list of words to consider.

- All of the words in the sentiment dictionary are _lower case_, so you'll need to **map** your resulting words to be lower case. You will also need to **filter** out any words that have 1 letter or fewer. Your function must use a **list comprehension** to do this.

The string `"Amazingly, I prefer a #rainy day to #sunshine."` should produce a list with 6 lower-case words in it.
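
As a quick check of what the separator does on its own, the sketch below shows only the raw result of `re.split()` on the test string, before any lower-casing or filtering.

```python
import re

text = "Amazingly, I prefer a #rainy day to #sunshine."

# Split on runs of non-word characters (anything other than letters, digits, underscore).
pieces = re.split(r"\W+", text)
print(pieces)
# ['Amazingly', 'I', 'prefer', 'a', 'rainy', 'day', 'to', 'sunshine', '']
```

Note the empty string at the end: the text finishes with a separator (the period), so `re.split()` reports an empty piece after it. The "1 letter or fewer" filter removes that empty string along with `I` and `a`, which leaves the expected 6 words.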

In [5]:
import re
# Take a tweet's text (a string) and split it up into a list of individual words.
def split_text_to_words(text):
    '''Split a string into lower-case words.
       Parameters: text (str): the text of a tweet
       Returns: list of str: the words of the text in lower case, with punctuation
                removed and words of 1 letter or fewer filtered out'''
    split_words = re.split(r'\W+', text)  # split on runs of non-word characters (drops punctuation)
    lower_words = [word.lower() for word in split_words]  # map to lower case
    length_filtered = [word for word in lower_words if len(word) > 1]  # filter out words with 1 letter or fewer
    return length_filtered

In [6]:
Test_String="Amazingly, I prefer a #rainy day to #sunshine."

In [7]:
Split_Test_String=split_text_to_words(Test_String)
Split_Test_String

['amazingly', 'prefer', 'rainy', 'day', 'to', 'sunshine']

Define a function that **filters** a list of the words to get only those words that contain a specific emotion. Your function must use a **list comprehension** to do this.
- You can determine whether a word has a particular emotion by looking it up in the imported `SENTIMENTS` variable. Use the word as the "key" to find the dictionary of emotions for that word, and then use the emotion as the key to _that_ dictionary to determine if the word has it. 
    
    Do **not** use the `in` operator or a loop to search the list of keys one by one. Instead, use bracket notation or the `get()` method to "look up" a key directly (this is more efficient and is why the emotions are nested dictionaries in the first place!).
    
For testing, the `"positive"` words extracted from `"Amazingly, I prefer a #rainy day to #sunshine."` are `["amazingly", "prefer", "sunshine"]`.
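
The two-step lookup described above can be sketched with a toy lexicon. `TOY_SENTIMENTS` and `has_emotion` are hypothetical names, and the layout (word mapped to a dictionary of emotions marked with `1`) is an assumption about how the real `SENTIMENTS` data is organized.

```python
# Hypothetical miniature lexicon; the real SENTIMENTS data is much larger
# and its exact layout is an assumption here (word -> {emotion: 1}).
TOY_SENTIMENTS = {
    "sunshine": {"positive": 1, "joy": 1},
    "rainy": {"sadness": 1},
}

def has_emotion(word, emotion, lexicon=TOY_SENTIMENTS):
    """Return True if the lexicon marks `word` with `emotion`."""
    # The default {} lets the second get() run even for unknown words.
    return lexicon.get(word, {}).get(emotion) == 1

print(has_emotion("sunshine", "joy"))   # True
print(has_emotion("rainy", "joy"))      # False
print(has_emotion("day", "joy"))        # False (word not in the lexicon)
```

Both `get()` calls are direct key lookups, so no search through the keys is needed.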

In [8]:
def words_with_specific_emotion(list_of_split_words, emotion):
    '''Produce a list of the words that have a specific emotion.
       Parameters: list_of_split_words (list of str): the words to filter
                   emotion (str): one of the emotions in the EMOTIONS list
       Returns: list of str: the words from the input list that have the given emotion'''
    # SENTIMENTS.get(word, {}) yields an empty dictionary for unknown words,
    # so the second get() returns None for them and they are filtered out
    return [word for word in list_of_split_words
            if SENTIMENTS.get(word, {}).get(emotion) == 1]

In [9]:
# what is bracket notation
words_with_specific_emotion(Split_Test_String,'positive')

['amazingly', 'prefer', 'sunshine']

Define a function that determines which words from a list have _each_ emotion (i.e., the "emotional" words), producing a dictionary of that information. For example, the words extracted from `"Amazingly, I prefer a #rainy day to #sunshine."` should produce a dictionary that looks like:

```
{
 'anger': [],
 'anticipation': [],
 'disgust': [],
 'fear': [],
 'joy': ['amazingly', 'sunshine'],
 'negative': [],
 'positive': ['amazingly', 'prefer', 'sunshine'],
 'sadness': ['rainy'],
 'surprise': ['amazingly'],
 'trust': ['prefer']
}
```
    
(Note the empty lists for emotions that have no matching words).
    
- You can use the imported `EMOTIONS` variable to have a list of emotions to iterate through.
- Use the function you defined in the previous step to help you out!
- Using a [dictionary comprehension](https://www.smallsurething.com/list-dict-and-set-comprehensions-by-example/) is a nice way to do this, but is not required.
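
A minimal sketch of the dictionary-comprehension approach, using hypothetical stand-ins (`TOY_EMOTIONS`, `TOY_SENTIMENTS`) rather than the real imported data. The lexicon layout is again assumed to be word mapped to a dictionary of emotions.

```python
# Hypothetical stand-ins for EMOTIONS and SENTIMENTS (assumed layout: word -> {emotion: 1}).
TOY_EMOTIONS = ["joy", "sadness", "trust"]
TOY_SENTIMENTS = {
    "sunshine": {"joy": 1},
    "rainy": {"sadness": 1},
}

words = ["rainy", "day", "sunshine"]

# One key per emotion; each value is the (possibly empty) list of matching words.
emotional_words = {
    emotion: [w for w in words if TOY_SENTIMENTS.get(w, {}).get(emotion) == 1]
    for emotion in TOY_EMOTIONS
}
print(emotional_words)
# {'joy': ['sunshine'], 'sadness': ['rainy'], 'trust': []}
```

Because the comprehension iterates over the emotions rather than the words, emotions with no matches still appear as keys with empty lists.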

In [10]:
def words_list_for_each_emotion(split_words):
    '''Produce a dictionary with emotions as keys and the list of words having that emotion as values.
       Parameters: split_words (list of str): a list of split words
       Returns: dict: maps each emotion in EMOTIONS to the list of input words with that emotion'''
    return {emotion: words_with_specific_emotion(split_words, emotion) for emotion in EMOTIONS}
  

In [11]:
words_list_for_each_emotion(Split_Test_String) 

{'anger': [],
 'anticipation': [],
 'disgust': [],
 'fear': [],
 'joy': ['amazingly', 'sunshine'],
 'negative': [],
 'positive': ['amazingly', 'prefer', 'sunshine'],
 'sadness': ['rainy'],
 'surprise': ['amazingly'],
 'trust': ['prefer']}

Define a function that gets a list of the "most common" words in a list. This should be a new list containing each word in the original list, in descending order by how many times that word appears in the original list.

- You can determine the frequency (number of occurrences) of a word with a similar process to what you did with digits in the previous assignment.
- You should use the `sorted()` function to [sort](https://wiki.python.org/moin/HowTo/Sorting) the individual words. This function takes a **`key`** argument which should be passed a [_callback function_](https://wiki.python.org/moin/HowTo/Sorting#Key_Functions) that can return a "transformed" value that you wish to sort by (e.g., which element in a tuple). An anonymous lambda function works well for this.

You can test this function with any list of "words" with repeated entries: `['a','b','c','c','c','a']` for example.
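
One possible shape for this step, sketched with `sorted()` and a lambda as the `key`. The function name `most_common` is hypothetical, and de-duplicating with `dict.fromkeys()` is a design choice of this sketch, made so that ties come out in a predictable order.

```python
def most_common(words):
    """Return each distinct word once, most frequent first."""
    distinct = list(dict.fromkeys(words))  # de-duplicate, keeping first-seen order
    # sorted() is stable, so words with equal counts stay in first-seen order.
    return sorted(distinct, key=lambda w: words.count(w), reverse=True)

print(most_common(['a', 'b', 'c', 'c', 'c', 'a']))  # ['c', 'a', 'b']
```

Calling `words.count(w)` for every distinct word is fine for a few hundred words; for much larger inputs, `collections.Counter` from the standard library counts everything in a single pass.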


In [12]:
###This function take a key argument which should be passed a callback function 
###that can return a "transformed" value that you wish to sort by 

def get_common_words_list(wordlist):
    '''Produce a list of the distinct words of the input list, most common first.
       Parameters: wordlist (list): a list of words (or other hashable items), possibly with repeats
       Returns: list: each distinct word once, in descending order of how often it appears'''
    wordfreq = set(zip(wordlist, [wordlist.count(w) for w in wordlist]))  # distinct (word, count) pairs
    wordfreq_sorted = sorted(wordfreq, key=lambda freq: freq[1], reverse=True)  # sort by count, descending
    return [pair[0] for pair in wordfreq_sorted]

In [13]:
wordlist=['a','b','c','c','c','a'] 
get_common_words_list(wordlist)


['c', 'a', 'b']

## Tweet Statistics
Once you are able to determine the sentiment of an individual string of text (e.g., a single tweet's content), you can analyze an entire set of tweets from the user's timeline. Note that I am _not_ telling you how to break this process up into individual functions&mdash;you should be able to determine that on your own by this point!

Define a function (e.g., `analyze_tweets()`) that takes as an argument a **list** of tweet data (with the same structure as the imported `SAMPLE_TWEETS` variable), and _returns_ the data of interest to display in a table like the one at the very top of the notebook. In particular, you'll need to produce the following information **for each emotion**:

1. The percentage of words _across all tweets_ that have that emotion
2. The most common words _across all tweets_ that have that emotion (in order!)

(Think carefully: should this data be stored in a _list_ or a _dictionary_?)

Some tips for this task:

- You can and should create some "helper" functions to break up this task even further. You can define those functions in the same notebook cell or you may insert additional cells.

- You'll need to use your previous functions to get the _list of words_ and _dictionary of emotional words_ for each tweet. I recommend you assign the results of these methods as **new values** of the respective tweet dictionary (so each tweet dictionary would gain a `words` key, for example).

- In order to get the percentage of emotional words, divide the number of words that have that emotion by the total number of words _across all the tweets_. Counting how many total words are in the tweet set is a **reducing** operation: you should use the `reduce()` function for this.

- For each emotion, you'll need to get a list of the words _across all the tweets_ that have that emotion (in order to determine how many there are for the percentage, as well as which are most common). This is another **reducing** operation; you should use the `reduce()` function to _add up_ all of these words (alternatively, the built-in `sum()` function can be used here).

You can test your function by passing in the `SAMPLE_TWEETS` variable as an argument and checking if your returned data has the same numbers as in the table at the top of the page. Note that only the first 3 most common words are listed in the example (and may be in a different order in the case of frequency ties!).
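
The two reducing operations from the tips can be sketched as follows. The `tweets` list is hypothetical data that already carries a `words` key on each tweet, and `positive_words` stands in for the result of the emotion filter.

```python
from functools import reduce

# Hypothetical tweets that already carry a 'words' list.
tweets = [
    {"words": ["happy", "indigenouspeoplesday"]},
    {"words": ["come", "and", "learn"]},
    {"words": []},
]

# Reduce to a number: running total of word counts, starting from 0.
total_words = reduce(lambda total, tweet: total + len(tweet["words"]), tweets, 0)

# Reduce to a list: concatenate every tweet's words, starting from [].
all_words = reduce(lambda acc, tweet: acc + tweet["words"], tweets, [])

print(total_words)  # 5
print(all_words)    # ['happy', 'indigenouspeoplesday', 'come', 'and', 'learn']

# Percentage for one emotion = matching words / total words.
positive_words = ["happy", "learn"]  # assumed result of the emotion filter
print("{:.2%}".format(len(positive_words) / total_words))  # 40.00%
```

Passing an explicit starting value (`0` or `[]`) as the third argument keeps `reduce()` from failing on an empty list of tweets.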

In [14]:
import functools 
import operator 
from functools import reduce
def get_tags_for_emotion(SAMPLE_TWEETS):
    '''Produce, for each emotion, the hashtags of the tweets that have at least one word with that emotion
       Parameters: SAMPLE_TWEETS (list of dict): tweet data extracted from the Twitter API.
       Returns: dict: keys are emotions, values are lists of hashtags'''
    original_list=[line['text'] for line in SAMPLE_TWEETS] # extract text from extracted twitter_data
    split_original_list=[split_text_to_words(line) for line in original_list] # split text into individual word 
    emotion_list_per_tweet=[words_list_for_each_emotion(i) for i in split_original_list]
    tweet_hashtags_emotion = {SAMPLE_TWEETS[n]['entities']['hashtags'][i]['text']:[item for item in emotion_list_per_tweet[n].keys() if emotion_list_per_tweet[n][item]!= []] for n in range(len(SAMPLE_TWEETS)) for i in range(len(SAMPLE_TWEETS[n]['entities']['hashtags']))}
    tag_emotion_list=[(k.lower(),v) for k,v in tweet_hashtags_emotion.items()]
    tag_individual_emotion=[(line[0],line[1][i]) for line in tag_emotion_list for i in range(0,len(line[1]))]
    tag_common_emotion=get_common_words_list(tag_individual_emotion)
    emotion_tags={item:[line[0] for line in tag_common_emotion if item == line[1]] for item in EMOTIONS }
    return emotion_tags
def analyze_tweets(SAMPLE_TWEETS):
    '''Produce a list of dictionaries, one per emotion, storing the emotion, the fraction of words
        in the whole data set that have that emotion, the top 3 most common words with that emotion,
        and the top 3 hashtags associated with it.
        Parameters: SAMPLE_TWEETS (list of dict): tweet data extracted from the Twitter API.
        Returns: list of dict, sorted by fraction of words in descending order'''
    Original_List=[line['text'] for line in SAMPLE_TWEETS] # extract text from extracted twitter_data
    Split_Original_List=[split_text_to_words(line) for line in Original_List] # split text into individual word 
    Combine_Total_Words=functools.reduce(operator.add,Split_Original_List ) # use reduce() to concatenate the per-tweet word lists
    Total_Words=len(Combine_Total_Words) # total number of words across all tweets
    Words_List_For_Each_Emotion=words_list_for_each_emotion(Combine_Total_Words) # find the word list for each emotion across all tweets
    Emotion_list_Per_Tweet=[words_list_for_each_emotion(i) for i in Split_Original_List] # find the emotional words of each tweet
    Emotion_tags=get_tags_for_emotion(SAMPLE_TWEETS)
    result_list=[]
    for i in EMOTIONS:
        result={}
        num_per_emotion_all_tweet=sum([len(line[i]) for line in Emotion_list_Per_Tweet]) # use sum 
        example_words=[get_common_words_list(value)[:3] for key,value in Words_List_For_Each_Emotion.items() if key==i]
        result['EMOTION']=i
        result['% WORDS']=num_per_emotion_all_tweet/Total_Words
        result['EXAMPLE WORDS']=example_words[0]
        HASHTAGES_3=Emotion_tags.get(i)
        result['HASHTAGS']=HASHTAGES_3[:3]
        result_list.append(result)
    result=sorted(result_list,key=lambda k:k['% WORDS'],reverse=True) # sort the list by word% in a descending way
    return result

    

In [15]:
print(analyze_tweets(SAMPLE_TWEETS))

[{'EMOTION': 'positive', '% WORDS': 0.061606160616061605, 'EXAMPLE WORDS': ['faculty', 'learn', 'happy'], 'HASHTAGS': ['accesstoinfoday', 'mlis', 'indigenouspeoplesday']}, {'EMOTION': 'trust', '% WORDS': 0.030803080308030802, 'EXAMPLE WORDS': ['school', 'faculty', 'happy'], 'HASHTAGS': ['diversity', 'indigenouspeoplesday']}, {'EMOTION': 'anticipation', '% WORDS': 0.025302530253025302, 'EXAMPLE WORDS': ['happy', 'ready', 'top'], 'HASHTAGS': ['info340', 'indigenouspeoplesday', 'informatics']}, {'EMOTION': 'joy', '% WORDS': 0.0176017601760176, 'EXAMPLE WORDS': ['happy', 'deal', 'excited'], 'HASHTAGS': ['indigenouspeoplesday', 'accesstoinfoday']}, {'EMOTION': 'surprise', '% WORDS': 0.009900990099009901, 'EXAMPLE WORDS': ['deal', 'excited', 'good'], 'HASHTAGS': ['suzzallolibrary', 'nobrainer']}, {'EMOTION': 'negative', '% WORDS': 0.0088008800880088, 'EXAMPLE WORDS': ['fall', 'problem', 'boring'], 'HASHTAGS': []}, {'EMOTION': 'sadness', '% WORDS': 0.005500550055005501, 'EXAMPLE WORDS': ['fal

***Optional extra credit***: For each emotion, also calculate the most common [**hashtags**](https://en.wikipedia.org/wiki/Hashtag) _across all tweets_ associated with that emotion. This means that out of all the tweets that have _at least one word with that emotion_, what are the most common hashtags they use?

- The Twitter data for each tweet includes a _list_ containing the hashtags found in that tweet&mdash;you should **NOT** try and search the tweet text for `#` symbols. These hashtags can be found in the `['entities']['hashtags'][i]['text']` element of each tweet&mdash;that is, the `'text'` key from _each_ element in the _list_ of the `'hashtags'` key in the `'entities'` dictionary of the tweet. See the `uw_school.json` example file to see this structure more clearly.

    (You might use a _list comprehension_ to "flatten" this complex nesting structure into just a list of hashtag words).

- Since hashtags are just words, you can use your function for finding the most common words to find the most common hashtags!

- You can build your analysis into your previous `analyze_tweets()` function, or you can determine the hashtags using a second function (e.g., in the below cell).
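
The "flattening" step mentioned above could look like the sketch below. The `tweets` list is trimmed-down data in the shape of `SAMPLE_TWEETS`; lower-casing the hashtags is an assumption of this sketch, made so that differently capitalized tags are counted together.

```python
# Three tweets in the shape of SAMPLE_TWEETS, trimmed to the 'entities' key.
tweets = [
    {'entities': {'hashtags': [{'indices': [20, 41], 'text': 'IndigenousPeoplesDay'}]}},
    {'entities': {'hashtags': []}},
    {'entities': {'hashtags': [{'indices': [16, 29], 'text': 'IdealistFair'}]}},
]

# Outer loop over tweets, inner loop over each tweet's hashtag dictionaries.
hashtags = [
    tag['text'].lower()
    for tweet in tweets
    for tag in tweet['entities']['hashtags']
]
print(hashtags)  # ['indigenouspeoplesday', 'idealistfair']
```

Tweets without hashtags contribute nothing, because the inner loop simply has no items to visit. Unlike a dictionary keyed by hashtag, the flat list keeps repeated hashtags, so their frequency can still be counted.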


In [16]:
def get_tags_for_emotion(SAMPLE_TWEETS):
    '''Produce, for each emotion, the hashtags of the tweets that have at least one word with that emotion
       Parameters: SAMPLE_TWEETS (list of dict): tweet data extracted from the Twitter API.
       Returns: dict: keys are emotions, values are lists of hashtags'''
    original_list=[line['text'] for line in SAMPLE_TWEETS] # extract text from extracted twitter_data
    split_original_list=[split_text_to_words(line) for line in original_list] # split text into individual word 
    emotion_list_per_tweet=[words_list_for_each_emotion(i) for i in split_original_list]
    tweet_hashtags_emotion = {SAMPLE_TWEETS[n]['entities']['hashtags'][i]['text']:[item for item in emotion_list_per_tweet[n].keys() if emotion_list_per_tweet[n][item]!= []] for n in range(len(SAMPLE_TWEETS)) for i in range(len(SAMPLE_TWEETS[n]['entities']['hashtags']))}
    tag_emotion_list=[(k.lower(),v) for k,v in tweet_hashtags_emotion.items()]
    tag_individual_emotion=[(line[0],line[1][i]) for line in tag_emotion_list for i in range(0,len(line[1]))]
    tag_common_emotion=get_common_words_list(tag_individual_emotion)
    emotion_tags={item:[line[0] for line in tag_common_emotion if item == line[1]] for item in EMOTIONS }
    return emotion_tags


In [17]:
get_tags_for_emotion(SAMPLE_TWEETS)

{'anger': ['mlis'],
 'anticipation': ['info340', 'indigenouspeoplesday', 'informatics', 'mlis'],
 'disgust': [],
 'fear': [],
 'joy': ['indigenouspeoplesday', 'accesstoinfoday'],
 'negative': [],
 'positive': ['accesstoinfoday',
  'mlis',
  'indigenouspeoplesday',
  'idealistfair',
  'diversity',
  'geekgirlcon'],
 'sadness': [],
 'surprise': ['suzzallolibrary', 'nobrainer'],
 'trust': ['diversity', 'indigenouspeoplesday']}

### Displaying Tweets 

Once you've analyzed the tweets, you will need to _display_ that information as a printed table (as in the example at the top of the page).

Define another function to display this table (your function should take as an argument the data structure returned from your "analysis" function).

This function will need to print out the table. Using the [string formatting](https://docs.python.org/3/library/string.html#format-examples) language (via the **`.format()`** string method) makes it possible to have equally sized "columns" of data. For more examples, [this tutorial](https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3) is pretty good (check out the "Padding Variable Substitutions" section).


A few notes about formatting this output:

- For your reference, the example table at the top of the page uses `14` characters for the first column, `11` characters for the second,  `35` for the third, and the "remainder" for the fourth. You are not required to match these numbers.

- The percentage should be formatted with two decimals of precision (e.g., `1.23%`).

- Both the example sentiment words and the hashtags (if you include them) should be output as a _comma-separated list_ with spaces between the items (and no square brackets). The `join()` string method is good for converting lists to formatted strings. Both lists should also be limited to the 3 most common items for space.

- Make sure to include `#` in front of the hashtags!

- You should **sort** the emotions by percentage of words (descending), as in the example at the top. 
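
The formatting notes above can be combined into one row of the table, as in this sketch. The `row` dictionary and its field names are hypothetical; its values are taken from the first row of the example table, and the widths 14, 11 and 35 are the ones quoted in the notes.

```python
# Hypothetical row of analysis results (field names are illustrative only).
row = {
    "emotion": "positive",
    "fraction": 0.061606160616061605,
    "words": ["learn", "faculty", "happy"],
    "hashtags": ["accesstoinfoday", "indigenouspeoplesday", "idealistfair"],
}

percent = "{:.2%}".format(row["fraction"])   # multiplies by 100 and appends %
words = ", ".join(row["words"][:3])          # no brackets, comma + space
tags = ", ".join("#" + tag for tag in row["hashtags"][:3])

# <14, <11 and <35 left-align each value and pad it to a fixed width.
line = "{:<14}{:<11}{:<35}{}".format(row["emotion"], percent, words, tags)
print(line)
```

The last column has no width, so it takes whatever space remains and adds no trailing padding.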

In [18]:
# assign the result of analyze_tweets to the variable result
result=analyze_tweets(SAMPLE_TWEETS)

In [19]:
def show_table(result):
    '''Print the result of the analyze_tweets function as a formatted table (returns nothing)'''
    print("{0:<14} {1:<11} {2:35} {3:80}".format('EMOTION','%WORDS','EXAMPLE WORDS','HASHTAGS'))
    for line in result:
        emotion=line['EMOTION']
        words_percent=line['% WORDS']
        format_words_percent='{:.2%}'.format(words_percent)
        example_words=line['EXAMPLE WORDS']
        top_example_words=','.join(example_words)
        hashtags=line['HASHTAGS']
        add_punction=['#'+i for i in hashtags]
        format_hashtages=', '.join(add_punction)
        print("{0:<14} {1:<11} {2:35} {3:80}".format(emotion,format_words_percent,top_example_words,format_hashtages))
show_table(result)

EMOTION        %WORDS      EXAMPLE WORDS                       HASHTAGS                                                                        
positive       6.16%       faculty,learn,happy                 #accesstoinfoday, #mlis, #indigenouspeoplesday                                  
trust          3.08%       school,faculty,happy                #diversity, #indigenouspeoplesday                                               
anticipation   2.53%       happy,ready,top                     #info340, #indigenouspeoplesday, #informatics                                   
joy            1.76%       happy,deal,excited                  #indigenouspeoplesday, #accesstoinfoday                                         
surprise       0.99%       deal,excited,good                   #suzzallolibrary, #nobrainer                                                    
negative       0.88%       fall,problem,boring                                                                                          

## Getting Live Data
This is all good and well, but the real payoff is to be able to see the sentiments of tweets taken directly from the Twitter feed of real users!

Define _another_ function that takes in a Twitter username as an argument and then returns a list of dictionaries representing the tweets made by that user.

Normally you would fetch this data by sending a request directly to the web service's API (e.g., to the [statuses/user_timeline](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline) endpoint provided by the Twitter API at `https://api.twitter.com/1.1/statuses/user_timeline`). However, Twitter includes access controls so that only registered developers are allowed to send requests. While it is possible to register as a developer and access Twitter [directly through Python](https://python-twitter.readthedocs.io/en/latest/), this adds an extra level of complexity to the assignment.

Instead, I've set up a [proxy](https://en.wikipedia.org/wiki/Proxy) that has all the access keys specified which you can use to search Twitter. This proxy is available at:

**<https://faculty.washington.edu/joelross/proxy/twitter/timeline/>**

Send a request to _that_ url instead of `https://api.twitter.com/1.1/statuses/user_timeline`, and it will redirect your request with the proper authentication to Twitter, and then give you back whatever JSON Twitter's API responded with.

- You specify the same request parameters as you would when accessing Twitter directly. The request takes a `screen_name` request parameter which you can assign the given username. You can also specify the `count` parameter if you want to get more results back (up to 200); see the [documentation](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline) for details and other options you are welcome to use (just document them in your function's **docstring**).

- **WARNING:** The proxy I have set up is **rate-limited** so that it can only accept 900 requests every 15 minutes. If all 40 students are working rapidly on the assignment at the same time, you may find yourself needing to wait a few minutes and try again. You are alternatively welcome to set up your own developer account and API keys; just make sure you don't put the keys under version control and upload them to GitHub!

You can download the timeline data from Twitter using the [requests](http://docs.python-requests.org/en/master/user/quickstart/) module: send a `GET` request to the [statuses/user_timeline](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline) endpoint provided by the Twitter API, and then use the `.json()` method to extract the JSON response as a Python _list_ or _dictionary_ value you can work with.
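
To show how the request parameters end up in the URL, here is a sketch that uses only the standard library (`urllib`) so it runs without extra packages; this is a swapped-in technique, not the `requests` approach the assignment asks for. With `requests`, the same dictionary would be passed as the `params` argument of `requests.get()`. The function names and the default `count` of 20 are hypothetical, and whether the proxy responds depends on the service being available.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROXY_URL = "https://faculty.washington.edu/joelross/proxy/twitter/timeline/"

def build_timeline_url(screen_name, count=20):
    """Return the proxy URL with the request parameters encoded as a query string."""
    query = urlencode({"screen_name": screen_name, "count": count})
    return PROXY_URL + "?" + query

def fetch_timeline(screen_name, count=20):
    """Send the GET request and parse the JSON body (needs network access)."""
    with urlopen(build_timeline_url(screen_name, count)) as response:
        return json.load(response)

print(build_timeline_url("UW_iSchool", count=50))
# https://faculty.washington.edu/joelross/proxy/twitter/timeline/?screen_name=UW_iSchool&count=50
```

Encoding the parameters from a dictionary, rather than concatenating strings by hand, also escapes any characters that are not allowed in a URL.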

In [20]:
import requests
def get_live_data(username):
    '''Download up to 50 of the most recent tweets made by a given user.
       Parameters: username (str): the Twitter screen name to look up
       Request parameters sent to the proxy: screen_name (the username) and count (fixed at 50)
       Returns: list of dict: the tweets, as parsed from the JSON response'''
    url = "https://faculty.washington.edu/joelross/proxy/twitter/timeline"
    response = requests.get(url, params={"screen_name": username, "count": 50})  # requests builds the query string
    return response.json()

Define one last "main" function that will [prompt the user](https://docs.python.org/3/library/functions.html#input) for a Twitter username. The function should then call your "download" function to fetch the tweets, and pass the returned tweet data into your "analyze" and "show" functions in order to display your sentiment analysis of the user's timeline. 

**ADDITIONALLY**, `if` the user specifies `SAMPLE` (all caps) as the username, the function should instead show the analysis for the `SAMPLE_TWEETS` (this will help us out with grading).

In [21]:
def display_to_user():
    '''Prompt the user for a Twitter username and display the sentiment analysis
    of that user's timeline (or of SAMPLE_TWEETS if the username is SAMPLE)'''
    username=input("Please enter the username that you are interested: ")
    if username=='SAMPLE':
        result=analyze_tweets(SAMPLE_TWEETS)
    else:
        tweet_data=get_live_data(username)
        result=analyze_tweets(tweet_data)
    return  show_table(result)

In [25]:
# TEST SAMPLE DATA 
display_to_user()

Please enter the username that you are interested: uwcse
EMOTION        %WORDS      EXAMPLE WORDS                       HASHTAGS                                                                        
positive       7.41%       received,learn,present              #ai, #siff, #microfluidics                                                      
trust          4.29%       team,present,award                  #algorithm, #accessibility, #cg                                                 
anticipation   2.34%       present,award,explore               #accessibility, #vr, #realitylablecture                                         
joy            2.34%       present,award,friendship            #siff, #cg, #animated                                                           
negative       1.17%       late,challenge,fight                #realitylablecture, #uwallen, #vr                                               
surprise       1.17%       award,present,deal                  #cg             

Use your main function to try analyzing the timelines of different users and comparing their results. Are the current sentiments of the [iSchool](https://twitter.com/uw_ischool) and [CSE](https://twitter.com/uwcse) different in interesting ways?

In [23]:
### sentiment analysis for CSE
display_to_user()

Please enter the username that you are interested: uwcse
EMOTION        %WORDS      EXAMPLE WORDS                       HASHTAGS                                                                        
positive       7.41%       received,learn,present              #ai, #siff, #microfluidics                                                      
trust          4.29%       team,present,award                  #algorithm, #accessibility, #cg                                                 
anticipation   2.34%       present,award,explore               #accessibility, #vr, #realitylablecture                                         
joy            2.34%       present,award,friendship            #siff, #cg, #animated                                                           
negative       1.17%       late,challenge,fight                #realitylablecture, #uwallen, #vr                                               
surprise       1.17%       award,present,deal                  #cg             

In [24]:
### sentiment analysis for iSchool 
display_to_user()

Please enter the username that you are interested: uw_ischool
EMOTION        %WORDS      EXAMPLE WORDS                       HASHTAGS                                                                        
positive       6.39%       working,technology,good             #da2i, #communications, #libraries                                              
trust          3.65%       team,good,present                   #facebook, #communications, #da2i                                               
joy            2.23%       good,present,music                  #als, #datascience, #comicbooks                                                 
anticipation   1.93%       good,present,excited                #accesstoinformation, #comicbooks, #communications                              
surprise       1.42%       good,present,excited                #facebook, #datascience, #da2i                                                  
negative       0.81%       missing,fake,combat                 #facebook  

### Explanation for the comparison results 

EMOTIONS:
The top emotions for both CSE and iSchool are "positive", "trust", "joy" and "anticipation".
However, CSE's word percentage is higher than iSchool's for all four of these emotions.

EXAMPLE WORDS:
The example words for CSE and iSchool are quite different.
CSE focuses more on "team", "supporting" and "award", which suggests that its tweets talk more about team achievements or supportive measures.
iSchool focuses more on "good", "present" and "fortune", which suggests that its tweets talk more about students' presentation activities.

HASHTAGS:
CSE's hashtags focus more on academic topics and new technology, such as "algorithm", "ar" and "cg".
iSchool's hashtags focus more on social media and related areas, such as "facebook", "communication" and "datascience".