This notebook runs a sentiment analysis on sample Twitter data. It first loads tweets retrieved from [Twitter's API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline). The cell below imports this data as the variable `SAMPLE_TWEETS` from the provided _module_ file:

In [1]:
# import from uw_ischool_sample file in the `data/` package (folder)
from data.uw_ischool_sample import SAMPLE_TWEETS

Prints out the first three elements from the `SAMPLE_TWEETS` list to see what information can be found. The most relevant value is the `"text"` of the tweet.

In [2]:
SAMPLE_TWEETS[0:3]

[{'created_at': 'Mon Oct 10 18:39:51 +0000 2016',
  'entities': {'hashtags': [{'indices': [20, 41],
     'text': 'IndigenousPeoplesDay'}]},
  'retweet_count': 9,
  'text': 'RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v',
  'user': {'screen_name': 'UW_iSchool'}},
 {'created_at': 'Mon Oct 10 18:00:00 +0000 2016',
  'entities': {'hashtags': [{'indices': [16, 29], 'text': 'IdealistFair'}]},
  'retweet_count': 0,
  'text': "We'll be at the #IdealistFair this evening on the Seattle U. campus. Come and learn about our graduate programs: https://t.co/et1HrQshmr",
  'user': {'screen_name': 'UW_iSchool'}},
 {'created_at': 'Mon Oct 10 15:10:36 +0000 2016',
  'entities': {'hashtags': []},
  'retweet_count': 1,
  'text': 'RT @iYouthUW: iYouth Tips for 1st\xa0Years https://t.co/K4SCIEhJ8k https://t.co/p4lbC6Jb5o',
  'user': {'screen_name': 'UW_iSchool'}}]
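
Each tweet is a dictionary with nested `entities` and `user` dictionaries. As a minimal sketch (using two abbreviated tweets copied from the output above; the names `tweets`, `texts`, and `hashtags` are illustrative and not part of the notebook), the fields of interest can be pulled out like this:

```python
# Two abbreviated tweets with the same structure as the output above
tweets = [
    {'entities': {'hashtags': [{'indices': [20, 41], 'text': 'IndigenousPeoplesDay'}]},
     'retweet_count': 9,
     'text': 'RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v',
     'user': {'screen_name': 'UW_iSchool'}},
    {'entities': {'hashtags': []},
     'retweet_count': 1,
     'text': 'RT @iYouthUW: iYouth Tips for 1st\xa0Years https://t.co/K4SCIEhJ8k https://t.co/p4lbC6Jb5o',
     'user': {'screen_name': 'UW_iSchool'}},
]

# The tweet body lives under the top-level 'text' key
texts = [tweet['text'] for tweet in tweets]

# Hashtags are nested: tweet -> 'entities' -> 'hashtags' -> list of dicts with a 'text' key
hashtags = [tag['text'] for tweet in tweets for tag in tweet['entities']['hashtags']]

print(hashtags)
```

A tweet without hashtags simply carries an empty list, so the comprehension needs no special case for it.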

The second piece of data for this analysis is a set of **word-sentiments**: a list of English-language words and what emotions (e.g., "joy", "anger") [are associated with them](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm).

In [3]:
from data.sentiments_nrc import SENTIMENTS
from data.sentiments_nrc import EMOTIONS
from functools import reduce
import re

## Text Sentiment

Defines a function that takes a tweet's text (a string), converts it to lowercase, and splits it into a list of individual words, dropping single-character words.

In [4]:
def tweet_filter(stringtext):
    """ Function converting text into lowercase and filtering out words shorter than 2 characters """
    # raw string so that \W reaches the regex engine as "non-word character"
    wordtext = re.split(r'\W+', stringtext.lower())
    lengthywords = [word for word in wordtext if len(word) > 1]
    return lengthywords

wordlist = tweet_filter("Amazingly, I prefer a #rainy day to #sunshine.")
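
To make the two steps concrete, here is a small standalone sketch of the same approach applied to the example sentence (the names `pieces` and `words` are illustrative). Splitting on `\W+` leaves an empty string at the end because the sentence finishes with a period, and the length filter is what removes it along with "i" and "a":

```python
import re

text = "Amazingly, I prefer a #rainy day to #sunshine."

# Step 1: lowercase, then split on runs of non-word characters
pieces = re.split(r'\W+', text.lower())
# pieces ends with '' because the text ends with a non-word character

# Step 2: keep only words longer than one character
words = [word for word in pieces if len(word) > 1]

print(words)  # ['amazingly', 'prefer', 'rainy', 'day', 'to', 'sunshine']
```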


Defines a function that **filters** a list of the words to get only those words that contain a specific emotion.

In [5]:
def emotional_filter(wordtext,emotion):
    """ Filters a list of words, keeping only those associated with the given emotion """
    # words missing from SENTIMENTS fall back to an empty dict and are filtered out
    emotion_filterer = [word for word in wordtext if SENTIMENTS.get(word, {}).get(emotion) == 1]
    return(emotion_filterer)

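# emotional_filter expects a list of words, so pass the tokenized `wordlist` from the previous cell
# (a raw string would be iterated one character at a time and yield [])
emotional_filter(wordlist, 'positive')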
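
The lookup can be illustrated without the real lexicon. The sketch below assumes the structure implied by the code above (each word maps to a dictionary of emotions flagged with `1`); `TOY_SENTIMENTS` and its entries are made up for illustration and are not actual NRC values, and `toy_emotional_filter` is a hypothetical stand-in for `emotional_filter`:

```python
# Hypothetical miniature lexicon: word -> {emotion: 1}; values are invented
TOY_SENTIMENTS = {
    'amazingly': {'positive': 1, 'surprise': 1},
    'sunshine': {'positive': 1, 'joy': 1},
    'rainy': {'sadness': 1},
}

def toy_emotional_filter(words, emotion):
    """Keep the words that the toy lexicon associates with the emotion."""
    return [w for w in words if TOY_SENTIMENTS.get(w, {}).get(emotion) == 1]

words = ['amazingly', 'prefer', 'rainy', 'day', 'to', 'sunshine']
positive = toy_emotional_filter(words, 'positive')

# Passing a string instead of a list iterates over single characters,
# none of which are lexicon entries, so the result is empty
from_string = toy_emotional_filter("Amazingly, I prefer a #rainy day to #sunshine.", 'positive')

print(positive)     # ['amazingly', 'sunshine']
print(from_string)  # []
```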

Defines a function that determines which words from a list have _each_ emotion (i.e., the "emotional" words). 

In [6]:
def emotional_words(wordtext):
    """ Function finds, for each emotion in EMOTIONS, the words in the given list that have that emotion """
    dictionary_emowords = {}
    for emotion in EMOTIONS:
        emotion_words = emotional_filter(wordtext,emotion)
        dictionary_emowords[emotion] = emotion_words 
    return(dictionary_emowords)

Defines a function that gets a list of the "most common" words in a list: that is, a new list containing each distinct word from the original list, in descending order by how many times that word appears in the original list.

In [7]:
def common_words(wordtext):
    """ Evaluates the common words and their frequency in a list """

    lowercase_word = [word.lower() for word in wordtext] 
    dictionary_word_frequency = {word: 0 for word in lowercase_word} 
   
    for word in lowercase_word:
        dictionary_word_frequency[word]+=1
    words_frequency_list = sorted(dictionary_word_frequency.items(), key=lambda n: n[1], reverse=True)

    for i in range(len(words_frequency_list)):
        words_frequency_list[i] = words_frequency_list[i][0]
        
    return(words_frequency_list)
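
As a point of comparison, the same ordering can be sketched with `collections.Counter` from the standard library. This is an alternative formulation, not the notebook's implementation; `common_words_counter` is a hypothetical name:

```python
from collections import Counter

def common_words_counter(words):
    """Return the distinct lowercase words, most frequent first."""
    counts = Counter(word.lower() for word in words)
    # most_common() sorts by count, highest first
    return [word for word, _ in counts.most_common()]

sample = ['Rain', 'sun', 'rain', 'day', 'sun', 'rain']
print(common_words_counter(sample))  # ['rain', 'sun', 'day']
```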

## Tweet Statistics

Defines a function (e.g., `analyze_tweets()`) that takes as an argument a **list** of tweet data (with the same structure as the imported `SAMPLE_TWEETS` variable), and _returns_ the data of interest to display in a table like the one at the very top of the notebook. Produces the following information **for each emotion**:

1. The percentage of words _across all tweets_ that have that emotion
2. The most common words _across all tweets_ that have that emotion (in order!)
3. The most common **hashtags** _across all tweets_ associated with that emotion (see below)

In [10]:
def analyze_tweets(list_tweets):
    """ For all of the emotions in EMOTIONS, this function counts the words having each emotion
        and collects the hashtags of tweets containing such words. Returns the total word count,
        the (emotion, count) pairs sorted by descending count, and the hashtags per emotion """

    total = 0
    dictionary_emotionalwordsnum = {emotion: 0 for emotion in EMOTIONS}

    for tdata in list_tweets:
        # store the tokenized words and the per-emotion word lists on each tweet
        tdata['words'] = tweet_filter(tdata['text'])
        tdata['emotional_words'] = emotional_words(tdata['words'])

        # accumulate the overall word count and the count for every emotion
        total += len(tdata['words'])
        for emotion in EMOTIONS:
            dictionary_emotionalwordsnum[emotion] += len(tdata['emotional_words'][emotion])

    dictionary_emotionalwordsnum_sorted = sorted(dictionary_emotionalwordsnum.items(), key=lambda x: x[1], reverse = True)

    dictionary_emotionalhastag = hashtags_for_emotion(list_tweets)

    return(total,dictionary_emotionalwordsnum_sorted,dictionary_emotionalhastag)

def hashtags_for_emotion(tdata):
    """ Finds hashtags for each emotion """

    dictionary_emotionalhastag = {emotion: [] for emotion in EMOTIONS}
    for tweet in tdata:
        tweetwords = tweet_filter(tweet['text'])
        hashtags = [tag['text'] for tag in tweet['entities']['hashtags']]
        for emotion in EMOTIONS:
            # a tweet's hashtags count toward every emotion found among its words
            if emotional_filter(tweetwords,emotion):
                dictionary_emotionalhastag[emotion].extend(hashtags)
    return(dictionary_emotionalhastag)



Displays information as a printed table.

In [9]:
def display_stats(tdata,total,dictionary_emotionalwordsnum_sorted,dictionary_emotionalhastag):
    """ Prints each emotion and its attributes """

    # join with spaces so the last word of one tweet does not fuse with the first word of the next
    tweettext = tweet_filter(' '.join(tweet['text'] for tweet in tdata))

    print("{0:<15s}  {1:<5s}  {2:<30s}   {3:<30s}".format('EMOTION','% of WORDS','EXAMPLE WORDS','HASHTAGS'))

    for emotion in dictionary_emotionalwordsnum_sorted:
        percentage = (emotion[1]/total)*100
        com_words = ','.join(common_words(emotional_filter(tweettext,emotion[0]))[:3])
        com_hashtags_wordlist = common_words(dictionary_emotionalhastag[emotion[0]])[:3]
        com_hashtags = ','.join(['#'+tag for tag in com_hashtags_wordlist])
        print("{0:<15s} {1:5.2f}%       {2:<30s}   {3:30s}".format(emotion[0],percentage,com_words,com_hashtags))

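# analyze the sample tweets first so that the statistics exist before they are displayed
total,dictionary_emotionalwordsnum_sorted,dictionary_emotionalhastag = analyze_tweets(SAMPLE_TWEETS)
display_stats(SAMPLE_TWEETS,total,dictionary_emotionalwordsnum_sorted,dictionary_emotionalhastag)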
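
The percentage column and the fixed-width layout can be tried in isolation. The sketch below uses invented counts (`total_words` and `emotion_counts` are hypothetical values, not results from the sample tweets) and adds a guard against an empty word list, which the function above does not have:

```python
# Hypothetical inputs: 40 words overall, with per-emotion counts sorted descending
total_words = 40
emotion_counts = [('positive', 6), ('joy', 3)]

rows = []
for emotion, count in emotion_counts:
    # share of all words that carry this emotion; avoid dividing by zero
    percentage = (count / total_words) * 100 if total_words else 0.0
    # left-align the emotion in 15 columns, show the percentage with 2 decimals
    rows.append("{0:<15s} {1:5.2f}%".format(emotion, percentage))

for row in rows:
    print(row)
```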

## Pulling Live Data


Defines a function that takes a Twitter username as an argument and returns a list of dictionaries representing the tweets made by that user.

In [11]:
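def download_twitter_data(screen_name,count=200):
    """ Gets live data from twitter for a particular twitter screen name and number of tweets specified """

    parameters={'screen_name':screen_name,'count':count}
    # user_timeline endpoint from the API reference linked at the top of the notebook;
    # Twitter's v1.1 API rejects unauthenticated requests, so credentials are likely needed as well
    myreq = requests.get(url='https://api.twitter.com/1.1/statuses/user_timeline.json',params=parameters)
    tweetdata = json.loads(myreq.text)
    return tweetdata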
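
To see what request the parameters produce without touching the network, the query string can be built with the standard library. This is only a sketch: `BASE_URL` assumes the `user_timeline.json` endpoint named in the API reference linked at the top of the notebook, and `build_timeline_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the API reference linked at the top of the notebook
BASE_URL = 'https://api.twitter.com/1.1/statuses/user_timeline.json'

def build_timeline_url(screen_name, count=200):
    """Return the full request URL for a user's timeline (no request is sent)."""
    query = urlencode({'screen_name': screen_name, 'count': count})
    return BASE_URL + '?' + query

print(build_timeline_url('UW_iSchool', 20))
```

Printing the URL before sending a request is a cheap way to confirm that the path and parameters are the intended ones.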

Defines a "main" block that prompts the user for a Twitter username, or `SAMPLE` to analyze the sample tweets.

In [12]:
if __name__ == "__main__":

    from data.uw_ischool_sample import SAMPLE_TWEETS
    from data.sentiments_nrc import SENTIMENTS
    from data.sentiments_nrc import EMOTIONS
    from functools import reduce
    import re
    import json
    import requests

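    # a leading '@' is not part of the screen name
    screen_name = input('Enter the screen name: ').lstrip('@')

    if screen_name == 'SAMPLE':
        twitter_data = SAMPLE_TWEETS
    else:
        # input() returns a string, so convert the count to an integer
        count = int(input('Enter number of tweets to be Scraped: '))
        twitter_data = download_twitter_data(screen_name,count)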

    total,dictionary_emotionalwordsnum_sorted,dictionary_emotionalhastag = analyze_tweets(twitter_data)

    display_stats(twitter_data,total,dictionary_emotionalwordsnum_sorted,dictionary_emotionalhastag)

Enter the screen name: @clintatron
Enter number of tweets to be Scraped: 20
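
Both prompts return raw strings, so inputs like the ones above may need cleaning before use. One possible approach is sketched here; `normalize_inputs` and its fallback of 200 tweets are assumptions for illustration, not part of the notebook:

```python
def normalize_inputs(raw_name, raw_count, default_count=200):
    """Strip whitespace and a leading '@' from the name; parse the count as an int."""
    name = raw_name.strip().lstrip('@')
    try:
        count = int(raw_count)
    except ValueError:
        # non-numeric input falls back to the default
        count = default_count
    # never request fewer than one tweet
    return name, max(1, count)

print(normalize_inputs('@clintatron', '20'))  # ('clintatron', 20)
```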


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
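
This is the error `json.loads()` raises when the response body is empty or is not JSON at all (for example, an HTML error page). A more defensive parsing step might look like the sketch below; `parse_tweets` is a hypothetical helper, and treating a non-list payload as an error is an assumption based on the list structure of `SAMPLE_TWEETS`:

```python
import json

def parse_tweets(body):
    """Parse a response body, failing with a clearer message if it is not a tweet list."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as err:
        raise ValueError('Response was not valid JSON: {}'.format(err)) from err
    if not isinstance(data, list):
        # e.g. an error object returned in place of the timeline
        raise ValueError('Expected a list of tweets, got {}'.format(type(data).__name__))
    return data

print(parse_tweets('[{"text": "hi"}]'))  # [{'text': 'hi'}]
```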

SyntaxError: invalid syntax (<ipython-input-1-96b81e570a90>, line 2)