# Twitter Sentiment Analysis

* In this problem, we'll analyze some fictional tweets and find out whether the overall sentiment of Twitter users is happy or sad. This is a simplified version of an important real-world problem called **"sentiment analysis"**.

* Before we begin, we need a list of tweets to analyze. We're picking a small number of tweets here, but the exact same analysis can also be done for thousands or even millions of tweets. The collection of data that we analyze is often called a **"dataset"**.

```python
tweets = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "I'm really excited to learn Python with @JovianML #zerotopandas",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
    "Check out my blog post on common string operations in Python. #zerotopandas",
    "Freecodecamp has great coding tutorials. #skillup"
]
```

**Q1: How many tweets does the dataset contain?**

```python
number_of_tweets = len(tweets)
print("Total number of tweets:", number_of_tweets)
```

```
Total number of tweets: 10
```


* Let's create two lists of words: `happy_words` and `sad_words`. We will use these to check if a tweet is happy or sad.

```python
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
```

```python
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']
```

* To identify whether a tweet is happy, we can simply check whether it contains any of the words from `happy_words`.

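The `in` operator performs a case-sensitive substring test. A quick check with two made-up example strings (not part of the dataset) shows that a capitalized "Great" would be missed unless the tweet is lowercased first:

```python
# Hypothetical example strings, not taken from the dataset
lowercase_tweet = "Wow, what a great day today!!"
capitalized_tweet = "Great day today!!"

print("great" in lowercase_tweet)            # True: exact substring match
print("great" in capitalized_tweet)          # False: "Great" is not "great"
print("great" in capitalized_tweet.lower())  # True once the tweet is lowercased
```

Since `happy_words` and `sad_words` are already all lowercase, lowercasing each tweet is one simple way to make the check case-insensitive.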
```python
sample_tweet = tweets[0]
sample_tweet
```

```
'Wow, what a great day today!! #sunshine'
```

```python
is_tweet_happy = False

# Go through each word in happy_words
for word in happy_words:
    # Check if the tweet contains the word
    if word in sample_tweet:
        # Word found! Mark the tweet as happy
        is_tweet_happy = True
```

* Do you understand what we're doing above? 

> For each word in the list of happy words, we check whether it is part of the selected tweet. If the word is indeed part of the tweet, we set the variable `is_tweet_happy` to `True`.

```python
is_tweet_happy
```

```
True
```

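The same check can also be written with Python's built-in `any()`, which stops as soon as one happy word is found. This is a minimal alternative sketch of the loop above, not something the exercise requires:

```python
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sample_tweet = "Wow, what a great day today!! #sunshine"

# any() returns True as soon as one word from happy_words occurs in the tweet
is_tweet_happy = any(word in sample_tweet for word in happy_words)
print(is_tweet_happy)  # True
```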
**Q2: Determine the number of tweets in the dataset that can be classified as happy.**


```python
# store the final answer in this variable
number_of_happy_tweets = 0

# perform the calculations here
for tweet in tweets:
    is_tweet_happy = False
    for word in happy_words:
        # Check if the tweet contains the word
        if word in tweet:
            # Word found! Mark the tweet as happy and stop checking further words
            is_tweet_happy = True
            break
    # Count each happy tweet once, however many happy words it contains
    if is_tweet_happy:
        number_of_happy_tweets += 1
```

```python
print("Number of happy tweets:", number_of_happy_tweets)
```

```
Number of happy tweets: 6
```

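To see *which* tweets were classified as happy, a list comprehension can collect them instead of only counting them. A minimal sketch using the same substring test as the loop above:

```python
tweets = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "I'm really excited to learn Python with @JovianML #zerotopandas",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
    "Check out my blog post on common string operations in Python. #zerotopandas",
    "Freecodecamp has great coding tutorials. #skillup"
]
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']

# Keep every tweet that contains at least one happy word
happy_tweets = [tweet for tweet in tweets if any(word in tweet for word in happy_words)]

for tweet in happy_tweets:
    print(tweet)
print(len(happy_tweets))  # 6
```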

**Q3: What fraction of the total number of tweets are happy?**

For example, if 2 out of 10 tweets are happy, then the answer is `2/10` i.e. `0.2`.

```python
happy_fraction = number_of_happy_tweets / number_of_tweets
happy_fraction_str = f"{number_of_happy_tweets}/{number_of_tweets} i.e. {happy_fraction}"
```

```python
print("The fraction of happy tweets is:", happy_fraction_str)
```

```
The fraction of happy tweets is: 6/10 i.e. 0.6
```

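An f-string format specifier can also present the fraction as a percentage: `:.0%` multiplies the value by 100 and appends a percent sign. A small sketch, hard-coding the counts obtained above:

```python
number_of_happy_tweets = 6   # result of Q2 above
number_of_tweets = 10        # result of Q1 above

happy_fraction = number_of_happy_tweets / number_of_tweets
summary = f"{number_of_happy_tweets}/{number_of_tweets} i.e. {happy_fraction} ({happy_fraction:.0%})"
print(summary)  # 6/10 i.e. 0.6 (60%)
```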

* To identify whether a tweet is sad, we can simply check whether it contains any of the words from `sad_words`.

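Because `word in tweet` matches substrings, the happy word `'happy'` is also found inside the sad word `'unhappy'`, so a tweet containing "unhappy" would be counted as both happy and sad (none of the tweets in this dataset does). One possible refinement, sketched here as an assumption rather than part of the exercise, is to split each tweet into lowercase words with a regular expression and compare whole words. The helper name `tweet_words` is hypothetical:

```python
import re

happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']

def tweet_words(tweet):
    """Hypothetical helper: return the set of lowercase alphabetic words in a tweet."""
    return set(re.findall(r"[a-z]+", tweet.lower()))

# Hypothetical tweet, not part of the dataset
example = "Feeling unhappy about the weather"

print('happy' in example)                          # True: substring match inside "unhappy"
words = tweet_words(example)
print(any(word in words for word in happy_words))  # False: no whole happy word
print(any(word in words for word in sad_words))    # True: "unhappy" is a whole word
```

Hashtags still match with this approach, because the regular expression simply drops the `#` (for example, `#happy` becomes `happy`).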
**Q4: Determine the number of tweets in the dataset that can be classified as sad.**

```python
# store the final answer in this variable
number_of_sad_tweets = 0

# perform the calculations here
for tweet in tweets:
    is_tweet_sad = False
    for word in sad_words:
        # Check if the tweet contains the word
        if word in tweet:
            # Word found! Mark the tweet as sad and stop checking further words
            is_tweet_sad = True
            break
    # Count each sad tweet once, however many sad words it contains
    if is_tweet_sad:
        number_of_sad_tweets += 1
```

```python
print("Number of sad tweets:", number_of_sad_tweets)
```

```
Number of sad tweets: 2
```

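Q2 and Q4 repeat the same loop with a different word list. A small helper, hypothetically named `count_matching_tweets` here, removes that duplication. This is a sketch and not code from the original exercise:

```python
def count_matching_tweets(tweets, words):
    """Hypothetical helper: count tweets containing at least one of `words` (substring match)."""
    return sum(1 for tweet in tweets if any(word in tweet for word in words))

tweets = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "I'm really excited to learn Python with @JovianML #zerotopandas",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
    "Check out my blog post on common string operations in Python. #zerotopandas",
    "Freecodecamp has great coding tutorials. #skillup"
]
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']

print(count_matching_tweets(tweets, happy_words))  # 6
print(count_matching_tweets(tweets, sad_words))    # 2
```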

**Q5: What fraction of the total number of tweets are sad?**

```python
sad_fraction = number_of_sad_tweets / number_of_tweets
sad_fraction_str = f"{number_of_sad_tweets}/{number_of_tweets} i.e. {sad_fraction}"
```

```python
print("The fraction of sad tweets is:", sad_fraction_str)
```

```
The fraction of sad tweets is: 2/10 i.e. 0.2
```


With some basic analysis, we already know a lot about the sentiment of the tweets given to us. Let us now define a metric called "sentiment score" to summarize the overall sentiment of the tweets.

**Q6: Calculate the sentiment score, which is defined as the difference between the fraction of happy tweets and the fraction of sad tweets.**

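Written as a formula, with $N$ the total number of tweets and $N_{\text{happy}}$, $N_{\text{sad}}$ the counts from Q2 and Q4 (symbols introduced here for illustration), the score for this dataset works out as:

$$
\text{sentiment score} = \frac{N_{\text{happy}}}{N} - \frac{N_{\text{sad}}}{N} = \frac{N_{\text{happy}} - N_{\text{sad}}}{N} = \frac{6 - 2}{10} = 0.4
$$

The score ranges from $-1$ (every tweet sad, none happy) to $+1$ (every tweet happy, none sad).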
```python
sentiment_score = happy_fraction - sad_fraction
```

```python
print("The sentiment score for the given tweets is", round(sentiment_score, 5))
```

```
The sentiment score for the given tweets is 0.4
```

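The `round(sentiment_score, 5)` call matters because the two fractions are binary floating-point values: `0.6 - 0.2` evaluates to `0.39999999999999997`, not `0.4`. One alternative, sketched below with the counts hard-coded, is to subtract the integer counts first and divide once, which gives exactly `0.4` for this data:

```python
number_of_happy_tweets = 6   # from Q2
number_of_sad_tweets = 2     # from Q4
number_of_tweets = 10        # from Q1

# Each fraction is already a rounded binary value, so their difference is slightly off
score_from_fractions = number_of_happy_tweets / number_of_tweets - number_of_sad_tweets / number_of_tweets
print(score_from_fractions)  # 0.39999999999999997

# Subtracting the integer counts first leaves only a single, correctly rounded division
score_from_counts = (number_of_happy_tweets - number_of_sad_tweets) / number_of_tweets
print(score_from_counts)     # 0.4
```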

**Q7: Display whether the overall sentiment of the given dataset of tweets is happy or sad, using the sentiment score.**

```python
if sentiment_score >= 0:
    print("The overall sentiment is happy")
else:
    print("The overall sentiment is sad")
```

```
The overall sentiment is happy
```

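The `>=` comparison above labels a score of exactly 0 as happy. If a tie between happy and sad tweets should be reported separately, a three-way version is straightforward. The function name `describe_sentiment` and the "neutral" label for a zero score are assumptions for illustration:

```python
def describe_sentiment(sentiment_score):
    """Hypothetical helper: map a sentiment score to 'happy', 'sad', or 'neutral'."""
    if sentiment_score > 0:
        return "happy"
    if sentiment_score < 0:
        return "sad"
    return "neutral"  # equal fractions of happy and sad tweets

print(describe_sentiment(0.4))   # happy
print(describe_sentiment(-0.1))  # sad
print(describe_sentiment(0.0))   # neutral
```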

Finally, it's also important to track how many tweets are neutral, i.e. neither happy nor sad. If a large fraction of tweets are marked neutral, we may need to improve our lists of happy and sad words.

**Q8: What fraction of the tweets are neutral, i.e. neither happy nor sad?**

```python
# store the final answer in this variable
number_of_neutral_tweets = 0

# perform the calculation here
for tweet in tweets:
    # A tweet is neutral if it contains no happy word and no sad word
    contains_happy_word = any(word in tweet for word in happy_words)
    contains_sad_word = any(word in tweet for word in sad_words)
    if not contains_happy_word and not contains_sad_word:
        number_of_neutral_tweets += 1
```

```python
neutral_fraction = number_of_neutral_tweets / number_of_tweets
```

```python
print('The fraction of neutral tweets is', neutral_fraction)
```

```
The fraction of neutral tweets is 0.2
```

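As a sanity check, each tweet can be labelled individually and the labels tallied with `collections.Counter`. Labelling a tweet that matches both word lists as "mixed" is an assumption added here, and `label_tweet` is a hypothetical helper. With this dataset no tweet is mixed, so the happy, sad and neutral counts add up to the total of 10:

```python
from collections import Counter

tweets = [
    "Wow, what a great day today!! #sunshine",
    "I feel sad about the things going on around us. #covid19",
    "I'm really excited to learn Python with @JovianML #zerotopandas",
    "This is a really nice song. #linkinpark",
    "The python programming language is useful for data science",
    "Why do bad things happen to me?",
    "Apple announces the release of the new iPhone 12. Fans are excited.",
    "Spent my day with family!! #happy",
    "Check out my blog post on common string operations in Python. #zerotopandas",
    "Freecodecamp has great coding tutorials. #skillup"
]
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']

def label_tweet(tweet):
    """Hypothetical helper: return 'happy', 'sad', 'mixed', or 'neutral' via substring matching."""
    is_happy = any(word in tweet for word in happy_words)
    is_sad = any(word in tweet for word in sad_words)
    if is_happy and is_sad:
        return "mixed"
    if is_happy:
        return "happy"
    if is_sad:
        return "sad"
    return "neutral"

label_counts = Counter(label_tweet(tweet) for tweet in tweets)
# Counter returns 0 for labels that never occurred
print(label_counts["happy"], label_counts["sad"], label_counts["neutral"], label_counts["mixed"])  # 6 2 2 0
```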