<a href="https://colab.research.google.com/github/Namvi3t/DataProjects/blob/main/Sentiment_Analysis_on_Elon_Musk.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis on Elon Musk

#### By Samuel Do

# Setup

The code block below imports the required libraries and authenticates with the Twitter API.

In [None]:
#Import libraries for sentiment analysis
import tweepy #Twitter API client
import re #Regular expressions, used to clean the tweet text
from textblob import TextBlob #Process textual data
from wordcloud import WordCloud #Help create the word cloud
import pandas as pd #Help use data structures and data analysis tools
import numpy as np #Help work with numerical arrays
import matplotlib.pyplot as plt #Help plot graphs
plt.style.use('dark_background') #Style of the graphs. Dark mode helps my eyes
from google.colab import files #Upload files to Google Colab

# Save your Twitter developer account API key, API key secret, access token, and access token secret in a txt file, one per line in that order
# Make sure there is no extra whitespace in your txt file
# Upload this text file to Google Colab:

txtfile = files.upload()
keys = txtfile['twkeys.txt'].decode('utf-8').splitlines() #Uploaded contents arrive as bytes, so decode them to text

# Test whether your keys and secrets are correct:
apiKey = keys[0]
apiKeySecret = keys[1]
accessToken = keys[2]
accessTokenSecret = keys[3]
auth = tweepy.OAuthHandler(apiKey, apiKeySecret)
auth.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(auth)
try:
  api.verify_credentials()
  print("verification successful!")
except Exception:
  print("authentication error")  # if the keys are NOT correct, you should see this message

Saving twkeys.txt to twkeys.txt
verification successful!
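
The setup cell assumes that `twkeys.txt` holds exactly four lines in a fixed order. A stricter way to read the file could look like the sketch below. `parse_keys` is a hypothetical helper (not part of Tweepy or Colab), and it assumes the uploaded contents are UTF-8 bytes with one credential per line.

```python
def parse_keys(raw):
    """Split uploaded key-file contents into four named credentials.

    Hypothetical helper: assumes one credential per line, in the order
    API key, API key secret, access token, access token secret.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    # Strip stray whitespace and skip blank lines
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 4:
        raise ValueError("expected 4 non-empty lines, found %d" % len(lines))
    names = ["apiKey", "apiKeySecret", "accessToken", "accessTokenSecret"]
    return dict(zip(names, lines))


# Made-up placeholder values, not real credentials
sample = b"key123\nsecret456 \n\ntoken789\ntokensecret000\n"
creds = parse_keys(sample)
print(creds["apiKeySecret"])  # prints secret456
```

Failing early with a clear message makes a malformed key file easier to spot than a generic authentication error.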


# Analysis

In [None]:
# Get up to 200 of the most recent tweets from Elon Musk's Twitter page
# (the user timeline endpoint has no language filter, so no lang argument is passed)
posts = api.user_timeline(screen_name="elonmusk", count=200, tweet_mode="extended")

# Print the retrieved tweets
print("Here are the " + str(len(posts)) + " most recent tweets: \n")
for i, tweet in enumerate(posts, start=1): #Number the tweets starting at 1
  print(str(i) + ')' + tweet.full_text + "\n")

In [None]:
#Create a dataframe 
df = pd.DataFrame( [tweet.full_text for tweet in posts], columns =['Tweets'])

#Show the first 5 rows of data
df.head()

In [None]:
#Clean the text here
def cleanText(text):
  text = re.sub(r'@[A-Za-z0-9_]+', '', text) #Remove @mentions (handles may contain underscores)
  text = re.sub(r'#', '', text) #Remove the '#' symbol but keep the hashtag text
  text = re.sub(r'RT[\s]+', '', text) #Remove the 'RT' retweet marker
  text = re.sub(r'https?:\/\/\S+', '', text) #Remove hyperlinks from the tweets

  return text

#Apply the cleanText function to every tweet
df['Tweets'] = df['Tweets'].apply(cleanText)

#Show the cleaned text
df
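
Before applying cleaning patterns to the whole dataframe, it can help to run them on a single string. The sketch below is self-contained and is a variation on the cell above, not the same function: `clean_sample` and the sample tweet are made up for illustration, it strips a leading `RT @user:` prefix in one step, and it assumes handles contain only letters, digits, and underscores.

```python
import re

def clean_sample(text):
    text = re.sub(r'^RT\s+@[A-Za-z0-9_]+:?\s*', '', text)  # drop a leading retweet prefix
    text = re.sub(r'https?://\S+', '', text)                # drop hyperlinks
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)              # drop remaining @mentions
    text = re.sub(r'#', '', text)                           # drop '#', keep the tag text
    return re.sub(r'\s+', ' ', text).strip()                # collapse leftover whitespace

sample = "RT @some_user: Launch day! #Starship https://example.com/abc @other_user"
print(clean_sample(sample))  # prints: Launch day! Starship
```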

In [None]:
# Create a Function to get the subjectivity
def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

# Create a function to get the polarity (negative below 0, positive above 0)
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

# Create 2 columns
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)

#Show dataframe with the new columns
df

In [None]:
#The Word Cloud
allWords = ' '.join(df['Tweets'])
wordCloud = WordCloud(width = 600, height = 300, random_state = 22, max_font_size = 120).generate(allWords)

#Customize the word cloud and display it
plt.imshow(wordCloud, interpolation = "bilinear")
# Do not show the axes since they make the image look bad
plt.axis('off')
plt.show()

In [None]:
#Create a function for negative, positive, and neutral analysis
def getAnalysis(score):
  #Determine the polarity  
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'

#Add another column displaying Analysis
df['Analysis'] = df['Polarity'].apply(getAnalysis)

# Display the dataframe
df
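
`getAnalysis` treats only a score of exactly 0 as neutral, so a tweet scoring 0.01 counts as positive. One possible variation, assuming a small dead zone around zero is acceptable, is to label by threshold. `label_with_margin` and the `margin` value of 0.05 are illustrative choices, not something TextBlob prescribes.

```python
def label_with_margin(score, margin=0.05):
    """Label a polarity score, treating scores within +/- margin as neutral."""
    if score < -margin:
        return 'Negative'
    if score > margin:
        return 'Positive'
    return 'Neutral'

# Made-up polarity scores for illustration
for s in (-0.6, -0.02, 0.0, 0.03, 0.4):
    print(s, label_with_margin(s))
```

A wider margin moves more borderline tweets into the neutral group, so the percentages reported later depend on this choice.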

In [None]:
# Print all positive tweets, from least to most positive
sortedDF = df.sort_values(by=['Polarity']) #Sort the dataframe by polarity
positiveDF = sortedDF[sortedDF['Analysis'] == 'Positive'] #Keep only the positive rows
#Iterate over the rows in sorted order; looking rows up by label i would ignore the sort
for k, tweet in enumerate(positiveDF['Tweets'], start=1):
  print(str(k) + ')' + tweet)
  print()

In [None]:
# Print all negative tweets, from least to most negative
sortedDF = df.sort_values(by=['Polarity'], ascending=False) #ascending must be the boolean False, not the string 'False'
negativeDF = sortedDF[sortedDF['Analysis'] == 'Negative'] #Keep only the negative rows
#Iterate over the rows in sorted order; looking rows up by label i would ignore the sort
for k, tweet in enumerate(negativeDF['Tweets'], start=1):
  print(str(k) + ')' + tweet)
  print()

In [None]:
# Print all neutral tweets
sortedDF = df.sort_values(by=['Polarity']) #Sort the dataframe by polarity
neutralDF = sortedDF[sortedDF['Analysis'] == 'Neutral'] #Keep only the neutral rows
#Number and print each neutral tweet
for k, tweet in enumerate(neutralDF['Tweets'], start=1):
  print(str(k) + ')' + tweet)
  print()

In [None]:
# Plot polarity and subjectivity using a scatter plot
#Create the scatter plot figure
plt.figure(figsize=(8,6))
#Plot all points in one call instead of looping over the rows
plt.scatter(df['Polarity'], df['Subjectivity'], color='red')
plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()

In [None]:
# Calculate the percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']
ptweets = ptweets['Tweets']

round((ptweets.shape[0] / df.shape[0]) * 100, 1) #Calculate the percentage

In [None]:
# Calculate the percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']
ntweets = ntweets['Tweets']

round((ntweets.shape[0] / df.shape[0]) * 100, 1) #Calculate the percentage

In [None]:
# Calculate the percentage of neutral tweets
neutweets = df[df.Analysis == 'Neutral']
neutweets = neutweets['Tweets']

round((neutweets.shape[0] / df.shape[0]) * 100, 1) #Calculate the percentage
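
The three percentage cells above repeat the same calculation once per label. As a rough sketch, the same numbers can be produced in one pass with the standard library; `label_percentages` is a hypothetical helper and the label list is made up to stand in for `df['Analysis']`. In pandas, `df['Analysis'].value_counts(normalize=True) * 100` should give equivalent values.

```python
from collections import Counter

def label_percentages(labels):
    """Return {label: percentage of all labels}, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(n / total * 100, 1) for label, n in counts.items()}

# Made-up labels standing in for df['Analysis']
labels = ['Positive', 'Neutral', 'Positive', 'Negative',
          'Positive', 'Neutral', 'Neutral', 'Positive']
print(label_percentages(labels))
# prints {'Positive': 50.0, 'Neutral': 37.5, 'Negative': 12.5}
```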

In [None]:
#Show value counts
counts = df['Analysis'].value_counts()
print(counts)

#Plot and visualize the counts using a pie chart
counts.plot(kind='pie', autopct='%1.1f%%', textprops={'color':"r"})
plt.title('Sentiment Analysis')
plt.ylabel('') #A pie chart has no meaningful axes, so clear the default axis label
plt.show()

# Resources

Completing the sentiment analysis on Elon Musk required the libraries documented below.

#### Documentation:
*   https://docs.tweepy.org/en/stable/ (Tweepy)
*   https://pandas.pydata.org/docs/ (Pandas)
*   https://numpy.org/doc/ (NumPy)
*   https://matplotlib.org/stable/index.html (Matplotlib)
*   https://textblob.readthedocs.io/en/dev/ (TextBlob)
*   https://python-course.eu/applications-python/python-wordcloud-tutorial.php (Word Cloud)

#### Code Skeleton:
*   https://www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/

This tutorial helped me set up the sentiment analysis. I extended it by adding libraries for further analysis.











