# Twitter Sentiment Product Data Analysis

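Purpose of project: explore and clean a dataset of tweets about Apple and Google products, in preparation for predicting the emotion each tweet expresses from its text.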

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# Pin jedi to 0.17.2; jedi 0.18 reportedly broke tab completion in some IPython releases
%pip install jedi==0.17.2

Collecting jedi==0.17.2
  Downloading jedi-0.17.2-py2.py3-none-any.whl (1.4 MB)
Collecting parso<0.8.0,>=0.7.0
  Downloading parso-0.7.1-py2.py3-none-any.whl (109 kB)
Installing collected packages: parso, jedi
  Attempting uninstall: parso
    Found existing installation: parso 0.8.2
    Uninstalling parso-0.8.2:
      Successfully uninstalled parso-0.8.2
  Attempting uninstall: jedi
    Found existing installation: jedi 0.18.0
    Uninstalling jedi-0.18.0:
      Successfully uninstalled jedi-0.18.0
Successfully installed jedi-0.17.2 parso-0.7.1
Note: you may need to restart the kernel to use updated packages.


## Pull in Raw Data

In [6]:
df = pd.read_csv('data/raw/tweet_product_company.csv')

## Explore Raw Data

In [7]:
df.head()

|   | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   text     9092 non-null   object
 1   product  3291 non-null   object
 2   emotion  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB


In [9]:
df.columns

Index(['tweet_text', 'emotion_in_tweet_is_directed_at',
       'is_there_an_emotion_directed_at_a_brand_or_product'],
      dtype='object')

In [11]:
df.tweet_text.value_counts()

RT @mention Marissa Mayer: Google Will Connect the Digital &amp; Physical Worlds Through Mobile - {link} #sxsw                              5
RT @mention Google to Launch Major New Social Network Called Circles, Possibly Today {link} #sxsw                                           4
RT @mention Marissa Mayer: Google Will Connect the Digital &amp; Physical Worlds Through Mobile - {link} #SXSW                              4
RT @mention Google to Launch Major New Social Network Called Circles, Possibly Today {link} #SXSW                                           3
Win free ipad 2 from webdoc.com #sxsw RT                                                                                                    2
                                                                                                                                           ..
Wandered in on Google Doodles presentation to see @mention presenting! Fun! #sxsw                                                           1
Leavin
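Several of the most frequent tweets above are the same retweet, differing only in the case of the `#sxsw` hashtag, so exact matching undercounts duplicates. The sketch below uses a small hypothetical sample built from three tweets shown in the output and compares exact duplicate detection with case-insensitive detection. On the full data the equivalent would be `df.tweet_text.str.lower().duplicated().sum()`; whether duplicates should be dropped is a separate judgment call.

```python
import pandas as pd

# Hypothetical sample built from tweets in the value_counts output above
tweets = pd.Series([
    "RT @mention Google to Launch Major New Social Network Called Circles, Possibly Today {link} #sxsw",
    "RT @mention Google to Launch Major New Social Network Called Circles, Possibly Today {link} #SXSW",
    "Win free ipad 2 from webdoc.com #sxsw RT",
])

# Exact matching treats the two retweets as distinct because of #sxsw vs #SXSW
exact_dupes = tweets.duplicated().sum()

# Lowercasing first makes variants that differ only in case collide
case_insensitive_dupes = tweets.str.lower().duplicated().sum()

print(exact_dupes, case_insensitive_dupes)
```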

In [12]:
df.emotion_in_tweet_is_directed_at.value_counts()

iPad                               946
Apple                              661
iPad or iPhone App                 470
Google                             430
iPhone                             297
Other Google product or service    293
Android App                         81
Android                             78
Other Apple product or service      35
Name: emotion_in_tweet_is_directed_at, dtype: int64

It appears these tweets are about different Google and Apple products. I wonder if we could also compare these Twitter sentiments to the companies' stock prices.
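To compare the two companies, the nine product labels can be collapsed into a brand. The sketch below assumes a hypothetical `brand_map` that assigns the Android labels to Google (an assumption about how "Android App" should be grouped) and applies it to the counts shown above. On the full data, `df['emotion_in_tweet_is_directed_at'].map(brand_map)` would give a brand per tweet, leaving `NaN` wherever the product is missing.

```python
import pandas as pd

# Hypothetical grouping of product labels into brands (Android App -> Google is an assumption)
brand_map = {
    "iPad": "Apple",
    "Apple": "Apple",
    "iPad or iPhone App": "Apple",
    "iPhone": "Apple",
    "Other Apple product or service": "Apple",
    "Google": "Google",
    "Other Google product or service": "Google",
    "Android App": "Google",
    "Android": "Google",
}

# Product counts copied from the value_counts output above
counts = pd.Series({
    "iPad": 946, "Apple": 661, "iPad or iPhone App": 470, "Google": 430,
    "iPhone": 297, "Other Google product or service": 293,
    "Android App": 81, "Android": 78, "Other Apple product or service": 35,
})

# Grouping a Series by a dict maps each index label to its brand
brand_totals = counts.groupby(brand_map).sum()
print(brand_totals)
```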

In [13]:
df.is_there_an_emotion_directed_at_a_brand_or_product.value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
I can't tell                           156
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

There are four possible categories in the column we are trying to predict: positive, negative, no emotion, or unknown ("I can't tell").
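The categories are far from balanced. Converting the counts above to proportions makes this explicit; on the full data, `value_counts(normalize=True)` on the same column gives the same result directly. A minority class of this size suggests that any later train/test split should be stratified on this column.

```python
import pandas as pd

# Label counts copied from the value_counts output above
label_counts = pd.Series({
    "No emotion toward brand or product": 5389,
    "Positive emotion": 2978,
    "Negative emotion": 570,
    "I can't tell": 156,
})

# Share of each label in the 9,093 tweets
proportions = label_counts / label_counts.sum()
print(proportions.round(3))
```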

In [16]:
df.isna().sum()

text          1
product    5802
emotion       0
dtype: int64

There are 5,802 missing values in the product column (originally `emotion_in_tweet_is_directed_at`) and one missing value in the text column. We will likely use only the text column to predict the emotion column, so I will not worry about the missing product values for now.
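The one row with missing text, on the other hand, has nothing to predict from. A minimal sketch of removing it, using a hypothetical four-row sample (the tweets are invented for illustration), drops rows only when `text` is missing so that rows lacking a product label are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one row lacks text, another lacks a product label
sample = pd.DataFrame({
    "text": ["Great iPad demo #sxsw", np.nan,
             "Long lines at the pop-up store #sxsw", "My iPhone battery died again #sxsw"],
    "product": ["iPad", np.nan, np.nan, "iPhone"],
    "emotion": ["Positive emotion", "No emotion toward brand or product",
                "No emotion toward brand or product", "Negative emotion"],
})

# Drop only rows whose text is missing; a missing product does not remove a row
cleaned = sample.dropna(subset=["text"]).reset_index(drop=True)
print(len(cleaned))
```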

## Data Cleaning

I want to rename the columns so that the information is easier to access.

In [14]:
df.columns = ['text','product','emotion']

In [18]:
# confirm the change was made
df.head()

|   | text | product | emotion |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |
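Assigning a list to `df.columns` works as long as the columns stay in the same order. An alternative sketch uses `DataFrame.rename` with an explicit old-to-new mapping, shown here on a hypothetical one-row stand-in with the raw column names; `errors="raise"` makes pandas fail loudly if an expected column is missing.

```python
import pandas as pd

# Hypothetical one-row stand-in with the original column names from the raw CSV
raw = pd.DataFrame({
    "tweet_text": ["example tweet"],
    "emotion_in_tweet_is_directed_at": ["iPad"],
    "is_there_an_emotion_directed_at_a_brand_or_product": ["Positive emotion"],
})

# Each new name is tied to an old name, so column order does not matter
renamed = raw.rename(
    columns={
        "tweet_text": "text",
        "emotion_in_tweet_is_directed_at": "product",
        "is_there_an_emotion_directed_at_a_brand_or_product": "emotion",
    },
    errors="raise",  # raise KeyError if any old name is absent
)
print(list(renamed.columns))
```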


I want to convert the emotion labels to a numeric scale. Since "I can't tell" makes up a very small share of the tweets (156 of 9,093, about 1.7%), I will categorize these as no emotion.

In [19]:
emotion_dict = {"No emotion toward brand or product":0,"Positive emotion":1, "Negative emotion":-1, "I can't tell":0}

In [20]:
df['emotion'] = df['emotion'].map(emotion_dict)
df.head()

|   | text | product | emotion |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | -1 |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | 1 |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | 1 |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | -1 |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | 1 |
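`Series.map` with a dict returns `NaN` for any value that is not a key, so a label with unexpected spelling would silently become missing. A small check, sketched here with a hypothetical misspelled label ("Positive Emotion"), lists the values that failed to map; on the full data, checking `df['emotion'].isna().sum()` right after the mapping serves the same purpose.

```python
import pandas as pd

emotion_dict = {"No emotion toward brand or product": 0, "Positive emotion": 1,
                "Negative emotion": -1, "I can't tell": 0}

# Hypothetical labels; the last one differs in case and is not a dict key
labels = pd.Series(["Positive emotion", "Negative emotion", "I can't tell", "Positive Emotion"])
mapped = labels.map(emotion_dict)

# Any label that produced NaN was not covered by emotion_dict
unmapped = labels[mapped.isna()]
print(unmapped.tolist())
```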


## Export Clean Data

In [23]:
# index=False keeps the row index from being saved as an extra unnamed column
df.to_csv('data/clean/clean_tweet_emotion.csv', index=False)
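By default `to_csv` writes the row index as the first column, which comes back as `Unnamed: 0` when the file is read with `pd.read_csv`. The round trip below uses an in-memory buffer and a hypothetical two-row frame to show the difference `index=False` makes.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the cleaned data
sample = pd.DataFrame({"text": ["a", "b"], "emotion": [1, -1]})

# Default: the index is written as a first column with an empty header
with_index = pd.read_csv(io.StringIO(sample.to_csv()))

# index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```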