# Twitter NLP Sentiment Analysis

#### Authors: Raul Torres, Ziyuan Terry

### Overview

In this Twitter sentiment analysis project, we analyzed a dataset of over 9,000 tweets about Apple and Google products, each labeled as positive, negative, or neutral. This is a supervised machine learning problem: the training data carries the correct sentiment label for each tweet, and the model learns from those labels to predict the sentiment of new, unseen tweets. Among tweets that expressed an emotion toward a brand or product, positive tweets (2,978) far outnumbered negative ones (570), while the largest group (5,389) expressed no emotion toward a brand or product. This analysis provides insight into public opinion about the two companies.
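To make the supervised setup concrete, here is a minimal sketch of fitting a text classifier on labeled tweets and predicting labels for unseen ones. The choice of `TfidfVectorizer` with `LogisticRegression` is an assumption for illustration (this document does not state which model was used), and the training tweets are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training tweets and labels, for illustration only.
train_texts = [
    "I love my new iPad",
    "The iPhone app is great",
    "Awesome Apple store at SXSW",
    "My iPhone battery is terrible",
    "The iPad app keeps crashing, awful",
    "Hate the long line at the Apple store",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# Vectorize the text, then fit a linear classifier on the labeled examples.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Predict the sentiment of tweets the model has not seen.
new_tweets = ["Great new iPad", "This app is awful"]
predictions = model.predict(new_tweets)
print(list(zip(new_tweets, predictions)))
```

With only six training examples the predictions carry no real signal; the point is the fit-then-predict workflow, not the accuracy.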

### Business Problem

Apple wants to understand how the public views the company and its products. To improve its marketing and public relations strategies, Apple has decided to conduct a Twitter sentiment analysis project to gather insight into public opinion. The project analyzes over 9,000 tweets about Apple and its products, using a supervised machine learning model to classify each tweet as positive, negative, or neutral. The results will show the overall sentiment toward Apple and highlight specific areas where the company may need to improve to better align with public opinion. This information will inform the company's future marketing and public relations strategies.

### Data Understanding

As part of the data understanding phase of this Twitter sentiment analysis project, we examined a dataset of over 9,000 tweets about both Apple and Google. The goal of this analysis was to understand the sentiment of these tweets, and to classify them as either positive or negative. This required us to use natural language processing (NLP) techniques to clean and prepare the data for analysis, as well as machine learning algorithms to make predictions on the sentiment of the tweets.
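As a rough illustration of the cleaning step described above, the sketch below lowercases a tweet, strips the `@mention` and `{link}` placeholders that appear in this dataset, tokenizes it, and removes stopwords. It uses only the standard library's `re` module rather than the NLTK tokenizers; the function name `clean_tweet`, the regular expressions, and the tiny `STOPWORDS` set are assumptions for illustration, not the project's actual preprocessing.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would likely use a fuller list such as NLTK's.
STOPWORDS = {"a", "an", "the", "is", "it", "for", "my", "to", "of", "and"}

# Placeholders that appear in the dataset's tweets, e.g. "@mention" and "{link}".
PLACEHOLDER_RE = re.compile(r"@\w+|\{link\}|https?://\S+")
# Tokens: runs of letters, optionally with an internal apostrophe ("don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def clean_tweet(text):
    """Lowercase a tweet, drop placeholders, tokenize, and remove stopwords."""
    text = text.lower()
    text = PLACEHOLDER_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_tweet("Ipad everywhere. #SXSW {link}"))
# ['ipad', 'everywhere', 'sxsw']
```

Hashtags are kept as plain words here (`#SXSW` becomes `sxsw`); whether to keep, drop, or tag them separately is a design choice that depends on the model.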

Among tweets labeled with an emotion, positive tweets outnumber negative ones by roughly five to one (2,978 versus 570), which suggests that people who express an opinion about these two companies tend to be positive. Most tweets in the dataset (5,389), however, express no emotion toward a brand or product. We filtered the data to keep only positive and negative tweets about Apple, which may have influenced the results. In general, sentiment analysis can provide useful insight into public opinion and attitudes toward a particular topic or company.

In [4]:
# Import necessary packages
import os
import re
import string
import sys

import matplotlib.pyplot as plt
import nltk
import pandas as pd
from nltk import pos_tag
from nltk.corpus import stopwords, wordnet
from nltk.probability import FreqDist
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import RegexpTokenizer, regexp_tokenize, word_tokenize
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)
from sklearn.model_selection import train_test_split

# Make modules two directories up importable from this notebook
module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)

In [10]:
# Reading data
df = pd.read_csv('data/judge-1377884607_tweet_product_company.csv', encoding='Latin-1')
df.tail(10)

|      | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
| ---: | --- | --- | --- |
| 9083 | Google says the future is all around you! (ie,... |  | No emotion toward brand or product |
| 9084 | Google says the future is location, location, ... |  | No emotion toward brand or product |
| 9085 | I've always used Camera+ for my iPhone b/c it ... | iPad or iPhone App | Positive emotion |
| 9086 | Google says: want to give a lightning talk to ... |  | No emotion toward brand or product |
| 9087 | @mention Yup, but I don't have a third app yet... |  | No emotion toward brand or product |
| 9088 | Ipad everywhere. #SXSW {link} | iPad | Positive emotion |
| 9089 | Wave, buzz... RT @mention We interrupt your re... |  | No emotion toward brand or product |
| 9090 | Google's Zeiger, a physician never reported po... |  | No emotion toward brand or product |
| 9091 | Some Verizon iPhone customers complained their... |  | No emotion toward brand or product |
| 9092 | Ï¡Ïàü_ÊÎÒ£Áââ_£â_ÛâRT @... |  | No emotion toward brand or product |


### Data Preparation

In [11]:
# Check for missing values in tweet_text
df['tweet_text'].isna().sum()

1
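The check above finds one tweet with no text. A row without text cannot be vectorized, so one option is to drop it before modeling. The sketch below shows that on a toy frame; the rows are illustrative, and whether the original analysis dropped or otherwise handled this row is not stated here.

```python
import pandas as pd

# Toy stand-in for the real DataFrame; the rows are illustrative only.
toy = pd.DataFrame({
    "tweet_text": ["Ipad everywhere. #SXSW {link}", None, "Google says hi"],
    "is_there_an_emotion_directed_at_a_brand_or_product": [
        "Positive emotion",
        "No emotion toward brand or product",
        "No emotion toward brand or product",
    ],
})

# Drop rows whose tweet text is missing, then rebuild a contiguous index.
toy_clean = toy.dropna(subset=["tweet_text"]).reset_index(drop=True)

print(toy_clean["tweet_text"].isna().sum())
print(len(toy), "->", len(toy_clean))
```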

In [12]:
df['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
I can't tell                           156
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64
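These counts imply a strongly imbalanced label distribution. The short sketch below converts them into shares, using the counts copied from the output above; in the notebook, `value_counts(normalize=True)` on the same column should give the same shares directly.

```python
# Label counts copied from the value_counts() output above.
label_counts = {
    "No emotion toward brand or product": 5389,
    "Positive emotion": 2978,
    "Negative emotion": 570,
    "I can't tell": 156,
}

total = sum(label_counts.values())

# Share of each label, rounded to three decimals for readability.
label_shares = {label: round(n / total, 3) for label, n in label_counts.items()}

for label, share in label_shares.items():
    print(f"{label:<36}{share:.1%}")
```

Negative tweets make up only about 6% of the data, so accuracy alone could be a misleading metric for any model trained on these labels.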

In [13]:
df['emotion_in_tweet_is_directed_at'].value_counts()

iPad                               946
Apple                              661
iPad or iPhone App                 470
Google                             430
iPhone                             297
Other Google product or service    293
Android App                         81
Android                             78
Other Apple product or service      35
Name: emotion_in_tweet_is_directed_at, dtype: int64

In [22]:
# Fold 'iPhone' into 'Apple' (assign back instead of modifying the column in place)
df['emotion_in_tweet_is_directed_at'] = df['emotion_in_tweet_is_directed_at'].replace(
    to_replace='iPhone', value='Apple')

In [23]:
# Fold 'iPad or iPhone App' into 'Apple'
df['emotion_in_tweet_is_directed_at'] = df['emotion_in_tweet_is_directed_at'].replace(
    to_replace='iPad or iPhone App', value='Apple')

In [18]:
# Fold 'iPad' into 'Apple'
df['emotion_in_tweet_is_directed_at'] = df['emotion_in_tweet_is_directed_at'].replace(
    to_replace='iPad', value='Apple')

In [19]:
# Fold 'Other Apple product or service' into 'Apple'
df['emotion_in_tweet_is_directed_at'] = df['emotion_in_tweet_is_directed_at'].replace(
    to_replace='Other Apple product or service', value='Apple')
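The four replacements above could also be written as a single dictionary-based `replace`. The sketch below shows this on a toy Series; `APPLE_LABELS` and `BRAND_MAP` are hypothetical names introduced for illustration, and the label strings come from the `value_counts()` output.

```python
import pandas as pd

# Hypothetical helper names; the label strings come from the value_counts() output.
APPLE_LABELS = [
    "iPad",
    "iPhone",
    "iPad or iPhone App",
    "Other Apple product or service",
]
BRAND_MAP = {label: "Apple" for label in APPLE_LABELS}

# Toy column standing in for df['emotion_in_tweet_is_directed_at'].
products = pd.Series(["iPad", "Google", "iPhone", None, "iPad or iPhone App", "Apple"])

# replace() matches whole values, so "iPad" does not touch "iPad or iPhone App".
brands = products.replace(BRAND_MAP)

print(brands.value_counts())
```

Keeping the mapping in one dictionary makes it easy to review which labels are merged, and to add a similar mapping for the Google and Android labels if needed.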

In [21]:
# Check the consolidated brand counts
df['emotion_in_tweet_is_directed_at'].value_counts()

Apple                              2409
Google                              430
Other Google product or service     293
Android App                          81
Android                              78
Name: emotion_in_tweet_is_directed_at, dtype: int64
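The Data Understanding section mentions filtering the data to positive and negative tweets about Apple. One way that filter might look is sketched below on a toy frame; the names `is_apple`, `is_polar`, and the binary `target` column are assumptions for illustration, not the project's actual code.

```python
import pandas as pd

BRAND_COL = "emotion_in_tweet_is_directed_at"
LABEL_COL = "is_there_an_emotion_directed_at_a_brand_or_product"

# Toy rows for illustration; the real frame has over 9,000 tweets.
toy = pd.DataFrame({
    BRAND_COL: ["Apple", "Apple", "Google", "Apple", None],
    LABEL_COL: [
        "Positive emotion",
        "Negative emotion",
        "Positive emotion",
        "I can't tell",
        "No emotion toward brand or product",
    ],
})

is_apple = toy[BRAND_COL] == "Apple"
is_polar = toy[LABEL_COL].isin(["Positive emotion", "Negative emotion"])

# .copy() so that adding a column does not touch the original frame.
apple_polar = toy[is_apple & is_polar].copy()

# Hypothetical binary target: 1 for positive, 0 for negative.
apple_polar["target"] = (apple_polar[LABEL_COL] == "Positive emotion").astype(int)

print(apple_polar[[BRAND_COL, "target"]])
```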