# Sentiments On Crowd-Flower Brands and Products

# Table of contents

[1. Business Understanding](#1.-Business-Understanding) </br>
[1.1 Business Description](#1.1-Business-Description) </br>
[1.2 Problem Statement](#1.2-Problem-Statement) </br>
[1.3 Main Objective](#1.3-Main-Objective) </br>
[1.4 Specific Objectives](#1.4-Specific-Objectives) </br>
[2. Importing Libraries And Warnings](#2.-Importing-Libraries-And-Warnings) </br>
[3. Data Understanding](#3.-Data-Understanding) </br>
[4. Data Preparation](#4.-Data-Preparation) </br>
[4.1 Visualizing Before Cleaning](#4.1-Visualizing-Before-Cleaning) </br>
[4.2 Missing Values](#4.2-Missing-Values) </br>
[4.3 Duplicates](#4.3-Duplicates) </br>
[4.4 Place Holders](#4.4-Place-Holders) </br>
[4.5 Messy Columns](#4.5-Messy-Columns) </br>
[4.6 White Space](#4.6-White-Space) </br>
[4.7 Visualizing After Cleaning](#4.7-Visualizing-After-Cleaning) </br>
[5. Data Preprocessing](#5.-Data-Preprocessing) </br>
[6. Data Modelling](#6.-Data-Modelling) </br>
[7. Evaluation](#7.-Evaluation) </br>
[8. Recommendations](#8.-Recommendations) </br>
[9. Conclusions](#9.-Conclusions) </br>
[10. Limitations](#10.-Limitations)

# 1. Business Understanding

## 1.1 Business Description

Emotional branding is a strategy that aims to create a strong connection between a product or service and the emotions of the target audience. It can help to build loyalty, trust, and differentiation in a competitive market.

## 1.2 Problem Statement

## 1.3 Main Objective

## 1.4 Specific Objectives

# 2. Importing Libraries And Warnings

In [39]:
# Imports
import re
import string
import warnings

import nltk
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer

# Render plots inline and silence warnings
%matplotlib inline
warnings.filterwarnings("ignore")

# 3. Data Understanding

In [31]:
#loading the data
df = pd.read_csv("Data/crowdflower-brands-and-product-emotions/original/judge-1377884607_tweet_product_company.csv", encoding = "latin1")
df

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |
| ... | ... | ... | ... |
| 9088 | Ipad everywhere. #SXSW {link} | iPad | Positive emotion |
| 9089 | Wave, buzz... RT @mention We interrupt your re... | NaN | No emotion toward brand or product |
| 9090 | Google's Zeiger, a physician never reported po... | NaN | No emotion toward brand or product |
| 9091 | Some Verizon iPhone customers complained their... | NaN | No emotion toward brand or product |


In [32]:
# Checking the columns
df.columns

Index(['tweet_text', 'emotion_in_tweet_is_directed_at',
       'is_there_an_emotion_directed_at_a_brand_or_product'],
      dtype='object')

In [35]:
#checking the unique values in the emotion_in_tweet_is_directed_at column
df["emotion_in_tweet_is_directed_at"].unique()

array(['iPhone', 'iPad or iPhone App', 'iPad', 'Google', nan, 'Android',
       'Apple', 'Android App', 'Other Google product or service',
       'Other Apple product or service'], dtype=object)

In [34]:
# Checking the unique values in the is_there_an_emotion_directed_at_a_brand_or_product column
df["is_there_an_emotion_directed_at_a_brand_or_product"].unique()

array(['Negative emotion', 'Positive emotion',
       'No emotion toward brand or product', "I can't tell"], dtype=object)

## Information about the columns
* tweet_text - The text of the tweet.

* emotion_in_tweet_is_directed_at - The brand or product that the emotion in the tweet is directed at. It is missing (NaN) for many tweets; otherwise it takes one of these values:
  * iPhone
  * iPad or iPhone App
  * iPad
  * Google
  * Android
  * Apple
  * Android App
  * Other Google product or service
  * Other Apple product or service

* is_there_an_emotion_directed_at_a_brand_or_product - The emotion expressed towards a brand or product (will be used as the target). It takes one of these values:
  * Negative emotion
  * Positive emotion
  * No emotion toward brand or product
  * I can't tell
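Since the last column will serve as the target, it can help to look at how its classes are distributed before modelling. The sketch below is only an illustration: `sample` is a hypothetical miniature frame with made-up tweets, standing in for the `df` loaded above, and `counts` and `shares` are names introduced here.

```python
import pandas as pd

# Hypothetical miniature of the dataset (made-up rows, not real tweets);
# in the notebook the same calls would be applied to `df`.
target = "is_there_an_emotion_directed_at_a_brand_or_product"
sample = pd.DataFrame({
    "tweet_text": [
        "Love the new iPad",
        "My iPhone battery died again",
        "Heading to the keynote",
        "Great Android app demo",
        "Hmm, Google",
    ],
    target: [
        "Positive emotion",
        "Negative emotion",
        "No emotion toward brand or product",
        "Positive emotion",
        "I can't tell",
    ],
})

# Absolute number of tweets per class
counts = sample[target].value_counts()

# Share of each class, useful for spotting class imbalance
shares = sample[target].value_counts(normalize=True).round(2)

print(pd.DataFrame({"count": counts, "share": shares}))
```

A heavily skewed distribution would be a reason to consider stratified splits or class weights later on.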

In [38]:
# Data types of the column values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          9092 non-null   object
 1   emotion_in_tweet_is_directed_at                     3291 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB


In [37]:
# Summary statistics
df.describe()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| count | 9092 | 3291 | 9093 |
| unique | 9065 | 9 | 4 |
| top | RT @mention Marissa Mayer: Google Will Connect... | iPad | No emotion toward brand or product |
| freq | 5 | 946 | 5389 |


In [36]:
#Checking the shape of our data
df.shape

(9093, 3)

# 4. Data Preparation

## 4.1 Visualizing Before Cleaning

In [None]:
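# Assumption: the intended plot is the total number of words in each column of the raw data
word_counts = df.fillna("").astype(str).apply(lambda col: col.str.split().str.len().sum())

plt.figure(figsize = (10, 6))
sns.barplot(x = word_counts.index, y = word_counts.values)
plt.title("Word count visual before cleaning")
plt.xlabel("columns")
plt.ylabel("word count")
plt.xticks(rotation = 45)
plt.show()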

## 4.2 Renaming the columns

## 4.2 Missing Values

In [11]:
df.isna().sum()

tweet_text                                               1
emotion_in_tweet_is_directed_at                       5552
is_there_an_emotion_directed_at_a_brand_or_product       0
dtype: int64
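One possible way to deal with the gaps counted above is sketched below; it is not necessarily the approach this notebook will take. It drops rows that have no tweet text and labels a missing brand with a placeholder. The `sample` frame is a hypothetical miniature with made-up rows, and the placeholder value `"Unknown"` is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset (made-up rows, not real tweets)
sample = pd.DataFrame({
    "tweet_text": ["Love the new iPad", np.nan, "Heading to the keynote"],
    "emotion_in_tweet_is_directed_at": ["iPad", np.nan, np.nan],
    "is_there_an_emotion_directed_at_a_brand_or_product": [
        "Positive emotion",
        "No emotion toward brand or product",
        "No emotion toward brand or product",
    ],
})

# A row without tweet text carries nothing to model, so drop it
cleaned = sample.dropna(subset=["tweet_text"]).copy()

# Keep rows with no brand, but mark them with an assumed placeholder label
cleaned["emotion_in_tweet_is_directed_at"] = (
    cleaned["emotion_in_tweet_is_directed_at"].fillna("Unknown")
)

print(cleaned.isna().sum())
```

Filling rather than dropping the brand column keeps the tweets that express no emotion toward any brand, which make up a large part of the data.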

## 4.3 Duplicates

In [20]:
df.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
8716    False
8717    False
8718    False
8719    False
8720    False
Length: 8721, dtype: bool

In [21]:
df.duplicated().sum()

22
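A minimal sketch of removing such duplicated rows follows, assuming the first occurrence of each row should be kept. The `sample` frame is a hypothetical miniature with made-up rows; in the notebook the same call would be applied to `df`.

```python
import pandas as pd

# Hypothetical miniature of the dataset with one fully duplicated row
sample = pd.DataFrame({
    "tweet_text": ["Love the new iPad", "Love the new iPad", "Heading to the keynote"],
    "emotion_in_tweet_is_directed_at": ["iPad", "iPad", None],
    "is_there_an_emotion_directed_at_a_brand_or_product": [
        "Positive emotion",
        "Positive emotion",
        "No emotion toward brand or product",
    ],
})

# Keep the first occurrence of each row and rebuild a contiguous index
deduped = sample.drop_duplicates(keep="first").reset_index(drop=True)

print("duplicates before:", sample.duplicated().sum())
print("duplicates after:", deduped.duplicated().sum())
```

Resetting the index avoids gaps in the row labels, which keeps later positional lookups straightforward.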

## 4.4 Place Holders

## 4.5 Messy Columns

## 4.6 White Space

## 4.7 Visualizing After Cleaning

# 5. Data Preprocessing 

# 6. Data Modelling

# 7. Evaluation

# 8. Recommendations

# 9. Conclusions

# 10. Limitations