<a href="https://colab.research.google.com/github/E-Juliet/Mobile-Phone-Sentiment-Analysis/blob/main/Mobile_Phone_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Business Understanding

## 1.1 Problem Statement

Purchasing a product is an interaction between two entities: consumers and business owners. Consumers often use reviews to decide what to buy, while businesses want not only to sell their products but also to receive feedback through consumer reviews. Reviews shared online have a great impact because people tend to base decisions on the experiences and opinions of others, who often influence our beliefs, behaviors, perception of reality, and the choices we make. Hence, we ask others for feedback whenever we are deciding on something. This applies not only to consumers but also to organizations and institutions.

As social media networks have evolved, so have the ways that consumers express their opinions and feelings. With the vast amount of data now available online, it has become a challenge to extract useful information from it all. Sentiment analysis has emerged as a way to predict the polarity (positive, negative, or neutral) of consumer opinion, which can help consumers better understand the textual data.

E-commerce websites have increased in popularity to the point where consumers rely on them for buying and selling. These websites let consumers write comments about different products and services, which has resulted in a huge number of reviews becoming available. Consequently, the need to analyze these reviews to understand consumers’ feedback has increased for both vendors and consumers. However, it is difficult to read all the feedback for a particular item, especially for popular items with many comments.

In this research, we attempt to build a predictor of consumers’ satisfaction with mobile phone products based on their reviews. We will also attempt to understand the factors that contribute to classifying reviews as positive, negative or neutral (based on important or most frequent words). We expect this to help companies improve their products and to help potential buyers make better purchasing decisions.

### Main objective
- To perform a sentiment analysis of mobile phone reviews from the Amazon website to determine how these reviews help consumers have confidence that they have made the right decision about their purchases.

### Specific Objectives
- To help companies understand their consumers’ feedback to maintain their products/services or enhance them.
- To provide insights to companies in curating offers on specific products to increase their profits and customer satisfaction.
- To understand the factors that contribute to classifying reviews as positive, negative or neutral (based on important or most frequent words).
- To determine the key mobile phone features that influence smartphone purchases.
- To perform a market segmentation of consumers based on their reviews.
- To advise companies' advertising departments on which of these key features to use as selling points, and for which customer segments, in upcoming advertisements.


## 1.2 Metrics of Success

The best performing model will be selected based on:
- An accuracy score > 80%
- An F1 score > 0.85 
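
As a rough illustration of how the two thresholds above could be checked, the sketch below computes accuracy and a macro-averaged F1 score in plain Python. The averaging scheme is an assumption (this section does not say whether F1 is macro, micro, or weighted), and the labels are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)

# Both success criteria from this section must hold
meets_targets = acc > 0.80 and f1 > 0.85
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}, meets targets: {meets_targets}")
```

With imbalanced classes, accuracy alone can look good while a minority class is poorly predicted, which is one reason to track F1 alongside it.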


# 2. Data Understanding

The data used for this project is obtained from [data.world](https://data.world/promptcloud/amazon-mobile-phone-reviews) and contains more than 400 thousand reviews of unlocked mobile phones sold on [amazon.com](https://www.amazon.com/). Collection began in 2016 and the data was last updated in April 2022. The data contains 6 columns:
- product_name : Contains the name of the product
- brand_name : Contains the brand of the product
- price : Contains the price of the product
- rating : Contains the rating awarded to that product
- reviews : Contains the review of that product
- review_votes : Number of people who found the review helpful
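
The problem statement frames sentiment as positive, negative, or neutral, while the dataset only carries a numeric rating. One possible way to derive a label from the rating is sketched below. The thresholds (4 and above positive, 3 neutral, below 3 negative) and the helper name `rating_to_sentiment` are assumptions for illustration, not something the notebook defines.

```python
def rating_to_sentiment(rating):
    """Map a star rating to a sentiment label using assumed thresholds."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"


# Made-up ratings, for illustration only
sample_ratings = [5, 3, 2, 4, 1]
labels = [rating_to_sentiment(r) for r in sample_ratings]
print(labels)
```

On a pandas DataFrame the same function could be applied with `df['rating'].map(rating_to_sentiment)`.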



# 3. Loading the Data

## 3.1 Loading the Libraries

In [15]:
import pandas as pd

from matplotlib import pyplot as plt
import seaborn as sns

In [16]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 3.2 Loading the Data

In [17]:
# loading the data

df = pd.read_csv('/content/drive/Shareddrives/Alpha/Data/amazon_unlocked_mobile.csv')

## 3.3 Previewing the Data

In [18]:
# checking the shape of the data

print(f'The data has {df.shape[0]} rows and {df.shape[1]} columns')

The data has 413840 rows and 6 columns


In [19]:
# checking the data types of the data

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 413840 entries, 0 to 413839
Data columns (total 6 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   product_name  413840 non-null  object 
 1   brand_name    348669 non-null  object 
 2   price         407907 non-null  float64
 3   rating        413840 non-null  int64  
 4   reviews       413778 non-null  object 
 5   review_votes  401544 non-null  float64
dtypes: float64(2), int64(1), object(3)
memory usage: 18.9+ MB


# 4. Data Cleaning

## 4.1 Missing values


In [20]:
# Getting the sum of missing values per column

df.isnull().sum()

product_name        0
brand_name      65171
price            5933
rating              0
reviews            62
review_votes    12296
dtype: int64

In [21]:
# Percentage of missing values 

for col in ['brand_name','price','reviews','review_votes']:
  percentage = (df[col].isnull().sum()/len(df[col]))*100
  print(f"{col}:{round(percentage,2)}")



brand_name:15.75
price:1.43
reviews:0.01
review_votes:2.97


Out of the 6 columns, 4 columns have missing values.

The brand_name column has the highest percentage of missing values.

Since the dataset is large, rows with missing values can be dropped while still retaining enough data for the analysis.
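
Before dropping, it can help to check how many rows would actually be lost, since `dropna()` removes a row when any column is missing. The sketch below uses a small made-up frame (not the real data) to show a vectorised way of getting the per-column percentages and the row loss.

```python
import pandas as pd

# Toy frame standing in for the reviews data (values are made up)
toy = pd.DataFrame({
    "brand_name": ["Nokia", None, "Samsung", None],
    "price": [299.0, 59.99, float("nan"), 79.95],
    "reviews": ["Great phone!", "Ok", "excelente", "Ok"],
})

# The mean of a boolean mask is the fraction of missing entries per column
missing_pct = toy.isnull().mean().mul(100).round(2)
print(missing_pct)

# dropna() removes a row if ANY column is missing, so the share of rows
# lost can exceed the largest single-column percentage
rows_lost = len(toy) - len(toy.dropna())
print(f"{rows_lost} of {len(toy)} rows would be dropped")
```

If losing that many rows were a concern, `dropna(subset=[...])` restricted to the columns the model needs would be one alternative.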

In [22]:
# Dropping the missing values

df.dropna(inplace = True)

In [23]:
# Confirming there are no missing values 

df.isna().sum()

product_name    0
brand_name      0
price           0
rating          0
reviews         0
review_votes    0
dtype: int64

## 4.2 Duplicates

In [39]:
# Checking for duplicates

print(f"The data has {df.duplicated().sum()} duplicated rows")

The data has 53081 duplicated rows


In [40]:
# Exploring the duplicates

duplicates = df[df.duplicated(keep = 'first')]

duplicates.head(10)

| | product_name | brand_name | price | rating | reviews | review_votes |
|---|---|---|---|---|---|---|
| 41 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.0 | 5 | excelente | 0.0 |
| 60 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.0 | 5 | Excelente | 0.0 |
| 65 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.0 | 5 | excelente | 0.0 |
| 66 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.0 | 5 | Excelente | 0.0 |
| 100 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.0 | 5 | excelente | 0.0 |
| 102 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.0 | 5 | excelente | 0.0 |
| 246 | [XMAS DEAL] Jethro [SC118] Simple Unlocked Qua... | Jethro | 59.99 | 3 | Word to the wise: Check with the seller for th... | 7.0 |
| 247 | [XMAS DEAL] Jethro [SC118] Simple Unlocked Qua... | Jethro | 59.99 | 2 | The SIM card from the provider would not work ... | 2.0 |
| 248 | [XMAS DEAL] Jethro [SC118] Simple Unlocked Qua... | Jethro | 59.99 | 5 | Purchased this phone for my parents who have d... | 14.0 |
| 249 | [XMAS DEAL] Jethro [SC118] Simple Unlocked Qua... | Jethro | 59.99 | 5 | Great phone! | 0.0 |
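
The preview above lists only the later copies of each repeated row, because `duplicated(keep='first')` leaves the first occurrence unflagged. The sketch below, on a made-up frame, contrasts this with `keep=False`, which flags every member of a duplicated group.

```python
import pandas as pd

# Toy frame with one row repeated three times (values are made up)
toy = pd.DataFrame({
    "product_name": ["Phone A", "Phone A", "Phone A", "Phone B"],
    "rating": [5, 5, 5, 3],
    "reviews": ["excelente", "excelente", "excelente", "Ok"],
})

# keep='first' (the default) flags only the repeats, not the first copy
repeats = toy[toy.duplicated(keep="first")]

# keep=False flags every row that belongs to a duplicated group
all_copies = toy[toy.duplicated(keep=False)]

# drop_duplicates() keeps the first copy of each group by default
deduped = toy.drop_duplicates()

print(list(repeats.index), list(all_copies.index), len(deduped))
```

Inspecting the `keep=False` view is a quick way to confirm that flagged rows really do have an identical earlier counterpart before dropping them.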


In [41]:
# Dropping the duplicates

df.drop_duplicates(inplace = True)




In [42]:
# Confirming if there are duplicates

df.duplicated().sum()

0

In [30]:
# Rows whose review text repeats the review of an earlier row
df[df['reviews'].duplicated()]

| | product_name | brand_name | price | rating | reviews | review_votes |
|---|---|---|---|---|---|---|
| 41 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.00 | 5 | excelente | 0.0 |
| 60 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.00 | 5 | Excelente | 0.0 |
| 65 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.00 | 5 | excelente | 0.0 |
| 66 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.00 | 5 | Excelente | 0.0 |
| 100 | "Nokia Asha 302 Unlocked GSM Phone with 3.2MP ... | Nokia | 299.00 | 5 | excelente | 0.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 413835 | Samsung Convoy U640 Phone for Verizon Wireless... | Samsung | 79.95 | 5 | another great deal great price | 0.0 |
| 413836 | Samsung Convoy U640 Phone for Verizon Wireless... | Samsung | 79.95 | 3 | Ok | 0.0 |
| 413837 | Samsung Convoy U640 Phone for Verizon Wireless... | Samsung | 79.95 | 5 | Passes every drop test onto porcelain tile! | 0.0 |
| 413838 | Samsung Convoy U640 Phone for Verizon Wireless... | Samsung | 79.95 | 3 | I returned it because it did not meet my needs... | 0.0 |
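
The output above contains both "excelente" and "Excelente", which an exact comparison treats as different strings. The sketch below, on made-up reviews, shows how normalising case and surrounding whitespace changes the duplicate count. This step is not part of the notebook, and whether short repeated reviews should be removed at all is a judgment call.

```python
import pandas as pd

# Made-up reviews, for illustration only
toy = pd.DataFrame(
    {"reviews": ["excelente", "Excelente", " excelente ", "Great phone!"]}
)

# Exact matching treats differently cased or padded text as distinct
exact_repeats = int(toy["reviews"].duplicated().sum())

# Normalise case and surrounding whitespace before comparing
normalised = toy["reviews"].str.strip().str.lower()
normalised_repeats = int(normalised.duplicated().sum())

print(exact_repeats, normalised_repeats)
```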


# 5. Feature Engineering

# 6. Exploratory Data Analysis (EDA)

# 7. Implementing the Solution

## 7.1 Preprocessing

# 8. Challenging the Solution

# 9. Conclusions

# 10. Recommendations