# 💬 Sentiment Analysis Project

#### 🎯 1. Defining the Business Problem

This project begins by defining the **business problem** to be solved.  
In data science, this step is essential because it connects **business needs** with **data-driven solutions**.

Before building any model, we need to clearly understand:

- ❓ **What problem we want to solve**
- 💡 **Why this problem matters for the company**
- 📏 **How success will be measured**
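
One way to make "how success will be measured" concrete is to compare model predictions against known labels. The sketch below is only an illustration: the labels are hypothetical, and the choice of accuracy plus precision and recall on the negative class is an assumption, not a criterion stated by the project at this point.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth and predictions for eight reviews
# (1 = positive, 0 = negative). These values are made up for illustration.
y_true = [1, 0, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]

# Share of reviews classified correctly
accuracy = accuracy_score(y_true, y_pred)

# Treat "negative" (0) as the class of interest
negative_precision = precision_score(y_true, y_pred, pos_label=0)
negative_recall = recall_score(y_true, y_pred, pos_label=0)

print(f"accuracy={accuracy:.3f}")
print(f"negative precision={negative_precision:.3f}")
print(f"negative recall={negative_recall:.3f}")
```

Here one negative review is misclassified as positive, so recall on the negative class drops to 0.75 while precision stays at 1.0.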

---

#### 🏢 Business Context

An **e-commerce company** receives thousands of product reviews from customers every day.

Currently, the analysis of these reviews is done **manually**, which creates several challenges:

- ⏳ **Slow analysis process**
- 💰 **High operational cost**
- 📈 **Difficulty scaling with the growing number of reviews**

Because of this, valuable customer feedback may take too long to be identified, making it harder for the company to react quickly to problems or opportunities.

---

#### 🎯 Project Objective

The goal of this project is to build a **Machine Learning model** capable of automatically classifying customer reviews into two categories:

- 😊 **Positive**
- 😡 **Negative**

By automating this process, the company will be able to analyze customer sentiment **faster, cheaper, and at scale**.

---

#### 🚀 Expected Business Benefits

**⚡ Efficiency**  
Reduce the time and cost associated with manual review analysis.

**📊 Faster Decision-Making**  
Enable product and marketing teams to quickly identify products with issues or opportunities for improvement.

**🎯 Better Prioritization**  
Automatically route **negative reviews** to the customer support team, allowing faster responses and improving the overall customer experience.

---

#### 📌 Introduction

Customer feedback is one of the most valuable sources of information for companies.  
However, when the volume of reviews grows significantly, manual analysis becomes inefficient and difficult to scale.

In this project, we apply **Natural Language Processing (NLP)** and **Machine Learning** techniques to build a **Sentiment Analysis model** capable of automatically classifying customer reviews as **positive** or **negative**.

This approach allows companies to **extract insights from large volumes of text data**, improving decision-making, operational efficiency, and customer satisfaction.
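
To give a feel for what such a model can look like, here is a minimal sketch that assumes a TF-IDF representation feeding a logistic regression classifier. The toy reviews, the labels, and the name `sentiment_model` are hypothetical and chosen only for illustration; the project's real training data and final configuration may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews, not taken from the project's dataset
train_texts = [
    "great product, I loved it",
    "excellent quality and fast delivery",
    "very happy with this purchase",
    "terrible quality, I hated it",
    "delivery was slow and the item arrived broken",
    "waste of money, very disappointed",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# Chain text vectorization and classification into a single estimator
sentiment_model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
sentiment_model.fit(train_texts, train_labels)

# Classify two unseen (also hypothetical) reviews
predictions = sentiment_model.predict([
    "I loved the quality",
    "slow delivery, very disappointed",
])
print(predictions)
```

Wrapping both steps in a `Pipeline` keeps the vectorizer's vocabulary tied to the training data, so the same transformation is applied at prediction time.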

### 📥 Loading the Dataset and Libraries

Before we start, we need to install the necessary Python libraries and **load the dataset**.


In [1]:
# Install the required libraries
%pip install pandas numpy matplotlib seaborn scikit-learn -q 
# Install and update the watermark package to display environment and library version information
%pip install -q -U watermark

Note: you may need to restart the kernel to use updated packages.

In [2]:
# Standard library: text cleaning utilities
import re
import unicodedata

# Data manipulation and visualization
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Preprocessing and machine learning
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Model persistence
import joblib

In [3]:
# Load the watermark extension
%reload_ext watermark

# Display metadata for your notebook
%watermark -a "Maykon - 💬 Sentiment Analysis Project" -d -u -v -p numpy,pandas,matplotlib,seaborn

Author: Maykon - 💬 Sentiment Analysis Project

Last updated: 2026-02-28

Python implementation: CPython
Python version       : 3.13.7
IPython version      : 9.10.0

numpy     : 2.4.2
pandas    : 3.0.1
matplotlib: 3.10.8
seaborn   : 0.13.2



In [4]:
# Load the Sentiment Analysis dataset from a local CSV file into a pandas DataFrame
# (forward slashes work on Windows, macOS, and Linux)
data = "data/dataset.csv"

df = pd.read_csv(data)

In [6]:
# Number of rows and columns in the dataset
df.shape

(500, 3)

In [11]:
df.head()  # Display the first 5 rows of df

|  | review_id | texto_review | sentimento |
|---|---|---|---|
| 0 | 1 | Estou muito feliz com a compra. O cadeira game... | positivo |
| 1 | 2 |  | negativo |
| 2 | 3 | Não recomendo. A entrega foi lenta e o celular... | negativo |
| 3 | 4 | O monitor é decepcionante. O suporte ao client... | positivo |
| 4 | 5 | É UM LIVRO OK PELO PRÇEO QUE PAGUEI. | negativo |
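
The preview shows a row (index 1) with no review text, so a quick check for missing values and label balance seems like a sensible next step. The sketch below runs on a small stand-in DataFrame named `sample`, with hypothetical review texts, because the real `df` comes from the CSV file; the same calls should apply to `df` directly.

```python
import pandas as pd

# Stand-in for df: same column names, hypothetical review texts
sample = pd.DataFrame({
    "review_id": [1, 2, 3],
    "texto_review": ["hypothetical review A", None, "hypothetical review C"],
    "sentimento": ["positivo", "negativo", "negativo"],
})

# Count missing values in each column
missing_per_column = sample.isna().sum()

# Count how many reviews carry each sentiment label
label_counts = sample["sentimento"].value_counts()

# Select the rows whose review text is missing
empty_text_rows = sample[sample["texto_review"].isna()]

print(missing_per_column)
print(label_counts)
print(empty_text_rows)
```

Rows without text carry no signal for a text classifier, so they are usually dropped or handled explicitly before training.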


In [9]:
df.sample(15) # Random 15 rows

|  | review_id | texto_review | sentimento |
|---|---|---|---|
| 482 | 483 | Odiei o teclado. Qualidade de baixa qualidade ... | negativo |
| 129 | 130 | Ótimo custo-benefício. O monitor é fantástica ... | positivo |
| 195 | 196 | Excelente cadeira gamer, superou minhas expect... | positivo |
| 461 | 462 | Amei o teclado! A qualidade é fantástica e a e... | positivo |
| 200 | 201 | Estou muito frustrado com esta compra. Dinheir... | negativo |
| 241 | 242 | Ótimo custo-benefício. O livro é ótima e muito... | positivo |
| 66 | 67 | Amei o notebook! A qualidade é muito boa e a e... | positivo |
| 277 | 278 | A embalagem do fone de ouvido chegou um pouco ... | negativo |
| 302 | 303 | Ótimo custo-benefício. O livro é muito boa e m... | positivo |
| 253 | 254 | Odiei o cadeira gamer. Qualidade péssima e vei... | negativo |


In [12]:
df.tail()  # Display the last 5 rows of df

|  | review_id | texto_review | sentimento |
|---|---|---|---|
| 495 | 496 | Odiei o teclado. Qualidade de baixa qualidade ... | negativo |
| 496 | 497 | Estou muito impressionado com a compra. O moni... | positivo |
| 497 | 498 | Não recomendo. A entrega demorou uma eternidad... | negativo |
| 498 | 499 | Estou muito arrependido com esta compra. Dinhe... | negativo |
| 499 | 500 | Ótimo custo-benefício. O cadeira gamer é incrí... | positivo |
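
The previews contain accented characters, mixed casing, and at least one all-caps review, and `re` and `unicodedata` are imported earlier, which suggests some text normalization before vectorizing. A minimal sketch of such a step follows. The helper name `normalize_text` and the specific rules (lowercasing, accent stripping, punctuation removal) are assumptions for illustration, not the notebook's confirmed preprocessing.

```python
import re
import unicodedata


def normalize_text(text):
    """Lowercase, strip accents and punctuation, collapse whitespace.

    Hypothetical helper; returns an empty string for missing values.
    """
    if not isinstance(text, str):
        return ""
    text = text.lower()
    # Decompose accented characters, then drop the combining marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit, or whitespace with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse repeated whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("É UM LIVRO OK PELO PRÇEO QUE PAGUEI."))
print(repr(normalize_text(None)))
```

Applied to the dataset, this could look like `df["texto_review"].apply(normalize_text)`. Whether stripping accents helps or hurts is a design choice worth validating, since accents can carry meaning in Portuguese.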
