# **Tracking Customer Sentiment Trends Using Deep Learning**

# **Business Understanding**   
Customer sentiment plays a crucial role in shaping business strategies and decision-making. As businesses introduce new product lines, adjust pricing, or improve services, customer opinions evolve. Tracking sentiment over time provides valuable insights into how these changes impact customer satisfaction. By analyzing customer feedback, businesses can identify trends, enhance product offerings, and address potential issues before they escalate.  

Monitoring sentiment is essential for businesses that want to improve customer experience and stay competitive. Positive sentiment signals satisfaction, while negative sentiment highlights areas that need immediate attention. Tracking both over time helps businesses refine their products, optimize marketing strategies, and improve service delivery.  

# **Problem Statement** 
Customer sentiment is dynamic and influenced by multiple factors, including product quality, delivery efficiency, and pricing strategies. However, businesses often lack a structured approach to track sentiment changes over time. Without a robust sentiment analysis model, companies risk making uninformed decisions that could negatively impact customer retention and brand reputation. This project aims to develop a deep learning-based sentiment classification model that can accurately track customer sentiment trends and provide actionable insights.  

# **Objectives**  
The primary objectives of this project are:  

- Develop a sentiment classification model using deep learning techniques, specifically leveraging TensorFlow Hub and a Deep Neural Network (DNN) classifier.  
- Analyze sentiment trends over time to understand how business decisions affect customer satisfaction.  
- Identify key factors influencing sentiment, such as product quality, pricing, and delivery performance.  
- Provide actionable insights that can help businesses improve customer experience and optimize their operations.  

# **Metrics of Success**  
The success of this project will be evaluated using a combination of model performance metrics and business insights.  

## **Model Performance Metrics**  
- **F1-Score (≥ 78%)** – This will serve as the primary metric. Because it balances precision and recall, a model cannot score well simply by over-predicting one sentiment class.  
- **Recall (≥ 75%)** – Since capturing negative sentiment is critical for business decision-making, recall on the negative class will be tracked alongside F1 to ensure that dissatisfied customers are correctly identified.  
- **Accuracy (≥ 85%)** – Overall correctness is reported as a secondary metric. With imbalanced classes, a model can reach high accuracy by favoring the majority class, so accuracy is read together with F1 and recall.  
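
The targets above can be checked programmatically once predictions are available. The following is a minimal sketch, not the project's evaluation code: it assumes a binary setup in which label `1` marks negative sentiment, and the helper name `evaluate_against_targets` and the toy labels are hypothetical.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Targets taken from the metric list above
TARGETS = {"f1": 0.78, "recall": 0.75, "accuracy": 0.85}


def evaluate_against_targets(y_true, y_pred, negative_label=1):
    """Return {metric: (score, meets_target)}, scoring the negative class (hypothetical helper)."""
    scores = {
        "f1": f1_score(y_true, y_pred, pos_label=negative_label),
        "recall": recall_score(y_true, y_pred, pos_label=negative_label),
        "accuracy": accuracy_score(y_true, y_pred),
    }
    return {
        name: (float(value), bool(value >= TARGETS[name]))
        for name, value in scores.items()
    }


# Made-up labels for illustration only: 1 = negative sentiment, 0 = anything else
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

results = evaluate_against_targets(y_true, y_pred)
for name, (score, ok) in results.items():
    status = "met" if ok else "not met"
    print(f"{name:<8} {score:.2f}  target {TARGETS[name]:.2f}  {status}")
```

For a multi-class target, `f1_score` and `recall_score` would need an explicit `average` argument (for example `average="macro"`) instead of `pos_label`.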

## **Business Insights Metrics**  
- **Sentiment Trend Analysis (Monthly/Quarterly Shifts)** – The model should successfully detect sentiment shifts over time, allowing businesses to correlate customer feedback with operational changes.  
- **Product and Service Impact Assessment** – The model should identify products or services that generate the most positive or negative sentiment, providing insights for business improvement.  
- **Delivery and Logistics Influence** – By analyzing sentiment variations in relation to delivery performance, the project will help businesses understand how logistics affect customer satisfaction.  
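
One way to surface monthly sentiment shifts is to group reviews by calendar month and compare consecutive months. The sketch below is illustrative only: it uses the `review_creation_date` and `review_score` column names from the dataset, the rows are made up, and treating a score of 2 or lower as negative is an assumption rather than a rule taken from the project.

```python
import pandas as pd

# Made-up reviews for illustration; column names follow the dataset
reviews = pd.DataFrame(
    {
        "review_creation_date": [
            "2017-08-03", "2017-08-20", "2017-09-05", "2017-09-18", "2017-09-30",
        ],
        "review_score": [5, 1, 4, 2, 1],
    }
)
reviews["review_creation_date"] = pd.to_datetime(reviews["review_creation_date"])

# Assumption: a score of 2 or lower counts as negative sentiment
reviews["is_negative"] = reviews["review_score"] <= 2

# Aggregate per calendar month
month = reviews["review_creation_date"].dt.to_period("M")
monthly = reviews.groupby(month).agg(
    n_reviews=("review_score", "size"),
    mean_score=("review_score", "mean"),
    negative_share=("is_negative", "mean"),
)

# Month-over-month change in the share of negative reviews
monthly["negative_share_change"] = monthly["negative_share"].diff()
print(monthly)
```

Swapping `"M"` for `"Q"` in `to_period` gives the quarterly view.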


In [1]:
import os
import zipfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix





In [2]:
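# Location of the zipped dataset and the directory to extract it into
zip_path = "data/olist_classified_public_dataset.csv.zip"
extract_path = "data"

# Extract every file in the archive into extract_path
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_path)

print("Data extracted successfully!")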

Data extracted successfully!


In [4]:
data_path = os.path.join(extract_path, "olist_classified_public_dataset.csv") 
data = pd.read_csv(data_path)
print(data.head())


   Unnamed: 0  id order_status  order_products_value  order_freight_value  \
0           0   1    delivered                 89.99                14.38   
1           1   2    delivered                 69.00                15.23   
2           2   3    delivered                 99.80                15.86   
3           3   4    delivered                 87.00                12.74   
4           4   5    delivered                 99.90                17.95   

   order_items_qty  order_sellers_qty    order_purchase_timestamp  \
0                1                  1  2017-08-30 11:41:01.000000   
1                1                  1  2017-09-26 09:13:36.000000   
2                2                  4  2018-01-15 15:50:42.000000   
3                1                  1  2018-02-04 11:16:42.000000   
4                1                  2  2017-12-07 11:58:42.000000   

             order_aproved_at order_estimated_delivery_date  ...  \
0  2017-08-30 11:55:08.970352    2017-09-21 00:00:00.0

In [5]:

print(f"Dataset contains {data.shape[0]} rows and {data.shape[1]} columns.")

Dataset contains 3584 rows and 34 columns.


In [6]:
print("Columns in dataset:", data.columns)

Columns in dataset: Index(['Unnamed: 0', 'id', 'order_status', 'order_products_value',
       'order_freight_value', 'order_items_qty', 'order_sellers_qty',
       'order_purchase_timestamp', 'order_aproved_at',
       'order_estimated_delivery_date', 'order_delivered_customer_date',
       'customer_city', 'customer_state', 'customer_zip_code_prefix',
       'product_category_name', 'product_name_lenght',
       'product_description_lenght', 'product_photos_qty', 'review_score',
       'review_comment_title', 'review_comment_message',
       'review_creation_date', 'review_answer_timestamp',
       'votes_before_estimate', 'votes_delayed', 'votes_low_quality',
       'votes_return', 'votes_not_as_anounced', 'votes_partial_delivery',
       'votes_other_delivery', 'votes_other_order', 'votes_satisfied',
       'most_voted_subclass', 'most_voted_class'],
      dtype='object')
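
The `Unnamed: 0` column is typically a row index that was written into the CSV, and the timestamp columns load as plain strings by default. A possible way to handle both at load time is sketched below. It reads a tiny in-memory CSV rather than the real file, and the `review_creation_date` values in it are made up for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV; the review dates are made up
csv_text = """,id,order_purchase_timestamp,review_creation_date
0,1,2017-08-30 11:41:01.000000,2017-09-01 00:00:00.000000
1,2,2017-09-26 09:13:36.000000,2017-09-29 00:00:00.000000
"""

DATE_COLUMNS = ["order_purchase_timestamp", "review_creation_date"]

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,               # use the unnamed first column as the index
    parse_dates=DATE_COLUMNS,  # convert these columns to datetime64
)

print(sample.dtypes)
```

With the real file, the same `index_col` and `parse_dates` arguments could be passed to the existing `pd.read_csv(data_path)` call. This would reduce the column count by one.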


In [8]:
# Count missing values per column and keep only the columns that have any
missing_values = data.isnull().sum()
missing_values = missing_values[missing_values > 0]
print(missing_values)


order_delivered_customer_date     117
review_comment_title             3584
most_voted_subclass               171
most_voted_class                  171
dtype: int64
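
Raw counts are easier to act on as percentages, and a column that is missing in every row (as `review_comment_title` is here, with 3584 of 3584 values missing) carries no information. The sketch below shows one way to summarize and drop such columns. The helper names `summarize_missing` and `drop_empty_columns` are hypothetical, and the demo frame is toy data standing in for `data`.

```python
import numpy as np
import pandas as pd


def summarize_missing(df):
    """Return missing counts and percentages for columns with at least one gap."""
    counts = df.isnull().sum()
    summary = pd.DataFrame(
        {"missing": counts, "percent": (counts / len(df) * 100).round(2)}
    )
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


def drop_empty_columns(df):
    """Drop columns in which every value is missing."""
    return df.dropna(axis="columns", how="all")


# Toy frame for illustration; the class labels are placeholders
demo = pd.DataFrame(
    {
        "review_score": [5, 4, 1, 3],
        "review_comment_title": [np.nan] * 4,
        "most_voted_class": ["class_a", None, "class_b", "class_a"],
    }
)

summary = summarize_missing(demo)
cleaned = drop_empty_columns(demo)
print(summary)
print(cleaned.columns.tolist())
```

Applied to the notebook's DataFrame, `summarize_missing(data)` and `drop_empty_columns(data)` would be the corresponding calls. How to treat the partially missing columns (`order_delivered_customer_date`, `most_voted_subclass`, `most_voted_class`) is a separate modeling decision.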
