## Simulate Customer Feedback Data

#### Create a synthetic feedback dataset with the following columns:

- CustomerID: Matches the `ID` column of the main dataset, linking feedback to customer records.
- Feedback: Text feedback from customers, including a mix of positive, neutral, and negative comments.
- Rating: Numeric rating for each feedback entry (1-5), representing overall satisfaction.

In [7]:
import pandas as pd
import numpy as np

# Load the main dataset to get a list of CustomerIDs
data_path = '../data/processed/segmented-customer-data.csv'
customer_data = pd.read_csv(data_path)

# Sample 100 CustomerIDs with replacement, so a customer may appear more than once
np.random.seed(42)  # For reproducibility
sample_customer_ids = np.random.choice(customer_data['ID'], 100, replace=True)

# Generate simulated feedback
feedback_texts = [
    "I love the quality of the products, but the delivery could be faster.",
    "Customer service was very helpful with my issue.",
    "The product was damaged on arrival, very disappointed.",
    "I’m satisfied with my purchase, great value for money.",
    "Had trouble navigating the website, but the products are worth it.",
    "Fantastic experience, will definitely recommend!",
    "Not satisfied, I expected better quality.",
    "The delivery was quick, and the product quality is top-notch.",
    "Had a bad experience with customer support.",
    "The new loyalty program is amazing, thank you!",
]

# Randomly assign feedback and ratings
feedback_data = pd.DataFrame({
    'CustomerID': sample_customer_ids,
    'Feedback': np.random.choice(feedback_texts, 100),
    'Rating': np.random.randint(1, 6, size=100)  # Ratings from 1 to 5
})

# Save the simulated feedback data to a CSV file
feedback_data.to_csv('../data/processed/feedback-segmented-customer-data.csv', index=False)

# Display the first few rows
print(feedback_data.head())


   CustomerID                                           Feedback  Rating
0        4297     The new loyalty program is amazing, thank you!       1
1        2811     The new loyalty program is amazing, thank you!       3
2        3412  The product was damaged on arrival, very disap...       4
3        9964          Not satisfied, I expected better quality.       1
4       10785  I love the quality of the products, but the de...       1
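
Because the IDs above are sampled with `replace=True`, the same customer can receive several feedback rows. The sketch below compares sampling with and without replacement. It uses a hypothetical range of IDs (`customer_ids`) in place of the real `customer_data['ID']` column, and NumPy's `default_rng` rather than the global seed used above, so the sampled values will not match the output shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data['ID']; the real IDs come from the CSV.
customer_ids = np.arange(1000, 1500)

rng = np.random.default_rng(42)

# With replacement: the same customer can be drawn more than once.
with_repl = rng.choice(customer_ids, size=100, replace=True)

# Without replacement: every sampled ID is unique.
without_repl = rng.choice(customer_ids, size=100, replace=False)

# Count rows whose ID already appeared earlier in the sample.
n_dupes_with = int(pd.Series(with_repl).duplicated().sum())
n_dupes_without = int(pd.Series(without_repl).duplicated().sum())

print("Duplicate IDs (replace=True): ", n_dupes_with)
print("Duplicate IDs (replace=False):", n_dupes_without)
```

Which option fits depends on whether the simulation should allow repeat feedback from one customer; `replace=False` requires at least as many customers as samples.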


## Perform Sentiment Analysis on Feedback Data

With the simulated feedback data in place, the next step is sentiment analysis, classifying each comment as positive, neutral, or negative. This uses TextBlob, a simple library that reports sentiment polarity on a scale from -1 (very negative) to +1 (very positive).

In [9]:
from textblob import TextBlob

# Load the simulated feedback data
feedback_data = pd.read_csv('../data/processed/feedback-segmented-customer-data.csv')

# Perform sentiment analysis using TextBlob
feedback_data['Sentiment'] = feedback_data['Feedback'].apply(lambda x: TextBlob(x).sentiment.polarity)

# Bin polarity into labels; include_lowest=True keeps a score of exactly -1 in 'Negative'
feedback_data['Sentiment_Label'] = pd.cut(
    feedback_data['Sentiment'], bins=[-1, -0.1, 0.1, 1], labels=['Negative', 'Neutral', 'Positive'],
    include_lowest=True
)

# Display the updated feedback data with sentiment scores and labels
print(feedback_data.head())

# Save the feedback data with sentiment labels for further analysis
feedback_data.to_csv('../data/processed/feedback-with-sentiment.csv', index=False)


   CustomerID                                           Feedback  Rating  \
0        4297     The new loyalty program is amazing, thank you!       1   
1        2811     The new loyalty program is amazing, thank you!       3   
2        3412  The product was damaged on arrival, very disap...       4   
3        9964          Not satisfied, I expected better quality.       1   
4       10785  I love the quality of the products, but the de...       1   

   Sentiment Sentiment_Label  
0   0.443182        Positive  
1   0.443182        Positive  
2  -0.975000        Negative  
3   0.050000         Neutral  
4   0.500000        Positive  
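
One detail of `pd.cut` is worth checking when binning polarity: intervals are open on the left by default, so a score of exactly -1 falls outside the first bin and becomes `NaN` unless `include_lowest=True` is passed. The sketch below illustrates this with hypothetical polarity values (not taken from the feedback data) and the same bin edges as above.

```python
import pandas as pd

# Hypothetical polarity scores covering both extremes and the bin edges.
polarity = pd.Series([-1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0])

bins = [-1, -0.1, 0.1, 1]
labels = ['Negative', 'Neutral', 'Positive']

# Default behaviour: the left edge of the first bin is excluded.
default_labels = pd.cut(polarity, bins=bins, labels=labels)

# include_lowest=True closes the first bin on the left.
inclusive_labels = pd.cut(polarity, bins=bins, labels=labels, include_lowest=True)

comparison = pd.DataFrame({
    'polarity': polarity,
    'default': default_labels,
    'include_lowest': inclusive_labels,
})
print(comparison)
```

Only the first row differs between the two label columns: -1.0 is unlabeled by default and `Negative` with `include_lowest=True`. Values on an interior edge (-0.1 and 0.1) go to the lower bin because intervals are closed on the right.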


## Analyze Sentiment Distribution and Key Insights

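To understand overall customer sentiment, this step examines the distribution of sentiment labels and computes the average rating within each sentiment group. Because the simulated ratings were drawn at random, independently of the feedback text, the group averages are not expected to differ meaningfully.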

In [10]:
# Analyze sentiment distribution
sentiment_counts = feedback_data['Sentiment_Label'].value_counts()
print("Sentiment Distribution:\n", sentiment_counts)

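# Calculate average rating by sentiment group
# (observed=False keeps every label and avoids the pandas FutureWarning for categorical keys)
avg_rating_by_sentiment = feedback_data.groupby('Sentiment_Label', observed=False)['Rating'].mean()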
print("Average Rating by Sentiment:\n", avg_rating_by_sentiment)

Sentiment Distribution:
 Sentiment_Label
Positive    61
Neutral     20
Negative    19
Name: count, dtype: int64
Average Rating by Sentiment:
 Sentiment_Label
Negative    3.000000
Neutral     2.850000
Positive    2.836066
Name: Rating, dtype: float64
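
The near-identical group means are consistent with ratings that were generated independently of the feedback text. A minimal sketch of how this could be checked more directly is shown below, reporting group sizes and spread alongside the mean and the Pearson correlation between polarity and rating. The `demo` frame is a hypothetical stand-in with randomly generated scores, not the actual `feedback_data`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for feedback_data: polarity and rating drawn independently.
demo = pd.DataFrame({
    'Sentiment': rng.uniform(-1, 1, size=200),
    'Rating': rng.integers(1, 6, size=200),  # integers from 1 to 5
})
demo['Sentiment_Label'] = pd.cut(
    demo['Sentiment'], bins=[-1, -0.1, 0.1, 1],
    labels=['Negative', 'Neutral', 'Positive'], include_lowest=True
)

# Per-group count, mean, and standard deviation of the rating.
summary = demo.groupby('Sentiment_Label', observed=True)['Rating'].agg(['count', 'mean', 'std'])
print(summary)

# Pearson correlation between polarity and rating; values near 0 suggest no linear relationship.
corr = demo['Sentiment'].corr(demo['Rating'])
print("Correlation between Sentiment and Rating:", round(corr, 3))
```

With real feedback, where ratings and text come from the same customer, a clearly positive correlation would be the more typical expectation.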


