Task 4 :- Satisfaction Analysis

In [3]:
# Based on the engagement analysis + the experience analysis, analyze customer satisfaction

# Import necessary packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the user engagement and experience datasets

user_engagement_df = pd.read_excel('user_engagement.xlsx')
user_experience_df = pd.read_excel('Experience_Analytics.xlsx')

# Display the first few rows of each DataFrame to understand their structure
user_engagement_df.head(), user_experience_df.head()


(   Unnamed: 0  MSISDN/Number  Average_Session_Duration_ms  Total_Upload_Bytes  \
 0           0   3.360171e+10                      38503.0          46211970.0   
 1           1   3.360171e+10                      52478.0          38509721.0   
 2           2   3.360171e+10                      60149.0          89299844.0   
 3           3   3.360171e+10                     176022.0          44946263.0   
 4           4   3.360172e+10                     127918.0          28593661.0   
 
    Total_Download_Bytes  Total_Traffic_Bytes  session_frequency  Cluster  
 0          2.934050e+08         3.396170e+08                  1        1  
 1          8.621012e+08         9.006109e+08                  1        1  
 2          1.498037e+09         1.587337e+09                  2        1  
 3          1.333844e+08         1.783307e+08                  1        0  
 4          3.060978e+08         3.346914e+08                  1        0  ,
               Bearer Id               Start  Sta

In [4]:
# a. Engagement score to each user. Consider the engagement score as the Euclidean distance between the user data point & the less engaged cluster (use the first clustering for this) (Euclidean Distance)
# b. Experience score for each user. Consider the experience score as the Euclidean distance between the user data point & the worst experience cluster. 

# Import the necessary package
from scipy.spatial.distance import euclidean

# The cluster number for "less engaged" and "worst experience"
# For demonstration, let's assume clusters as follows
less_engaged_cluster_num = 0  # Hypothetical value
worst_experience_cluster_num = 0  # Hypothetical value

# Calculate centroids of the "less engaged" and "worst experience" clusters
# Here, we'll approximate them using mean values.
less_engaged_centroid = user_engagement_df[user_engagement_df['Cluster'] == less_engaged_cluster_num].iloc[:, 2:-1].mean().values
worst_experience_centroid = user_experience_df[user_experience_df['Cluster'] == worst_experience_cluster_num].iloc[:, -5:-1].mean().values

# Define functions to calculate Euclidean distances for engagement and experience
def calculate_engagement_score(row):
    user_data = row.values[2:-1]  # Exclude user ID, Cluster, and Unnamed columns
    return euclidean(user_data, less_engaged_centroid)

def calculate_experience_score(row):
    user_data = row.values[-5:-1]  # Consider only experience metrics
    return euclidean(user_data, worst_experience_centroid)

# Apply the functions to calculate scores
user_engagement_df['Engagement_Score'] = user_engagement_df.apply(calculate_engagement_score, axis=1)
user_experience_df['Experience_Score'] = user_experience_df.apply(calculate_experience_score, axis=1)

# Display the first few rows of each DataFrame with scores
user_engagement_df[['MSISDN/Number', 'Engagement_Score']].head(), user_experience_df[['MSISDN/Number', 'Experience_Score']].head()


(   MSISDN/Number  Engagement_Score
 0   3.360171e+10      4.768325e+08
 1   3.360171e+10      3.228789e+08
 2   3.360171e+10      1.258052e+09
 3   3.360171e+10      7.040077e+08
 4   3.360172e+10      4.724021e+08,
    MSISDN/Number  Experience_Score
 0   3.366496e+10      1.457319e+08
 1   3.368185e+10      1.987803e+08
 2   3.376063e+10      1.748035e+08
 3   3.375034e+10      3.914225e+08
 4   3.369980e+10      1.145361e+08)

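The engagement and experience scores have been calculated for each user:

Engagement Score: The Euclidean distance between a user's engagement metrics and the centroid of the "less engaged" cluster. Lower scores mean the user sits closer to that cluster; higher scores mean the user is further from it.

Experience Score: The Euclidean distance between a user's experience metrics and the centroid of the "worst experience" cluster. Lower scores mean the user sits closer to that cluster; higher scores mean the user is further from it.

Both scores are computed on raw, unscaled columns. In the engagement data the byte-count columns are several orders of magnitude larger than session duration and session frequency, so they dominate the engagement score. The cluster numbers used for "less engaged" and "worst experience" are the hypothetical values set in the code above.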
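
Euclidean distance is sensitive to column scale, so one common option is to standardize the features before measuring the distance to a centroid. The sketch below illustrates this on made-up numbers; it is not the notebook's method, and `distance_to_centroid` and the toy arrays are hypothetical names introduced here.

```python
import numpy as np

def distance_to_centroid(features, labels, target_cluster, standardize=True):
    """Euclidean distance from every row to the mean of one cluster."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    if standardize:
        # z-score each column so large-valued columns do not dominate
        std = features.std(axis=0)
        std[std == 0] = 1.0  # guard against constant columns
        features = (features - features.mean(axis=0)) / std
    centroid = features[labels == target_cluster].mean(axis=0)
    return np.linalg.norm(features - centroid, axis=1)

# Toy data (hypothetical): columns are session duration in ms and total bytes
toy_features = np.array([
    [1_000.0, 1e8],
    [2_000.0, 1e8],
    [50_000.0, 2e8],
    [60_000.0, 9e8],
])
toy_labels = np.array([0, 0, 1, 1])

raw_scores = distance_to_centroid(toy_features, toy_labels, 0, standardize=False)
scaled_scores = distance_to_centroid(toy_features, toy_labels, 0, standardize=True)
print(raw_scores)
print(scaled_scores)
```

With `standardize=False` the byte column decides the distance almost alone; with `standardize=True` both columns contribute on a comparable scale.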

In [5]:
# Consider the average of both engagement & experience scores as the satisfaction score & report the top 10 satisfied customers

# Merge the engagement and experience dataframes on 'MSISDN/Number'
merged_df = pd.merge(user_engagement_df[['MSISDN/Number', 'Engagement_Score']],
                     user_experience_df[['MSISDN/Number', 'Experience_Score']],
                     on='MSISDN/Number')

# Calculate the satisfaction score as the average of engagement and experience scores
merged_df['Satisfaction_Score'] = (merged_df['Engagement_Score'] + merged_df['Experience_Score']) / 2

# Identify the top 10 satisfied customers based on the satisfaction score
top_10_satisfied_customers = merged_df.nlargest(10, 'Satisfaction_Score')

top_10_satisfied_customers


index,MSISDN/Number,Engagement_Score,Experience_Score,Satisfaction_Score
8149,33614890000.0,11125690000.0,428525400.0,5777106000.0
8139,33614890000.0,11125690000.0,407905300.0,5766796000.0
8137,33614890000.0,11125690000.0,390487600.0,5758087000.0
8153,33614890000.0,11125690000.0,374466300.0,5750077000.0
8151,33614890000.0,11125690000.0,333728500.0,5729708000.0
8147,33614890000.0,11125690000.0,286353600.0,5706020000.0
8152,33614890000.0,11125690000.0,264396600.0,5695042000.0
8138,33614890000.0,11125690000.0,246977800.0,5686332000.0
8142,33614890000.0,11125690000.0,185371800.0,5655529000.0
8140,33614890000.0,11125690000.0,182132900.0,5653910000.0


The Satisfaction_Score column is the arithmetic mean of the engagement and experience scores, so a higher value means the user is further, on average, from both the "less engaged" and the "worst experience" centroids. Note that all ten rows above display the same MSISDN/Number and the same Engagement_Score; only Experience_Score varies. This suggests the experience data holds several rows for that user, so the merge repeated the user's engagement score once per experience row, and the list most likely reflects one highly engaged user rather than ten distinct customers. In these rows the engagement score is also more than an order of magnitude larger than the experience score, so it largely determines the ranking.
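
In the top-10 table above every row shows the same MSISDN/Number, which is what a merge produces when one table has several rows per user. One way to avoid this is to aggregate to one row per user before merging. The sketch below uses made-up data; the toy frames and the choice of the mean as the aggregate are assumptions, not the notebook's method.

```python
import pandas as pd

# Toy data (hypothetical): one engagement row per user, several experience rows
engagement = pd.DataFrame({
    "MSISDN/Number": [1, 2, 3],
    "Engagement_Score": [10.0, 4.0, 6.0],
})
experience = pd.DataFrame({
    "MSISDN/Number": [1, 1, 1, 2, 3],
    "Experience_Score": [1.0, 2.0, 3.0, 8.0, 5.0],
})

# Collapse the experience table to one row per user before merging
experience_per_user = (
    experience.groupby("MSISDN/Number", as_index=False)["Experience_Score"].mean()
)

# validate="one_to_one" raises if either side still has duplicate keys
merged = engagement.merge(experience_per_user, on="MSISDN/Number", validate="one_to_one")
merged["Satisfaction_Score"] = merged[["Engagement_Score", "Experience_Score"]].mean(axis=1)

top_users = merged.nlargest(2, "Satisfaction_Score")
print(top_users)
```

Passing `validate="one_to_one"` to `merge` makes the duplicate-key case fail loudly instead of silently multiplying rows.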

In [6]:
# Build a regression model to predict the satisfaction score of a customer.
# We will use the Random Forest Regression 

# Import the necessary packages

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Prepare the features and target variable
# Create the features dataframe by merging engagement and experience dataframes
features = pd.merge(user_engagement_df[['MSISDN/Number', 'Average_Session_Duration_ms', 'Total_Upload_Bytes', 'Total_Download_Bytes', 'Total_Traffic_Bytes', 'session_frequency']],
                    user_experience_df[['MSISDN/Number', 'Total UL (Bytes)', 'Total DL (Bytes)', 'Total TCP Retransmission (Bytes)', 'Avg RTT (ms)', 'Avg Throughput (kbps)']],
                    on='MSISDN/Number', how='inner')

# Include the Satisfaction Score in the features dataframe
features = features.join(merged_df.set_index('MSISDN/Number')['Satisfaction_Score'], on='MSISDN/Number')

# Drop rows with any missing values to simplify
features.dropna(inplace=True)

# Define X (features) and y (target variable)
X = features.drop(['MSISDN/Number', 'Satisfaction_Score'], axis=1)
y = features['Satisfaction_Score']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mse, mae, r2


(3297313502402214.0, 40041462.774709255, 0.9930887063968022)

The regression model has been trained and evaluated, yielding the following performance metrics on the test set:

Mean Squared Error (MSE): 3297313502402214.0

Mean Absolute Error (MAE): 40041462.774709255

R-squared (R²): 0.993089

An R² of about 0.993 means the RandomForestRegressor explains nearly all of the variance in the satisfaction score on the test set. This is expected rather than surprising: the engagement columns used as features are the same columns the engagement score was computed from, so the model is largely learning that calculation back. The MSE and MAE look large in absolute terms only because the score itself is on the order of 1e8 to 1e9.
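
To read these metrics on the scale of the target, it can help to convert MSE to RMSE and to recall how R² is defined. The sketch below computes the metrics by hand with NumPy on made-up values; `regression_report` is a hypothetical helper, and the formulas are the standard definitions rather than anything specific to this notebook.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R² computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    mae = np.mean(np.abs(residuals))
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": float(np.sqrt(mse)), "mae": float(mae), "r2": float(r2)}

# Toy values (hypothetical)
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(report)

# RMSE implied by the MSE reported above, in the same units as the score
reported_rmse = np.sqrt(3297313502402214.0)
print(reported_rmse)
```

RMSE is in the same units as the satisfaction score, which makes it easier to compare against typical score values than MSE.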

In [7]:
# Run a k-means (k=2) on the engagement & the experience score. 

from sklearn.cluster import KMeans

# Prepare the data: Extract engagement and experience scores
clustering_data = merged_df[['Engagement_Score', 'Experience_Score']].copy()

# Run K-means clustering
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(clustering_data)

# Assign cluster labels to the data
clustering_data['Cluster'] = kmeans.labels_

# Analyze the clusters
cluster_centroids = kmeans.cluster_centers_
cluster_0_count = (clustering_data['Cluster'] == 0).sum()
cluster_1_count = (clustering_data['Cluster'] == 1).sum()

cluster_centroids, cluster_0_count, cluster_1_count


(array([[4.60963185e+08, 2.11703826e+08],
        [2.51661662e+09, 2.16018329e+08]]),
 112451,
 12808)

Cluster Centroids
The centroids for the two clusters are as follows, representing the average engagement and experience scores within each cluster:

Cluster 0:
Engagement Score: 460,963,185.00, Experience Score: 211,703,826.00

Cluster 1:
Engagement Score: 2,516,616,620.00, Experience Score: 216,018,329.00
 
Cluster Distribution

Cluster 0 Count: 112451 users

Cluster 1 Count: 12808 users

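Cluster 0 contains about 90% of the users (112,451 of 125,259) and has the lower average engagement score. The two clusters differ almost entirely on engagement: the Cluster 1 centroid's engagement score is roughly 5.5 times that of Cluster 0, while the experience scores differ by only about 2%. Cluster 1 therefore represents the more engaged users, and the experience score contributes little to the split.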
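
A quick way to see which score drives the split is to compare the two centroids column by column. The sketch below uses the centroid and count values printed in the k-means output above; the variable names are introduced here for illustration.

```python
import numpy as np

# Centroids copied from the k-means output: [engagement, experience]
centroids = np.array([
    [4.60963185e+08, 2.11703826e+08],
    [2.51661662e+09, 2.16018329e+08],
])
counts = np.array([112451, 12808])

# Ratio of Cluster 1 to Cluster 0 for each score
ratio = centroids[1] / centroids[0]
# Share of users in each cluster
share = counts / counts.sum()

print("engagement ratio:", ratio[0])
print("experience ratio:", ratio[1])
print("cluster shares:", share)
```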

In [8]:
# Aggregate the average engagement, experience, and satisfaction scores per cluster
cluster_averages = clustering_data.groupby('Cluster').mean()

# Rename columns for clarity
cluster_averages.rename(columns={'Engagement_Score': 'Average_Engagement_Score', 'Experience_Score': 'Average_Experience_Score'}, inplace=True)

# Add average satisfaction score
cluster_averages['Average_Satisfaction_Score'] = (cluster_averages['Average_Engagement_Score'] + cluster_averages['Average_Experience_Score']) / 2

cluster_averages


Cluster,Average_Engagement_Score,Average_Experience_Score,Average_Satisfaction_Score
0,460761900.0,211701100.0,336231500.0
1,2514853000.0,216034400.0,1365444000.0


Interpretation :-

Cluster 0

Average Engagement Score: 4.607619e+08
 
Average Experience Score: 2.117011e+08
 
Average Satisfaction Score: 3.362315e+08
 
Cluster 1

Average Engagement Score: 2.514853e+09
 
Average Experience Score: 2.160344e+08	
 
Average Satisfaction Score: 1.365444e+09
 
These results show a large gap between the two clusters in engagement and, as a consequence, in satisfaction. Cluster 1's average engagement score (about 2.51e9) is roughly 5.5 times that of Cluster 0 (about 4.61e8), and its average satisfaction score (about 1.37e9) is roughly 4 times higher. The average experience scores, by contrast, are nearly identical (2.16e8 versus 2.12e8). Users in Cluster 1 are therefore clearly more engaged, and their higher satisfaction score is driven almost entirely by engagement rather than by a better experience.
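
When one score is much larger than the other, a plain average is effectively a ranking by the larger score. One possible remedy is to rescale both scores to a common range before averaging. The sketch below uses min-max scaling on made-up values; `min_max_scale` and the toy arrays are hypothetical, and min-max is only one of several reasonable scaling choices.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # all values identical
    return (values - values.min()) / span

# Toy scores (hypothetical): engagement is far larger than experience
engagement_scores = np.array([4.0e8, 5.0e8, 2.5e9, 2.6e9])
experience_scores = np.array([1.2e8, 4.0e8, 1.0e8, 3.0e8])

raw_satisfaction = (engagement_scores + experience_scores) / 2
scaled_satisfaction = (
    min_max_scale(engagement_scores) + min_max_scale(experience_scores)
) / 2

# User indices ordered from most to least satisfied under each definition
raw_order = np.argsort(-raw_satisfaction)
scaled_order = np.argsort(-scaled_satisfaction)
print(raw_order)
print(scaled_order)
```

In this toy example the raw average ranks users purely by engagement, while the scaled average lets a strong experience score move a user up the ranking.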