# User Behavior Classification and Battery Optimization

This project analyzes mobile user behavior with machine learning. K-Means clustering segments users, a Random Forest classifier predicts each user's behavior class, and a rule-based step turns battery drain levels into personalized optimization recommendations.

## Goals
- Segment users based on app usage, screen time, and battery drain.
- Build a classifier to predict user behavior classes.
- Provide practical battery optimization tips based on the data.

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv("user_behavior_dataset.csv")

In [None]:
df.head()

In [None]:
df.isnull().sum()

## Clustering Analysis

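In this section, we use K-Means clustering to group users into distinct segments based on their mobile usage patterns. The features considered are app usage time, screen-on time, battery drain, data usage, and age, all standardized so that no single unit dominates the distance calculation. The elbow method helps choose the number of clusters: we plot the within-cluster sum of squares (WCSS) against the number of clusters and look for the point where adding another cluster stops reducing WCSS substantially.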
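
To make the WCSS quantity concrete, the sketch below computes it by hand on synthetic data and compares it with the `inertia_` attribute reported by scikit-learn's `KMeans`. The blob data and the names `X_demo` and `wcss_manual` are illustrative assumptions and are not taken from the dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2-D blobs (not the real dataset)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# WCSS: squared distance from every point to the center of its assigned cluster
assigned_centers = km.cluster_centers_[km.labels_]
wcss_manual = float(((X_demo - assigned_centers) ** 2).sum())

print(f"manual WCSS: {wcss_manual:.3f}, KMeans inertia_: {km.inertia_:.3f}")
```

The two numbers should agree up to floating-point error, which is why the elbow cell can read `inertia_` directly instead of recomputing distances.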

In [None]:
# Selecting features relevant for clustering
features = ['App Usage Time (min/day)', 'Screen On Time (hours/day)', 
            'Battery Drain (mAh/day)', 'Data Usage (MB/day)', 'Age']

# Count missing values per clustering feature
missing_data = df[features].isnull().sum()

# Fill any missing values with the column mean (the median is a common alternative)
data_cleaned = df[features].fillna(df[features].mean())

# Standardize the data so each feature has zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_cleaned)

# Display the cleaned (not yet scaled) data
data_cleaned.head()
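
As a quick sanity check on the scaling step, the following sketch standardizes a small hypothetical table and verifies that each column ends up with mean 0 and standard deviation 1. The four rows of values are made up for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical mini-dataset with features on very different scales
demo = pd.DataFrame({
    "Screen On Time (hours/day)": [1.5, 4.0, 6.5, 9.0],
    "Battery Drain (mAh/day)": [400.0, 1200.0, 2100.0, 2900.0],
})

scaled = StandardScaler().fit_transform(demo)

# After scaling, every column has mean ~0 and (population) standard deviation ~1
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print("means:", col_means, "stds:", col_stds)
```

Without this step, a feature measured in mAh would dominate the Euclidean distances that K-Means relies on simply because its numbers are larger.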

In [None]:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Elbow method to find optimal number of clusters
wcss = []  # Within-cluster sum of squares

for i in range(1, 10):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=500, n_init=10, random_state=0)
    kmeans.fit(data_scaled)
    wcss.append(kmeans.inertia_)

# Plotting the elbow curve
plt.plot(range(1, 10), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
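
The elbow can be hard to read by eye, so a second criterion is often useful. The sketch below uses the silhouette score as one possible complement; it runs on synthetic blobs rather than the real data, and `silhouette_by_k` and `best_k` are names introduced here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated 2-D blobs (not the real dataset)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

silhouette_by_k = {}
for k in range(2, 7):  # the silhouette score is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)
    silhouette_by_k[k] = silhouette_score(X_demo, labels)

# Higher is better: pick the k with the largest average silhouette
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(silhouette_by_k)
print("best k by silhouette:", best_k)
```

On the real data the same loop could be run over `data_scaled`; whether it agrees with the elbow plot is something to check rather than assume.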

## Classification Model

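We build a Random Forest classifier to predict the user behavior class from app usage, screen time, battery drain, data usage, and age. The data is split into training and test sets to evaluate model performance. The classifier reaches 100% accuracy on this dataset; a perfect score usually means the classes are almost fully determined by the input features, so it should be read as a property of this dataset rather than as evidence that the model generalizes to other users. The cells immediately below first apply the final K-Means model and profile the resulting clusters; the classifier itself is trained further down.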

In [None]:
# Applying KMeans with the chosen number of clusters
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
clusters = kmeans.fit_predict(data_scaled)

# Add the cluster labels to the original dataframe
df['Cluster'] = clusters

# Display the first few rows with clusters assigned
df[['User ID', 'Cluster']].head()

In [None]:
import seaborn as sns

# Visualizing the clusters using a scatter plot for two features
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['App Usage Time (min/day)'], y=df['Screen On Time (hours/day)'], 
                hue=df['Cluster'], palette='viridis')
plt.title('User Segmentation Based on App Usage and Screen Time')
plt.xlabel('App Usage Time (min/day)')
plt.ylabel('Screen On Time (hours/day)')
plt.show()

In [None]:
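# Selecting numeric columns for aggregation, excluding the grouping key and the ID column
# (averaging an identifier is meaningless, and 'Cluster' is already the group index)
numeric_columns = df.select_dtypes(include='number').columns.drop(['Cluster', 'User ID'], errors='ignore')

# Grouping by 'Cluster' and calculating mean values for the numeric features
cluster_profiles = df.groupby('Cluster')[numeric_columns].mean()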

# Display the cluster profiles to analyze the behaviors of each group
cluster_profiles
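
Another way to read the segments is to map the K-Means centers back from standardized space into the original units. The sketch below shows this with `StandardScaler.inverse_transform` on synthetic data; the two-column table and all variable names are hypothetical stand-ins for the notebook's `scaler`, `kmeans`, and feature columns.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three user segments (not the real dataset)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Screen On Time (hours/day)": np.concatenate([rng.normal(m, 0.3, 40) for m in (2, 5, 9)]),
    "Battery Drain (mAh/day)": np.concatenate([rng.normal(m, 60, 40) for m in (500, 1500, 2600)]),
})

scaler_demo = StandardScaler()
scaled = scaler_demo.fit_transform(demo)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

# Centers live in standardized space; map them back to hours/day and mAh/day
centers_original = pd.DataFrame(
    scaler_demo.inverse_transform(km.cluster_centers_), columns=demo.columns
)

# For comparison: per-cluster means computed directly on the unscaled data
means_by_cluster = demo.groupby(km.labels_).mean()
print(centers_original.round(1))
print(means_by_cluster.round(1))
```

Because standardization is a linear rescaling, the back-transformed centers should match the per-cluster means of the clustering features, which is the same information a groupby-mean profile reports.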

In [None]:
# Grouping by 'Cluster' and getting the most frequent (mode) non-numeric value
non_numeric_columns = df.select_dtypes(exclude='number').columns

# Apply mode to non-numeric columns
cluster_modes = df.groupby('Cluster')[non_numeric_columns].agg(lambda x: x.mode()[0])

# Display both numeric means and non-numeric modes
cluster_summary = pd.concat([cluster_profiles, cluster_modes], axis=1)

# Show the final cluster profiles with numeric and non-numeric info
cluster_summary

In [None]:
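import seaborn as sns
import matplotlib.pyplot as plt

# Pairplot of the clustering features, colored by cluster
# (identifier columns and the cluster label itself are left out of `vars`)
g = sns.pairplot(df, hue='Cluster', vars=features, palette='viridis')
# pairplot returns a grid of axes, so the title is set on the figure as a whole
g.fig.suptitle('Pair Plot of Features Colored by Clusters', y=1.02)
plt.show()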

In [None]:
plt.figure(figsize=(8, 6))
sns.violinplot(x='Cluster', y='App Usage Time (min/day)', data=df, hue='Cluster', palette='viridis', legend=False)
plt.title('App Usage Time Distribution Across Clusters')
plt.show()

In [None]:
# Box plot to compare Screen On Time across clusters
plt.figure(figsize=(8, 6))
sns.boxplot(x='Cluster', y='Screen On Time (hours/day)', data=df, hue='Cluster', palette='viridis', legend=False)
plt.title('Screen On Time Comparison Across Clusters')
plt.show()

In [None]:
from mpl_toolkits.mplot3d import Axes3D

# 3D Scatter plot for 3 main features
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot using 3 features to visualize the clusters in 3D
ax.scatter(df['App Usage Time (min/day)'], df['Screen On Time (hours/day)'], df['Battery Drain (mAh/day)'], 
           c=df['Cluster'], cmap='viridis', s=50)

ax.set_xlabel('App Usage Time (min/day)')
ax.set_ylabel('Screen On Time (hours/day)')
ax.set_zlabel('Battery Drain (mAh/day)')
ax.set_title('3D Scatter Plot of Clusters')
plt.show()

## Battery Optimization Recommendations

Based on each user's battery drain level, we provide personalized recommendations for reducing it. Users are binned into low, medium, and high battery drain levels using fixed mAh/day thresholds, and each group receives a suggestion on how to improve battery life by adjusting screen time or data usage. The cells below first train and evaluate the Random Forest classifier on the user behavior classes; the recommendation step is the final cell of this section.

In [None]:
# Prepare the features and target variable for prediction
X = df[['App Usage Time (min/day)', 'Screen On Time (hours/day)', 
          'Battery Drain (mAh/day)', 'Data Usage (MB/day)', 'Age']]
y = df['User Behavior Class']

# Train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Check the shape of the split data
X_train.shape, X_test.shape
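
If the behavior classes are unevenly sized, a plain random split can shift the class proportions between train and test. The sketch below shows the `stratify` argument of `train_test_split` as one way to guard against that; the imbalanced labels are hypothetical and only serve to make the effect visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90% class 1, 10% class 2
y_demo = np.array([1] * 900 + [2] * 100)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the class proportions the same in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

train_share = (y_tr == 2).mean()
test_share = (y_te == 2).mean()
print(f"minority share: train={train_share:.3f}, test={test_share:.3f}")
```

In the notebook this would amount to adding `stratify=y` to the existing call; whether it changes the results depends on how balanced `User Behavior Class` actually is.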

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Initialize the Random Forest Classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = rf_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')

# Detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Confusion Matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
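
A very high accuracy is easier to interpret next to a trivial baseline. The sketch below compares a Random Forest with scikit-learn's `DummyClassifier` on synthetic data where the label is a pure threshold function of one feature. The thresholds and the labelling rule are assumptions made for illustration; they are not claimed to be how the dataset's classes were defined.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
drain = rng.uniform(300, 3000, size=700)   # synthetic battery drain values
noise = rng.normal(size=700)               # an uninformative feature
X_demo = np.column_stack([drain, noise])

# Hypothetical labelling rule: the class is fully determined by drain thresholds
y_demo = np.digitize(drain, [600, 1200, 1800, 2400]) + 1   # classes 1..5

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
forest_acc = accuracy_score(y_te, forest.predict(X_te))
print(f"baseline accuracy: {baseline_acc:.3f}, random forest accuracy: {forest_acc:.3f}")
```

When labels are derived from feature thresholds like this, a tree ensemble can score near-perfectly with little effort, which is one plausible explanation to rule out before reading a perfect score as strong predictive power.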

In [None]:
from sklearn.model_selection import cross_val_score

# Perform 5-fold cross-validation
cv_scores = cross_val_score(rf_model, X, y, cv=5)
print(f'Cross-Validation Accuracy Scores: {cv_scores}')
print(f'Mean CV Accuracy: {cv_scores.mean():.4f}')
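
Passing an integer `cv` to `cross_val_score` with a classifier uses stratified folds without shuffling. The sketch below makes the fold strategy explicit with `StratifiedKFold` and also reports the spread across folds; the synthetic features and labels are placeholders, not the project data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary problem standing in for the real features and labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

# Shuffled, stratified folds with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
print(f"fold accuracies: {scores.round(3)}")
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the estimate depends on the particular split.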

In [None]:
# Get feature importances from the RandomForest model
importances = rf_model.feature_importances_
feature_names = X.columns

# Display the feature importances
for feature, importance in zip(feature_names, importances):
    print(f'{feature}: {importance:.4f}')
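
Impurity-based importances from `feature_importances_` are computed on the training data and can favor features with many distinct values. As a hedged cross-check, the sketch below uses `permutation_importance` from `sklearn.inspection` on held-out data; the `signal` and `noise` features are synthetic and exist only to show the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
signal = rng.normal(size=500)   # determines the label
noise = rng.normal(size=500)    # unrelated to the label
X_demo = np.column_stack([signal, noise])
y_demo = (signal > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in accuracy
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=42)
for name, mean, std in zip(["signal", "noise"], result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.4f} +/- {std:.4f}")
```

A feature whose shuffling barely changes test accuracy contributes little to the predictions, regardless of its impurity-based score.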

In [None]:
# Step 1: Segment users by battery drain (low, medium, high)
# Define thresholds for battery drain levels (these can be adjusted)
# Low: 0-1000, Medium: 1000-2000, High: above 2000 (open-ended so large values are still binned)
bins = [0, 1000, 2000, np.inf]
labels = ['Low Drain', 'Medium Drain', 'High Drain']
df['Battery Drain Level'] = pd.cut(df['Battery Drain (mAh/day)'], bins=bins, labels=labels,
                                   include_lowest=True)

# Step 2: Create personalized recommendations for each battery drain level
def recommend_battery_optimization(row):
    level = row['Battery Drain Level']
    if level == 'Low Drain':
        return 'Your battery usage is great! No major changes needed.'
    elif level == 'Medium Drain':
        return 'Consider reducing app usage or screen on time to optimize battery life.'
    elif level == 'High Drain':
        return 'High battery drain detected! Try reducing data usage and limiting screen time.'
    # Missing or negative values are not binned; do not label them as high drain
    return 'No recommendation available: battery drain value is missing or invalid.'

# Apply the recommendation function to the dataset
df['Battery Optimization Recommendation'] = df.apply(recommend_battery_optimization, axis=1)

# Display sample recommendations
df[['Battery Drain (mAh/day)', 'Battery Drain Level', 'Battery Optimization Recommendation']].head()
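
The same binning-and-lookup logic can also be written without a row-wise `apply`. The sketch below is one possible vectorized variant using `pd.cut` and a dictionary lookup via `Series.map`. The four-row `demo` table is hypothetical, and the open-ended upper bin (`np.inf`) is an assumption chosen so that values above 3000 mAh/day still receive a level.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including one above 3000 mAh/day
demo = pd.DataFrame({"Battery Drain (mAh/day)": [450, 1500, 2800, 3500]})

bins = [0, 1000, 2000, np.inf]
labels = ["Low Drain", "Medium Drain", "High Drain"]
demo["Battery Drain Level"] = pd.cut(
    demo["Battery Drain (mAh/day)"], bins=bins, labels=labels, include_lowest=True
)

# One message per level; the lookup replaces the if/elif chain
messages = {
    "Low Drain": "Your battery usage is great! No major changes needed.",
    "Medium Drain": "Consider reducing app usage or screen on time to optimize battery life.",
    "High Drain": "High battery drain detected! Try reducing data usage and limiting screen time.",
}
demo["Recommendation"] = demo["Battery Drain Level"].astype(object).map(messages)
print(demo)
```

Keeping the messages in a dictionary makes it straightforward to add a level or reword a tip without touching the control flow.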

## Conclusion

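This project segments users by their mobile usage patterns with K-Means, predicts user behavior classes with a Random Forest classifier, and maps battery drain levels to practical optimization tips. Because the perfect classification accuracy reflects how cleanly the classes separate in this dataset, future work could validate the model on independent data, explore additional machine learning models, or refine the recommendations using more granular user data.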