<p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 5px; border-radius: 15px; border: 7px solid #755139FF; width: 95%; line-height: 1;">˚About Author˚</p>


# **Arshman Khalid**  
<p style="font-size: 1.5rem; font-weight: bold;">Data Scientist | Software Engineer | ex Consultant PwC | ex Senior Data Analyst Fortune 500</p>

With over 5 years of expertise in data science and software engineering, I am dedicated to transforming complex data into actionable insights. My focus lies in predictive analytics, data strategy, and the implementation of robust machine learning models that drive measurable business outcomes. I have a track record of optimizing operations, reducing costs, and improving decision-making processes across industries. Proficient in Python, Alteryx, Power BI, and cloud platforms.

When I am not wrangling datasets, you will find me attempting to code my way to the perfect cup of coffee!


<div style="text-align: left; font-family: Arial, sans-serif;">
    <table style="border-collapse: collapse; width: 100%;">
        <tr>
            <th style="background-color: #f2f2f2; padding: 10px; border: 1px solid #ddd;"><b>Attribute</b></th>
            <th style="background-color: #f2f2f2; padding: 10px; border: 1px solid #ddd;"><b>Details</b></th>
        </tr>
        <tr>
            <td style="padding: 10px; border: 1px solid #ddd;"><b>GitHub</b></td>
            <td style="padding: 10px; border: 1px solid #ddd;">
                <a href="https://github.com" style="text-decoration: none; color: white;">
                    <span style="background-color: #333; padding: 5px 10px; border-radius: 5px;">GitHub Profile</span>
                </a>
            </td>
        </tr>
        <tr>
            <td style="padding: 10px; border: 1px solid #ddd;"><b>LinkedIn</b></td>
            <td style="padding: 10px; border: 1px solid #ddd;">
                <a href="https://www.linkedin.com" style="text-decoration: none; color: white;">
                    <span style="background-color: #0077B5; padding: 5px 10px; border-radius: 5px;">LinkedIn Profile</span>
                </a>
            </td>
        </tr>
        <tr>
            <td style="padding: 10px; border: 1px solid #ddd;"><b>Kaggle</b></td>
            <td style="padding: 10px; border: 1px solid #ddd;">
                <a href="https://www.kaggle.com" style="text-decoration: none; color: white;">
                    <span style="background-color: #20BEFF; padding: 5px 10px; border-radius: 5px;">Kaggle Profile</span>
                </a>
            </td>
        </tr>
        <tr>
            <td style="padding: 10px; border: 1px solid #ddd;"><b>Twitter</b></td>
            <td style="padding: 10px; border: 1px solid #ddd;">
                <a href="https://twitter.com" style="text-decoration: none; color: white;">
                    <span style="background-color: #1DA1F2; padding: 5px 10px; border-radius: 5px;">Twitter Profile</span>
                </a>
            </td>
        </tr>
    </table>
</div>

<p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 5px; border-radius: 15px; border: 7px solid #755139FF; width: 95%; line-height: 1;">˚Objectives˚</p>


# **This notebook covers:**


- **Usage Patterns by Device & OS:**
  - Compare app usage time, screen-on time, and battery drain across different devices (e.g., Xiaomi vs. iPhone) and OS (Android vs. iOS).
  - Visualize key differences using charts.
  

- **App Usage vs. Battery Drain:**
  - Analyze the correlation between app usage and battery consumption.
  - Use heatmaps and scatter plots to illustrate patterns.
  

- **Demographic Impact:**
  - Explore how age and gender influence mobile usage trends.
  - Visualize variations in screen-on time and app usage with bar plots.
  

- **Data Consumption Analysis:**
  - Investigate daily data usage across user types and devices.
  - Highlight heavy data consumers with visual breakdowns.
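
The analyses listed above rely on a handful of dataset columns. As a minimal sketch, assuming the column names used in this notebook's code cells, a check like the following could confirm they are present before any plotting. The helper name `missing_columns` and the empty toy frame are hypothetical and exist only for illustration.

```python
import pandas as pd

# Column names as they appear in this notebook's code cells.
REQUIRED_COLUMNS = [
    "Device Model",
    "Operating System",
    "App Usage Time (min/day)",
    "Screen On Time (hours/day)",
    "Battery Drain (mAh/day)",
    "Data Usage (MB/day)",
    "Age",
    "Gender",
    "User Behavior Class",
]


def missing_columns(frame: pd.DataFrame) -> list:
    """Return the required columns that are absent from `frame`, in order."""
    return [col for col in REQUIRED_COLUMNS if col not in frame.columns]


# Hypothetical empty frame with only some of the columns (illustration only).
toy = pd.DataFrame(columns=["Device Model", "Operating System", "Age", "Gender"])
print(missing_columns(toy))
```

An empty list would mean every analysis in the notebook has the columns it needs.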


# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚Load Libraries˚</p>

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚Load Dataset˚</p>

In [None]:
df = pd.read_csv('/kaggle/input/mobile-device-usage-and-user-behavior-dataset/user_behavior_dataset.csv')

In [None]:
df.head()

# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚1 || Exploratory Data Analysis˚</p>

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
df.duplicated().sum()

In [None]:
df.columns

# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚1.1 || Data Understanding and Preparation˚</p>

In [None]:
# Set Seaborn theme for a polished look (matplotlib and seaborn are already imported above)
sns.set_theme(style="whitegrid")

# Group by Device Model and Operating System to calculate mean values of key metrics
device_os_usage = df.groupby(['Device Model', 'Operating System'])[['App Usage Time (min/day)', 'Screen On Time (hours/day)', 'Battery Drain (mAh/day)']].mean().reset_index()

# Create a figure with a larger, custom aspect ratio
plt.figure(figsize=(15, 6))

# App Usage Time by Device and OS
plt.subplot(1, 3, 1)
sns.barplot(x='Device Model', y='App Usage Time (min/day)', hue='Operating System', data=device_os_usage, palette="Set2")
plt.title('App Usage Time by Device and OS', fontsize=14, fontweight='bold')
plt.xticks(rotation=90, fontsize=10)
plt.yticks(fontsize=10)
plt.xlabel('Device Model', fontsize=12)
plt.ylabel('App Usage Time (min/day)', fontsize=12)
plt.grid(True)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))  # Legend on the right

# Screen On Time by Device and OS
plt.subplot(1, 3, 2)
sns.barplot(x='Device Model', y='Screen On Time (hours/day)', hue='Operating System', data=device_os_usage, palette="Set2")
plt.title('Screen On Time by Device and OS', fontsize=14, fontweight='bold')
plt.xticks(rotation=90, fontsize=10)
plt.yticks(fontsize=10)
plt.xlabel('Device Model', fontsize=12)
plt.ylabel('Screen On Time (hours/day)', fontsize=12)
plt.grid(True)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))  # Legend on the right

# Battery Drain by Device and OS
plt.subplot(1, 3, 3)
sns.barplot(x='Device Model', y='Battery Drain (mAh/day)', hue='Operating System', data=device_os_usage, palette="Set2")
plt.title('Battery Drain by Device and OS', fontsize=14, fontweight='bold')
plt.xticks(rotation=90, fontsize=10)
plt.yticks(fontsize=10)
plt.xlabel('Device Model', fontsize=12)
plt.ylabel('Battery Drain (mAh/day)', fontsize=12)
plt.grid(True)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))  # Legend on the right

# Adjust layout for better spacing
plt.tight_layout()

# Show the plot
plt.show()


# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚1.2 || Correlation Analysis and Visualization˚</p>

In [None]:
# Select the relevant columns for correlation
correlation_df = df[['App Usage Time (min/day)', 'Screen On Time (hours/day)', 'Battery Drain (mAh/day)']]

# Calculate the correlation matrix
correlation_matrix = correlation_df.corr()

# Plot a heatmap to visualize the correlation matrix
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5, linecolor='black')
plt.title('Correlation Heatmap: App Usage, Screen On Time, and Battery Drain')
plt.show()



In [None]:
# Scatter plot: App Usage Time vs Battery Drain
plt.figure(figsize=(8, 6))
sns.scatterplot(x='App Usage Time (min/day)', y='Battery Drain (mAh/day)', data=df)
plt.title('App Usage Time vs Battery Drain')
plt.xlabel('App Usage Time (min/day)')
plt.ylabel('Battery Drain (mAh/day)')
plt.grid(True, which='both', axis='both', linestyle='--', color='gray')

plt.show()



In [None]:
# Scatter plot: Screen On Time vs Battery Drain
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Screen On Time (hours/day)', y='Battery Drain (mAh/day)', data=df)
plt.title('Screen On Time vs Battery Drain')
plt.xlabel('Screen On Time (hours/day)')
plt.ylabel('Battery Drain (mAh/day)')
plt.grid(True, which='both', axis='both', linestyle='--', color='gray')

plt.show()

# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚1.3 || Demographic Analysis and Visualization˚</p>

In [None]:
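# Create age groups for analysis.
# right=False makes each bin include its lower edge, so age 20 falls in '20-29' as the labels imply.
bins = [0, 20, 30, 40, 50, 60, 100]
labels = ['<20', '20-29', '30-39', '40-49', '50-59', '60+']
df['Age Group'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)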

# Calculate the mean values of app usage and screen-on time based on gender and age groups
age_gender_usage = df.groupby(['Age Group', 'Gender'])[['App Usage Time (min/day)', 'Screen On Time (hours/day)']].mean().reset_index()

# Visualization: Bar plot of app usage time by gender and age group
plt.figure(figsize=(12, 6))

# Set color palette for gender
sns.barplot(x='Age Group', y='App Usage Time (min/day)', hue='Gender', data=age_gender_usage, palette='Set1')

# Customizing the plot
plt.title('App Usage Time by Gender and Age Group', fontsize=16, fontweight='bold', color='#333')
plt.xlabel('Age Group', fontsize=14, color='#333')
plt.ylabel('App Usage Time (min/day)', fontsize=14, color='#333')
plt.xticks(rotation=45, fontsize=12, color='#333')
plt.yticks(fontsize=12, color='#333')
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(title='Gender', title_fontsize='13', loc='upper right')

# Tight layout to avoid overlap
plt.tight_layout()

# Show plot
plt.show()


In [None]:
# Visualization 2: Bar plot of screen-on time by gender and age group
plt.figure(figsize=(12, 6))

# Set color palette for gender
sns.barplot(x='Age Group', y='Screen On Time (hours/day)', hue='Gender', data=age_gender_usage, palette='Set1')

# Customizing the plot
plt.title('Screen On Time by Gender and Age Group', fontsize=16, fontweight='bold', color='#333')
plt.xlabel('Age Group', fontsize=14, color='#333')
plt.ylabel('Screen On Time (hours/day)', fontsize=14, color='#333')
plt.xticks(rotation=45, fontsize=12, color='#333')
plt.yticks(fontsize=12, color='#333')
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(title='Gender', title_fontsize='13', loc='upper right')

# Tight layout to avoid overlap
plt.tight_layout()

# Show plot
plt.show()


In [None]:
# Optional: Gender comparison for app usage and screen-on time
plt.figure(figsize=(12, 6))

# App Usage Time by Gender
plt.subplot(1, 2, 1)
sns.barplot(x='Gender', y='App Usage Time (min/day)', data=df)
plt.title('App Usage Time by Gender')
plt.grid(True)

# Screen On Time by Gender
plt.subplot(1, 2, 2)
sns.barplot(x='Gender', y='Screen On Time (hours/day)', data=df)
plt.title('Screen On Time by Gender')
plt.grid(True)
plt.tight_layout()
plt.show()

# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚1.4 || Data Consumption Analysis˚</p>

In [None]:
# Group by User Behavior Class and Device Model to calculate the mean data usage
data_usage_class_device = df.groupby(['User Behavior Class', 'Device Model'])['Data Usage (MB/day)'].mean().reset_index()

# Visualization 1: Bar plot of average data usage by user behavior class
plt.figure(figsize=(10, 6))
sns.barplot(x='User Behavior Class', y='Data Usage (MB/day)', data=df, estimator='mean')
plt.title('Average Daily Data Usage by User Behavior Class')
plt.xlabel('User Behavior Class')
plt.ylabel('Data Usage (MB/day)')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# Visualization 2: Bar plot of average data usage by device model and user behavior class
plt.figure(figsize=(12, 6))

# Set color palette for user behavior class
sns.barplot(x='Device Model', y='Data Usage (MB/day)', data=data_usage_class_device, hue='User Behavior Class', palette='Set2')

# Customizing the plot
plt.title('Average Daily Data Usage by Device Model and User Behavior Class', fontsize=16, fontweight='bold', color='#333')
plt.xlabel('Device Model', fontsize=14, color='#333')
plt.ylabel('Data Usage (MB/day)', fontsize=14, color='#333')
plt.xticks(rotation=0, fontsize=12, color='#333')
plt.yticks(fontsize=12, color='#333')
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(title='User Behavior Class', title_fontsize='13', loc='upper right')

# Tight layout to avoid overlap
plt.tight_layout()

# Show plot
plt.show()


In [None]:
df['Device Model'].unique()

# <p style="font-family: 'Amiri'; font-size: 3rem; color: #755139FF; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3); background-color: #F2EDD7FF; padding: 20px; border-radius: 20px; border: 7px solid #755139FF; width:95%">˚1.5 || Conclusion˚</p>

**Mobile Usage Patterns by Device and OS:**

- iOS users show higher average values than Android users on all three metrics: Screen On Time, Battery Drain, and App Usage Time. This suggests that iOS users tend to use their devices more intensively.
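
The plots in section 1.1 group by both device model and operating system, so an OS-level comparison is easier to read from a table of means. The following is a minimal sketch of such a table, assuming the column names used in this notebook. The helper `mean_metrics_by_os` and the toy rows are hypothetical and invented only so the example runs on its own; they are not values from the dataset.

```python
import pandas as pd

# Column names as used in this notebook's dataset.
METRICS = [
    "App Usage Time (min/day)",
    "Screen On Time (hours/day)",
    "Battery Drain (mAh/day)",
]


def mean_metrics_by_os(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the mean of each usage metric per operating system."""
    return frame.groupby("Operating System")[METRICS].mean()


# Hypothetical toy rows (illustration only, not real measurements).
toy = pd.DataFrame({
    "Operating System": ["Android", "Android", "iOS", "iOS"],
    "App Usage Time (min/day)": [100, 200, 150, 250],
    "Screen On Time (hours/day)": [2.0, 4.0, 3.0, 5.0],
    "Battery Drain (mAh/day)": [500, 1500, 1000, 2000],
})

os_means = mean_metrics_by_os(toy)
print(os_means)
```

On the real data, `mean_metrics_by_os(df)` would give one row per operating system, which can be compared directly against the statement above.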

**Correlation Between App Usage Time, Screen-On Time, and Battery Drain:**

- App Usage Time, Screen-On Time, and Battery Drain are strongly and positively correlated:
   - App Usage Time and Battery Drain have a correlation of 0.96.
   - Screen-On Time and Battery Drain are also highly correlated, with a value of 0.95.
- This shows that users with high app usage times generally have higher screen-on times and greater battery drain.
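
The coefficients quoted above are read off the heatmap in section 1.2. As a rough sketch, assuming the same column names, a single pair can also be pulled out numerically with pandas' Pearson correlation. The helper `pairwise_correlation` and the toy values are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def pairwise_correlation(frame: pd.DataFrame, col_a: str, col_b: str) -> float:
    """Pearson correlation coefficient between two numeric columns."""
    return float(frame[col_a].corr(frame[col_b]))


# Hypothetical toy values (illustration only, not real measurements).
toy = pd.DataFrame({
    "App Usage Time (min/day)": [60, 120, 180, 240],
    "Battery Drain (mAh/day)": [400, 900, 1100, 1600],
})

r = pairwise_correlation(
    toy, "App Usage Time (min/day)", "Battery Drain (mAh/day)"
)
print(round(r, 2))
```

Calling the same helper on `df` with the notebook's column names would reproduce the individual cells of the correlation matrix.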

**Demographic Analysis (Gender and Age Impact on Usage):**

- App Usage Time and Screen-On Time are similar across genders.
- However, when analyzed by age:
     - Males below 20 years old tend to have the highest app usage time and screen-on time.
     - Females in the 30-39 age group show the highest app usage and screen-on time.
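
Reading the top group off a grouped bar chart can be error-prone when bars are close in height. The following sketch, assuming the `Gender` and `Age Group` columns created in section 1.3, looks up the age group with the highest mean for each gender. The helper `top_age_group_by_gender` and the toy rows are hypothetical; the toy values are invented and do not reflect the dataset.

```python
import pandas as pd


def top_age_group_by_gender(frame: pd.DataFrame, metric: str) -> dict:
    """For each gender, return the age group with the highest mean of `metric`."""
    means = (
        frame.groupby(["Gender", "Age Group"], observed=True)[metric]
        .mean()
        .reset_index()
    )
    # idxmax gives the row label of the largest mean within each gender.
    top_rows = means.loc[means.groupby("Gender")[metric].idxmax()]
    return dict(zip(top_rows["Gender"], top_rows["Age Group"]))


# Hypothetical toy rows (illustration only, not real measurements).
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Age Group": ["20-29", "20-29", "40-49", "20-29", "40-49", "40-49"],
    "App Usage Time (min/day)": [300, 320, 200, 150, 280, 260],
})

top_groups = top_age_group_by_gender(toy, "App Usage Time (min/day)")
print(top_groups)
```

The same call with `'Screen On Time (hours/day)'` as the metric would cover the second half of the observation.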
     
**Data Consumption Analysis:**

- Average Daily Data Usage by User Behavior Class:
     - Users in Class 5 (extreme usage) consume the most data, while users in Class 1 (light usage) consume the least. Average data usage increases with each class: 1 < 2 < 3 < 4 < 5.
- Average Daily Data Usage by Device Model and User Behavior Class:
     - Samsung Galaxy S21 and Xiaomi Mi 11 show the highest data usage compared to other devices, with Class 5 users leading in data consumption across these models.
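
Both observations can be checked numerically rather than by eye. The following is a minimal sketch, assuming the column names used in section 1.4. The helpers `class_means_increase` and `rank_devices`, the device names, and the toy values are all hypothetical and invented for illustration.

```python
import pandas as pd

DATA_COL = "Data Usage (MB/day)"


def class_means_increase(frame: pd.DataFrame) -> bool:
    """True if mean data usage rises strictly with User Behavior Class."""
    means = frame.groupby("User Behavior Class")[DATA_COL].mean().sort_index()
    return bool(means.is_monotonic_increasing and means.is_unique)


def rank_devices(frame: pd.DataFrame) -> pd.Series:
    """Device models ordered from highest to lowest mean data usage."""
    return (
        frame.groupby("Device Model")[DATA_COL]
        .mean()
        .sort_values(ascending=False)
    )


# Hypothetical toy rows with made-up device names (illustration only).
toy = pd.DataFrame({
    "User Behavior Class": [1, 1, 2, 2, 3, 3],
    "Device Model": ["Device A", "Device B"] * 3,
    DATA_COL: [100, 200, 500, 700, 1200, 1000],
})

print(class_means_increase(toy))
ranking = rank_devices(toy)
print(ranking)
```

On the real data, `rank_devices(df)` averages over all behavior classes, so it complements the per-class bars in the second plot instead of replacing them.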
     
These insights highlight notable differences in usage behavior across devices, operating systems, user behavior classes, and age groups, while overall usage is similar across genders.