<a href="https://colab.research.google.com/github/AR980/py/blob/master/telecome_datausege_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  **Telecom Usage Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Problem Statement**


**BUSINESS PROBLEM OVERVIEW**


The goal of this project is to analyze the telecommunications usage data to uncover patterns, trends, and insights that can help improve customer satisfaction, optimize network performance, and identify opportunities for cost savings.

#### **Define Your Business Objective?**

The objective is to use detailed analysis of telecommunication usage patterns to enhance service quality, improve network performance, and increase customer satisfaction. This will be achieved by understanding daily and overall usage trends, identifying network performance issues, and detecting anomalies in the data.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams


import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Importing the dataset
df = pd.read_csv('/content/drive/MyDrive/TechMinds_DS/drive-download-20240602T164632Z-001.zip (Unzipped Files)/phone_data.csv')

### Dataset First View

In [None]:
# Dataset first look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
df.describe(include='all')

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Convert the 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y %H:%M')
df.head()
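The explicit `format` argument matters because dd/mm/yy strings are ambiguous whenever the day is 12 or lower. The following is a minimal sketch of that point, using made-up sample strings (not rows from `phone_data.csv`) in the layout described for the `date` column.

```python
import pandas as pd

# Hypothetical sample strings in the dd/mm/yy HH:MM layout used by 'date'
sample = pd.Series(['05/01/15 10:30', '13/11/14 06:58'])

# Explicit day-first format: '05/01/15' is read as 5 January 2015
parsed = pd.to_datetime(sample, format='%d/%m/%y %H:%M')
print(parsed.dt.strftime('%Y-%m-%d').tolist())

# Reading the same first string month-first gives 1 May 2015 instead,
# which is the silent error an explicit format protects against
month_first = pd.to_datetime(sample.iloc[0], format='%m/%d/%y %H:%M')
print(month_first.strftime('%Y-%m-%d'))
```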

### What did you know about your dataset?

The dataset provided contains records of various telecommunication activities such as data usage, calls, and SMS over a specific period.
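One way to make this overview concrete is a per-activity summary. The sketch below assumes only the `item` and `duration` columns and runs on a small made-up frame; on the real data, `sample` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical rows mimicking the 'item' and 'duration' columns
sample = pd.DataFrame({
    'item': ['call', 'call', 'sms', 'data', 'sms'],
    'duration': [120.0, 60.0, 1.0, 34.4, 1.0],
})

# Number of records plus total and mean duration for each activity type
summary = sample.groupby('item')['duration'].agg(['count', 'sum', 'mean'])
print(summary)
```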

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description



*   index: A unique identifier for each record in the dataset.
*   date: The date and time when the call, SMS, or data session occurred, in the format dd/mm/yy HH:MM.
*   duration: The duration of the call or data session in seconds. For SMS, this value is typically not applicable (set to 1).
*   item: The type of communication, which can be one of the following:
    *   call
    *   sms
    *   data
    *   voicemail
*   month: The month and year when the communication occurred, in the format yyyy-mm.
*   network: The network provider associated with the communication.
*   network_type: The type of network used for the communication, which can be:
    *   mobile for mobile networks
    *   landline for landline networks
    *   data for data sessions
    *   voicemail for voicemail services
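
The categories listed above can be checked against the data before any analysis. This is a minimal sketch: the helper `unexpected_values` and the sample frame are hypothetical, and the expected sets simply restate the descriptions above.

```python
import pandas as pd

# Expected categories, taken from the variable descriptions above
EXPECTED_ITEMS = {'call', 'sms', 'data', 'voicemail'}
EXPECTED_NETWORK_TYPES = {'mobile', 'landline', 'data', 'voicemail'}

def unexpected_values(frame, column, expected):
    """Return the sorted values in `column` that are not in `expected`."""
    return sorted(set(frame[column].dropna().unique()) - set(expected))

# Hypothetical sample; 'fax' is deliberately not a documented item type
sample = pd.DataFrame({
    'item': ['call', 'sms', 'data', 'fax'],
    'network_type': ['mobile', 'mobile', 'data', 'landline'],
})

bad_items = unexpected_values(sample, 'item', EXPECTED_ITEMS)
bad_types = unexpected_values(sample, 'network_type', EXPECTED_NETWORK_TYPES)
print(bad_items, bad_types)
```

An empty list for a column means every value matches the documented categories.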

















### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print(f"No. of unique values in {i} is {df[i].nunique()}.")

In [None]:
for column in df.columns:
    print(f"Value counts for column '{column}':")
    print(df[column].value_counts())
    print("\n")

In [None]:
# 'date' is already a datetime column (converted earlier with an explicit
# day-first format), so it does not need to be parsed again here

# Extract hour from 'date' column
df['hour'] = df['date'].dt.hour

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:


# Plot peak hours for call, SMS, and data usage
plt.figure(figsize=(10, 6))
sns.countplot(x='hour', hue='item', data=df)
plt.title('Peak Hours for Call, SMS, and Data Usage')
plt.xlabel('Hour of the Day')
plt.ylabel('Count')
plt.legend(title='Item')
plt.xticks(rotation=45)
plt.show()
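
The peak hour for each activity can also be read off numerically rather than from the chart. The sketch below uses `pd.crosstab` on a small made-up frame with the same `hour` and `item` columns; on the real data, `sample` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical records with the 'hour' and 'item' columns used in the plot
sample = pd.DataFrame({
    'hour': [9, 9, 18, 18, 18, 22, 22],
    'item': ['call', 'sms', 'call', 'call', 'data', 'sms', 'sms'],
})

# Rows are hours, columns are activity types, cells are record counts
counts = pd.crosstab(sample['hour'], sample['item'])

# For each activity type, the hour with the highest count
peak_hours = counts.idxmax()
print(counts)
print(peak_hours)
```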


In [None]:
df.head()

In [None]:
# Plot distribution of call durations across network providers
# (a box plot shows the spread; a bar plot would only show the mean)
plt.figure(figsize=(10, 6))
sns.boxplot(x='network', y='duration', data=df[df['item'] == 'call'])
plt.title('Distribution of Call Durations Across Network Providers')
plt.xlabel('Network Provider')
plt.ylabel('Call Duration (seconds)')
plt.xticks(rotation=45)
plt.show()


In [None]:
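# Group data sessions by calendar day and calculate total data usage
# ('date' includes the time, so it is normalized to midnight before grouping)
data_rows = df[df['item'] == 'data']
data_usage = data_rows.groupby(data_rows['date'].dt.normalize())['duration'].sum()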

# Plot trends in data usage over time
plt.figure(figsize=(12, 6))
data_usage.plot()
plt.title('Trends in Data Usage Over Time')
plt.xlabel('Date')
plt.ylabel('Data Usage (seconds)')
plt.xticks(rotation=45)
plt.show()


In [None]:
# Calculate the proportion of each type of activity
activity_proportion = df['item'].value_counts(normalize=True) * 100

# Plot the proportion of different types of activities
plt.figure(figsize=(8, 6))
activity_proportion.plot(kind='bar', color='skyblue')
plt.title('Proportion of Different Types of Activities')
plt.xlabel('Activity Type')
plt.ylabel('Proportion (%)')
plt.xticks(rotation=0)
plt.show()


In [None]:
# Plot comparison of call durations between mobile and landline networks
plt.figure(figsize=(10, 6))
sns.barplot(x='network_type', y='duration', data=df[df['item'] == 'call'])
plt.title('Comparison of Call Durations Between Mobile and Landline Networks')
plt.xlabel('Network Type')
plt.ylabel('Call Duration (seconds)')
#plt.xticks(rotation=45)
plt.show()

In [None]:
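# The 'month' column already holds the period as yyyy-mm, so it is used as-is
# (overwriting it with df['date'].dt.month would discard the year)
month_order = sorted(df['month'].unique())

# Plot distribution of SMS counts per month
plt.figure(figsize=(10, 6))
sns.countplot(x='month', data=df[df['item'] == 'sms'], order=month_order)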
plt.title('Distribution of SMS Counts Per Month')
plt.xlabel('Month')
plt.ylabel('SMS Count')
plt.xticks(rotation=0)
plt.show()


In [None]:
# Plot network usage distribution over the observed period
plt.figure(figsize=(12, 6))
sns.countplot(x='network', data=df)
plt.title('Network Usage Distribution Over the Observed Period')
plt.xlabel('Network')
plt.ylabel('Usage Count')
plt.xticks(rotation=45)
plt.show()


In [None]:
# Calculate the total call duration for each network
network_call_duration = df[df['item'] == 'call'].groupby('network')['duration'].sum().sort_values(ascending=False).head(5)

# Plot the top 5 networks by call duration
plt.figure(figsize=(10, 6))
network_call_duration.plot(kind='bar', color='green')
plt.title('Top 5 Networks by Call Duration')
plt.xlabel('Network')
plt.ylabel('Total Call Duration (seconds)')
plt.xticks(rotation=45)
plt.show()


In [None]:
# Calculate the average call duration for each network type
avg_call_duration_network_type = df[df['item'] == 'call'].groupby('network_type')['duration'].mean()

# Plot the average call duration by network type
plt.figure(figsize=(8, 6))
avg_call_duration_network_type.plot(kind='bar', color='purple')
plt.title('Average Call Duration by Network Type')
plt.xlabel('Network Type')
plt.ylabel('Average Call Duration (seconds)')
plt.xticks(rotation=0)
plt.show()


In [None]:
# Extract day of the month from 'date' column (already a datetime column)
df['day'] = df['date'].dt.day

# Plot the number of activities per day of the month
plt.figure(figsize=(12, 6))
sns.countplot(x='day', data=df)
plt.title('Number of Activities per Day of the Month')
plt.xlabel('Day of the Month')
plt.ylabel('Activity Count')
plt.xticks(rotation=0)
plt.show()
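
The business objective also mentions detecting anomalies. One common starting point, shown here only as a sketch and not as part of the analysis above, is the interquartile range (IQR) rule applied to call durations. The values are made up, and the 1.5 multiplier is the conventional default rather than something tuned for this dataset.

```python
import pandas as pd

# Hypothetical call durations in seconds, with one unusually long call
durations = pd.Series([30, 45, 50, 60, 65, 70, 80, 2400], name='duration')

# IQR rule: flag values more than 1.5 * IQR outside the middle 50%
q1 = durations.quantile(0.25)
q3 = durations.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = durations[(durations < lower) | (durations > upper)]
print(outliers.tolist())
```

On the real data, `durations` could be replaced by `df.loc[df['item'] == 'call', 'duration']`.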
