## Introduction
### Business Understanding
SyriaTel Communications, a leading telecommunications company, faces a significant challenge with customer churn: customers discontinuing their services. This project aims to predict which customers are likely to churn so that SyriaTel can act to retain them. By reducing churn, SyriaTel can:

1. Reduce Financial Losses: Retaining customers helps in maintaining steady revenue streams by avoiding the loss of monthly or yearly payments.
2. Minimize Customer Acquisition Costs: Acquiring new customers is often more expensive than retaining existing ones. By reducing churn, SyriaTel can lower these acquisition costs.
3. Enhance Customer Satisfaction and Loyalty: By understanding and addressing the reasons behind customer churn, SyriaTel can improve customer satisfaction, leading to increased loyalty and long-term engagement.
4. Gain Competitive Advantage: A lower churn rate can position SyriaTel more favorably in the competitive telecommunications market, attracting more customers through positive word-of-mouth and reputation.
In short, the project helps SyriaTel maintain a stable customer base, optimize operational costs, and improve the overall customer experience.


## Data Understanding

The data used in this project is sourced from SyriaTel’s customer records and includes various attributes that are crucial for understanding customer behavior and predicting churn. The key data properties and their relevance to the real-world problem of customer churn are as follows:

1. Customer Service Calls:
   - Source: Customer service logs.
   - Properties: Number of calls each customer made to customer service (the `customer service calls` column).
   - Relevance: Frequent calls to customer service may indicate dissatisfaction or unresolved issues, which are potential indicators of churn. Analyzing these patterns helps identify at-risk customers.
2. Usage Patterns:
   - Source: Usage records from customer accounts.
   - Properties: Call minutes, call counts, and charges for day, evening, night, and international usage.
   - Relevance: Understanding how customers use their plans can reveal engagement levels. Low usage might indicate that customers are not finding value in their plans, which could lead to churn.
3. Geographic Data:
   - Source: Customer address records.
   - Properties: Geographic location of customers, recorded as `state` and `area code`.
   - Relevance: Certain regions may have higher churn rates due to factors like network coverage, competition, or regional preferences. Identifying these areas allows for targeted retention strategies.
4. Demographic Information:
   - Source: Customer profiles.
   - Properties: Age, gender, income level, etc. Note that the column preview in the Data Preparation section shows no such fields; the closest available attributes appear to be account-level ones such as `account length`, `international plan`, and `voice mail plan`.
   - Relevance: Demographic factors can influence customer behavior and preferences. Understanding these can help tailor retention efforts to specific customer segments.

Relating these data properties explicitly to the real-world problem of customer churn lets the project identify the indicators and patterns that contribute to churn. This understanding of the data underpins both the predictive model and the targeted interventions used to retain customers.

## Exploratory Data Analysis (EDA)
In the EDA portion, the following questions were explored to gain insights into customer churn:

1. Customer Service Calls: Is calling customer service a sign of customer unhappiness/potential churn?
2. Usage Patterns: How much are people using their plan? What can this tell us about churn?
3. Geographic Analysis: Are customers in certain areas more likely to churn?
By addressing these questions, the project aims to uncover patterns and trends that can inform the development of a predictive model for customer churn. This analysis helps SyriaTel understand the factors driving churn and develop strategies to mitigate it.
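Each of the three questions above maps onto a simple group-by aggregation. The following is a minimal sketch of that idea, not the notebook's actual analysis: the column names follow the dataset, but the six rows are made up purely so the example runs on its own.

```python
import pandas as pd

# Tiny made-up sample for illustration only; column names follow the dataset.
sample = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH", "OK", "KS"],
    "customer service calls": [1, 1, 0, 4, 5, 4],
    "total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7, 120.0],
    "churn": [False, False, False, True, True, False],
})

# Q1: churn rate for each number of customer service calls.
churn_by_calls = sample.groupby("customer service calls")["churn"].mean()

# Q2: average daytime usage for churned vs. retained customers.
usage_by_churn = sample.groupby("churn")["total day minutes"].mean()

# Q3: churn rate per state, highest first.
churn_by_state = (
    sample.groupby("state")["churn"].mean().sort_values(ascending=False)
)

print(churn_by_calls)
print(usage_by_churn)
print(churn_by_state)
```

Because `churn` is boolean, its mean within a group is the share of churned customers in that group, which makes the three results directly comparable as rates. On the real data, groups with very few customers would need to be read with caution.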

## Data Preparation

1. Import Necessary Libraries

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt
import seaborn as sns
```
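The imports above suggest a preprocessing pipeline built from scikit-learn components. This section does not show how they are combined, so the following is only a sketch of one common arrangement, under stated assumptions: the rows are made up, the choice of numeric and categorical columns is illustrative, and the median imputation strategy is an assumption rather than the notebook's setting.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration only; column names follow the dataset.
toy = pd.DataFrame({
    "total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7, 120.0, 210.5, 95.2],
    "customer service calls": [1, 1, 0, 4, 5, 4, 2, 0],
    "international plan": ["no", "no", "no", "yes", "yes", "no", "yes", "no"],
    "churn": [False, False, False, True, True, False, True, True],
})

numeric_cols = ["total day minutes", "customer service calls"]
categorical_cols = ["international plan"]

X = toy[numeric_cols + categorical_cols]
y = toy["churn"]

# Stratify so both splits keep the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical columns: one-hot encode, ignoring categories unseen in training.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Fit on the training split only, then apply the same transform to the test split.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```

Fitting the transformer on the training split alone keeps test-set statistics out of the scaler and encoder. If SMOTE is used to address class imbalance, it would normally be applied to the prepared training split only, never to the test split.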


2. Data Loading

```python
df = pd.read_csv('syriatel_customer_data.csv')
df.head()
```

| | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |
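After loading, a few quick integrity checks help confirm the data is usable before any preparation. The sketch below is an illustration rather than a step taken from the notebook: it rebuilds the five preview rows shown above (restricted to a handful of columns) from an inline string so it runs without the CSV file, and the choice of checks is an assumption.

```python
import io
import pandas as pd

# First five rows as shown in the df.head() output, restricted to a few columns.
csv_text = """state,account length,area code,phone number,international plan,voice mail plan,customer service calls,churn
KS,128,415,382-4657,no,yes,1,False
OH,107,415,371-7191,no,yes,1,False
NJ,137,415,358-1921,no,no,0,False
OH,84,408,375-9999,yes,no,2,False
OK,75,415,330-6626,yes,no,3,False
"""
preview = pd.read_csv(io.StringIO(csv_text))

# Missing values per column.
missing_per_column = preview.isna().sum()

# Repeated phone numbers would point to duplicated customer records.
duplicate_phones = preview["phone number"].duplicated().sum()

# Share of churned vs. retained customers (class balance).
churn_share = preview["churn"].value_counts(normalize=True)

print(missing_per_column)
print("duplicate phone numbers:", duplicate_phones)
print(churn_share)
```

The same three checks can be run on the full `df`. The class-balance figure is the one that would indicate whether resampling is worth considering; the five preview rows alone say nothing about the balance of the full dataset.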


```python
df.tail()
```

| | account length | area code | number vmail messages | total day minutes | total day calls | total day charge | total eve minutes | total eve calls | total eve charge | total night minutes | ... | phone number_422-5874 | phone number_422-6685 | phone number_422-6690 | phone number_422-7728 | phone number_422-8268 | phone number_422-8333 | phone number_422-8344 | phone number_422-9964 | international plan_yes | voice mail plan_yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3328 | 192 | 415 | 36 | 156.2 | 77 | 26.55 | 215.5 | 126 | 18.32 | 279.1 | ... | False | False | False | False | False | False | False | False | False | True |
| 3329 | 68 | 415 | 0 | 231.1 | 57 | 39.29 | 153.4 | 55 | 13.04 | 191.3 | ... | False | False | False | False | False | False | False | False | False | False |
| 3330 | 28 | 510 | 0 | 180.8 | 109 | 30.74 | 288.8 | 58 | 24.55 | 191.9 | ... | False | False | False | False | False | False | False | False | False | False |
| 3331 | 184 | 510 | 0 | 213.8 | 105 | 36.35 | 159.6 | 84 | 13.57 | 139.2 | ... | False | False | False | False | False | False | False | False | True | False |
| 3332 | 74 | 415 | 25 | 234.4 | 113 | 39.85 | 265.9 | 82 | 22.6 | 241.4 | ... | False | False | False | False | False | False | False | False | False | True |


### Data source and properties
The data used in this project is sourced from SyriaTel's customer records and includes various attributes such as customer service call frequency, usage patterns, geographic information, and demographic details. These properties are crucial for understanding customer behavior and predicting churn.

```python
# Check data types
df.info()
```

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Columns: 3401 entries, account length to voice mail plan_yes
dtypes: bool(3385), float64(8), int64(8)
memory usage: 11.2 MB
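The 3401 columns reported above, almost all boolean, together with column names such as `phone number_422-5874` in the `df.tail()` output, are what one would expect if `phone number` had been one-hot encoded along with the genuine categorical features. That is an inference from the output, not something the notebook states. Since a phone number is unique per customer, encoding it adds one column per customer and carries no predictive signal. A minimal sketch of the effect, using a few columns from the five preview rows shown earlier:

```python
import pandas as pd

# Rows taken from the df.head() output, restricted to a few columns.
raw = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH", "OK"],
    "phone number": ["382-4657", "371-7191", "358-1921", "375-9999", "330-6626"],
    "international plan": ["no", "no", "no", "yes", "yes"],
    "voice mail plan": ["yes", "yes", "no", "no", "no"],
    "customer service calls": [1, 1, 0, 2, 3],
})

# Encoding everything, identifier included: nearly one extra column per customer.
encoded_all = pd.get_dummies(raw, drop_first=True)

# Dropping the identifier first keeps only the informative categorical columns.
encoded_clean = pd.get_dummies(raw.drop(columns=["phone number"]), drop_first=True)

print(encoded_all.shape[1], "columns with phone numbers encoded")
print(encoded_clean.shape[1], "columns after dropping the identifier")
```

On the full dataset, dropping `phone number` before encoding would be expected to reduce the column count from thousands to a few dozen, which also lowers memory use and the risk of the model memorizing individual customers.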


### Creating customer_churn.csv as a working copy of the SyriaTel customer data for churn analysis

```python
import pandas as pd

# Load the original dataset
df = pd.read_csv('syriatel_customer_data.csv')

# Display the first few rows and the column summary to understand the structure
print(df.head())
df.info()  # info() prints its summary directly and returns None, so no print() is needed

# Assuming the dataset contains a column named 'churn' indicating customer churn
# If the column names are different, adjust accordingly

# Save an unmodified working copy of the dataset as customer_churn.csv
df.to_csv('customer_churn.csv', index=False)

print("customer_churn.csv dataset created successfully!")
```


  state  account length  area code phone number international plan  \
0    KS             128        415     382-4657                 no   
1    OH             107        415     371-7191                 no   
2    NJ             137        415     358-1921                 no   
3    OH              84        408     375-9999                yes   
4    OK              75        415     330-6626                yes   

  voice mail plan  number vmail messages  total day minutes  total day calls  \
0             yes                     25              265.1              110   
1             yes                     26              161.6              123   
2              no                      0              243.4              114   
3              no                      0              299.4               71   
4              no                      0              166.7              113   

   total day charge  ...  total eve calls  total eve charge  \
0             45.07  ...           