# **TELECOM CUSTOMER CHURN PREDICTION**

### Customer Churn

Customer churn occurs when customers stop using a company’s service. In the telecom industry, customers frequently switch providers, resulting in high annual churn rates of 15–25%. Since retaining existing customers costs less than acquiring new ones, companies focus on predicting which customers are likely to leave. By analyzing customer behavior and interactions, businesses can target high-risk customers with effective retention strategies, helping reduce losses and increase profitability.

### Objectives
I will explore the data and try to answer questions such as:

- What percentage of customers churned, and what percentage kept their services active?
- Are there any churn patterns based on gender?
- Are there any churn patterns or preferences based on the type of service provided?
- What are the most profitable service types?
- Which features and services are most profitable?
- Other questions that arise during the analysis
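The gender and service-type questions above can be approached with a normalized cross-tabulation that gives the churn share within each category. This is a minimal sketch, assuming `Churn` holds "Yes"/"No" strings. The helper `churn_rate_by` and the toy DataFrame are hypothetical stand-ins for the real `df`:

```python
import pandas as pd

def churn_rate_by(df: pd.DataFrame, column: str, target: str = "Churn") -> pd.DataFrame:
    """Share of churned vs. retained customers within each category of `column` (hypothetical helper)."""
    # normalize="index" converts each row to proportions that sum to 1
    return pd.crosstab(df[column], df[target], normalize="index")

# Hypothetical toy data; in the notebook this would be churn_rate_by(df, "gender")
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male", "Male"],
    "Churn":  ["Yes", "No", "No", "No", "Yes"],
})
rates = churn_rate_by(toy, "gender")
print(rates)
```

The same call works for any categorical column, for example `InternetService` or `Contract`, if those columns exist in the loaded file.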

## Importing Libraries and Dataset

### Importing Libraries

Loading all necessary libraries

In [None]:
import pandas as pd
import numpy as np
import missingno as msno
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score
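The classifiers imported above are typically compared on a held-out test split. Below is a minimal sketch of that loop. It uses synthetic, imbalanced data from `make_classification` as a stand-in for the encoded churn features (an assumption; the real features still need encoding first) and only two of the imported models to keep it short:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded churn features: imbalanced binary target
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.73], random_state=42)

# stratify=y keeps the same class proportion in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Scaling matters for logistic regression; tree ensembles do not need it
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive (churn) class
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, ROC AUC={scores['roc_auc']:.3f}")
```

ROC AUC is reported next to accuracy because, with an imbalanced target, a model that always predicts "no churn" can still reach a high accuracy.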

### Importing Dataset

Import the dataset from Kaggle and load it into a pandas DataFrame for analysis and preprocessing.

***Note: for now, the data is read from a local CSV file because the direct import was not loading the data correctly.***

In [None]:
df = pd.read_csv('telco_dataset.csv')
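Since the direct import was unreliable, a few quick checks right after loading help confirm that the CSV was read as expected. This is a minimal sketch: `check_loaded` is a hypothetical helper, the `customerID` column name is an assumption, and a tiny inline CSV stands in for `telco_dataset.csv`:

```python
import io

import pandas as pd

def check_loaded(df: pd.DataFrame, id_col: str = "customerID") -> dict:
    """Basic sanity checks after reading the CSV (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        # Only computed if the assumed ID column is present
        "duplicate_ids": int(df[id_col].duplicated().sum()) if id_col in df.columns else None,
    }

# Tiny inline CSV standing in for 'telco_dataset.csv'
sample = io.StringIO(
    "customerID,gender,tenure,Churn\n"
    "0001-A,Female,1,No\n"
    "0002-B,Male,34,No\n"
    "0002-B,Male,34,No\n"
)
report = check_loaded(pd.read_csv(sample))
print(report)
```

On the real data this would be `check_loaded(df)`.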

# Exploratory Data Analysis (EDA)

In EDA, we will explore and understand the dataset by analyzing patterns, distributions, and relationships, and by detecting any missing or unusual values, before building models.

In [None]:
# Display the first 5 rows
df.head()

In [None]:
# Display the last 5 rows
df.tail()

The dataset includes information about:

- Customers who left within the last month – the column is called Churn

- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies

- Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges

- Demographic info about customers – gender, age range, and if they have partners and dependents
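The groups listed above can be captured as column lists for the later analysis. This is a minimal sketch. The column names are an assumption, based on the commonly used Kaggle Telco churn file, and `missing_from_groups` is a hypothetical helper that reports which expected columns are absent, so mismatches show up early:

```python
import pandas as pd

# Assumed column names; adjust them if the loaded file uses different ones
COLUMN_GROUPS = {
    "target": ["Churn"],
    "services": ["PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
                 "OnlineBackup", "DeviceProtection", "TechSupport",
                 "StreamingTV", "StreamingMovies"],
    "account": ["tenure", "Contract", "PaymentMethod", "PaperlessBilling",
                "MonthlyCharges", "TotalCharges"],
    "demographics": ["gender", "SeniorCitizen", "Partner", "Dependents"],
}

def missing_from_groups(df: pd.DataFrame) -> dict:
    """For each group, list the expected columns that are absent from `df` (hypothetical helper)."""
    return {group: [c for c in cols if c not in df.columns]
            for group, cols in COLUMN_GROUPS.items()}

# Toy frame with only demographic columns and the target
toy = pd.DataFrame(columns=["gender", "SeniorCitizen", "Partner", "Dependents", "Churn"])
gaps = missing_from_groups(toy)
print(gaps["demographics"], len(gaps["services"]))
```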

In [None]:
# Dimensions of the dataset (rows, columns)
df.shape

In [None]:
# Column dtypes, non-null counts, and memory usage
df.info()

In [None]:
# Array of all columns in Dataset
df.columns.values

In [None]:
# Data type of each column
df.dtypes
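`df.dtypes` can also reveal numeric columns that were read as text. For example, if a column such as `TotalCharges` shows up as `object`, a few blank strings are a likely cause. This is a minimal sketch that flags non-numeric columns whose non-blank values mostly parse as numbers. The helper name, the 95% threshold, and the toy data are assumptions:

```python
import pandas as pd

def text_columns_that_look_numeric(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Non-numeric columns whose non-blank values mostly parse as numbers (hypothetical helper)."""
    candidates = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # already numeric, nothing to fix
        stripped = df[col].astype(str).str.strip()
        non_blank = stripped[stripped != ""]
        if non_blank.empty:
            continue
        # errors="coerce" turns unparseable values into NaN instead of raising
        parsed = pd.to_numeric(non_blank, errors="coerce")
        if parsed.notna().mean() >= threshold:
            candidates.append(col)
    return candidates

toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "TotalCharges": ["29.85", " ", "108.15"],  # a blank string keeps this column as text
    "MonthlyCharges": [29.85, 56.95, 53.85],
})
flagged = text_columns_that_look_numeric(toy)
print(flagged)
```

Flagged columns can then be converted with `pd.to_numeric(df[col], errors="coerce")`, which turns the blanks into NaN.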

-> The target variable that will guide the exploration is `Churn`.
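A natural first look at the target is its class balance, which also answers the first question in the objectives. This is a minimal sketch: `churn_distribution` is a hypothetical helper, and the toy labels stand in for `df["Churn"]`:

```python
import pandas as pd

def churn_distribution(churn: pd.Series) -> pd.DataFrame:
    """Count and percentage of each Churn label (hypothetical helper)."""
    counts = churn.value_counts()
    # normalize=True returns proportions; scale them to percentages
    percent = churn.value_counts(normalize=True).mul(100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})

# Hypothetical labels; in the notebook this would be churn_distribution(df["Churn"])
labels = pd.Series(["No", "No", "No", "Yes"], name="Churn")
dist = churn_distribution(labels)
print(dist)
```

A strongly uneven split is worth noting before modeling, since it affects how train/test splits and metrics should be chosen.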

## Visualize missing values

In [None]:
# Visualize missing values as a matrix
msno.matrix(df);

This matrix makes it quick to spot any pattern of missingness in the dataset.

- From the visualisation above, no particular pattern of missingness stands out. In fact, no values are recorded as missing (NaN).
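`msno.matrix` only shows values that pandas parsed as NaN. An empty or whitespace-only string in a text column counts as a value, so it can slip through. This is a minimal sketch of a stricter check that treats such strings as missing. The helper name and the toy data are assumptions:

```python
import numpy as np
import pandas as pd

def hidden_missing(df: pd.DataFrame) -> pd.Series:
    """Count NaN plus empty/whitespace-only strings per column (hypothetical helper)."""
    # The regex matches strings made only of whitespace; numeric cells are left untouched
    cleaned = df.replace(r"^\s*$", np.nan, regex=True)
    counts = cleaned.isna().sum()
    return counts[counts > 0]

toy = pd.DataFrame({
    "tenure": [0, 1, 34],
    "TotalCharges": [" ", "29.85", "1889.5"],
})
print(toy.isna().sum().sum())  # NaN-only view reports nothing missing
hidden = hidden_missing(toy)
print(hidden)
```

On the real data, `hidden_missing(df)` either confirms the matrix above or points to the columns that need cleaning in the next step.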

## Data Manipulation