## Description

Expresso is an African telecommunications company that provides customers with airtime and mobile data bundles. The objective of this challenge is to develop a machine learning model to predict the likelihood of each Expresso customer “churning,” i.e. becoming inactive and not making any transactions for 90 days.

This solution will help Expresso better serve its customers by identifying which of them are at risk of leaving.

#### Variable definitions:
- user id
- REGION - the location of each client
- TENURE - duration in the network
- MONTANT - top-up amount
- FREQUENCE_RECH - number of times the customer refilled
- REVENUE - monthly income of each client
- ARPU_SEGMENT - income over 90 days / 3
- FREQUENCE - number of times the client has made an income
- DATA_VOLUME - number of connections
- ON_NET - inter expresso call
- ORANGE - call to orange
- TIGO - call to Tigo
- ZONE1 - call to zones1
- ZONE2 - call to zones2
- MRG - a client who is going
- REGULARITY - number of times the client is active for 90 days
- TOP_PACK - the most active packs
- FREQ_TOP_PACK - number of times the client has activated the top pack packages
- CHURN - variable to predict - Target

In [154]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

In [None]:
data_train = pd.read_csv('Train.csv')
data_test = pd.read_csv('Test.csv')  # used below for the MRG comparison and drop

In [None]:
data_train.shape

In [None]:
data_train.columns

In [None]:
#data_test.shape

In [None]:
data_train.head()

In [None]:
data_train.count()
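
The non-null counts above are easier to compare across columns when expressed as a share of missing values. A minimal sketch, assuming a hypothetical helper `missing_share` and a made-up toy frame standing in for `data_train`:

```python
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)

# Toy frame for illustration only; the values are invented, not taken from Train.csv.
toy = pd.DataFrame({
    "REGION": ["A", None, None, "B"],
    "MONTANT": [500.0, None, 1000.0, None],
    "CHURN": [0, 1, 0, 0],
})
shares = missing_share(toy)
print(shares)
```

On the real data the call would be `missing_share(data_train)`.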

In [None]:
col = ['REGION', 'MONTANT', 'FREQUENCE_RECH', 'REVENUE',
       'ARPU_SEGMENT', 'FREQUENCE', 'DATA_VOLUME', 'ON_NET', 'ORANGE', 'TIGO',
       'ZONE1', 'ZONE2', 'TOP_PACK'] 

# Keep only the rows where every column in `col` is missing
mask = data_train[data_train[col].isna().all(axis=1)]

mask.head()

In [None]:
mask.count()

In [None]:
mask['CHURN'].value_counts()
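
Raw counts for the fully missing rows are hard to judge without the overall churn rate next to them. The sketch below puts the two side by side. It assumes `CHURN` is coded as 0/1, and the helper name `churn_rate_all_missing` and the toy values are hypothetical:

```python
import pandas as pd

def churn_rate_all_missing(df, cols, target="CHURN"):
    """Compare the overall churn rate with the rate among rows
    where every column in `cols` is missing (assumes a 0/1 target)."""
    all_missing = df[cols].isna().all(axis=1)
    return {
        "overall": df[target].mean(),
        "all_missing": df.loc[all_missing, target].mean(),
        "n_all_missing": int(all_missing.sum()),
    }

# Toy frame for illustration only; the values are invented.
toy = pd.DataFrame({
    "REGION": [None, None, "A", "B"],
    "MONTANT": [None, None, 100.0, None],
    "CHURN": [1, 0, 0, 0],
})
result = churn_rate_all_missing(toy, ["REGION", "MONTANT"])
print(result)
```

On the real data the call would be `churn_rate_all_missing(data_train, col)`.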

In [None]:
data_train.describe()

### REGION

In [None]:
data_train['REGION'].unique()

In [None]:
sns.countplot(data=data_train, x='REGION')  # REGION is categorical: one bar per region
plt.xticks(rotation=90)

### TENURE

In [None]:
data_train['TENURE'].value_counts()

In [None]:
sns.countplot(data=data_train, x='TENURE')  # TENURE is categorical: one bar per tenure band
plt.xticks(rotation=90)

### MONTANT, FREQUENCE_RECH

In [None]:
sns.pairplot(data_train, 
             vars = ['MONTANT', 'FREQUENCE_RECH', 'REGULARITY'],
             hue = 'CHURN',
             kind = 'scatter',
             plot_kws=dict(alpha=0.3))

### REVENUE, ARPU_SEGMENT, FREQUENCE

In [None]:
sns.pairplot(data_train, 
             vars = ['REVENUE', 'ARPU_SEGMENT', 'FREQUENCE', 'REGULARITY'],
             hue = 'CHURN',
             kind = 'scatter',
             plot_kws=dict(alpha=0.3))
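
Pair plots draw every row in every panel, so they can be slow on a large training set. One possible workaround is to plot a random sample instead. The sketch below assumes a hypothetical helper `sample_for_plot`; the default sample size is arbitrary:

```python
import pandas as pd

def sample_for_plot(df: pd.DataFrame, n: int = 5000, seed: int = 0) -> pd.DataFrame:
    """Return at most n randomly chosen rows of df (all rows if df is smaller)."""
    if len(df) <= n:
        return df
    return df.sample(n=n, random_state=seed)

# Toy frame for illustration only; the values are invented.
toy = pd.DataFrame({"REGULARITY": range(100), "CHURN": [0, 1] * 50})
small = sample_for_plot(toy, n=10)
print(len(small))
```

The sampled frame can then be passed to the plot in place of the full one, for example `sns.pairplot(sample_for_plot(data_train), vars=['REVENUE', 'ARPU_SEGMENT', 'FREQUENCE', 'REGULARITY'], hue='CHURN')`.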

### DATA_VOLUME

### MRG

In [None]:
print(data_train['MRG'].nunique(), data_test['MRG'].nunique())

In [None]:
# we can drop the MRG column from both the train and test sets

data_train.drop(['MRG'], axis = 1, inplace=True)
data_test.drop(['MRG'], axis = 1, inplace=True)

### REGULARITY

In [None]:
# What is the regularity distribution?

sns.histplot(data=data_train['REGULARITY'], bins=(data_train['REGULARITY'].max()))

In [None]:
# What is the regularity of those who became inactive?

data_train.groupby(['CHURN'])['REGULARITY'].mean()
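
The reverse view can also be informative: the churn rate within each band of `REGULARITY`. A minimal sketch, assuming `CHURN` is coded as 0/1. The helper name `churn_rate_by_regularity`, the bin edges, and the toy values are hypothetical:

```python
import pandas as pd

def churn_rate_by_regularity(df, bins, reg_col="REGULARITY", target="CHURN"):
    """Mean of a 0/1 target within right-closed REGULARITY intervals."""
    buckets = pd.cut(df[reg_col], bins=bins)
    return df.groupby(buckets, observed=True)[target].mean()

# Toy frame for illustration only; the values are invented.
toy = pd.DataFrame({"REGULARITY": [1, 5, 40, 50, 61], "CHURN": [1, 1, 0, 0, 0]})
rates = churn_rate_by_regularity(toy, bins=[0, 30, 62])
print(rates)
```

The bin edges should be chosen from the observed range of `data_train['REGULARITY']`.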

In [None]:
# How regularly, on average, do users with each favorite pack use the service?

data_train.groupby(['TOP_PACK'])['REGULARITY'].mean().sort_values(ascending = False).head(15)

### TOP_PACK

In [None]:
data_train['TOP_PACK'].nunique()  # number of distinct packs

In [None]:
# TOP 20 packs
data_train['TOP_PACK'].value_counts().head(20)

In [None]:
# Which favorite packs did the users who became inactive have?

data_train.groupby(['CHURN'])['TOP_PACK'].value_counts()
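
Counts per class favor the most popular packs, so a churn rate per pack may answer the question more directly. The sketch below assumes `CHURN` is coded as 0/1. The helper name `churn_rate_by_pack`, the `min_users` threshold, and the pack labels in the toy frame are hypothetical:

```python
import pandas as pd

def churn_rate_by_pack(df, min_users=1, pack_col="TOP_PACK", target="CHURN"):
    """User count and churn rate per pack, highest churn rate first.
    Packs with fewer than min_users users are left out; rows with a
    missing pack are dropped by groupby."""
    stats = df.groupby(pack_col)[target].agg(users="count", churn_rate="mean")
    stats = stats[stats["users"] >= min_users]
    return stats.sort_values("churn_rate", ascending=False)

# Toy frame for illustration only; the pack labels and values are invented.
toy = pd.DataFrame({
    "TOP_PACK": ["p1", "p1", "p2", "p2", "p2", None],
    "CHURN": [1, 0, 0, 0, 0, 1],
})
by_pack = churn_rate_by_pack(toy)
print(by_pack)
```

A higher `min_users` keeps rarely used packs, whose rates are noisy, out of the ranking.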