<h1 style="text-align:center">Telecom Customer Churn</h1>

## Part 1: Data Exploration and Preparation 
1. Exploring the datasets
2. Creating database tables from the datasets
3. Joining the tables in database
4. Preparing data for further analysis
5. Writing the final dataframe to a .csv file

---
### 1. Reading CSV files with pandas

In [1]:
# importing required library
import pandas as pd

In [2]:
# importing customer.csv file
customer = pd.read_csv("customer.csv")
customer.head()

Unnamed: 0,CustomerID,Gender,SeniorCitizen,Partner,Dependents
0,3668-QPYBK,Male,No,No,No
1,9237-HQITU,Female,No,No,Yes
2,9305-CDSKC,Female,No,No,Yes
3,7892-POOKP,Female,No,Yes,Yes
4,0280-XJGEX,Male,No,No,Yes


In [3]:
# validating the datatypes of customer dataframe
print(customer.dtypes)

CustomerID       object
Gender           object
SeniorCitizen    object
Partner          object
Dependents       object
dtype: object


In [4]:
# importing cust_loc.csv file
cust_loc = pd.read_csv("cust_loc.csv")
cust_loc.head()

Unnamed: 0,Cust_ID,State,Latitude,Longitude,ZipCode
0,3668-QPYBK,California,33.964131,-118.272783,90003
1,9237-HQITU,California,34.059281,-118.30742,90005
2,9305-CDSKC,California,34.048013,-118.293953,90006
3,7892-POOKP,California,34.062125,-118.315709,90010
4,0280-XJGEX,California,34.039224,-118.266293,90015


In [5]:
# validating the datatypes of cust_loc dataframe
print(cust_loc.dtypes)

Cust_ID       object
State         object
Latitude     float64
Longitude    float64
ZipCode        int64
dtype: object


In [6]:
# importing the cust_services.csv file
cust_services = pd.read_csv("cust_services.csv")
cust_services.head()

Unnamed: 0,Cust_ID,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies
0,3668-QPYBK,Yes,No,DSL,Yes,Yes,No,No,No,No
1,9237-HQITU,Yes,No,Fiber optic,No,No,No,No,No,No
2,9305-CDSKC,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes
3,7892-POOKP,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes
4,0280-XJGEX,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes


In [7]:
# validating the datatypes of cust_services dataframe
print(cust_services.dtypes)

Cust_ID             object
PhoneService        object
MultipleLines       object
InternetService     object
OnlineSecurity      object
OnlineBackup        object
DeviceProtection    object
TechSupport         object
StreamingTV         object
StreamingMovies     object
dtype: object


In [8]:
# importing the cust_account.csv file
cust_account = pd.read_csv("cust_account.csv")
cust_account.head()

Unnamed: 0,Account_id,Tenure,Contract,PaymentMethod,PaperlessBilling,MonthlyCharges,TotalCharges
0,3668-QPYBK,2,Month-to-month,Mailed0check,Yes,53.85,108.15
1,9237-HQITU,2,Month-to-month,Electronic0check,Yes,70.7,151.65
2,9305-CDSKC,8,Month-to-month,Electronic0check,Yes,99.65,820.5
3,7892-POOKP,28,Month-to-month,Electronic0check,Yes,104.8,3046.05
4,0280-XJGEX,49,Month-to-month,Bank0transfer0(automatic),Yes,103.7,5036.3


In [9]:
# validating the datatypes of cust_account dataframe
print(cust_account.dtypes)

Account_id           object
Tenure                int64
Contract             object
PaymentMethod        object
PaperlessBilling     object
MonthlyCharges      float64
TotalCharges        float64
dtype: object


In [10]:
# importing the cust_churn.csv file
cust_churn = pd.read_csv("cust_churn.csv")
cust_churn.head()

Unnamed: 0,Id,Churn
0,3668-QPYBK,Yes
1,9237-HQITU,Yes
2,9305-CDSKC,Yes
3,7892-POOKP,Yes
4,0280-XJGEX,Yes


In [11]:
# validating the datatypes of cust_churn dataframe
print(cust_churn.dtypes)

Id       object
Churn    object
dtype: object
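
The five files name their ID column differently (`CustomerID`, `Cust_ID`, `Account_id`, `Id`). Before joining, it can help to confirm that these columns are unique and appear in the same row order. The snippet below is a minimal sketch on toy data; `key_report` and the `*_demo` frames are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for two of the notebook's dataframes (illustrative data only)
customer_demo = pd.DataFrame({"CustomerID": ["3668-QPYBK", "9237-HQITU"]})
churn_demo = pd.DataFrame({"Id": ["3668-QPYBK", "9237-HQITU"], "Churn": ["Yes", "Yes"]})

def key_report(left, left_key, right, right_key):
    """Summarise whether two ID columns are unique and line up row by row."""
    return {
        "left_unique": bool(left[left_key].is_unique),
        "right_unique": bool(right[right_key].is_unique),
        "same_order": left[left_key].tolist() == right[right_key].tolist(),
    }

report = key_report(customer_demo, "CustomerID", churn_demo, "Id")
print(report)
```

If `same_order` were false for the real data, a positional join would pair rows from different customers, so a key-based join would be the safer choice.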


---
### 2. Creating the SQLite database and tables

In [12]:
# importing required library
import sqlite3

In [13]:
# creating a connection to create a new database
conn = sqlite3.connect("customer_churn.db")

In [14]:
# creating a cursor
cur = conn.cursor()

In [15]:
# creating the customer table in database
customer.to_sql("customer", conn, if_exists="replace", index=False)

7043

In [16]:
# creating the customer location table in database
cust_loc.to_sql("cust_loc", conn, if_exists="replace", index=False)

7043

In [17]:
# creating the customer services table in database
cust_services.to_sql("cust_services", conn, if_exists="replace", index=False)

7043

In [18]:
# creating the customer account table in database
cust_account.to_sql("cust_account", conn, if_exists="replace", index=False)

7043

In [19]:
# creating the customer churn table in database
cust_churn.to_sql("cust_churn", conn, if_exists="replace", index=False)

7043
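
Each `to_sql` call above reports the number of rows written. As a cross-check, the stored row count can be compared with the dataframe length. This is a sketch against an in-memory database with toy frames; `frames`, `demo_conn`, and `row_counts` are hypothetical names.

```python
import sqlite3
import pandas as pd

# Toy stand-ins for the notebook's dataframes (illustrative data only)
frames = {
    "customer": pd.DataFrame({"CustomerID": ["a", "b"], "Gender": ["Male", "Female"]}),
    "cust_churn": pd.DataFrame({"Id": ["a", "b"], "Churn": ["Yes", "No"]}),
}

demo_conn = sqlite3.connect(":memory:")
row_counts = {}
for name, df in frames.items():
    df.to_sql(name, demo_conn, if_exists="replace", index=False)
    # count the rows actually stored in the table
    count = demo_conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    row_counts[name] = count
    assert count == len(df), f"row count mismatch in {name}"

print(row_counts)
demo_conn.close()
```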

---
### 3. Displaying table information

In [20]:
# function for displaying the table names and column names
def table_information(conn, cursor):
    # use the cursor passed in as an argument rather than the global `cur`
    tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
    for table_name in tables:
        table_name = table_name[0]
        table = pd.read_sql_query("SELECT * FROM {} LIMIT 0".format(table_name), conn)
        print(table_name + ": ")
        for col in table.columns:
            print("\t" + col)

In [21]:
# calling the function and passing arguments: conn and cur
table_information(conn, cur)

cust_service: 
	CustomerID
	PhoneService
	MultipleLines
	InternetService
	OnlineSecurity
	OnlineBackup
	DeviceProtection
	TechSupport
	StreamingTV
	StreamingMovies
churn_all: 
	CustomerID
	Gender
	SeniorCitizen
	Partner
	Dependents
	State
	Latitude
	Longitude
	ZipCode
	PhoneService
	MultipleLines
	InternetService
	OnlineSecurity
	OnlineBackup
	DeviceProtection
	TechSupport
	StreamingTV
	StreamingMovies
	Tenure
	Contract
	PaymentMethod
	PaperlessBilling
	MonthlyCharges
	TotalCharges
	Churn
customer: 
	CustomerID
	Gender
	SeniorCitizen
	Partner
	Dependents
cust_loc: 
	Cust_ID
	State
	Latitude
	Longitude
	ZipCode
cust_services: 
	Cust_ID
	PhoneService
	MultipleLines
	InternetService
	OnlineSecurity
	OnlineBackup
	DeviceProtection
	TechSupport
	StreamingTV
	StreamingMovies
cust_account: 
	Account_id
	Tenure
	Contract
	PaymentMethod
	PaperlessBilling
	MonthlyCharges
	TotalCharges
cust_churn: 
	Id
	Churn
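
The same listing can also be produced without reading any rows through pandas, using SQLite's `PRAGMA table_info`. The function below is an alternative sketch, not the notebook's method; `table_columns` and `demo_conn` are hypothetical names.

```python
import sqlite3

def table_columns(conn):
    """Return {table_name: [column names]} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    result = {}
    for name in names:
        # PRAGMA table_info yields one row per column; position 1 holds the column name
        cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        result[name] = [col[1] for col in cols]
    return result

# In-memory database with one toy table for demonstration
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE cust_churn (Id TEXT, Churn TEXT)")
schema = table_columns(demo_conn)
print(schema)
demo_conn.close()
```

Returning a dictionary instead of printing makes the result easy to reuse, for example to find which tables still lack a `CustomerID` column.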


---
### 4. Joining all tables and writing dataframe to CSV file

#### 4.1 Renaming table columns directly in SQLite database
The ID columns can be renamed directly in the database tables, as done here, or in the pandas dataframes, as done in section 4.2.

In [22]:
# renaming the Cust_ID column in cust_loc table to CustomerID
cur.execute(
"""
alter table cust_loc
rename column Cust_ID to CustomerID;
""")

<sqlite3.Cursor at 0x1f8c2444040>

In [23]:
# renaming the Cust_ID column in cust_services table to CustomerID
cur.execute(
"""
alter table cust_services
rename column Cust_ID to CustomerID;
""")

<sqlite3.Cursor at 0x1f8c2444040>
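
With the key columns identified, the tables could also be joined inside SQLite rather than in pandas. The query below is a sketch on toy tables in an in-memory database; it assumes `cust_loc` has already been renamed to use `CustomerID`, while `cust_churn` still uses `Id`, as in the database at this point. `demo_conn` and `joined` are hypothetical names.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")

# Toy tables mirroring the column names in the notebook's database (illustrative data only)
pd.DataFrame({"CustomerID": ["a", "b"], "Gender": ["Male", "Female"]}).to_sql(
    "customer", demo_conn, index=False)
pd.DataFrame({"CustomerID": ["b", "a"], "State": ["California", "California"]}).to_sql(
    "cust_loc", demo_conn, index=False)
pd.DataFrame({"Id": ["a", "b"], "Churn": ["Yes", "No"]}).to_sql(
    "cust_churn", demo_conn, index=False)

# Inner joins match rows on the customer key, regardless of row order
query = """
SELECT c.CustomerID, c.Gender, l.State, ch.Churn
FROM customer AS c
JOIN cust_loc AS l ON l.CustomerID = c.CustomerID
JOIN cust_churn AS ch ON ch.Id = c.CustomerID
ORDER BY c.CustomerID;
"""
joined = pd.read_sql_query(query, demo_conn)
print(joined)
demo_conn.close()
```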

#### 4.2 Renaming table columns in Pandas dataframe

In [24]:
# renaming Cust_ID column to CustomerID in cust_loc dataframe
cust_loc = cust_loc.rename(columns={"Cust_ID": "CustomerID"})

In [25]:
# renaming Cust_ID column to CustomerID in cust_services dataframe
cust_services = cust_services.rename(columns={"Cust_ID": "CustomerID"})

In [26]:
# renaming Account_id column to CustomerID in cust_account dataframe
cust_account = cust_account.rename(columns={"Account_id": "CustomerID"})

In [27]:
# renaming Id column to CustomerID in cust_churn dataframe
cust_churn = cust_churn.rename(columns={"Id": "CustomerID"})

#### 4.3 Joining dataframes in Pandas

In [28]:
# list of dataframes to join
dfs_to_join = [customer, cust_loc, cust_services, cust_account, cust_churn]

In [29]:
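# joining the dataframes side by side; axis=1 aligns rows on the index, not on CustomerID,
# so this relies on every file listing customers in the same order
churn_all = pd.concat(dfs_to_join, join="inner", axis=1)

# removing the duplicated CustomerID columns (keeping the first occurrence)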
churn_all = churn_all.loc[:, ~churn_all.columns.duplicated()]
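
Because `pd.concat` with `axis=1` aligns rows by index, it gives correct results only when all dataframes share the same row order. A key-based alternative is to merge on `CustomerID`. The snippet below is a sketch on toy data, not the notebook's method; the `*_demo` frames and `merged` are hypothetical names.

```python
from functools import reduce
import pandas as pd

# Toy frames; loc_demo is deliberately in a different row order (illustrative data only)
customer_demo = pd.DataFrame({"CustomerID": ["a", "b"], "Gender": ["Male", "Female"]})
loc_demo = pd.DataFrame({"CustomerID": ["b", "a"], "State": ["Nevada", "California"]})
churn_demo = pd.DataFrame({"CustomerID": ["a", "b"], "Churn": ["Yes", "No"]})

# Merge the frames pairwise on the key; validate raises if a key is duplicated
merged = reduce(
    lambda left, right: left.merge(
        right, on="CustomerID", how="inner", validate="one_to_one"),
    [customer_demo, loc_demo, churn_demo],
)
print(merged)
```

Merging on the key also leaves a single `CustomerID` column, so no duplicate-column cleanup is needed afterwards.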

In [30]:
# displaying the new joined dataframe
churn_all.head()

Unnamed: 0,CustomerID,Gender,SeniorCitizen,Partner,Dependents,State,Latitude,Longitude,ZipCode,PhoneService,...,TechSupport,StreamingTV,StreamingMovies,Tenure,Contract,PaymentMethod,PaperlessBilling,MonthlyCharges,TotalCharges,Churn
0,3668-QPYBK,Male,No,No,No,California,33.964131,-118.272783,90003,Yes,...,No,No,No,2,Month-to-month,Mailed0check,Yes,53.85,108.15,Yes
1,9237-HQITU,Female,No,No,Yes,California,34.059281,-118.30742,90005,Yes,...,No,No,No,2,Month-to-month,Electronic0check,Yes,70.7,151.65,Yes
2,9305-CDSKC,Female,No,No,Yes,California,34.048013,-118.293953,90006,Yes,...,No,Yes,Yes,8,Month-to-month,Electronic0check,Yes,99.65,820.5,Yes
3,7892-POOKP,Female,No,Yes,Yes,California,34.062125,-118.315709,90010,Yes,...,Yes,Yes,Yes,28,Month-to-month,Electronic0check,Yes,104.8,3046.05,Yes
4,0280-XJGEX,Male,No,No,Yes,California,34.039224,-118.266293,90015,Yes,...,No,Yes,Yes,49,Month-to-month,Bank0transfer0(automatic),Yes,103.7,5036.3,Yes


#### 4.4 Writing the churn_all dataframe to a CSV file

In [31]:
# displaying all the columns
print(churn_all.columns)

Index(['CustomerID', 'Gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'State', 'Latitude', 'Longitude', 'ZipCode', 'PhoneService',
       'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
       'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
       'Tenure', 'Contract', 'PaymentMethod', 'PaperlessBilling',
       'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')


In [32]:
# writing the new churn_all dataframe to a CSV file
churn_all.to_csv('churn_all.csv', index=False)
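
Passing `index=False` keeps the dataframe index out of the file. The sketch below, on toy data written to a temporary directory, shows the difference when the file is read back; `demo`, `with_index`, and `without_index` are hypothetical names.

```python
import os
import tempfile
import pandas as pd

# Toy frame (illustrative data only)
demo = pd.DataFrame({"CustomerID": ["3668-QPYBK", "9237-HQITU"], "Tenure": [2, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path_a = os.path.join(tmp_dir, "with_index.csv")
    path_b = os.path.join(tmp_dir, "without_index.csv")
    demo.to_csv(path_a)               # index written as an unnamed first column
    demo.to_csv(path_b, index=False)  # only the data columns are written
    with_index = pd.read_csv(path_a)
    without_index = pd.read_csv(path_b)

print(list(with_index.columns))
print(list(without_index.columns))
```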

#### 4.5 Adding the churn_all table to the customer_churn.db database

In [33]:
# displaying all the tables and their columns
table_information(conn, cur)

cust_service: 
	CustomerID
	PhoneService
	MultipleLines
	InternetService
	OnlineSecurity
	OnlineBackup
	DeviceProtection
	TechSupport
	StreamingTV
	StreamingMovies
churn_all: 
	CustomerID
	Gender
	SeniorCitizen
	Partner
	Dependents
	State
	Latitude
	Longitude
	ZipCode
	PhoneService
	MultipleLines
	InternetService
	OnlineSecurity
	OnlineBackup
	DeviceProtection
	TechSupport
	StreamingTV
	StreamingMovies
	Tenure
	Contract
	PaymentMethod
	PaperlessBilling
	MonthlyCharges
	TotalCharges
	Churn
customer: 
	CustomerID
	Gender
	SeniorCitizen
	Partner
	Dependents
cust_loc: 
	CustomerID
	State
	Latitude
	Longitude
	ZipCode
cust_services: 
	CustomerID
	PhoneService
	MultipleLines
	InternetService
	OnlineSecurity
	OnlineBackup
	DeviceProtection
	TechSupport
	StreamingTV
	StreamingMovies
cust_account: 
	Account_id
	Tenure
	Contract
	PaymentMethod
	PaperlessBilling
	MonthlyCharges
	TotalCharges
cust_churn: 
	Id
	Churn


In [34]:
# validating the datatypes of all the columns in churn_all dataframe
print(churn_all.dtypes)

CustomerID           object
Gender               object
SeniorCitizen        object
Partner              object
Dependents           object
State                object
Latitude            float64
Longitude           float64
ZipCode               int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Tenure                int64
Contract             object
PaymentMethod        object
PaperlessBilling     object
MonthlyCharges      float64
TotalCharges        float64
Churn                object
dtype: object


In [35]:
# creating the churn_all table in database
churn_all.to_sql("churn_all", conn, if_exists="replace", index=False)

7043
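
The notebook does not show the connection being closed. A common pattern is to commit and close once all tables are written, then reopen the file to confirm the data persisted. The snippet below is a sketch using a temporary database file and toy data; `demo`, `db_path`, and `stored_rows` are hypothetical names.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
import pandas as pd

# Toy frame (illustrative data only)
demo = pd.DataFrame({"CustomerID": ["a", "b"], "Churn": ["Yes", "No"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo_churn.db")

    # closing() guarantees conn.close() is called even if an error occurs
    with closing(sqlite3.connect(db_path)) as demo_conn:
        demo.to_sql("churn_all", demo_conn, if_exists="replace", index=False)
        demo_conn.commit()

    # reopen the file to confirm that the table was persisted
    with closing(sqlite3.connect(db_path)) as check_conn:
        stored_rows = check_conn.execute(
            "SELECT COUNT(*) FROM churn_all").fetchone()[0]

print(stored_rows)
```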