# Data Manipulation using Pandas

Author: Andreas Chandra \
[Email](mailto:andreas@jakartaresearch.com) [Github](https://github.com/andreaschandra) [Blog](https://datafolksid.xyz/andreas) \
If you want to talk with me, proposed schedule [here](https://calendly.com/andreaschandra/)

## Contents

- A Brief Overview of Pandas
- Read/Write Pandas
- Creating DataFrame from Dict/List
- Basic Functionalities and Attributes (Head, Tail, Dtype, Shape, Describe, Missing Values)
- Type Casting
- Renaming Column
- Slicing and Dicing DataFrame (Filtering)
- Reindexing
- Dropping and Poping
- Duplicate data
- Numeric Calculation
- String Operation
- Datetime
- Sorting
- Grouping
- Pandas Apply and Map Function
- Appending, Joining, Merging, Concatenating 2 or more DataFrame
- Pivot and Stack
- Brief of Timeseries
- Window Function
- Basic Plotting

## Day 1

### Overview of Pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

Installation \
`pip install pandas`

Repo: https://github.com/pandas-dev/pandas

In [1]:
# Import the library
import pandas as pd

In [2]:
pd.options.display.max_columns = 50

### Read/Write Functions

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

Common read functions \
`read_csv()` `read_excel()` `read_table()` `read_json()`

In [3]:
d_data = pd.read_csv("telcom_user_extended.csv")

Common write functions \
`to_csv` `to_json` `to_excel`

In [4]:
d_data.to_csv("telecom_users_2.csv", index=False)

### Creating DataFrame from List/Dictionary

From list

In [5]:
user_profile = [
    {"id": 101, "gender": "L", "age": 20, "last education": "high school", "is_married": True},
    {"id": 102, "gender": "P", "age": 18, "last education": "middle school", "is_married": False},
    {"id": 103, "gender": "L", "age": 19, "last education": "high school", "is_married": True},
    {"id": 104, "gender": "P", "age": 28, "last education": "master's degree", "is_married": False},
    {"id": 105, "gender": None, "age": 21, "last education": "bachelor's degree", "is_married": True}
]

In [6]:
pd.DataFrame(user_profile)

Unnamed: 0,id,gender,age,last education,is_married
0,101,L,20,high school,True
1,102,P,18,middle school,False
2,103,L,19,high school,True
3,104,P,28,master's degree,False
4,105,,21,bachelor's degree,True


Only list

In [7]:
number_list_only = [
    [101,"L",20,'high school', True], 
    [102,'P',18,'middle school', False],
    [103,'L',19,'high school', True],
    [104,'P',28,"master's degree", False],
    [105,None,21,"bachelor's degree", True],
]

In [8]:
pd.DataFrame(data=number_list_only, columns=["id", "gender", "age", "last education", 'is_married'])

Unnamed: 0,id,gender,age,last education,is_married
0,101,L,20,high school,True
1,102,P,18,middle school,False
2,103,L,19,high school,True
3,104,P,28,master's degree,False
4,105,,21,bachelor's degree,True


From dictionary

In [9]:
user_profile_dict = {
    'id': [101,102,103,104,105],
    'gender': ["L", "P", "L", "P", None],
    'last education': ["high school", "middle school", "high school", "master's degree", "bachelor's degree"],
    'is_married': [True, False, True, False, True]
}

In [10]:
pd.DataFrame(user_profile_dict)

Unnamed: 0,id,gender,last education,is_married
0,101,L,high school,True
1,102,P,middle school,False
2,103,L,high school,True
3,104,P,master's degree,False
4,105,,bachelor's degree,True


### Basic Functionalities

Head & Tail

In [11]:
d_data.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558
1,3831-YCPUO,Female,No,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168
2,1506-YJTYT,Male,No,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215
3,2272-UOINI,Female,No,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331
4,1641-BYBTK,Male,No,No,Yes,6,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776


In [12]:
d_data.tail()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
7709,6332-FBZRI,Male,No,Yes,Yes,67,Yes,Yes,DSL,Yes,Yes,Yes,Yes,No,No,One year,Yes,Credit card (automatic),69.35,4653.25,No,,dxJJX@amail.comz,90058,17/01/2021,40.0,88.0,84.0,107.0,23.0,26881.0,10126.0,5983.0,587.0,224.0,10.154946
7710,7351-KYHQH,Female,1,No,No,7,Yes,No,DSL,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Electronic check,64.95,493.65,No,,VPdKY@gmail.com,87598,10/01/2021,61.0,12.0,33.0,133.0,10.0,2188.0,777.0,556.0,4.0,224.0,10.155405
7711,6261-LHRTG,Female,No,No,No,26,Yes,No,DSL,No,Yes,Yes,No,No,No,Month-to-month,Yes,Credit card (automatic),54.75,1406.9,No,,kWWJs@apple.com,95724,30/01/2021,40.0,73.0,43.0,165.0,24.0,10327.0,5479.0,9307.0,685.0,225.0,10.065166
7712,1728-BQDMA,Female,No,No,No,2,Yes,No,DSL,No,No,No,No,No,No,Month-to-month,No,Electronic check,44.45,82.7,No,,hvWGv@rocketmail.com,81300,17/01/2021,39.0,89.0,58.0,93.0,24.0,19240.0,5877.0,5635.0,881.0,225.0,10.065489
7713,0134-XWXCE,Female,1,No,No,44,Yes,No,DSL,No,No,Yes,Yes,Yes,Yes,One year,No,Bank transfer (automatic),74.85,3268.05,No,,IRuXb@yahoo.com,87597,25/01/2021,50.0,20.0,25.0,141.0,16.0,19613.0,581.0,725.0,4.0,225.0,10.066998


Shape of Dataset

In [13]:
d_data.shape

(7714, 36)

Data Types

In [14]:
d_data.dtypes

customerID                              object
gender                                  object
SeniorCitizen                           object
Partner                                 object
Dependents                              object
tenure                                   int64
PhoneService                            object
MultipleLines                           object
InternetService                         object
OnlineSecurity                          object
OnlineBackup                            object
DeviceProtection                        object
TechSupport                             object
StreamingTV                             object
StreamingMovies                         object
Contract                                object
PaperlessBilling                        object
PaymentMethod                           object
MonthlyCharges                         float64
TotalCharges                            object
Churn                                   object
InstallApp   

Statistical descriptive numeric columns

In [15]:
d_data.describe()

Unnamed: 0,tenure,MonthlyCharges,kodepos,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
count,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0,7714.0
mean,32.325771,64.650745,60350.855328,36.130801,71.699507,72.963573,114.844957,26.724138,16232.904071,5734.112523,5728.490796,1878.779881,113.831216,18.939252
std,24.448769,30.211436,25567.055467,15.339467,45.090918,43.368072,52.609379,12.695575,11877.008036,3347.839763,3341.541199,2705.862163,67.431557,5.002963
min,0.0,18.25,10010.0,17.0,10.0,20.0,30.0,10.0,1025.0,512.0,512.0,1.0,15.0,5.247223
25%,9.0,35.15,39146.25,25.0,28.0,37.0,73.0,16.0,6094.0,965.0,962.0,9.0,58.0,17.046077
50%,29.0,70.3,64432.0,33.0,64.0,64.0,119.0,24.0,14239.5,6624.0,6662.5,761.0,116.0,19.072952
75%,55.0,90.0,81462.75,41.0,96.0,96.0,144.0,31.0,23504.25,8478.0,8425.75,1759.0,145.75,23.3055
max,72.0,118.75,99980.0,84.0,179.0,179.0,299.0,59.0,51160.0,10239.0,10239.0,10235.0,299.0,24.735832


Dataset information for missing values and data types

In [16]:
d_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7714 entries, 0 to 7713
Data columns (total 36 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   customerID                           7714 non-null   object 
 1   gender                               7581 non-null   object 
 2   SeniorCitizen                        7714 non-null   object 
 3   Partner                              7693 non-null   object 
 4   Dependents                           7714 non-null   object 
 5   tenure                               7714 non-null   int64  
 6   PhoneService                         7714 non-null   object 
 7   MultipleLines                        7714 non-null   object 
 8   InternetService                      7714 non-null   object 
 9   OnlineSecurity                       7714 non-null   object 
 10  OnlineBackup                         7714 non-null   object 
 11  DeviceProtection              

Counting missing values

In [17]:
d_data.isna().sum()

customerID                                0
gender                                  133
SeniorCitizen                             0
Partner                                  21
Dependents                                0
tenure                                    0
PhoneService                              0
MultipleLines                             0
InternetService                           0
OnlineSecurity                            0
OnlineBackup                              0
DeviceProtection                          0
TechSupport                               0
StreamingTV                               0
StreamingMovies                           0
Contract                                  0
PaperlessBilling                          0
PaymentMethod                             0
MonthlyCharges                            0
TotalCharges                              0
Churn                                     0
InstallApp                             7676
email                           

### Fill Missing Values

by `Series.fillna(value)` \
by `DataFrame.fillna(value)`

In [18]:
d_data.Partner.fillna('No', inplace=True)

In [19]:
d_data.Partner.isna().sum()

0

### Type Casting

https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#basics-dtypes

using `DataFrame.astype({'col': int, 'col2': str})` \
using `Series.astype(int|str|float)`

In [20]:
d_data.TotalCharges = d_data.TotalCharges.replace(' ', None)

In [21]:
d_data.TotalCharges = d_data.TotalCharges.astype(float)

In [22]:
d_data.dtypes

customerID                              object
gender                                  object
SeniorCitizen                           object
Partner                                 object
Dependents                              object
tenure                                   int64
PhoneService                            object
MultipleLines                           object
InternetService                         object
OnlineSecurity                          object
OnlineBackup                            object
DeviceProtection                        object
TechSupport                             object
StreamingTV                             object
StreamingMovies                         object
Contract                                object
PaperlessBilling                        object
PaymentMethod                           object
MonthlyCharges                         float64
TotalCharges                           float64
Churn                                   object
InstallApp   

### Renaming Columns

In [23]:
d_data.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558
1,3831-YCPUO,Female,No,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168
2,1506-YJTYT,Male,No,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215
3,2272-UOINI,Female,No,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331
4,1641-BYBTK,Male,No,No,Yes,6,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776


In [24]:
d_data.rename(columns={'customerID':'customer_id', 'gender': 'gender', 'tenure': 'tenure'}, inplace=True)

In [25]:
d_data.head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558
1,3831-YCPUO,Female,No,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168
2,1506-YJTYT,Male,No,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215
3,2272-UOINI,Female,No,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331
4,1641-BYBTK,Male,No,No,Yes,6,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776


### Duplicate Data

find duplicate entries using `DataFrame.duplicated()`

In [26]:
d_data[d_data.duplicated(subset='customer_id')]

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
5986,6223-DHJGV,Female,No,No,No,42,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,20.65,958.10,No,,RXWMk@fall.com.sg,72470,03/01/2021,32.0,45.0,85.0,113.0,27.0,4421.0,7126.0,10071.0,797.0,112.0,19.379743
5987,1732-FEKLD,Female,No,No,No,54,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,No,One year,Yes,Bank transfer (automatic),94.75,5121.75,No,,NcPCI@rocketmail.com,65970,30/01/2021,28.0,80.0,84.0,179.0,27.0,16090.0,9676.0,8584.0,999.0,112.0,19.379970
5988,7554-NEWDD,Male,No,No,No,10,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Bank transfer (automatic),25.70,251.60,No,,VyGMi@gmail.c,61890,03/01/2021,22.0,98.0,179.0,65.0,32.0,28264.0,5140.0,9199.0,2164.0,112.0,19.381090
5989,7572-KPVKK,Male,No,No,Yes,62,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Two year,Yes,Electronic check,104.05,6590.50,No,,xVbYS@fall.com.sg,74663,26/01/2021,26.0,61.0,77.0,60.0,21.0,19807.0,7709.0,9849.0,728.0,112.0,19.381623
5990,6946-LMSQS,Male,1,Yes,No,25,Yes,Yes,Fiber optic,Yes,No,No,No,No,Yes,One year,Yes,Electronic check,89.05,2177.45,Yes,,YRoaR@salesforce.com,69035,08/01/2021,79.0,23.0,36.0,279.0,10.0,3031.0,967.0,1004.0,4.0,112.0,19.381734
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7709,6332-FBZRI,Male,No,Yes,Yes,67,Yes,Yes,DSL,Yes,Yes,Yes,Yes,No,No,One year,Yes,Credit card (automatic),69.35,4653.25,No,,dxJJX@amail.comz,90058,17/01/2021,40.0,88.0,84.0,107.0,23.0,26881.0,10126.0,5983.0,587.0,224.0,10.154946
7710,7351-KYHQH,Female,1,No,No,7,Yes,No,DSL,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Electronic check,64.95,493.65,No,,VPdKY@gmail.com,87598,10/01/2021,61.0,12.0,33.0,133.0,10.0,2188.0,777.0,556.0,4.0,224.0,10.155405
7711,6261-LHRTG,Female,No,No,No,26,Yes,No,DSL,No,Yes,Yes,No,No,No,Month-to-month,Yes,Credit card (automatic),54.75,1406.90,No,,kWWJs@apple.com,95724,30/01/2021,40.0,73.0,43.0,165.0,24.0,10327.0,5479.0,9307.0,685.0,225.0,10.065166
7712,1728-BQDMA,Female,No,No,No,2,Yes,No,DSL,No,No,No,No,No,No,Month-to-month,No,Electronic check,44.45,82.70,No,,hvWGv@rocketmail.com,81300,17/01/2021,39.0,89.0,58.0,93.0,24.0,19240.0,5877.0,5635.0,881.0,225.0,10.065489


In [60]:
d_data.drop_duplicates(subset='customer_id', inplace=True)

In [61]:
d_data.shape

(5986, 38)

### Slicing

slicing and dicing in Pandas can be done using `.loc` `.iloc` `.at` `.iat` or just bracket

In [28]:
d_data.loc[:5, ['gender', 'SeniorCitizen', 'Partner']]

Unnamed: 0,gender,SeniorCitizen,Partner
0,Female,1,Yes
1,Female,No,Yes
2,Male,No,Yes
3,Female,No,No
4,Male,No,No
5,Male,No,No


In [29]:
d_data.gender.unique()

array(['Female', 'Male', nan], dtype=object)

In [30]:
d_data[d_data.gender == 'Female']

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580
1,3831-YCPUO,Female,No,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168
3,2272-UOINI,Female,No,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.50,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331
7,8654-DHAOW,Female,No,No,No,2,Yes,Yes,DSL,No,Yes,No,No,No,No,Month-to-month,No,Mailed check,54.15,101.65,No,,xdQsX@salesforce.com,37348,03/01/2021,17.0,150.0,95.0,45.0,52.0,4801.0,7705.0,9359.0,7426.0,15.0,22.466924
8,7826-VVKWT,Female,1,Yes,Yes,24,Yes,No,Fiber optic,No,No,Yes,No,Yes,Yes,Two year,Yes,Electronic check,96.55,2263.45,No,,ToQer@face_book.com,15694,19/01/2021,49.0,21.0,28.0,139.0,16.0,11755.0,888.0,909.0,6.0,15.0,22.605444
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7706,5760-WRAHC,Female,1,No,No,22,Yes,No,DSL,Yes,No,Yes,Yes,No,Yes,Month-to-month,Yes,Mailed check,69.75,1545.40,No,,dWLPU@amail.comz,81440,13/01/2021,55.0,23.0,24.0,121.0,12.0,19941.0,838.0,710.0,1.0,224.0,10.148674
7710,7351-KYHQH,Female,1,No,No,7,Yes,No,DSL,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Electronic check,64.95,493.65,No,,VPdKY@gmail.com,87598,10/01/2021,61.0,12.0,33.0,133.0,10.0,2188.0,777.0,556.0,4.0,224.0,10.155405
7711,6261-LHRTG,Female,No,No,No,26,Yes,No,DSL,No,Yes,Yes,No,No,No,Month-to-month,Yes,Credit card (automatic),54.75,1406.90,No,,kWWJs@apple.com,95724,30/01/2021,40.0,73.0,43.0,165.0,24.0,10327.0,5479.0,9307.0,685.0,225.0,10.065166
7712,1728-BQDMA,Female,No,No,No,2,Yes,No,DSL,No,No,No,No,No,No,Month-to-month,No,Electronic check,44.45,82.70,No,,hvWGv@rocketmail.com,81300,17/01/2021,39.0,89.0,58.0,93.0,24.0,19240.0,5877.0,5635.0,881.0,225.0,10.065489


### Assignin new columns and replace

In [31]:
d_data['IsMarried'] = 'No'

In [32]:
d_data.head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,No
1,3831-YCPUO,Female,No,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,1506-YJTYT,Male,No,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No
3,2272-UOINI,Female,No,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,No
4,1641-BYBTK,Male,No,No,Yes,6,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,No


Replace values
- Replace values `No` to `0` in SeniorCitizen

In [33]:
d_data.SeniorCitizen.unique()

array(['1', 'No'], dtype=object)

In [34]:
d_data.loc[d_data.SeniorCitizen=='No', 'SeniorCitizen'] = 0

- Replace Values Internet Connection `No` to `Wireless`

In [35]:
d_data.InternetService.unique()

array(['DSL', 'Fiber optic', 'No'], dtype=object)

In [36]:
d_data.loc[d_data.InternetService=='No', 'InternetService'] = 'Wireless'

In [37]:
d_data.head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,No
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,No
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,No


### Exercise

1. Casting PostalCode integer to string
2. Fill missing values in `Gender` to `Not disclose`
3. Create new column based on `Tenure` if `Tenure > 50` then `Old` else `New` 
4. Filter DataFrame `Gender=Male` and `Partner=No` and `TotalCharges>=100`
5. Replace `Electronic check` in `PaymentMethod` to 'E-Wallet'

## Day 2

### Reindexing

In [38]:
d_partner = d_data[d_data["Partner"] == "Yes"].copy()

In [39]:
d_partner.loc[:5]

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,No
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No


In [40]:
d_partner.iloc[:5]

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,No
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No
6,3889-VWBID,Male,0,Yes,Yes,68,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Two year,No,Bank transfer (automatic),82.85,5776.45,No,,XDJUd@yahoo.com,23102,16/01/2021,36.0,42.0,54.0,82.0,26.0,4852.0,9187.0,8473.0,701.0,15.0,22.434314,No
8,7826-VVKWT,Female,1,Yes,Yes,24,Yes,No,Fiber optic,No,No,Yes,No,Yes,Yes,Two year,Yes,Electronic check,96.55,2263.45,No,,ToQer@face_book.com,15694,19/01/2021,49.0,21.0,28.0,139.0,16.0,11755.0,888.0,909.0,6.0,15.0,22.605444,No


In [41]:
d_partner.reset_index()

Unnamed: 0,index,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580,No
1,1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.00,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No
3,6,3889-VWBID,Male,0,Yes,Yes,68,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Two year,No,Bank transfer (automatic),82.85,5776.45,No,,XDJUd@yahoo.com,23102,16/01/2021,36.0,42.0,54.0,82.0,26.0,4852.0,9187.0,8473.0,701.0,15.0,22.434314,No
4,8,7826-VVKWT,Female,1,Yes,Yes,24,Yes,No,Fiber optic,No,No,Yes,No,Yes,Yes,Two year,Yes,Electronic check,96.55,2263.45,No,,ToQer@face_book.com,15694,19/01/2021,49.0,21.0,28.0,139.0,16.0,11755.0,888.0,909.0,6.0,15.0,22.605444,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3686,7697,3585-YNADK,Female,0,Yes,No,57,Yes,Yes,DSL,No,Yes,Yes,Yes,Yes,No,One year,No,Bank transfer (automatic),74.60,4368.95,No,,tnTMY@rocketmail.com,89994,29/01/2021,17.0,152.0,163.0,77.0,58.0,13469.0,5572.0,6198.0,7125.0,222.0,10.333354,No
3687,7700,6198-PNNSZ,Female,0,Yes,No,56,Yes,Yes,Fiber optic,Yes,Yes,Yes,No,Yes,Yes,One year,No,Bank transfer (automatic),109.80,6109.65,No,,LXrXM@fall.com.sg,95799,14/01/2021,35.0,82.0,73.0,83.0,26.0,4059.0,7287.0,7637.0,914.0,223.0,10.239645,No
3688,7703,1301-LOPVR,Male,0,Yes,Yes,29,No,No phone service,DSL,Yes,Yes,Yes,Yes,No,Yes,One year,Yes,Credit card (automatic),55.35,1636.95,No,,aqPfM@salesforce.com,95312,22/01/2021,29.0,74.0,81.0,60.0,29.0,3604.0,8086.0,8199.0,562.0,223.0,10.242967,No
3689,7708,2533-QVMSK,Male,0,Yes,No,61,Yes,Yes,Fiber optic,No,Yes,No,Yes,Yes,No,Two year,Yes,Electronic check,94.10,5638.30,Yes,,SWFgH@gmail.c,99227,03/01/2021,35.0,44.0,80.0,113.0,28.0,14335.0,9488.0,6511.0,648.0,224.0,10.152071,No


In [42]:
d_partner.reset_index(drop=True)

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580,No
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.00,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No
3,3889-VWBID,Male,0,Yes,Yes,68,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Two year,No,Bank transfer (automatic),82.85,5776.45,No,,XDJUd@yahoo.com,23102,16/01/2021,36.0,42.0,54.0,82.0,26.0,4852.0,9187.0,8473.0,701.0,15.0,22.434314,No
4,7826-VVKWT,Female,1,Yes,Yes,24,Yes,No,Fiber optic,No,No,Yes,No,Yes,Yes,Two year,Yes,Electronic check,96.55,2263.45,No,,ToQer@face_book.com,15694,19/01/2021,49.0,21.0,28.0,139.0,16.0,11755.0,888.0,909.0,6.0,15.0,22.605444,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3686,3585-YNADK,Female,0,Yes,No,57,Yes,Yes,DSL,No,Yes,Yes,Yes,Yes,No,One year,No,Bank transfer (automatic),74.60,4368.95,No,,tnTMY@rocketmail.com,89994,29/01/2021,17.0,152.0,163.0,77.0,58.0,13469.0,5572.0,6198.0,7125.0,222.0,10.333354,No
3687,6198-PNNSZ,Female,0,Yes,No,56,Yes,Yes,Fiber optic,Yes,Yes,Yes,No,Yes,Yes,One year,No,Bank transfer (automatic),109.80,6109.65,No,,LXrXM@fall.com.sg,95799,14/01/2021,35.0,82.0,73.0,83.0,26.0,4059.0,7287.0,7637.0,914.0,223.0,10.239645,No
3688,1301-LOPVR,Male,0,Yes,Yes,29,No,No phone service,DSL,Yes,Yes,Yes,Yes,No,Yes,One year,Yes,Credit card (automatic),55.35,1636.95,No,,aqPfM@salesforce.com,95312,22/01/2021,29.0,74.0,81.0,60.0,29.0,3604.0,8086.0,8199.0,562.0,223.0,10.242967,No
3689,2533-QVMSK,Male,0,Yes,No,61,Yes,Yes,Fiber optic,No,Yes,No,Yes,Yes,No,Two year,Yes,Electronic check,94.10,5638.30,Yes,,SWFgH@gmail.c,99227,03/01/2021,35.0,44.0,80.0,113.0,28.0,14335.0,9488.0,6511.0,648.0,224.0,10.152071,No


In [62]:
d_partner.reset_index(drop=True, inplace=True)

In [63]:
d_partner

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,IsMarried
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580,No
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,No
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.00,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,No
3,3889-VWBID,Male,0,Yes,Yes,68,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Two year,No,Bank transfer (automatic),82.85,5776.45,No,,XDJUd@yahoo.com,23102,16/01/2021,36.0,42.0,54.0,82.0,26.0,4852.0,9187.0,8473.0,701.0,15.0,22.434314,No
4,7826-VVKWT,Female,1,Yes,Yes,24,Yes,No,Fiber optic,No,No,Yes,No,Yes,Yes,Two year,Yes,Electronic check,96.55,2263.45,No,,ToQer@face_book.com,15694,19/01/2021,49.0,21.0,28.0,139.0,16.0,11755.0,888.0,909.0,6.0,15.0,22.605444,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3686,3585-YNADK,Female,0,Yes,No,57,Yes,Yes,DSL,No,Yes,Yes,Yes,Yes,No,One year,No,Bank transfer (automatic),74.60,4368.95,No,,tnTMY@rocketmail.com,89994,29/01/2021,17.0,152.0,163.0,77.0,58.0,13469.0,5572.0,6198.0,7125.0,222.0,10.333354,No
3687,6198-PNNSZ,Female,0,Yes,No,56,Yes,Yes,Fiber optic,Yes,Yes,Yes,No,Yes,Yes,One year,No,Bank transfer (automatic),109.80,6109.65,No,,LXrXM@fall.com.sg,95799,14/01/2021,35.0,82.0,73.0,83.0,26.0,4059.0,7287.0,7637.0,914.0,223.0,10.239645,No
3688,1301-LOPVR,Male,0,Yes,Yes,29,No,No phone service,DSL,Yes,Yes,Yes,Yes,No,Yes,One year,Yes,Credit card (automatic),55.35,1636.95,No,,aqPfM@salesforce.com,95312,22/01/2021,29.0,74.0,81.0,60.0,29.0,3604.0,8086.0,8199.0,562.0,223.0,10.242967,No
3689,2533-QVMSK,Male,0,Yes,No,61,Yes,Yes,Fiber optic,No,Yes,No,Yes,Yes,No,Two year,Yes,Electronic check,94.10,5638.30,Yes,,SWFgH@gmail.c,99227,03/01/2021,35.0,44.0,80.0,113.0,28.0,14335.0,9488.0,6511.0,648.0,224.0,10.152071,No


In [64]:
d_data.reset_index(drop=True, inplace=True)

### Dropping and Popping

In [45]:
d_data.columns

Index(['customer_id', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn',
       'InstallApp', 'email', 'kodepos', 'RecordedDate', 'age',
       'number_of_send_message', 'number_of_received_message',
       'minutes_of_call', 'num_of_call', 'internet_usage_megabytes',
       'netflix_usage_megabytes', 'youtube_usage_megabytes',
       'game_usage_megabytes', 'average_internet_ping',
       'average_internet_speed_in_megabytes', 'IsMarried'],
      dtype='object')

In [46]:
d_data.drop('IsMarried', axis=1)

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.00,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.50,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.20,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7709,6332-FBZRI,Male,0,Yes,Yes,67,Yes,Yes,DSL,Yes,Yes,Yes,Yes,No,No,One year,Yes,Credit card (automatic),69.35,4653.25,No,,dxJJX@amail.comz,90058,17/01/2021,40.0,88.0,84.0,107.0,23.0,26881.0,10126.0,5983.0,587.0,224.0,10.154946
7710,7351-KYHQH,Female,1,No,No,7,Yes,No,DSL,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Electronic check,64.95,493.65,No,,VPdKY@gmail.com,87598,10/01/2021,61.0,12.0,33.0,133.0,10.0,2188.0,777.0,556.0,4.0,224.0,10.155405
7711,6261-LHRTG,Female,0,No,No,26,Yes,No,DSL,No,Yes,Yes,No,No,No,Month-to-month,Yes,Credit card (automatic),54.75,1406.90,No,,kWWJs@apple.com,95724,30/01/2021,40.0,73.0,43.0,165.0,24.0,10327.0,5479.0,9307.0,685.0,225.0,10.065166
7712,1728-BQDMA,Female,0,No,No,2,Yes,No,DSL,No,No,No,No,No,No,Month-to-month,No,Electronic check,44.45,82.70,No,,hvWGv@rocketmail.com,81300,17/01/2021,39.0,89.0,58.0,93.0,24.0,19240.0,5877.0,5635.0,881.0,225.0,10.065489


In [47]:
s_is_married = d_data.pop('IsMarried')

In [48]:
s_is_married

0       No
1       No
2       No
3       No
4       No
        ..
7709    No
7710    No
7711    No
7712    No
7713    No
Name: IsMarried, Length: 7714, dtype: object

In [49]:
d_data.columns

Index(['customer_id', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn',
       'InstallApp', 'email', 'kodepos', 'RecordedDate', 'age',
       'number_of_send_message', 'number_of_received_message',
       'minutes_of_call', 'num_of_call', 'internet_usage_megabytes',
       'netflix_usage_megabytes', 'youtube_usage_megabytes',
       'game_usage_megabytes', 'average_internet_ping',
       'average_internet_speed_in_megabytes'],
      dtype='object')

### Numeric Calculation

In [50]:
d_data.dtypes

customer_id                             object
gender                                  object
SeniorCitizen                           object
Partner                                 object
Dependents                              object
tenure                                   int64
PhoneService                            object
MultipleLines                           object
InternetService                         object
OnlineSecurity                          object
OnlineBackup                            object
DeviceProtection                        object
TechSupport                             object
StreamingTV                             object
StreamingMovies                         object
Contract                                object
PaperlessBilling                        object
PaymentMethod                           object
MonthlyCharges                         float64
TotalCharges                           float64
Churn                                   object
InstallApp   

In [51]:
d_data['total_usage_internet'] = d_data.internet_usage_megabytes + d_data.netflix_usage_megabytes + d_data.youtube_usage_megabytes

In [52]:
d_data.head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,4204.0
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,49678.0
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,32996.0
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,30590.0
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,35319.0


### String Operation

In [53]:
d_data['customer_id_lower'] = d_data.customer_id.str.lower()

In [54]:
d_data

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580,4204.0,9381-ndkme
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,49678.0,3831-ycpuo
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.00,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,32996.0,1506-yjtyt
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.50,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,30590.0,2272-uoini
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.20,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,35319.0,1641-bybtk
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7709,6332-FBZRI,Male,0,Yes,Yes,67,Yes,Yes,DSL,Yes,Yes,Yes,Yes,No,No,One year,Yes,Credit card (automatic),69.35,4653.25,No,,dxJJX@amail.comz,90058,17/01/2021,40.0,88.0,84.0,107.0,23.0,26881.0,10126.0,5983.0,587.0,224.0,10.154946,42990.0,6332-fbzri
7710,7351-KYHQH,Female,1,No,No,7,Yes,No,DSL,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Electronic check,64.95,493.65,No,,VPdKY@gmail.com,87598,10/01/2021,61.0,12.0,33.0,133.0,10.0,2188.0,777.0,556.0,4.0,224.0,10.155405,3521.0,7351-kyhqh
7711,6261-LHRTG,Female,0,No,No,26,Yes,No,DSL,No,Yes,Yes,No,No,No,Month-to-month,Yes,Credit card (automatic),54.75,1406.90,No,,kWWJs@apple.com,95724,30/01/2021,40.0,73.0,43.0,165.0,24.0,10327.0,5479.0,9307.0,685.0,225.0,10.065166,25113.0,6261-lhrtg
7712,1728-BQDMA,Female,0,No,No,2,Yes,No,DSL,No,No,No,No,No,No,Month-to-month,No,Electronic check,44.45,82.70,No,,hvWGv@rocketmail.com,81300,17/01/2021,39.0,89.0,58.0,93.0,24.0,19240.0,5877.0,5635.0,881.0,225.0,10.065489,30752.0,1728-bqdma


### Exercise

In [55]:
d_data.dtypes

customer_id                             object
gender                                  object
SeniorCitizen                           object
Partner                                 object
Dependents                              object
tenure                                   int64
PhoneService                            object
MultipleLines                           object
InternetService                         object
OnlineSecurity                          object
OnlineBackup                            object
DeviceProtection                        object
TechSupport                             object
StreamingTV                             object
StreamingMovies                         object
Contract                                object
PaperlessBilling                        object
PaymentMethod                           object
MonthlyCharges                         float64
TotalCharges                           float64
Churn                                   object
InstallApp   

1. drop column `InstallApp`
2. pop column `Churn` and assign to `target` variable
3. replace `Credit card (automatic)` in PaymentMethod to `Credit card`
4. `minutes of call` divided by `num_of_call` and assign to `average_call_duration`

## Day 3

### Sorting

In [58]:
d_data.head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,4204.0,9381-ndkme
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,49678.0,3831-ycpuo
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,32996.0,1506-yjtyt
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,30590.0,2272-uoini
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,35319.0,1641-bybtk


In [67]:
d_data.sort_values(by='tenure').head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower
258,7644-OMVMY,Male,0,Yes,Yes,0,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,19.85,90.6,No,,zaWGT@face_book.com,38433,19/01/2021,42.0,25.0,28.0,145.0,12.0,18722.0,591.0,758.0,7.0,21.0,24.455995,20071.0,7644-omvmy
2799,1371-DWPAZ,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,Yes,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,62.9,No,,mYJhn@salesforce.com,34929,16/01/2021,30.0,81.0,82.0,92.0,28.0,12165.0,6760.0,9175.0,807.0,77.0,22.119777,28100.0,1371-dwpaz
4278,2520-SGTTA,Female,0,Yes,Yes,0,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.0,3847.6,No,,hHHzu@face_book.com,97048,07/01/2021,34.0,68.0,58.0,70.0,24.0,3960.0,5881.0,9529.0,727.0,132.0,17.897441,19370.0,2520-sgtta
2091,2923-ARZLG,Male,0,Yes,Yes,0,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,19.7,979.5,No,,DFTwR@fall.com.sg,42239,02/01/2021,26.0,50.0,56.0,77.0,26.0,3444.0,7068.0,8112.0,907.0,61.0,23.12874,18624.0,2923-arzlg
2281,2775-SEFEE,Male,0,No,Yes,0,Yes,Yes,DSL,Yes,Yes,No,Yes,No,No,Two year,Yes,Bank transfer (automatic),61.9,929.2,No,,eyUWI@outlook.com,48936,19/01/2021,33.0,40.0,57.0,144.0,25.0,19500.0,9358.0,6124.0,1018.0,65.0,22.896684,34982.0,2775-sefee


In [68]:
d_data.sort_values(by='tenure', ascending=False).head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower
1977,2226-ICFDO,Female,0,Yes,Yes,72,Yes,Yes,DSL,Yes,No,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),85.9,6110.75,No,,PhMti@yahoo.com,59315,25/01/2021,40.0,59.0,44.0,72.0,26.0,16208.0,6682.0,5342.0,776.0,58.0,23.323126,28232.0,2226-icfdo
5324,2234-EOFPT,Male,0,Yes,No,72,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Two year,Yes,Bank transfer (automatic),79.35,5753.25,No,,IKYfA@salesforce.com,87129,27/01/2021,36.0,72.0,72.0,126.0,29.0,7749.0,6649.0,7965.0,766.0,217.0,10.762449,22363.0,2234-eofpt
2361,5727-MYATE,Male,0,Yes,Yes,72,Yes,Yes,DSL,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),90.8,6397.6,No,,yvVRD@amail.comz,43033,16/01/2021,29.0,86.0,89.0,60.0,24.0,17117.0,7398.0,5997.0,768.0,67.0,22.757218,30512.0,5727-myate
518,9127-FHJBZ,Male,0,Yes,Yes,72,Yes,Yes,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Electronic check,114.0,8093.15,No,,avYZd@amail.comz,18623,03/01/2021,35.0,54.0,59.0,135.0,28.0,10772.0,5339.0,9037.0,934.0,26.0,24.680325,25148.0,9127-fhjbz
4583,4079-WWQQQ,Male,0,No,No,72,No,No phone service,DSL,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Bank transfer (automatic),65.55,4807.45,No,,sQNie@yahoo.com,76947,18/01/2021,27.0,83.0,47.0,79.0,24.0,21237.0,7776.0,9844.0,894.0,141.0,17.307997,38857.0,4079-wwqqq


In [66]:
d_data.sort_values(by=['average_internet_speed_in_megabytes', 'average_internet_ping'])

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower
5946,1121-QSIVB,Female,0,No,Yes,44,Yes,Yes,DSL,No,Yes,No,No,Yes,Yes,One year,Yes,Mailed check,77.55,3471.10,No,,CgMRz@gmail.c,80859,28/01/2021,41.0,29.0,29.0,125.0,15.0,11211.0,766.0,655.0,2.0,296.0,5.247223,12632.0,1121-qsivb
5938,8996-ZROXE,Male,1,No,No,57,Yes,No,DSL,No,No,Yes,Yes,No,No,One year,Yes,Electronic check,53.50,3035.80,No,,PbaRg@outlook.com,94456,30/01/2021,57.0,27.0,22.0,129.0,16.0,16544.0,928.0,625.0,6.0,295.0,5.255614,18097.0,8996-zroxe
5939,5879-SESNB,Female,0,No,No,11,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,No,Electronic check,75.25,888.65,No,,jAFin@face_book.com,98947,09/01/2021,27.0,70.0,63.0,135.0,20.0,28256.0,9455.0,7036.0,719.0,295.0,5.265765,44747.0,5879-sesnb
5947,9572-WUKSB,Male,0,Yes,No,3,No,No phone service,DSL,No,No,No,Yes,No,No,Month-to-month,Yes,Electronic check,29.90,92.25,No,,oUHZQ@face_book.com,84670,21/01/2021,32.0,87.0,81.0,69.0,23.0,23709.0,5596.0,8366.0,1003.0,296.0,5.268852,37671.0,9572-wuksb
5948,2408-PSJVE,,0,Yes,Yes,44,Yes,No,DSL,No,Yes,Yes,Yes,No,No,Month-to-month,Yes,Mailed check,61.90,2924.05,No,,DTmGL@outlook.com,84797,06/01/2021,37.0,65.0,42.0,116.0,29.0,7573.0,6977.0,9997.0,991.0,296.0,5.273158,24547.0,2408-psjve
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
579,2522-AHJXR,Male,0,Yes,No,60,Yes,Yes,Fiber optic,Yes,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Bank transfer (automatic),109.45,6572.85,No,,ucUFh@outlook.com,33964,01/01/2021,17.0,171.0,178.0,35.0,55.0,49226.0,7246.0,7603.0,4620.0,27.0,24.729088,64075.0,2522-ahjxr
580,8792-AOROI,Female,0,Yes,No,8,Yes,No,DSL,Yes,No,No,Yes,No,Yes,Two year,No,Mailed check,65.50,564.35,No,,jGThz@salesforce.com,16957,19/01/2021,33.0,71.0,73.0,123.0,21.0,22337.0,6938.0,9534.0,875.0,27.0,24.729408,38809.0,8792-aoroi
581,8760-ZRHKE,Female,1,Yes,Yes,71,Yes,Yes,DSL,Yes,Yes,No,No,Yes,No,One year,No,Electronic check,69.20,4982.50,No,,pLBeo@yahoo.com,43404,05/01/2021,66.0,23.0,36.0,261.0,12.0,1661.0,882.0,1009.0,2.0,27.0,24.730276,3552.0,8760-zrhke
582,8016-NCFVO,Male,1,No,No,55,Yes,Yes,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,116.50,6382.55,No,Yes,QapeG@outlook.com,50322,24/01/2021,46.0,23.0,27.0,130.0,14.0,18328.0,793.0,777.0,8.0,27.0,24.735049,19898.0,8016-ncfvo


### Grouping

In [81]:
help(d_data.groupby('gender'))

Help on DataFrameGroupBy in module pandas.core.groupby.generic object:

class DataFrameGroupBy(pandas.core.groupby.groupby.GroupBy)
 |  DataFrameGroupBy(*args, **kwds)
 |  
 |  Class for grouping and aggregating relational data.
 |  
 |  See aggregate, transform, and apply functions on this object.
 |  
 |  It's easiest to use obj.groupby(...) to use GroupBy, but you can also do:
 |  
 |  ::
 |  
 |      grouped = groupby(obj, ...)
 |  
 |  Parameters
 |  ----------
 |  obj : pandas object
 |  axis : int, default 0
 |  level : int, default None
 |      Level of MultiIndex
 |  groupings : list of Grouping objects
 |      Most users should ignore this
 |  exclusions : array-like, optional
 |      List of columns to exclude
 |  name : str
 |      Most users should ignore this
 |  
 |  Returns
 |  -------
 |  **Attributes**
 |  groups : dict
 |      {group name -> group labels}
 |  len(grouped) : int
 |      Number of groups
 |  
 |  Notes
 |  -----
 |  After grouping, see aggregate, apply

In [79]:
d_data.groupby('gender').mean()

Unnamed: 0_level_0,tenure,MonthlyCharges,TotalCharges,kodepos,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet
gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
Female,32.397645,65.110218,2303.662695,55709.343609,36.332525,71.240388,72.160374,115.612747,26.595774,16249.278143,5746.297541,5754.763422,1875.272601,105.206789,19.519505,27750.339106
Male,32.593198,64.483928,2297.634912,54330.479827,35.91964,72.05902,73.78126,114.233411,26.926642,16278.953651,5752.731911,5717.365122,1905.321774,102.262421,19.740241,27749.050684


In [77]:
d_data.groupby(by='gender').mean().T

gender,Female,Male
tenure,32.397645,32.593198
MonthlyCharges,65.110218,64.483928
TotalCharges,2303.662695,2297.634912
kodepos,55709.343609,54330.479827
age,36.332525,35.91964
number_of_send_message,71.240388,72.05902
number_of_received_message,72.160374,73.78126
minutes_of_call,115.612747,114.233411
num_of_call,26.595774,26.926642
internet_usage_megabytes,16249.278143,16278.953651


### Apply and Map Function

Common functions `apply()` `applymap()` `map()`

In [87]:
d_data.head()

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower,ping_class
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.3,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.42558,4204.0,9381-ndkme,low
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.5,7854.9,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,49678.0,3831-ycpuo,low
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.0,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,32996.0,1506-yjtyt,low
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.5,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,30590.0,2272-uoini,low
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.2,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,35319.0,1641-bybtk,low


`apply()`

In [95]:
d_data['total_messages'] = d_data.apply(lambda row: int(row.number_of_send_message + row.number_of_send_message), axis = 1)

In [96]:
d_data

Unnamed: 0,customer_id,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,InstallApp,email,kodepos,RecordedDate,age,number_of_send_message,number_of_received_message,minutes_of_call,num_of_call,internet_usage_megabytes,netflix_usage_megabytes,youtube_usage_megabytes,game_usage_megabytes,average_internet_ping,average_internet_speed_in_megabytes,total_usage_internet,customer_id_lower,ping_class,total_messages
0,9381-NDKME,Female,1,Yes,No,24,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Credit card (automatic),40.65,933.30,Yes,,CjnfS@rocketmail.com,51365,30/01/2021,67.0,10.0,34.0,126.0,14.0,2629.0,976.0,599.0,8.0,15.0,20.425580,4204.0,9381-ndkme,low,20
1,3831-YCPUO,Female,0,Yes,Yes,72,Yes,No,Fiber optic,Yes,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Credit card (automatic),109.50,7854.90,No,,zzKVD@fall.com.sg,41104,03/01/2021,24.0,135.0,95.0,70.0,42.0,37258.0,7090.0,5330.0,6664.0,15.0,21.759168,49678.0,3831-ycpuo,low,270
2,1506-YJTYT,Male,0,Yes,Yes,45,Yes,Yes,DSL,Yes,Yes,No,Yes,Yes,No,Two year,No,Credit card (automatic),73.85,3371.00,No,,KLfPl@apple.com,59928,01/01/2021,37.0,78.0,79.0,168.0,22.0,19331.0,7069.0,6596.0,761.0,15.0,22.123215,32996.0,1506-yjtyt,low,156
3,2272-UOINI,Female,0,No,No,7,Yes,No,DSL,Yes,Yes,No,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,78.50,571.05,No,,RmCEn@fall.com.sg,55765,07/01/2021,39.0,66.0,89.0,166.0,28.0,16221.0,8202.0,6167.0,608.0,15.0,22.169331,30590.0,2272-uoini,low,132
4,1641-BYBTK,Male,0,No,Yes,6,Yes,No,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Mailed check,20.20,98.35,No,,dLZoI@apple.com,35433,26/01/2021,37.0,66.0,70.0,132.0,22.0,17526.0,9309.0,8484.0,613.0,15.0,22.263776,35319.0,1641-bybtk,low,132
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5981,8779-QRDMV,Male,1,No,No,1,No,No phone service,DSL,No,No,Yes,No,No,Yes,Month-to-month,Yes,Electronic check,39.65,39.65,Yes,,qEant@rocketmail.com,87516,01/01/2021,63.0,17.0,38.0,244.0,12.0,3046.0,937.0,777.0,8.0,299.0,6.436184,4760.0,8779-qrdmv,high,34
5982,7663-ZTEGJ,Male,0,No,Yes,29,Yes,Yes,Fiber optic,No,Yes,Yes,Yes,No,Yes,One year,No,Credit card (automatic),100.55,2830.45,No,,QIWoD@gmail.com,91526,09/01/2021,26.0,41.0,74.0,130.0,20.0,15494.0,7110.0,9321.0,583.0,299.0,6.527040,31925.0,7663-ztegj,high,82
5983,4238-JSSWH,Female,1,Yes,No,35,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,No,Bank transfer (automatic),102.05,3452.55,No,,fPEJU@yahoo.com,96751,12/01/2021,53.0,22.0,25.0,122.0,14.0,9380.0,897.0,969.0,6.0,299.0,6.782600,11246.0,4238-jsswh,high,44
5984,2740-JFBOK,Male,0,No,No,10,Yes,Yes,Wireless,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,Yes,Electronic check,24.00,226.55,No,,WAqqK@amail.comz,83782,20/01/2021,20.0,96.0,177.0,53.0,33.0,45238.0,7397.0,6438.0,2132.0,299.0,7.825139,59073.0,2740-jfbok,high,192


`applymap()`

In [89]:
d_data[['number_of_send_message', 'number_of_received_message']].applymap(lambda x: x*2)

Unnamed: 0,number_of_send_message,number_of_received_message
0,20.0,68.0
1,270.0,190.0
2,156.0,158.0
3,132.0,178.0
4,132.0,140.0
...,...,...
5981,34.0,76.0
5982,82.0,148.0
5983,44.0,50.0
5984,192.0,354.0


In [92]:
d_data.number_of_send_message.map(lambda x: int(x))

0        10
1       135
2        78
3        66
4        66
       ... 
5981     17
5982     41
5983     22
5984     96
5985     56
Name: number_of_send_message, Length: 5986, dtype: int64

### Append, Join, Merge, Concate 2 or More DataFrame

### Pivot and Stack

### Exercise

## Day 4

### Datetime

1. String to datetime format

In [13]:
d_data['RecordedDate'].head()

0    30/01/2021
1    03/01/2021
2    01/01/2021
3    07/01/2021
4    26/01/2021
Name: RecordedDate, dtype: object

In [15]:
## YYYY-MM-DD
d_data['RecordedDate_updated'] = pd.to_datetime(d_data['RecordedDate'])
d_data['RecordedDate_updated'].head()

0   2021-01-30
1   2021-03-01
2   2021-01-01
3   2021-07-01
4   2021-01-26
Name: RecordedDate_updated, dtype: datetime64[ns]

2. Datetime to string format

In [19]:
## MM-DD-YYYY
d_data['RecordedDate_updated_2'] = d_data['RecordedDate_updated'].dt.strftime('%m-%d-%Y')
d_data['RecordedDate_updated_2'].head()

0    01-30-2021
1    03-01-2021
2    01-01-2021
3    07-01-2021
4    01-26-2021
Name: RecordedDate_updated_2, dtype: object

In [20]:
## MM/DD/YYYY
d_data['RecordedDate_updated_3'] = d_data['RecordedDate_updated'].dt.strftime('%m/%d/%Y')
d_data['RecordedDate_updated_3'].head()

0    01/30/2021
1    03/01/2021
2    01/01/2021
3    07/01/2021
4    01/26/2021
Name: RecordedDate_updated_3, dtype: object

3. Timedelta

The date units are years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’), while the time units are hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’)

Source: https://numpy.org/doc/stable/reference/arrays.datetime.html

In [23]:
sample_time = d_data['RecordedDate_updated'][:2].values
sample_time

array(['2021-01-30T00:00:00.000000000', '2021-03-01T00:00:00.000000000'],
      dtype='datetime64[ns]')

In [25]:
# Time delta is in nano seconds (10^9 seconds), but it seems too high, let us change the units...

timedelta = sample_time[1] - sample_time[0]
timedelta

numpy.timedelta64(2592000000000000,'ns')

In [26]:
# Time delta in days

timedelta_days = timedelta.astype('timedelta64[D]')
timedelta_days

numpy.timedelta64(30,'D')

In [27]:
# Time delta in weeks

timedelta_weeks = timedelta.astype('timedelta64[W]')
timedelta_weeks

numpy.timedelta64(4,'W')

4. Timezone

In [36]:
import pytz
import datetime

In [30]:
pytz.all_timezones

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau',
 'Africa/Blantyre',
 'Africa/Brazzaville',
 'Africa/Bujumbura',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/Conakry',
 'Africa/Dakar',
 'Africa/Dar_es_Salaam',
 'Africa/Djibouti',
 'Africa/Douala',
 'Africa/El_Aaiun',
 'Africa/Freetown',
 'Africa/Gaborone',
 'Africa/Harare',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Kampala',
 'Africa/Khartoum',
 'Africa/Kigali',
 'Africa/Kinshasa',
 'Africa/Lagos',
 'Africa/Libreville',
 'Africa/Lome',
 'Africa/Luanda',
 'Africa/Lubumbashi',
 'Africa/Lusaka',
 'Africa/Malabo',
 'Africa/Maputo',
 'Africa/Maseru',
 'Africa/Mbabane',
 'Africa/Mogadishu',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Niamey',
 'Africa/Nouakchott',
 'Africa/Ouagadougou',
 'Africa/Porto-Novo',
 'Africa/Sao_Tome',
 'Africa/Timbuktu',
 'Africa/

In [40]:
# You can now see the +07 timezone set to Jakarta time
d_data['RecordedDate_updated'].dt.tz_localize('Asia/Jakarta')

0      2021-01-30 00:00:00+07:00
1      2021-03-01 00:00:00+07:00
2      2021-01-01 00:00:00+07:00
3      2021-07-01 00:00:00+07:00
4      2021-01-26 00:00:00+07:00
                  ...           
7709   2021-01-17 00:00:00+07:00
7710   2021-10-01 00:00:00+07:00
7711   2021-01-30 00:00:00+07:00
7712   2021-01-17 00:00:00+07:00
7713   2021-01-25 00:00:00+07:00
Name: RecordedDate_updated, Length: 7714, dtype: datetime64[ns, Asia/Jakarta]

In [41]:
# Lets change to Kuala Lumpur +08 time
d_data['RecordedDate_updated'].dt.tz_localize('Asia/Jakarta').dt.tz_convert('Asia/Kuala_Lumpur')

0      2021-01-30 01:00:00+08:00
1      2021-03-01 01:00:00+08:00
2      2021-01-01 01:00:00+08:00
3      2021-07-01 01:00:00+08:00
4      2021-01-26 01:00:00+08:00
                  ...           
7709   2021-01-17 01:00:00+08:00
7710   2021-10-01 01:00:00+08:00
7711   2021-01-30 01:00:00+08:00
7712   2021-01-17 01:00:00+08:00
7713   2021-01-25 01:00:00+08:00
Name: RecordedDate_updated, Length: 7714, dtype: datetime64[ns, Asia/Kuala_Lumpur]

### Brief Timeseries

- pandas string to datetime, timedelta, timezone
- pandas resampling & aggregate

### Window Functions

- Basic Rolling window

### Basic Plotting

- Line
- Title
- XY axis label
- Styling (color, shape)

### Exercise

1. 
2.
3.

# End

#### Book Recommendation