# Lecture 08: Data Analysis in Python: Pandas

Instructor:
<br>**Md Shahidullah Kawsar**
<br>Data Scientist
<br>IDARE, Houston, TX, USA

#### Objectives:
1. How to look at the data?
2. Good data or bad data?
3. Data Statistics

### What is customer churn?
Source: DataCamp and Mark Peterson

Customer churn occurs when an existing customer, user, player, subscriber, or any other kind of returning client stops doing business with a company or ends their relationship with it.

**Contractual churn:** When a customer is under contract for a service and decides to cancel it. Examples: cable TV and SaaS products (Software as a Service, e.g., Dropbox).

**Voluntary churn:** When a user voluntarily cancels a service. Examples: prepaid cell phones and streaming subscriptions.

**Non-contractual churn:** When a customer who is not under contract for a service stops using it. Examples: customer loyalty at a retail location, or online browsing.

**Involuntary churn:** When churn occurs without the customer requesting it. Examples: credit card expiration, or utilities being shut off by the provider.

As a customer, you have most likely cancelled a service at some point, for reasons such as lack of usage, poor service, or a better price elsewhere.

Dataset: a cellular usage dataset consisting of records of actual cell phone customers, with features such as:

1. **Account_Length**: the number of days the customer has had the subscription with the telecom company

2. **Vmail_Message**: the total number of voicemails the customer has sent

3. **Total_mins**: the total number of minutes the customer has talked on the phone

4. **CustServ_Calls**: the number of customer service calls the customer made

5. **Churn**: yes or no, indicating whether the customer has churned

6. **Intl_Plan**: yes or no, indicating whether the customer has an international plan

7. **Vmail_Plan**: yes or no, indicating whether the customer has a voicemail plan

8. **Total_calls**: the total number of calls the customer has made

9. **Total_charges**: the total bill amount (in $) the customer has paid

In [1]:
import pandas as pd

In [10]:
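# read the CSV file into a DataFrame
df = pd.read_csv("telecom_data.csv")

# print(df)  # plain-text printout of the DataFrame (long output is truncated)
display(df.head())  # first 5 rows as a table; display() is available in Jupyter/IPython

# display(df.tail(15))  # last 15 rows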

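|   | Account_Length | Vmail_Message | CustServ_Calls | Churn | Intl_Plan | Vmail_Plan | Total_mins | Total_calls | Total_charges |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 25 | 1 | no | no | yes | 717.2 | 303 | 320.26 |
| 1 | 107 | 26 | 1 | no | no | yes | 625.2 | 332 | 313.64 |
| 2 | 137 | 0 | 0 | no | no | no | 539.4 | 333 | 224.89 |
| 3 | 84 | 0 | 2 | no | yes | no | 564.8 | 255 | 263.7 |
| 4 | 75 | 0 | 3 | no | yes | no | 512.0 | 359 | 238.99 |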
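To practice the first objective (how to look at the data) without the CSV file, the five rows printed by `df.head()` can be rebuilt by hand. This is a minimal sketch: `sample` is a hypothetical name, and it holds only those five rows, not the full dataset.

```python
import pandas as pd

# Hypothetical stand-in: the five rows shown by df.head() above, typed in by hand
sample = pd.DataFrame({
    "Account_Length": [128, 107, 137, 84, 75],
    "Vmail_Message": [25, 26, 0, 0, 0],
    "CustServ_Calls": [1, 1, 0, 2, 3],
    "Churn": ["no", "no", "no", "no", "no"],
    "Intl_Plan": ["no", "no", "no", "yes", "yes"],
    "Vmail_Plan": ["yes", "yes", "no", "no", "no"],
    "Total_mins": [717.2, 625.2, 539.4, 564.8, 512.0],
    "Total_calls": [303, 332, 333, 255, 359],
    "Total_charges": [320.26, 313.64, 224.89, 263.7, 238.99],
})

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names
print(sample.head(2))        # first 2 rows
print(sample.tail(2))        # last 2 rows
```

On the real `df`, `df.shape` should report 3333 rows and 9 columns, which the `df.info()` cell confirms.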


In [11]:
df.info()  # info() prints its summary directly and returns None, so print() is not needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Account_Length  3333 non-null   int64  
 1   Vmail_Message   3333 non-null   int64  
 2   CustServ_Calls  3333 non-null   int64  
 3   Churn           3333 non-null   object 
 4   Intl_Plan       3333 non-null   object 
 5   Vmail_Plan      3333 non-null   object 
 6   Total_mins      3333 non-null   float64
 7   Total_calls     3333 non-null   int64  
 8   Total_charges   3333 non-null   float64
dtypes: float64(2), int64(4), object(3)
memory usage: 234.5+ KB
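The Non-Null Count column answers the "good data or bad data?" question only indirectly. A more direct check is `isnull().sum()`, which counts missing values per column. Because every column of the telecom data is 3333 non-null, this sketch uses a small hypothetical DataFrame (`toy`) with missing values inserted by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN/None added on purpose, since telecom_data.csv has none
toy = pd.DataFrame({
    "Account_Length": [128, 107, np.nan, 84],
    "Churn": ["no", None, "no", "yes"],
})

missing_per_column = toy.isnull().sum()  # number of missing cells in each column
print(missing_per_column)

has_missing = toy.isnull().values.any()  # True if any cell in the table is missing
print(has_missing)

clean = toy.dropna()  # keep only rows with no missing values
print(clean)
```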


In [12]:
display(df.describe())

|       | Account_Length | Vmail_Message | CustServ_Calls | Total_mins | Total_calls | Total_charges |
|---|---|---|---|---|---|---|
| count | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 |
| mean | 101.064806 | 8.09901 | 1.562856 | 591.864776 | 305.137114 | 260.321791 |
| std | 39.822106 | 13.688365 | 1.315491 | 89.954251 | 34.448164 | 53.810896 |
| min | 1.0 | 0.0 | 0.0 | 284.3 | 191.0 | 68.37 |
| 25% | 74.0 | 0.0 | 1.0 | 531.5 | 282.0 | 224.22 |
| 50% | 101.0 | 0.0 | 1.0 | 593.6 | 305.0 | 260.56 |
| 75% | 127.0 | 20.0 | 2.0 | 652.4 | 328.0 | 295.41 |
| max | 243.0 | 51.0 | 9.0 | 885.0 | 416.0 | 460.63 |
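By default, `describe()` summarizes only numeric columns, which is why Churn, Intl_Plan, and Vmail_Plan are missing from the table above. Passing `include="all"` also summarizes text columns with `count`, `unique`, `top` (most frequent value), and `freq` (how often `top` appears). A minimal sketch on the text values from the five `head()` rows; on the full `df`, the same call would put numeric and text statistics in one table.

```python
import pandas as pd

# Text columns from the five rows shown by df.head() above
# (a small hand-typed sample, not the full 3333-row dataset)
sample = pd.DataFrame({
    "Churn": ["no", "no", "no", "no", "no"],
    "Intl_Plan": ["no", "no", "no", "yes", "yes"],
    "Vmail_Plan": ["yes", "yes", "no", "no", "no"],
})

# include="all" also summarizes non-numeric columns:
# count, unique (distinct values), top (most frequent value), freq (its count)
summary = sample.describe(include="all")
print(summary)
```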


In [13]:
import numpy as np

In [25]:
arr = np.array([1,-1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,3,2,4,5,7,6,8,10,9])

avg = np.mean(arr)
print(avg)

median = np.median(arr)
print(median)

2.68
1.0


In [26]:
sorted(arr)

[-1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
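The sorted values explain the gap between the two statistics: 14 of the 25 values are 1, so the middle (13th) value is 1, while the larger values 2 to 10 pull the mean up to 2.68. The sketch below appends one hypothetical outlier (1000) to show that the mean is sensitive to extreme values and the median is not. The same pattern appears in the `describe()` output above: Vmail_Message has a mean of about 8.1 but a median (50%) of 0, a sign of a skewed column.

```python
import numpy as np

arr = np.array([1,-1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,3,2,4,5,7,6,8,10,9])

# Append one hypothetical outlier to see how each statistic reacts
with_outlier = np.append(arr, 1000)

print(np.mean(arr), np.median(arr))                    # 2.68 1.0
print(np.mean(with_outlier), np.median(with_outlier))  # mean jumps to about 41.04, median stays 1.0
```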

In [28]:
df['Churn'].value_counts()

no     2850
yes     483
Name: Churn, dtype: int64
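Raw counts are easier to compare as proportions: `value_counts(normalize=True)` returns each value's share instead of its count. Since the CSV is not bundled here, this sketch rebuilds a Churn column from the counts printed above; on the real data, `df['Churn'].value_counts(normalize=True)` should give the same shares.

```python
import pandas as pd

# Rebuilt from the printed counts (2850 "no", 483 "yes"), not read from the CSV
churn = pd.Series(["no"] * 2850 + ["yes"] * 483, name="Churn")

counts = churn.value_counts()               # absolute counts
rates = churn.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
print(rates.round(3))  # no: 0.855, yes: 0.145
```

Roughly 14.5% of the customers (483 of 3333) churned.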

In [29]:
display(df['Intl_Plan'].value_counts())

no     3010
yes     323
Name: Intl_Plan, dtype: int64

In [30]:
df['Vmail_Plan'].value_counts()

no     2411
yes     922
Name: Vmail_Plan, dtype: int64

In [31]:
df['CustServ_Calls'].value_counts()

1    1181
2     759
0     697
3     429
4     166
5      66
6      22
7       9
9       2
8       2
Name: CustServ_Calls, dtype: int64
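`value_counts()` orders results by frequency, which scrambles the natural order of a numeric column like CustServ_Calls (9 is printed before 8). Chaining `.sort_index()` orders the counts by number of calls instead. This sketch rebuilds the column from the printed counts; the name `heavy_callers` and the threshold of 4 calls are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Rebuilt from the counts printed above, not read from the CSV
counts_by_calls = {0: 697, 1: 1181, 2: 759, 3: 429, 4: 166, 5: 66, 6: 22, 7: 9, 8: 2, 9: 2}
calls = pd.Series(
    np.repeat(list(counts_by_calls.keys()), list(counts_by_calls.values())),
    name="CustServ_Calls",
)

ordered = calls.value_counts().sort_index()  # counts ordered by number of calls
print(ordered)

# Hypothetical threshold: share of customers with 4 or more customer service calls
heavy_callers = (calls >= 4).mean()  # mean of a boolean Series = fraction of True
print(round(heavy_callers, 3))       # about 0.08 (267 of 3333 customers)
```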

#### Course outcome:

1. How to use Anaconda, Jupyter Notebook, and GitHub
2. Improved our logical thinking by playing Blockly Games: Maze, Bird, Turtle
3. Learned about Python data types, mathematical operations, and comparison, logical, and membership operators
4. Learned if-elif-else conditions, while loops, and for loops
5. Learned string manipulation
6. Python lists: list slicing, and changing, adding, and removing list elements
7. Python data structures: list, tuple, dictionary, set
8. How to write functions in Python
9. Solved 10 LeetCode problems and learned how to solve the same problem from different perspectives
10. Learned how to use math functions and NumPy operations
11. Comparison between Python lists and NumPy arrays
12. Mathematical and matrix operations on NumPy arrays
13. NumPy array slicing and filtering
14. Reading a CSV file using Pandas, finding missing values, and computing data statistics and detailed information
