## EDA (Exploratory Data Analysis)
In this section our goal is to derive key insights from the data before we continue to clean, transform and manipulate it for analysis. We would like to understand what drives churn and which factors, if any, are associated with higher churn rates. Some questions we would like to answer are:
1. Do customers who pay more churn at a higher rate? (Numeric Features vs Churn)
2. Do customers with a lower tenure tend to churn at a higher rate? (Numeric Features vs Churn)
3. Which categories are linked to higher churn rates? (Categorical Features vs Churn)
4. Are certain combinations especially risky? (Cross Feature Insights)

In [60]:
import pandas as pd
import matplotlib

data = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

We begin by inspecting the columns available in the data set so we can locate the ones relevant to our questions.

In [61]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


When we run the code "data.info()" we receive some basic information about our data set. Looking at the column list, the columns most relevant to our questions are tenure, MonthlyCharges, TotalCharges, Churn, Contract and PaymentMethod. Let's go ahead and create a data frame with just these columns.

In [62]:
edaMetrics = data[["tenure","MonthlyCharges","TotalCharges","Churn","Contract","PaymentMethod"]]
edaMetrics.head()

   tenure  MonthlyCharges  TotalCharges  Churn        Contract              PaymentMethod
0       1           29.85         29.85     No  Month-to-month           Electronic check
1      34           56.95        1889.5     No        One year               Mailed check
2       2           53.85        108.15    Yes  Month-to-month               Mailed check
3      45            42.3       1840.75     No        One year  Bank transfer (automatic)
4       2            70.7        151.65    Yes  Month-to-month           Electronic check
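The "data.info()" listing above is cut off before TotalCharges, so its dtype is not visible here. If that column turns out to have been read as text (object) rather than as a number, it needs converting before any averages are taken on it. Below is a minimal sketch of that check, assuming missing values appear as blank strings; the `toy` frame and its values are hypothetical and only stand in for edaMetrics.

```python
import pandas as pd

# Hypothetical toy frame standing in for edaMetrics; values are illustrative only.
toy = pd.DataFrame({
    "tenure": [1, 34, 0],
    "TotalCharges": ["29.85", "1889.5", " "],  # assumed: a blank string marks a missing value
})

# Convert text to numbers; anything that cannot be parsed (such as " ") becomes NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

print(toy.dtypes)                        # TotalCharges is now a float column
print(toy["TotalCharges"].isna().sum())  # number of rows that failed to parse
```

Running the same two lines on the real column would show how many rows, if any, need attention before TotalCharges is used in a calculation.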


The columns we selected are now gathered in one data frame, which makes them easier to inspect. Let's begin by answering our first question.
1. Do customers who pay more tend to churn at a higher rate?

We can answer this with a groupby statement: grouping by the "Churn" column and taking the mean of the "MonthlyCharges" column gives the average monthly charge for customers who churned and for customers who stayed.

In [63]:
"$" + edaMetrics.groupby("Churn")["MonthlyCharges"].mean().round(1).astype(str)

Churn
No     $61.3
Yes    $74.4
Name: MonthlyCharges, dtype: object

We are able to see that customers who stayed paid $61.3 a month on average, while those who churned paid $74.4, so churned customers paid about $13 more per month on average. A difference in averages does not show that price was the sole cause of churn, but it may be a contributing factor, so we should take it into account when conducting further churn analysis.
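A comparison of averages shows that churned customers paid more, but it does not directly give a churn rate for higher-paying customers. One possible complement, sketched below, is to split customers into charge bands and compute the share who churned in each band. The `toy` data, the three bands and the names `churned`, `charge_band` and `churn_by_band` are all assumptions made for illustration; none of them come from the Telco file.

```python
import pandas as pd

# Hypothetical toy data; not taken from the Telco file.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, 25.0, 30.0, 45.0, 55.0, 60.0,
                       70.0, 80.0, 90.0, 100.0, 105.0, 110.0],
    "Churn": ["No", "No", "No", "No", "Yes", "No",
              "Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# 1 if the customer churned, 0 otherwise, so the mean of this column is a churn rate.
toy["churned"] = (toy["Churn"] == "Yes").astype(int)

# Split customers into three equal-sized bands by monthly charge.
toy["charge_band"] = pd.qcut(toy["MonthlyCharges"], q=3, labels=["low", "mid", "high"])

# Churn rate ("mean") and number of customers ("count") in each band.
churn_by_band = toy.groupby("charge_band", observed=True)["churned"].agg(["mean", "count"])
print(churn_by_band)
```

On the real data the same pattern would apply to edaMetrics, with the number of bands chosen to suit the spread of MonthlyCharges. Reporting the count next to the rate helps avoid reading too much into a band with few customers.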

We now move on to our next question, and we will revisit this calculation when we plot the data and analyze churn further. Looking at the data from multiple angles helps us interpret the whole data set and report on it accurately.