## EDA (Exploratory Data Analysis)
In this section, our goal is to derive key insights from the data, which we will continue to clean, transform, and manipulate for our analysis. We would like to understand what causes churn and which factors, if any, contribute to higher churn rates. Some questions we would like to answer are:
1. Do customers who pay more churn at a higher rate? (Numeric Features vs Churn)
2. Do customers with lower tenure tend to churn at a higher rate? (Numeric Features vs Churn)
3. Which categories are linked to higher churn rates? (Categorical Features vs Churn)
4. Are certain combinations especially risky? (Cross-Feature Insights)

In [None]:
import pandas as pd
import matplotlib

data = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

We want to begin by locating the relevant data that we will be working with.

In [None]:
data.info()

When we run "data.info()", we receive some basic information about our data set: the column names, their data types, and their non-null counts. Looking at this output, the columns that seem most relevant to churn are tenure, MonthlyCharges, TotalCharges, Churn, Contract, and PaymentMethod. Let's go ahead and create a data frame with these columns.

In [None]:
edaMetrics = data[["tenure","MonthlyCharges","TotalCharges","Churn","Contract","PaymentMethod"]]
edaMetrics.head()

This information is now organized into a smaller data frame where we can see exactly what we are working with. Let's begin by answering our first question.
1. Do customers who pay more tend to churn at a higher rate?

We can accomplish this with a groupby statement, using the "Churn" column as the grouping key and taking the mean of the "MonthlyCharges" column within each group. This gives us the average monthly amount paid by customers who churned and by customers who stayed.

In [None]:
"$" + edaMetrics.groupby("Churn")["MonthlyCharges"].mean().round(1).astype(str)

We can see that customers who stayed paid $64 a month on average, while those who churned paid $74, so churned customers paid more on average. It is too risky to say that price was the sole cause of churn, since a difference in averages does not establish cause. However, it may be a contributing factor, so we should take it into account in further churn analysis.
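
Comparing averages tells us who paid more, but the question is phrased in terms of churn rates. One possible way to look at rates directly is to split customers into price bands and compute the share that churned in each band. The sketch below uses a small made-up data frame (`toy`) in place of `edaMetrics`, and the band edges (0, 40, 80, 120) are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy rows standing in for edaMetrics; the values are made up.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, 25.5, 30.0, 45.0, 55.0, 60.0, 75.0, 80.0, 95.0, 105.0],
    "Churn": ["No", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"],
})

# Assumed band edges, picked for illustration only.
bands = pd.cut(toy["MonthlyCharges"], bins=[0, 40, 80, 120],
               labels=["low", "mid", "high"])

# Share of customers in each band whose Churn value is "Yes".
churn_rate_by_band = (
    toy["Churn"].eq("Yes").groupby(bands, observed=True).mean().round(2)
)
print(churn_rate_by_band)
```

On the real data, the same pattern should work by replacing `toy` with `edaMetrics`, although the band edges would need to be chosen to fit the actual range of "MonthlyCharges".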

We now move on to our next question, and we will revisit this calculation when we plot and analyze churn further. It is important to look at the data from multiple angles in order to interpret the entire data set and report accurately.

The next question we would like to answer is: 
2. Are customers with lower tenure more likely to churn?
We can answer this by comparing the tenure of customers who churned with the tenure of customers who stayed and seeing whether lower tenure goes along with higher churn.

In [None]:
edaMetrics["tenure"].describe()

A great way to determine what type of data we are working with is the ".describe()" method, which gives us summary statistics for a column: count, mean, standard deviation, minimum, quartiles, and maximum. One metric that immediately stands out to me is the max, which is 72. This allows me to infer that tenure is most likely tracked in months, not days. However, it is important to note that, while months is the most likely unit, we should always try to confirm with the source what the data is measuring instead of guessing. The ".describe()" method is used to gather insight, not to give a definitive answer.
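
When the source cannot be asked right away, the data itself can offer a rough sanity check on the unit. The sketch below assumes that "TotalCharges" is roughly "MonthlyCharges" multiplied by the number of months billed; if that holds, dividing one by the other should land close to tenure. The rows in `toy` are made up, and "TotalCharges" is stored as text here on the assumption that it may be read that way from the CSV, which is why it is converted with `pd.to_numeric` first.

```python
import pandas as pd

# Hypothetical toy rows; the values are made up for illustration.
toy = pd.DataFrame({
    "tenure": [1, 12, 34, 72],
    "MonthlyCharges": [29.85, 56.95, 53.85, 100.0],
    "TotalCharges": ["29.85", "690.0", "1840.0", "7150.0"],  # text on purpose
})

# Convert text to numbers; values that cannot be parsed become NaN.
total = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# If tenure is in months, total / monthly should be close to tenure.
implied_months = total / toy["MonthlyCharges"]
ratio = (implied_months / toy["tenure"]).round(2)
print(ratio)
```

A ratio near 1 for most rows would support the months interpretation, while a ratio near 30 would point toward days. This is supporting evidence only, not a substitute for confirming with the source.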

In [None]:
edaMetrics.groupby("Churn")["tenure"].mean().round(0).astype(str) + " Months"

We can see that the average tenure of customers who churned was 18 months, or the equivalent of a year and a half. This suggests that tenure may be playing a role in the churn rate for this customer base. We should continue with our data analysis; however, this is valuable information to keep in mind, as low tenure may be a red flag for churn.
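
A single average can hide a lot, so it may help to place both groups side by side and include the median and group size along with the mean. The sketch below does this with `agg` on a small made-up data frame (`toy`) standing in for `edaMetrics`; none of the numbers come from the real data set.

```python
import pandas as pd

# Hypothetical toy rows standing in for edaMetrics; the values are made up.
toy = pd.DataFrame({
    "tenure": [1, 2, 5, 10, 24, 36, 48, 60, 72, 70],
    "Churn": ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "No"],
})

# Mean, median, and number of customers for each Churn group.
tenure_summary = toy.groupby("Churn")["tenure"].agg(["mean", "median", "count"])
print(tenure_summary)
```

If the median for churned customers sits well below their mean, a few long-tenured customers are pulling the average up, and most churn is happening even earlier than the mean suggests.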