## Reading Data From A File
Here we use the command "pd.read_csv()" to load the data into a variable named "data".

In [None]:
import pandas as pd

# Load data from a CSV file.
data = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")


Now that the data is loaded from the file, we can manipulate its contents further, for example by applying data cleaning practices.
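To see what "pd.read_csv()" returns without needing the real file, here is a minimal sketch that reads a tiny in-memory CSV. The three rows and the names "csv_text" and "sample" are made up for illustration; only the column names come from the data set used in this project.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the real CSV file (made-up values).
csv_text = """customerID,MonthlyCharges,Churn
A-001,29.85,No
A-002,56.95,No
A-003,53.85,Yes
"""

# read_csv accepts a file path or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

# The first line of the CSV becomes the column names.
print(sample)
```

The result is a DataFrame, the same kind of object that "data" holds after loading the real file.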

## Data Info

Before diving into data cleaning, we should first look at some basic metrics of our data. This information will help us analyze the data set we are going to be working with. Let's start with the following metrics: column count, row count, column names, and type of data.

We use "data.info()" to output the index, the column names, the non-null counts, and the data types. We use "data.head()" to take a peek at the first five rows of the data set (five is the default). We can also use "data.tail()" to view the last five rows.

In [None]:
data.info()
data.head()

Now that we have basic information about the data set, we should count how many non-missing values are in each column. We can do this with "data.count(axis=0)". With axis 0 the count runs down the rows, so we get one total per column. We can see that every column has 7043 non-missing values.

In [None]:
data.count(axis=0)

We can also change one small aspect of the previous command "data.count(axis=0)" and use "data.count(axis=1)" instead. The result is slightly different: the function now counts the non-missing values in each row, across all columns. The output shows, for each row, how many values are present in that row.

In [None]:
data.count(axis=1)
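Because the real data set has no missing values, the difference between the two axes is easier to see on a small frame that does. The sketch below uses a made-up frame called "toy" with two deliberately missing entries. It is an illustration only, not part of the churn data.

```python
import numpy as np
import pandas as pd

# Made-up frame with two missing values, for illustration only.
toy = pd.DataFrame({
    "MonthlyCharges": [29.85, np.nan, 53.85],
    "Churn": ["No", "Yes", None],
})

# axis=0: one count per column (non-missing values down the rows).
per_column = toy.count(axis=0)

# axis=1: one count per row (non-missing values across the columns).
per_row = toy.count(axis=1)

print(per_column)
print(per_row)
```

Each column has two non-missing values, while the rows have two, one, and one respectively.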

Next we should check how many rows and columns the data set has. We can do this with "data.shape". Note that shape is an attribute rather than a function, so it takes no parentheses. It returns a tuple of (rows, columns), and we can see that there are 7043 rows and 21 columns. This gives us some insight into the size of what we are working with.

In [None]:
data.shape
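As a small sketch of how the shape tuple can be used, the example below unpacks it into two variables. The frame "toy" and the names "n_rows" and "n_cols" are made up for illustration.

```python
import pandas as pd

# Made-up frame: 3 rows and 2 columns.
toy = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "Churn": ["No", "No", "Yes"],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# len() of a DataFrame gives the same row count as shape[0].
print(len(toy) == n_rows)
```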

Let's now begin to find our **target variable**.

We start by grabbing the columns in the data set that will help us predict churn later on. We select them by putting double brackets after the name of the data set that we declared at the beginning of this mini-project.

**NOTE**

If we use single brackets with multiple column names, this will throw an error (a KeyError). Python interprets the comma-separated names as one tuple, and pandas then looks for a single column with that tuple as its name rather than for multiple columns. I ran into this issue when attempting to use single brackets and was confused about why an error was being returned. After researching the cause, I found the explanation: single brackets are fine for selecting one column, but selecting multiple columns requires double brackets, where the inner brackets form a list of column names.

In [None]:
churnMetrics = data[["customerID", "MonthlyCharges", "Churn"]]
churnMetrics
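The note above about single versus double brackets can be checked on a small frame. This is a minimal sketch with made-up rows; the names "toy", "subset", "one_column", and "raised" are only for illustration.

```python
import pandas as pd

# Made-up two-row frame, for illustration only.
toy = pd.DataFrame({
    "customerID": ["A-001", "A-002"],
    "MonthlyCharges": [29.85, 56.95],
    "Churn": ["No", "Yes"],
})

# Double brackets: the inner list selects several columns -> DataFrame.
subset = toy[["customerID", "Churn"]]

# Single brackets with one name select one column -> Series.
one_column = toy["Churn"]

# Single brackets with two names pass the tuple ("customerID", "Churn")
# as a single key, and no column has that name.
try:
    toy["customerID", "Churn"]
    raised = False
except KeyError:
    raised = True

print(type(subset).__name__, type(one_column).__name__, raised)
```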

Let's now view how many customers churned vs stayed. This tells us whether our data set is skewed or balanced.

1. We can accomplish this by first selecting our value counts with the command "data["Churn"].value_counts()".

2. We then divide each of those counts by the total number of customers in the data set, which we get with the command "data["Churn"].count()".

3. We then multiply each result by 100 to get the percentage of customers that churned vs stayed.

4. We can further clean this for presentation by rounding and converting the output into a string and adding the percentage symbol. 

In [None]:
((data["Churn"].value_counts()/data["Churn"].count()) * 100).round(1).astype(str) + "%"

We can see that the majority of customers stayed, so our data set is definitely skewed. What this essentially means is that a program could guess "No" for every customer and still be accurate about 73.5% of the time.
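The four steps above, and the point about always guessing the majority answer, can be traced one line at a time on a tiny example. This is a sketch with four made-up labels, not the real data. The variable names are hypothetical, and "value_counts(normalize=True)" is shown only as an optional shortcut for steps 1 and 2.

```python
import pandas as pd

# Made-up labels: 3 of 4 customers stayed.
churn = pd.Series(["No", "No", "Yes", "No"], name="Churn")

counts = churn.value_counts()            # step 1: count each answer
shares = counts / churn.count()          # step 2: divide by the total
percentages = (shares * 100).round(1)    # step 3: scale to percent, round
labels = percentages.astype(str) + "%"   # step 4: format for presentation
print(labels)

# Optional shortcut: normalize=True does steps 1 and 2 in one call.
same_shares = churn.value_counts(normalize=True)

# Accuracy of always guessing the most common answer.
baseline_accuracy = shares.max()
print(baseline_accuracy)
```

On the real data set the same reasoning gives the 73.5% figure, since that is the share of customers who stayed.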

## Conclusion
This mini-project helps us answer the question of what churn is and why it is important. We can see how churn is calculated and which commands can be used to get this result. Churn is vital because of its role in revenue calculations and because it is one of the key metrics a company's data provides. If we had a churn percentage that was high, it would raise the question "why are customers not staying?". This would lead to more investigation into the data. We would want to know what the average monthly charge is and take a dive into the role demographics may be playing.