# Data Preparation

### Encoding binary features
Recasting data types is an important part of data preprocessing. In this exercise, you will encode 'yes' as 1 and 'no' as 0 in both the 'Vmail_Plan' and 'Churn' features.

You saw two approaches to doing this in the video - one using pandas, and the other using scikit-learn. For straightforward tasks like this, sticking with pandas is recommended, so that's what we'll do in this exercise. If you're trying to build machine learning pipelines, on the other hand - which is beyond the scope of this course - you can explore using LabelEncoder(). When doing data science, it's important to be aware that there is always more than one way to accomplish a task, and you need to pick the one that is most effective for your application.
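For comparison, here is a minimal sketch of the scikit-learn approach mentioned above. The `toy` DataFrame and its values are made up for illustration and are not the course's telco data; the `Churn_encoded` column name is likewise an assumption.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the telco data (values are made up for illustration)
toy = pd.DataFrame({"Churn": ["no", "yes", "no", "no", "yes"]})

# LabelEncoder sorts the unique labels and numbers them from 0,
# so with only 'no' and 'yes' present, 'no' becomes 0 and 'yes' becomes 1
encoder = LabelEncoder()
toy["Churn_encoded"] = encoder.fit_transform(toy["Churn"])

# classes_ records which label was assigned to which integer position
print(encoder.classes_)
print(toy)
```

Because the integer codes follow the sorted label order, it is worth checking `encoder.classes_` before relying on a particular code meaning 'yes'.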

In [1]:
# Map 'no' to 0 and 'yes' to 1 in 'Vmail_Plan'
telco['Vmail_Plan'] = telco['Vmail_Plan'].replace({'no': 0, 'yes': 1})

# Map 'no' to 0 and 'yes' to 1 in 'Churn'
telco['Churn'] = telco['Churn'].replace({'no': 0, 'yes': 1})

# Print the results to verify
print(telco['Vmail_Plan'].head())
print(telco['Churn'].head())

NameError: name 'telco' is not defined
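
Since the telco DataFrame is not loaded in this notebook, the same idea can be tried on a small made-up DataFrame. This is only a sketch: the `toy` data and the `mapping` dictionary are assumptions for illustration, and it uses `.map()` rather than `.replace()`.

```python
import pandas as pd

# Toy stand-in for the telco data (values are made up for illustration)
toy = pd.DataFrame({
    "Vmail_Plan": ["yes", "no", "no", "yes"],
    "Churn": ["no", "no", "yes", "no"],
})

# One shared mapping keeps the encoding consistent across both columns
mapping = {"no": 0, "yes": 1}
for col in ["Vmail_Plan", "Churn"]:
    # .map() turns any value missing from the mapping into NaN,
    # which makes unexpected labels easy to spot afterwards
    toy[col] = toy[col].map(mapping)

print(toy.dtypes)
print(toy)
```

One design difference to keep in mind: `.replace()` leaves unmatched values untouched, while `.map()` converts them to NaN.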

### One hot encoding
In the video, you saw how the 'State' feature can be encoded numerically using the technique of one hot encoding:

[Figure: ohe_part3.png, one hot encoding of the 'State' feature]

Doing this manually would be quite tedious, especially when you have 50 states and over 3000 customers! Fortunately, pandas has a get_dummies() function that automatically applies one hot encoding to the selected feature.

In [2]:
# Import pandas
import pandas as pd

# Perform one hot encoding on 'State'
telco_state = pd.get_dummies(telco['State'])

NameError: name 'telco' is not defined
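
To see what get_dummies() produces, here is a small sketch on made-up data. The `toy` DataFrame and the three state codes are assumptions for illustration; `dtype=int` is passed so the indicator columns hold 0/1 rather than booleans.

```python
import pandas as pd

# Toy stand-in for the telco data (values are made up for illustration)
toy = pd.DataFrame({"State": ["KS", "OH", "NJ", "OH"]})

# One indicator column per distinct state, in sorted order
state_dummies = pd.get_dummies(toy["State"], dtype=int)
print(state_dummies)

# Attach the indicator columns to the original rows
toy_encoded = pd.concat([toy, state_dummies], axis=1)
print(toy_encoded)
```

Each row has exactly one 1 across the indicator columns, marking that customer's state.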

### Feature scaling
Recall from the video the different scales of the 'Intl_Calls' and 'Night_Mins' features:

[Figure: feature scaling]

Your job in this exercise is to re-scale them using StandardScaler.

In your workspace, the telco DataFrame has been subset to only include the features you want to rescale: 'Intl_Calls' and 'Night_Mins'. To apply StandardScaler, you need to first instantiate it using StandardScaler(), and then apply the fit_transform() method, passing in the DataFrame you want to rescale. You can do this in one line of code:

StandardScaler().fit_transform(

In [3]:
# Import StandardScaler
from sklearn.preprocessing import StandardScaler

# Scale telco using StandardScaler
telco_scaled = StandardScaler().fit_transform(telco)

# Add column names back for readability
telco_scaled_df = pd.DataFrame(telco_scaled, columns=["Intl_Calls", "Night_Mins"])

# Print summary statistics
print(telco_scaled_df.describe())

NameError: name 'telco' is not defined
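
The following sketch shows what StandardScaler does, using a made-up two-column DataFrame in place of the course data (the values are assumptions for illustration). It also recomputes the result by hand as (x - mean) / std to make the transformation explicit.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the subset telco data (values are made up for illustration)
toy = pd.DataFrame({
    "Intl_Calls": [3, 5, 2, 7, 4],
    "Night_Mins": [244.7, 254.4, 162.6, 196.9, 186.9],
})

# fit_transform returns a NumPy array, so the column names are restored here
scaled = StandardScaler().fit_transform(toy)
scaled_df = pd.DataFrame(scaled, columns=toy.columns, index=toy.index)

# Manual equivalent: subtract the column mean, divide by the population std
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(scaled_df)
print(np.allclose(scaled_df, manual))
```

StandardScaler divides by the population standard deviation (ddof=0), whereas describe() reports the sample standard deviation (ddof=1), so the std shown by describe() will be slightly above 1, most noticeably on small data.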

# Feature Selection and Engineering

### Dropping unnecessary features
Some features such as 'Area_Code' and 'Phone' are not useful when it comes to predicting customer churn, and they need to be dropped prior to modeling. The easiest way to do so in Python is using the .drop() method of pandas DataFrames, just as you saw in the video, where 'Soc_Sec' and 'Tax_ID' were dropped:

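telco.drop(['Soc_Sec', 'Tax_ID'], axis=1)

Here, axis=1 indicates that you want to drop 'Soc_Sec' and 'Tax_ID' from the columns rather than the rows. Note that .drop() returns a new DataFrame, so assign the result back if you want to keep the change.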

In [4]:
# Drop the unnecessary features
telco = telco.drop(['Area_Code', 'Phone'], axis=1)

NameError: name 'telco' is not defined
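
A short sketch on made-up data illustrates how .drop() behaves. The `toy` DataFrame and its values are assumptions for illustration; `columns=` is shown as an equivalent spelling of `axis=1`.

```python
import pandas as pd

# Toy stand-in for the telco data (values are made up for illustration)
toy = pd.DataFrame({
    "Area_Code": [415, 408],
    "Phone": ["382-4657", "371-7191"],
    "Night_Mins": [244.7, 254.4],
})

# .drop() returns a new DataFrame; the original is left unchanged
trimmed = toy.drop(["Area_Code", "Phone"], axis=1)

# Equivalent call using the columns keyword instead of axis=1
same = toy.drop(columns=["Area_Code", "Phone"])

print(list(toy.columns))
print(list(trimmed.columns))
```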

### Engineering a new column
Leveraging domain knowledge to engineer new features is an essential part of modeling. This quote from Andrew Ng summarizes the importance of feature engineering:

> Coming up with features is difficult, time-consuming, requires expert knowledge. "Applied machine learning" is basically feature engineering.

Your job in this exercise is to create a new feature that contains information about the average length of night calls made by customers.

In [5]:
# Create the new feature
telco['Avg_Night_Calls'] = telco['Night_Mins'] / telco['Night_Calls']

# Print the first five rows of 'Avg_Night_Calls'
print(telco['Avg_Night_Calls'].head())

NameError: name 'telco' is not defined
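
One thing to watch with a ratio feature like this is customers with zero night calls, where plain division yields NaN or inf. The sketch below, on made-up data, shows one possible way to handle that case by treating it as missing; whether that is the right choice for the course's data is an assumption.

```python
import pandas as pd

# Toy stand-in for the telco data (values are made up for illustration)
toy = pd.DataFrame({
    "Night_Mins": [244.7, 254.4, 0.0],
    "Night_Calls": [91, 103, 0],
})

# Keep call counts only where they are non-zero; zeros become NaN,
# so the division produces NaN instead of inf for those customers
calls = toy["Night_Calls"].where(toy["Night_Calls"] != 0)
toy["Avg_Night_Calls"] = toy["Night_Mins"] / calls

print(toy)
```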