# Churn Case Study

## Context
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]


<img src="https://images.pexels.com/photos/3078/home-dialer-siemens-telephone.jpg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260" style="width:400px">

**Client**: Telco Company in the USA offering triple play (phone, internet and TV).

New competitor entered offering triple play, resulting in increased churn.

Want better way to spot potential churning customers and suggested actions what to do.

## **Assignment**

- Define the business problem
- Determine which evaluation metric you find appropriate:
   - accuracy
   - precision
   - recall
   - f1 score
- Determine which type of slice/segment/type of churn you are interested
- Run "data prep code"
- Use logistic regression to create 2-3 model specifications
  - model 1 (vanilla model): uses cleaned data as is, find best cutoff using chosen metric
  - model 2: create at least **2 new features** and add them to the model
  - model 3 (if time, a 'reach' model): increase the LASSO penalty to decrease the feature set
- Pick the "best" model and find the "best" threshold
- Use "best" model to identify the drivers of churn in your segment analysis and make recommendations for the company
- Each group will have 5 minutes to present their recommendations to the rest of the class. Make sure to share:
   - segment you chose
   - evaluation metric you chose based on the business problem
   - evaluation metric of "best" model's threshold & threshold
   - what drives churn and what are your recommendations
   - **if you had more time** what would you work on?

## Data

<img src="https://images.pexels.com/photos/53621/calculator-calculation-insurance-finance-53621.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260" style = "width:400px" >
Each row represents a customer, each column contains customer’s attributes described on the column Metadata.

The data set includes information about:

- Customers who left within the last month – the column is called Churn
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information 
     - how long they’ve been a customer (tenure is in months)
     - contract, payment method, paperless billing, monthly charges, and total charges
     - all "totals" are over the length of the contract
- Demographic info about customers – gender, age range, and if they have partners and dependents
- Usage
    - information about their usage patterns
    - again, usage totals are over length of contract

## Concept: Churn

#### Type of churn:

**Voluntary** – they left after contract was up

**Involuntary** – we fired them

**Early churn** – left early, broke contract

### Churn is a survival problem:
- Predicting who will churn next month is really hard
- Predicting who may churn over next 3 months is easier

<img src = "./img/funnel.png" style="width:800px">

There are many reasons to churn &#8594; **feature engineering is king**

### Solutions need to be tied to root problems

<img src = "./img/solution.png" style="width:800px">

### Different solutions have different time frames

<img src = "./img/time.png" style="width:800px">

## Remember:

#### You will not be paid to create intricate models
### You will be paid to **Solve Problems**

# Get Started!

## Part 1: Business problem

#### End Users:



#### True business problem:



#### Context:

- **False negative** 
    - **Outcome**:
- **False positive**
    - **Outcome**: 

## Part 2: Evaluation Metric
Which metric (of the ones we've explore so far) would make sense to primarily use as we evaluate our models?

- Accuracy
- Precision 
- Recall - FOCUS given you want to catch them all
- F1-Score

## Part 3: Segment choice

What type slice/segment/type of churn you are interested in?

## Part 4: Data Prep Code

In [185]:
# Import pacakges
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

# Load dataset
url_link = 'https://docs.google.com/spreadsheets/d/1TAWfdKnWYiCzKUeDyGL6NzIOv7AxFt_Sfzzax464_FQ/export?format=csv&gid=882919979'
telco = pd.read_csv(url_link)

# Drop nas
telco.dropna(inplace=True)

# Train-test-split
X_train, X_test, y_train, y_test = train_test_split(telco.drop(columns=['customerID','Churn']), np.where(telco.Churn =="Yes", 1, 0), test_size=0.33, random_state=42)

# Separate out numeric from categorical variables
cat_var = telco.select_dtypes(include='object')
cat_var.drop(columns=['customerID','Churn'], inplace = True)

num_var = telco.select_dtypes(exclude = 'object') 

# Encode categorical variables
ohc = OneHotEncoder(drop='first')
encoded_cat = ohc.fit_transform(X_train[cat_var.columns.tolist()]).toarray()

# Add feature names to encoded vars
encoded=pd.DataFrame(encoded_cat, columns=ohc.get_feature_names(cat_var.columns.tolist()))
encoded.reset_index(inplace=True, drop=True)
X_train.reset_index(inplace=True, drop=True)

# Reassemble entire training dataset
clean_X_train = pd.concat([X_train[num_var.columns.tolist()] , encoded], axis=1,  sort=False)
clean_X_train.shape

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


(2229, 42)

In [186]:
encoded_cat = ohc.transform(X_test[cat_var.columns.tolist()]).toarray()
# Add feature names to encoded vars
encoded=pd.DataFrame(encoded_cat, columns=ohc.get_feature_names(cat_var.columns.tolist()))
encoded.reset_index(inplace=True, drop=True)
X_test.reset_index(inplace=True, drop=True)
# Reassemble entire training dataset
clean_X_test = pd.concat([X_test[num_var.columns.tolist()] , encoded], axis=1,  sort=False)

In [187]:
clean_X_train2 = clean_X_train
columns = ['TotalDayCalls', 'TotalEveCalls','TotalNightCalls', 'TotalIntlCalls', 'CustomerServiceCalls', 'TotalCall','TotalRevenue']
new_columns = ['TotalDayCalls_month', 'TotalEveCalls_month','TotalNightCalls_month', 'TotalIntlCalls_month', 'CustomerServiceCalls_month', 'TotalCall_month','TotalRevenue_month']

for v, column in enumerate(new_columns):
    clean_X_train2[column] = (clean_X_train2.iloc[ : , v]) / (clean_X_train2['tenure'])

In [188]:
clean_X_test2 = clean_X_test
columns = ['TotalDayCalls', 'TotalEveCalls','TotalNightCalls', 'TotalIntlCalls', 'CustomerServiceCalls', 'TotalCall','TotalRevenue']
new_columns = ['TotalDayCalls_month', 'TotalEveCalls_month','TotalNightCalls_month', 'TotalIntlCalls_month', 'CustomerServiceCalls_month', 'TotalCall_month','TotalRevenue_month']

for v, column in enumerate(new_columns):
    clean_X_test2[column] = (clean_X_test2.iloc[ : , v]) / (clean_X_test2['tenure'])

In [191]:
clean_X_train2.head()

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,TotalDayCalls_month,TotalEveCalls_month,TotalNightCalls_month,TotalIntlCalls_month,CustomerServiceCalls_month,TotalCall_month,TotalRevenue_month
0,0,1,0,97.2,88,155.6,85,261.6,105,12.4,...,1.0,0.0,0.0,0.0,1.0,0.0,97.2,88.0,155.6,85.0
1,0,58,34,138.8,80,142.0,108,183.8,77,11.8,...,0.0,0.0,1.0,0.0,1.0,0.586207,2.393103,1.37931,2.448276,1.862069
2,0,1,0,179.7,128,299.8,92,185.3,120,7.6,...,0.0,0.0,0.0,0.0,1.0,0.0,179.7,128.0,299.8,92.0
3,0,4,0,298.4,78,270.5,142,107.3,84,12.2,...,0.0,1.0,0.0,0.0,1.0,0.0,74.6,19.5,67.625,35.5
4,0,1,0,189.3,77,155.9,128,186.0,83,7.4,...,1.0,0.0,0.0,0.0,1.0,0.0,189.3,77.0,155.9,128.0


In [192]:
clean_X_test2.head()

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,TotalDayCalls_month,TotalEveCalls_month,TotalNightCalls_month,TotalIntlCalls_month,CustomerServiceCalls_month,TotalCall_month,TotalRevenue_month
0,0,55,14,143.2,99,169.9,91,221.6,77,11.6,...,0.0,0.0,1.0,0.0,1.0,0.254545,2.603636,1.8,3.089091,1.654545
1,0,37,0,190.3,98,252.7,70,220.6,97,7.2,...,0.0,0.0,0.0,0.0,1.0,0.0,5.143243,2.648649,6.82973,1.891892
2,0,4,0,106.4,71,240.1,83,147.7,114,5.3,...,0.0,0.0,0.0,0.0,1.0,0.0,26.6,17.75,60.025,20.75
3,0,64,33,88.8,104,109.6,94,172.7,107,7.1,...,1.0,0.0,1.0,0.0,1.0,0.515625,1.3875,1.625,1.7125,1.46875
4,0,9,0,170.5,103,254.3,77,197.3,138,10.5,...,0.0,0.0,0.0,0.0,1.0,0.0,18.944444,11.444444,28.255556,8.555556


In [125]:
clean_X_train.head()

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,StreamingMovies_Yes,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn
0,0,1,0,97.2,88,155.6,85,261.6,105,12.4,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0
1,0,58,34,138.8,80,142.0,108,183.8,77,11.8,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0
2,0,1,0,179.7,128,299.8,92,185.3,120,7.6,...,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0
3,0,4,0,298.4,78,270.5,142,107.3,84,12.2,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1
4,0,1,0,189.3,77,155.9,128,186.0,83,7.4,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0


In [None]:
# Total intl Calls; Total Night Calls' Customer SErvice Calls; Total Call; Total Revenue; 

In [126]:
clean_X_train2 = clean_X_train

In [124]:
clean_X_train.columns

Index(['SeniorCitizen', 'tenure', 'NumbervMailMessages', 'TotalDayMinutes',
       'TotalDayCalls', 'TotalEveMinutes', 'TotalEveCalls',
       'TotalNightMinutes', 'TotalNightCalls', 'TotalIntlMinutes',
       'TotalIntlCalls', 'CustomerServiceCalls', 'TotalCall',
       'TotalHighBandwidthMinutes', 'TotalHighLatencyMinutes', 'TotalRevenue',
       'gender_Male', 'MaritalStatus_Yes', 'Dependents_Yes',
       'MultipleLines_Yes', 'InternetService_Fiber optic',
       'InternetService_No', 'OnlineSecurity_No internet service',
       'OnlineSecurity_Yes', 'OnlineBackup_No internet service',
       'OnlineBackup_Yes', 'DeviceProtection_No internet service',
       'DeviceProtection_Yes', 'TechSupport_No internet service',
       'TechSupport_Yes', 'StreamingTV_No internet service', 'StreamingTV_Yes',
       'StreamingMovies_No internet service', 'StreamingMovies_Yes',
       'Contract_One year', 'Contract_Two year', 'PaperlessBilling_Yes',
       'PaymentMethod_Credit card (automatic)',
 

In [77]:
df_train_cleaned = clean_X_train
df_train_cleaned['churn'] = y_train 
df_train_cleaned

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,StreamingMovies_Yes,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn
0,0,1,0,97.2,88,155.6,85,261.6,105,12.4,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0
1,0,58,34,138.8,80,142.0,108,183.8,77,11.8,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0
2,0,1,0,179.7,128,299.8,92,185.3,120,7.6,...,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0
3,0,4,0,298.4,78,270.5,142,107.3,84,12.2,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1
4,0,1,0,189.3,77,155.9,128,186.0,83,7.4,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2224,0,3,26,170.5,107,217.2,77,225.7,71,13.6,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0
2225,0,35,0,129.4,97,185.4,101,204.7,106,1.1,...,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0
2226,0,24,0,81.9,75,253.8,114,213.1,125,8.9,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1
2227,0,64,37,163.5,77,203.1,102,232.0,87,7.8,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0


In [None]:
df_train_cleaned.loc['']

In [93]:
df_train_cleaned.loc[df_train_cleaned['churn'] == 1]

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,StreamingMovies_Yes,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn
3,0,4,0,298.4,78,270.5,142,107.3,84,12.2,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1
13,0,8,0,107.8,113,216.6,125,217.5,92,9.9,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1
14,1,15,0,312.0,109,129.4,100,217.6,74,10.5,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1
17,0,7,0,269.8,106,228.8,101,257.5,106,10.1,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1
18,0,2,0,169.2,124,173.3,108,216.5,64,12.4,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2209,1,20,0,140.6,109,178.6,51,217.0,83,6.8,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1
2210,0,25,0,159.7,86,197.5,76,121.6,105,13.9,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1
2215,0,8,0,227.9,130,302.6,71,191.5,82,5.5,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1
2217,1,38,0,242.5,83,245.4,97,219.6,80,10.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1


In [100]:
df_mbm = df_train_cleaned.loc[(df_train_cleaned['Contract_One year'] == 0) & (df_train_cleaned['Contract_Two year'] == 0)]

In [101]:
df_oneyear = df_train_cleaned.loc[df_train_cleaned['Contract_One year'] == 1]

In [102]:
df_twoyear = df_train_cleaned.loc[df_train_cleaned['Contract_Two year'] == 1]

In [103]:
print(df_mbm.shape)
print(df_oneyear.shape)
print(df_twoyear.shape)

(1185, 43)
(515, 43)
(529, 43)


In [108]:
df_mbm['tenure_years'] = df_mbm['tenure'] / 12
df_oneyear['tenure_years'] = df_oneyearchurn['tenure'] / 12 
df_twoyear['tenure_years'] = df_twoyearchurn['tenure'] / 12 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [107]:
print(df_mbm.shape)
print(df_oneyear.shape)
print(df_twoyear.shape)

(1185, 44)
(515, 44)
(529, 44)


In [196]:
df_mbm.shape

(1185, 44)

In [113]:
len(df_mbm.loc[df_mbm['churn'] == 1])

400

In [118]:
len(df_oneyear.loc[df_oneyear['churn'] == 1])

23

In [115]:
df_oneyear.loc[(df_oneyear['churn'] == 1) & (df_oneyear['tenure_years'] < 1)]       

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn,tenure_years
1134,0,8,0,195.7,116,209.1,87,201.1,73,8.3,...,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1,0.666667
1223,0,5,0,245.0,97,250.7,75,270.2,124,13.7,...,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1,0.416667
1466,0,2,0,294.7,90,294.6,72,260.1,121,10.8,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0.166667


In [120]:
len(df_twoyear.loc[df_twoyear['churn'] == 1])

3

In [119]:
df_twoyear.loc[(df_twoyear['churn'] == 1) & (df_twoyear['tenure_years'] < 2)]       

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn,tenure_years


In [85]:
df_oneyearchurn.head()

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn,tenure_years
2,0,1,0,179.7,128,299.8,92,185.3,120,7.6,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.083333
5,0,50,25,134.0,112,206.0,111,180.6,118,9.7,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0,4.166667
9,0,30,35,205.5,86,298.5,119,214.2,104,6.9,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0,2.5
25,0,21,0,139.2,140,191.4,113,286.5,125,11.8,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0,1.75
26,0,10,0,155.3,75,169.9,87,207.0,133,12.6,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0.833333


In [94]:
len(df_oneyearchurn.loc[df_oneyearchurn])

515

In [86]:
df_oneyearchurn.loc[df_oneyearchurn['tenure_years'] < 1]

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn,tenure_years
2,0,1,0,179.7,128,299.8,92,185.3,120,7.6,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.083333
26,0,10,0,155.3,75,169.9,87,207.0,133,12.6,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0.833333
119,0,2,0,189.3,100,239.3,107,89.7,89,9.9,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.166667
188,0,4,0,185.0,88,224.9,98,212.4,105,11.4,...,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0,0.333333
211,0,8,0,119.3,93,223.9,103,211.9,122,8.7,...,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0,0.666667
229,0,2,0,137.6,108,162.0,80,187.7,126,5.8,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0,0.166667
253,0,7,0,161.9,138,200.9,114,134.0,134,10.7,...,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0,0.583333
303,0,6,0,133.9,87,166.4,110,193.5,139,15.4,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0,0.5
322,0,4,30,122.9,93,233.5,91,199.5,144,9.6,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0,0.333333
328,0,2,0,99.5,110,129.1,80,125.1,124,9.7,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0,0.166667


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [92]:
df_twoyearchurn.loc[df_twoyearchurn['tenure_years'] < 2]

Unnamed: 0,SeniorCitizen,tenure,NumbervMailMessages,TotalDayMinutes,TotalDayCalls,TotalEveMinutes,TotalEveCalls,TotalNightMinutes,TotalNightCalls,TotalIntlMinutes,...,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,InternationalPlan_Yes,VoiceMailPlan_Yes,churn,tenure_years
11,0,5,31,302.7,93,240.5,119,193.9,103,13.6,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0,0.416667
68,0,15,0,119.1,117,287.7,136,223.0,100,12.2,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0,1.25
145,0,20,0,160.5,114,240.5,103,233.5,121,11.3,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0,1.666667
218,0,15,0,162.3,116,192.4,86,240.6,100,10.1,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0,1.25
276,0,16,0,299.4,71,61.9,88,196.9,89,6.6,...,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,1.333333
335,0,3,26,137.1,88,155.7,125,247.6,94,11.5,...,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0,0.25
506,0,17,0,243.1,105,231.4,108,180.9,120,7.8,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0,1.416667
525,0,17,0,191.3,134,261.5,113,182.3,111,10.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0,1.416667
562,0,11,0,218.0,86,184.0,94,240.5,110,6.4,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0,0.916667
563,0,10,28,220.3,96,285.8,72,203.0,111,9.4,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0,0.833333


## Part 5: Create models

## Part 6: Pick model & find best threshold

## Part 7: What drives churn?

## Part 8: What are your recommendations?