# Gousto Model Feature Importance

Gousto are interested in exploring explainability as a way to extract as much value as possible from their existing AI.

They currently have an AI model that predicts customer churn, which they use to help forecast sales. However, they want to start using it to extract real business insights that will help them retain more customers and ultimately save money.

In [1]:
import shap
import numpy as np
import pandas as pd
from decima2 import model_feature_importance
from decima2 import grouped_feature_importance


from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from decima2.utils.utils import feature_names
import time

## Developer Role at Gousto - What They've Achieved So Far 

Let's assume a machine learning developer at Gousto has built a model which predicts whether or not a customer will stay a subscriber, based on five features: 'Age', 'Number of sales calls made to that person', 'Whether or not that person received a discount', 'Gender' and 'How much was spent on adverts which targeted that person'. The model predicts customer churn with a high accuracy of 0.85.

In [288]:
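def create_custom_dataframe():
    """Build a synthetic customer dataset and a binary target."""
    num_rows = 5000

    # Customer features, each drawn independently of the others
    age = np.random.normal(40, 10, size=num_rows)
    age = [int(i) for i in age]
    sales_calls = np.random.normal(20, 5, size=num_rows)
    discount = [np.random.binomial(n=1, p=0.5) for i in range(num_rows)]
    gender = [np.random.binomial(n=1, p=0.5) for i in range(num_rows)]
    ad_spend = np.random.normal(50, 10, size=num_rows)

    # Weight applied to ad spend: 1 for customers older than 10, 0.5 otherwise
    target_1 = [1 if i > 10 else 0.5 for i in age]
    # Latent score: weighted ad spend plus Gaussian noise (mean 5, std 5)
    target = [(ad_spend[i] * target_1[i]) + np.random.normal(5, 5) for i in range(num_rows)]

    data = {
        'Age': age,  # affects the target only through the ad-spend weight
        'Sales Calls': sales_calls,  # not used to build the target
        'Discount': discount,  # not used to build the target
        'Gender': gender,  # not used to build the target
        'Ad Spend': ad_spend,  # main driver of the target
    }

    X = pd.DataFrame(data)
    # Target is 1 when the latent score exceeds 50
    y = np.array([1 if i > 50 else 0 for i in target])
    return X, y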

In [289]:
X,y = create_custom_dataframe()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
model = RandomForestClassifier(max_depth=10, random_state=42)
model.fit(X_train, y_train)
model.score(X_test,y_test)

0.85

## Extracting More Value From AI With Feature Importance

Now let's assume that our developer at Gousto is reporting on their model at a weekly data meeting. The head of growth appreciates the accuracy of the model but wants to know what the model is using to make decisions. The head of marketing wants to know whether they should be focusing on sales calls or adverts to stop people cancelling their subscriptions.

Decima2 can help with this!

By looking at model feature importance, we can see clearly that the most important feature used by the model is 'Ad Spend', which answers the head of growth's question.

Furthermore, if we examine the grouped feature importance, grouping by Age, we can report back to the head of marketing that not only should we be focusing more on adverts than sales calls, but for customers over the age of 50, ad spend appears to be twice as indicative of churn as in other age categories, so we should be targeting our personalised advertising here.

In [291]:
explanation_app = model_feature_importance(X_test,y_test,model)
explanation_app.run()
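As an independent cross-check on the feature ranking, permutation importance from scikit-learn can be computed for the same kind of model. This is a minimal sketch and not Decima2's own method: it uses `sklearn.inspection.permutation_importance`, and the data below (`X_demo`, `y_demo`) is a hypothetical, simplified stand-in for the notebook's dataset so that the cell runs on its own. In the notebook itself, `model`, `X_test` and `y_test` could be passed in directly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data using the notebook's column names (an assumption,
# not the notebook's actual dataset).
rng = np.random.default_rng(0)
n = 2000
X_demo = pd.DataFrame({
    "Age": rng.normal(40, 10, n).astype(int),
    "Sales Calls": rng.normal(20, 5, n),
    "Discount": rng.binomial(1, 0.5, n),
    "Gender": rng.binomial(1, 0.5, n),
    "Ad Spend": rng.normal(50, 10, n),
})
# Simplified target: ad spend plus Gaussian noise, thresholded at 50.
y_demo = ((X_demo["Ad Spend"] + rng.normal(5, 5, n)) > 50).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)
demo_model = RandomForestClassifier(max_depth=10, random_state=42)
demo_model.fit(X_tr, y_tr)

# Shuffle each column in turn and measure the drop in held-out accuracy.
result = permutation_importance(
    demo_model, X_te, y_te, n_repeats=10, random_state=0
)
ranking = pd.Series(result.importances_mean, index=X_te.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking)
```

A feature whose shuffling barely changes the score is one the model does not rely on, so a large gap between the top feature and the rest supports the conclusion drawn from the feature importance view.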

In [292]:
explanation_app = grouped_feature_importance(X_test,y_test,model,'Age')


In [293]:
explanation_app.run()
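The idea of grouping importance by a feature can also be sketched by hand: split the test set into age bands and compute permutation importance separately within each band. This is an illustrative approximation under stated assumptions, not Decima2's algorithm. The dataset is hypothetical, with the ad-spend weight halved for customers aged 50 or under so that the age effect described above is visible, and the band edges are chosen for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data (an assumption): ad spend counts fully for customers
# over 50 and at half weight for everyone else.
rng = np.random.default_rng(1)
n = 4000
X_demo = pd.DataFrame({
    "Age": rng.normal(40, 10, n).astype(int),
    "Sales Calls": rng.normal(20, 5, n),
    "Discount": rng.binomial(1, 0.5, n),
    "Gender": rng.binomial(1, 0.5, n),
    "Ad Spend": rng.normal(50, 10, n),
})
weight = np.where(X_demo["Age"] > 50, 1.0, 0.5)
y_demo = ((X_demo["Ad Spend"] * weight + rng.normal(5, 5, n)) > 50).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)
demo_model = RandomForestClassifier(max_depth=10, random_state=42)
demo_model.fit(X_tr, y_tr)

# Assign each test row to an age band (illustrative cut point at 50).
bands = pd.cut(
    X_te["Age"], bins=[-np.inf, 50, np.inf], labels=["50 and under", "over 50"]
)

# Permutation importance computed separately inside each band.
rows = {}
for band in bands.cat.categories:
    mask = (bands == band).to_numpy()
    result = permutation_importance(
        demo_model, X_te.loc[mask], y_te.loc[mask], n_repeats=10, random_state=0
    )
    rows[band] = pd.Series(result.importances_mean, index=X_te.columns)

grouped = pd.DataFrame(rows).T  # one row per age band, one column per feature
print(grouped.round(3))
```

Comparing the 'Ad Spend' column across rows shows how much more the model leans on ad spend for one age band than another, which is the kind of comparison the grouped view is meant to support.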