## Problem Statement:

A telecom company is experiencing high customer churn rates and is looking for innovative AI solutions to reduce churn. The company wants to leverage machine learning along with other techniques to:
- Monitor customer behaviour and identify early warning signs of churn
- Identify the factors that lead customers to churn
- Create personalized solutions for retaining the customers and run simulations
- Update the solutions based on the simulation output

The company wants to implement a solution that is cost-effective, scalable, and easy to maintain. They reach out to their AI team and discuss the problem at hand. They believe that their AI team can help them reduce customer churn, improve customer satisfaction, and increase revenue through an innovative AI solution.

## Solution:

The AI team proposes a hybrid solution with three components. First, a machine learning model is trained on historical customer data (account information, demographics, and services used) together with churn status; its output identifies customers who are likely to churn in the near future. Second, a counterfactual generation engine produces scenarios in which the predicted outcome changes. For example, if a customer's predicted churn probability is 90%, the engine can suggest changes to the input features that might bring that probability down to, say, 40%. These suggested changes can feed into the retention strategy. Third, a what-if analysis component lets decision makers adjust the counterfactual suggestions under certain constraints and check whether the adjusted changes would still retain the customer.

The AI team suggests the following benefits of using the proposed hybrid solution:

- Improved Accuracy: Machine learning model can analyze large volumes of data and identify patterns and insights that are not easily visible to human analysts. Counterfactuals and what-if analysis can be used to test and validate the accuracy of the machine learning models and get ideas on retaining the customers who are likely to churn, leading to higher customer retention


- Personalized Solutions: Machine learning can analyze customer behavior and preferences to offer personalized solutions that can help retain customers. Counterfactuals and what-if analysis can be used to test different scenarios and predict how customers will respond to different solutions, leading to more effective personalized solutions


- Faster Decision-Making: Machine learning can process data in real-time and offer insights that can help decision-makers make faster and more informed decisions. Counterfactuals and what-if analysis can be used to test different scenarios and predict how different decisions will impact customer churn, enabling decision-makers to make more informed decisions more quickly


- Cost-Effective: The proposed solution can automate many tasks that would otherwise require human analysts, leading to cost savings for the company. By automating routine tasks, such as data analysis and report generation, human analysts can focus on more complex tasks that require human intuition and expertise


- Scalability: The solution can handle large volumes of data and scale as the customer base grows. This is particularly useful for a company that is experiencing rapid growth and needs a solution that can keep up with its expanding customer base


- Continuous Improvement: The solution can learn from new data and continuously improve its predictions and recommendations. This can lead to better outcomes over time as the decision intelligence system becomes more sophisticated and accurate


- Revenue Growth: Retaining current customers can contribute to revenue growth, as satisfied customers are more likely to purchase additional products and services 


- Competitive Advantage: Successfully predicting and mitigating customer churn can set the company apart from its competitors. By delivering exceptional customer experiences and satisfying customer demands, it can establish a devoted customer base and stand out in the marketplace

### Dataset Details:

The customer churn dataset describes a hypothetical telecom business that operated in California during the third quarter. It covers 7043 customers who left, stayed, or signed up for the company's services in that period. For each customer, the dataset includes demographic, service, and account variables.

Demographics:

- CustomerID: A unique ID that identifies each customer

- Gender: The customer’s gender: Male, Female

- Senior Citizen: Indicates if the customer is 65 or older: Yes, No

- Partner: Indicates if the customer has a partner: Yes, No

- Dependents: Indicates if the customer lives with any dependents: Yes, No. Dependents could be children, parents, grandparents, etc.

Services:

- Tenure in Months: Indicates the total amount of months that the customer has been with the company by the end of the quarter

- Phone Service: Indicates if the customer subscribes to home phone service with the company: Yes, No

- Multiple Lines: Indicates if the customer subscribes to multiple telephone lines with the company: Yes, No

- Internet Service: Indicates if the customer subscribes to Internet service with the company: No, DSL, Fiber Optic

- Online Security: Indicates if the customer subscribes to an additional online security service provided by the company: Yes, No

- Online Backup: Indicates if the customer subscribes to an additional online backup service provided by the company: Yes, No

- Device Protection Plan: Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: Yes, No

- Premium Tech Support: Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: Yes, No

- Streaming TV: Indicates if the customer uses their Internet service to stream television programming from a third party provider: Yes, No. The company does not charge an additional fee for this service

- Streaming Movies: Indicates if the customer uses their Internet service to stream movies from a third party provider: Yes, No. The company does not charge an additional fee for this service

- Contract: Indicates the customer’s current contract type: Month-to-Month, One Year, Two Year

- Paperless Billing: Indicates if the customer has chosen paperless billing: Yes, No

- Payment Method: Indicates how the customer pays their bill: Electronic Check, Mailed Check, Bank Transfer (automatic), Credit Card (automatic)

- Monthly Charge: Indicates the customer’s current total monthly charge for all their services from the company

- Total Charges: Indicates the customer’s total charges, calculated to the end of the quarter

Churn Status:

- Churn Label: Yes = the customer left the company this quarter. No = the customer remained with the company. Directly related to Churn Value

Source: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113 

### Stage 1: AI Model Creation:

Note: The goal of this stage is to build a machine learning model that predicts customer churn reasonably well. Although more advanced data processing techniques and ML algorithms are available, we use ones that give satisfactory results so that most of the effort goes toward the actual goal, i.e. building a Decision Intelligence system.

#### Step 1: Import the required libraries

In [1]:
import dice_ml
import numpy as np 
import pandas as pd 
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score



#### Step 2: Load the Telecom Churn Dataset

In [2]:
churn_data_all = pd.read_csv('telco_churn.csv')

In [3]:
churn_data_all.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


#### Step 3: Checking the data characteristics and if there are any discrepancies in the data (e.g. missing values, wrong data types etc.)

In [4]:
churn_data_all.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


#### Step 4: Data Preprocessing

All columns except TotalCharges are free of null values. TotalCharges therefore needs an appropriate missing-value strategy. Let us look at the rows where it is null.

In [5]:
churn_data_all[churn_data_all['TotalCharges'].isnull()].head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
488,4472-LVYGI,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,,No
753,3115-CZMZD,Male,0,No,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.25,,No
936,5709-LVOEQ,Female,0,Yes,Yes,0,Yes,No,DSL,Yes,...,Yes,No,Yes,Yes,Two year,No,Mailed check,80.85,,No
1082,4367-NUYAO,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.75,,No
1340,1371-DWPAZ,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,,No


In the rows above, TotalCharges is null whenever tenure is 0. These are customers who joined recently and have not yet completed their first month with the telecom operator, so no charges have accumulated. Replacing the null values with 0 is therefore the most appropriate choice.
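The link between missing TotalCharges and zero tenure can be checked directly instead of being read off the first five rows. The following is a minimal sketch on a small made-up frame (the values are illustrative, not taken from the dataset); on the real data the same comparison would be run against `churn_data_all`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for churn_data_all; values are illustrative only.
toy = pd.DataFrame({
    "tenure": [0, 0, 12, 34],
    "MonthlyCharges": [52.55, 20.25, 29.85, 56.95],
    "TotalCharges": [np.nan, np.nan, 358.20, 1889.50],
})

missing = toy["TotalCharges"].isnull()
zero_tenure = toy["tenure"] == 0

# True only if the two masks select exactly the same rows.
nulls_match_zero_tenure = bool((missing == zero_tenure).all())
print(nulls_match_zero_tenure)

# fillna(0) has the same effect as an np.where(isnull, 0, value) replacement.
toy["TotalCharges"] = toy["TotalCharges"].fillna(0)
```

If the check printed False on the real data, filling with 0 would hide rows that are missing for some other reason.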

In [6]:
churn_data_all['TotalCharges'] = np.where(churn_data_all['TotalCharges'].isnull(), 0, churn_data_all['TotalCharges'])

While the SeniorCitizen column contains binary values, its encoding differs from that of other attributes. Therefore, we convert it to a format that conforms to a standard representation for all variables.

In [7]:
churn_data_all['SeniorCitizen'] = np.where(churn_data_all.SeniorCitizen == 1,"Yes","No")

Creating a list of column names based on their type and use.

In [8]:
all_columns = [x for x in churn_data_all.drop('customerID', axis = 1).columns]
id_column = ['customerID']
target_column = ['Churn']
categorical_columns = [y for y in churn_data_all.drop('customerID', axis = 1).select_dtypes(include = [object]).columns]
numeric_columns = [z for z in all_columns if z not in categorical_columns]

In [9]:
get_dummies = []
label_encoding = []
for i in categorical_columns:
    print('Column Name:', i, ', Unique Value Counts:', len(churn_data_all[i].unique()), ', Values:', churn_data_all[i].unique())
    if len(churn_data_all[i].unique()) > 2:
        get_dummies.append(i)
    else:
        label_encoding.append(i)

Column Name: gender , Unique Value Counts: 2 , Values: ['Female' 'Male']
Column Name: SeniorCitizen , Unique Value Counts: 2 , Values: ['No' 'Yes']
Column Name: Partner , Unique Value Counts: 2 , Values: ['Yes' 'No']
Column Name: Dependents , Unique Value Counts: 2 , Values: ['No' 'Yes']
Column Name: PhoneService , Unique Value Counts: 2 , Values: ['No' 'Yes']
Column Name: MultipleLines , Unique Value Counts: 3 , Values: ['No phone service' 'No' 'Yes']
Column Name: InternetService , Unique Value Counts: 3 , Values: ['DSL' 'Fiber optic' 'No']
Column Name: OnlineSecurity , Unique Value Counts: 3 , Values: ['No' 'Yes' 'No internet service']
Column Name: OnlineBackup , Unique Value Counts: 3 , Values: ['Yes' 'No' 'No internet service']
Column Name: DeviceProtection , Unique Value Counts: 3 , Values: ['No' 'Yes' 'No internet service']
Column Name: TechSupport , Unique Value Counts: 3 , Values: ['No' 'Yes' 'No internet service']
Column Name: StreamingTV , Unique Value Counts: 3 , Values: ['N

We see that some categorical columns have 2 unique values whereas some have more than 2. In order to apply appropriate techniques, we have split the columns.

Applying dummy variable creation techniques to columns having more than 2 unique values.

In [10]:
churn_data_all_dl = pd.get_dummies(churn_data_all, prefix=get_dummies, columns=get_dummies)

In [11]:
churn_data_all_dl.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,...,StreamingMovies_No,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,7590-VHVEG,Female,No,Yes,No,1,No,Yes,29.85,29.85,...,1,0,0,1,0,0,0,0,1,0
1,5575-GNVDE,Male,No,No,No,34,Yes,No,56.95,1889.5,...,1,0,0,0,1,0,0,0,0,1
2,3668-QPYBK,Male,No,No,No,2,Yes,Yes,53.85,108.15,...,1,0,0,1,0,0,0,0,0,1
3,7795-CFOCW,Male,No,No,No,45,No,No,42.3,1840.75,...,1,0,0,0,1,0,1,0,0,0
4,9237-HQITU,Female,No,No,No,2,Yes,Yes,70.7,151.65,...,1,0,0,1,0,0,0,0,1,0


Applying Label Encoding technique to columns with 2 unique values and saving the mappings.

In [12]:
mappings = {}
for col in label_encoding:
    le = LabelEncoder()
    churn_data_all_dl[col] = le.fit_transform(churn_data_all_dl[col])
    mappings[col] = dict(zip(le.classes_,range(len(le.classes_))))
    
mappings

{'gender': {'Female': 0, 'Male': 1},
 'SeniorCitizen': {'No': 0, 'Yes': 1},
 'Partner': {'No': 0, 'Yes': 1},
 'Dependents': {'No': 0, 'Yes': 1},
 'PhoneService': {'No': 0, 'Yes': 1},
 'PaperlessBilling': {'No': 0, 'Yes': 1},
 'Churn': {'No': 0, 'Yes': 1}}
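To make the difference between the two encodings concrete, here is a small sketch on a toy frame (the rows are made up). It assumes a recent pandas version, where `pd.get_dummies` returns boolean columns by default; passing `dtype=int` keeps the 0/1 integers seen in the outputs above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({
    "Partner": ["Yes", "No", "No"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# More than two categories: one 0/1 indicator column per category.
# dtype=int keeps integer output; pandas 2.x defaults to bool.
encoded = pd.get_dummies(toy, columns=["Contract"], prefix=["Contract"], dtype=int)

# Exactly two categories: a single 0/1 column, classes sorted alphabetically.
le = LabelEncoder()
encoded["Partner"] = le.fit_transform(encoded["Partner"])
partner_mapping = {label: int(code) for code, label in enumerate(le.classes_)}

print(encoded)
print(partner_mapping)
```

Keeping the mapping dictionary is what later allows encoded counterfactuals to be translated back into readable labels.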

#### Step 5: Modeling

Splitting the dataset for training and inference.

In [13]:
X = churn_data_all_dl.drop(['customerID', 'Churn'], axis=1)
y = churn_data_all_dl['Churn']

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, stratify=y, random_state = 0)
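The `stratify=y` argument keeps the share of churners the same in the training and test sets, which matters when one class is much smaller than the other. A minimal sketch on synthetic labels (the class imbalance and features below are illustrative assumptions, not the dataset itself):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with an illustrative class imbalance; features are placeholders.
rng = np.random.default_rng(0)
y_toy = np.array([1] * 265 + [0] * 735)
X_toy = rng.normal(size=(1000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=0
)

print("overall positive rate:", y_toy.mean())
print("train positive rate:  ", y_tr.mean())
print("test positive rate:   ", y_te.mean())
```

Without stratification, the test-set churn rate can drift from the overall rate by chance, which makes precision and recall harder to compare across runs.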

Hyperparameter Tuning

Hyperparameter tuning is the process of finding the combination of hyperparameters with which a learning algorithm performs best on a given dataset. Each candidate combination is scored with cross-validation, and the best-scoring one is kept. Here we use RandomizedSearchCV, which samples a fixed number of combinations (n_iter = 10) from the grid instead of trying all of them. Since no scoring argument is passed, candidates are ranked by the classifier's default score, which is accuracy.

The following code searches for suitable hyperparameters for our machine learning algorithm.

In [15]:
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 2100, num = 6)]
feature_name = list(X_test.columns)

max_depth = [int(x) for x in np.linspace(10, 100, num = 5)]
max_depth.append(None)
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4, 6, 8, 10]
random_grid = {'n_estimators':n_estimators,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf}
print(random_grid)
rf = RandomForestClassifier()
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_iter = 10, cv = 2, verbose=2, random_state=42, n_jobs = -1)

rf_random.fit(X_train[feature_name], y_train)
print(rf_random.best_params_)


{'n_estimators': [100, 500, 900, 1300, 1700, 2100], 'max_depth': [10, 32, 55, 77, 100, None], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4, 6, 8, 10]}
Fitting 2 folds for each of 10 candidates, totalling 20 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  16 out of  20 | elapsed:   36.6s remaining:    9.1s
[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:   47.4s finished


{'n_estimators': 100, 'min_samples_split': 2, 'min_samples_leaf': 8, 'max_depth': 10}


We see that for the random forest model (the algorithm of choice in this case), the best combination of hyperparameters is as follows:

- n_estimators = 100
- min_samples_split = 2
- min_samples_leaf = 8
- max_depth = 10
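Because churners are the minority class, ranking candidates by accuracy may favour models that miss many churners. The sketch below shows one possible variation, assuming F1 on the positive class is the metric of interest: it passes `scoring="f1"` and fixes `random_state` on the estimator so the search is repeatable. The data is synthetic and the grid is deliberately small; neither is taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the churn features.
X_toy, y_toy = make_classification(
    n_samples=400, n_features=8, weights=[0.75, 0.25], random_state=0
)

param_distributions = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 4, 8],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),  # fixed seed for repeatability
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="f1",  # rank candidates by F1 on the positive class, not accuracy
    random_state=42,
)
search.fit(X_toy, y_toy)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The full per-candidate results are available in `search.cv_results_` if the ranking needs to be inspected.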

Using the above hyperparameters to build the random forest model

In [16]:
feature_name = list(X_test.columns)
churn_classifier=RandomForestClassifier(n_estimators=100,min_samples_split=2,min_samples_leaf=8,max_depth=10)
churn_classifier.fit(X_train[feature_name],y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=10, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=8, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

Now, using the trained model for making predictions on the test data and saving them along with the prediction probabilities.

In [17]:
pred_df = X_test.copy()
pred_df['Churn'] = y_test
pred_df['pred'] = churn_classifier.predict(X_test[feature_name])
prediction_of_probability = churn_classifier.predict_proba(X_test[feature_name])
pred_df['prob_0'] = prediction_of_probability[:,0] 
pred_df['prob_1'] = prediction_of_probability[:,1]

Evaluating the model performance.

In [18]:
print("Accuracy: ", (accuracy_score(pred_df['Churn'], pred_df['pred']))*100)
print("Precision: ", (precision_score(pred_df['Churn'], pred_df['pred']))*100)
print("Recall: ", (recall_score(pred_df['Churn'], pred_df['pred']))*100)
print("F1 Score: ", (f1_score(pred_df['Churn'], pred_df['pred']))*100)

Accuracy:  80.48261178140525
Precision:  67.1280276816609
Recall:  51.87165775401069
F1 Score:  58.521870286576174
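To see what these percentages mean in customer counts, the metrics can be recomputed from a confusion matrix. The counts below are not printed by the notebook; they are inferred from the reported percentages under the assumption that the test set has 1409 rows (20% of 7043), and they reproduce the four numbers above.

```python
# Counts inferred from the printed metrics, assuming a 1409-row test set
# (20% of 7043 rows); they are not output by the notebook itself.
tp, fp, fn, tn = 194, 95, 180, 940

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of customers flagged as churners, how many churned
recall = tp / (tp + fn)      # of customers who churned, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy * 100:.2f}")
print(f"Precision: {precision * 100:.2f}")
print(f"Recall:    {recall * 100:.2f}")
print(f"F1 Score:  {f1 * 100:.2f}")
```

Read this way, a recall of about 52% means that roughly half of the customers who actually churned were not flagged by the model, which is worth keeping in mind when the predictions drive retention offers.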


As noted earlier, satisfactory model performance is enough for this demonstration. There are several ways to improve it further; some prominent ones are listed below. Not all of them apply equally to a random forest trained on tabular data.

- Feature engineering: This involves creating new features or transforming existing features to make them more informative for the model.

- Regularization: This involves adding a penalty term to the loss function to prevent overfitting of the model to the training data

- Ensemble methods: This involves combining multiple models to create a more accurate prediction

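- Data augmentation: This involves increasing the size of the training dataset by generating new examples. Rotation and flipping are used for images; for tabular data such as this, new records can be generated instead, for example by adding small amounts of noise to numeric features.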

- Oversampling/Undersampling - This involves balancing imbalanced datasets by either increasing the number of instances in the minority class (oversampling) or decreasing the number of instances in the majority class (undersampling)

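- Transfer learning: This involves leveraging a pre-trained model to solve a similar problem or using a pre-trained model as a feature extractor. It is mainly relevant to neural networks.

- Gradient clipping: This involves capping the magnitude of the gradients during training to prevent them from becoming too large. It applies to models trained with gradient descent, not to random forests.

- Early stopping: This involves stopping the training process when the model's performance on the validation set stops improving. It applies to iteratively trained models such as neural networks and gradient-boosted trees.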
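Of the options listed, oversampling is one of the simplest to try on this kind of data. The following is a minimal sketch of random oversampling with `sklearn.utils.resample` on a made-up frame (the rows are illustrative); it is one possible approach, not the method used in this notebook.

```python
import pandas as pd
from sklearn.utils import resample

# Made-up training rows: 3 churners, 7 non-churners.
toy = pd.DataFrame({
    "tenure": [1, 2, 4, 30, 45, 60, 24, 12, 72, 36],
    "Churn":  [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
})

majority = toy[toy["Churn"] == 0]
minority = toy[toy["Churn"] == 1]

# Random oversampling: draw minority rows with replacement until the classes match.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
balanced = pd.concat([majority, minority_upsampled]).reset_index(drop=True)

print(balanced["Churn"].value_counts().to_dict())
```

Resampling should be applied to the training split only; the test set has to keep its original class balance so that the reported metrics stay meaningful.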

Now that we have the model and predictions ready, let us move further and start using them for decision making.

### Stage 2: Generating Counterfactuals

We will be using a combination of Counterfactual and What-If analysis to drive decision intelligence. Let us have a look into what they mean and how they can be used.

Counterfactual Analysis:

Counterfactuals refer to a type of analysis that involves identifying the minimal set of changes required to an input instance such that the machine learning model output changes to a desired outcome. In simpler terms, counterfactuals are hypothetical examples of inputs that would lead to a different output from a machine learning model.

Counterfactual analysis can be used to explain the behavior of a machine learning model, to provide insights into how the model can be improved, and to suggest actions that lead to favourable outcomes. It can also be used for applications such as fairness analysis, where the minimal set of changes required to achieve a desired outcome helps identify potential sources of bias in the model. Counterfactual analysis is also related to causal inference, but note that a counterfactual explanation describes how the model's prediction changes; on its own it is not an estimate of the real-world causal effect of a treatment on an outcome.
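The idea of a minimal change that flips the prediction can be illustrated without any library. The sketch below uses a hypothetical hand-written scoring function, `churn_probability`, whose coefficients are invented for illustration (it is neither the random forest trained above nor the search procedure of any counterfactual package), and looks for the smallest cut in monthly charge that brings the score below 0.5.

```python
import math

def churn_probability(monthly_charge, tenure_months):
    """Hypothetical logistic score, invented for illustration only."""
    z = 0.04 * monthly_charge - 0.08 * tenure_months - 2.1
    return 1.0 / (1.0 + math.exp(-z))

def smallest_charge_cut(monthly_charge, tenure_months, step=1.0, target=0.5):
    """Return the smallest cut (in multiples of step) that brings the score below target."""
    cut = 0.0
    while monthly_charge - cut >= 0:
        if churn_probability(monthly_charge - cut, tenure_months) < target:
            return cut
        cut += step
    return None  # no counterfactual reachable by changing this feature alone

original = churn_probability(100.0, 4)
cut = smallest_charge_cut(100.0, 4)
print(round(original, 3), cut)
```

A real counterfactual engine searches over several features at once and returns multiple diverse candidates, but the question it answers is the same one this loop answers for a single feature.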

We will be using the DiCE Python package (imported as dice_ml) for generating counterfactuals.

DiCE (Diverse Counterfactual Explanations) is a framework for generating counterfactual examples for machine learning models. For a given input instance, it searches for a diverse set of small changes to the feature values that move the model output to a desired outcome. For scikit-learn models it offers model-agnostic search methods (random sampling, a genetic algorithm, and a KD-tree lookup); this notebook uses method="random". The search can take user-supplied constraints into account, such as which features may vary and the permitted range of each, and the results can be used for applications such as explainable AI and fairness analysis.

#### Step 6: Building Counterfactual explainer object

We will be building the counterfactual explainer object using the DiCE (dice_ml) Python package, along with some helper functions.

In [19]:
def initialize_counterfactuals(
    train_df, model, feature_list, continuous_features, target, model_type
):
    """
    Initialize Counterfactual explainer object for the given input model and also the feature range dictionary
    (both are used in calculating Counterfactuals in Local Explainability)

    Args:
        train_df (dataframe) : train dataframe
        model (object) : input model (to explain)
        feature_list (list) : list of features used in model
        continuous_features (list) : list of continuous features
        target (str) : target column name
        model_type (str) : classification or regression

    Returns:
        explainer (object) : counterfactual explainer object
        feature_range (dict) : dictionary containing permissible feature ranges for continuous features
    """
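    # Work on a copy so the caller's dataframe is not modified
    # (assigning into a slice would raise SettingWithCopyWarning)
    df_model = train_df[feature_list + [target]].copy()

    # Round continuous features to 4 decimal places
    df_model[continuous_features] = (
        df_model[continuous_features].astype(float).round(4)
    )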
    print("continuous features", continuous_features)

    #### dice data  initialisations
    data_dice = dice_ml.Data(
        dataframe=df_model,  # For perturbation strategy
        continuous_features=continuous_features,
        outcome_name=target,
    )

    ## dice model initialisation
    if model_type == "regression" or model_type == "time series":
        model_dice = dice_ml.Model(model=model, backend="sklearn", model_type="regressor")
    else:
        model_dice = dice_ml.Model(model=model, backend="sklearn")

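    # Combine the data and model interfaces into an explainer
    explainer = dice_ml.Dice(data_dice, model_dice, method="random")
    # The target column is left in df_model: dropping it in place could alter
    # the dataframe handed to dice_ml.Data, and only continuous features are read below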
    feature_range = {}

    for i in continuous_features:
        feature_range[i] = [
            df_model[i].astype(float).min(),
            df_model[i].astype(float).max(),
        ]

    return explainer, feature_range


In [20]:
def preprocess_encode(df, mappings, feature_list):
    """Encode string-valued columns of df as integers using the saved mappings."""
    encode_df = df.copy()
    if mappings is not None:
        # Columns among feature_list that still hold string labels
        cat_cols = list(df[feature_list].select_dtypes(include=[object]).columns)
        for col in cat_cols:
            encode_df[col] = encode_df[col].map(mappings[col]).astype(int)
    return encode_df

In [21]:
def postprocess_decode(df, mappings):
    """Map integer codes back to their original labels using the saved mappings."""
    decoded_output = df.copy()
    if mappings is not None:
        # Only decode columns that are actually present in df
        cols = [col for col in mappings if col in decoded_output.columns]
        for col in cols:
            inverse = {code: label for label, code in mappings[col].items()}
            decoded_output[col] = decoded_output[col].astype(int).replace(inverse)
    return decoded_output
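The two helpers above are meant to be inverses of each other. The following sketch writes the same round trip inline on a toy frame (the rows are made up) so the effect of each direction is visible; it mirrors the idea of the helpers rather than calling them.

```python
import pandas as pd

mappings = {"gender": {"Female": 0, "Male": 1}, "Partner": {"No": 0, "Yes": 1}}
raw = pd.DataFrame(
    {"gender": ["Female", "Male"], "Partner": ["Yes", "No"], "tenure": [1, 34]}
)

# Encode: map each label to its integer code.
encoded = raw.copy()
for col, mapping in mappings.items():
    encoded[col] = encoded[col].map(mapping).astype(int)

# Decode: invert each mapping and map the codes back to labels.
inverse = {
    col: {code: label for label, code in mapping.items()}
    for col, mapping in mappings.items()
}
decoded = encoded.copy()
for col, inv in inverse.items():
    decoded[col] = decoded[col].map(inv)

print(encoded)
print(decoded["gender"].tolist(), decoded["Partner"].tolist())
```

Columns without an entry in the mapping, such as tenure here, pass through both directions unchanged.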


In [22]:
def generate_output_counterfactuals(explainer,feature_range,query_instance,model_type,model,feature_list,desired_class_or_range,continuous_features,features_vary='all',num_cf=3,mappings={}):
    """
    Generating the counterfactuals output for a particular query_instance (record), given various other inputs like features to vary, etc.

    Args:
        explainer (object) : counterfactuals explainer object
        feature_range (dict) : dictionary containing permissible feature ranges for continuous features
        query_instance (dataframe) : selected record to explain (user input)
        model_type (str) : classification or regression
        model (object) : model object, required only in the case of classification
        feature_list (list) : list of features used in model
        desired_class_or_range (list) : if classification then desired output class; if regression then desired output range (user input)
        continuous_features (list) : list of continuous features
        features_vary (list) : list of features that can be varied (user input)
        num_cf (int) : number of counterfactuals to generate
        mappings (dict) : encoder mapping dictionary

    Returns:
        cf_output (dataframe) : counterfactuals output dataframe
    """
    query_instance[continuous_features] = query_instance[continuous_features].astype(float).round(4)
    query_instance_processed = preprocess_encode(query_instance,mappings,list(query_instance.columns))
    display(query_instance_processed)
    if model_type == 'classification':
        cf_exp = explainer.generate_counterfactuals(query_instance_processed, total_CFs=20,desired_class=desired_class_or_range,permitted_range=feature_range,features_to_vary=features_vary)
    else:
        cf_exp = explainer.generate_counterfactuals(query_instance_processed, total_CFs=num_cf,desired_range=desired_class_or_range,permitted_range=feature_range,features_to_vary=features_vary)
    cf_exp_df = cf_exp.cf_examples_list[0].final_cfs_df

    if model_type == "classification":
        cf_exp_df["Probability"] = model.predict_proba(cf_exp_df[feature_list])[:, desired_class_or_range]
        cf_exp_df = (cf_exp_df.sort_values("Probability", ascending=False).reset_index(drop=True).head(num_cf))


    cf_exp_df = postprocess_decode(cf_exp_df,mappings)
    cf_output = cf_ouput_df_fn(cf_exp_df,query_instance)
    return cf_output

def cf_ouput_df_fn(df,org_df):
    newdf = df.values.tolist()
    org = org_df.values.tolist()[0]
    for ix in range(df.shape[0]):
        for jx in range(len(org)):
            if str(newdf[ix][jx]) == str(org[jx]):
                newdf[ix][jx] = '-'
            else:
                newdf[ix][jx] = str(newdf[ix][jx])
    return pd.DataFrame(newdf, columns=df.columns, index=df.index)

In [23]:
features = [x for x in X_train.columns]

In [24]:
train_df = X_train.copy()
train_df["Churn"] = y_train

In [25]:
model_type='classification'
target = 'Churn'
explainer, feature_range = initialize_counterfactuals(
    train_df, churn_classifier, features, numeric_columns, target, model_type
)

continuous features ['tenure', 'MonthlyCharges', 'TotalCharges']
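The `feature_range` dictionary returned here holds the minimum and maximum of each continuous feature in the training data. Business constraints for a what-if scenario can be expressed by narrowing those ranges before they are passed on as `permitted_range`. The sketch below uses a hypothetical helper, `cap_discount`, and an assumed maximum discount of 30%; the range values are illustrative, not taken from the notebook.

```python
def cap_discount(feature_range, feature, current_value, max_discount=0.30):
    """Return a copy of feature_range where `feature` may drop by at most
    max_discount (a fraction) from current_value and may not increase."""
    lower_bound, upper_bound = feature_range[feature]
    constrained = dict(feature_range)  # shallow copy; the original stays untouched
    constrained[feature] = [
        max(lower_bound, round(current_value * (1 - max_discount), 2)),
        min(upper_bound, current_value),
    ]
    return constrained

# Illustrative training-data ranges and customer value (not taken from the notebook).
feature_range_toy = {"MonthlyCharges": [18.25, 118.75], "tenure": [0.0, 72.0]}
constrained = cap_discount(feature_range_toy, "MonthlyCharges", current_value=100.0)
print(constrained["MonthlyCharges"])
```

The constrained dictionary can then be supplied through the `permitted_range` argument of `generate_counterfactuals`, in the same way the helper function above passes `feature_range`.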


### Stage 3: What-If Analysis

#### Step 7: Using the explainer object to generate possible strategies to retain customer

We now have the counterfactual 'explainer' object, which can produce counterfactual explanations, i.e. which changes to the inputs would produce the desired output. In our case, the question is:

"For a customer who is likely to churn, what can be changed (services, account details, demographics) so that the customer is retained?" Let us understand this through an example.

We will now see how DiCE works and which inputs it expects:

- The trained machine learning model
- The input instance for which we want to generate a counterfactual. We will take a sample customer from the test-set customers who churned, sorted by predicted churn probability
- The features to vary. For this example, we will check whether offering paperless billing, online security, online backup, device protection, tech support, streaming TV or movies, a one- or two-year contract, a different monthly charge, or a combination of these could persuade the customer to stay with the telecom operator
- The desired output or the target class for which we want to generate a counterfactual. In this case, it will be non-churn i.e. 0 (from 1)

In [26]:
test_df = X_test.copy()
test_df["Churn"] = y_test

In [27]:
test_df_churners = pred_df[pred_df['Churn'] == 1].sort_values(by=['prob_1'], ascending=False)

In [28]:
test_df_churners_input = test_df_churners.drop(['Churn', 'pred', 'prob_0', 'prob_1'], inplace=False, axis=1)

In [29]:
test_df_churners_input[(test_df_churners_input['MonthlyCharges']>100) & (test_df_churners_input['tenure']<5)]

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
6385,1,0,0,1,4,1,1,101.7,364.55,0,...,0,0,1,1,0,0,0,0,1,0
462,1,0,1,1,4,1,0,101.15,385.9,0,...,0,0,1,1,0,0,0,0,1,0
2246,0,0,1,1,1,1,1,102.45,102.45,0,...,0,0,1,1,0,0,0,0,1,0
3837,0,0,0,0,4,1,1,105.65,443.9,1,...,0,0,1,0,1,0,0,0,1,0


In [48]:
test_df_churners_input_record = test_df_churners_input[61:62]

In [49]:
features_vary = ['PaperlessBilling', 'MonthlyCharges',
                 'OnlineSecurity_Yes', 'OnlineBackup_Yes',
                 'DeviceProtection_Yes', 'TechSupport_Yes', 
                 'StreamingTV_Yes', 'StreamingMovies_Yes', 
                 'Contract_One year', 'Contract_Two year']

In [50]:
dice_exp = explainer.generate_counterfactuals(test_df_churners_input_record, total_CFs=4, desired_class="opposite", features_to_vary=features_vary)
dice_exp.visualize_as_dataframe()

100%|██████████| 1/1 [00:00<00:00,  2.93it/s]

Query instance (original outcome : 1)





Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,1,4.0,1,0,101.150002,385.899994,0,...,0,1,1,0,0,0,0,1,0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,0.0,1.0,1.0,4.0,1.0,0.0,55.36,386.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
1,1.0,0.0,1.0,1.0,4.0,1.0,0.0,35.86,386.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
2,1.0,0.0,1.0,1.0,4.0,1.0,0.0,22.32,386.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
3,1.0,0.0,1.0,1.0,4.0,1.0,0.0,67.89,386.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


In [51]:
test_df_churners_input_record[features_vary]

Unnamed: 0,PaperlessBilling,MonthlyCharges,OnlineSecurity_Yes,OnlineBackup_Yes,DeviceProtection_Yes,TechSupport_Yes,StreamingTV_Yes,StreamingMovies_Yes,Contract_One year,Contract_Two year
462,0,101.15,0,0,1,0,1,1,0,0


In [52]:
dice_exp.cf_examples_list[0].final_cfs_df[features_vary]

Unnamed: 0,PaperlessBilling,MonthlyCharges,OnlineSecurity_Yes,OnlineBackup_Yes,DeviceProtection_Yes,TechSupport_Yes,StreamingTV_Yes,StreamingMovies_Yes,Contract_One year,Contract_Two year
0,0.0,55.36,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
1,0.0,35.86,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
2,0.0,22.32,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
3,0.0,67.89,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0


From the above output, we see that the explainer has suggested 4 possible ways to retain the customer. All four add the Online Security service and differ only in how far the monthly charge is reduced:

1. Change Monthly Charges from 101.15 to 55.36 and provide Online Security service
2. Change Monthly Charges from 101.15 to 35.86 and provide Online Security service
3. Change Monthly Charges from 101.15 to 22.32 and provide Online Security service
4. Change Monthly Charges from 101.15 to 67.89 and provide Online Security service
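
The list above was read off the two tables by eye. As a minimal sketch, the same comparison can be automated by diffing the query instance against a counterfactual. The helper name `changed_features` is hypothetical, and plain dictionaries (with values copied from the tables above) stand in for the DataFrame rows:

```python
def changed_features(original, counterfactual, tol=1e-6):
    """Return {feature: (old, new)} for every feature whose value differs."""
    changes = {}
    for name, old in original.items():
        new = counterfactual[name]
        if abs(float(new) - float(old)) > tol:
            changes[name] = (old, new)
    return changes


# Values copied from the tables above: record 462 and counterfactual 0.
original = {
    "PaperlessBilling": 0, "MonthlyCharges": 101.15,
    "OnlineSecurity_Yes": 0, "OnlineBackup_Yes": 0,
    "DeviceProtection_Yes": 1, "TechSupport_Yes": 0,
    "StreamingTV_Yes": 1, "StreamingMovies_Yes": 1,
    "Contract_One year": 0, "Contract_Two year": 0,
}
counterfactual_0 = {
    "PaperlessBilling": 0.0, "MonthlyCharges": 55.36,
    "OnlineSecurity_Yes": 1.0, "OnlineBackup_Yes": 0.0,
    "DeviceProtection_Yes": 1.0, "TechSupport_Yes": 0.0,
    "StreamingTV_Yes": 1.0, "StreamingMovies_Yes": 1.0,
    "Contract_One year": 0.0, "Contract_Two year": 0.0,
}

diff = changed_features(original, counterfactual_0)
print(diff)
```

Only `MonthlyCharges` and `OnlineSecurity_Yes` appear in the result, which matches suggestion 1 in the list.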

However, the business may have constraints that prevent it from adopting the counterfactual suggestions directly. A practical approach is to take the suggestions as a starting point, adjust them to fit the business constraints, and then check whether the adjusted inputs still produce the desired output. This is possible through What-If analysis.

#### Step 8: Using What-If analysis for getting the right set of strategies to retain customer (based on constraints)

What-If analysis:

What-if analysis is a technique for understanding how a machine learning model behaves under different scenarios. It involves feeding the model modified input values and observing how its predictions change.
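
The idea can be sketched as a small reusable helper: copy a record, apply a set of overrides, and score the modified copy. Everything here is illustrative. `what_if` is a hypothetical name, a plain dictionary stands in for a single-row DataFrame, and `toy_churn_probability` is a made-up scorer, not the churn classifier trained in this walkthrough.

```python
import math


def what_if(record, overrides, churn_probability, threshold=0.5):
    """Apply `overrides` to a copy of `record` and score the modified copy.

    Returns (probability, retained); `retained` is True when the churn
    probability falls below `threshold`.
    """
    scenario = dict(record)        # copy, so the original record is untouched
    scenario.update(overrides)
    probability = churn_probability(scenario)
    return probability, probability < threshold


# Hypothetical stand-in scorer (NOT the model from this tutorial):
# higher charges push churn up, online security pushes it down.
def toy_churn_probability(row):
    score = 0.03 * row["MonthlyCharges"] - 1.5 * row["OnlineSecurity_Yes"] - 2.0
    return 1.0 / (1.0 + math.exp(-score))


customer = {"MonthlyCharges": 100.0, "OnlineSecurity_Yes": 0}

baseline = what_if(customer, {}, toy_churn_probability)
scenario = what_if(
    customer,
    {"MonthlyCharges": 90.0, "OnlineSecurity_Yes": 1},
    toy_churn_probability,
)
print("baseline:", baseline)
print("scenario:", scenario)
```

Keeping the scorer as a parameter means the same helper could wrap any model that returns a churn probability for a record.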

Now suppose that, for the given counterfactual recommendations, we have a constraint that the customer's monthly charges cannot be reduced by more than 10%, since a larger discount could cause significant revenue loss. For this customer, whose current charge is 101.15, the monthly charge therefore cannot go below roughly 91. We assume there is no constraint on providing services for free, so let's see whether the constrained monthly charge combined with additional services is enough to retain the customer.
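
To make the arithmetic concrete, the sketch below computes the lowest permitted charge under the 10% rule and checks the four counterfactual suggestions against it. The helper name `min_allowed_charge` is hypothetical; the numbers are the ones shown in the counterfactual output above.

```python
def min_allowed_charge(current_charge, max_discount=0.10):
    """Lowest monthly charge permitted under a maximum fractional discount."""
    return current_charge * (1.0 - max_discount)


current_charge = 101.15
floor = min_allowed_charge(current_charge)      # about 91.03

# Monthly charges proposed by the four counterfactuals above.
suggested_charges = [55.36, 35.86, 22.32, 67.89]

# Fractional discount each suggestion would require.
discounts = [1.0 - charge / current_charge for charge in suggested_charges]

# Suggestions that respect the business constraint.
feasible = [charge for charge in suggested_charges if charge >= floor]

print("floor:", round(floor, 2))
print("discounts:", [round(d, 2) for d in discounts])
print("feasible:", feasible)
```

None of the four suggestions survives the constraint, since each requires a discount of roughly a third or more, which is why the options below are built by hand. As far as the DiCE documentation describes, `generate_counterfactuals` also accepts a `permitted_range` argument that restricts a feature's range during the search; that option is not used in this walkthrough.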

Options:

1. Change Monthly Charges from 101.15 to 91, Provide Online Security and Online Backup services

In [53]:
custom_input = test_df_churners_input_record.copy()
custom_input['MonthlyCharges'] = custom_input['MonthlyCharges'].replace(101.15, 91)
custom_input['OnlineBackup_Yes'] = custom_input['OnlineBackup_Yes'].replace(0, 1)
custom_input['OnlineSecurity_Yes'] = custom_input['OnlineSecurity_Yes'].replace(0, 1)

In [54]:
if churn_classifier.predict(custom_input)==0:
    print("Successfully retained the customer")  
else:
    print("Sorry, customer could not be retained")

print("\nProbability of Churn:",churn_classifier.predict_proba(custom_input)[0][1].round(2))

Sorry, customer could not be retained

Probability of Churn: 0.57


We see that with the option 1 inputs we cannot retain the customer: the predicted churn probability is still 0.57. Let us try another set of inputs.

2. Change Monthly Charges from 101.15 to 91, Provide Online Security and Online Backup services with a 2 year contract

In [57]:
custom_input2 = test_df_churners_input_record.copy()
custom_input2['MonthlyCharges'] = custom_input2['MonthlyCharges'].replace(101.15, 91)
custom_input2['OnlineBackup_Yes'] = custom_input2['OnlineBackup_Yes'].replace(0, 1)
custom_input2['OnlineSecurity_Yes'] = custom_input2['OnlineSecurity_Yes'].replace(0, 1)
custom_input2['Contract_Two year'] = custom_input2['Contract_Two year'].replace(0, 1)

In [58]:
if churn_classifier.predict(custom_input2)==0:
    print("Successfully retained the customer")  
else:
    print("Sorry, customer could not be retained")

print("\nProbability of Churn:",churn_classifier.predict_proba(custom_input2)[0][1].round(2))

Successfully retained the customer

Probability of Churn: 0.47


We see that with the inputs in option 2 the model predicts that the customer is retained, with the churn probability coming down to 47%. Option 2 can therefore be converted into a personalized offering for the customer, although the margin is slim (0.47 against 0.57 for option 1), so the result is worth treating with some caution.
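
One detail worth double-checking in option 2: the query instance shown earlier has `Contract_Month-to-month` set to 1, and the code sets `Contract_Two year` to 1 without resetting it, so the encoded row describes two contracts at once. Below is a minimal sketch of a helper that keeps a one-hot group consistent. The name `set_one_hot` is hypothetical and a dictionary stands in for the DataFrame row. Whether the prediction changes once the row is made consistent is not shown in the original notebook and would need to be re-run.

```python
def set_one_hot(record, prefix, chosen):
    """Return a copy of `record` in which, among the columns starting with
    `prefix`, only `prefix + chosen` is 1 and all the others are 0."""
    target = prefix + chosen
    if target not in record:
        raise KeyError(f"unknown column: {target}")
    updated = dict(record)
    for column in updated:
        if column.startswith(prefix):
            updated[column] = 1 if column == target else 0
    return updated


# Contract columns of the record used above (month-to-month customer).
contract = {
    "Contract_Month-to-month": 1,
    "Contract_One year": 0,
    "Contract_Two year": 0,
}

# Setting only the new column leaves two contract flags switched on.
naive = dict(contract)
naive["Contract_Two year"] = 1

# The helper switches the sibling columns off as well.
consistent = set_one_hot(contract, "Contract_", "Two year")

print("naive:     ", naive)
print("consistent:", consistent)
```

The same consideration applies to the service columns: turning on `OnlineSecurity_Yes` or `OnlineBackup_Yes` would normally go together with turning off the matching `_No` column, if the encoding has one.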

Counterfactuals and "What-if" analysis can also be used to detect bias in machine learning models. This involves analyzing counterfactual outputs, or testing the model's sensitivity to changes in the input data or model parameters, to identify areas where the model may be biased. Let us see how bias can be detected using counterfactual analysis. This time we take a batch of churners who are senior citizens or have gender equal to 1, and restrict the features to vary to the demographic attributes SeniorCitizen and gender.

In [59]:
test_df_churners_bias_inputs = test_df_churners_input[(test_df_churners_input['SeniorCitizen'] == 1) | (test_df_churners_input['gender'] == 1)]

In [60]:
features_vary_bias = ['SeniorCitizen', 'gender']

In [61]:
dice_exp_bias = explainer.generate_counterfactuals(test_df_churners_bias_inputs[100:130], total_CFs=4, desired_class="opposite", features_to_vary=features_vary_bias)
dice_exp_bias.visualize_as_dataframe()

  7%|▋         | 2/30 [00:00<00:03,  8.85it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 13%|█▎        | 4/30 [00:00<00:02,  9.10it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 20%|██        | 6/30 [00:00<00:04,  5.33it/s]

Only 1 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 23%|██▎       | 7/30 [00:01<00:04,  5.68it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 30%|███       | 9/30 [00:01<00:03,  5.98it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 37%|███▋      | 11/30 [00:01<00:02,  7.01it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 40%|████      | 12/30 [00:01<00:02,  7.20it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 43%|████▎     | 13/30 [00:02<00:06,  2.68it/s]

Only 2 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 50%|█████     | 15/30 [00:03<00:05,  2.70it/s]

Only 1 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 57%|█████▋    | 17/30 [00:03<00:03,  4.08it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 63%|██████▎   | 19/30 [00:04<00:02,  5.09it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 67%|██████▋   | 20/30 [00:04<00:01,  5.30it/s]

No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 70%|███████   | 21/30 [00:06<00:05,  1.53it/s]

Only 2 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 01 sec


 77%|███████▋  | 23/30 [00:07<00:03,  1.78it/s]

Only 1 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


 80%|████████  | 24/30 [00:08<00:04,  1.26it/s]

Only 3 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 01 sec


 83%|████████▎ | 25/30 [00:09<00:04,  1.10it/s]

Only 2 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 01 sec


 87%|████████▋ | 26/30 [00:11<00:04,  1.19s/it]

Only 3 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 01 sec


 93%|█████████▎| 28/30 [00:12<00:01,  1.39it/s]

Only 1 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec


100%|██████████| 30/30 [00:12<00:00,  2.33it/s]

Only 2 (required 4)  Diverse Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
No Counterfactuals found for the given configuration, perhaps try with different parameters... ; total time taken: 00 min 00 sec
Query instance (original outcome : 1)





Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,0,5.0,0,1,24.950001,100.400002,0,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,0,0,2.0,1,1,95.099998,180.25,1,...,0,1,1,0,0,0,1,0,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,0,10.0,1,1,99.849998,990.900024,1,...,0,1,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,23.0,1,1,90.599998,1943.199951,0,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,0,51.0,1,1,94.650002,4812.75,0,...,0,1,1,0,0,0,0,1,0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,0.0,0.0,1.0,0.0,51.0,1.0,1.0,95.0,4813.05,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,1,0,53.0,1,1,101.900002,5549.399902,0,...,0,1,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,0,1,0,0,20.0,1,1,94.099998,1782.400024,0,...,0,1,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,3.0,1,1,43.299999,123.650002,1,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,1.0,1,1,55.700001,55.700001,1,...,0,1,1,0,0,0,0,0,1,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,2.0,1,1,79.849998,152.449997,1,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,1,0,10.0,1,1,92.5,934.099976,1,...,0,0,1,0,0,0,0,0,1,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,1,0,20.0,1,1,73.650002,1463.5,1,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,0,4.0,0,1,48.549999,201.0,0,...,0,1,1,0,0,0,0,1,0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,0.0,0.0,1.0,0.0,4.0,0.0,1.0,49.01,201.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
1,0.0,1.0,1.0,0.0,4.0,0.0,1.0,49.01,201.0,0.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,0,0,14.0,1,0,90.449997,1266.099976,1,...,0,1,1,0,0,0,1,0,0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,0.0,0.0,0.0,14.0,1.0,0.0,91.0,1267.1,1.0,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0


Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,23.0,1,1,74.949997,1710.449951,0,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,1,0,29.0,1,1,95.900002,2745.199951,0,...,0,1,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,0,0,8.0,1,0,100.300003,832.349976,1,...,0,1,1,0,0,0,1,0,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,0,35.0,1,1,75.199997,2576.199951,0,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,13.0,1,0,98.150002,1230.25,0,...,0,1,1,0,0,0,0,0,1,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,1.0,1,1,50.75,50.75,1,...,0,0,1,0,0,0,0,1,0,1



No counterfactuals found!
Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,1,20.0,1,1,90.199997,1776.550049,1,...,0,1,1,0,0,0,0,1,0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,1.0,1.0,1.0,20.0,1.0,1.0,91.0,1777.05,1.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
1,0.0,1.0,1.0,1.0,20.0,1.0,1.0,91.0,1777.05,1.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,0,0,6.0,1,1,85.150002,503.600006,1,...,0,0,1,0,0,1,0,0,0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,0.0,0.0,0.0,6.0,1.0,1.0,86.0,504.0,1.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0


Query instance (original outcome : 1)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,1.0,1,0,46.200001,46.200001,1,...,0,0,1,0,0,0,1,0,0,1



No counterfactuals found!
Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,1,3.0,1,0,69.650002,220.100006,1,...,0,0,1,0,0,0,0,1,0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,0.0,0.0,0.0,1.0,3.0,1.0,0.0,70.0,221.1,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1
1,1.0,1.0,0.0,1.0,3.0,1.0,0.0,70.0,221.1,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1
2,0.0,1.0,0.0,1.0,3.0,1.0,0.0,70.0,221.1,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1


Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,1.0,1,0,44.400002,44.400002,1,...,0,0,1,0,0,0,0,0,1,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,45.01,45.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1
1,0.0,1.0,0.0,0.0,1.0,1.0,0.0,45.01,45.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1


Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,0,22.0,1,1,101.349998,2317.100098,0,...,0,1,1,0,0,1,0,0,0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,1.0,0.0,0.0,22.0,1.0,1.0,102.0,2318.1,0.0,...,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1
1,0.0,0.0,0.0,0.0,22.0,1.0,1.0,102.0,2318.1,0.0,...,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1
2,0.0,1.0,0.0,0.0,22.0,1.0,1.0,102.0,2318.1,0.0,...,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1


Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,1,1,0,16.0,1,1,74.300003,1178.25,1,...,0,0,1,0,0,0,1,0,0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,0.0,1.0,1.0,0.0,16.0,1.0,1.0,74.0,1179.05,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1


Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,0,1,38.0,1,1,99.25,3777.149902,0,...,0,1,1,0,0,1,0,0,0,0



No counterfactuals found!
Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,1,26.0,1,1,83.75,2070.600098,0,...,0,0,1,0,0,0,0,1,0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1.0,1.0,1.0,1.0,26.0,1.0,1.0,84.0,2071.1,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1
1,0.0,1.0,1.0,1.0,26.0,1.0,1.0,84.0,2071.1,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1


Query instance (original outcome : 0)


Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,PaperlessBilling,MonthlyCharges,TotalCharges,MultipleLines_No,...,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,Churn
0,1,0,1,1,21.0,1,0,104.400002,2157.949951,0,...,0,1,1,0,0,0,0,1,0,0



No counterfactuals found!


The above counterfactual output, in a few instances, recommends changing either the gender or the age group (senior citizen to non-senior citizen) to obtain a favorable outcome. This indicates that the model needs to be checked for bias. It is important to note that no single method is foolproof for detecting and mitigating bias in AI systems; it is crucial to combine several techniques, including data validation, auditing, and diversity and inclusion practices, to ensure AI systems produce equitable outcomes. Fairness metrics such as disparate impact, equal opportunity difference, false positive rate parity, equalized odds, and calibration are used to assess whether an AI model treats different groups of individuals fairly, and the choice of metric depends on the specific context and goals. Disparate impact examines whether the model's outcomes have a disproportionate impact on particular groups defined by protected characteristics. It is expressed as the ratio of the selection rates (the share of positive predictions) of the two groups being compared. Let's calculate the disparate impact value for the Gender and Senior Citizen columns in our dataset.
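As a reference for the calculation that follows, the ratio can be written out explicitly. The symbols $\mathrm{SR}$ (selection rate) and $\mathrm{DI}$ (disparate impact) are notation introduced here for illustration, and the min/max form is an assumption chosen so that the ratio always lies between 0 and 1 regardless of which group is treated as the reference:

$$
\mathrm{SR}(g) = \frac{\#\{\text{rows with prediction} = 1 \text{ and group} = g\}}{\#\{\text{rows with group} = g\}},
\qquad
\mathrm{DI} = \frac{\min_g \mathrm{SR}(g)}{\max_g \mathrm{SR}(g)}
$$

A value of 1 means every group receives positive predictions at the same rate, and values below 0.8 fall under the 80% threshold.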

In [62]:
protected_columns = ["gender", "SeniorCitizen"]
target = "pred"  # column holding the model's predicted label
for pro_col in protected_columns:
    categories = list(pred_df[pro_col].unique())
    selection_rate_list = []
    for cat in categories:
        # Selection rate: share of rows in this group that are predicted as 1
        group = pred_df[pred_df[pro_col] == cat]
        selection_rate = (group[target] == 1).mean()
        selection_rate_list.append(np.round(selection_rate, 3))
    # Disparate impact: lowest selection rate divided by the highest
    disparate_impact = np.round(min(selection_rate_list) / max(selection_rate_list), 3)
    print("Disparate Impact for", pro_col, ":", disparate_impact)

Disparate Impact for gender : 0.843
Disparate Impact for SeniorCitizen : 0.438


The Disparate Impact score for the Gender column is ~84%, which is above the 80% threshold and suggests that the two genders are treated fairly evenly. The SeniorCitizen score of 43.8%, however, is significantly lower than the threshold, indicating that the model's predictions differ substantially between senior and non-senior customers and that there may be an unfair bias with respect to this feature. A single metric is not conclusive, so it is important to use multiple fairness metrics in conjunction with each other to arrive at a conclusion.
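To illustrate how a second metric can be read alongside disparate impact, here is a minimal sketch that also computes the equal opportunity difference (the gap in true positive rate between groups). The function names and the toy data are hypothetical and are not part of the notebook's pipeline. The functions accept any equal-length sequences, so columns of a DataFrame such as `pred_df` could be passed in, assuming a column with the true labels is available.

```python
from collections import defaultdict


def group_rates(groups, flags):
    """Mean of a 0/1 flag within each distinct group value."""
    totals, hits = defaultdict(int), defaultdict(int)
    for g, f in zip(groups, flags):
        totals[g] += 1
        hits[g] += int(f)
    return {g: hits[g] / totals[g] for g in totals}


def disparate_impact(groups, y_pred):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    rates = group_rates(groups, y_pred)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")


def equal_opportunity_difference(groups, y_true, y_pred):
    """Largest gap in true positive rate between groups (0.0 means parity)."""
    # Keep only the rows whose true label is positive
    positives = [(g, p) for g, t, p in zip(groups, y_true, y_pred) if t == 1]
    if not positives:
        return float("nan")
    tpr = group_rates([g for g, _ in positives], [p for _, p in positives])
    return max(tpr.values()) - min(tpr.values())


# Toy data for illustration only (not taken from the churn dataset)
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]

di = disparate_impact(toy_group, toy_pred)
eod = equal_opportunity_difference(toy_group, toy_true, toy_pred)
print("Disparate impact:", round(di, 3))
print("Equal opportunity difference:", round(eod, 3))
```

If both metrics flag the same protected column, that strengthens the case for investigating the feature further. If they disagree, the gap in selection rates may partly reflect different base rates between the groups rather than model error.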