# Examples
In this notebook, we show for each prompting strategy (zero-shot, one-shot and Tree of Thought) how an explanation is generated and evaluated from start to finish.

In [1]:
import dice_ml
from dice_ml.utils import helpers
import pandas as pd
import pickle
from prompts import *
from utils import *
from llm_explainers import *

In [2]:
# Load the model we want to explain
with open('./models/loan_model.pkl', 'rb') as file:
    model = pickle.load(file)

# Load the train and test datasets
train_dataset = pd.read_csv('./data/adult_train_dataset.csv')
test_dataset = pd.read_csv('./data/adult_test_dataset.csv')

# Load the examples we will try to explain and drop the target column
test_df = pd.read_csv('./data/test_examples.csv')
test_df = test_df.drop(columns=['income'])

## Zero Shot

In [5]:
exp_m = LLMExplanation4CFs(model = model, #Load the model we want to explain
                            model_description = """ML-system that predicts whether a person will earn more than 50k $ a year""", # brief explanation of the ML model
                            backend='sklearn', # Framework used to build the model (used to generate counterfactuals)
                            dataset_info=string_info(train_dataset.columns, helpers.get_adult_data_info()) , # string information about the dataset
                            continuous_features=['age', 'hours_per_week'], # Necessary for the counterfactual generation
                            outcome_name= 'income', #Necessary for counterfactual generation
                            training_set=train_dataset, #Necessary for counterfactual generation
                            test_set= test_dataset, # Necessary to check novelty of the evaluation example
                            llm='gpt-4o', #LLM used, works with Langchain
                            prompt_type='zero', # zero or one
                            n_counterfactuals=5, #Number of counterfactuals used in the explanation 
                            user_input=False #Human in the loop helping select the causes
                           )


exp_m.fit()
counterfactuals, rules, code1, result1, explanation, code2, final_cf, code3, prediction, n_rules,rules_followed, first_rule, second_rule,third_rule,  is_in_data = exp_m.explain_evaluate(user_data = test_df.iloc[[0]], verbose = False,return_all=True)

100%|██████████| 1/1 [00:00<00:00,  2.01it/s]


We will look at the following example: a woman who is predicted to earn less than $50k a year. We follow the whole process the LLM goes through to obtain the final explanation.

In [107]:
print(test_df.iloc[0])

age                        29
workclass             Private
education             HS-grad
marital_status        Married
occupation        Blue-Collar
race                    White
gender                 Female
hours_per_week             38
Name: 0, dtype: object


First, a set of counterfactuals is generated with the DiCE package (`dice_ml`).

In [108]:
print(counterfactuals)

   age   workclass  education marital_status    occupation   race  gender  \
0   29  Government  Doctorate        Married   Blue-Collar  White  Female   
1   29     Private  Bachelors        Married  Professional  White  Female   
2   29     Private      Assoc        Married  White-Collar  White  Female   
3   29     Private  Bachelors        Married   Blue-Collar  White    Male   
4   29     Private    Masters        Married  Professional  White  Female   

   hours_per_week  income  
0              38       1  
1              38       1  
2              38       1  
3              38       1  
4              38       1  
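The counterfactuals above come from DiCE, which searches for changes to the input that flip the model's prediction to the desired class. To illustrate the idea only, here is a deliberately simple stand-in: a brute-force search over single-feature changes against a toy classifier. `toy_model`, its decision rule and the candidate values are hypothetical and unrelated to `loan_model.pkl`; this is not DiCE's algorithm, which can change several features at once and aims for a diverse set of counterfactuals.

```python
from typing import Callable, Dict, List

Instance = Dict[str, object]


def toy_model(x: Instance) -> int:
    """Hypothetical classifier: 1 (>50k) for a doctorate, or for a bachelor's
    or master's degree combined with a professional or white-collar job."""
    degree = x["education"] in {"Bachelors", "Masters", "Doctorate"}
    good_job = x["occupation"] in {"Professional", "White-Collar"}
    return int((degree and good_job) or x["education"] == "Doctorate")


def single_feature_counterfactuals(
    model: Callable[[Instance], int],
    instance: Instance,
    candidates: Dict[str, List[object]],
    desired: int = 1,
) -> List[Instance]:
    """Copies of `instance` with exactly one feature changed that `model` maps to `desired`."""
    found = []
    for feature, values in candidates.items():
        for value in values:
            if value == instance[feature]:
                continue  # not an actual change
            candidate = {**instance, feature: value}
            if model(candidate) == desired:
                found.append(candidate)
    return found


query = {"age": 29, "workclass": "Private", "education": "HS-grad",
         "occupation": "Blue-Collar", "gender": "Female", "hours_per_week": 38}
cfs = single_feature_counterfactuals(
    toy_model,
    query,
    {"education": ["Assoc", "Bachelors", "Doctorate"],
     "occupation": ["Professional", "White-Collar"]},
)
for cf in cfs:
    print(cf["education"], cf["occupation"])  # Doctorate Blue-Collar
```

With only one feature allowed to change, the search finds a single counterfactual (a doctorate); reaching the positive class through a bachelor's degree would also require a new occupation, which is the kind of multi-feature change DiCE can produce.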

Next, the LLM extracts a set of rules from these counterfactuals.

In [109]:
print(rules)

From the provided data, we can extract the most important observed rules that could potentially affect the outcome of whether a person earns more than $50k a year. Here are the key differences and rules based on the counterfactual cases:

### Negative Assessment Outcome
- **Age**: 29
- **Workclass**: Private
- **Education**: HS-grad
- **Marital Status**: Married
- **Occupation**: Blue-Collar
- **Race**: White
- **Gender**: Female
- **Hours per Week**: 38
- **Income**: ≤ $50k

### Positive Counterfactual Outcomes
1. **Age**: 29
   - **Workclass**: Government
   - **Education**: Doctorate
   - **Marital Status**: Married
   - **Occupation**: Blue-Collar
   - **Race**: White
   - **Gender**: Female
   - **Hours per Week**: 38
   - **Income**: > $50k

2. **Age**: 29
   - **Workclass**: Private
   - **Education**: Bachelors
   - **Marital Status**: Married
   - **Occupation**: Professional
   - **Race**: White
   - **Gender**: Female
   - **Hours per Week**: 38
   - **Income**: > $50k

3. *

To check whether these rules hold, we ask the LLM to write a program that counts how many of the counterfactuals are consistent with each rule. The following cell shows the code generated by the LLM.

In [110]:
print(code1)

import pandas as pd

# Define the negative assessment outcome
negative_outcome = pd.DataFrame({
    'age': [29],
    'workclass': ['Private'],
    'education': ['HS-grad'],
    'marital_status': ['Married'],
    'occupation': ['Blue-Collar'],
    'race': ['White'],
    'gender': ['Female'],
    'hours_per_week': [38],
    'income': [0]
})

# Define the positive counterfactual outcomes
positive_counterfactuals = pd.DataFrame({
    'age': [29, 29, 29, 29, 29],
    'workclass': ['Government', 'Private', 'Private', 'Private', 'Private'],
    'education': ['Doctorate', 'Bachelors', 'Assoc', 'Bachelors', 'Masters'],
    'marital_status': ['Married', 'Married', 'Married', 'Married', 'Married'],
    'occupation': ['Blue-Collar', 'Professional', 'White-Collar', 'Blue-Collar', 'Professional'],
    'race': ['White', 'White', 'White', 'White', 'White'],
    'gender': ['Female', 'Female', 'Female', 'Male', 'Female'],
    'hours_per_week': [38, 38, 38, 38, 38],
    'income': [1, 1, 1, 1, 1]
})

# De

Executing this code gives the following results:

In [111]:
print(result1)

Rule 1: Higher Education Level: 5 counterfactual(s) consistent
Rule 2: Different Occupation: 3 counterfactual(s) consistent
Rule 3: Workclass Change: 1 counterfactual(s) consistent
Rule 4: Gender: 1 counterfactual(s) consistent
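Each count above is the number of the five counterfactuals that satisfy a rule. The check can be sketched in plain Python, using the counterfactuals printed earlier and, as rule conditions, the lambdas that appear in the LLM's evaluation code further down. The rule-checking code above is truncated, so these conditions are an assumption about what it tested; the sketch nevertheless reproduces the four counts.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# The five positive counterfactuals printed above (only the columns the rules use)
counterfactuals: List[Row] = [
    {"workclass": "Government", "education": "Doctorate", "occupation": "Blue-Collar", "gender": "Female"},
    {"workclass": "Private", "education": "Bachelors", "occupation": "Professional", "gender": "Female"},
    {"workclass": "Private", "education": "Assoc", "occupation": "White-Collar", "gender": "Female"},
    {"workclass": "Private", "education": "Bachelors", "occupation": "Blue-Collar", "gender": "Male"},
    {"workclass": "Private", "education": "Masters", "occupation": "Professional", "gender": "Female"},
]

# Rule conditions, mirroring the lambdas in the LLM's evaluation code
rules: Dict[str, Callable[[Row], bool]] = {
    "Higher Education Level": lambda r: r["education"] in {"Doctorate", "Bachelors", "Assoc", "Masters"},
    "Different Occupation": lambda r: r["occupation"] in {"Professional", "White-Collar"},
    "Workclass Change": lambda r: r["workclass"] == "Government",
    "Gender": lambda r: r["gender"] == "Male",
}


def count_consistent(rules: Dict[str, Callable[[Row], bool]], rows: List[Row]) -> List[Tuple[str, int]]:
    """Count how many rows satisfy each rule, most consistent rule first."""
    counts = {name: sum(1 for row in rows if cond(row)) for name, cond in rules.items()}
    # sorted() is stable (also with reverse=True), so ties keep their original order
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranked = count_consistent(rules, counterfactuals)
for i, (name, n) in enumerate(ranked, start=1):
    print(f"Rule {i}: {name}: {n} counterfactual(s) consistent")
```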



Using these causes, the LLM then produces a final explanation, ideally built around the most important ones.

In [112]:
print(explanation)

<explanation>

Based on the analysis of your data and the counterfactual outcomes, we have identified several key factors that can help you improve your chances of earning more than $50k a year. Here are the most important steps you can take:

### 1. Increase Your Education Level
The most consistent factor among the positive outcomes is a higher level of education. All of the counterfactual cases that predicted an income of more than $50k involved individuals with higher education levels than your current "HS-grad" status. Specifically, obtaining a:

- **Doctorate**
- **Bachelors**
- **Assoc (Associate Degree)**
- **Masters**

Investing in further education is the most impactful change you can make to increase your earning potential.

### 2. Consider Changing Your Occupation
Another significant factor is the type of occupation you are in. Moving from a "Blue-Collar" job to one of the following can significantly impact your income:

- **Professional**
- **White-Collar**

If possible, se

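Now we want to check the quality of our explanation. As explained in the paper, we use a closed-loop evaluation: the LLM is asked to write code that builds a new example following its own explanation, and this example is then evaluated. The code generated by the LLM is shown below.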

In [113]:
print(code2)

import pandas as pd

# Constructing the DataFrame with an example of a positive class profile
df = pd.DataFrame({
    'age': [35],  # Age above 29, to show more experience
    'workclass': ['Government'],  # Changing workclass to 'Government'
    'education': ['Bachelors'],  # Higher education level
    'marital_status': ['Married'],  # Marital status can remain the same
    'occupation': ['Professional'],  # Changing occupation to 'Professional'
    'race': ['White'],  # Race remains the same
    'gender': ['Female'],  # Gender remains the same
    'hours_per_week': [40],  # Slightly higher working hours per week
    'income': [1]  # Positive class (earning >$50k)
})

# Save the DataFrame to a CSV file
df.to_csv('temp_csv.csv', index=False)


Executing this code, we obtain the new counterfactual example generated by the LLM.

In [114]:
final_cf

|   | age | workclass | education | marital_status | occupation | race | gender | hours_per_week |
|---|---|---|---|---|---|---|---|---|
| 0 | 35 | Government | Bachelors | Married | Professional | White | Female | 40 |
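To see what the LLM changed, one can diff the generated example against the original instance. A small sketch; `changed_features` is a helper of our own, not part of the project's code:

```python
from typing import Dict, List, Tuple

original: Dict[str, object] = {
    "age": 29, "workclass": "Private", "education": "HS-grad",
    "marital_status": "Married", "occupation": "Blue-Collar",
    "race": "White", "gender": "Female", "hours_per_week": 38,
}
generated: Dict[str, object] = {
    "age": 35, "workclass": "Government", "education": "Bachelors",
    "marital_status": "Married", "occupation": "Professional",
    "race": "White", "gender": "Female", "hours_per_week": 40,
}


def changed_features(before: Dict[str, object], after: Dict[str, object]) -> List[Tuple[str, object, object]]:
    """Return (feature, old value, new value) for every feature whose value differs."""
    return [(f, before[f], after[f]) for f in before if before[f] != after[f]]


for feature, old, new in changed_features(original, generated):
    print(f"{feature}: {old} -> {new}")
```

All DiCE counterfactuals above kept `age` and `hours_per_week` fixed, whereas the LLM's example changes both, in addition to workclass, education and occupation.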


Next, we extract metrics from this counterfactual: its predicted class, which rules it follows, and whether it already exists in the dataset. We again ask the LLM for code, which this time builds a table that we can analyze.

In [115]:
print(code3)

import pandas as pd

# Read the provided positive example from temp_csv.csv
df = pd.read_csv('temp_csv.csv')

# Extracted rules and their importance
rules = [
    {
        'Rule': 'Higher Education Level',
        'Importance': 5,
        'Condition': lambda x: x['education'] in ['Doctorate', 'Bachelors', 'Assoc', 'Masters']
    },
    {
        'Rule': 'Different Occupation',
        'Importance': 3,
        'Condition': lambda x: x['occupation'] in ['Professional', 'White-Collar']
    },
    {
        'Rule': 'Workclass Change',
        'Importance': 1,
        'Condition': lambda x: x['workclass'] == 'Government'
    },
    {
        'Rule': 'Gender',
        'Importance': 1,
        'Condition': lambda x: x['gender'] == 'Male'
    }
]

# Evaluate each rule on the example row
results = []
for rule in rules:
    rule_followed = int(rule['Condition'](df.iloc[0]))
    results.append({
        'Rule': rule['Rule'],
        'Importance': rule['Importance'],
        'In explanation': rul

In [116]:
eval_df = pd.read_csv('./temp_files/evaluation.csv')
eval_df

|   | Rule | Importance | In explanation |
|---|---|---|---|
| 0 | Higher Education Level | 5 | 1 |
| 1 | Different Occupation | 3 | 1 |
| 2 | Workclass Change | 1 | 1 |
| 3 | Gender | 1 | 0 |
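The `In explanation` column records whether the generated example satisfies each rule. Applying the rule conditions from the LLM's code above to the generated example reproduces the column. A stdlib sketch, using a plain dictionary in place of the DataFrame that the LLM's code reads from `temp_csv.csv`:

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

generated: Row = {"age": 35, "workclass": "Government", "education": "Bachelors",
                  "marital_status": "Married", "occupation": "Professional",
                  "race": "White", "gender": "Female", "hours_per_week": 40}

# (rule name, importance, condition), taken from the LLM's evaluation code above
rules: List[Tuple[str, int, Callable[[Row], bool]]] = [
    ("Higher Education Level", 5, lambda x: x["education"] in {"Doctorate", "Bachelors", "Assoc", "Masters"}),
    ("Different Occupation", 3, lambda x: x["occupation"] in {"Professional", "White-Collar"}),
    ("Workclass Change", 1, lambda x: x["workclass"] == "Government"),
    ("Gender", 1, lambda x: x["gender"] == "Male"),
]


def evaluation_table(rules: List[Tuple[str, int, Callable[[Row], bool]]], example: Row) -> List[Tuple[str, int, int]]:
    """One entry per rule: name, importance, and 1/0 for whether `example` follows it."""
    return [(name, importance, int(cond(example))) for name, importance, cond in rules]


for row in evaluation_table(rules, generated):
    print(row)
```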


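Summarizing the table gives the final metrics: the model's prediction for the generated example, the number of rules generated and followed, whether each of the three most important rules is followed, and whether the example already exists in the dataset.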

In [117]:
print('Prediction of the generated example: ', prediction)
print('Number of rules generated: ', n_rules)
print('Number of rules followed: ', rules_followed)
print('1st rule followed ', first_rule)
print('2nd rule followed ', second_rule)
print('3rd rule followed ', third_rule)
print('Example exists in data: ', is_in_data)

Prediction of the generated example:  1
Number of rules generated:  4
Number of rules followed:  3
1st rule followed  1
2nd rule followed  1
3rd rule followed  1
Example exists in data:  False
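These numbers can be derived from the evaluation table. Judging from the outputs in this notebook, `Number of rules followed` is the sum of the `In explanation` column, and the `1st`/`2nd`/`3rd rule followed` values are the `In explanation` flags of the three rules with the highest importance. A minimal sketch under that assumption, plus a novelty check that looks for an exact match of the generated example in a dataset (the exact-match criterion is also our assumption; the project may compare rows differently):

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int, int]  # (rule, importance, followed as 0/1)


def summarize(table: Sequence[Row]) -> Dict[str, object]:
    """Derive the summary metrics from an evaluation table."""
    # sorted() is stable, so rules with equal importance keep their table order
    by_importance = sorted(table, key=lambda row: row[1], reverse=True)
    return {
        "n_rules": len(table),
        "rules_followed": sum(followed for _, _, followed in table),
        "top_rules_followed": [followed for _, _, followed in by_importance[:3]],
    }


def exists_in_data(example: Dict[str, object], dataset: List[Dict[str, object]]) -> bool:
    """True if some dataset row matches `example` on every feature of `example`."""
    return any(all(row.get(k) == v for k, v in example.items()) for row in dataset)


zero_shot_table = [
    ("Higher Education Level", 5, 1),
    ("Different Occupation", 3, 1),
    ("Workclass Change", 1, 1),
    ("Gender", 1, 0),
]
print(summarize(zero_shot_table))
# {'n_rules': 4, 'rules_followed': 3, 'top_rules_followed': [1, 1, 1]}
```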


## One Shot

In [118]:
exp_m = LLMExplanation4CFs(model = model, #Load the model we want to explain
                            model_description = """ML-system that predicts whether a person will earn more than 50k $ a year""", # brief explanation of the ML model
                            backend='sklearn', # Framework used to build the model (used to generate counterfactuals)
                            dataset_info=string_info(train_dataset.columns, helpers.get_adult_data_info()) , # string information about the dataset
                            continuous_features=['age', 'hours_per_week'], # Necessary for the counterfactual generation
                            outcome_name= 'income', #Necessary for counterfactual generation
                            training_set=train_dataset, #Necessary for counterfactual generation
                            test_set= test_dataset, # Necessary to check novelty of the evaluation example
                            llm='gpt-4o', #LLM used, works with Langchain
                            prompt_type='one', # zero or one
                            n_counterfactuals=5, #Number of counterfactuals used in the explanation 
                            user_input=False #Human in the loop helping select the causes
                           )


exp_m.fit()
counterfactuals, rules, code1, result1, explanation, code2, final_cf, code3, prediction, n_rules,rules_followed, first_rule, second_rule,third_rule, is_in_cfs, is_in_data = exp_m.explain_evaluate(example = test_df.iloc[[1]], verbose = False,return_all=True)

100%|██████████| 1/1 [00:00<00:00,  2.02it/s]


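This time we explain the second test example (`test_df.iloc[[1]]`), whose income is also predicted to be below $50k a year. Note that the next cell prints the first example, `test_df.iloc[0]`, again; judging from the counterfactuals and from the instance the LLM reproduces in its code below, the example explained here is a 50-year-old married man. We follow the whole process the LLM goes through to obtain the final explanation.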

In [119]:
print(test_df.iloc[0])

age                        29
workclass             Private
education             HS-grad
marital_status        Married
occupation        Blue-Collar
race                    White
gender                 Female
hours_per_week             38
Name: 0, dtype: object


First, a set of counterfactuals is generated with the DiCE package (`dice_ml`).

In [120]:
counterfactuals

|   | age | workclass | education | marital_status | occupation | race | gender | hours_per_week | income |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 50 | Other/Unknown | Some-college | Married | Other/Unknown | Other | Male | 23 | 1 |
| 1 | 50 | Other/Unknown | Some-college | Married | Other/Unknown | Other | Male | 75 | 1 |
| 2 | 50 | Private | Doctorate | Married | Other/Unknown | White | Male | 40 | 1 |
| 3 | 50 | Other/Unknown | Some-college | Married | Other/Unknown | Other | Male | 69 | 1 |
| 4 | 50 | Other/Unknown | Some-college | Married | Other/Unknown | Other | Male | 99 | 1 |


Next, the LLM extracts a set of rules from these counterfactuals.

In [121]:
print(rules)

To solve the problem, we need to:

1. Extract the most important observed rules from the counterfactual cases.
2. Generate Python code to count how many of the counterfactuals are consistent with the rules.
3. Order the rules based on their consistency with the counterfactuals.
4. Print the results.

Let's start by analyzing the counterfactual cases to extract the rules:

**Rules from Positive Counterfactuals:**

1. Higher education (Doctorate) leads to higher income.
2. More hours per week (23, 75, 40, 69, 99) can lead to higher income.

Now let's write Python code to count how many of the counterfactuals are consistent with the rules:

```python
import pandas as pd

# Given negative assessment outcome
negative_outcome = pd.DataFrame({
    'age': [50],
    'workclass': ['Other/Unknown'],
    'education': ['Some-college'],
    'marital_status': ['Married'],
    'occupation': ['Other/Unknown'],
    'race': ['White'],
    'gender': ['Male'],
    'hours_per_week': [40],
    'income': [0]


To check whether these rules hold, we ask the LLM to write a program that counts how many of the counterfactuals are consistent with each rule. The following cell shows the code generated by the LLM.

In [122]:
print(code1)

import pandas as pd

# Given negative assessment outcome
negative_outcome = pd.DataFrame({
    'age': [50],
    'workclass': ['Other/Unknown'],
    'education': ['Some-college'],
    'marital_status': ['Married'],
    'occupation': ['Other/Unknown'],
    'race': ['White'],
    'gender': ['Male'],
    'hours_per_week': [40],
    'income': [0]
})

# Given positive counterfactual outcome
positive_counterfactuals = pd.DataFrame({
    'age': [50, 50, 50, 50, 50],
    'workclass': ['Other/Unknown', 'Other/Unknown', 'Private', 'Other/Unknown', 'Other/Unknown'],
    'education': ['Some-college', 'Some-college', 'Doctorate', 'Some-college', 'Some-college'],
    'marital_status': ['Married', 'Married', 'Married', 'Married', 'Married'],
    'occupation': ['Other/Unknown', 'Other/Unknown', 'Other/Unknown', 'Other/Unknown', 'Other/Unknown'],
    'race': ['Other', 'Other', 'White', 'Other', 'Other'],
    'gender': ['Male'],
    'hours_per_week': [23, 75, 40, 69, 99],
    'income': [1, 1, 1, 1, 1]
})

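Executing this code fails: in the `positive_counterfactuals` dictionary, the LLM gave the `gender` column a single value (`['Male']`) while every other column has five, so pandas raises a `ValueError`: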

In [123]:
print(result1)

Traceback (most recent call last):
  File "c:\Users\afred\REPOS\HI-AI-KDD24-LMM-4-CFs-Explanation\temp_code.py", line 17, in <module>
    positive_counterfactuals = pd.DataFrame({
  File "c:\Users\afred\miniconda3\envs\kdd\lib\site-packages\pandas\core\frame.py", line 664, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "c:\Users\afred\miniconda3\envs\kdd\lib\site-packages\pandas\core\internals\construction.py", line 493, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "c:\Users\afred\miniconda3\envs\kdd\lib\site-packages\pandas\core\internals\construction.py", line 118, in arrays_to_mgr
    index = _extract_index(arrays)
  File "c:\Users\afred\miniconda3\envs\kdd\lib\site-packages\pandas\core\internals\construction.py", line 666, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
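This kind of failure can be detected before the DataFrame is built. A minimal sketch of such a pre-check, which is our own addition and not part of the project's pipeline: it reports every column whose length differs from the most common one, applied here to the dictionary from the LLM's code above.

```python
from collections import Counter
from typing import Dict, List, Sequence


def mismatched_columns(columns: Dict[str, Sequence[object]]) -> List[str]:
    """Return the names of columns whose length differs from the most common length."""
    if not columns:
        return []
    lengths = Counter(len(values) for values in columns.values())
    expected, _ = lengths.most_common(1)[0]
    return [name for name, values in columns.items() if len(values) != expected]


positive_counterfactuals = {
    "age": [50, 50, 50, 50, 50],
    "workclass": ["Other/Unknown", "Other/Unknown", "Private", "Other/Unknown", "Other/Unknown"],
    "education": ["Some-college", "Some-college", "Doctorate", "Some-college", "Some-college"],
    "marital_status": ["Married"] * 5,
    "occupation": ["Other/Unknown"] * 5,
    "race": ["Other", "Other", "White", "Other", "Other"],
    "gender": ["Male"],  # one value instead of five, as in the LLM's code
    "hours_per_week": [23, 75, 40, 69, 99],
    "income": [1, 1, 1, 1, 1],
}

print(mismatched_columns(positive_counterfactuals))  # ['gender']
```

The flagged column names could, for instance, be sent back to the LLM in a follow-up prompt so that it can fix its code.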



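Using these causes, the LLM then produces a final explanation, ideally built around the most important ones. Here, however, the rule-checking code above failed, and the LLM's output starts by repeating the rule-extraction and counting steps.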

In [124]:
print(explanation)

Based on the provided information, let's proceed with the necessary steps to extract the observed rules from the counterfactual cases, write Python code to count the consistent counterfactuals, and provide a clear explanation to the user.

### Step 1: Extract Observed Rules from Counterfactual Cases

**Rules from Positive Counterfactuals:**
1. Higher education (Doctorate) leads to higher income.
2. More hours per week (23, 75, 40, 69, 99) can lead to higher income.

### Step 2: Write Python Code to Count Consistent Counterfactuals

Let's write the Python code to count how many of the counterfactuals are consistent with the rules:

```python
import pandas as pd

# Given negative assessment outcome
negative_outcome = pd.DataFrame({
    'age': [50],
    'workclass': ['Other/Unknown'],
    'education': ['Some-college'],
    'marital_status': ['Married'],
    'occupation': ['Other/Unknown'],
    'race': ['White'],
    'gender': ['Male'],
    'hours_per_week': [40],
    'income': [0]
})

# G

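Now we want to check the quality of our explanation. As explained in the paper, we use a closed-loop evaluation: the LLM is asked to write code that builds a new example following its own explanation, and this example is then evaluated. The code generated by the LLM is shown below.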

In [125]:
print(code2)

import pandas as pd

# Define the data for the DataFrame
data = {
    'age': [50],
    'workclass': ['Other/Unknown'],
    'education': ['Doctorate'],  # Higher education
    'marital_status': ['Married'],
    'occupation': ['Other/Unknown'],
    'race': ['White'],
    'gender': ['Male'],
    'hours_per_week': [60],  # Increased work hours
    'income': [1]  # Positive class
}

# Create the DataFrame
df = pd.DataFrame(data)

# Save to csv
df.to_csv('temp_csv.csv', index=False)


Executing this code, we obtain the new counterfactual example generated by the LLM.

In [126]:
final_cf

|   | age | workclass | education | marital_status | occupation | race | gender | hours_per_week |
|---|---|---|---|---|---|---|---|---|
| 0 | 50 | Other/Unknown | Doctorate | Married | Other/Unknown | White | Male | 60 |


Next, we extract metrics from this counterfactual: its predicted class, which rules it follows, and whether it already exists in the dataset. We again ask the LLM for code, which this time builds a table that we can analyze.

In [127]:
print(code3)

import pandas as pd

# Read the example from temp_csv.csv
df = pd.read_csv('temp_csv.csv')

# Define the rules and their importance
rules = [
    {'Rule': 'Higher education (Masters, Doctorate, Bachelors, Prof-school) leads to higher income.', 'Importance': 5},
    {'Rule': 'Sales, Professional, and White-Collar occupations lead to higher income.', 'Importance': 3},
    {'Rule': 'Gender being Male leads to higher income.', 'Importance': 1}
]

# Initialize the 'In explanation' column with 0
for rule in rules:
    rule['In explanation'] = 0

# Check the rules
if df['education'].iloc[0] in ['Masters', 'Doctorate', 'Bachelors', 'Prof-school']:
    rules[0]['In explanation'] = 1
if df['occupation'].iloc[0] in ['Sales', 'Professional', 'White-Collar']:
    rules[1]['In explanation'] = 1
if df['gender'].iloc[0] == 'Male':
    rules[2]['In explanation'] = 1

# Create the DataFrame
df_final = pd.DataFrame(rules)

# Save to csv
df_final.to_csv('evaluation.csv', index=False)


In [128]:
eval_df = pd.read_csv('./temp_files/evaluation.csv')
eval_df

|   | Rule | Importance | In explanation |
|---|---|---|---|
| 0 | Higher education (Masters, Doctorate, Bachelor... | 5 | 1 |
| 1 | Sales, Professional, and White-Collar occupati... | 3 | 0 |
| 2 | Gender being Male leads to higher income. | 1 | 1 |


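Summarizing the table gives the final metrics: the model's prediction for the generated example, the number of rules generated and followed, whether each of the three most important rules is followed, and whether the example already exists in the dataset.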

In [129]:
print('Prediction of the generated example: ', prediction)
print('Number of rules generated: ', n_rules)
print('Number of rules followed: ', rules_followed)
print('1st rule followed ', first_rule)
print('2nd rule followed ', second_rule)
print('3rd rule followed ', third_rule)
print('Example exists in data: ', is_in_data)

Prediction of the generated example:  1
Number of rules generated:  3
Number of rules followed:  2
1st rule followed  1
2nd rule followed  0
3rd rule followed  1
Example exists in data:  False


## Tree of Thought
Tree of Thought opens a given number of branches (`branches = 3` below) and, for each branch, runs the same generation process as in the previous examples; the outputs of all branches are then combined into a single explanation.

In [15]:
exp_m = ToTLLMExplanation4CFs(model = model, #Load the model we want to explain
                        model_description = """ML-system that predicts whether a person will earn more than 50k $ a year""", # brief explanation of the ML model
                        backend='sklearn', # Framework used to build the model (used to generate counterfactuals)
                        dataset_info=string_info(train_dataset.columns, helpers.get_adult_data_info()) , # string information about the dataset
                        continuous_features=['age', 'hours_per_week'], # Necessary for the counterfactual generation
                        outcome_name= 'income', #Necessary for counterfactual generation
                        training_set=train_dataset, #Necessary for counterfactual generation
                        test_set= test_dataset, # Necessary to check novelty of the evaluation example
                        llm='gpt-4o', #LLM used, works with Langchain
                        prompt_type='zero', # zero or one
                        n_counterfactuals=5, #Number of counterfactuals used in the explanation 
                        user_input=False, #Human in the loop helping select the causes
                        branches = 3
                    )
exp_m.fit()

out, explanation, code2, final_cf, code3, final_df, prediction, n_rules,rules_followed, first_rule, second_rule,third_rule, in_data = exp_m.explain_evaluate(user_data = test_df.iloc[[1]], verbose = False,return_all=True)  

100%|██████████| 1/1 [00:00<00:00,  3.97it/s]
100%|██████████| 1/1 [00:00<00:00,  4.92it/s]
100%|██████████| 1/1 [00:00<00:00,  4.18it/s]


Once all branches have finished, we join their outputs in order to feed them to the LLM.
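The printed output below suggests that each branch's rules are collected under a `System<i>:` heading followed by `Rules:`. A minimal sketch of such a joining step; the format is inferred from that output, and `join_branches` and the sample branch texts are hypothetical, not the project's implementation:

```python
from typing import List


def join_branches(branch_outputs: List[str]) -> str:
    """Concatenate each branch's rules under a numbered 'System' header."""
    sections = [
        f"System{i}:\nRules:\n{text.strip()}"
        for i, text in enumerate(branch_outputs, start=1)
    ]
    return "\n\n".join(sections)


# Hypothetical, heavily shortened branch outputs, for illustration only
branch_outputs = [
    "1. Education: a Masters or Doctorate increases the likelihood of earning more than $50k.",
    "Rule 1: Higher Education Level",
    "Rule 2: Change in Workclass",
]
prompt_context = join_branches(branch_outputs)
print(prompt_context)
```

Keeping the branches in separate labelled sections lets the LLM compare them; since branches can propose overlapping rules, the combined rule set may contain near-duplicates, such as the two education rules in the evaluation table further down.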

In [4]:
print(out)



System1:
Rules:
Based on the provided negative and positive counterfactual outcomes from the machine learning system, here are the extracted rules that seem to influence whether a person will earn more than $50k a year:

1. **Age**:
   - **Positive Influence**: Being older, especially around 62 and 75 years old, appears to positively influence the likelihood of earning more than $50k.
   
2. **Workclass**:
   - **Positive Influence**: Being "Self-Employed" has a positive impact on income prediction.
   
3. **Education**:
   - **Positive Influence**: Having higher education levels such as "Masters" or "Doctorate" increases the likelihood of earning more than $50k. Even "Some-college" education can result in a positive income outcome.
   
4. **Occupation**:
   - **Positive Influence**: Working in the "Service" sector appears to be influential in earning more than $50k when compared to "Other/Unknown" occupations.
   
5. **Race**:
   - **Positive Influence**: Being of a race other than 

Using all this information, the LLM generates an explanation.

In [5]:
print(explanation)

Based on the analysis of the data from the machine learning system, here are some practical steps you can take to improve your chances of earning more than $50,000 a year. These suggestions are based on the most important patterns observed in people who have successfully achieved a higher income.

### Actionable Steps to Increase Your Income:

1. **Advance Your Education (Most Important)**
   - **Why:** Higher education levels are the most consistent factor associated with earning more than $50,000.
   - **Action:** Consider pursuing additional education such as a Bachelor's, Master's, or professional school degree. Enroll in further education programs or obtain higher degrees if possible.

2. **Seek Employment in the Private Sector**
   - **Why:** Working in the private sector is another highly consistent factor that positively impacts income.
   - **Action:** If you are currently in an undefined or less stable job category, look for job opportunities in the private sector. This chang

Again, we follow the same process for the evaluation.

In [6]:
print(code2)

import pandas as pd

# Define the data for the DataFrame
data = {
    'age': [50],
    'workclass': ['Private'],
    'education': ['Bachelors'],
    'marital_status': ['Married'],
    'occupation': ['Sales'],
    'race': ['White'],
    'gender': ['Male'],
    'hours_per_week': [50],
    'income': [1]
}

# Create the DataFrame
df = pd.DataFrame(data)

# Save to csv
df.to_csv('temp_csv.csv', index=False)


In [7]:
final_cf

|   | age | workclass | education | marital_status | occupation | race | gender | hours_per_week |
|---|---|---|---|---|---|---|---|---|
| 0 | 50 | Private | Bachelors | Married | Sales | White | Male | 50 |


In [8]:
print(code3)

import pandas as pd

# Load the provided example
df = pd.read_csv('temp_csv.csv')

# Define the rules and their importance from each system
rules = [
    {"Rule": "Age Rule: Individuals around the age of 62 and 75 are more likely to earn more than $50k.", "Importance": 2},
    {"Rule": "Workclass Rule: Being 'Self-Employed' increases the likelihood of earning more than $50k.", "Importance": 2},
    {"Rule": "Education Rule: Having a 'Doctorate' or 'Masters' degree, or even 'Some-college' education, positively influences income.", "Importance": 5},
    {"Rule": "Occupation Rule: Working in the 'Service' sector can result in earning more than $50k.", "Importance": 1},
    {"Rule": "Race Rule: Being of a race other than 'White' can positively impact income prediction.", "Importance": 3},
    {"Rule": "Rule 1: Higher Education Level", "Importance": 3},
    {"Rule": "Rule 2: Change in Workclass", "Importance": 1},
    {"Rule": "Rule 3: Change in Occupation", "Importance": 1},
    {"Rule": "

In [9]:
final_df

|   | Rule | Importance | In explanation |
|---|---|---|---|
| 0 | Age Rule: Individuals around the age of 62 and... | 2 | 0 |
| 1 | Workclass Rule: Being 'Self-Employed' increase... | 2 | 0 |
| 2 | Education Rule: Having a 'Doctorate' or 'Maste... | 5 | 0 |
| 3 | Occupation Rule: Working in the 'Service' sect... | 1 | 0 |
| 4 | Race Rule: Being of a race other than 'White' ... | 3 | 0 |
| 5 | Rule 1: Higher Education Level | 3 | 1 |
| 6 | Rule 2: Change in Workclass | 1 | 0 |
| 7 | Rule 3: Change in Occupation | 1 | 1 |
| 8 | Rule 4: Increased Working Hours | 1 | 1 |
| 9 | Rule 5: Racial Factor | 2 | 0 |


In [11]:
print('Prediction of the generated example: ', prediction)
print('Number of rules generated: ', n_rules)
print('Number of rules followed: ', rules_followed)
print('1st rule followed ', first_rule)
print('2nd rule followed ', second_rule)
print('3rd rule followed ', third_rule)
print('Example exists in data: ', in_data)

Prediction of the generated example:  1
Number of rules generated:  16
Number of rules followed:  4
1st rule followed  0
2nd rule followed  0
3rd rule followed  1
Example exists in data:  False


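As we can see, the rule set is now much larger (16 rules, compared with 4 and 3 in the previous examples) because it gathers the rules from all branches, so the LLM has more candidate causes to choose from.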