# Explanations using LIME

## Preparation

* Install lime and read the GitHub page ✔
* Go through the LIME_Tutorial_notebook ✔
* Answer the questions:

#### What did you learn about the model?

* The model has a very high F1 score, so it should be a great model, right?
* Metadata such as "Re:" or "Post" features heavily in the local explanations. Email addresses ending in .edu seem to push the prediction quite strongly towards "atheism".
* Stop words such as "the" or "of" also seem to have an influence, which is questionable.
* Very few words seem to increase the probability of classification as "christian", and those that do, only weakly. A notable exception seems to be document 423, in which all features of the local explanation increase the likelihood of class "christian".
* The same word can push the model in different directions depending on context. The word "god" in particular seems to strengthen whichever direction the rest of the document's vocabulary already points to.
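Patterns like these are easy to spot in a few hand-picked documents but hard to confirm. One way to check them more systematically is to collect the `(word, weight)` pairs that lime's `Explanation.as_list()` returns for many documents and summarise them per word. This is a minimal plain-Python sketch; the function names, the small stop-word set and the toy input in the usage example are hypothetical, not output from the tutorial notebook. Words are lowercased so that "God" and "god" are counted together, which is a design choice of this sketch.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word set; a real check would use a fuller list.
STOP_WORDS = {"the", "of", "and", "a", "to", "in"}

def summarise_words(explanations):
    """Summarise per-word LIME weights across many explanations.

    explanations: list of lists of (word, weight) pairs, in the shape
    returned by Explanation.as_list() for a text explainer.
    Returns {word: {"pos": n, "neg": n, "mean_abs": x, "stop_word": bool}}.
    """
    weights = defaultdict(list)
    for exp in explanations:
        for word, weight in exp:
            weights[word.lower()].append(weight)
    summary = {}
    for word, ws in weights.items():
        summary[word] = {
            "pos": sum(w > 0 for w in ws),  # times it pushed towards the explained label
            "neg": sum(w < 0 for w in ws),  # times it pushed away from it
            "mean_abs": sum(abs(w) for w in ws) / len(ws),
            "stop_word": word in STOP_WORDS,
        }
    return summary

def mixed_sign_words(summary):
    """Words whose influence flips direction between documents."""
    return sorted(w for w, s in summary.items() if s["pos"] and s["neg"])
```

With toy input such as `[[("God", 0.12), ("edu", -0.2), ("the", 0.05)], [("god", -0.09), ("Re", -0.15)]]`, `mixed_sign_words` reports `["god"]`, and `"the"` is flagged as a stop word.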

#### How well do you think the classifier works and why?

Given the problems described above, we think the classifier is probably overfitted: despite its high F1 score, it relies on metadata and stop words and is highly dependent on the exact context of the messages.

#### How useful is LIME for a non-data-scientist (e.g. non-ML-experts or designers) and why?

LIME is relatively simple to set up for data scientists and software developers. In our experience, however, non-technical people already struggle with using the command line or downloading from GitHub, and Python skills are also required.
Once the setup is done, the results can be visualised well and are partly easy to interpret. However, they only seem to offer a superficial explanation of many parts of the model: some of the model's behaviour we *noticed* thanks to LIME, but actually understanding it would take other methods and substantially more work.
So we would say that, as long as the setup is done by technical folks, LIME can be very useful for non-ML folks seeking to understand the models they are being served.

#### What questions is LIME able to answer and why?

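As described above, LIME can help detect flawed models. For a single prediction, it approximates the complex model locally with a simple linear model and reports which features (here: words) pushed that prediction towards or away from a class. It can therefore answer local questions such as "why was this document classified as atheism?", but it does not by itself explain how the model behaves globally.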
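The core idea can be sketched in a few lines of NumPy: perturb the instance, query the black-box model on the perturbed samples, weight each sample by its proximity to the instance, and fit a weighted linear model whose slopes serve as the local explanation. This is a simplified illustration for continuous features only, not the lime library's implementation (which, among other things, uses word presence/absence features for text, discretises tabular features by default and limits the number of features reported); the function name and its parameters are our own.

```python
import numpy as np

def local_linear_explanation(predict_fn, x, num_samples=5000, scale=1.0,
                             kernel_width=None, seed=0):
    """Simplified LIME-style local surrogate for continuous features.

    predict_fn: black-box model, maps an (n, d) array to n scores
                (e.g. the predicted probability of one class).
    x:          the instance to explain, length d.
    Returns one slope per feature of a linear model fitted near x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)
    # 1. Perturb: draw samples from a Gaussian neighbourhood around x.
    offsets = rng.normal(scale=scale, size=(num_samples, d))
    samples = x + offsets
    # 2. Query the black-box model on the perturbed samples.
    y = np.asarray(predict_fn(samples), dtype=float)
    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(offsets / scale, axis=1)
    weights = np.exp(-dist**2 / kernel_width**2)
    # 4. Weighted least squares: scale rows by sqrt(weight), fit intercept + slopes.
    design = np.hstack([np.ones((num_samples, 1)), offsets])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept, keep the per-feature slopes
```

For `f(z) = z0**2` explained at `x = [1, 0]`, the returned slopes come out close to the local gradient `[2, 0]`. Explained at a different point, the same model can yield very different weights, which is why LIME's answers are local. In the lime library, this whole procedure sits behind `explain_instance`.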

#### Would you incorporate tools like LIME into your data science practice and how?

LIME seems to be very useful for getting an initial understanding of whether a model has obvious flaws and whether it is under- or overfitted. With the right visualisations, it can be a powerful tool to educate stakeholders about the models in use. We would therefore try to incorporate LIME, as well as other explainability tools, into our data science practice as a continuous tool to detect bullshit and educate lay people.
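One way to use LIME as a continuous check, for example after every retraining, is to explain a sample of predictions, rank the features by their total absolute weight, and flag any feature from a denylist of tokens that should not matter (metadata such as "Re" or "edu", stop words). This is a plain-Python sketch under the assumption that the input is a list of `(feature, weight)` lists as returned by lime's `Explanation.as_list()`; the function names, the denylist and the toy explanations are hypothetical.

```python
from collections import defaultdict

def top_features(explanations, k=10):
    """Rank features by their summed absolute weight over many explanations."""
    totals = defaultdict(float)
    for exp in explanations:
        for feature, weight in exp:
            totals[feature] += abs(weight)
    return sorted(totals, key=totals.get, reverse=True)[:k]

def suspicious_features(explanations, denylist, k=10):
    """Denylisted features that rank among the k most important ones."""
    return [f for f in top_features(explanations, k) if f in denylist]
```

Raising an error or logging a warning whenever `suspicious_features` returns a non-empty list turns this into an automated check in a training pipeline; the denylist itself still has to be curated by people who know the domain.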


## Setup

### Fetching Data, training classifier

```python
import lime
import pandas as pd
import sklearn
import sklearn.ensemble  # ensemble classifiers (e.g. random forests)
import sklearn.metrics   # evaluation metrics such as the F1 score
```

```python
# Load the German credit dataset and show the first five rows
df = pd.read_csv('df_german.csv')
df.head()
```

| Unnamed: 0.1 | Unnamed: 0 | checking_account | duration_in_month | credit_history | purpose | credit_amount | savings_account | employment_since | installment_rate | status_and_sex | ... | property | age | other_installment_plans | housing | existing_credits | job | people_to_provide_maintenance_for | telephone | foreign_worker | customer_goodness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | negative | 6 | critical_account | radio_tv | 1169 | unknown | over_7y | 4 | male_single | ... | real_estate | 67 | none | own | 2 | employed | 1 | yes | yes | yes |
| 1 | 1 | 0_to_200 | 48 | existing_duly_paid | radio_tv | 5951 | up_to_100 | 1_to_4y | 2 | female_married_separated | ... | real_estate | 22 | none | own | 1 | employed | 1 | no | yes | no |
| 2 | 2 | none | 12 | critical_account | education | 2096 | up_to_100 | 4_to_7y | 2 | male_single | ... | real_estate | 49 | none | own | 1 | unskilled_resident | 2 | no | yes | yes |
| 3 | 3 | negative | 42 | existing_duly_paid | furniture | 7882 | up_to_100 | 4_to_7y | 2 | male_single | ... | building_society_savings_agreement_or_life_ins... | 45 | none | free | 1 | employed | 2 | no | yes | yes |
| 4 | 4 | negative | 24 | delayed | car_new | 4870 | up_to_100 | 1_to_4y | 3 | male_single | ... | none_unknown | 53 | none | free | 2 | employed | 2 | no | yes | no |
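Most columns here hold string categories, while lime's tabular explainer works on numeric arrays. A common approach is to integer-encode each categorical column and tell the explainer how to map the codes back to labels. This NumPy sketch shows one way to do that; the function name is hypothetical, and the usage example uses only three columns from the first five rows shown above.

```python
import numpy as np

def encode_for_lime(rows, columns, categorical_columns):
    """Integer-encode categorical columns so a tabular LIME explainer can use them.

    rows:                list of records (lists), one per applicant
    columns:             column names, in the same order as the records
    categorical_columns: names of the columns holding string categories
    Returns (encoded, categorical_features, categorical_names):
      encoded              float array, categories replaced by integer codes
      categorical_features indices of the categorical columns
      categorical_names    {column index: [label for code 0, code 1, ...]}
    """
    data = np.array(rows, dtype=object)
    encoded = np.zeros(data.shape, dtype=float)
    categorical_features, categorical_names = [], {}
    for j, name in enumerate(columns):
        if name in categorical_columns:
            # np.unique sorts the labels; return_inverse gives each row's code
            labels, codes = np.unique(data[:, j].astype(str), return_inverse=True)
            encoded[:, j] = codes
            categorical_features.append(j)
            categorical_names[j] = [str(label) for label in labels]
        else:
            encoded[:, j] = data[:, j].astype(float)
    return encoded, categorical_features, categorical_names

columns = ["checking_account", "duration_in_month", "credit_amount"]
rows = [["negative", 6, 1169], ["0_to_200", 48, 5951], ["none", 12, 2096],
        ["negative", 42, 7882], ["negative", 24, 4870]]
encoded, cat_idx, cat_names = encode_for_lime(rows, columns, {"checking_account"})
# cat_names == {0: ["0_to_200", "negative", "none"]}
```

The outputs correspond to the `categorical_features` and `categorical_names` arguments of `lime.lime_tabular.LimeTabularExplainer`. The classifier must be trained on the same encoding so that its `predict_proba` accepts the rows LIME generates. The two `Unnamed` columns appear to be leftover row indices and would probably be excluded before encoding.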
