### LIME Implementation 
Sources:

- Based on: https://towardsdatascience.com/lime-how-to-interpret-machine-learning-models-with-python-94b0e7e4432e

- Github Repo for LIME: https://github.com/marcotcr/lime

- dataset: https://www.kaggle.com/datasets/piyushagni5/white-wine-quality?resource=download

> more resources on LIME:
* https://homes.cs.washington.edu/~marcotcr/blog/lime/
* https://subscription.packtpub.com/book/data/9781800208131/8/ch08lvl1sec64/getting-started-with-lime
* https://towardsdatascience.com/top-5-techniques-for-explainable-ai-34349990cc83
* https://www.thepythoncode.com/article/explainable-ai-model-python (ELI5) 
* https://towardsdatascience.com/essential-explainable-ai-python-frameworks-that-you-should-know-about-84d5063b75e9 (other frameworks)
* https://christophm.github.io/interpretable-ml-book/ (really detailed - Book)
* https://github.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python (Book) 

R tutorial: 
https://algoritmaonline.com/interpreting-classification-model-with-lime/



In [None]:
#install lime 

%pip install lime

In [None]:
#imports and reading data 

import numpy as np
import pandas as pd 

df = pd.read_csv('winedata.csv')
df.head()

In [None]:
df.info()

In [None]:
df.describe()

### Task at hand: Classification of Wine Quality 



In [None]:
df['quality'].unique()

In [None]:
# map numeric quality scores to "good" and "bad"

def replace_numeric(val):
    """Return 'bad' for scores of 5 or lower, 'good' otherwise."""
    return 'bad' if val <= 5 else 'good'



In [None]:
quality = df['quality'].tolist()
quality[:20]

In [None]:
quality[15]

In [None]:
# apply the mapping to every quality score
cat_qual = [replace_numeric(q) for q in quality]

cat_qual[:20]

In [None]:
ser_qual = pd.Series(cat_qual)

In [None]:
df['cat_quality'] = ser_qual

In [None]:
df.head()
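
The list-and-loop steps above can also be written as one vectorised expression. The following is a minimal sketch on a small made-up frame (the `toy` DataFrame is hypothetical, not the wine data); it applies the same rule, scores of 5 or lower become "bad".

```python
import numpy as np
import pandas as pd

# Toy stand-in for the wine data; only the 'quality' column matters here.
toy = pd.DataFrame({"quality": [3, 5, 6, 7, 5, 8]})

# Same rule as replace_numeric: 5 or lower is 'bad', everything else 'good'.
toy["cat_quality"] = np.where(toy["quality"] <= 5, "bad", "good")

print(toy)
# Checking the class balance is useful before training a classifier.
print(toy["cat_quality"].value_counts())
```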

### Training the Classifier: Random Forest


In [None]:
from sklearn.model_selection import train_test_split

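# features: every column except the two target columns
# (`columns=` already names the axis, so `axis = 1` is not needed)
x = df.drop(columns = ['quality', 'cat_quality'])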

In [None]:
y = df['cat_quality']

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 42)

In [None]:
from sklearn.ensemble import RandomForestClassifier 

RF = RandomForestClassifier(random_state = 42)
RF_fit = RF.fit(x_train, y_train)
RF_pred = RF.predict(x_test)

score = RF_fit.score(x_test, y_test)

In [None]:
from sklearn.metrics import precision_score, recall_score

# With average = 'micro', precision and recall both collapse to accuracy
# in a single-label problem, so we report them for the "good" class instead.
precision = precision_score(y_test, RF_pred, pos_label = 'good')
print('Precision: %.3f' % precision)

recall = recall_score(y_test, RF_pred, pos_label = 'good')
print('Recall: %.3f' % recall)
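
A note on averaging: in a single-label problem, micro-averaged precision and recall are both equal to accuracy, so they add no information beyond the score. The sketch below illustrates this with made-up labels (the two lists are hypothetical, not taken from the wine data) and compares the micro average with the per-class values for "good".

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels, chosen so that precision and recall differ.
y_true = ["good", "good", "good", "bad", "bad", "good", "bad", "good"]
y_pred = ["good", "bad", "good", "bad", "good", "good", "good", "good"]

acc = accuracy_score(y_true, y_pred)
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")

# Per-class view: treat "good" as the positive class.
good_p = precision_score(y_true, y_pred, pos_label="good")
good_r = recall_score(y_true, y_pred, pos_label="good")

print(f"accuracy={acc:.3f} micro precision={micro_p:.3f} micro recall={micro_r:.3f}")
print(f"'good' precision={good_p:.3f} 'good' recall={good_r:.3f}")
```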

In [None]:
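from sklearn.metrics import confusion_matrix
import seaborn as sns 

# rows are the true classes, columns are the predicted classes
labels = ['bad', 'good']
confMat = confusion_matrix(y_test, RF_pred, labels = labels)

print(" Confusion Matrix: ")
print("-------------------")

confMat_df = pd.DataFrame(
    confMat,
    index = ['true ' + l for l in labels],
    columns = ['predicted ' + l for l in labels]
)
print(confMat_df)
print('====================')

# share of all test samples that falls into each cell
sns.heatmap(confMat_df / confMat.sum(), annot=True, fmt='.2%')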
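
To read a scikit-learn confusion matrix correctly, it helps to remember that rows are true classes and columns are predicted classes, in the order given by `labels`. A small sketch with made-up labels (hypothetical, not the wine data):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only.
y_true = ["good", "good", "good", "bad", "bad", "good"]
y_pred = ["good", "good", "good", "bad", "good", "good"]
labels = ["bad", "good"]

# cm[i, j] counts samples whose true class is labels[i]
# and whose predicted class is labels[j].
cm = confusion_matrix(y_true, y_pred, labels=labels)

cm_df = pd.DataFrame(
    cm,
    index=[f"true {name}" for name in labels],
    columns=[f"predicted {name}" for name in labels],
)
print(cm_df)
```

Here the single off-diagonal count sits in the "true bad / predicted good" cell: one bad wine was mistaken for a good one.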

### Model Interpretation 

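So far, we haven't done anything new. Now we import LIME and create a tabular explainer object. We pass it four arguments (only `training_data` is strictly required): 

- `training_data`: the training set as a NumPy array; the explainer uses its statistics to perturb and discretize features 
- `feature_names`: the column names 
- `class_names`: in this case "bad" and "good", in the same order as the model's classes 
- `mode`: the type of ML problem, `'classification'` or `'regression'`; here it is classification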

For lime tabular, source: 
https://lime-ml.readthedocs.io/en/latest/lime.html#module-lime.lime_tabular

In [None]:
from lime import lime_tabular 

explainer = lime_tabular.LimeTabularExplainer(
    training_data = np.array(x_train),
    feature_names = x_train.columns.tolist(), 
    # must follow the order of RF_fit.classes_ (alphabetical: 'bad', 'good')
    class_names = ['bad', 'good'],
    mode = 'classification'
)

In [None]:
explainer

Now we generate an explanation for a single instance with `explain_instance`. From the documentation: 

> "explain_instance(data_row, predict_fn, labels=(1, ), top_labels=None, num_features=10, num_samples=5000, distance_metric='euclidean', model_regressor=None, sampling_method='gaussian')

> Generates explanations for a prediction.

> First, we generate neighborhood data by randomly perturbing features from the instance (see __data_inverse). We then learn locally weighted linear models on this neighborhood data to explain each of the classes in an interpretable way (see lime_base.py)." 

Explanation: 

- data_row: the single row of data (here, one wine) whose prediction you want explained 
- predict_fn: the prediction function. For a classifier it must return class probabilities (for scikit-learn models, `predict_proba`). For regression it returns the predicted value itself. 
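
The perturb, weight, and fit procedure described in the quote above can be sketched in a few lines. This is an illustration of the idea under simplifying assumptions, not LIME's actual implementation: the `black_box` function, the noise scale, the kernel, and the use of `Ridge` are all choices made for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box(X):
    """Hypothetical model: probability of the positive class.

    It depends on features 0 and 1 only; feature 2 is irrelevant.
    """
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))

instance = np.array([0.5, -0.2, 1.0])  # the row we want to explain

# 1. Perturb: sample a neighbourhood around the instance.
neighbours = instance + rng.normal(scale=1.0, size=(5000, instance.size))

# 2. Weight: samples closer to the instance count more
#    (an exponential kernel on Euclidean distance, chosen for illustration).
distances = np.linalg.norm(neighbours - instance, axis=1)
kernel_width = 0.75 * np.sqrt(instance.size)
weights = np.exp(-(distances ** 2) / kernel_width ** 2)

# 3. Fit: a weighted linear surrogate of the black box around the instance.
surrogate = Ridge(alpha=1.0)
surrogate.fit(neighbours, black_box(neighbours), sample_weight=weights)

local_weights = surrogate.coef_
for i, w in enumerate(local_weights):
    print(f"feature {i}: local weight {w:+.3f}")
```

The surrogate's coefficients play the role of the bars in the LIME plot: a positive weight pushes the predicted probability up near this instance, a negative weight pushes it down, and the irrelevant feature ends up close to zero.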

In [None]:
RF_fit
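
Since LIME receives `predict_proba` as its `predict_fn`, it is worth checking what that function returns. A minimal sketch on synthetic data (the arrays `X` and `y` are made up for this example): each row holds one probability per class, and the columns follow the order of `classes_`, which is why `class_names` is given as "bad", "good".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data: the label depends only on the sign of the first feature.
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, "good", "bad")

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])

print(clf.classes_)  # column order of predict_proba
print(proba)         # one row per sample, one column per class
```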

https://github.com/scikit-learn/scikit-learn/blob/9aaed4987/sklearn/ensemble/_forest.py#L838

Let's see a good wine and a bad wine: 

In [None]:
y_test[:10]

Locations: 0, 1, 2 are "good" and 3 is "bad". 

In [None]:
#good wine
exp = explainer.explain_instance(
    data_row = x_test.iloc[1], 
    predict_fn = RF_fit.predict_proba
)

exp.show_in_notebook(show_table = True)

### Interpretation: 

#### Part 1. (Left) : Confidence 
- Model is 96% confident that this is a "good" wine. (We know it is) 


#### Part 2. (Middle) : Feature Importance 
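- The features that matter most for this prediction: alcohol level (19%), volatile acidity (10%), density (7%), and citric acid (5%). These numbers are the weights of LIME's local linear model, so they describe how strongly each condition shifts the predicted probability for this wine, not a share of the prediction.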

> When is a wine bad? 
- high levels of "volatile acidity" (greater than 0.33); it is 0.53 here 
- low levels of "citric acid" (0.27 or lower); it is 0.16 here

> When is a wine good? 
- alcohol greater than 11.40; it is 13.2 here 
- density of 0.99 or lower; it is 0.99 here (both numbers are rounded in the display) 
- chlorides of 0.04 or lower; it is 0.04 here (again rounded) 

The remaining features follow the same pattern with smaller weights. 
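
The thresholds in these rules come from LIME's default discretization: continuous features are binned by the quartiles of the training data, and the explanation reports which bin the instance falls into. The sketch below mimics that with NumPy on made-up numbers (`alcohol_train` and the density values are hypothetical); it also shows why a value and its threshold can look identical once both are rounded to two decimals.

```python
import numpy as np

# Hypothetical training values for one feature (not the real wine data).
alcohol_train = np.array([8.8, 9.2, 9.5, 10.1, 10.4, 10.9, 11.4, 12.0, 12.8, 13.2])

# Quartile boundaries of the training data.
cuts = np.percentile(alcohol_train, [25, 50, 75])

def describe_bin(value, name, cuts):
    """Return a rule describing which quartile bin `value` falls into."""
    q25, q50, q75 = cuts
    if value <= q25:
        return f"{name} <= {q25:.2f}"
    if value <= q50:
        return f"{q25:.2f} < {name} <= {q50:.2f}"
    if value <= q75:
        return f"{q50:.2f} < {name} <= {q75:.2f}"
    return f"{name} > {q75:.2f}"

print(describe_bin(13.2, "alcohol", cuts))
print(describe_bin(9.0, "alcohol", cuts))

# Rounding effect: the value is below the threshold,
# yet both print the same with two decimals.
value, threshold = 0.9898, 0.9912
print(f"density {value:.2f} <= {threshold:.2f}: {value <= threshold}")
```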

#### Part 3. (Right) : Values 
The table lists this wine's actual feature values, so the thresholds from Part 2 can be checked against them. 

### Example of the "Bad" Class: 

In [None]:
exp2 = explainer.explain_instance(
    data_row = x_test.iloc[3],
    predict_fn = RF_fit.predict_proba
)

exp2.show_in_notebook(show_table = True)

### Interpretation: 

#### Part 1. (Left) : Confidence 
- Model is 84% confident that this is a "bad" wine. (We know it is) 


#### Part 2. (Middle) : Feature Importance 
- The features that matter most for this prediction: alcohol level (8%), residual sugar (6%), chlorides (5%). As before, these are the weights of LIME's local linear model for this wine.

> When is a wine bad? 
- low levels of "residual sugar" (less than 1.7); it is 1.6 here 
- high levels of chlorides (greater than 0.05); it is 0.05 here (both numbers are rounded in the display)

> When is a wine good? 
- alcohol between 10.4 and 11.40; it is 10.7 here 


#### Part 3. (Right) : Values 
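As before, the table lists this wine's actual feature values, so the thresholds from Part 2 can be checked against them. 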