# LIME Text Explainer via XAI

This tutorial demonstrates how to generate explanations with LIME's text explainer as implemented in the Contextual AI library. Much of it overlaps with the [LIME tabular tutorial](lime_tabular_explainer.ipynb). To recap, the main steps for generating explanations are:

1. Get an explainer via the `ExplainerFactory` class
2. Build the text explainer
3. Call `explain_instance`


## Credits
1. Pramodh, Manduri <manduri.pramodh@sap.com>

### Step 1: Import libraries

In [1]:
# Some auxiliary imports for the tutorial
import sys
import numpy as np
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer

# Set seed for reproducibility
np.random.seed(123456)

# Set the path so that we can import ExplainerFactory
sys.path.append('../../../')

# Main Contextual AI imports
import xai
from xai.explainer import ExplainerFactory

### Step 2: Load dataset and train a model

In this tutorial, we use the 20 Newsgroups text dataset, which can be loaded via sklearn's dataset utilities. Documentation on the dataset can be found [here](https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html). To keep things simple, we extract data for three topics: baseball, Christianity, and medicine.

Our target model is a multinomial Naive Bayes classifier, which we train using TF-IDF vectors.
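
To make the TF-IDF step concrete, here is a minimal pure-Python sketch of how such weights can be computed on a toy corpus. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` and L2 normalization, which are the `TfidfVectorizer` defaults, but it uses plain whitespace tokenization for brevity, so its numbers will not match sklearn's exactly. The helper `tfidf_vectors` and the toy documents are illustrative only and are not part of the tutorial's pipeline.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    # Simplified tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {term: freq * idf[term] for term, freq in tf.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({term: w / norm for term, w in raw.items()})
    return vectors


toy_docs = [
    "the pitcher threw one strike",
    "the doctor prescribed medicine",
    "the church held a service",
]
vectors = tfidf_vectors(toy_docs)
# "the" occurs in every document, so it gets a lower weight than "pitcher".
print(vectors[0])
```

Terms that appear in many documents receive a low weight, while terms concentrated in a few documents receive a high one, which is what lets the Naive Bayes classifier separate the topics.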

In [2]:
# Train on a subset of categories

categories = [
    'rec.sport.baseball',
    'soc.religion.christian',
    'sci.med'
]

raw_train = datasets.fetch_20newsgroups(subset='train', categories=categories)
print(list(raw_train.keys()))
print(raw_train.target_names)
print(raw_train.target[:10])
raw_test = datasets.fetch_20newsgroups(subset='test', categories=categories)

X_train = raw_train.data
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
y_train = raw_train.target

X_test_vec = vectorizer.transform(raw_test.data)
y_test = raw_test.target

clf = MultinomialNB(alpha=0.1)
clf.fit(X_train_vec, y_train)

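# The classifier above was fitted on the full training set; only the raw-text
# sample in X_train is truncated here to keep the later steps fast.
limit_size = 200
pprint('Subsetting training sample to %s to speed up.' % limit_size)
X_train = X_train[:limit_size]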
pprint('Classifier score: %s' % clf.score(X_test_vec, y_test))
pprint('Classifier predict func %s:' % clf.predict_proba)

['data', 'filenames', 'target_names', 'target', 'DESCR']
['rec.sport.baseball', 'sci.med', 'soc.religion.christian']
[1 0 2 2 0 2 0 0 0 1]
'Subsetting training sample to 200 to speed up.'
'Classifier score: 0.9689336691855583'
('Classifier predict func <bound method _BaseNB.predict_proba of '
 'MultinomialNB(alpha=0.1, class_prior=None, fit_prior=True)>:')


### Step 3: Instantiate the explainer

Here, we will use the LIME Text Explainer.

In [3]:
explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TEXT)

### Step 4: Build the explainer

This step initializes the underlying explainer object. The `explain_instance` method is given raw text, because LIME's text explainer conducts its own preprocessing to generate interpretable representations of the data. We must therefore define a custom `predict_fn` that takes raw text, vectorizes it with the fitted TF-IDF vectorizer, and passes the vectors to the trained Naive Bayes model to produce class probabilities. LIME uses `predict_fn` to query the model and learn its behavior around the provided data instance.
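
The idea of querying the model around one instance can be illustrated with a small, self-contained sketch. It is not LIME's actual algorithm: LIME fits a locally weighted linear model to the perturbed samples, whereas this sketch swaps in a simpler difference of means (average probability when a word is kept minus when it is removed). The names `toy_predict_fn` and `word_importance` are hypothetical, and the keyword-based classifier only stands in for the Naive Bayes pipeline.

```python
import random


def toy_predict_fn(texts):
    """Hypothetical stand-in for predict_fn.

    Takes a list of raw strings and returns one [p_other, p_medicine]
    row per string.
    """
    medical_words = {"doctor", "medicine", "patient"}
    probs = []
    for text in texts:
        hits = sum(word in medical_words for word in text.lower().split())
        p_med = min(0.95, 0.05 + 0.3 * hits)
        probs.append([1.0 - p_med, p_med])
    return probs


def word_importance(text, predict_fn, label, num_samples=1000, seed=0):
    """Estimate each word's effect on the predicted probability of `label`."""
    rng = random.Random(seed)
    words = text.split()
    # Each mask records which word positions are kept in one perturbed copy.
    masks = [[rng.random() < 0.5 for _ in words] for _ in range(num_samples)]
    perturbed = [" ".join(w for w, keep in zip(words, mask) if keep)
                 for mask in masks]
    # A single batched call: the model is only ever seen through predict_fn.
    scores = [row[label] for row in predict_fn(perturbed)]
    importance = {}
    for i, word in enumerate(words):
        present = [s for s, m in zip(scores, masks) if m[i]]
        absent = [s for s, m in zip(scores, masks) if not m[i]]
        if present and absent:
            importance[word] = (sum(present) / len(present)
                                - sum(absent) / len(absent))
    return importance


weights = word_importance("the doctor gave a patient medicine",
                          toy_predict_fn, label=1)
for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word:10s} {weight:+.3f}")
```

The point to take away is the contract: the prediction function receives a batch of raw strings and returns one row of class probabilities per string, which is why vectorization has to happen inside it.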

In [4]:
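def predict_fn(instance):
    # `instance` is a list of raw text strings; the result has one row of
    # class probabilities per string.
    vec = vectorizer.transform(instance)
    return clf.predict_proba(vec)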

explainer.build_explainer(predict_fn)

In [5]:
# Variables made available to the JSON config, which refers to them by name
# (e.g. 'var:clf_fn')
feature_names = []
clf_fn = predict_fn
target_names_list = []

import os
import json
from xai.compiler.base import Configuration, Controller
json_config = 'lime-text-classification-model-interpreter.json'
with open(json_config) as file:
    config = json.load(file)
config


{'name': 'Report for Lime Text Explainer',
 'overview': True,
 'content_table': True,
 'contents': [{'title': 'Model Interpreter Text Explainer',
   'desc': 'This section provides the Interpretation of model',
   'sections': [{'title': 'Model Interpreter Analysis ',
     'desc': 'Model and train data from 20 News Group',
     'component': {'_comment': 'refer to document section xxxx',
      'class': 'ModelInterpreter',
      'attr': {'domain': 'text',
       'method': 'lime',
       'mode': 'classification',
       'train_data': 'var:X_train',
       'labels': 'var:y_train',
       'predict_func': 'var:clf_fn',
       'target_names': 'var:target_names_list',
       'model_interpret_stats_type': 'top_k',
       'model_interpret_k_value': 5,
       'model_interpret_top_value': 15,
       'num_of_class': 1,
       'valid_x': 'var:X_test',
       'valid_y': 'var:y_test',
       'error_analysis_stats_type': 'average_score',
       'error_analysis_k_value': 5,
       'error_analysis_top_valu

In [6]:
controller = Controller(config=Configuration(config, locals()))
controller.render()

Interpret 100/200 samples
Interpret 200/200 samples
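
The config refers to notebook variables through strings such as `'var:X_train'`, and `Configuration` is handed `locals()` so that those names can be looked up. How the library performs this lookup internally is not shown in this tutorial; the following is only a hypothetical sketch of how such placeholders could be resolved against a namespace. `resolve_vars` is an illustrative helper, not part of the xai API.

```python
def resolve_vars(node, namespace):
    """Recursively replace 'var:<name>' strings with namespace[<name>]."""
    if isinstance(node, dict):
        return {key: resolve_vars(value, namespace)
                for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_vars(item, namespace) for item in node]
    if isinstance(node, str) and node.startswith("var:"):
        name = node[len("var:"):]
        if name not in namespace:
            raise KeyError(f"config refers to undefined variable {name!r}")
        return namespace[name]
    # Plain values (numbers, booleans, ordinary strings) pass through unchanged.
    return node


toy_train = ["first document", "second document"]
attr = {
    "domain": "text",
    "train_data": "var:X_train",
    "model_interpret_k_value": 5,
}
resolved = resolve_vars(attr, {"X_train": toy_train})
print(resolved["train_data"])
```

Under this reading, every `var:` name in the JSON file has to exist in the notebook's namespace before the controller is created, which is why the variables are defined before the config is loaded.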


### Results

In [7]:
pprint("report generated : %s/20newsgroup-clsssification-model-interpreter-report.pdf" % os.getcwd())
('report generated : '
 '/Users/i062308/Development/Explainable_AI/tutorials/compiler/20newsgroup/20newsgroup-clsssification-model-interpreter-report.pdf')