# <center> Welcome to PyExplainer Quickstart Guide </center>

## 1. Prepare data and model

Note: this guide uses PyExplainer's default data and model as an example.

### 1.1 Import required library

In [1]:
from pyexplainer import pyexplainer_pyexplainer
# from pyexplainer.pyexplainer_pyexplainer import *
# import PyExplainer.pyexplainer
# from pyexplainer.pyexplainer_pyexplainer import PyExplainer as pyexp

### 1.2 Obtain default dataset and global model (Random Forest)

In [2]:
# Load the default dataset and global model that ship with PyExplainer
default_data_and_model = pyexplainer_pyexplainer.get_dflt()
# Build the explainer from the training data, column names, and global model
py_explainer = pyexplainer_pyexplainer.PyExplainer(
    X_train=default_data_and_model['X_train'],
    y_train=default_data_and_model['y_train'],
    indep=default_data_and_model['indep'],
    dep=default_data_and_model['dep'],
    blackbox_model=default_data_and_model['blackbox_model'],
)

## 🔧2. Create PyExplainer object 

### 2.1 Prepare data for creating PyExplainer

In [3]:
X_explain = default_data_and_model['X_explain']
y_explain = default_data_and_model['y_explain']

### 2.2 Create PyExplainer

In [4]:
created_rule_obj = py_explainer.explain(X_explain=X_explain,
                                        y_explain=y_explain,
                                        search_function='crossoverinterpolation')

## 3. Create interactive visualization

Move the sliders to change feature values and observe how the risk score changes.

In [5]:
py_explainer.visualise(created_rule_obj)

HBox(children=(Label(value='Risk Score: '), FloatProgress(value=0.0, bar_style='info', layout=Layout(width='40…

Output(layout=Layout(border='3px solid black'))

FloatSlider(value=0.74, continuous_update=False, description='#1 The value of ent is more than 0.74', layout=L…

FloatSlider(value=3.0, continuous_update=False, description='#2 The value of app is less than 3.0', layout=Lay…

FloatSlider(value=3.0, continuous_update=False, description='#3 The value of nd is less than 3', layout=Layout…

# Appendix

## Details of the variables used to create PyExplainer

### Synthetic_data

Synthetic_data is the data that PyExplainer generates using one of the following approaches:

1. Crossover and Interpolation
2. Random Perturbation

After Synthetic_data is generated, it is stored as a pandas DataFrame object. 
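The two approaches can be illustrated with a minimal sketch. This is not PyExplainer's actual implementation: the helper names `interpolate` and `perturb`, the noise scale, and the choice of Gaussian noise are assumptions made for illustration, and only a few of the guide's feature names are used.

```python
import random

def interpolate(parent_a, parent_b, rng):
    """Create one synthetic row on the line segment between two rows."""
    alpha = rng.random()  # mixing weight in [0, 1)
    return {k: parent_a[k] + alpha * (parent_b[k] - parent_a[k]) for k in parent_a}

def perturb(row, scale, rng):
    """Create one synthetic row by adding small Gaussian noise to each feature."""
    return {k: v + rng.gauss(0.0, scale * (abs(v) or 1.0)) for k, v in row.items()}

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two illustrative rows (hypothetical parents) with a few feature names from this guide.
row_a = {"la": 194.0, "nd": 2.0, "ent": 0.80}
row_b = {"la": 130.0, "nd": 4.0, "ent": 0.83}

synthetic_interp = [interpolate(row_a, row_b, rng) for _ in range(5)]
synthetic_perturb = [perturb(row_a, 0.05, rng) for _ in range(5)]

print(synthetic_interp[0])
print(synthetic_perturb[0])
```

Interpolated rows always stay between their two parents, while perturbed rows scatter around a single parent. Which behavior suits a given explanation is a design choice.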

In [17]:
print("Type of pyExp_rule_obj['synthetic_data'] - ", type(created_rule_obj['synthetic_data']), "\n")

print('Example')
display(created_rule_obj['synthetic_data'].head(2))

Type of pyExp_rule_obj['synthetic_data'] -  <class 'pandas.core.frame.DataFrame'> 

Example


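| | la | nd | ns | ent | nrev | rtime | self | ndev | age | app | rrexp | asawr | rsawr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 194.0 | 2.0 | 1.0 | 0.8 | 12.0 | 9.4 | 0.0 | 44.0 | 1.97 | 2.0 | 1290.0 | 0.01 | 0.61 |
| 1 | 130.0 | 4.0 | 1.0 | 0.83 | 15.0 | 14.47 | 0.0 | 55.0 | 0.06 | 2.0 | 1206.0 | 0.0 | 0.54 |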


### Synthetic_predictions

Synthetic_predictions holds the predictions for Synthetic_data, obtained from the global model inside PyExplainer.

In [12]:
print("Type of pyExp_rule_obj['synthetic_predictions'] - ", type(created_rule_obj['synthetic_predictions']), "\n")
print("Example", "\n\n", created_rule_obj['synthetic_predictions'])

Type of pyExp_rule_obj['synthetic_predictions'] -  <class 'numpy.ndarray'> 

Example 

 [False  True False ...  True  True  True]
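As a rough sketch of how the predictions relate to the synthetic rows, the snippet below labels each row with a stand-in model and counts the resulting classes. The function `blackbox_predict` and its thresholds are hypothetical and do not represent the Random Forest used in this guide. The sketch also assumes, as is typical for local explanation methods, that the labelled rows are later used to fit a local model.

```python
from collections import Counter

def blackbox_predict(row):
    """Hypothetical stand-in for a global model (not the guide's Random Forest)."""
    return row["la"] > 150 and row["ent"] > 0.7

# Illustrative synthetic rows with two of the feature names from this guide.
synthetic_rows = [
    {"la": 194.0, "ent": 0.80},
    {"la": 130.0, "ent": 0.83},
    {"la": 160.0, "ent": 0.75},
]

# One boolean prediction per synthetic row, in the same order as the rows.
synthetic_predictions = [blackbox_predict(row) for row in synthetic_rows]

# Count how many rows fall into each predicted class.
class_counts = Counter(synthetic_predictions)

print(synthetic_predictions)
print(class_counts)
```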


### X_explain

X_explain is the instance to be explained (a defective commit in this context).

In [15]:
print("Type of pyExp_rule_obj['X_explain'] - ", type(created_rule_obj['X_explain']), "\n")

print('Example')
display(created_rule_obj['X_explain'])

Type of pyExp_rule_obj['X_explain'] -  <class 'pandas.core.frame.DataFrame'> 

Example


| commit_id | la | nd | ns | ent | nrev | rtime | self | ndev | age | app | rrexp | asawr | rsawr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a9a59ccbacafd6eb94f57861cfc28f5a24f474db | 155 | 3 | 1 | 0.736602 | 12.0 | 32.516586 | 0 | 70.0 | 0.669703 | 3.0 | 2374.0 | 0.143199 | 0.390069 |


### y_explain

y_explain is the label of X_explain.

In [18]:
print("Type of pyExp_rule_obj['y_explain'] - ", type(created_rule_obj['y_explain']), "\n")
print("Example", "\n\n", created_rule_obj['y_explain'])

Type of pyExp_rule_obj['y_explain'] -  <class 'pandas.core.series.Series'> 

Example 

 commit_id
a9a59ccbacafd6eb94f57861cfc28f5a24f474db    True
Name: defect, dtype: bool


### indep

indep holds the feature names of X_explain.

In [19]:
print("Type of pyExp_rule_obj['indep'] - ", type(created_rule_obj['indep']), "\n")
print("Example", "\n\n", created_rule_obj['indep'])

Type of pyExp_rule_obj['indep'] -  <class 'pandas.core.indexes.base.Index'> 

Example 

 Index(['la', 'nd', 'ns', 'ent', 'nrev', 'rtime', 'self', 'ndev', 'age', 'app',
       'rrexp', 'asawr', 'rsawr'],
      dtype='object')


### dep

dep is the name of the label.

In [20]:
print("Type of pyExp_rule_obj['dep'] - ", type(created_rule_obj['dep']), "\n")
print("Example", "\n\n", created_rule_obj['dep'])

Type of pyExp_rule_obj['dep'] -  <class 'str'> 

Example 

 defect


### top_k_positive_rules

top_k_positive_rules contains the top-k rules that PyExplainer generates to explain why a commit is predicted as defective.

Here we show the top 3 rules that lead to defective commits.

In [26]:
print("Type of pyExp_rule_obj['top_k_positive_rules'] - ", type(created_rule_obj['top_k_positive_rules']), "\n")
print('Example')
display(created_rule_obj['top_k_positive_rules'].head(3))

Type of pyExp_rule_obj['top_k_positive_rules'] -  <class 'pandas.core.frame.DataFrame'> 

Example


| | index | rule | type | coef | support | importance | is_satisfy_instance |
|---|---|---|---|---|---|---|---|
| 0 | 572 | la <= 104.84500122070312 & ndev > 71.319999694... | rule | 0.220501 | 0.232376 | 0.093128 | True |
| 1 | 359 | la > 103.68999862670898 & nrev > 8.11499977111... | rule | 0.167275 | 0.237598 | 0.071194 | True |
| 2 | 98 | app <= 3.9550000429153442 & ndev <= 103.875 & ... | rule | 0.136704 | 0.631854 | 0.065933 | True |
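Each rule is a conjunction of threshold conditions joined by `&`. The sketch below shows one way such a rule string could be checked against an instance. It is an illustration only, not the library's implementation: the `satisfies` helper and both example rules are hypothetical, because the real rules above are truncated in the display.

```python
import operator

# Comparison operators that appear in the rule strings shown above.
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}

def satisfies(rule, instance):
    """Return True if the instance meets every condition joined by '&' in the rule."""
    for condition in rule.split("&"):
        feature, op, threshold = condition.split()
        if not OPS[op](instance[feature], float(threshold)):
            return False
    return True

# A few feature values taken from the X_explain row in this guide.
instance = {"la": 155, "nrev": 12.0, "ndev": 70.0, "app": 3.0}

# Hypothetical rules written in the same style as the table above.
first_result = satisfies("la > 100 & nrev > 8", instance)
second_result = satisfies("app > 3.5 & ndev <= 100", instance)

print(first_result)
print(second_result)
```

The parser assumes each condition has exactly three whitespace-separated parts (feature, operator, threshold), which matches the visible portion of the rules in this guide.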


### top_k_negative_rules

top_k_negative_rules contains the top-k negative rules that PyExplainer generates to explain why a commit is predicted as clean.

The default number of generated rules is 3.


In [27]:
print("Type of pyExp_rule_obj['top_k_negative_rules'] - ", type(created_rule_obj['top_k_negative_rules']), "\n")
print('Example')
display(created_rule_obj['top_k_negative_rules'])

Type of pyExp_rule_obj['top_k_negative_rules'] -  <class 'pandas.core.frame.DataFrame'> 

Example


| | rule | type | coef | support | importance | Class |
|---|---|---|---|---|---|---|
| 820 | ent <= 0.9350000023841858 & app > 2.9900000095... | rule | -0.216263 | 0.389034 | 0.105435 | Clean |
| 1609 | app <= 4.009999990463257 & app > 2.99000000953... | rule | -0.206549 | 0.428198 | 0.102204 | Clean |
| 323 | la <= 104.84500122070312 & ndev <= 71.31999969... | rule | -0.173105 | 0.391645 | 0.084496 | Clean |
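The `importance` values in both rule tables appear consistent with the RuleFit-style formula |coef| × sqrt(support × (1 − support)). This is an observation from the displayed numbers rather than a statement about PyExplainer's internals, and the helper name `rule_importance` is an assumption made for this sketch.

```python
import math

def rule_importance(coef, support):
    """RuleFit-style importance: |coef| * sqrt(support * (1 - support))."""
    return abs(coef) * math.sqrt(support * (1.0 - support))

# (coef, support, importance) copied from the two rule tables in this guide.
rows = [
    (0.220501, 0.232376, 0.093128),
    (0.167275, 0.237598, 0.071194),
    (0.136704, 0.631854, 0.065933),
    (-0.216263, 0.389034, 0.105435),
    (-0.206549, 0.428198, 0.102204),
    (-0.173105, 0.391645, 0.084496),
]

# Compare the recomputed value with the one displayed in the tables.
for coef, support, shown in rows:
    computed = rule_importance(coef, support)
    print(f"coef={coef:+.6f} support={support:.6f} "
          f"computed={computed:.6f} shown={shown:.6f}")
```

Under this formula, a rule matters most when its coefficient is large in magnitude and its support is close to 0.5, which is why a negative coefficient can still yield a high importance.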


# Bug Report Channel
#### Please report <a href="https://github.com/awsm-research/pyExplainer/issues">here</a>
#### 📧 or email your report to michaelfu1998@gmail.com