# Tutorial
This tutorial shows how to use the mllp package.

In [1]:
import torch
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from mllp.models import MLLP
from mllp.utils import DBEncoder

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Prepare data
We use the [breast cancer wisconsin dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)) as an example.  
X_df stores the feature values of all the instances.  
y_df stores the class labels of all the instances.  
f_df stores the feature names and feature types (continuous or discrete).

In [2]:
data = load_breast_cancer()

X_df = pd.DataFrame(data['data'], columns=data['feature_names'])
y_df = pd.DataFrame(data['target'], columns=['class'])
f_df = pd.DataFrame(zip(data['feature_names'], ['continuous'] * len(data.feature_names)))
X_train, X_test, y_train, y_test = train_test_split(X_df, y_df, train_size=0.8)

## Discretize and binarize data

Because all the features in this dataset are continuous (real numbers), we need to discretize them first.  
After discretization, we one-hot encode all the features and the class label.  
`DBEncoder` does all of this for us.

In [3]:
db_enc = DBEncoder(f_df, discrete=True)
db_enc.fit(X_train, y_train)
X_train, y_train = db_enc.transform(X_train, y_train)
X_test, y_test = db_enc.transform(X_test, y_test)
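To make the encoding step concrete: once a continuous feature has a sorted list of cut points, each value falls into exactly one interval, and one-hot encoding turns that interval index into a 0/1 vector. The sketch below assumes right-closed intervals (-inf, b0], (b0, b1], ..., (bk, +inf) and one output slot per interval; the interval labels in the rule set printed at the end of this tutorial, such as `(31.24, 53.65]` and `>0.09657`, follow this convention. The helpers `interval_index` and `one_hot_interval` are hypothetical illustrations, not part of mllp.

```python
from bisect import bisect_left


def interval_index(value, cut_points):
    """Index of the right-closed interval containing `value`.

    With sorted cut points [b0, b1, ..., bk], the intervals are
    (-inf, b0], (b0, b1], ..., (bk, +inf), numbered 0..k+1.
    """
    # bisect_left places a value equal to b_i in interval i, i.e. (b_{i-1}, b_i]
    return bisect_left(cut_points, value)


def one_hot_interval(value, cut_points):
    """One-hot vector with one slot per interval (len(cut_points) + 1 slots)."""
    vec = [0] * (len(cut_points) + 1)
    vec[interval_index(value, cut_points)] = 1
    return vec


cuts = [12.99, 15.04, 16.84]  # example cut points for a single feature
for v in (11.0, 15.04, 20.5):
    print(v, one_hot_interval(v, cuts))
```

A value exactly on a cut point (15.04 here) lands in the interval that the cut point closes, which is what the `(a, b]` notation means.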

`DBEncoder` uses the recursive minimal entropy partitioning algorithm for data discretization.  
Its attribute `me_discretizer` is the fitted discretizer, and we can get the partition boundaries with:

In [4]:
db_enc.me_discretizer.boundaries

defaultdict(list,
            {'mean radius': [12.99, 15.04, 16.84],
             'mean texture': [18.45],
             'mean perimeter': [85.24, 96.45, 108.4],
             'mean area': [496.6, 690.2, 880.2],
             'mean smoothness': [0.08992],
             'mean compactness': [0.1021, 0.1364],
             'mean concavity': [0.0716, 0.09657],
             'mean concave points': [0.02657, 0.05102, 0.07404],
             'mean symmetry': [0.1723, 0.2081],
             'mean fractal dimension': [],
             'radius error': [0.1935, 0.3857, 0.5462],
             'texture error': [],
             'perimeter error': [2.056, 2.759, 4.36],
             'area error': [31.24, 53.65],
             'smoothness error': [],
             'compactness error': [0.0182],
             'concavity error': [0.01099, 0.02105],
             'concave points error': [0.009199, 0.012],
             'symmetry error': [],
             'fractal dimension error': [0.00233],
             'worst radius': 
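Recursive minimal entropy partitioning (the method of Fayyad and Irani) chooses, for each feature, the cut point that minimizes the class entropy of the two resulting parts, then recurses on each part until a minimum description length (MDL) test rejects further cuts. Presumably this is why some features above, such as `'texture error'`, have no boundaries at all: even their first cut did not pay for itself. The plain-Python sketch below implements the technique with the usual MDL stopping rule; it is an illustration, not the code behind `me_discretizer`, which may differ in details such as where exactly each cut is placed. Here a cut `c` sends values `<= c` to the left part, matching the right-closed intervals. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(xs, ys):
    """For values `xs` sorted ascending with aligned labels `ys`, return
    (i, split_entropy) for the split xs[:i] | xs[i:] with minimal weighted
    class entropy, or None if no split is possible."""
    n = len(ys)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between equal values
        e = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or e < best[1]:
            best = (i, e)
    return best


def mdl_accepts(ys, i, split_entropy):
    """Fayyad-Irani MDL test: keep the cut only if its information gain
    exceeds (log2(N - 1) + delta) / N."""
    n = len(ys)
    left, right = ys[:i], ys[i:]
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(ys), entropy(left), entropy(right)
    gain = ent - split_entropy
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(xs, ys):
    """Recursive minimal entropy partitioning on sorted values."""
    split = best_split(xs, ys)
    if split is None:
        return []
    i, split_entropy = split
    if not mdl_accepts(ys, i, split_entropy):
        return []
    # cut at the largest value of the left part, so the left part is "<= cut"
    return (mdlp_cut_points(xs[:i], ys[:i]) + [xs[i - 1]]
            + mdlp_cut_points(xs[i:], ys[i:]))


def discretize_feature(values, labels):
    """Sort one feature's values (keeping labels aligned) and return its cut points."""
    pairs = sorted(zip(values, labels))
    return mdlp_cut_points([x for x, _ in pairs], [y for _, y in pairs])
```

For example, `discretize_feature([1, 2, 3, 4, 10, 11, 12, 13], [0, 0, 0, 0, 1, 1, 1, 1])` returns `[4]`: one cut separates the classes perfectly, after which both halves are pure and the MDL test rejects any further cut. The search is quadratic per level, which is fine for a sketch but not for large data.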

## Set the MLLP
Set the network structure, the device, the rate of random binarization, and whether to use the NOT (~) operator.  
Run `MLLP?` for more information.
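`random_binarization_rate` controls random binarization, which, as we understand the MLLP approach, replaces a random subset of the continuous weights with binarized values during training to narrow the gap between the continuous MLLP and the discrete rule set extracted from it. The sketch below shows only that masking step, assuming each weight is selected independently with probability equal to the rate and binarized by thresholding at 0.5; both the threshold and the helper `randomly_binarize` are assumptions, not mllp's `RandomBinarizationLayer`.

```python
import numpy as np


def randomly_binarize(W, rate, rng, threshold=0.5):
    """Return a copy of W in which each weight is, independently with
    probability `rate`, replaced by its binarized value (1.0 if above
    `threshold`, else 0.0). Unselected weights keep their continuous values."""
    selected = rng.random(W.shape) < rate
    W_bin = (W > threshold).astype(W.dtype)
    return np.where(selected, W_bin, W)


rng = np.random.default_rng(0)
W = rng.random((4, 3))                   # continuous weights in [0, 1)
print(randomly_binarize(W, 0.0, rng))    # rate 0.0: identical to W
print(randomly_binarize(W, 0.5, rng))    # on average, half the entries become 0.0 or 1.0
```

Under this reading, the rate of 0.0 used below leaves all weights continuous during training.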

In [5]:
net_structure = [X_train.shape[-1], 32, y_train.shape[-1]]
# below is a more complex MLLP structure that can be used for a complex data set
# net_structure = [X_train.shape[-1], 128, 128, 64, y_train.shape[-1]]
net = MLLP(net_structure,
           device=device,
           random_binarization_rate=0.0,
           use_not=False)
net.to(device)

MLLP(
  (conj0): ConjunctionLayer(
    (randomly_binarize_layer): RandomBinarizationLayer()
  )
  (disj0): DisjunctionLayer(
    (randomly_binarize_layer): RandomBinarizationLayer()
  )
)
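The printed module shows one `ConjunctionLayer` (`conj0`) followed by one `DisjunctionLayer` (`disj0`), presumably the 32-node hidden layer and the output layer from `net_structure`. As a rough illustration of what such layers compute, the sketch below defines differentiable AND/OR functions over inputs and weights in [0, 1] of the kind used in logical perceptrons. The exact formulas inside mllp's layers are an assumption here, and `conjunction`/`disjunction` are hypothetical helpers.

```python
import numpy as np


def conjunction(h, W):
    """Soft AND. h: inputs in [0, 1], shape (n_in,); W: weights in [0, 1],
    shape (n_in, n_out). Node j computes prod_i (1 - W[i, j] * (1 - h[i])):
    an input with weight 0 is ignored, and an input with weight 1 must be 1
    for the product to stay 1."""
    return np.prod(1.0 - W * (1.0 - h[:, None]), axis=0)


def disjunction(h, W):
    """Soft OR: node j computes 1 - prod_i (1 - W[i, j] * h[i])."""
    return 1.0 - np.prod(1.0 - W * h[:, None], axis=0)


# With binary inputs and weights, the layers reduce to ordinary Boolean logic.
h = np.array([1.0, 0.0, 1.0])
W = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])   # column 0 selects inputs {0, 2}; column 1 selects {0, 1}
print(conjunction(h, W))     # h0 AND h2 -> 1, h0 AND h1 -> 0
print(disjunction(h, W))     # h0 OR h2 -> 1, h0 OR h1 -> 1
```

Because each node's weights say which inputs it uses, binarized weights can be read off directly as rules: a conjunction node becomes an AND of the selected conditions, and a disjunction node an OR of the selected rules.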

## Train the MLLP
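Set the training parameters and train the MLLP. The loss is logged every epoch, and every 5 epochs the log also reports the accuracy and F1 score of the MLLP and of the concept rule set (CRS) extracted from it. These periodic scores seem to be computed on only a few samples rather than the whole training set: values such as 0.714 and 0.286 are multiples of 1/7, matching a final mini-batch of 7 samples (455 training samples, and 455 mod 16 = 7), so expect them to be noisy.  
Run `MLLP.train?` for more information.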

In [6]:
training_log = net.train(
    X_train,
    y_train,
    lr=0.005,
    batch_size=16,
    epoch=100,
    lr_decay_rate=0.75,
    lr_decay_epoch=100,
    weight_decay=1e-7)

[INFO] - LR is set to 0.005
[INFO] - epoch: 0, loss: 6.935159310698509
[INFO] - ------------------------------------------------------------
[INFO] - On Training Set:
	Accuracy of MLLP Model: 0.7142857142857143
	Accuracy of CRS  Model: 0.2857142857142857
[INFO] - On Training Set:
	F1 Score of MLLP Model: 0.41666666666666663
	F1 Score of CRS  Model: 0.22222222222222224
[INFO] - ------------------------------------------------------------
[INFO] - epoch: 1, loss: 3.321583613753319
[INFO] - epoch: 2, loss: 2.1850833036005497
[INFO] - epoch: 3, loss: 1.9101239051669836
[INFO] - epoch: 4, loss: 1.6809748206287622
[INFO] - epoch: 5, loss: 1.5773893147706985
[INFO] - ------------------------------------------------------------
[INFO] - On Training Set:
	Accuracy of MLLP Model: 1.0
	Accuracy of CRS  Model: 0.5714285714285714
[INFO] - On Training Set:
	F1 Score of MLLP Model: 1.0
	F1 Score of CRS  Model: 0.36363636363636365
[INFO] - ------------------------------------------------------------
[INFO] - epoch: 6, loss: 1.511269235983491
[INFO] - epoch: 7, loss: 1.4710526894778013
[INFO] - epoch: 8, loss: 1.3050645738840103
[INFO] - epoch: 9, loss: 1.2551615005359054
[INFO] - epoch: 10, loss: 1.1588296201080084
[INFO] - ------------------------------------------------------------
[INFO] - On Training Set:
	Accuracy of MLLP Model: 0.8571428571428571
	Accuracy of CRS  Model: 1.0
[INFO] - On Training Set:
	F1 Score of MLLP Model: 0.7878787878787878
	F1 Score of CRS  Model: 1.0
[INFO] - ------------------------------------------------------------
[INFO] - epoch: 11, loss: 1.0267322724685073
[INFO] - epoch

[INFO] - ------------------------------------------------------------
[INFO] - On Training Set:
	Accuracy of MLLP Model: 1.0
	Accuracy of CRS  Model: 1.0
[INFO] - On Training Set:
	F1 Score of MLLP Model: 1.0
	F1 Score of CRS  Model: 1.0
[INFO] - ------------------------------------------------------------
[INFO] - epoch: 81, loss: 0.1172674756780907
[INFO] - epoch: 82, loss: 0.11682874178586644
[INFO] - epoch: 83, loss: 0.1132623516359672
[INFO] - epoch: 84, loss: 0.12572347479908785
[INFO] - epoch: 85, loss: 0.11208124875884096
[INFO] - ------------------------------------------------------------
[INFO] - On Training Set:
	Accuracy of MLLP Model: 1.0
	Accuracy of CRS  Model: 1.0
[INFO] - On Training Set:
	F1 Score of MLLP Model: 1.0
	F1 Score of CRS  Model: 1.0
[INFO] - ------------------------------------------------------------
[INFO] - epoch: 86, loss: 0.11346478953873884
[INFO] - epoch: 87, loss: 0.10797896300391585
[INFO] - epoch: 88, loss: 0.10717325240329956
[INFO] - epoch: 89
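The call above sets `lr_decay_rate=0.75` and `lr_decay_epoch=100`. Assuming these implement a step schedule that multiplies the learning rate by `lr_decay_rate` once every `lr_decay_epoch` epochs (the same rule as PyTorch's `torch.optim.lr_scheduler.StepLR` with `gamma` and `step_size`), the rate used in each epoch can be computed as below. The helper `step_decay_lr` is hypothetical.

```python
def step_decay_lr(base_lr, epoch, decay_rate, decay_epoch):
    """Learning rate used in `epoch` (0-based) under step decay:
    multiply by `decay_rate` once every `decay_epoch` epochs."""
    return base_lr * decay_rate ** (epoch // decay_epoch)


# Parameters from the training call above
for epoch in (0, 50, 99, 100, 200):
    print(epoch, step_decay_lr(0.005, epoch, 0.75, 100))
```

Under this assumption, with `epoch=100` and `lr_decay_epoch=100` the decay never triggers in this run, because the log numbers epochs from 0 to 99; a smaller `lr_decay_epoch` (or more epochs) would be needed for the learning rate to drop.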

## Test the trained MLLP and the extracted CRS

In [7]:
acc, acc_b, f1, f1_b = net.test(X_test, y_test, need_transform=True)

print('Accuracy of MLLP Model: {}'
      '\nAccuracy of CRS  Model: {}'
      '\nF1 Score of MLLP Model: {}'
      '\nF1 Score of CRS  Model: {}'.format(acc, acc_b, f1, f1_b))

Accuracy of MLLP Model: 0.9649122807017544
Accuracy of CRS  Model: 0.9736842105263158
F1 Score of MLLP Model: 0.961025641025641
F1 Score of CRS  Model: 0.9715828832571666
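`net.test` returns the accuracy and F1 score of the MLLP (`acc`, `f1`) and of the CRS (`acc_b`, `f1_b`). With 114 test samples, the accuracies above correspond to 110 and 111 correct predictions. The tutorial does not say which F1 averaging is used; as an illustration only, the sketch below computes accuracy and macro-averaged F1 (the unweighted mean of per-class F1 scores) from plain label lists, and the choice of macro averaging is our assumption. For each class, F1 = 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall and is 0 when a class is never predicted correctly. The helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # c appears in y_true or y_pred, so the denominator is at least 1
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On these label lists the result should agree with `sklearn.metrics.f1_score(y_true, y_pred, average='macro')`, including the case where a class is never predicted and scikit-learn sets its precision to 0.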


## Display the extracted CRS

In [8]:
net.concept_rule_set_print(X_fname=db_enc.X_fname, y_fname=db_enc.y_fname, eliminate_redundancy=True)

------------------------------------------------------------------------------------------
 class_0:
	       r1,0:	 [' mean smoothness_>0.08992', ' mean concavity_>0.09657', ' worst concave points_>0.175']
	       r1,1:	 [' mean concavity_>0.09657', ' area error_(31.24, 53.65]', ' concavity error_>0.02105', ' worst radius_(14.9, 16.77]', ' worst area_(739.1, 880.8]', ' worst smoothness_>0.1389']
	       r1,5:	 [' mean texture_>18.45', ' mean smoothness_>0.08992', ' mean symmetry_(0.1723, 0.2081]', ' radius error_(0.1935, 0.3857]', ' worst texture_>23.84', ' worst concavity_>0.366']
	      r1,11:	 [' mean smoothness_>0.08992', ' compactness error_>0.0182', ' worst perimeter_(105.0, 120.3]', ' worst smoothness_>0.1389']
	      r1,12:	 [' mean texture_>18.45', ' mean concavity_>0.09657', ' concavity error_>0.02105', ' concave points error_>0.012', ' worst texture_>23.84', ' worst area_(739.1, 880.8]']
	      r1,19:	 [' mean texture_>18.45', ' mean area_(496.6, 690.2]', ' mean compactness_