# DIVALGO demonstration

This notebook demonstrates the DIVALGO tool. First, we train a basic logistic regression model without any hyperparameter tuning, since the focus is on demonstrating the divalgo tool and the methods of its Evaluate class.

In [1]:
import divalgo_class as div
import os
import sklearn.linear_model as lm
import numpy as np
from PIL import Image

#### Train a model for the example case

In [2]:
# List the image filenames for each class (sorted so the order is reproducible)
dogs = sorted(os.listdir(os.path.join("..", "data", "dogs")))
wolves = sorted(os.listdir(os.path.join("..", "data", "wolves")))


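Load the images one at a time, convert them to grayscale, resize them to img_size x img_size pixels, scale the pixel values to [0, 1], and append the resulting arrays to one list per class.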

In [3]:
# Preprocessing
img_size = 50
dogs_images = []
wolves_images = []

for i in dogs:
    path = os.path.join("..", "data", "dogs", i)
    if os.path.isfile(path):  # skip subdirectories
        # Convert to 8-bit grayscale, resize, and scale pixel values to [0, 1]
        img = Image.open(path).convert('L')
        # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
        img = img.resize((img_size, img_size), Image.LANCZOS)
        img = np.asarray(img) / 255.0
        dogs_images.append(img)

for i in wolves:
    path = os.path.join("..", "data", "wolves", i)
    if os.path.isfile(path):  # skip subdirectories
        img = Image.open(path).convert('L')
        img = img.resize((img_size, img_size), Image.LANCZOS)
        img = np.asarray(img) / 255.0
        wolves_images.append(img)
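
The per-image steps above (grayscale conversion, resizing to img_size x img_size, scaling to [0, 1]) can be factored into one helper so the two loops share the same code. The sketch below is a hypothetical refactoring: `load_image` is not part of DIVALGO, and it uses a synthetic in-memory image instead of a file from ../data so it runs without the dataset.

```python
import numpy as np
from PIL import Image

IMG_SIZE = 50

def load_image(img: Image.Image, size: int = IMG_SIZE) -> np.ndarray:
    """Hypothetical helper: grayscale, resize, and scale an image to [0, 1]."""
    gray = img.convert("L")                             # single 8-bit channel
    resized = gray.resize((size, size), Image.LANCZOS)  # LANCZOS replaces the removed ANTIALIAS alias
    return np.asarray(resized) / 255.0                  # float64 array with values in [0, 1]

# Synthetic 120x80 RGB image standing in for a file from ../data/dogs
rng = np.random.default_rng(0)
rgb = Image.fromarray(rng.integers(0, 256, size=(120, 80, 3), dtype=np.uint8))

arr = load_image(rgb)
print(arr.shape, arr.dtype, arr.min(), arr.max())

# Flattened into one feature vector, as in the train-test split step
vec = arr.reshape(-1)
print(vec.shape)
```

In the notebook, the helper would be called as `load_image(Image.open(path))` inside each loop.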

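Split the data into training and test sets manually for showcase purposes, which makes it easy to keep track of the test-image filenames. In a real ML pipeline this step would need more care, e.g. shuffling the data and stratifying the split so that both sets are representative and the classes stay balanced.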

In [4]:
# Manual train-test split (to track filenames):
# the first 800 images of each class are used for training, the next 200 for testing
X_train = np.asarray(dogs_images[0:800] + wolves_images[0:800])
# Flatten each img_size x img_size image into a single feature vector
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1]*X_train.shape[2])
X_test = np.asarray(dogs_images[800:1000] + wolves_images[800:1000])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1]*X_test.shape[2])

# Labels in the same order as the images
y_train = np.asarray(["dog"] * 800 + ["wolf"] * 800)
y_train = y_train.reshape(y_train.shape[0], 1)
y_test_ar = np.asarray(["dog"] * 200 + ["wolf"] * 200)
y_test = y_test_ar.reshape(y_test_ar.shape[0], 1)

# Transpose to shape (1, n_samples); y_train[0] and y_test[0] are then the 1-D label arrays
y_train, y_test = [k.T for k in [y_train, y_test]]

# Test-image paths in the same order as X_test
# (assumes every directory entry is an image file, so indices match dogs_images/wolves_images)
filenames_test = ([os.path.join("..", "data", "dogs", d) for d in dogs[800:1000]]
                  + [os.path.join("..", "data", "wolves", w) for w in wolves[800:1000]])
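
A shuffled, stratified split that still tracks filenames can be sketched with scikit-learn's `train_test_split`: it accepts several arrays and shuffles them with the same permutation, so paths stay aligned with their images, while `stratify=y` keeps the class ratio equal in both sets. This is not what the notebook does; the arrays `X`, `y`, and `paths` below are small synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_per_class = 50

# Placeholder data: 100 flattened 50x50 "images", their labels, and file paths
X = rng.random((2 * n_per_class, 50 * 50))
y = np.array(["dog"] * n_per_class + ["wolf"] * n_per_class)
paths = np.array([f"dogs/{i}.jpg" for i in range(n_per_class)]
                 + [f"wolves/{i}.jpg" for i in range(n_per_class)])

# All three arrays are split with the same shuffled indices, so rows stay aligned
X_tr, X_te, y_tr, y_te, paths_tr, paths_te = train_test_split(
    X, y, paths, test_size=0.2, stratify=y, shuffle=True, random_state=0
)

print(X_tr.shape, X_te.shape)
print({str(label): int((y_te == label).sum()) for label in np.unique(y)})
```

With the real data, `paths_te` would play the role of `filenames_test`.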

In [5]:
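# Train an unregularised logistic regression model
# (penalty=None requires scikit-learn >= 1.2; older versions used the string 'none', which has since been removed)
model = lm.LogisticRegression(penalty=None, tol=0.1, max_iter=500).fit(X_train, y_train[0])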

#### Showing tool
Now we demonstrate the divalgo tool by creating an instance of the Evaluate class and exploring its methods.

In [6]:
# Instantiate the Evaluate class with the test data (features, labels, filenames) and the fitted model
dog_wolf = div.Evaluate((X_test, y_test[0], filenames_test), model)

In [7]:
# Accuracy charts - overall and by type
dog_wolf.accuracy()
dog_wolf.accuracy_type()
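
Overall accuracy is the share of correct predictions. Accuracy "by type" is read here as the same share computed within each true class (which equals per-class recall); that interpretation is an assumption, and this sketch is not DIVALGO's implementation. The labels are toy placeholders for `y_test[0]` and `model.predict(X_test)`.

```python
import numpy as np

# Toy true labels and predictions
y_true = np.array(["dog", "dog", "dog", "dog", "wolf", "wolf", "wolf", "wolf"])
y_pred = np.array(["dog", "dog", "dog", "dog", "wolf", "dog", "wolf", "dog"])

# Overall accuracy: fraction of positions where prediction matches truth
overall = np.mean(y_true == y_pred)

# Accuracy within each true class
per_class = {label: np.mean(y_pred[y_true == label] == label) for label in np.unique(y_true)}

print(f"overall accuracy: {overall:.2f}")
for label, acc in per_class.items():
    print(f"{label}: {acc:.2f}")
```

Reporting both numbers matters because a good overall score can hide a class the model handles poorly, as with "wolf" in this toy example.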

In [8]:
# Get a table of performance metrics, with or without a column showing the formulas
dog_wolf.get_metrics(equations=True)
dog_wolf.get_metrics(equations=False) # This is also the default
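
Which metrics `get_metrics` reports is not shown here, but the usual ones can be cross-checked with scikit-learn. With "wolf" as the positive class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. A sketch on toy labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array(["dog"] * 4 + ["wolf"] * 4)
y_pred = np.array(["dog", "dog", "dog", "dog", "wolf", "dog", "wolf", "dog"])

# Here TP = 2, FP = 0, FN = 2 for the positive class "wolf"
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label="wolf", average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```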

In [9]:
# Plot the confusion matrix
dog_wolf.confusion()
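
The counts behind a confusion-matrix plot can be obtained with `sklearn.metrics.confusion_matrix`; passing `labels` fixes the order, with rows as true classes and columns as predicted classes. That `confusion()` plots exactly this matrix is an assumption. Toy labels again:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array(["dog"] * 4 + ["wolf"] * 4)
y_pred = np.array(["dog", "dog", "dog", "dog", "wolf", "dog", "wolf", "dog"])

labels = ["dog", "wolf"]
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, columns: predicted

for label, row in zip(labels, cm):
    print(label, row)
```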

In [10]:
# Show AUC-ROC curve
dog_wolf.plot_roc_curve()
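
An ROC curve plots the true-positive rate against the false-positive rate as the decision threshold on the predicted probability varies, and the AUC summarises the curve in one number: the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch uses toy scores with "wolf" as the positive class. With the notebook's model, the scores would be `model.predict_proba(X_test)[:, 1]`, since `model.classes_` is sorted as `['dog', 'wolf']`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array(["dog", "dog", "dog", "dog", "wolf", "wolf", "wolf", "wolf"])
# Toy predicted probabilities of "wolf"
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.6, 0.7, 0.2])

# One (fpr, tpr) point per threshold; the curve runs from (0, 0) to (1, 1)
fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label="wolf")

# Binarise explicitly so the positive class is unambiguous
y_binary = (y_true == "wolf").astype(int)
roc_auc = roc_auc_score(y_binary, scores)
print(f"AUC = {roc_auc:.4f}")
```

Here 11 of the 16 wolf/dog pairs are ranked correctly, so the AUC is 11/16.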

In [None]:
# Plot the model coefficients as heatmaps, either signed or as absolute values
dog_wolf.plot_coefs(absolute=False) # Default
dog_wolf.plot_coefs(absolute=True)
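
Because each input is a flattened img_size x img_size image, a fitted binary `LogisticRegression` has one coefficient per pixel in `model.coef_` (shape `(1, 2500)` here), which can be reshaped back into image form and drawn as a heatmap. That this is what `plot_coefs` visualises is an assumption. The sketch fits a model on random placeholder data and plots with matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression

img_size = 50
rng = np.random.default_rng(1)

# Placeholder training data: 40 flattened images with two labels
X = rng.random((40, img_size * img_size))
y = np.array(["dog"] * 20 + ["wolf"] * 20)
model = LogisticRegression(max_iter=500).fit(X, y)

# One weight per pixel; positive weights push predictions towards model.classes_[1] ("wolf")
coef_img = model.coef_.reshape(img_size, img_size)

vmax = np.abs(coef_img).max()
fig, (ax_signed, ax_abs) = plt.subplots(1, 2, figsize=(8, 4))
im_signed = ax_signed.imshow(coef_img, cmap="coolwarm", vmin=-vmax, vmax=vmax)  # colour scale symmetric around 0
ax_signed.set_title("signed coefficients")
im_abs = ax_abs.imshow(np.abs(coef_img), cmap="viridis")
ax_abs.set_title("absolute coefficients")
for ax, im in [(ax_signed, im_signed), (ax_abs, im_abs)]:
    ax.axis("off")
    fig.colorbar(im, ax=ax)
plt.show()
```

The signed map shows which pixels push towards each class; the absolute map only shows which pixels matter most.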

In [None]:
dog_wolf.explore_embeddings()

In [None]:
dog_wolf.open_visualization()

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8562
  Network URL: http://192.168.8.150:8562

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog
