# Example 7: Using different classifiers

hcga is built so that users can easily choose their preferred statistical learning algorithm. By default it uses the XGBoost algorithm because it integrates seamlessly with the SHAP (Shapley values) toolbox. However, users can pass in their own model or choose from a few predefined algorithms.

Here, we assume that you have already run Example 1, so you can simply load the features that were produced and saved.

## Important!

SHAP values are fast to compute for tree-based algorithms, which have a dedicated explainer. Kernel-based algorithms, however, require a different explainer function, which is slow and needs a lot of memory and computing power. I highly recommend not computing SHAP values when using kernel-based algorithms such as Support Vector Machines (pass the argument 'compute_shap=False' to skip this step).
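
To give a feel for why model-agnostic Shapley computations are expensive, here is a minimal sketch that computes exact Shapley values by brute force for a toy payoff function. It is not the algorithm used by hcga or by the SHAP toolbox (the kernel explainer approximates the result by sampling coalitions rather than enumerating them all); the names `exact_shapley` and `toy_value` are made up for this illustration. The point is only that the number of model evaluations grows like n * 2^n with the number of features n.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    players = range(n_features)
    phi = [0.0] * n_features
    n_evaluations = 0
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n_features):
            # Shapley weight for a coalition of this size that excludes feature i
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # marginal contribution of feature i to this coalition
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
                n_evaluations += 2
    return phi, n_evaluations


def toy_value(coalition):
    """Toy "model": additive payoff plus an interaction between features 0 and 1."""
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    payoff = sum(base[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        payoff += 4.0
    return payoff


phi, n_evaluations = exact_shapley(toy_value, 3)
print(phi)            # the interaction bonus is split evenly between features 0 and 1
print(n_evaluations)  # 3 * 2**3 = 24 evaluations of the toy model
```

With only 3 features this needs 24 evaluations; with 30 features the same enumeration would need more than 10^10, which is why explainers for general models fall back on sampling and remain slow.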





In [None]:
import numpy as np
import networkx as nx
import pandas as pd
import scipy as sc

import os
from pathlib import Path

if not Path("datasets").exists():
    os.mkdir("datasets")
if not Path("results").exists():
    os.mkdir("results")

# Load data from Example 1

In [None]:
from hcga.hcga import Hcga

# WARNING: you need to run example_1 before this notebook

h = Hcga()
h.load_features("./results/custom_dataset_classification/all_features.pkl")

# Using default

In [None]:
# First, use the default model (XGBoost)
model = "XG"
h.analyse_features(
    model=model,
    plot=False,
    feature_file="./results/custom_dataset_classification/all_features.pkl",
    results_folder="./results/custom_dataset_classification",
)

# Use the built-in random forest classifier

In [None]:
model = "RF"
h.analyse_features(
    model=model,
    plot=False,
    feature_file="./results/custom_dataset_classification/all_features.pkl",
    results_folder="./results/custom_dataset_classification",
)

# Use custom Support Vector Machine classifier

In [None]:
from sklearn.svm import SVC

# probability=True is necessary to compute SHAP values
model = SVC(probability=True)
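
As a small illustration of what `probability=True` changes, the sketch below fits two SVCs on synthetic data and checks which one exposes `predict_proba`. It assumes, and this is an assumption rather than something stated in this notebook, that the SHAP step needs class probabilities from the model. The dataset and variable names are made up for the example and are unrelated to the hcga features.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic binary classification problem (illustrative only)
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

plain_svc = SVC().fit(X, y)  # default: probability=False
proba_svc = SVC(probability=True, random_state=0).fit(X, y)

# Without probability=True the estimator does not expose predict_proba
has_proba_plain = hasattr(plain_svc, "predict_proba")
print(has_proba_plain)

# With probability=True we get one probability per class for each sample
probabilities = proba_svc.predict_proba(X[:3])
print(probabilities.shape)
```

Note that `probability=True` makes fitting slower, because scikit-learn runs an internal cross-validation to calibrate the probabilities.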

In [None]:
# we can run the analysis without SHAP values
h.analyse_features(
    compute_shap=False,
    model=model,
    plot=False,
    feature_file="./results/custom_dataset_classification/all_features.pkl",
    results_folder="./results/custom_dataset_classification",
)

In [None]:
# or with SHAP values:
# WARNING: the Kernel Explainer (for general models) is slow and requires a lot of memory
h.analyse_features(
    compute_shap=True,
    kfold=False,
    model=model,
    plot=False,
    feature_file="./results/custom_dataset_classification/all_features.pkl",
    results_folder="./results/custom_dataset_classification",
)

# Use custom k-nearest neighbours (KNN) classifier

In [None]:
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier()
h.analyse_features(
    compute_shap=False,
    model=model,
    plot=False,
    feature_file="./results/custom_dataset_classification/all_features.pkl",
    results_folder="./results/custom_dataset_classification",
)