<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    <font color='red'>To begin: Click anywhere in this cell and press <kbd>Run</kbd> on the menu bar. This executes the current cell and then highlights the next cell. There are two types of cells: a <i>text cell</i> and a <i>code cell</i>. When you <kbd>Run</kbd> a text cell (<i>we are in a text cell now</i>), you advance to the next cell without executing any code. When you <kbd>Run</kbd> a code cell (<i>identified by <span style="font-family: courier; color:black; background-color:white;">In[ ]:</span> to the left of the cell</i>), you advance to the next cell after executing all the Python code within that cell. Any visual results produced by the code (text/figures) are reported directly below that cell. Press <kbd>Run</kbd> again. Repeat this process until the end of the notebook. <b>NOTE:</b> All the cells in this notebook can be executed sequentially by clicking <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart and Run All</kbd>. Should anything crash, restart the Jupyter kernel by clicking <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart</kbd>, and start again from the top.</font>
        
</div>

### 1. Import Packages

In [None]:
import numpy as np
import pandas as pd
import cimcb as cb

print('All packages successfully loaded')

%load_ext autoreload
%autoreload 2

### 2. Load Data & Peak Sheet

In [None]:
home = 'data/' 
file = 'MTBLS90.xlsx'  

DataTable,PeakTable = cb.utils.load_dataXL(home + file, DataSheet='Data', PeakSheet='Peak') 

### 3. Extract X & Y

In [None]:
# Select Subset of Data
DataTable2 = DataTable[(DataTable.Class == 1) | (DataTable.Class == 0)]

# Create a Binary Y Vector 
Outcomes = DataTable2['Class']
Y = Outcomes.values 

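# Extract Metabolite Data, then Log-Transform, Scale, and Impute
peaklist = PeakTable['Name']                    # names of the metabolite columns
XT = DataTable2[peaklist]                       # metabolite intensity matrix
XTlog = np.log(XT)                              # natural-log transform
XTscale = cb.utils.scale(XTlog, method='auto')  # scale each metabolite (autoscaling)
XTknn = cb.utils.knnimpute(XTscale, k=3)        # impute missing values from the 3 nearest neighbours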
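
To make the preprocessing above more concrete, here is a minimal NumPy sketch of a log transform followed by autoscaling. It assumes that `method='auto'` means mean-centring each metabolite and dividing by its sample standard deviation (`ddof=1`); that is an assumption about `cb.utils.scale`, not its actual implementation. The `autoscale` helper and the toy intensities are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and divide by its sample standard deviation.

    NaNs are ignored when computing the column statistics and stay NaN in
    the output, so an imputation step can still fill them in afterwards.
    """
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)  # ddof=1 is an assumption
    return (X - mean) / std

# Hypothetical toy intensities: 4 samples x 2 metabolites, one missing value
X_toy = np.array([[10.0, 200.0],
                  [20.0, np.nan],
                  [40.0, 800.0],
                  [80.0, 1600.0]])

X_toy_scaled = autoscale(np.log(X_toy))

print(np.nanmean(X_toy_scaled, axis=0))          # column means after scaling
print(np.nanstd(X_toy_scaled, axis=0, ddof=1))   # column standard deviations after scaling
```

After this step every metabolite is on a comparable scale, so no single high-intensity peak dominates the model.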

### 4. Hyperparameter Optimisation

In [None]:
# Parameter Dictionary
param_dict = {'n_components': [1, 2, 3, 4, 5, 6]}                   

# Initialise
cv = cb.cross_val.kfold(model=cb.model.PLS_SIMPLS,                      
                                X=XTknn,                                 
                                Y=Y,                               
                                param_dict=param_dict,                   
                                folds=5,
                                n_mc=100)                                

# Run and Plot
cv.run()  
cv.plot(metric='r2q2', ci=95)
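
The `r2q2` plot compares R² (fit on the training data) with Q² (the same statistic computed from cross-validated predictions). The sketch below illustrates that distinction for a single 5-fold split. It is not the `cimcb` implementation: an ordinary least-squares model stands in for PLS, and `r2_score`, `fit_predict`, and the simulated data are hypothetical. The notebook additionally repeats the split `n_mc` times, which this sketch omits.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(X_train, y_train, X_test):
    """Stand-in model (ordinary least squares with intercept), not PLS."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

# Simulated data: 40 samples, 3 features, binary outcome driven by feature 0
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=40) > 0).astype(float)

# R2: fit and predict on the same samples
r2 = r2_score(y_demo, fit_predict(X_demo, y_demo, X_demo))

# Q2: every sample is predicted by a model that never saw it (5 folds)
folds = np.array_split(rng.permutation(len(y_demo)), 5)
y_cv = np.empty_like(y_demo)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(y_demo)), test_idx)
    y_cv[test_idx] = fit_predict(X_demo[train_idx], y_demo[train_idx], X_demo[test_idx])
q2 = r2_score(y_demo, y_cv)

print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

A large gap between R² and Q² as `n_components` grows is a common sign of overfitting, which is why the two are plotted together.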

In [None]:
# Build Model and plot projections (kfold - monte carlo reps)
# To do: Parallel
modelOptimise = cb.model.PLS_SIMPLS(n_components=3)
modelOptimise.train(XTknn, Y)
Ytest = modelOptimise.test(XTknn)

modelOptimise.plot_projections_kfold(label=DataTable2[['Idx','SampleID']],
                             size=12,
                             ci95=True,
                             scatterplot=True,
                             folds=5,
                             n_mc=100)


### 5. Build Model & Evaluate

In [None]:
# Build Model
model = cb.model.PLS_SIMPLS(n_components=3)
model.train(XTknn, Y)
model.test(XTknn)

# Evaluate Model 
model.evaluate(cutoffscore=0.5) 
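
As a rough illustration of what a cutoff score does, the sketch below turns continuous predicted scores into class labels at a threshold and derives a few standard binary metrics. The `binary_metrics` helper, the `>=` comparison, and the toy scores are assumptions for illustration; the metrics actually reported by `model.evaluate` may differ.

```python
import numpy as np

def binary_metrics(y_true, y_score, cutoff=0.5):
    """Classify scores at `cutoff` and return accuracy, sensitivity, specificity."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= cutoff  # scores at or above the cutoff count as class 1
    tp = np.sum(y_pred & y_true)      # true positives
    tn = np.sum(~y_pred & ~y_true)    # true negatives
    fp = np.sum(y_pred & ~y_true)     # false positives
    fn = np.sum(~y_pred & y_true)     # false negatives
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical true classes and predicted scores
y_true_demo = np.array([0, 0, 0, 1, 1, 1])
y_score_demo = np.array([0.1, 0.4, 0.6, 0.3, 0.7, 0.9])

metrics_demo = binary_metrics(y_true_demo, y_score_demo, cutoff=0.5)
print(metrics_demo)
```

Raising the cutoff generally trades sensitivity for specificity, so the value 0.5 is a choice rather than a fixed rule.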

### 6. Visualise

In [None]:
# To do:
    # Parallel bootstrap resampling

# Calculate the bootstrapped confidence intervals 
model.calc_bootci(type='bc', bootnum=1000)                # decrease bootnum if this takes too long on your machine
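
For readers new to bootstrapping, the sketch below shows the general idea on a single statistic: resample the data with replacement many times, recompute the statistic each time, and read the interval from the resulting distribution. It uses the plain percentile method, which is simpler than the `type='bc'` option used above (presumably a bias-corrected variant). The `bootstrap_percentile_ci` helper and the sample values are hypothetical.

```python
import numpy as np

def bootstrap_percentile_ci(values, stat=np.mean, bootnum=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for `stat` of a 1-D sample."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    boot_stats = np.empty(bootnum)
    for b in range(bootnum):
        # Draw n observations with replacement and recompute the statistic
        resample = values[rng.integers(0, n, size=n)]
        boot_stats[b] = stat(resample)
    alpha = (100 - ci) / 2
    lower, upper = np.percentile(boot_stats, [alpha, 100 - alpha])
    return lower, upper

# Hypothetical measurements
sample_demo = np.array([2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4])

ci_low, ci_high = bootstrap_percentile_ci(sample_demo, bootnum=1000, ci=95)
print(f"mean = {sample_demo.mean():.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

The cost grows linearly with `bootnum`, which is why lowering it is the first thing to try when the cell runs slowly.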

In [None]:
# To do:
    # density figure: figure dimensions
    # weight alt: figure dimensions + ci95 + intersecting line
    
model.plot_projections(label=DataTable2[['Idx','SampleID']],
                       size=12,
                       scatterplot=False) 

In [None]:
# To do:
    # Plot density to check if there is flipping
    # Fix sorting

model.plot_loadings(PeakTable,
                    peaklist,
                    ylabel='Label',  # change ylabel to 'Name' 
                    sort=True)      # change sort to False

In [None]:
# To do:
    # Rename output in peakSheet_featureimportance
    # Fix sorting

peakSheet_featureimportance = model.plot_featureimportance(PeakTable,
                                         peaklist,
                                         ylabel='Label',  # change ylabel to 'Name' 
                                         sort=True)      # change sort to False

### 7. Evaluate

In [None]:
model.booteval(XTknn, Y, bootnum=100)

In [None]:
# To do: Parallel

model.permutation_test(nperm=100) 
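
The logic behind a permutation test can be sketched in a few lines: shuffle the outcome labels to break any real association, recompute a performance statistic, and see how often the shuffled value matches or beats the observed one. In this sketch the absolute Pearson correlation stands in for the model statistic; the statistic used by `model.permutation_test`, the `permutation_pvalue` helper, and the simulated data are all assumptions for illustration.

```python
import numpy as np

def permutation_pvalue(x, y, nperm=100, seed=0):
    """Permutation test using |Pearson r| between x and y as the statistic."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    null = np.empty(nperm)
    for i in range(nperm):
        # Shuffling y destroys any true link between x and y
        null[i] = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
    # Add 1 to numerator and denominator so the p-value is never exactly zero
    p = (np.sum(null >= observed) + 1) / (nperm + 1)
    return observed, null, p

# Simulated data: a score that genuinely depends on a binary class
rng = np.random.default_rng(1)
y_perm_demo = np.repeat([0.0, 1.0], 20)
x_perm_demo = y_perm_demo + 0.5 * rng.normal(size=40)

obs_stat, null_stats, p_value = permutation_pvalue(x_perm_demo, y_perm_demo, nperm=100)
print(f"observed = {obs_stat:.3f}, p = {p_value:.3f}")
```

With `nperm=100` the smallest attainable p-value under this formula is 1/101, so more permutations are needed to resolve smaller p-values.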