# ASKLB Azure Widget

## Initial Setup

- Run the two cells below. Note that the first cell may take some time to complete.

## Create User/Password

- If this is your first time using ASKLB, enter a username and password and click "Register." Otherwise, log in with your existing credentials.
- The widget will then display the full user interface.

## Workflow

1. Set your query budget using the slider (we recommend at most 10 queries).
2. Set the AutoML run time for each query.
3. Upload the data for the current run (see the instructions below).
4. After fitting finishes, review the results under the "Model Run Info" tab and make any feature adjustments to your dataset.
5. Repeat steps 2-4 until your query budget is exhausted; the "Select Final Model" tab will then appear.
6. Choose your final model from the drop-down; its true test accuracy and true AUC will then be displayed.
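The budget rules in the steps above behave like a small state machine. The `QueryBudget` class below is hypothetical (the widget's internals are not shown in this notebook) and only illustrates the constraint: each fitted run consumes one query, and a final model can be selected only once the whole budget is used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryBudget:
    """Hypothetical model of the widget's query budget, not ASKLB's actual code."""

    max_queries: int = 10  # step 1: the slider value; 10 is the recommended maximum
    runs: List[dict] = field(default_factory=list)  # one entry per fitted run

    @property
    def remaining(self) -> int:
        return self.max_queries - len(self.runs)

    def record_run(self, run_info: dict) -> None:
        # Steps 2-4: every upload-and-fit cycle consumes one query.
        if self.remaining <= 0:
            raise RuntimeError("Query budget exhausted; select a final model.")
        self.runs.append(run_info)

    def select_final(self, index: int) -> dict:
        # Step 6: the final choice is only allowed after all queries are used.
        if self.remaining > 0:
            raise RuntimeError(f"{self.remaining} queries left before final selection.")
        return self.runs[index]
```

For example, with `QueryBudget(max_queries=2)`, calling `select_final` after one run raises an error; after the second run it returns the chosen run's info.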


## Data Upload Instructions
- Uploaded data must be a CSV file with no header row and with the target labels in the first column.
- Place the test samples at the **end of the file**, and specify the total number of test samples in the GUI.
- To upload a file, open the Files menu on the left toolbar and click the "Upload" icon.
- Subsequent uploads must contain the same number of samples, in the **same order** as the original upload.
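The file layout described above can be produced with Python's standard `csv` module. This is a minimal sketch, assuming each row is already a list of the form `[label, feature_1, ..., feature_k]`; the helper names are hypothetical. The `same_layout` check is only a heuristic: it assumes labels are not edited between uploads, so matching label columns suggest the samples kept their original order.

```python
import csv
from typing import List, Sequence


def write_asklb_csv(path: str, train_rows: Sequence[Sequence], test_rows: Sequence[Sequence]) -> int:
    """Write headerless rows (label first), training rows before test rows.

    Returns the test sample count to enter in the GUI.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(train_rows)  # training samples first
        writer.writerows(test_rows)   # test samples at the end of the file
    return len(test_rows)


def read_labels(path: str) -> List[str]:
    # The target label is the first column of every non-empty row.
    with open(path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]


def same_layout(original_path: str, new_path: str) -> bool:
    """Heuristic check that a re-upload has the same samples in the same order."""
    return read_labels(original_path) == read_labels(new_path)
```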

## FAQs

#### Q: How do I define my training and test sets in the uploaded .csv file?

**A:** In your .csv file, place all test samples after the training samples, and enter the number of test samples in the GUI.
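Under this convention, the test-sample count from the GUI is enough to recover the split: the last `n_test` rows are the test set. A minimal sketch, assuming the tool splits the file this way; `split_train_test` is a hypothetical name, not part of ASKLB.

```python
import csv
from typing import List, Tuple

Row = List[str]


def split_train_test(path: str, n_test: int) -> Tuple[List[Row], List[Row]]:
    """Treat the last n_test rows of a headerless CSV as the test set."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    # Guard against n_test == 0, where rows[-0:] would return every row.
    if not 0 < n_test < len(rows):
        raise ValueError(f"n_test must be between 1 and {len(rows) - 1}, got {n_test}")
    return rows[:-n_test], rows[-n_test:]
```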

#### Q: How much time should I allow this automated machine learning tool to run per query?

**A:** We recommend a minimum of 10 minutes. The automated machine learning tool builds an
ensemble out of the machine learning models it creates within the time budget, so giving the
tool more time can result in better predictive performance.
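In auto-sklearn, this per-query budget corresponds to the `time_left_for_this_task` parameter of `AutoSklearnClassifier`, given in seconds. The `widget.py` notes further down show that the widget assembles an `automl_args` dictionary; the helper below is a hypothetical sketch of how a run time in minutes could be converted into such keyword arguments, not the widget's actual code.

```python
import warnings


def build_automl_args(minutes_per_query: float) -> dict:
    """Hypothetical helper: per-query run time (minutes) -> auto-sklearn kwargs."""
    if minutes_per_query < 10:
        # The FAQ recommends at least 10 minutes per query.
        warnings.warn(f"{minutes_per_query} min is below the recommended 10 min.")
    return {
        # Total search time in seconds; models are found and ensembled
        # only within this window.
        "time_left_for_this_task": int(minutes_per_query * 60),
    }
```

The result would then be passed as `AutoSklearnClassifier(**build_automl_args(10))`.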

#### Q: Do I have to finish all the queries I define? Are any results reportable before I finish the last query and choose a model?

**A:** Yes, you have to finish all the queries you define in order to choose a model and reveal its
true performance metrics. All test set accuracies reported to you before this point are
“noised” via a differential privacy algorithm to prevent overfitting, so none of them are
reportable.

#### Q: Why do I observe relatively large fluctuations (> 0.1) in the reported test set accuracy between queries, even when I did not modify the uploaded dataset?

**A:** The test set accuracies reported after each query are “noised” via a differential
privacy algorithm to prevent overfitting. When the actual difference between the training and
test accuracies is within a random “threshold” value, the training accuracy is reported as the
“noised” test set accuracy. Therefore, when the threshold value is large, it is normal to
observe relatively high variance in the reported test accuracies between queries.
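The noising described above resembles Thresholdout, the reusable-holdout mechanism of Dwork et al. (2015). ASKLB's own implementation is not shown in this notebook, so the following is a minimal sketch of that general mechanism, with hypothetical `threshold` and `sigma` values and without Thresholdout's overfitting budget. It shows why the training accuracy can be reported verbatim and why reported values fluctuate: both the threshold and the comparison are randomized with Laplace noise.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    if scale <= 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class Thresholdout:
    """Sketch of Thresholdout (Dwork et al., 2015); the query budget is omitted.

    threshold and sigma are hypothetical values, not ASKLB's settings.
    """

    def __init__(self, threshold: float = 0.04, sigma: float = 0.01, seed=None):
        self.threshold = threshold
        self.sigma = sigma
        self.rng = random.Random(seed)
        # The threshold itself is randomized, so the cut-off varies.
        self.noisy_threshold = threshold + laplace(2 * sigma, self.rng)

    def report(self, train_acc: float, test_acc: float) -> float:
        eta = laplace(4 * self.sigma, self.rng)
        if abs(train_acc - test_acc) > self.noisy_threshold + eta:
            # Large gap: redraw the threshold and release a noised test accuracy.
            self.noisy_threshold = self.threshold + laplace(2 * self.sigma, self.rng)
            return test_acc + laplace(self.sigma, self.rng)
        # Gap within the random threshold: report the training accuracy instead.
        return train_acc
```

With `sigma=0.0` the noise vanishes, which makes the two branches easy to see: `Thresholdout(threshold=0.05, sigma=0.0).report(0.90, 0.88)` reports the training accuracy, while a gap of 0.25 reports the test accuracy.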

#### Q: Is it normal to have reported test accuracy equal to the training accuracy in a query?

**A:** Yes. When the difference between the training accuracy and the true test accuracy is within a
random “threshold” value, the training accuracy is reported as the “noised” test set accuracy.

#### Q: Will I get the same outcome with the same dataset and same configuration of the tool?

**A:** No. This tool is based on the auto-sklearn package, which is stochastic by nature.
Auto-sklearn builds an ensemble out of the models it generates within the time budget, and
each run generates a different library of machine learning models, so the resulting
ensembles can differ.
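auto-sklearn builds its final ensemble with greedy ensemble selection (Caruana et al., 2004) over the models found during the search. The sketch below is a simplified, standard-library version of that technique on hypothetical validation predictions, not auto-sklearn's code. It shows how the chosen ensemble depends on which models a given run happened to generate: a different model library yields a different ensemble.

```python
from typing import Dict, List


def accuracy(probs: List[float], labels: List[int]) -> float:
    # Threshold predicted P(y=1) at 0.5 and compare with the 0/1 labels.
    return sum((p >= 0.5) == (y == 1) for p, y in zip(probs, labels)) / len(labels)


def ensemble_selection(
    preds: Dict[str, List[float]], labels: List[int], rounds: int = 10
) -> Dict[str, int]:
    """Greedy ensemble selection with replacement (after Caruana et al., 2004).

    preds maps a model name to its predicted P(y=1) on a validation set.
    Returns how often each model was picked, i.e. its ensemble weight.
    """
    chosen: List[str] = []
    counts: Dict[str, int] = {}
    for _ in range(rounds):
        best_name, best_score = None, -1.0
        for name in preds:
            members = chosen + [name]
            # Average the current ensemble's predictions with this candidate's.
            avg = [sum(preds[m][i] for m in members) / len(members) for i in range(len(labels))]
            score = accuracy(avg, labels)
            if score > best_score:  # strict: ties keep the earlier model
                best_name, best_score = name, score
        chosen.append(best_name)
        counts[best_name] = counts.get(best_name, 0) + 1
    return counts
```

For example, two models that are each 75% accurate on labels `[1, 1, 0, 0]` but err on different samples can combine into a perfectly accurate ensemble, so both receive weight.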

### Setup

Run the cell below to initialize ASKLB (this takes about 3-5 minutes).

After the cell finishes running, **restart the runtime** ("Runtime -> Restart runtime") so that the newly installed packages are loaded. The widget can then be run.

```python
#@title
%%capture
%%time

# Download the widget code, its config file, and the auto-sklearn metalearning files
!wget https://gist.githubusercontent.com/tliu526/07234087daa9120f6ad0e6241c2881b0/raw/3b9d36c6e97321a6e86218f7190e08385f27899d/.widget_config.ini
!wget https://raw.githubusercontent.com/KordingLab/ASKLB/master/asklb/model_utils.py
!wget https://raw.githubusercontent.com/KordingLab/ASKLB/master/asklb/widget.py
!wget https://github.com/KordingLab/ASKLB/raw/master/metalearning/metalearning_files.zip
!unzip metalearning_files.zip
!mkdir -p .metalearning
!mv metalearning_files .metalearning

# Initial setup of dependencies: pin scikit-learn 0.23.0, then install auto-sklearn
!pip uninstall scikit-learn -y
!pip install scikit-learn==0.23.0
# -y answers apt-get's confirmation prompt, which cannot be answered interactively here
!apt-get install -y build-essential swig
!curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip install
!pip install auto-sklearn 2> /dev/null
!pip install bcrypt
```
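Because the setup cell runs under `%%capture`, a failed download is easy to miss. The optional check below is a minimal sketch that lists which of the paths created by that cell are missing; `missing_setup_files` is a hypothetical helper, and the paths mirror the cell as written (including the leading dots).

```python
import os
from typing import List

# Paths created by the setup cell, relative to the notebook's working directory.
EXPECTED_PATHS = [
    ".widget_config.ini",
    "model_utils.py",
    "widget.py",
    os.path.join(".metalearning", "metalearning_files"),
]


def missing_setup_files(base_dir: str = ".") -> List[str]:
    """Return the expected setup paths that do not exist under base_dir."""
    return [p for p in EXPECTED_PATHS if not os.path.exists(os.path.join(base_dir, p))]
```

Running `print(missing_setup_files() or "All setup files present")` before launching the widget shows anything that needs to be re-downloaded.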


```python
from widget import ASKLBWidget
ASKLBWidget()
```


## Changes Made to widget.py

- Line 42: `config.read("widget_config.ini")` (no leading dot in the file name)

- Line 416: `automl_args['metadata_directory'] = "metalearning/metalearning_files/"` (no leading dot in the directory name)
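The setup cell downloads `.widget_config.ini` (with a leading dot), while line 42 now reads `widget_config.ini`. One way to tolerate either name is a loader that tries each candidate in order. This is a minimal sketch using the standard `configparser` module; `load_widget_config` is hypothetical and not part of `widget.py`, and the same idea would apply to the metadata directory path.

```python
import configparser
import os
from typing import Sequence


def load_widget_config(
    base_dir: str = ".",
    names: Sequence[str] = ("widget_config.ini", ".widget_config.ini"),
) -> configparser.ConfigParser:
    """Read the first config file that exists, trying each name in order."""
    config = configparser.ConfigParser()
    for name in names:
        path = os.path.join(base_dir, name)
        if os.path.exists(path):
            config.read(path)
            return config
    raise FileNotFoundError(f"None of {list(names)} found in {base_dir!r}")
```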

## Uploaded Files

- widget.py
- model_utils.py
- config.ini
- metalearning_files.zip

## TODOs

- encrypt config.ini
- develop workflow for users to download files
    - can host everything in a public GDrive folder and then import