This is the code repository for the research project Model Variability, which investigates the variability of deep learning models and how variance-based metrics can be used to debug them.
This is a summary of how to run the various scripts. The process of running the experiments is as follows:
- Prepare the data
- Train the models
- Analyze and generate results
- Use the "azureml_py36_pytorch" anaconda env as the base (already available on the new node).
- You also need to mount the mlvariance teamdrive to "~/teamdrive/mlvariance".
- Install packages into "azureml_py36_pytorch" using:
- python3 -m pip install --disable-pip-version-check --extra-index-url https://azuremlsdktestpypi.azureedge.net/K8s-Compute/D58E86006C65 azureml_contrib_k8s
Data preparation scripts are located under the prepare_data folder.
prepare_artificial_compas.py and prepare_holdout_compas.py prepare the COMPAS data for two scenarios: artificially correlated features and held-out clustered sets.
The data paths are hard-coded in these scripts; change them to point to the correct locations if needed.
prepare_holdout_cifar10.py prepares CIFAR10 data for holdout scenarios.
The script takes two arguments:
- data_folder: the folder that will contain the prepared data. This should point to ~/teamdrive/mlvariance/data if run from the GPU dev machine.
- mode: the scenario to prepare:
  - holdout: hold out a portion of class 0
  - holdout-dup: hold out a portion of class 0 with duplication so the number of training examples is evenly distributed across all classes
  - augmentation: convert a portion of class 0 to grayscale
  - augmentation-all: convert a portion of all classes to grayscale
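The core data transforms behind these modes can be sketched as follows. This is a simplified illustration with a hypothetical `prepare_holdout` helper and a made-up `ratio` parameter; the actual script reads the CIFAR10 binaries and uses its own hard-coded settings.

```python
import numpy as np

def prepare_holdout(images, labels, mode, ratio=0.5, target_class=0):
    """Illustrative version of the four modes (hypothetical helper;
    the real script reads/writes CIFAR10 files with its own ratios)."""
    images, labels = np.array(images), np.array(labels)
    if mode in ("holdout", "holdout-dup"):
        # Drop `ratio` of the target class from the training set.
        idx = np.where(labels == target_class)[0]
        drop = idx[: int(len(idx) * ratio)]
        keep = np.setdiff1d(np.arange(len(labels)), drop)
        images, labels = images[keep], labels[keep]
        if mode == "holdout-dup":
            # Duplicate the remaining target-class samples so every
            # class ends up with the same number of training examples.
            idx = np.where(labels == target_class)[0]
            full = np.bincount(labels).max()
            dup = np.resize(idx, full - len(idx))
            images = np.concatenate([images, images[dup]])
            labels = np.concatenate([labels, labels[dup]])
    else:
        # augmentation / augmentation-all: convert a portion to grayscale
        # by averaging the RGB channels.
        if mode == "augmentation":
            idx = np.where(labels == target_class)[0]
        else:
            idx = np.arange(len(labels))
        idx = idx[: int(len(idx) * ratio)]
        gray = images[idx].mean(axis=-1, keepdims=True)
        images[idx] = np.repeat(gray, 3, axis=-1).astype(images.dtype)
    return images, labels
```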
prepare_holdout_cifar100-cifar10.py prepares CIFAR100 data for holdout scenarios.
The script takes two arguments:
- data_folder: the folder that will contain the prepared data. This should point to ~/teamdrive/mlvariance/data if run from the GPU dev machine.
- mode: the scenario to prepare:
  - holdout: hold out a portion of a subclass. The list of subclasses is hard-coded in the script.
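The subclass holdout itself reduces to filtering samples on their fine labels. A minimal sketch, assuming a hypothetical hard-coded holdout list (the real list in the script may differ):

```python
# Hypothetical fine-label ids to hold out; the script's actual list differs.
HOLDOUT_SUBCLASSES = [3, 42, 71]

def holdout_subclasses(images, labels, holdout=HOLDOUT_SUBCLASSES):
    """Keep only samples whose fine label is not in the holdout list."""
    keep = [i for i, y in enumerate(labels) if y not in holdout]
    return [images[i] for i in keep], [labels[i] for i in keep]
```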
local_export_images_cifar100.py: exports images for display in the HTML report
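The export step boils down to writing image files into an HTML page. A rough sketch with hypothetical helpers (the actual script's layout and file naming may differ):

```python
import base64

def image_tag(png_bytes, caption):
    """Embed a PNG as a base64 <img> tag with a caption (hypothetical helper)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return (f'<figure><img src="data:image/png;base64,{b64}"/>'
            f'<figcaption>{caption}</figcaption></figure>')

def write_report(path, entries):
    """entries: list of (png_bytes, caption) pairs to render into one page."""
    body = "\n".join(image_tag(b, c) for b, c in entries)
    with open(path, "w") as f:
        f.write(f"<html><body>{body}</body></html>")
```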
There are various training scripts for the COMPAS, CIFAR10, and CIFAR100 datasets.
To train models for the COMPAS dataset, run one of the following scripts:
- main_compas.py: trains on the original COMPAS dataset
- main_artificial_compas.py: trains on the COMPAS dataset with artificially correlated features
- main_holdout_compas.py: trains on the COMPAS dataset with a held-out clustered set
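Each of these scripts trains a single model per run; the project's variance-based metrics come from repeating a run under different random seeds and comparing per-sample predictions across runs. A minimal sketch of that idea, with a hypothetical `prediction_variance` helper (the actual scripts define their own models and metrics):

```python
import random
import statistics

def prediction_variance(train_fn, data, seeds):
    """Train one model per seed and return the per-sample variance of the
    predicted scores across runs (hypothetical helper, not the repo's API)."""
    runs = []
    for seed in seeds:
        random.seed(seed)
        model = train_fn(data, seed)            # returns a predict callable
        runs.append([model(x) for x in data])
    # Population variance across runs, computed per sample.
    return [statistics.pvariance(scores) for scores in zip(*runs)]
```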
To train models for CIFAR10 and CIFAR100, the AML platform is used. To schedule the training jobs, use the script cluster_schedulel_all.py, a master script that queues jobs to the AML cluster. It takes a single argument, local_teamdrive_folder, which should point to ~/teamdrive/mlvariance. There must be a config.json file with the subscription info for the AML cluster; this information is listed in the Connection Details table at [https://dev.azure.com/msresearch/GCR/_wiki/wikis/GCR.wiki/3438/AML-K8s-(aka-ITP)-Overview].
- To train CIFAR10 models, uncomment the line runs_list = list_cifar10_runs(...). This function generates the jobs list, which consists of single runs of cluster_single_cifar10.py.
- To train CIFAR100 models, uncomment the line runs_list = list_cifar100_runs(...). This function generates the jobs list, which consists of single runs of cluster_single_cifar100.py.
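The master-script pattern can be sketched in plain Python: build a runs list of (script, arguments) pairs, then submit each one. `list_cifar10_runs_sketch` and `submit` below are hypothetical stand-ins for the job-list and AML-submission code in cluster_schedulel_all.py:

```python
def list_cifar10_runs_sketch(teamdrive, seeds, modes):
    """Build one job per (mode, seed) combination (hypothetical sketch)."""
    return [("cluster_single_cifar10.py",
             {"teamdrive": teamdrive, "seed": s, "mode": m})
            for m in modes for s in seeds]

def schedule_all(runs_list, submit):
    """Queue every job via the supplied submit callable (e.g. an AML client)."""
    for script, args in runs_list:
        submit(script, args)
```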
Several scripts under the analysis folder generate the analysis results.
The following scripts analyze the COMPAS results:
- _analyze.py: analyzes the COMPAS results without holdout
- _analyze_artificial.py: analyzes the COMPAS results with artificially correlated features
- _analyze_holdout.py: analyzes the COMPAS results with a held-out cluster set
Run the analyze_cifar10.py script to analyze the accuracy of the CIFAR10 models. This script takes one argument, result_folder, which should point to ~/teamdrive/mlvariance/result if run on the GPU dev machine.
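The per-run accuracies can then be aggregated into variance-based summary statistics. A sketch, assuming the accuracies have already been read from result_folder (the actual script defines its own file formats and metrics):

```python
import statistics

def summarize_accuracy(accuracies):
    """Aggregate per-run test accuracies into summary statistics
    (illustrative only; not the script's actual output format)."""
    return {
        "mean": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies),  # spread across runs
        "min": min(accuracies),
        "max": max(accuracies),
    }
```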
There are several analysis scripts for CIFAR100.
- cluster_single_detail_analysis_cifar100.py: run this via cluster_schedulel_all.py by uncommenting runs_list = list_cifar100_holdout_analysis_runs(..., "cluster_single_detail_analysis_cifar100.py"). This generates a detailed analysis for each sample and also creates ranking results.
- cluster_single_saliency_cifar100.py: run this via cluster_schedulel_all.py by uncommenting runs_list = list_cifar100_holdout_generate_map_runs(...). This generates GradCAM images for all samples.
- cluster_single_detail_analysis_cifar100_html.py: run this via cluster_schedulel_all.py by uncommenting runs_list = list_cifar100_holdout_analysis_runs(..., "cluster_single_detail_analysis_cifar100_html.py"). This generates the detailed analysis as HTML pages for each sample, including the GradCAM images.
- analyze_cifar100.py: analyzes the accuracy of the CIFAR100 models.
- local_merge_rank_report.py: merges the results once all cluster_single_detail_analysis_cifar100.py jobs have finished.
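The merge step amounts to combining the per-run rankings, for example by averaging each sample's rank across runs. A sketch with a hypothetical dict-based format (the real script reads the files produced by cluster_single_detail_analysis_cifar100.py):

```python
from collections import defaultdict

def merge_rankings(rankings):
    """rankings: list of dicts mapping sample id -> rank in one run.
    Returns sample ids sorted by their rank averaged across runs."""
    totals = defaultdict(list)
    for run in rankings:
        for sample_id, rank in run.items():
            totals[sample_id].append(rank)
    merged = {sid: sum(r) / len(r) for sid, r in totals.items()}
    return sorted(merged, key=merged.get)
```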