Team 3 Project of the CyberTraining program at UMBC in 2019 http://cybertraining.umbc.edu/

Title: An Approach to Tuning Hyperparameters in Parallel - A Performance Study

Team members: Charlie Becker; Bin Wang; Will Mayfield; Sarah Murphy

Mentors: Dr. Matthias Gobbert; Carlos Barajas

This is a working example of a performance study completed on the 'Taki' HPC cluster at UMBC. It uses a combination of popular Python modules for hyperparameter tuning in parallel. The data and base model configuration are borrowed from the Machine Learning in Python for Environmental Science Problems AMS Short Course, provided by David John Gagne from the National Center for Atmospheric Research. The repository for that course can be found at https://github.com/djgagne/ams-ml-python-course.

Below are some brief directions to reproduce the results in the technical report. Full results are presented and discussed in Technical_Report.pdf.

## Workflow

After cloning this repository, first run download_data.py, which will download the data into a data directory.

Next, run preprocess.py, which will preprocess the data and augment it to give a balanced dataset. It will create and place .npy files in the data directory for easy access.
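
For a quick sanity check, the preprocessed arrays can be loaded back with NumPy. The file names below are placeholders; the actual names are set in preprocess.py.

```python
import numpy as np

# Hypothetical file names -- check preprocess.py for the names it actually writes.
X_train = np.load('data/train_images.npy')
y_train = np.load('data/train_labels.npy')

print(X_train.shape, y_train.shape)  # balanced, augmented training set
```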

Next, you can run submit_2013.slurm (2013 partition), submit_2018.slurm (2018 partition), or submit_gpu.slurm (2018 GPU nodes) to submit the performance study across the cluster. These scripts call run_2013.py, run_2018.py, and run_gpu.py respectively, which is where additional SLURM arguments, such as the number of nodes, and the hyperparameters are defined. Specifically, cluster.scale(x) sets the number of nodes desired.
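
The cluster.scale(x) call suggests a dask_jobqueue-style setup. The sketch below shows the general shape of that pattern, assuming a SLURMCluster with illustrative resource values; the exact settings live in run_2013.py, run_2018.py, and run_gpu.py.

```python
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# Each worker job requests resources on the chosen partition
# (values here are illustrative, not the ones used in the study).
cluster = SLURMCluster(queue='batch',
                       cores=8,
                       memory='32GB',
                       walltime='01:00:00')

cluster.scale(4)          # number of worker jobs (nodes) to launch
client = Client(cluster)

# Hyperparameter combinations can then be mapped across the workers, e.g.:
# futures = client.map(train_model, hyperparameter_settings)
# results = client.gather(futures)
```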

Output for the study will be delivered to slurm-2013.out, slurm-2018.out, or slurm-gpu.out, with error logs delivered to slurm-xxxx.err. Additionally, each process within each node will produce training output in slurm-jobID.out, though this probably won't be useful.

## Data augmentation

The RandomOverSampler class from imblearn.over_sampling was used to oversample the minority class (non-tornadic data) fed into the deep neural network. This is done to achieve an approximately 50/50 class split within the training data, which began as roughly a 95/5 split. The relevant script is dnn.py.
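
A minimal sketch of that resampling step, using synthetic data in place of the storm dataset:

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler

# Toy data with roughly the same 95/5 imbalance described above.
X_train = np.random.rand(1000, 20)
y_train = np.array([0] * 950 + [1] * 50)

ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)

# The minority class is duplicated at random until the classes are balanced.
print(np.bincount(y_train), '->', np.bincount(y_resampled))
```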

For the convolutional neural network, the input data are tensor images. We augment the minority class (non-tornadic images) by duplicating, shuffling, and transforming the images through small-angle rotations while keeping the labels unchanged. This can be done in real time while training the model via ImageDataGenerator from Keras, or at the preprocessing stage using skimage.transform.rotate before the data are fed into the model. In the code, this is selected via the two parameters 'augmentation' and 'on_the_fly'. For example, if augmentation==True and on_the_fly==False, the augmented data are generated before training. The working script is cnn.py.
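
A rough sketch of both augmentation paths, with illustrative shapes, angles, and flag handling rather than the exact code in cnn.py (the Keras import path may differ depending on the Keras/TensorFlow version used):

```python
import numpy as np
from skimage.transform import rotate
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def augment_offline(images, labels, max_angle=15):
    """Pre-generate rotated copies of the minority-class images
    (the augmentation==True, on_the_fly==False path)."""
    angles = np.random.uniform(-max_angle, max_angle, size=len(images))
    rotated = np.stack([rotate(img, ang, mode='edge') for img, ang in zip(images, angles)])
    # Labels stay unchanged for the rotated copies.
    return np.concatenate([images, rotated]), np.concatenate([labels, labels])

# On-the-fly alternative (augmentation==True, on_the_fly==True): Keras applies
# small random rotations to each batch as the model trains.
datagen = ImageDataGenerator(rotation_range=15)
# model.fit(datagen.flow(X_train, y_train, batch_size=64), epochs=10)
```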

Overall, we did not see a significant improvement in performance when using transformed data as opposed to resampled data only. However, augmentation is highly specific to the dataset, and its benefits will vary from one dataset to another.
