The HWT_mode package uses machine learning image recognition to classify storm mode from convection-allowing numerical weather prediction model output. This initial package interprets the latent representations of a convolutional neural network to determine which storm mode each representation corresponds to.
HWT_mode requires Python >= 3.6 and the following Python libraries:
- tensorflow>=2.0
- tensorflow-probability
- numpy
- scipy
- matplotlib
- pandas
- scikit-learn
- xarray
- netcdf4
- pyyaml
- tqdm
This module is designed to operate on netCDF patch data generated by the hsdata script in the hagelslag package. If you have a large collection of model output, please install and run hagelslag first to generate the data needed for this package.
Install the Miniconda Python distribution in your chosen directory. The $ indicates command-line input and should not be copied into your terminal.
$ wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
The -b flag runs the installer non-interactively, and -p sets the install prefix so it matches the paths used below.
Include the base miniconda bin directory in your $PATH environment variable. Change $HOME to the appropriate path if you are not installing miniconda in your home directory.
$ export PATH="$HOME/miniconda/bin:$PATH"
Add the conda-forge channel and set it as the priority channel to prevent conflicts between the Anaconda and conda-forge versions of core libraries like numpy.
$ conda config --add channels conda-forge
$ conda config --set channel_priority strict
Now, create a new conda environment with the main dependencies (except tensorflow) installed in it. The --yes flag installs everything without asking for extra confirmation from you.
$ conda create -n mode --yes -c conda-forge python=3.7 pip numpy scipy matplotlib pandas xarray pyyaml netcdf4 scikit-learn tqdm pytest
To activate the newly created environment, run the following command.
$ conda activate mode
You can also edit your $PATH environment variable directly. Note that the environment path may differ depending on the version of miniconda you are using.
$ export PATH="$HOME/miniconda/envs/mode/bin:$PATH"
Verify that the correct Python environment is being used.
$ which python
Install tensorflow using pip. There is a tensorflow channel on conda-forge, but I have experienced issues with it in the past. For this package, I recommend using tensorflow 2.1.0, but any version of tensorflow beyond 2.0.0 should work (barring future significant API changes).
$ pip install tensorflow==2.1.0
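To confirm the install worked, you can print the TensorFlow version from the new environment (any 2.x version indicates success):

```shell
python -c "import tensorflow as tf; print(tf.__version__)"
```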
If you have not already, clone the HWT_mode repository locally with git.
$ git clone https://github.com/djgagne/HWT_mode.git
Install the hwtmode package with pip. Note that any missing dependencies will also be installed with pip. Theoretically you could install everything with the command below once you have created a basic Python environment, but that may be risky.
$ cd HWT_mode
$ pip install .
Test your installation by first running pytest to verify the unit tests pass. Pytest will automatically detect any unit test files in your package and run them.
$ python -m pytest .
The final test is to run the train_mode_cnn.py script with the test data.
$ python -u train_mode_cnn.py config/ws_mode_cnn_train_small.yml -t -i -p
If all tests pass, and the train_mode_cnn.py script completes without any errors, you should be good to go for full-scale runs.
Creating new convolutional neural networks and interpreting them can be done with the train_mode_cnn.py
script. To run it, use the following command.
$ python -u train_mode_cnn.py config/config_file.yml -t -i -p
The -t option trains the network, the -i option runs the interpretation functions (neuron activations and saliency maps), and the -p option runs the plotting functions. If you encounter an issue in any step, you can re-run only that step by passing just its flag; for example, to re-run only the plotting:
$ python -u train_mode_cnn.py config/config_file.yml -p
The script will reload the saved data, models, and interpretation information and continue from that point.
The config file format is described below.
data_path: "testdata/track_data_ncarstorm_3km_REFL_COM_ws_nc_small/" # path to netCDF patch files
patch_radius: 16 # number of grid cells from center to include from each patch.
input_variables: ["REFL_1KM_AGL_curr", "U10_curr", "V10_curr"] # Input variables to CNN
output_variables: ["UP_HELI_MAX_curr"] # Output variable for CNN
meta_variables: ["masks", "i", "j", "time", "centroid_lon", "centroid_lat",
"centroid_i", "centroid_j", "track_id", "track_step", "run_date"]
# metadata variables to include in csv files generated from script.
train_start_date: "2011-04-26" # Beginning of training period
train_end_date: "2011-04-27" # Ending of training period
val_start_date: "2011-04-28" # Beginning of validation period
val_end_date: "2011-04-28" # End of validation period
test_start_date: "2011-04-28" # Beginning of testing period
test_end_date: "2011-04-28" # End of testing period
out_path: "model_cnn_20200416/" # Path to store model output and other derived files
classifier: 1 # If 1, classifier model used. If 0, regression model used.
classifier_threshold: 50 # Threshold for output variable if classifier is 1.
models:  # You can specify multiple CNN model configurations below.
  cnn_20200416_000:
    min_filters: 8
    filter_width: 3
    filter_growth_rate: 1.5
    pooling_width: 2
    min_data_width: 4
    output_type: "sigmoid"
    pooling: "max"
    loss: "binary_crossentropy"
    learning_rate: 0.0001
    batch_size: 8
    epochs: 20
    dense_neurons: 4
    early_stopping: 0
    verbose: 1
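As a sketch of how a driver script might consume this config, the snippet below parses a trimmed copy of it with PyYAML (a listed dependency) and applies classifier_threshold to binarize the output variable. The sample helicity values and the inclusive >= comparison are illustrative assumptions, not the package's internal code:

```python
import numpy as np
import yaml  # PyYAML, listed in the dependencies above

# A trimmed copy of the config above
config_text = """
output_variables: ["UP_HELI_MAX_curr"]
classifier: 1
classifier_threshold: 50
"""
config = yaml.safe_load(config_text)

# Hypothetical UP_HELI_MAX_curr values (m^2 s^-2) for five storm patches
up_heli_max = np.array([12.0, 48.9, 50.0, 73.5, 101.2])

if config["classifier"] == 1:
    # Values at or above the threshold form the positive class (assumed inclusive here)
    labels = (up_heli_max >= config["classifier_threshold"]).astype(int)
    print(labels)  # -> [0 0 1 1 1]
```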
Once the model is trained, the interpretation code can be run in real time with run_mode_cnn.py.