# Install conda using miniconda
We will use Python and TensorFlow for training, so we need an environment that can be used across all of the compute nodes. We will use Miniconda to provide Python and create a Python virtual environment for our experiments.

## Training using High Performance Computing and TensorFlow
1. Download Miniconda from https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
2. Install conda using the Miniconda script we downloaded
3. Make sure you can use the Python from the conda environment
4. Create a virtual environment
5. Update environment variables
6. Folder structure and files we will be using
7. Update configuration for training
8. Update the batch job file (`train_job.sh`)
9. Submit the job
10. Check progress
11. Test the saved model

## Use cloud computing for inferencing
1. Push the trained model to the AWS environment using boto3 (credentials must be shared before this step)
2. Set up the environment in SageMaker
3. Load the model in SageMaker
4. Deploy the model
5. Test the deployed model
6. Deploy an API endpoint to interact with the model
7. Interact with the API endpoint to get inferences from the model

## Download Miniconda from https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

In [9]:
! wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

--2022-06-03 21:10:41--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 76607678 (73M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’


2022-06-03 21:10:42 (153 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [76607678/76607678]



## Install conda using the downloaded miniconda shell script

In [10]:
!bash Miniconda3-latest-Linux-x86_64.sh

Anaconda3-2019.03-Linux-x86_64.sh  shared
Miniconda3-latest-Linux-x86_64.sh  Untitled.ipynb


### Check whether the installation updated your `~/.bashrc` to automatically use Python from conda

In [11]:
!cat ~/.bashrc

# ******************************************************************************
# bash environment file in $HOME
# Please see:
# http://www.fz-juelich.de/ias/jsc/EN/Expertise/Datamanagement/OnlineStorage/JUST/FAQ/just-FAQ_node.html
# for more information and possible modifications to this file
# ******************************************************************************

# Source global definitions: Copied from CentOS 7 /etc/skel/.bashrc
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

#
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/p/project/training2206/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/p/project/training2206/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/p/project/training2206/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/p/project/training2206/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
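Rather than reading the whole file, you can check for the block programmatically. The sketch below (the helper name `has_conda_init` is hypothetical) simply looks for the `# >>> conda initialize >>>` start marker shown above and the matching `# <<< conda initialize <<<` end marker that `conda init` writes; it does not check that the paths inside the block are correct.

```python
from pathlib import Path

# Marker lines that `conda init` writes around its block in ~/.bashrc.
CONDA_BLOCK_START = "# >>> conda initialize >>>"
CONDA_BLOCK_END = "# <<< conda initialize <<<"


def has_conda_init(bashrc_path: str = "~/.bashrc") -> bool:
    """Return True if the file contains a complete `conda init` block."""
    path = Path(bashrc_path).expanduser()
    if not path.is_file():
        return False
    text = path.read_text()
    start = text.find(CONDA_BLOCK_START)
    # The end marker must appear after the start marker.
    return start != -1 and text.find(CONDA_BLOCK_END, start) != -1


print(has_conda_init())
```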

## Folder Structure

All of the project-related files are located at `/p/project/training2206/<user>/`. (Replace `<user>` with the identifier provided to you.)

Change into this directory.

Check for the `pixel-detector` folder in this directory. If it is not present, you can download it from https://github.com/nasa-impact/pixel-detector using `git clone https://github.com/nasa-impact/pixel-detector.git`

Once cloned, change directory into pixel-detector folder using `cd pixel-detector`

Following is the folder structure for the code:
```
|> code
    |> lib
        |> data_utils `Contains files to rasterize files and create the dataset`
        |> slurm_utils `Contains helper files for distributed training using tensorflow`
    |> train.py `Main File used to train the model`
    |> models.py `Contains architecture of the model we are training.`
    |> config.py `Configuration file`
    |> config.json `Configuration file`
|> data `Contains training and validation data`
|> train_job.sh `main batch file`
```
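A quick way to confirm the clone is complete is to check that each path from the tree above exists. This is a minimal sketch assuming it is run from inside the `pixel-detector` folder; `EXPECTED_PATHS` and `missing_paths` are illustrative names, and the list only mirrors the tree as documented here, so it may need updating if the repository layout changes.

```python
from pathlib import Path

# Paths from the folder structure above, relative to the pixel-detector folder.
EXPECTED_PATHS = [
    "code/lib/data_utils",
    "code/lib/slurm_utils",
    "code/train.py",
    "code/models.py",
    "code/config.py",
    "code/config.json",
    "data",
    "train_job.sh",
]


def missing_paths(repo_dir: str = ".") -> list:
    """Return the expected paths that do not exist under repo_dir."""
    root = Path(repo_dir)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


missing = missing_paths()
print("All expected files found" if not missing else f"Missing: {missing}")
```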

## Create virtual environment
We will use the Python from the conda environment to create a virtual environment, which will be used throughout.

In some cases conda might not be activated after installation. You can refresh your bash shell using `exec bash`, which should enable the conda environment for you.

Once in the conda environment, you can create a new Python virtual environment using `python -m venv .venv` and activate it with `source .venv/bin/activate`.
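The same environment can also be created from Python with the standard-library `venv` module, which is what `python -m venv` uses. This sketch assumes it is run with the conda Python (the new environment is based on whichever interpreter runs it); `create_project_venv` is a hypothetical helper name.

```python
import venv
from pathlib import Path


def create_project_venv(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment, equivalent to `python -m venv .venv`."""
    env_path = Path(env_dir)
    # EnvBuilder.create() builds the directory tree, adds the interpreter and
    # writes pyvenv.cfg; with_pip=True also bootstraps pip via ensurepip.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_path)
    return env_path
```

Calling `create_project_venv()` from the `pixel-detector` folder creates `.venv` there; activate it with `source .venv/bin/activate` before installing packages.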

## Update configuration and environment variables

In the `config.json` file, replace all instances of `<username>` with the user name provided to you.

In the `train_job.sh` file, replace all instances of `<username>` with the user name provided to you.
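If you prefer not to edit the files by hand, the substitution can be scripted. This is a sketch under the assumption that the placeholder appears literally as `<username>` in both files; `fill_username` is a hypothetical helper, and it rewrites the file in place, so restore it with git if you need the original back.

```python
from pathlib import Path

PLACEHOLDER = "<username>"


def fill_username(path: str, username: str) -> int:
    """Replace every <username> placeholder in a file, in place.

    Returns the number of replacements made."""
    file_path = Path(path)
    text = file_path.read_text()
    count = text.count(PLACEHOLDER)
    file_path.write_text(text.replace(PLACEHOLDER, username))
    return count


# Hypothetical usage, run from the pixel-detector folder with your own user name:
# for name in ["code/config.json", "train_job.sh"]:
#     print(name, fill_username(name, "myuser"))
```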



In [None]:
!cat /p/project/training2206/<username>/pixel-detector/train_job.sh

# Submit training job
In the `train_job.sh` we can specify the number of nodes we want to use for training. As an example we are going to use 2 nodes for training. 

We can submit the training job using the `sbatch` command. Like so: `sbatch train_job.sh`
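Both steps can also be driven from Python. The sketch below assumes `train_job.sh` sets the node count with a `#SBATCH --nodes=N` line (some scripts use the short form `-N`, which this regex would not match); `set_node_count`, `parse_job_id`, and `submit` are hypothetical helpers. It relies only on `sbatch` printing `Submitted batch job <id>` on success, which is its standard message.

```python
import re
import subprocess


def set_node_count(script_text: str, nodes: int) -> str:
    """Rewrite the `#SBATCH --nodes=N` directive in a batch script's text."""
    new_text, n = re.subn(r"^#SBATCH\s+--nodes=\d+", f"#SBATCH --nodes={nodes}",
                          script_text, flags=re.MULTILINE)
    if n == 0:
        raise ValueError("no '#SBATCH --nodes=' line found")
    return new_text


def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the job id from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


def submit(script: str = "train_job.sh") -> int:
    """Submit the batch script with sbatch and return the Slurm job id."""
    result = subprocess.run(["sbatch", script], capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

For example, after writing `set_node_count(text, 2)` back to `train_job.sh`, `submit()` returns the job id, which can be passed to `squeue -j <id>` to check the job's state.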

Once submitted, the job creates two new files: `output.out` and `error.err`. `output.out` contains the output from our processes, and `error.err` records any errors from the scripts. Once the job is running and the files exist, we can follow their updates with `tail -f output.out error.err`. (Any warnings, automated messages, and errors are tracked in the `error.err` file.)
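For a one-off look at the logs (for example from a notebook cell, where `tail -f` would block), a small sketch like this reads the last few lines of each file. It assumes the files are in the current directory; `last_lines` is a hypothetical helper, and unlike `tail -f` it does not keep following the files.

```python
from collections import deque
from pathlib import Path


def last_lines(path: str, n: int = 10) -> list:
    """Return the last n lines of a text file, like `tail -n`."""
    file_path = Path(path)
    if not file_path.exists():
        return []  # the job may not have created the file yet
    with file_path.open() as f:
        # A deque with maxlen keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


for log in ["output.out", "error.err"]:
    print(f"==> {log} <==")
    print("\n".join(last_lines(log)))
```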

# Uploading the model to a cloud environment

After the model has finished training, it is stored at the location specified in your config file: `/p/project/training2206/<username>/pixel-detector/models/<username>_smoke_wmts_ref_3layer.h5`. We will take this model and push it to an S3 bucket using `boto3` and the credentials from the AWS account shared with you.

## Get AWS credentials
Account creation links should have been shared with everyone. Once your account is set up, we will gather the credentials required for the upload from the AWS SSO homepage.
Please follow the steps listed below:

1. Navigate to https://nasa-impact.awsapps.com/start
2. Login
3. Click on `AWS Account`
4. Click on `summerSchool`
5. Click on `Command line or Programmatic access`
6. Copy the `AWS Access Key Id`, `AWS Secret Access Key`, and `AWS session token` from the pop up
7. Update the following script and run it.

This will upload the file directly into the S3 bucket. We will then fetch the file from the S3 bucket into the SageMaker notebook, from which we will deploy the model and host an API to interact with it.


In [None]:
import boto3


# Paste the temporary credentials copied from the SSO login page (keep the quotes).
AWS_ACCESS_KEY_ID = "<Copied over from sso login>"
AWS_SECRET_ACCESS_KEY = "<Copied over from sso login>"
AWS_SESSION_TOKEN = "<Copied over from sso login>"

BUCKET_NAME = 'smoke-dataset-bucket'

def generate_federated_session():
    """
    Create a boto3 session from the temporary (federated) SSO credentials,
    used to upload the model file from the HPC system to the S3 bucket.
    Returns:
        boto3.session.Session authenticated with the SSO credentials
    """
    return boto3.session.Session(
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            aws_session_token=AWS_SESSION_TOKEN
        )

# Full path to the trained model file (the file itself, not the models/ folder).
model_filename = "/p/project/training2206/<username>/pixel-detector/models/<username>_smoke_wmts_ref_3layer.h5"
session = generate_federated_session()
s3_connector = session.client('s3')

# upload_file(local path, bucket name, object key): the key stores the file under your username.
s3_connector.upload_file(model_filename, BUCKET_NAME, "<username>/<username>_smoke_wmts_ref_3layer.h5")


Once the upload is done, we can check for the file in S3 using the AWS console.

1. Navigate to https://nasa-impact.awsapps.com/start
2. Login
3. Click on `AWS Account`
4. Click on `summerSchool`
5. Click on `Management Console`
6. In the search bar, search for `S3`
7. Click on `S3`
8. Click on `smoke-dataset-bucket`
9. Click on your `username`

You should be able to view your file there now. 
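As an alternative to the console, the upload can be checked with boto3's `list_objects_v2`, using the `s3_connector` client and `BUCKET_NAME` from the upload cell. This sketch assumes your objects live under a `<username>/` prefix as in that cell; `list_user_objects` is a hypothetical helper. `list_objects_v2` returns at most 1,000 keys per call, which is plenty for a single model file.

```python
def list_user_objects(s3_client, bucket: str, username: str) -> list:
    """Return (key, size in bytes) for objects stored under '<username>/' in the bucket."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=f"{username}/")
    # 'Contents' is absent from the response when no objects match the prefix.
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]


# Usage with the client created in the upload cell:
# print(list_user_objects(s3_connector, BUCKET_NAME, "<username>"))
```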

# Notes on Deep learning Model Details

## Deep Learning
Deep learning refers to neural networks with multiple hidden layers that can learn increasingly abstract representations of the input data.

Deep learning has led to major advances in computer vision. We’re now able to classify images, find objects in them, and even label them with captions. To do so, deep neural networks with many hidden layers can sequentially learn more complex features from the raw input image:
* The first hidden layers might only learn local edge patterns.
* Then, each subsequent layer (through its filters) learns more complex representations.
* Finally, the last layer can classify the image as a cat or kangaroo.

Deep neural networks of this kind, built for image data, are called Convolutional Neural Networks.

## Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are multi-layer neural networks (sometimes 17 or more layers) that assume the input data to be images.

<img src="https://raw.githubusercontent.com/NASA-IMPACT/workshop_notebooks/master/chapter-3/Feature_maps.png">

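By making this assumption, CNNs can exploit the spatial structure of images: each convolutional filter looks only at a small local neighborhood and reuses the same weights at every position in the image. This drastically reduces the number of parameters that need to be tuned, so CNNs can efficiently handle the high dimensionality of raw images.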
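To make the parameter savings concrete, compare a fully connected (`Dense`) layer on a flattened image with a 3x3 convolution producing the same number of output channels. The 256x256 RGB input and the 64 outputs are arbitrary example sizes, not the workshop's actual input; both counts include one bias per output, as Keras does by default.

```python
def dense_params(height: int, width: int, channels: int, units: int) -> int:
    """Weights + biases of a Dense layer applied to a flattened image."""
    return height * width * channels * units + units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Weights + biases of a Conv2D layer; independent of the image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


# A 256x256 RGB image mapped to 64 outputs:
print(dense_params(256, 256, 3, 64))  # 12582976
print(conv2d_params(3, 3, 3, 64))     # 1792
```

The convolution's count does not depend on the image size at all, while the dense layer's grows with every pixel.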

There are many different CNN-based architectures used for various tasks. For image segmentation, we use the U-Net model:

## U-Net Segmentation model

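The model we are using for this workshop is the U-Net segmentation model (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/). The architecture is a stack of convolutions and downsampling steps (the contracting path) followed by up-convolutions (the expanding path), which gives it its U shape; skip connections pass the feature maps from each contracting stage to the matching expanding stage.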
<img src="https://raw.githubusercontent.com/NASA-IMPACT/workshop_notebooks/master/chapter-3/u-net-architecture.png">

This model assigns a class label to each pixel of the input and gives an output matching the size of the input. The resulting output, once trained with smoke masks, will segment any given image into smoke and non-smoke regions.
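The requirement that the output match the input size can be checked by tracing the feature-map size through the network. This sketch assumes 'same'-padded convolutions, 2x2 max pooling on the way down and stride-2 up-convolutions on the way up; the original U-Net paper used unpadded convolutions, so its output (388x388) was smaller than its input (572x572). The depth and input size are illustrative, and `unet_spatial_sizes` is a hypothetical helper.

```python
def unet_spatial_sizes(size: int, depth: int) -> list:
    """Trace the feature-map side length through a U-Net with 'same'-padded convs.

    Each encoder level halves the size with 2x2 max pooling; each decoder level
    doubles it with a stride-2 up-convolution, so the output matches the input."""
    if size % (2 ** depth) != 0:
        # Otherwise the decoder sizes would not line up with the skip connections.
        raise ValueError(f"size must be divisible by 2**depth = {2 ** depth}")
    encoder = [size // (2 ** level) for level in range(depth + 1)]  # down to the bottleneck
    decoder = encoder[-2::-1]  # back up through the same sizes, skipping the bottleneck
    return encoder + decoder


print(unet_spatial_sizes(256, 4))  # [256, 128, 64, 32, 16, 32, 64, 128, 256]
```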




# Notes on Model Implementation

## Keras: A Deep Learning Framework
Keras is a library for deep learning in Python. Its minimalistic, modular approach makes it easy to get deep neural networks up and running. You can read more about it here: https://keras.io/

Assuming Keras is installed in your Python environment, the main usage of the library is as follows:

### Import Keras modules

```
from keras.layers import Dense, Dropout, Activation, Flatten
```
### Preprocess / Load Images

Use image libraries to load and preprocess the images, converting them into NumPy arrays.

We use the Keras `Sequence` class (https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) to keep only a subset of the input images in memory at a time and process them in batches. This ensures we do not run out of memory trying to load all images at once. The imgaug library (https://github.com/aleju/imgaug) is used to augment the images with translations and random noise, which increases the variability of the input images and helps the model generalize.
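The core of a `Sequence` is two methods: `__len__`, which Keras uses as the number of batches per epoch, and `__getitem__`, which returns one batch. The sketch below mirrors that interface in plain Python so the indexing logic is easy to see; in the real training code you would subclass `tf.keras.utils.Sequence` and load (and augment) image arrays in `__getitem__`. `BatchSequence` and the `image_<i>.tif` file names are hypothetical.

```python
import math
import random


class BatchSequence:
    """Plain-Python stand-in for the keras.utils.Sequence interface."""

    def __init__(self, items, batch_size: int, shuffle: bool = True, seed: int = 0):
        self.items = list(items)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.indices = list(range(len(self.items)))
        self.on_epoch_end()

    def __len__(self) -> int:
        # Number of batches per epoch; rounding up keeps the final partial batch.
        return math.ceil(len(self.items) / self.batch_size)

    def __getitem__(self, idx: int) -> list:
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        # A real loader would read and augment only these images here.
        return [self.items[i] for i in batch_idx]

    def on_epoch_end(self) -> None:
        # Keras calls on_epoch_end after each epoch; reshuffle the sample order.
        if self.shuffle:
            self.rng.shuffle(self.indices)


seq = BatchSequence([f"image_{i}.tif" for i in range(54)], batch_size=4)
print(len(seq), len(seq[0]), len(seq[len(seq) - 1]))  # 14 4 2
```

With 54 samples and a batch size of 4, `len()` is `ceil(54 / 4) = 14`, and the last batch holds the remaining 2 samples.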

### Define model architecture

We define a neural network model as a set of Keras layer objects, from the input layer to the output layer, with hidden layers in between. Here are some example models: https://keras.io/examples/

### Add Regularization methods to prevent overfitting
Overfitting happens when the model fits the training data so closely that it fails to reproduce its performance on real-world (test) data. Regularization techniques are measures used to prevent this.
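#### Dropout
Dropout is a method for regularizing our model in order to prevent overfitting. During training, it randomly sets a fraction of a layer's inputs to zero at each step (set by the `rate` argument of the Keras `Dropout` layer), so the network cannot rely too heavily on any single unit.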
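What a `Dropout` layer does to its inputs during training can be sketched in a few lines. This mirrors the "inverted dropout" behavior documented for Keras (a `rate` fraction of inputs set to zero, the rest scaled by `1 / (1 - rate)`); at inference time Keras applies no dropout. `dropout` here is a hypothetical plain-Python helper, not the Keras implementation.

```python
import random


def dropout(values, rate: float, rng: random.Random) -> list:
    """Inverted dropout as applied during training.

    Each value is zeroed with probability `rate`; survivors are scaled by
    1 / (1 - rate) so each unit's expected output is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]


activations = [1.0] * 10
print(dropout(activations, rate=0.5, rng=random.Random(42)))
```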
#### MaxPooling
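MaxPooling2D reduces the spatial size of the feature maps by sliding a 2x2 pooling window across the previous layer's output (with a stride of 2 by default) and keeping the maximum of the 4 values in each window. This halves the height and width, which reduces the computation and the number of parameters needed in later layers; the pooling layer itself has no trainable parameters.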
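The pooling operation itself is simple enough to write out directly. This sketch uses a 2x2 window with stride 2 on a single-channel feature map stored as nested lists; like Keras's default `padding="valid"`, it drops a trailing odd row or column. `max_pool_2x2` is a hypothetical helper.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list ('valid' padding)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
print(max_pool_2x2(fm))  # [[4, 5], [3, 7]]
```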

#### BatchNormalization

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This tends to stabilize the learning process and can substantially reduce the number of training epochs required to train deep networks.
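For a single feature, the training-time computation is: subtract the mini-batch mean, divide by the mini-batch standard deviation, then apply a learned scale `gamma` and shift `beta`. This plain-Python sketch uses `eps = 1e-3`, matching the Keras default `epsilon`; in a convolutional network the statistics are computed per channel, and at inference time Keras uses moving averages gathered during training instead of the batch statistics. `batch_norm` is a hypothetical helper.

```python
import math


def batch_norm(batch, gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-3) -> list:
    """Normalize one feature across a mini-batch, then scale and shift.

    x_hat = (x - mean) / sqrt(var + eps);  y = gamma * x_hat + beta
    gamma and beta are learned during training."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


out = batch_norm([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in out])
```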

### Compile model with loss metric

When we compile the model, we declare the loss function and the optimizer (SGD, Adam, etc.)
Keras has a variety of [loss functions](https://keras.io/api/losses/) and out-of-the-box [optimizers](https://keras.io/api/optimizers/) to choose from.

The loss function measures how far the model's estimate is from the actual output (the truth value), and the optimizer adjusts the model's variables so that the estimate moves closer to the actual output.

```
# Signature of Model.compile with its default arguments (tf.keras, TensorFlow 2.x)
Model.compile(
    optimizer="rmsprop",
    loss=None,
    metrics=None,
    loss_weights=None,
    weighted_metrics=None,
    run_eagerly=None,
    steps_per_execution=None,
    **kwargs
)
```
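As an example of what a loss function measures, the sketch below computes pixel-wise binary cross-entropy, a common choice for two-class (smoke / non-smoke) segmentation; whether the workshop configuration uses this exact loss is an assumption. Keras's built-in `"binary_crossentropy"` loss computes essentially the same mean, with similar clipping to avoid `log(0)`. `binary_crossentropy` here is a hypothetical plain-Python helper.

```python
import math


def binary_crossentropy(y_true, y_pred, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over pixels (y_true in {0, 1}, y_pred in (0, 1))."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


mask = [1, 0, 1, 1]           # ground truth: smoke (1) / no smoke (0)
good = [0.9, 0.1, 0.8, 0.95]  # predictions close to the mask
bad = [0.2, 0.7, 0.4, 0.3]    # predictions far from the mask
print(binary_crossentropy(mask, good), binary_crossentropy(mask, bad))
```

Predictions close to the mask give a small loss and predictions far from it a large one; the optimizer adjusts the weights to push this number down. In Keras this would correspond to something like `model.compile(optimizer="adam", loss="binary_crossentropy")`.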

### Define Callbacks
Callbacks are objects whose methods Keras calls at fixed points during training, most commonly at the end of each epoch (one pass through all of the training images). Keras uses them for tasks such as stopping training when the model no longer improves significantly (`EarlyStopping`), saving only the weights of the best model (`ModelCheckpoint`), and logging the training curves to better understand how the model learns during the training phase (`TensorBoard`).
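The logic behind `EarlyStopping` can be sketched as a patience counter over the per-epoch validation losses. This plain-Python sketch only mimics the idea of `keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)`; the real callback has further options (such as `restore_best_weights`) and also handles metrics that should increase. The loss values and the helper name `early_stopping_epoch` are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience: int, min_delta: float = 0.0):
    """Return the epoch index (0-based) at which training would stop, or None.

    Stops once the loss has not improved by more than min_delta for
    `patience` consecutive epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it (ModelCheckpoint with save_best_only=True
            # would save the weights at this point).
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63]
print(early_stopping_epoch(losses, patience=3))  # 5
```

In Keras, the real callbacks would be passed to `model.fit()` as a list, e.g. `callbacks=[EarlyStopping(monitor="val_loss", patience=3), ModelCheckpoint("best.h5", save_best_only=True)]` (the file name is hypothetical).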

### Training the model with the generators

Finally, training the model is as simple as calling the `model.fit()` method, passing the training `Sequence` as `x` and the validation `Sequence` as `validation_data`.

```
Model.fit(
    x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose="auto",
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
    max_queue_size=10,
    workers=1,
    use_multiprocessing=False,
)
``` 
Some notes:

`steps_per_epoch` defines the number of batches drawn from the generator in each epoch. It is the number of training samples (54) divided by the batch size (4): 54 / 4 = 13.5, so 13 or 14 steps depending on whether the final partial batch is kept. Similarly, `validation_steps` is the number of images in the validation split (38) divided by the batch size: 38 / 4 = 9.5, so 9 or 10 steps. When a `Sequence` is passed and these are left as `None`, Keras uses the length of the `Sequence`.

`epochs` just needs to be sufficiently large, since we are using the `EarlyStopping` callback to stop training early if the loss does not improve for several epochs.

### Test the model

Predictions can be made on any input images by calling `model.predict()` with a batch of input images (an array of shape `(batch, height, width, channels)`).
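Assuming the model ends in a single sigmoid channel, `model.predict()` returns an array of shape `(batch, height, width, 1)` holding per-pixel smoke probabilities, which still has to be thresholded to get a mask. The sketch below does that for one image in plain Python; the 0.5 threshold and the helper names are assumptions. With NumPy the same step is `(pred[..., 0] >= 0.5).astype("uint8")`.

```python
def probabilities_to_mask(prob_map, threshold: float = 0.5):
    """Convert a per-pixel smoke probability map (2D list) into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def smoke_fraction(mask) -> float:
    """Fraction of pixels labeled as smoke."""
    pixels = [v for row in mask for v in row]
    return sum(pixels) / len(pixels)


probs = [
    [0.05, 0.20, 0.70],
    [0.10, 0.55, 0.90],
]
mask = probabilities_to_mask(probs)
print(mask, smoke_fraction(mask))  # [[0, 0, 1], [0, 1, 1]] 0.5
```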
