#### Note: 
1. You will be working from the terminal for this chapter.
2. Change your working directory: ```cd /p/project/training2206/$USER/```

# Install Conda using Miniconda
You will be using Python and TensorFlow for training. For this you need to create an environment you can use across nodes. You will be using Miniconda to create a Python virtual environment for your experiments.

## Training using high performance computing and TensorFlow
1. Download Miniconda from https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
2. Install conda using the Miniconda script you downloaded
3. Make sure you can use the Python from the conda environment
4. Create a virtual environment
5. Update environment variables
6. Folder structure and files you will be using
7. Update configuration for training
8. Update batch_job file.
9. Submit job
10. Check progress
11. Test saved model

## Use cloud computing for inferencing
1. Push trained model to AWS environment using boto3 (share credentials before this step)
2. Access setup environments in SageMaker
3. Load model in SageMaker
4. Deploy model
5. Test deployed model
6. Deploy API endpoint to interact with model
7. Interact with API endpoint to get inferences from the model

## Download Miniconda from https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

In [None]:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

## Install conda using the downloaded Miniconda shell script

In [None]:
chmod 770 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
# When the installer asks where to install Miniconda, enter: `/p/project/training2206/$USER/miniconda3`
# When the installer asks whether it should initialize Miniconda3, answer: yes


### Check if the installation updated your bashrc to automatically use Python from conda

In [None]:
cat ~/.bashrc
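
Instead of scanning the file by eye, the check can be scripted. The sketch below assumes the installer ran `conda init`, which wraps its additions to `~/.bashrc` in a block that starts with the `# >>> conda initialize >>>` marker. The function name `has_conda_init` is illustrative and not part of the workshop code.

```python
from pathlib import Path

# Marker that `conda init` writes at the start of the block it manages.
CONDA_MARKER = "# >>> conda initialize >>>"


def has_conda_init(bashrc_text: str) -> bool:
    """Return True if the text contains the conda-managed block marker."""
    return CONDA_MARKER in bashrc_text


if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    text = bashrc.read_text() if bashrc.exists() else ""
    print("conda initialized in ~/.bashrc:", has_conda_init(text))
```

If this prints `False`, rerun the installer step or refresh the shell before continuing.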

## Folder structure

All of the project-related files are located at `/p/project/training2206/$USER/`. (`$USER` is the environment variable that holds your user name. You can check its value using `echo $USER`.)

Change directory to the aforementioned directory.

Check for `pixel-detector` folder in the directory. If it is not present, you can use Git to download it from https://github.com/nasa-impact/pixel-detector using `git clone https://github.com/nasa-impact/pixel-detector.git`

Once cloned, change the directory to the pixel-detector folder using `cd pixel-detector`

Below is the folder structure for the code:
```
|> code
    |> lib
        |> data_utils `Contains files to rasterize files and create a dataset`
        |> slurm_utils `Contains helper files for distributed training using TensorFlow`
    |> train.py `Main file used to train the model`
    |> models.py `Contains the architecture of the model you are training`
    |> config.py `Configuration file`
    |> config.json `Configuration file`
|> data `Contains training and validation data`
|> train_job.sh `Main batch file`
```

## Create virtual environment
You will use the Python from the conda environment to create a virtual environment which will be used throughout.

In some cases, conda might not be activated after installation. You can just refresh your bash terminal using `exec bash`, and it should enable the conda environment for you.

Once in the conda environment, you can create a new Python virtual environment using `python -m venv .venv`

Then you will use the environment you just created using `source .venv/bin/activate`

Once the environment is activated, make sure you are starting from a clean state. Use `module purge` to unload any environment modules that are currently loaded.

You can then install the requirements using `pip install -r requirements.txt`

This will install all the required packages.
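
A quick way to confirm that the interpreter you are running belongs to the virtual environment is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch that relies only on the standard library; the helper name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter that created it.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

With `.venv` activated, the interpreter path should end in `.venv/bin/python`.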

## Update configuration and environment variables

To find your user name, run the following command in the terminal:

`echo $USER`

Copy data from `/p/project/training2206/sedona3/pixel-detector/data/` to your folder: 

`cp -r /p/project/training2206/sedona3/pixel-detector/data/ /p/project/training2206/$USER/pixel-detector/`. 

In `/p/project/training2206/$USER/pixel-detector/code` you will find a configuration file called `config.json`. In this file replace all instances of `<username>` with the username provided to you. 

Before we start working on any of this, we will change our directory to `/p/project/training2206/$USER/pixel-detector/`

In the JupyterLab interface, find the `code/config.json` file. Right-click `config.json` in the left pane and select `editor`. Once the file is open, replace each `<username>` instance with your user name.

You will also need to update the `train_job.sh` file.

In the `train_job.sh` file replace all instances of `<username>` with the username provided to you.
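
If you prefer not to edit the files by hand, the replacement can be scripted. The sketch below assumes the placeholder appears literally as `<username>` in both files and that the script is run from the `pixel-detector` folder; the helper names are illustrative.

```python
import os
from pathlib import Path

PLACEHOLDER = "<username>"


def fill_placeholder(text: str, username: str) -> str:
    """Return the text with every <username> placeholder replaced."""
    return text.replace(PLACEHOLDER, username)


def update_file(path: Path, username: str) -> None:
    """Rewrite one file in place with the placeholder filled in."""
    path.write_text(fill_placeholder(path.read_text(), username))


if __name__ == "__main__":
    # Assumes the current directory is the pixel-detector folder.
    for name in ("code/config.json", "train_job.sh"):
        if Path(name).exists():
            update_file(Path(name), os.environ["USER"])
```

Open both files afterwards to confirm that no `<username>` placeholders remain.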



# Submit Training Job
In the `train_job.sh` you can specify the number of nodes you want to use for training. As an example, you are going to use 2 nodes for training.

Check details of the training job:

`cat /p/project/training2206/$USER/pixel-detector/train_job.sh`

You can submit the training job using the `sbatch` command. Like so: `sbatch train_job.sh`
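
When a job is submitted from a script rather than typed by hand, it helps to capture the job ID that `sbatch` reports. The following sketch assumes `sbatch` prints its usual `Submitted batch job <id>` message; the function names are illustrative.

```python
import re
import subprocess


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit(script: str = "train_job.sh") -> int:
    """Submit the batch script and return the Slurm job ID."""
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

Calling `submit()` from the `pixel-detector` folder returns the job ID, which you can pass to `squeue -j <job_id>` to see whether the job is pending or running.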

Once submitted, the process creates two new files: `output.out` and `error.err`. `output.out` contains the output from your processes, and `error.err` contains warnings, automated messages, and any errors from the scripts. Once the job is submitted and the files are created, you can check for updates using `tail -f output.out error.err`.

You can judge how well training is going by watching the loss values in `output.out`. We have also prepared methods to create charts for the metrics. The training/validation loss and accuracy plots can be found in the `plots` folder. To copy the plots to your local machine, run `scp "<username>@jureca.fz-juelich.de:/p/project/training2206/<username>/pixel-detector/plots/*.png" .` from a local terminal, or view them in JupyterHub.

# Uploading the Model to a Cloud Environment

After the model is finished training, the model is stored in the location specified in your config file `/p/project/training2206/<username>/pixel-detector/models/<username>_smoke_wmts_ref_3layer.h5`. You will be taking this model and pushing it to an S3 bucket using `boto3` and the credentials from the AWS account shared with you.

## Get AWS credentials
Account creation links should have been shared with you. Once the account is set up, you can obtain the credentials required for upload from the AWS SSO homepage.
Please follow the steps listed below:

1. Navigate to https://nasa-impact.awsapps.com/start
2. Login
3. Click on `AWS Account`
4. Click on `summerSchool`
5. Click on `Command line or Programmatic access`
6. Copy the `AWS Access Key Id`, `AWS Secret Access Key`, and `AWS session token` from the pop up
7. Update the following script and run it in a python shell. (You can start a python shell by just typing `python` in the terminal).

This will upload the files directly into the S3 bucket. You will then fetch the file from S3 bucket into the SageMaker notebook from where you will be deploying the model and hosting an API to interact with the model.


*Note: Please make sure the virtual environment is active while working in the Python shell.*

In [None]:
import os

import boto3

# Paste the values copied from the SSO pop-up between the quotes.
AWS_ACCESS_KEY_ID = "<Copied over from SSO login>"
AWS_SECRET_ACCESS_KEY = "<Copied over from SSO login>"
AWS_SESSION_TOKEN = "<Copied over from SSO login>"

BUCKET_NAME = 'smoke-dataset-bucket'

USER = os.environ.get('USER')

def generate_federated_session():
    """
    Create a boto3 session from the temporary SSO credentials so the
    model file can be uploaded from the HPC system to the S3 bucket.
    Returns:
        boto3.session.Session authenticated with the credentials above
    """
    return boto3.session.Session(
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            aws_session_token=AWS_SESSION_TOKEN
        )

# Full path to the trained model file (upload_file needs a file, not a folder).
model_name = f"{USER}_smoke_wmts_ref_3layer.h5"
model_filename = f"/p/project/training2206/{USER}/pixel-detector/models/{model_name}"
session = generate_federated_session()
s3_connector = session.client('s3')

# Upload the model under a per-user prefix in the bucket.
s3_connector.upload_file(model_filename, BUCKET_NAME, f"{USER}/{model_name}")


Once the process is done, you can check for the files in S3 using the AWS console.

1. Navigate to https://nasa-impact.awsapps.com/start
2. Login
3. Click on `AWS Account`
4. Click on `summerSchool`
5. Click on `Management Console`
6. In the search bar, search for `s3`
7. Click on `s3`
8. Click on `smoke-dataset-bucket`
9. Click on your `username`

You should be able to view your file there now. 
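
The same check can be done from the Python shell without opening the console. This is a minimal sketch that assumes the `s3_connector`, `BUCKET_NAME`, and `USER` names from the upload script are still defined in your session; `list_user_objects` is an illustrative helper, and `list_objects_v2` returns at most 1000 keys per call, which is enough here.

```python
def list_user_objects(s3_client, bucket: str, user: str) -> list:
    """Return (key, size) pairs for objects stored under the user's prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=f"{user}/")
    # "Contents" is absent from the response when nothing matches the prefix.
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]
```

For example, `list_user_objects(s3_connector, BUCKET_NAME, USER)` should return one entry whose key ends in `_smoke_wmts_ref_3layer.h5`. An empty list means the upload did not reach the bucket.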

# Notes on Deep Learning Model Details

## Deep learning
Deep learning refers to neural networks with multiple hidden layers that can learn increasingly abstract representations of the input data.

Deep learning has led to major advances in computer vision. You’re now able to classify images, find objects in them, and even label them with captions. To do so, deep neural networks with many hidden layers can sequentially learn more complex features from the raw input image:
* The first hidden layers might only learn local edge patterns.
* Then, each subsequent layer (or filter) learns more complex representations.
* Finally, the last layer can classify the image as a cat or kangaroo.

These types of deep neural networks are called convolutional neural networks.

## Convolutional neural networks
Convolutional neural networks (CNNs) are multi-layer neural networks (sometimes with 17 or more layers) that assume the input data to be images.

<img src="https://raw.githubusercontent.com/NASA-IMPACT/workshop_notebooks/master/chapter-3/Feature_maps.png">

By making this assumption, CNNs can drastically reduce the number of parameters that need to be tuned. CNNs can therefore handle the high dimensionality of raw images efficiently.
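
To make the parameter savings concrete, the sketch below compares a fully connected layer with a convolutional layer on the same input. The image size (256 x 256 x 3), the 64 units/filters, and the 3 x 3 kernel are assumptions chosen for illustration, not values from the workshop model.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases for a fully connected layer on a flattened image."""
    return height * width * channels * units + units


def conv_params(kernel, channels, filters):
    """Weights plus biases for a square-kernel convolutional layer."""
    return kernel * kernel * channels * filters + filters


print(dense_params(256, 256, 3, 64))  # 12582976
print(conv_params(3, 3, 64))          # 1792
```

The convolutional layer reuses the same small kernel at every image position, so its parameter count does not grow with the image size.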

Many neural network architectures use CNNs for various tasks. For image segmentation, you will use the U-Net model.

## U-Net segmentation model

The model you are using for this workshop is the U-Net segmentation model (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/). The architecture is a stack of convolutions followed by deconvolutions (up-convolutions), which gives the network its U shape.
<img src="https://raw.githubusercontent.com/NASA-IMPACT/workshop_notebooks/master/chapter-3/u-net-architecture.png">

This model assigns a class label to each pixel of the input and gives an output matching the size of the input. The resulting output, once trained with smoke masks, will segment any given image into smoke and non-smoke regions.
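
The way the output ends up matching the input size can be traced with a small calculation. The sketch below assumes padded convolutions that keep the spatial size, with each pooling step halving it and each up-sampling step doubling it. The depth of 3 and the 256-pixel input are illustrative assumptions, not the configuration in `models.py`.

```python
def unet_spatial_sizes(size: int, depth: int) -> list:
    """Spatial size after each down- and up-sampling step of a U-shaped net."""
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2**depth")
    down = [size // (2 ** i) for i in range(depth + 1)]  # contracting path
    up = down[-2::-1]                                     # expanding path
    return down + up


print(unet_spatial_sizes(256, 3))  # [256, 128, 64, 32, 64, 128, 256]
```

The sequence shrinks to a bottleneck and then grows back to the original size, which is the U shape in the figure.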




# Notes on Model Implementation

## Keras: a deep learning framework
Keras is a library for deep learning in Python. Its minimalistic, modular approach makes it easy to get deep neural networks up and running. You can read more about it here: https://keras.io/.

Assuming Keras is installed in your Python environment, the main usage of the library is listed as follows.

### Import Keras modules

```
from keras.layers import Dense, Dropout, Activation, Flatten
```
### Preprocess/load images

Using image libraries and preprocessing, convert images into NumPy arrays.

You use the Keras `Sequence` utility (https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) to load a subset of the input images into memory at a time and process them in batches. This ensures you do not run out of memory by trying to load all images at once. The imgaug library (https://github.com/aleju/imgaug) is used to augment images with translations and random noise, which increases the variability of the input images and helps the model generalize.
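
The batching idea can be shown without Keras. The class below is a plain-Python sketch that mirrors the two methods a Keras `Sequence` subclass must provide, `__len__` and `__getitem__`. The class name `BatchLoader` is hypothetical, and it serves list items where a real loader would read and augment image files.

```python
import math


class BatchLoader:
    """Serve items in fixed-size batches, mirroring the Sequence interface."""

    def __init__(self, items, batch_size):
        self.items = list(items)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.items) / self.batch_size)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        # A real loader would read and augment the image files here.
        return self.items[start:start + self.batch_size]


loader = BatchLoader(range(10), batch_size=4)
print(len(loader), loader[2])  # 3 [8, 9]
```

Only one batch is materialized per `__getitem__` call, which is what keeps memory use bounded.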

### Define model architecture

You define a neural network model as a set of Keras layer objects, starting from input to output, and hidden layers in between. Here are some example models: https://keras.io/examples/.

### Add regularization methods to prevent overfitting
Overfitting occurs when the model fits the training data too closely and fails to reproduce that performance on real-world (test) data. Regularization techniques help prevent this.
#### Dropouts
Dropout regularizes your model by randomly setting a fraction of a layer's outputs to zero during training, which helps prevent overfitting.
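
As a rough illustration of what a dropout layer does during training, the sketch below zeroes each value with probability `rate` and scales the surviving values by `1 / (1 - rate)` so the expected sum stays the same. It is a plain-Python approximation of the idea, not the Keras implementation.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
```

At prediction time dropout is switched off and all values pass through unchanged.
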
#### MaxPooling
MaxPooling2D downsamples the output of the previous layer by sliding a 2x2 pooling filter across it and keeping the maximum of the 4 values in each 2x2 window. This reduces the amount of computation and the number of parameters in the layers that follow.
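
A worked example on a small grid shows the operation. The sketch assumes a 2x2 window that moves two cells at a time, so the windows do not overlap, and an input whose height and width are even.

```python
def max_pool_2x2(grid):
    """Apply 2x2 max pooling with stride 2 to a 2D list with even dimensions."""
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, len(grid[0]), 2)
        ]
        for r in range(0, len(grid), 2)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```

The 4x4 input becomes a 2x2 output, so each pooling layer halves the height and width.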

#### Batch normalization

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.
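
The standardization step can be written out for a single feature. This sketch computes the mini-batch mean and variance, normalizes, then applies a scale `gamma` and shift `beta`. In a real layer those two are learned; here they are fixed at 1 and 0, and the small `eps` is an assumed constant that avoids division by zero.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


print(batch_norm([2.0, 4.0, 6.0, 8.0]))
```

The output has a mean of approximately zero and a variance of approximately one, regardless of the scale of the input.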

### Compile model with loss metric

When you compile the model, you declare the loss function and the optimizer (e.g., SGD, Adam).
Keras has a variety of [loss functions](https://keras.io/api/losses/) and out-of-the-box [optimizers](https://keras.io/api/optimizers/) to choose from.

The loss function measures how far the model estimate is from the actual output (truth value), and the optimizer adjusts the model variables so that the estimate moves closer to the actual output.

```
Model.compile(
    optimizer="rmsprop",
    loss=None,
    metrics=None,
    loss_weights=None,
    weighted_metrics=None,
    run_eagerly=None,
    steps_per_execution=None,
    **kwargs
)
```
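
To show what a loss value expresses, the sketch below computes binary cross-entropy by hand. The source does not state which loss the workshop model uses; binary cross-entropy is chosen here only because it is a common choice for two-class problems such as smoke versus non-smoke pixels.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
good = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])  # close to the labels
bad = binary_cross_entropy(labels, [0.4, 0.6, 0.3, 0.7])   # far from the labels
print(good < bad)  # True
```

The optimizer's job is to change the model weights so that this number goes down over the course of training.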

### Define callbacks
Callbacks are pieces of code that Keras executes at set points during training, for example every time the model completes one pass through all available input images (an epoch). This Keras functionality is useful for tasks such as stopping training if the model does not improve significantly, saving only the weights of the best model, and plotting training graphs to better understand how the model learns in the training phase.
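
The stopping rule behind an early-stopping callback can be sketched in a few lines. This is a simplified illustration of the patience logic, not the Keras `EarlyStopping` implementation; the loss values and the patience of 2 are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0  # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64, 0.70]
print(early_stopping_epoch(losses, patience=2))  # 5
```

Training stops at epoch 5 because epochs 4 and 5 both fail to beat the best loss from epoch 3. A larger patience would have let training reach the better value at epoch 6.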

### Training the model with the generators
 
Finally, training the model is as simple as calling the `model.fit()` method.

```
Model.fit(
    x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose="auto",
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
    max_queue_size=10,
    workers=1,
    use_multiprocessing=False,
)
``` 
Some notes:

`steps_per_epoch` defines the number of times the generator is called in each epoch. This is the number of input samples (54) divided by the `batch_size` (4), or approximately 13. Similarly, `validation_steps` is the number of images in the validation split (38) divided by `batch_size`, or approximately 10.

`epochs` just needs to be sufficiently large, since the EarlyStopping callback stops model training early if the loss does not improve for several epochs.
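
Neither division above is exact (54 / 4 = 13.5 and 38 / 4 = 9.5), so the step count depends on whether the final partial batch is kept. The helper below is an illustrative sketch of both conventions and is not taken from `train.py`.

```python
def steps(samples, batch_size, keep_partial=True):
    """Number of batches per epoch, with or without the final partial batch."""
    full, remainder = divmod(samples, batch_size)
    return full + (1 if keep_partial and remainder else 0)


print(steps(54, 4, keep_partial=False))  # 13
print(steps(54, 4))                      # 14
print(steps(38, 4, keep_partial=False))  # 9
print(steps(38, 4))                      # 10
```

Dropping the partial batch skips a few images each epoch, while keeping it means the last batch is smaller than `batch_size`.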

### Test model:

Predictions can be made on any input image by calling `model.predict()` with the list of input images to predict.
