<img src="https://raw.githubusercontent.com/determined-ai/determined/master/determined-logo.png" align="right" width="150" />

# Training and Scaling Your Computer Vision Model with Determined AI

<img src="https://www.cis.upenn.edu/~jshi/ped_html/images/PennPed00071_1.png" align="center" width="600" />

This notebook walks through the benefits of building a deep learning model with Determined. We will build an object detection model trained on the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/).

# Prep

We'll be creating a `checkpoints` directory later. This `.detignore` will ensure we don't upload that directory along with our model code when we start an experiment!

In [None]:
!echo "checkpoints" >> .detignore
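
Because `>>` appends unconditionally, re-running the cell above adds another `checkpoints` line each time. A minimal Python sketch of an idempotent alternative follows; the helper name `add_to_detignore` is hypothetical and is not part of Determined.

```python
from pathlib import Path


def add_to_detignore(pattern: str, path: str = ".detignore") -> bool:
    """Append `pattern` to the ignore file unless it is already listed.

    Returns True if the file was modified, False if the pattern was present.
    """
    ignore_file = Path(path)
    text = ignore_file.read_text() if ignore_file.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False  # already ignored; nothing to do
    with ignore_file.open("a") as f:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(pattern + "\n")
    return True
```

Calling `add_to_detignore("checkpoints")` has the same effect as the `echo` cell on the first run and does nothing on later runs.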

-----

# Imagine...

### You're a **model developer** and you've been tasked with developing a **computer vision model**.

# **MNIST**!

<img src="http://neupy.com/_images/random-digits.png" />

In [None]:
# Illustrative MNIST run...

#!git clone https://github.com/pytorch/examples
#%cd examples/mnist
#!pip install -qr requirements.txt
#!python main.py --epochs=50 --batch-size=64

### Clarification: An actually *useful* computer vision model

We'll be training a Pedestrian Detection model with PyTorch!

## **Object Detection with PyTorch**


### Google an off-the-shelf Object Detection model

In [None]:
# Illustrative model training...

#!git clone <model_repo>
#%cd repo
#!python main.py

### You've got your model training, but now you're thinking:

- [x] Find a model
- [x] Training
- [ ] Checkpointing?
- [ ] Fault tolerance??
- [ ] Visualization???

------------

# **Day 2** of Model Development with Determined AI

If Day 1 was about finding your model and making sure it ran, Day 2 of model development is about making sure your model is manageable and scalable.

<img src="https://determined.ai/assets/images/developers/determined-components.jpg" align="center" width="1000" />

## Here’s where Determined’s platform starts to come in handy:

### 1. Structure your code to adhere to the Determined APIs.

#### Example Determined model repository: *this one!*
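
For PyTorch code, this typically means moving the training loop into a subclass of `determined.pytorch.PyTorchTrial`. The skeleton below is only a sketch of that shape, not this repository's actual trial class: the class name is hypothetical, the method bodies are deliberately left unimplemented, and the `object` fallback exists only so the cell runs before Determined is installed.

```python
from typing import Any, Dict

try:
    # Real base class, available once Determined is installed.
    from determined.pytorch import PyTorchTrial
except ImportError:
    # Fallback so this sketch can still be run without Determined.
    PyTorchTrial = object


class PedestrianDetectionTrial(PyTorchTrial):  # hypothetical class name
    """Skeleton only: each method marks where existing training code would move."""

    def __init__(self, context: Any) -> None:
        # Determined passes in a trial context; the model and optimizer
        # are normally created and wrapped through it here.
        self.context = context

    def build_training_data_loader(self) -> Any:
        raise NotImplementedError("return the training data loader")

    def build_validation_data_loader(self) -> Any:
        raise NotImplementedError("return the validation data loader")

    def train_batch(self, batch: Any, epoch_idx: int, batch_idx: int) -> Dict[str, Any]:
        raise NotImplementedError("run one training step and return its metrics")

    def evaluate_batch(self, batch: Any) -> Dict[str, Any]:
        raise NotImplementedError("return validation metrics for one batch")
```

Splitting the code this way lets the platform own the outer loop, which is what makes checkpointing and fault tolerance something you no longer write by hand.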

### 2. Install Determined Locally

#### Example command to install Determined locally

In [1]:
# !pip install determined-cli
# !det deploy local cluster-up <--no-gpu if you do not have a GPU>

Full instructions to install Determined: https://docs.determined.ai/latest/index.html

### 3. Launch your model and train it on your local cluster:

<img src="https://github.com/determined-ai/public_assets/blob/c7dca8d616c3e5c87eac46dd164bf8a8c2ee46d1/images/StartAnExperiment.png?raw=true" width="450" align="left" />

In [None]:
!det experiment create standard.yaml .

**Note**: *Experiments will take slightly longer the first time they are executed on any given Determined installation since Determined will pull the appropriate Docker containers to run the experiment. After the first run, the containers are cached and reused.*

## So what's happening on the backend?
### AKA Infrastructure you don't need to worry about

<img src="https://github.com/determined-ai/public_assets/blob/main/images/1GPUexperiment.png?raw=true" align="left" width="600" />

## Local Inference

#### See how the trained model performs on a local image

**Note**: *Check the Determined Web UI to make sure the experiment is `Complete` before trying to run predictions*

In [None]:
from support.helper import predict

In [None]:
experiment_id = <experiment id>
predict(experiment_id, "test.jpg")

**Note**: *Model training is a stochastic process, so expect variable results, especially with the shortened training time used in this experiment. Feel free to update the configuration `.yaml` files to train for more epochs to ensure convergence.*

----

# **Multi-GPU training**

#### Example of configuration change to enable multi-GPU training

<img src="https://github.com/determined-ai/public_assets/blob/main/images/DistributedTrainingConfig.png?raw=true" align="left" width="800" />
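
The screenshot above is not reproduced as text here. As a rough sketch, the number of GPUs used per trial in a Determined experiment configuration is set by `resources.slots_per_trial`; the snippet below applies that single change to a placeholder config. The `name` value and the choice of 4 slots are assumptions, so compare the result against the actual `standard.yaml` and `multi.yaml` in this repository.

```python
import copy
import json

# Placeholder for a single-GPU config; the real standard.yaml has more fields.
base_config = {
    "name": "pedestrian_detection_standard",  # hypothetical experiment name
    "resources": {"slots_per_trial": 1},
}


def with_slots(config: dict, slots: int) -> dict:
    """Return a copy of `config` that requests `slots` GPUs per trial."""
    updated = copy.deepcopy(config)  # leave the original config untouched
    updated.setdefault("resources", {})["slots_per_trial"] = slots
    return updated


multi_config = with_slots(base_config, 4)  # 4 GPUs is an assumed value
print(json.dumps(multi_config, indent=2))
```

The model code itself does not change between the single-GPU and multi-GPU runs; only the configuration passed to `det experiment create` differs.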

### Launch a multi-GPU training job

You can install Determined on a local machine with multiple GPUs the same way as before, with:

In [2]:
# !pip install determined-cli
# !det deploy local cluster-up <--no-gpu if you do not have a GPU>

If you'd like to install in the cloud to get more GPUs, skip ahead to the **Cloud cluster** section for installation instructions, then come back to run this experiment.

In [None]:
DET_MASTER="your-remote-cluster-ip" # e.g. 35.128.23.11

In [None]:
!det -m $DET_MASTER experiment create multi.yaml .

### What's happening now?

<img src="https://github.com/determined-ai/public_assets/blob/main/images/4GPUexperiment.png?raw=true" align="left" width="600" />

### Better Model performance

In [None]:
# Downloading checkpoints requires credentials for the storage
# location that holds them. If your remote cluster uses GCS,
# for example, launch a terminal in this Jupyter instance
# and run the following:
#
# gcloud auth application-default login

# Then, run the following to set your default project in GCP
import os
os.environ["GCLOUD_PROJECT"] = "your-gcp-project-id"

In [None]:
experiment_id = <experiment id>
predict(experiment_id, "test.jpg", DET_MASTER)

**Note**: *Model training is a stochastic process, so expect variable results, especially with the shortened training time used in this experiment. Feel free to update the configuration `.yaml` files to train for more epochs to ensure convergence.*

**Note**: *Experiments will take slightly longer the first time they are executed on any given Determined installation since Determined will pull the appropriate Docker containers to run the experiment. After the first run, the containers are cached and reused.*

# **Cloud cluster**

### Deploying a Determined Cluster to the cloud

#### Example command to start a Determined Cluster in the cloud

In [None]:
# For GCP
# !det-deploy gcp up --cluster-id <my-cluster> --project-id <your-gcp-project-id>

# For AWS
# !det-deploy aws up --cluster-id <my-cluster> --keypair <your-keypair-name>

* Full GCP installation instructions: https://docs.determined.ai/latest/how-to/installation/gcp.html
* Full AWS installation instructions: https://docs.determined.ai/latest/how-to/installation/aws.html

-----

#### Example configuration change to enable **Hyperparameter search**

<img src="https://github.com/determined-ai/public_assets/blob/main/images/HyperparameterConfig.png?raw=true" width=800 />
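
The screenshot above is not reproduced as text here. As a hedged sketch, a hyperparameter search is enabled by giving ranges under `hyperparameters` and choosing a search method under `searcher`. The hyperparameter names, ranges, metric name, and trial budget below are assumptions for illustration; the real values are in `hyper.yaml`, and other required searcher fields depend on your Determined version.

```python
# Sketch of the two config sections a hyperparameter search relies on.
hyper_config = {
    "hyperparameters": {
        # Hypothetical names and ranges; use the ones your trial class reads.
        "learning_rate": {"type": "log", "base": 10, "minval": -4, "maxval": -2},
        "momentum": {"type": "double", "minval": 0.8, "maxval": 0.99},
    },
    "searcher": {
        "name": "adaptive_asha",
        "metric": "val_avg_iou",  # hypothetical validation metric name
        "smaller_is_better": False,
        "max_trials": 8,  # assumed trial budget
    },
}


def sampled_range(spec: dict) -> tuple:
    """Return the (low, high) values a hyperparameter spec can take."""
    if spec["type"] == "log":
        # A log-scale range is given as exponents of `base`.
        return spec["base"] ** spec["minval"], spec["base"] ** spec["maxval"]
    return spec["minval"], spec["maxval"]


for name, spec in hyper_config["hyperparameters"].items():
    low, high = sampled_range(spec)
    print(f"{name}: {low:g} to {high:g}")
```

The helper only makes the log-scale range easier to read: the bounds are exponents, so `minval: -4` with `base: 10` corresponds to a learning rate of 0.0001.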

### Launch a Distributed Hyperparameter search job

In [None]:
# DET_MASTER="your-remote-cluster-ip" # if different from the one you set above

In [None]:
!det -m $DET_MASTER experiment create hyper.yaml .

### What's happening in the cloud?

<img src="https://github.com/determined-ai/public_assets/blob/main/images/HyperparamExp.png?raw=true" align="left" width="1000" />

In [None]:
experiment_id = <experiment id>
predict(experiment_id, "test.jpg", DET_MASTER)

**Note**: *Model training is a stochastic process, so expect variable results, especially with the shortened training time used in this experiment. Feel free to update the configuration `.yaml` files to train for more epochs to ensure convergence.*

------

# Conclusion

### Determined enables you to:

* Easily scale from your laptop to a GPU cluster
* Automatically manage experiments, checkpoints, and models
* Leverage advanced capabilities like Distributed Training and Hyperparameter Search

### Keep in Touch!
* [Determined GitHub](https://github.com/determined-ai/determined)
* [Determined Community Slack](https://join.slack.com/t/determined-community/shared_invite/zt-cnj7802v-KcVbaUrIzQOwmkmY7gP0Ew)
* hello@determined.ai

If you have questions about this notebook or want to reach out to me directly, feel free to email me at hoang@determined.ai.

<img src="https://raw.githubusercontent.com/determined-ai/determined/master/determined-logo.png" align='right' width=150 />