# Crop mask model training 🏋
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nasaharvest/crop-mask/blob/master/notebooks/train.ipynb)

**Author:** Ivan Zvonkov (izvonkov@umd.edu)

**Description:** Standalone notebook for training crop-mask models.

The notebook is in beta mode so issue reports and suggestions are welcome! 

# 1. Setup

If you don't already have one, obtain a GitHub Personal Access Token using the steps [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Save this token somewhere private.

In [None]:
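email = input("GitHub email: ")
username = input("GitHub username: ")

# Configure git with the matching values (email -> user.email, username -> user.name)
!git config --global user.email $email
!git config --global user.name $username

from getpass import getpass
# getpass hides the token while it is typed
token = getpass('GitHub Personal Access Token:')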
!git clone https://$username:$token@github.com/nasaharvest/crop-mask.git
%cd crop-mask

In [None]:
# Install required packages
!pip install dvc[gs] pytorch-lightning==0.7.1 wandb openmapflow==0.2.2 -q

In [None]:
from google.colab import auth
from openmapflow.config import PROJECT_ROOT, DataPaths
from src.bboxes import bboxes

In [None]:
# Log in to wandb to track model runs
!wandb login

In [None]:
# Log in to Google Cloud; you must have access to the bsos-geog-harvest1 project to download data
auth.authenticate_user()

# 2. Download latest data

In [None]:
# Pull the latest models and training datasets
!dvc pull data/models -q
!dvc pull data/datasets -q

In [None]:
# Currently available models
def get_model_names():
  return sorted([p.stem for p in (PROJECT_ROOT / DataPaths.MODELS).glob('*.pt')])

In [None]:
get_model_names()
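
To illustrate what `get_model_names` returns, here is a minimal sketch that applies the same `glob('*.pt')` and `.stem` pattern to a temporary directory. The helper `list_model_names` and the file names are made up for the example; in the notebook the real directory is `PROJECT_ROOT / DataPaths.MODELS`.

```python
import tempfile
from pathlib import Path
from typing import List


def list_model_names(models_dir: Path) -> List[str]:
    """Return the sorted names of all .pt files in models_dir, without the extension."""
    return sorted(p.stem for p in models_dir.glob("*.pt"))


# Hypothetical directory contents, used only for illustration
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)
    for filename in ["Kenya.pt", "Ethiopia_Tigray_2020.pt", "notes.txt"]:
        (models_dir / filename).touch()
    example_names = list_model_names(models_dir)

print(example_names)
```

Only files ending in `.pt` are listed, so other files in the models directory do not show up as models.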

In [None]:
# Available datasets for training and evaluation
!openmapflow datasets

# 3. Train model
![model](https://github.com/nasaharvest/crop-mask/blob/master/assets/models.png?raw=true)

In [None]:
# A bounding box tells the model which area to focus on
bboxes

In [None]:
# Example evaluation datasets: Kenya,Rwanda,Uganda,Tanzania_CEO_2019
model_name = input("Model name (suggested format: <country>-<region>-<year>): ")
eval_datasets = input("Evaluation dataset(s): ")
selected_bbox = input("Bounding box name: ")
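
Because the evaluation datasets are typed in by hand as a comma-separated string, a typo is easy to miss until training starts. The sketch below shows one possible way to split and check the input first. `parse_dataset_list` is a hypothetical helper, not part of crop-mask or openmapflow, and the list of available names is hard-coded here for illustration.

```python
from typing import Iterable, List


def parse_dataset_list(raw: str, available: Iterable[str]) -> List[str]:
    """Split a comma-separated string into dataset names and reject unknown ones."""
    # Strip whitespace around each name and drop empty entries (e.g. a trailing comma)
    names = [name.strip() for name in raw.split(",") if name.strip()]
    known = set(available)
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"Unknown dataset(s): {', '.join(unknown)}")
    return names


# Illustrative names taken from the example comment above
available_names = ["Kenya", "Rwanda", "Uganda", "Tanzania_CEO_2019"]
parsed = parse_dataset_list("Kenya, Rwanda ,Uganda", available_names)
print(parsed)
```

In the notebook, the available names could be taken from the dataset list used for training instead of a hard-coded list.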

In [None]:
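from datasets import datasets
# Join the names into one comma-separated string so they can be passed on the command line
# (same format as the evaluation datasets); EthiopiaTigrayGhent2021 is excluded from training
train_datasets = ",".join(d.name for d in datasets if d.name != "EthiopiaTigrayGhent2021")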

In [None]:
# Train a new model (may take up to 30 minutes)
!python train.py --model_name {model_name} --train_datasets {train_datasets} --eval_datasets {eval_datasets} --bbox {selected_bbox} --wandb
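
As a rough sketch of how the training command is put together, the snippet below builds the same call as an argument list. The flags are the ones used in the cell above; the assumption that dataset names are passed as one comma-separated string follows the evaluation dataset example, and `build_train_command`, the model name, and the bounding box name are hypothetical.

```python
import shlex
from typing import List


def build_train_command(
    model_name: str,
    train_datasets: List[str],
    eval_datasets: List[str],
    bbox: str,
    use_wandb: bool = True,
) -> List[str]:
    """Assemble the train.py call as an argument list (one element per shell word)."""
    command = [
        "python", "train.py",
        "--model_name", model_name,
        # Assumption: dataset names are passed as one comma-separated string
        "--train_datasets", ",".join(train_datasets),
        "--eval_datasets", ",".join(eval_datasets),
        "--bbox", bbox,
    ]
    if use_wandb:
        command.append("--wandb")
    return command


# Hypothetical values, for illustration only
example_command = build_train_command(
    model_name="Kenya-Busia-2021",
    train_datasets=["Kenya", "Rwanda", "Uganda"],
    eval_datasets=["Kenya"],
    bbox="Kenya",
)
print(shlex.join(example_command))
```

Printing the command before running it makes it easy to spot a malformed argument. The list form could also be handed to `subprocess.run(example_command, check=True)`, which avoids shell quoting problems.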

In [None]:
# Newly available models
get_model_names()

# 4. Push the model to the repository

In [None]:
!dvc commit data/models.dvc      # Saves model to repository
!dvc push data/models            # Uploads model to remote storage 

In [None]:
# Push changes to GitHub on a new branch named after the model
!git checkout -b "$model_name"
!git add .
!git commit -m "Trained new model: $model_name"
!git push --set-upstream origin "$model_name"
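
Since the model name doubles as the branch name, a name containing spaces or special characters can make `git checkout -b` fail. The sketch below shows one conservative way to derive a safe branch name. `to_branch_name` is a hypothetical helper, and it is deliberately stricter than git's own rules: it keeps only letters, digits, underscores, and hyphens.

```python
import re


def to_branch_name(model_name: str) -> str:
    """Turn a model name into a conservative git branch name."""
    # Replace every run of disallowed characters with a single hyphen
    branch = re.sub(r"[^A-Za-z0-9_-]+", "-", model_name).strip("-")
    if not branch:
        raise ValueError(f"Cannot derive a branch name from {model_name!r}")
    return branch


# Hypothetical model names, for illustration only
safe_name = to_branch_name("Kenya Busia/2021!")
unchanged_name = to_branch_name("Kenya-Busia-2021")
print(safe_name, unchanged_name)
```

Names that already follow the suggested `<country>-<region>-<year>` format pass through unchanged.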

Create a Pull Request so that the new model branch can be merged into the master branch.