# Crop mask model training 🏋
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nasaharvest/crop-mask/blob/master/notebooks/train.ipynb)

**Author:** Ivan Zvonkov (izvonkov@umd.edu)

**Description:** Standalone notebook for training crop-mask models.

This notebook is in beta, so issue reports and suggestions are welcome!

# 1. Setup

If you don't already have one, obtain a GitHub Personal Access Token by following [these steps](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Save this token somewhere private.

In [None]:
email = input("GitHub email: ")
username = input("GitHub username: ")

# Pass the email to user.email and the username to user.name
!git config --global user.email "$email"
!git config --global user.name "$username"

from getpass import getpass
token = getpass("GitHub Personal Access Token: ")
!git clone https://$username:$token@github.com/nasaharvest/crop-mask.git
%cd crop-mask
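
GitHub usernames and tokens are usually URL-safe, but if either ever contains a character that is special in URLs (such as `@` or `/`), the clone URL above would be misparsed. A minimal defensive sketch, assuming you prefer to build the URL in Python first; `build_clone_url` is a hypothetical helper and not part of crop-mask:

```python
from urllib.parse import quote


def build_clone_url(username: str, token: str,
                    repo: str = "nasaharvest/crop-mask") -> str:
    """Return an HTTPS clone URL with percent-encoded credentials."""
    # safe="" also encodes "/" and "@", which would otherwise be
    # read as part of the URL structure.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@github.com/{repo}.git"


# Placeholder credentials, not real values
print(build_clone_url("octo cat", "abc/123@xyz"))
```

The resulting string could then be passed to `git clone` in place of the hand-written URL.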

In [None]:
# Install required packages
!pip install \
    dvc==1.11.16 \
    rasterio==1.2.10 \
    geopandas==0.9.0 \
    pytorch-lightning==0.7.1 \
    wandb \
    cropharvest==0.3.0 \
    pyyaml==5.4.1 \
    -q
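
Because the install runs quietly (`-q`), a version conflict can go unnoticed. The sketch below, which assumes Python 3.8 or newer for `importlib.metadata`, compares the pins from the cell above against what is actually installed; `find_mismatches` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above (wandb is unpinned there)
PINNED = {
    "dvc": "1.11.16",
    "rasterio": "1.2.10",
    "geopandas": "0.9.0",
    "pytorch-lightning": "0.7.1",
    "cropharvest": "0.3.0",
    "pyyaml": "5.4.1",
}


def find_mismatches(pins: dict) -> dict:
    """Map each package whose installed version differs from its pin
    to the installed version (None if the package is missing)."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = installed
    return mismatches


print(find_mismatches(PINNED) or "All pinned packages match")
```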

In [None]:
# Verify that basic unit tests pass
!python -m unittest

In [None]:
# Log in to wandb to track model runs
!wandb login

In [None]:
# Log in to Google Cloud; you must have access to the bsos-geog-harvest1 project to download data
from google.colab import auth
auth.authenticate_user()

# 2. Download latest data

In [None]:
# Pull in latest training data
!dvc pull data/models -q
!dvc pull data/processed -q
!dvc pull data/compressed_features.tar.gz -q
!cd data && tar -xzf compressed_features.tar.gz

In [None]:
# Available datasets for training and evaluation
!cat data/datasets.txt

# 3. Train model
![model](https://github.com/nasaharvest/crop-mask/blob/master/assets/models.png?raw=true)

In [None]:
from src.bboxes import bboxes
# A bounding box tells the model which area to focus on
bboxes
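
The training cell further down reads `min_lon`, `max_lon`, `min_lat` and `max_lat` from the selected bounding box. The contents of `src.bboxes` are not shown here, so the following is only a hypothetical stand-in that illustrates the idea of a bounding box, with made-up coordinates:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical stand-in for the objects stored in `bboxes`."""
    min_lon: float
    max_lon: float
    min_lat: float
    max_lat: float

    def __post_init__(self):
        # Reject boxes whose corners are swapped or degenerate
        if self.min_lon >= self.max_lon or self.min_lat >= self.max_lat:
            raise ValueError("min values must be smaller than max values")

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside the box (edges included)."""
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


# Illustrative coordinates only, not the values used by crop-mask
example = BBox(min_lon=30.0, max_lon=42.0, min_lat=-5.0, max_lat=5.0)
print(example.contains(36.8, -1.3))  # a point near Nairobi
```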

In [None]:
##################################################################
# START: Configuration (edit the code below)
##################################################################
selected_bbox = bboxes["East_Africa"]
model_name = "my_first_model"
eval_datasets = "Kenya,Rwanda,Uganda,Tanzania_CEO_2019"
##################################################################
# END: Configuration
##################################################################

In [None]:
# Train a new model (may take up to 30 minutes)
!python scripts/model_train.py \
    --min_lon {selected_bbox.min_lon} \
    --max_lon {selected_bbox.max_lon} \
    --min_lat {selected_bbox.min_lat} \
    --max_lat {selected_bbox.max_lat} \
    --model_name {model_name} \
    --eval_datasets {eval_datasets} \
    --max_epochs 7
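
In the cell above, IPython substitutes each `{...}` expression before handing the line to the shell. The same command can be assembled as an argument list, which makes it easy to inspect before running. This is a sketch under assumptions: `build_train_command` is a hypothetical helper, the bounding box values are illustrative, and `shlex.join` requires Python 3.8 or newer. Only the flags themselves come from the cell above.

```python
import shlex
from types import SimpleNamespace


def build_train_command(bbox, model_name, eval_datasets, max_epochs=7):
    """Return the training command as a list of arguments."""
    return [
        "python", "scripts/model_train.py",
        "--min_lon", str(bbox.min_lon),
        "--max_lon", str(bbox.max_lon),
        "--min_lat", str(bbox.min_lat),
        "--max_lat", str(bbox.max_lat),
        "--model_name", model_name,
        "--eval_datasets", eval_datasets,
        "--max_epochs", str(max_epochs),
    ]


# Illustrative bounding box, not the real East_Africa values
demo_bbox = SimpleNamespace(min_lon=30.0, max_lon=42.0, min_lat=-5.0, max_lat=5.0)
cmd = build_train_command(demo_bbox, "my_first_model", "Kenya,Rwanda")
print(shlex.join(cmd))  # inspect the command before running it
# To run it: subprocess.run(cmd, check=True)
```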

# 4. Push the model to the repository

In [None]:
!dvc commit data/models.dvc      # Records the updated model files in data/models.dvc
!dvc push data/models            # Uploads the model to remote storage

In [None]:
# Push changes to GitHub
!git checkout -b "$model_name"
!git add .
!git commit -m "Trained new model: $model_name"
!git push --set-upstream origin "$model_name"
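
Since `model_name` doubles as the branch name, a value containing spaces or other characters Git disallows would make the checkout fail. A minimal sketch of a pre-check, assuming a deliberately conservative pattern that is stricter than Git's actual rules; `is_safe_branch_name` is a hypothetical helper:

```python
import re

# Conservative pattern: letters, digits, ".", "_" and "-" only,
# starting with a letter or digit. Stricter than Git's own rules.
_SAFE_BRANCH = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_safe_branch_name(name: str) -> bool:
    """Return True if `name` is safe to use as a Git branch name."""
    if not _SAFE_BRANCH.fullmatch(name):
        return False
    # Git rejects ".." anywhere and names ending in "." or ".lock"
    return ".." not in name and not name.endswith((".", ".lock"))


for candidate in ["my_first_model", "my first model", "model..v2"]:
    print(candidate, is_safe_branch_name(candidate))
```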

Finally, create a pull request so that the branch containing the new model can be merged into the master branch.