# Launch job

## Environment setup

### Git setup
Tune Git settings to speed up cloning a large repository

In [None]:
!git config --global core.compression 0
!git config --global pack.windowMemory 100m
!git config --global pack.packSizeLimit 100m
!git config --global index.threads 4

### Google Drive setup
Mount Google Drive in the runtime

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Python setup
Install packages missing from the runtime

In [None]:
!pip install torchmetrics

## Dataset and code preparation

### Dataset cloning
Get the dataset from its GitHub repo

In [None]:
!git clone https://github.com/MicheleCazzola/mlvm-dataset --depth=1 mlvm-dataset --progress --verbose
!cd mlvm-dataset; git pull --all

!cd mlvm-dataset; git status

### Code cloning
Get the code from its GitHub repo. The access token is read from the `GITHUB_TOKEN` environment variable (a placeholder name; set it before running this cell) instead of being hard-coded in the notebook.

In [None]:
!git clone https://$GITHUB_TOKEN@github.com/MicheleCazzola/mvlm-project.git mlvm-project

!cd mlvm-project; git status

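### Dataset preprocessing
Move the dataset into the project's data folder, then convert backslashes to forward slashes and strip the `artist_dataset/` prefix in the split files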

In [None]:
!mv mlvm-dataset/* mlvm-project/data
!ls -la mlvm-project/data

In [None]:
%cd mlvm-project/data
!sed -i 's/\\/\//g' train.txt val.txt test.txt
!sed -i 's/artist_dataset\///g' train.txt val.txt test.txt
%cd ../..
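
The two `sed` substitutions above can also be expressed in plain Python, which may be handy when the notebook runs on a system without GNU `sed`. This is only a sketch: the helper names `normalize_paths` and `normalize_split_file` are hypothetical, and the file list is taken from the cell above.

```python
from pathlib import Path


def normalize_paths(text: str, prefix: str = "artist_dataset/") -> str:
    """Turn backslashes into forward slashes, then drop every `prefix` occurrence."""
    return text.replace("\\", "/").replace(prefix, "")


def normalize_split_file(path: Path) -> None:
    """Rewrite one split file in place (same effect as the two sed calls)."""
    path.write_text(normalize_paths(path.read_text(encoding="utf-8")), encoding="utf-8")


# Example call, assuming the working directory is mlvm-project/data:
# for name in ("train.txt", "val.txt", "test.txt"):
#     normalize_split_file(Path(name))
```

The order matters: separators are converted first, so the prefix is matched in its forward-slash form.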

### Cache cleaning
Empty the local cache folder between runs

In [None]:
!cd mlvm-project && rm -f temp/*
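
As an alternative to the shell command, the cache can be emptied from Python without relying on the current working directory. The sketch below assumes the cache holds only regular files directly inside `temp/`; `clear_cache` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def clear_cache(cache_dir: Path) -> int:
    """Delete non-hidden regular files directly inside cache_dir.

    Returns the number of files removed. Subdirectories and dotfiles are
    left alone, mirroring what the shell glob `temp/*` would match for files.
    """
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
            removed += 1
    return removed


# Example call from the notebook root:
# clear_cache(Path("mlvm-project/temp"))
```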

## Run

Run the job from the command line with custom parameters.  
Pass `--help` for a brief guide.  
Default parameters are stored in [config](/mlvm-project/src/config/components.py).

### Sample run template

In [None]:
"""
%cd mlvm-project
!chmod u+x main.py;
!python main.py \
    --root ./data/artist_dataset \
    --trained_model_path ... \
    --resume_training \
    --batch-size ... \
    --reduce-factor ... \
    --augment \
    --num-epochs ... \
    --train-log-frequency ... \
    --val-log-frequency ... \
    --lr ... \
    --momentum ... \
    --wd ... \
    --scheduler \
    --scheduler-milestones ... ... ... \
    --scheduler-factors ... ... ... \
    --save-models \
    --save-models-step ... \
    --inference-only ... \
    --backbone-type ... \
    --use-handcrafted ;
%cd ..
"""

### Run job
Adjust the parameters to suit your needs

In [None]:
%cd mlvm-project
!chmod u+x main.py;
!python main.py \
    --root ./data/artist_dataset \
    --trained_model_path ... \
    --resume_training \
    --augment \
    --num-epochs ... \
    --lr ... \
    --wd ... \
    --scheduler custom_step_lr \
    --scheduler-milestones 10 15 20 \
    --scheduler-factors 0.5 0.5 0.25 \
    --use-handcrafted ;
%cd ..
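
When parameters change often, it can be less error-prone to keep them in a dictionary and generate the command line from it. The following is a minimal sketch: `build_command` is a hypothetical helper, and the dictionary only repeats the flags that have concrete values in the cell above; the remaining flags would be added in the same way.

```python
import shlex


def build_command(script: str, params: dict) -> list:
    """Turn a flag -> value mapping into an argv list.

    True            -> bare flag (e.g. --augment)
    False or None   -> flag omitted
    list or tuple   -> flag followed by each item
    anything else   -> flag followed by str(value)
    """
    argv = ["python", script]
    for flag, value in params.items():
        if value is None or value is False:
            continue
        argv.append(flag)
        if value is True:
            continue
        if isinstance(value, (list, tuple)):
            argv.extend(str(item) for item in value)
        else:
            argv.append(str(value))
    return argv


params = {
    "--root": "./data/artist_dataset",
    "--augment": True,
    "--scheduler": "custom_step_lr",
    "--scheduler-milestones": [10, 15, 20],
    "--scheduler-factors": [0.5, 0.5, 0.25],
    "--use-handcrafted": True,
}
command = build_command("main.py", params)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, cwd="mlvm-project")`, or the printed string can be pasted after `!` in a cell.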

## Save results

Cached results are copied into the run's output folder (the one whose name starts with `202`), which is then copied to a folder of the same name on Drive.

In [None]:
!p=$(find mlvm-project -maxdepth 1 -mindepth 1 -type d -name '202*' | sort | tail -n 1); cp -r ./mlvm-project/temp/ "$p"; cp -r "$p" /content/drive/MyDrive/mlvm_shared/; echo "$p"
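
The same save step can be sketched in Python, which makes the "no run folder found" case explicit. The helper names `latest_run_dir` and `save_results` are hypothetical, and the sketch assumes run folders sort chronologically by name, so the last one is the most recent.

```python
import shutil
from pathlib import Path
from typing import Optional


def latest_run_dir(project: Path, pattern: str = "202*") -> Optional[Path]:
    """Return the last matching directory by name, or None if there is none."""
    candidates = sorted(p for p in project.glob(pattern) if p.is_dir())
    return candidates[-1] if candidates else None


def save_results(project: Path, drive_root: Path) -> Optional[Path]:
    """Copy project/temp into the latest run folder, then copy that folder to Drive."""
    run_dir = latest_run_dir(project)
    if run_dir is None:
        return None
    temp = project / "temp"
    if temp.is_dir():
        # Like `cp -r temp/ run_dir`: the cache ends up in run_dir/temp
        shutil.copytree(temp, run_dir / "temp", dirs_exist_ok=True)
    destination = drive_root / run_dir.name
    shutil.copytree(run_dir, destination, dirs_exist_ok=True)
    return destination


# Example call with the paths used in this notebook:
# save_results(Path("mlvm-project"), Path("/content/drive/MyDrive/mlvm_shared"))
```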