# HMSA - Combining the HMSA and ResNeSt Architectures


## Before Running the Code
1. Ensure that the runtime type is set to GPU.
2. Ensure that the Colab VM is in the same region as the Google Cloud Storage buckets (which are `US-multi-region`).
   - This minimizes costs, as Google charges for network egress.
   - The Colab region can be checked by running the cell below.
   - If the VM is outside the US, factory-reset the runtime and check again, repeating until it lands in the US.

In [1]:
!curl ipinfo.io

{
  "ip": "35.199.18.110",
  "hostname": "110.18.199.35.bc.googleusercontent.com",
  "city": "Washington",
  "region": "Washington, D.C.",
  "country": "US",
  "loc": "38.8951,-77.0364",
  "org": "AS15169 Google LLC",
  "postal": "20045",
  "timezone": "America/New_York",
  "readme": "https://ipinfo.io/missingauth"
}
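
The region check can also be done in code instead of reading the JSON by eye. The following is a minimal sketch, not part of the project: the helper names `runtime_is_in_us` and `fetch_ipinfo` are hypothetical, the `https://ipinfo.io/json` endpoint is an assumption, and the check relies only on the `country` field shown in the output above.

```python
import json
import urllib.request


def runtime_is_in_us(ipinfo_json: str) -> bool:
    """Return True if an ipinfo.io JSON response reports country "US"."""
    info = json.loads(ipinfo_json)
    return info.get("country") == "US"


def fetch_ipinfo(url: str = "https://ipinfo.io/json") -> str:
    """Fetch the raw JSON text (needs network access; the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


# In Colab, the two helpers would be combined like this:
# print(runtime_is_in_us(fetch_ipinfo()))
```

If the result is `False`, the runtime should be reset before any data is copied from the buckets.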

In [2]:
# check GPU specs
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-1946430a-41c4-8341-ae23-784393f7ce9d)
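
The GPU check can likewise be scripted so that later cells fail early when no GPU is attached. This is a sketch under the assumption that `nvidia-smi -L` prints one line per GPU in the format shown above; `parse_gpu_names` and `list_gpus` are hypothetical helper names.

```python
import re
import subprocess


def parse_gpu_names(listing: str) -> list:
    """Extract GPU model names from `nvidia-smi -L` style output."""
    pattern = re.compile(r"^GPU \d+: (.+?) \(UUID: .+\)$")
    names = []
    for line in listing.splitlines():
        match = pattern.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def list_gpus() -> list:
    """Run `nvidia-smi -L`; return [] if the tool is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_names(result.stdout)


# Example: stop early if the runtime has no GPU.
# assert list_gpus(), "No GPU found: set the runtime type to GPU"
```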


## 1. Setup


### 1.1. Clone Git Repo

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/CSC413/
!git clone https://github.com/Brian0615/CSC413FinalProject.git
%cd /content/drive/MyDrive/CSC413/CSC413FinalProject/
!git checkout brian

/content/drive/MyDrive/CSC413
fatal: destination path 'CSC413FinalProject' already exists and is not an empty directory.
/content/drive/MyDrive/CSC413/CSC413FinalProject
M	HMSA/train.py
M	PyTorch-Encoding/docs/source/_static/img/upconv.png
M	PyTorch-Encoding/tests/lint.py
Branch 'brian' set up to track remote branch 'brian' from 'origin'.
Switched to a new branch 'brian'


In [None]:
%cd /content/drive/MyDrive/CSC413/CSC413FinalProject/
!git pull

### 1.2. Download Cityscapes Dataset

In [None]:
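# create the directory layout that the Cityscapes download cells copy into
!mkdir -p /content/cityscapes/gtFine_trainvaltest
!mkdir -p /content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/train
!mkdir -p /content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/val
!mkdir -p /content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/test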
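
The same directory layout can be created from Python, which makes the cell safe to re-run. This is a minimal sketch: `make_cityscapes_tree` is a hypothetical helper, and the default root simply mirrors the `/content/cityscapes` path used in the cell above.

```python
from pathlib import Path


def make_cityscapes_tree(root: str = "/content/cityscapes") -> list:
    """Create the gtFine and leftImg8bit folders; existing folders are left alone."""
    root_path = Path(root)
    targets = [root_path / "gtFine_trainvaltest"]
    for split in ("train", "val", "test"):
        targets.append(root_path / "leftImg8bit_trainvaltest" / "leftImg8bit" / split)
    for target in targets:
        # parents=True builds intermediate folders, exist_ok=True avoids errors on re-runs
        target.mkdir(parents=True, exist_ok=True)
    return targets


# In Colab: make_cityscapes_tree()
```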

In [4]:
!gcloud init

Welcome! This command will take you through the configuration of gcloud.

Settings from your current configuration [default] are:
component_manager:
  disable_update_check: 'True'
compute:
  gce_metadata_read_timeout_sec: '0'

Pick configuration to use:
 [1] Re-initialize this configuration [default] with new settings 
 [2] Create a new configuration
Please enter your numeric choice:  1

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must log in to continue. Would you like to log in (Y/n)?  y

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=openid+https%3A%2F%2Fwww.googleapi

In [None]:
!gsutil -m cp -r gs://csc413-final-project-cityscapes-data/gtFine /content/cityscapes/gtFine_trainvaltest/

In [None]:
!gsutil -m cp -r gs://csc413-final-project-cityscapes-data/leftImg8bit/train/ /content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/
!gsutil -m cp -r gs://csc413-final-project-cityscapes-data/leftImg8bit/val/ /content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/
!gsutil -m cp -r gs://csc413-final-project-cityscapes-data/leftImg8bit/test/ /content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/
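
After the copy finishes, it can be worth confirming that every split arrived in full. The sketch below counts PNG files per split; `count_images` is a hypothetical helper, and it assumes the images are `.png` files nested in per-city folders. The training log later in this notebook reports 2975 train and 500 val images, which gives reference values for those two splits.

```python
from pathlib import Path


def count_images(leftimg_root: str, splits=("train", "val", "test")) -> dict:
    """Count .png files under each split folder; missing folders count as 0."""
    root = Path(leftimg_root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if split_dir.is_dir():
            counts[split] = sum(1 for _ in split_dir.rglob("*.png"))
        else:
            counts[split] = 0
    return counts


# In Colab:
# print(count_images("/content/cityscapes/leftImg8bit_trainvaltest/leftImg8bit"))
```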

### 1.3. Download Weights


In [110]:
!mkdir /content/drive/MyDrive/data/seg_weights/
!gsutil -m cp gs://csc413-final-project-hmsa-weights/original_weights/ocrnet.HRNet_industrious-chicken.pth /content/drive/MyDrive/data/seg_weights/
!gsutil -m cp gs://csc413-final-project-hmsa-weights/original_weights/hrnetv2_w48_imagenet_pretrained.pth /content/drive/MyDrive/data/seg_weights/

mkdir: cannot create directory ‘/content/drive/MyDrive/data/seg_weights/’: File exists
Copying gs://csc413-final-project-hmsa-weights/original_weights/hrnetv2_w48_imagenet_pretrained.pth...
| [1/1 files][296.2 MiB/296.2 MiB] 100% Done                                    
Operation completed over 1 objects/296.2 MiB.                                    


In [11]:
!mkdir /content/drive/MyDrive/data/uniform_centroids/
!gsutil -m cp gs://csc413-final-project-cityscapes-data/uniform_centroids/* /content/drive/MyDrive/data/uniform_centroids/

mkdir: cannot create directory ‘/content/drive/MyDrive/data/uniform_centroids/’: File exists
Copying gs://csc413-final-project-cityscapes-data/uniform_centroids/cityscapes_cv0_tile1024.json...


### 1.4. Install Dependencies

In [4]:
# install basic dependencies
!pip install runx==0.0.6 numpy scikit-learn h5py jupyter scikit-image pillow piexif cffi tqdm dominate opencv-python nose ninja
!apt-get update
!apt-get install libgtk2.0-dev -y
!rm -rf /var/lib/apt/lists/*

Collecting runx==0.0.6
  Downloading https://files.pythonhosted.org/packages/6b/4f/757e3a0bdf6c94f6d2571cf5ab6fc3812535f0bf918fb2609837eca1bd0a/runx-0.0.6-py3-none-any.whl
Collecting piexif
  Downloading https://files.pythonhosted.org/packages/2c/d8/6f63147dd73373d051c5eb049ecd841207f898f50a5a1d4378594178f6cf/piexif-1.1.3-py2.py3-none-any.whl
Collecting dominate
  Downloading https://files.pythonhosted.org/packages/ef/a8/4354f8122c39e35516a2708746d89db5e339c867abbd8e0179bccee4b7f9/dominate-2.6.0-py2.py3-none-any.whl
Collecting nose
  Downloading https://files.pythonhosted.org/packages/15/d8/dd071918c040f50fa1cf80da16423af51ff8ce4a0f2399b7bf8de45ac3d9/nose-1.3.7-py3-none-any.whl (154kB)
Collecting ninja
  Downloading https://files.pythonhosted.org/packages/1d/de/393468f2a37fc2c1dc3a06afc37775e27fde2d16845424141d4da62c686d/ninja-1.10.0.post2-py3-none-manylinux1_x86_64.whl (107kB)

In [5]:
# change cuda version to 10.1
%cd /usr/local/
!rm -rf cuda
!ln -s /usr/local/cuda-10.1 /usr/local/cuda  # replace symlink to cuda-11.0 with cuda-10.1

/usr/local
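
Before building apex it helps to confirm that the symlink really points at the intended toolkit. The following is a small sketch; `cuda_target` and `points_to_version` are hypothetical helpers, and the check assumes the toolkit directories follow the `cuda-<version>` naming used in the cell above.

```python
import os


def cuda_target(link: str = "/usr/local/cuda") -> str:
    """Return the real path that the CUDA symlink resolves to."""
    return os.path.realpath(link)


def points_to_version(link: str, version: str) -> bool:
    """True if the symlink resolves to a directory named cuda-<version>."""
    return os.path.basename(cuda_target(link)) == f"cuda-{version}"


# In Colab:
# print(cuda_target())                                # e.g. the cuda-10.1 directory
# print(points_to_version("/usr/local/cuda", "10.1"))
```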


In [6]:
%cd /home/
!git clone https://github.com/NVIDIA/apex.git apex
%cd apex
!git checkout a651e2c24ecf97cbf367fd3f330df36760e1c597 .

/home
Cloning into 'apex'...
remote: Enumerating objects: 8038, done.
remote: Counting objects: 100% (125/125), done.
remote: Compressing objects: 100% (90/90), done.
remote: Total 8038 (delta 58), reused 65 (delta 30), pack-reused 7913
Receiving objects: 100% (8038/8038), 14.10 MiB | 28.71 MiB/s, done.
Resolving deltas: 100% (5459/5459), done.
/home/apex


In [7]:
# install apex
!python setup.py install --cuda_ext --cpp_ext



torch.__version__  = 1.8.1+cu101



Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
from /usr/local/cuda/bin

running install
running bdist_egg
running egg_info
creating apex.egg-info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
writing manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/apex
copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
creating build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/__init__.py 

In [8]:
%cd ~
!pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/

/root
Collecting git+https://github.com/zhanghang1989/PyTorch-Encoding/
  Cloning https://github.com/zhanghang1989/PyTorch-Encoding/ to /tmp/pip-req-build-utdc140f
  Running command git clone -q https://github.com/zhanghang1989/PyTorch-Encoding/ /tmp/pip-req-build-utdc140f
Collecting portalocker
  Downloading https://files.pythonhosted.org/packages/68/33/cb524f4de298509927b90aa5ee34767b9a2b93e663cf354b2a3efa2b4acd/portalocker-2.3.0-py2.py3-none-any.whl
Building wheels for collected packages: torch-encoding
  Building wheel for torch-encoding (setup.py) ... done
  Created wheel for torch-encoding: filename=torch_encoding-1.2.2b20210420-cp37-cp37m-linux_x86_64.whl size=8227853 sha256=ff90a62558eb3950c9ed232cdeae4254ec2e82104479221b4492208bebf1a014
  Stored in directory: /tmp/pip-ephem-wheel-cache-t3dxml23/wheels/f8/4f/46/924a4c89ee95252b34c3e257f1de2664a053e52c5aa5013d4a
Successfully built torch-encoding
Installing collected packages: portalocker, torch-encoding
Successfully 

### 1.5. Config Setup
* Inside `CSC413FinalProject/HMSA/config.py`, set the following items:
  ```
  __C.ASSETS_PATH = '/content/drive/MyDrive/data'
  __C.DATASET.CITYSCAPES_DIR = \
      os.path.join(__C.ASSETS_PATH, 'citys')
  ```

* Copy the most recent snapshot weights into `/content/drive/MyDrive/data/seg_weights/`.

* Since Colab provides only one GPU, the run configuration must launch a single process (`--nproc_per_node=1`), as set below.

* Inside `CSC413FinalProject/HMSA/scripts/train_cityscapes_deepv3.yml`, set the following:
  ```
  CMD: "python -m torch.distributed.launch --nproc_per_node=1 train.py"
  bs_trn: 32
  arch: mscale.DeepV3RN50,
  snapshot: "ASSETS_PATH/seg_weights/<weight-file-name>.pth",
  ```
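
The `snapshot` entry uses the literal token `ASSETS_PATH` rather than a full path. The sketch below shows how such a placeholder could be expanded, assuming plain string substitution with the value of `__C.ASSETS_PATH`; this is an illustration only, `resolve_snapshot` is a hypothetical name, and the actual handling should be confirmed in `train.py`.

```python
ASSETS_PATH = "/content/drive/MyDrive/data"  # same value as __C.ASSETS_PATH in config.py


def resolve_snapshot(snapshot: str, assets_path: str = ASSETS_PATH) -> str:
    """Replace the ASSETS_PATH token with the configured assets directory."""
    return snapshot.replace("ASSETS_PATH", assets_path.rstrip("/"))


# "example.pth" is a made-up file name used only for illustration.
print(resolve_snapshot("ASSETS_PATH/seg_weights/example.pth"))
```

Under that assumption, the weights copied into `/content/drive/MyDrive/data/seg_weights/` are exactly where the expanded path points.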

## 2. Training on Cityscapes

In [9]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize

import psutil
import humanize
import os
import GPUtil as GPU

GPUs = GPU.getGPUs()
# Colab provides at most one GPU, and even that is not guaranteed
gpu = GPUs[0] if GPUs else None

def printm():
    """Print free system RAM, this process's memory use, and GPU memory stats."""
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), " |     Proc size: " + humanize.naturalsize(process.memory_info().rss))
    if gpu is None:
        print("No GPU detected; set the runtime type to GPU")
        return
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total     {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()

Collecting gputil
  Downloading https://files.pythonhosted.org/packages/ed/0e/5c61eedde9f6c87713e89d794f01e378cfd9565847d4576fa627d758c554/GPUtil-1.4.0.tar.gz
Building wheels for collected packages: gputil
  Building wheel for gputil (setup.py) ... done
  Created wheel for gputil: filename=GPUtil-1.4.0-cp37-none-any.whl size=7411 sha256=3aced381f783065a949bd1875adf8af45800bf5c05f619ba9f1d3d16e369588b
  Stored in directory: /root/.cache/pip/wheels/3d/77/07/80562de4bb0786e5ea186911a2c831fdd0018bda69beab71fd
Successfully built gputil
Installing collected packages: gputil
Successfully installed gputil-1.4.0
Gen RAM Free: 12.7 GB  |     Proc size: 120.2 MB
GPU RAM Free: 15109MB | Used: 0MB | Util   0% | Total     15109MB


In [10]:
%cd /content/drive/MyDrive/CSC413/CSC413FinalProject/HMSA_ResNest/

/content/drive/MyDrive/CSC413/CSC413FinalProject/HMSA_ResNest


In [11]:
# dry run (to see full command)
!python -m runx.runx scripts/train_cityscapes.yml -i -n

python -m torch.distributed.launch --nproc_per_node=1 train.py --dataset cityscapes --cv 0 --syncbn --apex --fp16 --crop_size 300,600 --bs_trn 8 --poly_exp 2 --lr 0.003067 --rmi_loss --max_epoch 100 --n_scales 0.5,1.0,2.0 --supervised_mscale_loss_wt 0.05 --snapshot ASSETS_PATH/seg_weights/HMSA_ResNest_resolute-axolotl_ep4.pth --arch mscale.DeepV3RN50 --result_dir logs/train_cityscapes/mscale.DeepV3RN50_tall-dormouse_2021.04.20_00.36 
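
The dry-run output shows how the entries of the `.yml` file end up as command-line flags appended to `CMD`. The sketch below imitates that expansion for a single configuration. It is not runx's implementation: `build_command` is a hypothetical function, and the rule that `True` values become bare flags while `False` or `None` values are dropped is an assumption inferred from flags such as `--syncbn` in the output above.

```python
def build_command(cmd: str, hparams: dict) -> str:
    """Append each hyperparameter to the base command as a CLI flag."""
    parts = [cmd]
    for key, value in hparams.items():
        if value is True:
            parts.append(f"--{key}")          # boolean switch, no argument
        elif value is False or value is None:
            continue                           # assumed: disabled switches are omitted
        else:
            parts.append(f"--{key} {value}")  # ordinary key/value pair
    return " ".join(parts)


hparams = {
    "dataset": "cityscapes",
    "cv": 0,
    "syncbn": True,
    "apex": True,
    "fp16": True,
    "crop_size": "300,600",
    "bs_trn": 8,
}
print(build_command("python -m torch.distributed.launch --nproc_per_node=1 train.py", hparams))
```

With these values the printed string matches the beginning of the dry-run command, up to and including `--bs_trn 8`.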


In [13]:
# real run
!python -m runx.runx scripts/train_cityscapes.yml -i

None
Global Rank: 0 Local Rank: 0
Torch version: 1.8, 1.8.1+cu101
n scales [0.5, 1.0, 2.0]
dataset = cityscapes
ignore_label = 255
num_classes = 19
cv split val 0 ['val/frankfurt', 'val/lindau', 'val/munster']
mode val found 500 images
cn num_classes 19
cv split train 0 ['train/aachen', 'train/bochum', 'train/bremen', 'train/cologne', 'train/darmstadt', 'train/dusseldorf', 'train/erfurt', 'train/hamburg', 'train/hanover', 'train/jena', 'train/krefeld', 'train/monchengladbach', 'train/strasbourg', 'train/stuttgart', 'train/tubingen', 'train/ulm', 'train/weimar', 'train/zurich']
mode train found 2975 images
cn num_classes 19
Loading centroid file /content/drive/MyDrive/data/uniform_centroids/cityscapes_cv0_tile1024.json
Found 19 centroids
Class Uniform Percentage: 0.5
Class Uniform items per Epoch: 2975
cls 0 len 5866
cls 1 len 5184
cls 2 len 5678
cls 3 len 1312
cls 4 len 1723
cls 5 len 5656
cls 6 len 2769
cls 7 len 4860
cls 8 len 5388
cls 9 len 2440
cls 10 len 4722
cls 11 len 3719
cls 1

In [None]:
# upload results to google cloud bucket
# !gsutil -m cp -r /content/uniform_centroids/* gs://csc413-final-project-cityscapes-data/uniform_centroids/
!gsutil -m cp -r /content/drive/MyDrive/CSC413/CSC413FinalProject/HMSA/logs/train_cityscapes/* gs://csc413-final-project-hmsa-weights/training_results/

In [None]:
# folder name: jasper-ocelot