# MLDL23 - NAS for Tiny Visual Wake Words


### Install libraries and clone the project

In [1]:
!pip install wget
!pip install fvcore
!pip install pyvww
!pip install timm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9657 sha256=70d97d62dcc6a72b355fa85c5ec16830359e77047970337617f76095f7eec30b
  Stored in directory: /root/.cache/pip/wheels/8b/f1/7f/5c94f0a7a505ca1c81cd1d9208ae2064675d97582078e6c769
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting fvcore
  Downloading fvcore-0.1.5.post20221221.tar.gz (50 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.2/50.2 kB 2.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
C

In [2]:
!git clone https://github.com/alessandroficca9/MLDL23_NAS_project

Cloning into 'MLDL23_NAS_project'...
remote: Enumerating objects: 370, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 370 (delta 9), reused 14 (delta 6), pack-reused 347
Receiving objects: 100% (370/370), 102.57 KiB | 1.80 MiB/s, done.
Resolving deltas: 100% (232/232), done.


### Download the dataset

In [3]:
!python /content/MLDL23_NAS_project/data/download_coco_data.py

downloading train2017.zip ...
Downloading: 100% [19336861798 / 19336861798] bytes
Unzipping train2017.zip ...
downloading val2017.zip ...
Downloading: 100% [815585330 / 815585330] bytes
Unzipping val2017.zip ..
Moving all files into one folder ...
moved train files
moved val files
Downloading annotations files ...
Download data complete


In [4]:
TRAIN_ANNOTATIONS_FILE="COCOdataset/annotations/instances_train2017.json"
VAL_ANNOTATIONS_FILE="COCOdataset/annotations/instances_val2017.json"
DIR="COCOdataset/annotations/"
!python /content/MLDL23_NAS_project/data/create_coco_train_minival_split.py \
  --train_annotations_file="{TRAIN_ANNOTATIONS_FILE}" \
  --val_annotations_file="{VAL_ANNOTATIONS_FILE}" \
  --output_dir="{DIR}"


In [5]:
MAXITRAIN_ANNOTATIONS_FILE="COCOdataset/annotations/instances_maxitrain.json"
MINIVAL_ANNOTATIONS_FILE="COCOdataset/annotations/instances_minival.json"
VWW_OUTPUT_DIR="visualwakewords"
!python /content/MLDL23_NAS_project/data/create_visualwakewords_annotations.py \
  --train_annotations_file="{MAXITRAIN_ANNOTATIONS_FILE}" \
  --val_annotations_file="{MINIVAL_ANNOTATIONS_FILE}" \
  --output_dir="{VWW_OUTPUT_DIR}" \
  --threshold=0.005 \
  --foreground_class='person'

Processing /content/COCOdataset/annotations/instances_maxitrain.json...
loading annotations into memory...
Done (t=15.34s)
creating index...
index created!
There are 55233 images that now have label person, of the 115228 images in total.
Processing /content/COCOdataset/annotations/instances_minival.json...
loading annotations into memory...
Done (t=0.69s)
creating index...
index created!
There are 3800 images that now have label person, of the 8059 images in total.
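
The `--threshold=0.005` and `--foreground_class='person'` flags control how each COCO image is turned into a binary Visual Wake Words label. A minimal sketch of that rule, assuming an image is positive when at least one person box covers more than 0.5% of the image area; the function name and the sample records are illustrative and not taken from the project's script:

```python
def vww_label(image, annotations, foreground_category_id=1, threshold=0.005):
    """Return 1 if any foreground box covers more than `threshold` of the image."""
    image_area = image["width"] * image["height"]
    for ann in annotations:
        if ann["category_id"] != foreground_category_id:
            continue
        # COCO boxes are stored as [x, y, width, height] in pixels.
        _, _, box_w, box_h = ann["bbox"]
        if (box_w * box_h) / image_area > threshold:
            return 1
    return 0


# Hypothetical records in COCO style (category id 1 is "person", 18 is "dog").
image = {"width": 640, "height": 480}
large_person = {"category_id": 1, "bbox": [10, 10, 100, 200]}
tiny_person = {"category_id": 1, "bbox": [10, 10, 20, 30]}
dog = {"category_id": 18, "bbox": [0, 0, 300, 300]}

print(vww_label(image, [large_person]))      # large person box: positive
print(vww_label(image, [tiny_person, dog]))  # person too small, dog ignored: negative
```

With this rule, very small people in the background do not make an image positive, which is why fewer than half of the images end up with the `person` label in the counts above.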


# Some experiments

### Run the search algorithms

In [None]:
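# `!export` runs in a throwaway subshell, so the variable would not reach later cells.
# `%env` sets it for the kernel, and the `!python` commands below inherit it.
%env PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512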

**Random search**:
- number of iterations = 20000
- metrics = without_cost (only Synflow and NASWOT are considered)
- max_blocks = 13 (min_blocks = 3)
- variable model length
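
The settings listed above can be read as a sample, filter, and keep-the-best loop. A minimal sketch under stated assumptions: the genome (a list of block widths), the cost model, and the score are toy stand-ins invented for illustration, and `random_search` is not the project's `run_search.py` implementation:

```python
import random


def random_search(sample_fn, score_fn, cost_fn, n_samples, max_flops, max_params, seed=0):
    """Sample candidates, drop those over budget, keep the best-scoring one."""
    rng = random.Random(seed)
    best, best_score, feasible = None, float("-inf"), 0
    for _ in range(n_samples):
        candidate = sample_fn(rng)
        flops, params = cost_fn(candidate)
        if flops > max_flops or params > max_params:
            continue  # violates the FLOPs or parameter budget
        feasible += 1
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, feasible


def sample_widths(rng, min_blocks=3, max_blocks=13):
    """Toy genome: a variable-length list of block widths."""
    n_blocks = rng.randint(min_blocks, max_blocks)
    return [rng.choice([8, 16, 32, 64]) for _ in range(n_blocks)]


def toy_cost(widths):
    """Toy cost model: 1x1 convolutions chained from a 3-channel input."""
    params = sum(a * b for a, b in zip([3] + widths[:-1], widths))
    return params * 100, params  # (flops, params)


def toy_score(widths):
    """Toy stand-in for a training-free metric."""
    return sum(widths)


best, score, feasible = random_search(
    sample_widths, toy_score, toy_cost,
    n_samples=1000, max_flops=2_000_000, max_params=20_000,
)
print(f"{feasible} feasible candidates, best = {best}, score = {score}")
```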


In [None]:
!python /content/MLDL23_NAS_project/src/run_search.py \
    --algo random_search \
    --max_flops 200000000 \
    --max_params 2500000 \
    --metrics without_cost \
    --n_random 20000 \
    --max_blocks 13 \
    --save True

Start random search ...
100% 20000/20000 [30:56<00:00, 10.77it/s]
Finish random search.
Remaining 204 that satisfy constraints
Best exemplar obtained ---
Model: NetworkDecoded(
  (layers): Sequential(
    (0): InvertedResidualBlock(
      (conv1x1_1): Sequential(
        (0): Conv2d(3, 12, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (depthwise): Sequential(
        (0): Conv2d(12, 12, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=12, bias=False)
        (1): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (conv1x1_2): Sequential(
        (0): Conv2d(12, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (relu): ReLU(inplace=True)
      (downsample): C
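
The parameter budget set with `--max_params` can be checked by hand against a printout like the one above. A small worked example that counts only the layers visible in the truncated output (the `downsample` branch and any later blocks are not included); the helper names are illustrative:

```python
def conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=False):
    """Weights of a square-kernel Conv2d; each filter sees in_channels / groups inputs."""
    weights = out_channels * (in_channels // groups) * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def batchnorm_params(channels):
    """Learnable scale and shift of BatchNorm2d with affine=True."""
    return 2 * channels


visible_layers = {
    "conv1x1_1 conv": conv2d_params(3, 12, 1),              # 3 * 12 = 36
    "conv1x1_1 bn": batchnorm_params(12),                   # 24
    "depthwise conv": conv2d_params(12, 12, 7, groups=12),  # 12 * 1 * 49 = 588
    "depthwise bn": batchnorm_params(12),                   # 24
    "conv1x1_2 conv": conv2d_params(12, 160, 1),            # 12 * 160 = 1920
    "conv1x1_2 bn": batchnorm_params(160),                  # 320
}
print(sum(visible_layers.values()))  # 2912
```

The depthwise 7x7 convolution costs only 588 weights because `groups=12` gives every channel its own single-channel filter; most of the visible parameters sit in the 1x1 expansion to 160 channels.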

**Evolutionary algorithm**:
- population size = 25
- crossover = True
- number of generations = 1000
- metrics = without_cost
- variable model length
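
The settings listed above map onto a generic evolutionary loop: build an initial population, then repeatedly select parents, recombine and mutate them, and insert the child. A minimal sketch, assuming tournament selection, one-point crossover, and replace-the-worst survival; these choices, the toy genome, and the toy fitness are assumptions for illustration and may differ from the project's `ea_search`:

```python
import random

MIN_BLOCKS, MAX_BLOCKS = 3, 13
CHOICES = [8, 16, 32, 64]


def init_genome(rng):
    """Toy genome: a variable-length list of block widths."""
    return [rng.choice(CHOICES) for _ in range(rng.randint(MIN_BLOCKS, MAX_BLOCKS))]


def crossover(a, b, rng):
    """One-point crossover: prefix of `a` plus suffix of `b`, kept within length bounds."""
    child = a[:rng.randint(1, len(a) - 1)] + b[rng.randint(1, len(b) - 1):]
    child = child[:MAX_BLOCKS]
    while len(child) < MIN_BLOCKS:
        child.append(rng.choice(CHOICES))
    return child


def mutate(genome, rng):
    """Resample the width of one randomly chosen block."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.choice(CHOICES)
    return child


def toy_fitness(genome):
    """Toy stand-in for a training-free score: prefer a total width near 200."""
    return -abs(sum(genome) - 200)


def evolve(pop_size, generations, tournament=5, seed=0):
    rng = random.Random(seed)
    population = [init_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection: the best of a random subset becomes a parent.
        parent_a = max(rng.sample(population, tournament), key=toy_fitness)
        parent_b = max(rng.sample(population, tournament), key=toy_fitness)
        child = mutate(crossover(parent_a, parent_b, rng), rng)
        # The child replaces the worst individual only if it is better.
        worst = min(range(pop_size), key=lambda i: toy_fitness(population[i]))
        if toy_fitness(child) > toy_fitness(population[worst]):
            population[worst] = child
    return max(population, key=toy_fitness)


best = evolve(pop_size=25, generations=1000)
print(best, toy_fitness(best))
```

Because the child only ever replaces a worse individual, the best fitness in the population never decreases from one generation to the next.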

In [None]:
!python /content/MLDL23_NAS_project/src/run_search.py \
    --algo "ea_search" \
    --max_flops 200000000 \
    --max_params 2500000 \
    --initial_pop 25 \
    --generation_ea 1000 \
    --max_blocks 13 \
    --save True

Start Evolutionary search ...
Population initialization ...
Population size: 1/25
Population size: 2/25
Population size: 3/25
Population size: 4/25
Population size: 5/25
Population size: 6/25
Population size: 7/25
Population size: 8/25
Population size: 9/25
Population size: 10/25
Population size: 11/25
Population size: 12/25
Population size: 13/25
Population size: 14/25
Population size: 15/25
Population size: 16/25
Population size: 17/25
Population size: 18/25
Population size: 19/25
Population size: 20/25
Population size: 21/25
Population size: 22/25
Population size: 23/25
Population size: 24/25
Population size: 25/25
Start evolution ...
100% 1000/1000 [41:48<00:00,  2.51s/it]
End evolution ...
Best exemplar obtained ---
Model: NetworkDecoded(
  (layers): Sequential(
    (0): InvertedResidualBlock(
      (conv1x1_1): Sequential(
        (0): Conv2d(3, 18, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(18, eps=1e-05, momentum=0.1, affine=True, track_running_stat

In this random search algorithm, we evaluate each model according to the training-free metrics together with its number of parameters and FLOPs. In the ranking, a model with fewer parameters and FLOPs is preferred over a model with a higher computational cost.
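
One way to combine training-free scores with cost is to rank the candidates on each criterion separately and add up the rank positions. A minimal sketch of such a rank aggregation; the summing rule, the equal weighting, and the candidate values are assumptions for illustration, not the project's actual scoring:

```python
def rank_positions(values, higher_is_better):
    """Return the rank position (0 = best) of each value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def rank_with_cost(candidates):
    """Order candidate indices by the sum of their per-criterion rank positions."""
    criteria = [("synflow", True), ("naswot", True), ("params", False), ("flops", False)]
    totals = [0] * len(candidates)
    for key, higher_is_better in criteria:
        values = [c[key] for c in candidates]
        for i, rank in enumerate(rank_positions(values, higher_is_better)):
            totals[i] += rank
    return sorted(range(len(candidates)), key=lambda i: totals[i])


# Hypothetical candidates: the first scores best, the second is much cheaper.
candidates = [
    {"synflow": 10.0, "naswot": 400.0, "params": 2_000_000, "flops": 150_000_000},
    {"synflow": 9.0, "naswot": 390.0, "params": 500_000, "flops": 50_000_000},
    {"synflow": 5.0, "naswot": 300.0, "params": 1_000_000, "flops": 100_000_000},
]
print(rank_with_cost(candidates))  # [1, 0, 2]
```

Ranked on Synflow and NASWOT alone, the first candidate would win; once parameters and FLOPs enter the ranking, the slightly weaker but much cheaper second candidate moves to the top.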

## Training

**Training the model obtained with random search**

In [None]:
!python /content/MLDL23_NAS_project/src/run_train.py \
    --model "Model_random_search.pth" \
    --root_data "COCOdataset/all2017" \
    --ann_train "visualwakewords/instances_train.json" \
    --ann_val "visualwakewords/instances_val.json" \
    --batch_size 256 \
    --learning_rate 0.01 \
    --momentum 0.9 \
    --epochs 40 \
    --weight_decay 0.0001


loading annotations into memory...
Done (t=4.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.59s)
creating index...
index created!
training epoch number 0.00 of total epochs of 40.00
100% 361/361 [04:07<00:00,  1.46it/s, training with Current loss: 0.0026, Accuracy: 58.8655, at iteration: 360.0]
100% 91/91 [01:03<00:00,  1.43it/s, validation with Current loss: 0.0027, Accuracy: 62.2217, at iteration: 90.0]
Epoch: 1
	 Training loss 0.00260, Training accuracy 58.87
	 Validation loss 0.00272, Validation accuracy 62.22
training epoch number 1.00 of total epochs of 40.00
100% 361/361 [04:07<00:00,  1.46it/s, training with Current loss: 0.0024, Accuracy: 67.1631, at iteration: 360.0]
100% 91/91 [01:03<00:00,  1.44it/s, validation with Current loss: 0.0027, Accuracy: 66.4743, at iteration: 90.0]
Epoch: 2
	 Training loss 0.00237, Training accuracy 67.16
	 Validation loss 0.00267, Validation accuracy 66.47
training epoch number 2.00 of total epochs of 40.00
10
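
The `--learning_rate`, `--momentum`, and `--weight_decay` flags suggest stochastic gradient descent with momentum, although the optimizer itself is not shown in this notebook. A minimal sketch of one update step under that assumption, written in the same form PyTorch's `torch.optim.SGD` uses (weight decay added to the gradient, then a momentum buffer); the function name and the numbers are illustrative:

```python
def sgd_step(weights, grads, velocity, lr, momentum, weight_decay):
    """One SGD step with L2 weight decay and momentum; returns new lists."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w  # weight decay acts as an extra gradient term
        v = momentum * v + g      # momentum buffer accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


# Toy example with a single weight and the hyperparameters of the run above.
weights, velocity = [1.0], [0.0]
for step in range(3):
    weights, velocity = sgd_step(
        weights, grads=[0.5], velocity=velocity,
        lr=0.01, momentum=0.9, weight_decay=0.0001,
    )
    print(step, weights[0], velocity[0])
```

With a constant gradient, the momentum buffer grows from step to step, so the effective step size becomes larger than `lr` times the raw gradient.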

**Training the model obtained with the evolutionary algorithm**

In [14]:
!python /content/MLDL23_NAS_project/src/run_train.py \
    --model "Model_ea_search.pth" \
    --root_data "COCOdataset/all2017" \
    --ann_train "visualwakewords/instances_train.json" \
    --ann_val "visualwakewords/instances_val.json" \
    --batch_size 256 \
    --learning_rate 0.1 \
    --momentum 0.9 \
    --epochs 40 \
    --weight_decay 0.0001


loading annotations into memory...
Done (t=4.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
training epoch number 0.00 of total epochs of 40.00
100% 361/361 [04:07<00:00,  1.46it/s, training with Current loss: 0.0027, Accuracy: 53.6400, at iteration: 360.0]
100% 91/91 [01:03<00:00,  1.44it/s, validation with Current loss: 0.0027, Accuracy: 56.5112, at iteration: 90.0]
Epoch: 1
	 Training loss 0.00270, Training accuracy 53.64
	 Validation loss 0.00269, Validation accuracy 56.51
training epoch number 1.00 of total epochs of 40.00
100% 361/361 [04:06<00:00,  1.46it/s, training with Current loss: 0.0026, Accuracy: 58.7766, at iteration: 360.0]
100% 91/91 [01:03<00:00,  1.43it/s, validation with Current loss: 0.0026, Accuracy: 60.1606, at iteration: 90.0]
Epoch: 2
	 Training loss 0.00263, Training accuracy 58.78
	 Validation loss 0.00263, Validation accuracy 60.16
training epoch number 2.00 of total epochs of 40.00
10

In [18]:
!python /content/MLDL23_NAS_project/src/run_search.py \
    --algo  MobileNetV2 \
    --save True

MobileNetV2 --- 
 params = 398690 flops = 72384968 

Synflow score: 1412040.777842849
NASWOT score: 453.49859619140625
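
For orientation, the NASWOT score reported above can be sketched from its published definition: each input in a mini-batch yields a binary code recording which ReLU units are active, a kernel matrix counts the units on which two inputs agree, and the score is the log of the absolute determinant of that matrix. The pure-Python version below is a sketch of that definition with hand-written toy codes, not the project's implementation, which would collect the codes from the network's ReLU layers:

```python
import math


def log_abs_det(matrix):
    """log|det(matrix)| via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in matrix]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return float("-inf")  # singular matrix
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n):
                m[row][k] -= factor * m[col][k]
    return total


def naswot_score(codes):
    """codes[i] is the binary ReLU activation pattern of input i."""
    n_units = len(codes[0])
    # Kernel entry = number of units minus the Hamming distance between two codes.
    kernel = [
        [n_units - sum(a != b for a, b in zip(ci, cj)) for cj in codes]
        for ci in codes
    ]
    return log_abs_det(kernel)


# Toy codes: the more the activation patterns differ, the higher the score.
print(naswot_score([[1, 0, 1, 0], [0, 1, 0, 1]]))  # fully different patterns
print(naswot_score([[1, 0, 1, 0], [1, 0, 1, 1]]))  # nearly identical patterns
```

A higher score means the untrained network already separates the inputs of a batch into distinct activation patterns, which is the property the metric uses as a proxy for trainability.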


In [20]:
!python /content/MLDL23_NAS_project/src/run_train.py \
    --model "Model_MobileNetV2.pth" \
    --root_data "COCOdataset/all2017" \
    --ann_train "visualwakewords/instances_train.json" \
    --ann_val "visualwakewords/instances_val.json" \
    --batch_size 512 \
    --learning_rate 0.04 \
    --momentum 0.9 \
    --epochs 40 \
    --weight_decay 0.00001

loading annotations into memory...
Done (t=4.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
training epoch number 0.00 of total epochs of 40.00
100% 181/181 [04:07<00:00,  1.37s/it, training with Current loss: 0.0015, Accuracy: 54.6174, at iteration: 180.0]
100% 46/46 [01:06<00:00,  1.44s/it, validation with Current loss: 0.0025, Accuracy: 52.1024, at iteration: 45.0]
Epoch: 1
	 Training loss 0.00152, Training accuracy 54.62
	 Validation loss 0.00252, Validation accuracy 52.10
training epoch number 1.00 of total epochs of 40.00
100% 181/181 [04:08<00:00,  1.37s/it, training with Current loss: 0.0013, Accuracy: 59.2799, at iteration: 180.0]
100% 46/46 [01:06<00:00,  1.45s/it, validation with Current loss: 0.0023, Accuracy: 60.9460, at iteration: 45.0]
Epoch: 2
	 Training loss 0.00135, Training accuracy 59.28
	 Validation loss 0.00226, Validation accuracy 60.95
training epoch number 2.00 of total epochs of 40.00
10