# Deep Learning Miniproject
## Model training and testing for Google Colab

A live version of this notebook is available [here](https://colab.research.google.com/drive/1vWbWfP1b4aTkLgKY-1Nas5mSjwMp_wBL?usp=sharing).

This notebook shows how to train and test a model using the scripts provided in the repository. The output saved below comes from the training and testing runs that produced our final results. Training takes approximately 4-5 hours with a high-memory GPU session.

### Step 0: Verify GPU

Before running any scripts, verify that a GPU is attached to the session. At this time, Colab typically serves 16GB Tesla T4 instances.

In [None]:
!nvidia-smi

Mon Nov 21 14:48:05 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
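
If you would rather check for the GPU programmatically than read the table, a minimal sketch along the following lines may help. The helper name `list_gpus` is our own and is not part of the repository; it only wraps the `nvidia-smi` query interface and returns an empty list on a CPU-only runtime.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # nvidia-smi is absent on CPU-only runtimes, so check for it first.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually means the driver cannot see a device.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU visible; switch the Colab runtime type to GPU.")
```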

### Step 1: Mount Google Drive

The scripts below save checkpoints and logs to a mounted Google Drive so that they survive a dropped session. Colab has a habit of disconnecting while you are making coffee.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Step 2: Clone repository

Delete any stale copy of the code and clone the current repository.

In [None]:
!rm -rf /content/cautious-fiesta && git clone https://github.com/cprimel/cautious-fiesta.git

Cloning into 'cautious-fiesta'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 386 (delta 19), reused 28 (delta 10), pack-reused 347
Receiving objects: 100% (386/386), 25.53 MiB | 17.61 MiB/s, done.
Resolving deltas: 100% (205/205), done.
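
The shell one-liner above deletes `/content/cautious-fiesta` but clones into the current working directory, so it assumes the notebook is running from `/content` (the Colab default). A rough Python equivalent that makes the destination explicit could look like the sketch below. The function name `fresh_clone` and its `run` parameter are hypothetical and exist only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def fresh_clone(repo_url, dest, run=subprocess.run):
    """Delete any existing checkout at dest, then clone repo_url into it."""
    dest = Path(dest)
    # ignore_errors=True makes this a no-op when dest does not exist,
    # which mirrors the behaviour of `rm -rf`.
    shutil.rmtree(dest, ignore_errors=True)
    # check=True raises CalledProcessError if git exits with an error.
    run(["git", "clone", repo_url, str(dest)], check=True)
    return dest


# Example call (performs a real clone, so it is left commented out):
# fresh_clone("https://github.com/cprimel/cautious-fiesta.git",
#             "/content/cautious-fiesta")
```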


### Step 3: Train model

We can now run the training script. The `log-dir` and `checkpoint-dir` arguments may need to be changed depending on whether you mounted Google Drive.
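
One way to choose those two directories is sketched here, assuming Drive mounts at `/content/drive` as in Step 1. The helper `resolve_output_dirs` and the local fallback root `/content` are our own illustration, not something the training script provides.

```python
import os


def resolve_output_dirs(drive_root="/content/drive/MyDrive",
                        fallback_root="/content",
                        project="miniproject"):
    """Pick log and checkpoint directories, preferring the mounted Drive."""
    # Use Drive when it is mounted so outputs survive a disconnect;
    # otherwise fall back to the session's local disk.
    root = drive_root if os.path.isdir(drive_root) else fallback_root
    log_dir = os.path.join(root, project, "logs")
    ckpt_dir = os.path.join(root, project, "ckpts")
    os.makedirs(log_dir, exist_ok=True)
    os.makedirs(ckpt_dir, exist_ok=True)
    return log_dir, ckpt_dir


# Example call in a Colab session:
# log_dir, ckpt_dir = resolve_output_dirs()
```

The returned paths can then be passed as `--log-dir` and `--checkpoint-dir`. Keep in mind that anything written under the fallback root is lost when the session ends.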

In [None]:
!cd cautious-fiesta && python train.py --config experiments/convmixer256_8_k5_p2_03.yml --log-dir=/content/drive/MyDrive/miniproject/logs --checkpoint-dir=/content/drive/MyDrive/miniproject/ckpts --batch-size=512 --experiment=convmixer256_8_k5_p2_05 --epochs=200

Preparing experiment convmixer256_8_k5_p2_05...
convmixer256_8_k5_p2 created, # of params: 594,186.
Files already downloaded and verified
Epoch 1 complete:
	Train Acc: 0.33
	Test Acc: 0.52
	lr: 0.00203
	Time: 88.8s
Epoch 2 complete:
	Train Acc: 0.45
	Test Acc: 0.65
	lr: 0.00213
	Time: 89.7s
Accuracy increased (0.52 -> 0.65). Saving model...
Epoch 3 complete:
	Train Acc: 0.50
	Test Acc: 0.72
	lr: 0.00229
	Time: 90.9s
Accuracy increased (0.65 -> 0.72). Saving model...
Epoch 4 complete:
	Train Acc: 0.55
	Test Acc: 0.73
	lr: 0.00252
	Time: 90.2s
Accuracy increased (0.72 -> 0.73). Saving model...
Epoch 5 complete:
	Train Acc: 0.60
	Test Acc: 0.78
	lr: 0.00281
	Time: 90.0s
Accuracy increased (0.73 -> 0.78). Saving model...
Epoch 6 complete:
	Train Acc: 0.62
	Test Acc: 0.81
	lr: 0.00317
	Time: 90.1s
Accuracy increased (0.78 -> 0.81). Saving model...
Epoch 7 complete:
	Train Acc: 0.61
	Test Acc: 0.81
	lr: 0.00359
	Time: 90.4s
Accuracy increased (0.81 -> 0.81). Saving model...
Epoch 8 complete:
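
To sanity-check the expected run time from a log like the one above, the per-epoch blocks can be parsed and the mean epoch time extrapolated. This is a sketch only: the regular expression assumes the exact field order printed above, and `parse_epochs` and `estimate_hours` are hypothetical helper names that are not part of the repository.

```python
import re

# Matches one complete "Epoch N complete:" block in the format shown above.
EPOCH_RE = re.compile(
    r"Epoch (\d+) complete:\s*"
    r"Train Acc: ([\d.]+)\s*"
    r"Test Acc: ([\d.]+)\s*"
    r"lr: ([\d.]+)\s*"
    r"Time: ([\d.]+)s"
)


def parse_epochs(log_text):
    """Return one dict per fully logged epoch; incomplete blocks are skipped."""
    return [
        {
            "epoch": int(epoch),
            "train_acc": float(train_acc),
            "test_acc": float(test_acc),
            "lr": float(lr),
            "seconds": float(seconds),
        }
        for epoch, train_acc, test_acc, lr, seconds in EPOCH_RE.findall(log_text)
    ]


def estimate_hours(epochs, total_epochs):
    """Extrapolate total training time in hours from the mean epoch time."""
    mean_seconds = sum(e["seconds"] for e in epochs) / len(epochs)
    return mean_seconds * total_epochs / 3600


sample = (
    "Epoch 1 complete:\n\tTrain Acc: 0.33\n\tTest Acc: 0.52\n"
    "\tlr: 0.00203\n\tTime: 88.8s\n"
    "Epoch 2 complete:\n\tTrain Acc: 0.45\n\tTest Acc: 0.65\n"
    "\tlr: 0.00213\n\tTime: 89.7s\n"
)
rows = parse_epochs(sample)
print(f"Estimated time for 200 epochs: {estimate_hours(rows, 200):.1f} h")
```

At roughly 90 seconds per epoch, 200 epochs come to about five hours.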

### Step 5: Evaluate model on test data

Colab has likely disconnected while you were away. If so, redo Steps 0-2. Then find the latest checkpoint saved in the experiment's directory inside the checkpoint folder. The training script saves the model whenever accuracy increases, so this will be the checkpoint with the highest validation accuracy. Copy its path below and supply the command-line arguments that fit your case.
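
Locating that file by hand gets tedious when the folder holds many checkpoints. The sketch below picks the most recently modified `.pt` file in a directory. It assumes file modification time reflects save order, and `latest_checkpoint` is a hypothetical helper rather than part of the repository.

```python
from pathlib import Path


def latest_checkpoint(ckpt_dir, pattern="*.pt"):
    """Return the most recently modified checkpoint in ckpt_dir, or None."""
    candidates = list(Path(ckpt_dir).glob(pattern))
    if not candidates:
        return None
    # st_mtime is the last-modified timestamp in seconds since the epoch.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example call in a Colab session with Drive mounted:
# print(latest_checkpoint(
#     "/content/drive/MyDrive/miniproject/ckpts/convmixer256_8_k5_p2_05"))
```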

In [None]:
!cd cautious-fiesta && python test.py --model=convmixer256_8_k5_p2 --experiment=convmixer256_8_k5_p2_05 --checkpoint=/content/drive/MyDrive/miniproject/ckpts/convmixer256_8_k5_p2_05/convmixer256_8_k5_p2_191_1669012398.121431.pt --logs=/content/drive/MyDrive/miniproject/logs

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to .data/cifar-10-python.tar.gz
100% 170498071/170498071 [00:13<00:00, 12533631.85it/s]
Extracting .data/cifar-10-python.tar.gz to .data
Test: [10/40     Acc:  0.949     Time: 0.0035
Test: [20/40     Acc:  0.948     Time: 0.0033
Test: [30/40     Acc:  0.950     Time: 0.0031
Test: [40/40     Acc:  0.950     Time: 0.0080
Results:
	Test Acc: 0.950
