## Check which GPU you got
Open the Runtime menu at the top of the page, choose Change runtime type, and confirm that the hardware accelerator is set to GPU.

Run !nvidia-smi and check its output to confirm that you have been allocated a Tesla P100.

In [1]:
!nvidia-smi

Fri May  1 22:28:43 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru
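
The same check can be done from Python, which is handy when a later cell should fail early if no GPU is attached. The sketch below is only an illustration: the helper name `allocated_gpus` is hypothetical and not part of this project. It relies on the `--query-gpu` and `--format` options of `nvidia-smi`.

```python
import shutil
import subprocess


def allocated_gpus():
    """Return one 'name, total memory' string per visible NVIDIA GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # the NVIDIA driver tools are not installed on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but cannot talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = allocated_gpus()
print(gpus if gpus else "No GPU allocated; check the runtime type")
```

On a CPU-only runtime the function returns an empty list instead of raising, so the cell can be run safely before the runtime type has been changed.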

## Prerequisites
Mount the source code from Google Drive, install PyTorch, and download the NLTK data.

In [16]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

import os
os.chdir("/content/drive/My Drive/Language-Detector/src")
!ls

!pip3 install torch torchvision

import nltk
nltk.download('all')


Mounted at /content/drive
build-vocabs.py  helper.py    model	  __pycache__  train.py
dataloader	 __init__.py  predict.py  test.py
[nltk_data] Downloading collection 'all'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to /root/nltk_data...
[nltk_data]    |   Package abc is already up-to-date!
[nltk_data]    | Downloading package alpino to /root/nltk_data...
[nltk_data]    |   Package alpino is already up-to-date!
[nltk_data]    | Downloading package biocreative_ppi to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Package biocreative_ppi is already up-to-date!
[nltk_data]    | Downloading package brown to /root/nltk_data...
[nltk_data]    |   Package brown is already up-to-date!
[nltk_data]    | Downloading package brown_tei to /root/nltk_data...
[nltk_data]    |   Package brown_tei is already up-to-date!
[nltk_data]    | Downloading package cess_cat to /root/nltk_data...
[nltk_data]    |   Package cess_cat is already up-to-date!
[nltk_data]    | Downloading pa

True

## Build the Vocabs
Build the English (en) and French (fr) vocabulary from the Multi30k training data, using a minimum token frequency of 2.

In [18]:
%%shell
DATASET=Multi30k
TRAIN=../data/${DATASET}/Training/
TEST=../data/${DATASET}/Testing/

echo "Making the vocabulary"

python3 build-vocabs.py "${TRAIN}" "../models/${DATASET}/vocab.gz" --langs en fr --min-frequency 2

echo "Finished making the vocabulary"

Making the vocabulary
Building en vocab from 1 transcriptions
100% 29461/29461 [00:04<00:00, 7035.88it/s]
Built 5933 vocabs
Building fr vocab from 1 transcriptions
100% 29461/29461 [00:04<00:00, 6101.77it/s]
Built 6743 vocabs
Built 11814 vocabs
Finished making the vocabulary
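
As a rough illustration of what a minimum-frequency vocabulary means, here is a minimal sketch. It is not the code in build-vocabs.py, whose internals are not shown here. The function `build_vocab`, the whitespace tokenization, and the toy sentences are all assumptions made for the example. It also shows why a combined vocabulary can be smaller than the sum of the per-language ones, as in the log above (5933 + 6743 = 12676, but 11814 combined), which would be consistent with tokens shared by both languages being stored once.

```python
from collections import Counter


def build_vocab(sentences, min_frequency=2):
    """Return the set of whitespace tokens seen at least min_frequency times."""
    counts = Counter(
        token for sentence in sentences for token in sentence.lower().split()
    )
    return {token for token, count in counts.items() if count >= min_frequency}


# Toy data for illustration only, not taken from Multi30k.
en = ["a man rides a bike", "a dog runs", "the man runs"]
fr = ["un homme fait du velo", "a paris un chien court", "un homme court a paris"]

en_vocab = build_vocab(en)
fr_vocab = build_vocab(fr)
combined = en_vocab | fr_vocab  # set union keeps shared tokens once

print(len(en_vocab), len(fr_vocab), len(combined))
```

In the toy data the token "a" passes the threshold in both languages, so the union has one entry fewer than the two vocabularies added together.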




## Train
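Train the attention model on the English and French data. This run uses a batch size of 32, a train/validation ratio of 0.75, and a patience of 3, and it resumes from and saves to checkpoint.pt.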

In [19]:
%%shell
DATASET=Multi30k
TRAIN=../data/${DATASET}/Training/
TEST=../data/${DATASET}/Testing/

echo "Training the model"

python3 train.py "${TRAIN}" \
    "../models/${DATASET}/vocab.gz" \
    "../models/${DATASET}/model.pt" \
    --langs en fr \
    --patience 3 \
    --train-val-ratio 0.75 \
    --batch-size 32 \
    --seed 1 \
    --device cuda \
    --resume-from-checkpoint "../models/${DATASET}/checkpoint.pt" \
    --save-checkpoint-to "../models/${DATASET}/checkpoint.pt"

echo "Finished training"

Training the model
Loaded 11814 words
100% 29461/29461 [00:04<00:00, 7140.50it/s]
100% 29461/29461 [00:04<00:00, 6253.86it/s]
100% 1381/1381 [00:08<00:00, 162.08it/s]
100% 461/461 [00:02<00:00, 183.89it/s]
Epoch=1 Train-Loss=0.01789521402010994 Train-Acc=0.994637038377987 Test-Loss=3.66725477076478e-09 Test-Acc=1.0 Num-Poor=0
Saved checkpoint
100% 1381/1381 [00:08<00:00, 160.45it/s]
100% 461/461 [00:02<00:00, 176.68it/s]
Epoch=2 Train-Loss=6.8475759983770676e-06 Train-Acc=1.0 Test-Loss=6.58225212103817e-10 Test-Acc=1.0 Num-Poor=0
Saved checkpoint
100% 1381/1381 [00:08<00:00, 158.02it/s]
100% 461/461 [00:02<00:00, 178.80it/s]
Epoch=3 Train-Loss=2.162756262094883e-06 Train-Acc=1.0 Test-Loss=2.820965326842712e-10 Test-Acc=1.0 Num-Poor=0
Saved checkpoint
100% 1381/1381 [00:08<00:00, 157.18it/s]
100% 461/461 [00:02<00:00, 184.37it/s]
Epoch=4 Train-Loss=8.5039107579692e-07 Train-Acc=1.0 Test-Loss=9.403217756142372e-11 Test-Acc=1.0 Num-Poor=0
Saved checkpoint
100% 1381/1381 [00:08<00:00, 156.
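
The `--patience 3` flag and the `Num-Poor` counter in the log suggest patience-based early stopping. The sketch below shows that general technique under the assumption that `Num-Poor` counts consecutive epochs without an improvement in validation loss. It is not taken from train.py, and the function name and loss values are made up for the example.

```python
def train_with_patience(val_losses, patience=3):
    """Walk through per-epoch validation losses and stop after `patience`
    consecutive epochs without improvement. Returns (epochs_run, best_loss)."""
    best_loss = float("inf")
    num_poor = 0      # consecutive epochs that failed to beat best_loss
    epochs_run = 0
    for loss in val_losses:
        epochs_run += 1
        if loss < best_loss:
            best_loss = loss
            num_poor = 0   # improvement: reset the counter
        else:
            num_poor += 1
        if num_poor >= patience:
            break          # give up: no improvement for `patience` epochs
    return epochs_run, best_loss


# Hypothetical losses: improvement stops after the second epoch.
losses = [0.9, 0.5, 0.6, 0.55, 0.7, 0.4]
print(train_with_patience(losses, patience=3))
```

With these made-up losses, training stops at epoch 5 and never reaches the better value at epoch 6, which is the usual trade-off of a small patience.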



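## Test
Evaluate the saved model. Note that this cell passes ${TRAIN} to test.py and never uses ${TEST}, so the loss and accuracy below appear to be measured on the training data; pass "${TEST}" instead to evaluate on the held-out set.
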
In [20]:
%%shell
DATASET=Multi30k
TRAIN=../data/${DATASET}/Training/
TEST=../data/${DATASET}/Testing/

echo "Testing the model"

python3 test.py "${TRAIN}" \
    "../models/${DATASET}/vocab.gz" \
    "../models/${DATASET}/model.pt" \
    --langs en fr \
    --batch-size 32 \
    --seed 1 \
    --device cuda

echo "Finished testing"

Testing the model
Loaded 11814 words
100% 29461/29461 [00:03<00:00, 7422.72it/s]
100% 29461/29461 [00:04<00:00, 6357.60it/s]
100% 1842/1842 [00:08<00:00, 205.93it/s]
Test loss=0.0, Test Accuracy=1.0
Finished testing
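
The batch counts in the progress bars can be used to sanity-check which data a cell actually processed. The arithmetic below is a sketch under two assumptions that the logs do not confirm: each of the 29461 sentences yields one sample per language, and the train/validation split truncates to a whole number of samples.

```python
import math

sentences_per_language = 29461  # from the progress bars in the logs
languages = 2                   # --langs en fr
batch_size = 32                 # --batch-size 32
train_ratio = 0.75              # --train-val-ratio 0.75

total = sentences_per_language * languages   # assumed: one sample per sentence per language
train_size = int(total * train_ratio)        # assumed: the split truncates
val_size = total - train_size

train_batches = math.ceil(train_size / batch_size)
val_batches = math.ceil(val_size / batch_size)
all_batches = math.ceil(total / batch_size)

print(train_batches, val_batches, all_batches)
```

Under these assumptions the numbers come out as 1381, 461, and 1842, matching the training, validation, and testing progress bars. That would be consistent with the test cell iterating over the whole training directory, which is what passing `${TRAIN}` to test.py would do.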


