## Check what GPU you got
Open the Runtime menu at the top of the page, choose Change runtime type, and confirm that the hardware accelerator is set to GPU.

Check the output of !nvidia-smi to make sure you've been allocated a Tesla P100.

In [2]:
!nvidia-smi

Sat May  2 16:43:02 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru
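
The same check can be done from Python, which is handy if a later cell should stop early when no GPU is attached. This is a minimal sketch and not part of the project: `parse_gpu_query` and `list_gpus` are hypothetical helper names, and the sketch assumes `nvidia-smi` is on the PATH and supports its `--query-gpu` CSV mode.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text):
    """Parse 'name, memory' CSV rows into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the first comma only; the name itself has no commas.
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)


gpus = list_gpus()
if not gpus:
    print("No GPU visible; check the runtime type")
for name, memory in gpus:
    print(name, memory)
```

An empty list means either that no GPU was allocated or that `nvidia-smi` is missing, so the runtime type is the first thing to re-check.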

## Pre-requisites
Mount the source code from Google Drive and set up NLTK

In [3]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

import os
os.chdir("/content/drive/My Drive/Language-Detector/src")
!ls

!pip3 install torch torchvision

import nltk
nltk.download('all')


Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive
build-vocabs.py  helper.py    model	  __pycache__  train.py
dataloader	 __init__.py  predict.py  test.py
[nltk_data] Downloading collection 'all'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/abc.zip.
[nltk_data]    | Downloading package alpino to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/alpino.zip.
[nltk_data]    | Downloading package biocreative_ppi to
[nltk_data]    |    

True
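
Before running the later cells, it can help to confirm that the working directory really contains the project scripts, since a wrong Drive path only shows up later as a confusing "file not found" error. This is a minimal sketch that assumes the file names shown in the `!ls` output above; `missing_files` and `EXPECTED` are hypothetical names, not part of the project.

```python
from pathlib import Path

# Script names taken from the `!ls` output above.
EXPECTED = ["build-vocabs.py", "train.py", "test.py", "predict.py"]


def missing_files(src_dir, expected=EXPECTED):
    """Return the names in `expected` that are not present in `src_dir`."""
    root = Path(src_dir)
    return [name for name in expected if not (root / name).exists()]


missing = missing_files(".")
if missing:
    print("Missing from the working directory:", ", ".join(missing))
else:
    print("All expected scripts found")
```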

## Build the Vocabs
Build the vocabulary from the training dataset

In [0]:
%%shell
DATASET=Hansard-Multi30k
TRAIN=../data/${DATASET}/Training/
TEST=../data/${DATASET}/Testing/

echo "Making the vocabulary"

python3 build-vocabs.py "${TRAIN}" "../models/${DATASET}/vocab.gz" --langs en fr --min-frequency 2

echo "Finished making the vocabulary"

Making the vocabulary
Building en vocab from 421 transcriptions
100% 1082745/1082745 [03:09<00:00, 5717.27it/s]
Built 44673 vocabs
Building fr vocab from 426 transcriptions
100% 1101309/1101309 [03:38<00:00, 5034.36it/s]
Built 64896 vocabs
Built 97250 vocabs
Finished making the vocabulary
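
The log reports 44673 en entries and 64896 fr entries, yet 97250 combined. That is 12319 fewer than the sum (109569), which is consistent with tokens that occur in both languages being counted once. The sketch below illustrates that idea under stated assumptions: whitespace tokenization, lowercasing, and `--min-frequency 2` meaning "keep tokens seen at least twice". The actual logic in `build-vocabs.py` is not shown here and may differ; `build_vocab` is a hypothetical name.

```python
from collections import Counter


def build_vocab(sentences, min_frequency=2):
    """Keep tokens that occur at least `min_frequency` times."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return {tok for tok, n in counts.items() if n >= min_frequency}


# Toy corpora; "table" is spelled the same in both languages.
en = build_vocab(["the table is red", "the table is big"])
fr = build_vocab(["la table est rouge", "la table est grande"])

# A set union counts the shared token once.
combined = en | fr
print(len(en), len(fr), len(combined))
```

Each toy vocabulary keeps three tokens, and the union holds five because `table` is shared.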




## Train
Train the model with attention

In [0]:
%%shell
DATASET=Hansard-Multi30k
TRAIN=../data/${DATASET}/Training/
TEST=../data/${DATASET}/Testing/

echo "Training the model"

python3 train.py "${TRAIN}" \
    "../models/${DATASET}/vocab.gz" \
    "../models/${DATASET}/model.pt" \
    --langs en fr \
    --patience 3 \
    --train-val-ratio 0.75 \
    --batch-size 32 \
    --seed 1 \
    --device cuda \
    --resume-from-checkpoint "../models/${DATASET}/checkpoint.pt" \
    --save-checkpoint-to "../models/${DATASET}/checkpoint.pt"

echo "Finished training"

Training the model
Loaded 97250 words
100% 1082745/1082745 [04:18<00:00, 4188.05it/s]
100% 1101309/1101309 [04:54<00:00, 3744.32it/s]
100% 51189/51189 [19:27<00:00, 43.86it/s]
100% 17063/17063 [05:37<00:00, 50.54it/s]
Epoch=1 Train-Loss=0.012346373213826853 Train-Acc=0.9932053273164156 Test-Loss=0.010792660666332204 Test-Acc=0.9935159568504444 Num-Poor=0
Saved checkpoint
100% 51189/51189 [17:57<00:00, 47.50it/s]
100% 17063/17063 [05:35<00:00, 50.85it/s]
Epoch=2 Train-Loss=0.010500692987510017 Train-Acc=0.9936863877004826 Test-Loss=0.010669575334231952 Test-Acc=0.9935789427416046 Num-Poor=0
Saved checkpoint
100% 51189/51189 [18:16<00:00, 46.69it/s]
100% 17063/17063 [05:43<00:00, 49.71it/s]
Epoch=3 Train-Loss=0.010185857433744587 Train-Acc=0.9905442331360254 Test-Loss=0.010693859521125476 Test-Acc=0.9935423137783508 Num-Poor=0
Saved checkpoint
100% 51189/51189 [18:15<00:00, 46.74it/s]
100% 17063/17063 [05:44<00:00, 49.59it/s]
Epoch=4 Train-Loss=0.010166060295866823 Train-Acc=0.9932004434
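
The per-epoch lines use a simple `key=value` layout, so they are easy to parse when comparing epochs. This is a minimal sketch that assumes the format seen in the output above; `parse_epoch_line` and `best_epoch` are hypothetical helpers, not part of `train.py`. Truncated lines, like the last one above, are skipped when they lack the requested metric.

```python
def parse_epoch_line(line):
    """Turn 'Epoch=1 Train-Loss=0.01 ...' into a dict of numbers."""
    record = {}
    for field in line.split():
        key, sep, value = field.partition("=")
        if not sep:
            continue  # not a key=value field
        try:
            is_count = key in ("Epoch", "Num-Poor")
            record[key] = int(value) if is_count else float(value)
        except ValueError:
            continue  # value was cut off or malformed
    return record


def best_epoch(lines, metric="Test-Loss"):
    """Return the parsed record with the lowest `metric`, or None."""
    records = [parse_epoch_line(l) for l in lines if l.startswith("Epoch=")]
    complete = [r for r in records if metric in r]
    return min(complete, key=lambda r: r[metric]) if complete else None


log = [
    "Epoch=1 Train-Loss=0.012346373213826853 Test-Loss=0.010792660666332204",
    "Saved checkpoint",
    "Epoch=2 Train-Loss=0.010500692987510017 Test-Loss=0.010669575334231952",
    "Epoch=3 Train-Loss=0.010185857433744587 Test-Loss=0.010693859521125476",
]
print(best_epoch(log)["Epoch"])
```

On the three complete epochs shown above, the lowest Test-Loss is at epoch 2.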

## Test
Evaluate the trained model

In [0]:
%%shell
DATASET=Hansard-Multi30k
TRAIN=../data/${DATASET}/Training/
TEST=../data/${DATASET}/Testing/

echo "Testing the model"

python3 test.py "${TEST}" \
    "../models/${DATASET}/vocab.gz" \
    "../models/${DATASET}/model.pt" \
    --langs en fr \
    --batch-size 32 \
    --seed 1 \
    --device cuda

echo "Finished testing"

Testing the model
Loaded 97250 words
