### Environment Setup
Install the required packages:
```
pip install torch torchvision --user
pip install tqdm --user
pip install git+https://github.com/tagucci/pythonrouge.git
```

If installing `pythonrouge` fails with a `perl` error reporting `non-zero exit status 2`, install the Perl XML parser:
```
apt-get install libxml-parser-perl
```



In [1]:
ls -lh

total 1012M
-rwxr-xr-x 1 ubuntu ubuntu  1.7M Mar 12 19:08 Convert_IA_data_to_jsonl.ipynb*
drwxr-xr-x 3 ubuntu ubuntu  4.0K Mar 12 19:11 data/
drwxr-xr-x 7 ubuntu ubuntu  4.0K Mar 12 19:09 dataset/
-rwxr-xr-x 1 ubuntu ubuntu   280 Mar 11 17:22 download.sh*
-rwxr-xr-x 1 ubuntu ubuntu  3.0K Mar 11 17:22 eval.py*
-rw-rw-r-- 1 ubuntu ubuntu 1007M Mar 12 19:00 ia_data_all.jsonl
-rwxr-xr-x 1 ubuntu ubuntu   603 Mar 11 17:22 LICENSE*
drwxr-xr-x 3 ubuntu ubuntu  4.0K Mar 12 19:18 model/
-rw------- 1 ubuntu ubuntu  619K Mar 14 12:42 nohup_Nan.out
-rw------- 1 ubuntu ubuntu  1.8M Mar 17 05:01 nohup.out
-rwxr-xr-x 1 ubuntu ubuntu  4.1K Mar 11 17:22 prepare_vocab.py*
drwxr-xr-x 2 ubuntu ubuntu  4.0K Mar 11 17:22 pretrained/
-rwxr-xr-x 1 ubuntu ubuntu  4.4K Mar 11 17:22 README.md*
-rwxr-xr-x 1 ubuntu ubuntu  2.9K Mar 11 17:22 run.py*
drwxrwxr-x 3 ubun

In [3]:
import torch

#### Check GPU Presence

In [None]:
if torch.cuda.is_available():
    ind = torch.cuda.current_device()  # index of the currently selected GPU
    print("# of GPUs:  {}".format(torch.cuda.device_count()))
    print("GPU index:  {}".format(ind))
    print("GPU name:   {}".format(torch.cuda.get_device_name(ind)))
else:
    print("No CUDA device available")

#### Create Word Embeddings and Vocabulary
- Create a folder called `ia-patients` under `dataset`
- Place the converted `jsonl` data in this folder
- Three files are required: `train.jsonl`, `dev.jsonl`, `test.jsonl`
- More training parameters are available in `train.py`
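
The three files listed above can be produced from a single converted JSONL file. The following is a minimal sketch, not the project's own conversion code: `split_jsonl` is a hypothetical helper, and the split fractions and seed are illustrative assumptions. It reads the whole file into memory, so a very large input may need a streaming variant instead.

```python
import os
import random


def split_jsonl(src_path, out_dir, dev_frac=0.15, test_frac=0.15, seed=1234):
    """Shuffle the records of a JSONL file and write train/dev/test splits.

    Hypothetical helper: the fractions and seed are assumptions, not values
    taken from the project.
    """
    with open(src_path, encoding="utf-8") as f:
        # Drop blank lines and normalise line endings so records never merge.
        lines = [line.rstrip("\n") + "\n" for line in f if line.strip()]

    random.Random(seed).shuffle(lines)

    n_dev = int(len(lines) * dev_frac)
    n_test = int(len(lines) * test_frac)
    splits = {
        "dev": lines[:n_dev],
        "test": lines[n_dev:n_dev + n_test],
        "train": lines[n_dev + n_test:],
    }

    os.makedirs(out_dir, exist_ok=True)
    for name, chunk in splits.items():
        out_path = os.path.join(out_dir, name + ".jsonl")
        with open(out_path, "w", encoding="utf-8") as out:
            out.writelines(chunk)

    # Return the number of records written to each split.
    return {name: len(chunk) for name, chunk in splits.items()}
```

For example, `split_jsonl("ia_data_all.jsonl", "dataset/ia-patients")` would write the three files into the folder described above, assuming `ia_data_all.jsonl` is the converted source file.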

In [4]:
%run prepare_vocab.py dataset/ia-patients/ dataset/vocab --glove_dir dataset/glove

Directory dataset/vocab do not exist; creating...
loading files...
67106584 tokens from 421862 examples loaded from dataset/ia-patients//train.jsonl.
14384492 tokens from 90399 examples loaded from dataset/ia-patients//dev.jsonl.
14371443 tokens from 90398 examples loaded from dataset/ia-patients//test.jsonl.
loading glove...
1 words loaded from glove.
building vocab...
vocab built with 5/453276 words.
calculating oov...
train oov: 67106584/67106584 (100.00%)
dev oov: 14384492/14384492 (100.00%)
test oov: 14371443/14371443 (100.00%)
building embeddings...
embedding size: 5 x 100
dumping to files...
all done.
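
The `oov` lines in the log above report the share of tokens that are missing from the vocabulary. A rate of 100% on every split is consistent with only one word having been loaded from GloVe, which suggests the vectors in `dataset/glove` were not read as expected. The sketch below shows how such a figure can be computed; it is an illustration under stated assumptions, not the code in `prepare_vocab.py`, and `oov_rate` is a hypothetical name.

```python
from collections import Counter


def oov_rate(token_counts, vocab):
    """Return (oov_tokens, total_tokens, percent) for a Counter of tokens."""
    total = sum(token_counts.values())
    # Count every occurrence of a token that is absent from the vocabulary.
    oov = sum(count for token, count in token_counts.items() if token not in vocab)
    percent = 100.0 * oov / total if total else 0.0
    return oov, total, percent


# Toy example with made-up text and vocabulary.
counts = Counter("the patient reports chest pain the patient denies fever".split())
vocab = {"the", "patient", "pain"}
oov, total, pct = oov_rate(counts, vocab)
print("oov: {}/{} ({:.2f}%)".format(oov, total, pct))  # oov: 4/9 (44.44%)
```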


#### Train Model

In [2]:
%run train.py --id IA_Model_v1 --data_dir dataset/ia-patients/ --batch_size 10 --num_epoch 5

Vocab size 417194 loaded from file
Loading data from dataset/ia-patients/ with batch size 10...
14930 batches created for dataset/ia-patients//trainu.jsonl.
1867 batches created for dataset/ia-patients//devu.jsonl.
Config saved to file ./saved_models/IA_Model_v1/config.json
Overwriting old vocab file at ./saved_models/IA_Model_v1/vocab.pkl

Running with the following configs:
	data_dir : dataset/ia-patients/
	vocab_dir : dataset/vocab
	hidden_dim : 150
	emb_dim : 300
	num_layers : 2
	emb_dropout : 0.5
	dropout : 0.5
	lower : True
	max_dec_len : 80
	beam_size : 5
	top : 1000000
	train_data : trainu
	dev_data : devu
	attn_type : mlp
	cov : False
	cov_alpha : 0
	cov_loss_epoch : 0
	background : False
	concat_background : False
	use_bleu : False
	sample_train : 1.0
	lr : 0.001
	lr_decay : 0.9
	decay_epoch : 30
	optim : adam
	num_epoch : 5
	batch_size : 10
	max_grad_norm : 5.0
	log_step : 20
	log : logs.txt
	save_dir : ./saved_models
	id : IA_Model_v1
	info : 
	seed : 1234
	cuda : True
	cpu

RuntimeError: CUDA out of memory. Tried to allocate 1.77 GiB (GPU 0; 15.75 GiB total capacity; 13.93 GiB already allocated; 524.94 MiB free; 344.96 MiB cached)

In [2]:
!pip install git+https://github.com/tagucci/pythonrouge.git

Collecting git+https://github.com/tagucci/pythonrouge.git
  Cloning https://github.com/tagucci/pythonrouge.git to /tmp/pip-req-build-wo3fnghr
Building wheels for collected packages: pythonrouge
  Running setup.py bdist_wheel for pythonrouge ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-jaelxltv/wheels/fd/ff/be/6716935d513fa8656ab185cb0aa70aed382b72dda42bf09c95
Successfully built pythonrouge
sagemaker 1.18.5 has requirement requests<2.21,>=2.20.0, but you'll have requests 2.21.0 which is incompatible.
docker-compose 1.23.2 has requirement requests!=2.11.0,!=2.12.2,!=2.18.0,<2.21,>=2.6.1, but you'll have requests 2.21.0 which is incompatible.
Installing collected packages: pythonrouge
Successfully installed pythonrouge-0.2
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [1]:
!nvidia-smi

Tue Apr 16 15:58:04 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P0    25W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage    

In [None]:
!kill -9 2480


In [None]:
!sudo kill -9 6154
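
Before killing a process by PID, it can help to confirm which processes are actually holding GPU memory. The sketch below is one possible way to do that from Python, assuming `nvidia-smi` is on the `PATH`; the helper names `parse_compute_apps` and `gpu_processes` are hypothetical and not part of this repository.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse `pid, process_name, used_memory` CSV rows into dicts."""
    procs = []
    for row in csv_text.strip().splitlines():
        parts = [p.strip() for p in row.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        mem = parts[-1]
        procs.append({
            "pid": int(parts[0]),
            # Rejoin the middle fields in case the process name has a comma.
            "name": ",".join(parts[1:-1]),
            # Memory is reported in MiB; it can be unavailable on some setups.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return procs


def gpu_processes():
    """Ask nvidia-smi which compute processes currently hold GPU memory."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_compute_apps(out)
```

Calling `gpu_processes()` on the GPU host returns one dictionary per process, so the PID passed to `kill` can be checked against the process name and memory usage first.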