### Environment Setup
Install the dependencies:
```
pip install torch torchvision --user
pip install tqdm --user
pip install git+https://github.com/tagucci/pythonrouge.git
```
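To confirm that the packages above were installed into the active environment, you can look each one up without importing it. This is a minimal sketch. It assumes `pythonrouge` is importable under that exact name, which the notes above do not state.

```python
import importlib.util

# Top-level module names to look for; "pythonrouge" is an assumed import name.
REQUIRED_MODULES = ["torch", "torchvision", "tqdm", "pythonrouge"]


def missing_modules(names):
    """Return the module names that cannot be found (nothing is actually imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```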

If `pythonrouge` fails with a `perl` error reporting `non-zero exit status 2`, install the Perl XML parser:
```
sudo apt-get install libxml-parser-perl
```
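To check the Perl fix without rerunning pip, the sketch below asks Perl to load `XML::Parser`, which is the module `libxml-parser-perl` provides. It assumes the `non-zero exit status 2` error comes from the ROUGE Perl script wrapped by `pythonrouge` failing to load that module. Other missing Perl modules could produce a similar error. `perl_has_xml_parser` is a hypothetical helper name.

```python
import shutil
import subprocess


def perl_has_xml_parser():
    """Return True if `perl -MXML::Parser -e 1` exits cleanly, else False."""
    if shutil.which("perl") is None:
        return False  # Perl itself is not on PATH
    result = subprocess.run(
        ["perl", "-MXML::Parser", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


print("XML::Parser available:", perl_has_xml_parser())
```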



In [1]:
ls -lh

total 1.7M
-rwxr-xr-x 1 ubuntu ubuntu 1.7M May 10 19:01 Convert_IA_data_to_jsonl.ipynb*
-rw-rw-r-- 1 ubuntu ubuntu  19K May 27 02:12 train_IA_model_dc.ipynb
-rwxr-xr-x 1 ubuntu ubuntu  19K Apr 16 15:58 train_IA_model.ipynb*


In [1]:
import torch

#### Check GPU Presence

In [2]:
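ind = torch.cuda.current_device()  # index of the currently selected GPU
torch.cuda.set_device(ind)  # torch.cuda.device(ind) only takes effect inside a `with` block
print("# of GPUs:  {}".format(torch.cuda.device_count()))
print("GPU index:  {}".format(ind))
print("GPU name:   {}".format(torch.cuda.get_device_name(ind)))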

# of GPUs:  1
GPU index:  0
GPU name:   Tesla V100-SXM2-16GB
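The cell above calls `torch.cuda.current_device()`, which can raise on a machine with no visible GPU. Below is a defensive variant. It lists devices only when CUDA is available and assumes that an empty result is an acceptable signal to fall back to the CPU. `describe_gpus` is a hypothetical helper name.

```python
def describe_gpus():
    """Return [(index, name), ...] for visible CUDA devices.

    Returns [] when PyTorch is not installed or CUDA is unavailable,
    instead of raising on CPU-only hosts.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [(i, torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]


gpus = describe_gpus()
print("# of GPUs:  {}".format(len(gpus)))
for index, name in gpus:
    print("GPU {}:      {}".format(index, name))
```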


#### Create Word Embeddings and Vocabulary
- Create a folder named `ia-patients` under `dataset`.
- Put the converted `jsonl` data in this folder.
- Three files are needed: `train.jsonl`, `dev.jsonl`, and `test.jsonl`.
- More training parameters are available in `train.py`.
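Before running `prepare_vocab.py` or `train.py`, you can confirm that each split file exists and that every line is valid JSON. The helper `check_split_files` is hypothetical. It only checks that each line parses, not that the records contain the fields the training code expects.

```python
import json
import os


def check_split_files(data_dir, splits=("train", "dev", "test")):
    """Return {split: record_count} for each <split>.jsonl file in data_dir.

    Raises FileNotFoundError for a missing file and ValueError for a
    non-blank line that is not valid JSON.
    """
    counts = {}
    for split in splits:
        path = os.path.join(data_dir, split + ".jsonl")
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
        n = 0
        with open(path, encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # tolerate blank lines such as a trailing newline
                try:
                    json.loads(line)
                except json.JSONDecodeError as err:
                    raise ValueError("{}:{}: {}".format(path, line_no, err))
                n += 1
        counts[split] = n
    return counts
```

The training run below reads `trainu.jsonl` and `devu.jsonl` from `utku_reduced_data/`, as shown by `train_data` and `dev_data` in its config. For that setup, the call would be `check_split_files("dataset/ia-patients/utku_reduced_data/", splits=("trainu", "devu"))`.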

In [None]:
#%run ../prepare_vocab.py ../dataset/ia-patients/ ../dataset/vocab --glove_dir ../dataset/glove
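`prepare_vocab.py` is not shown here, so the following only sketches the general idea of vocabulary building and is not its actual implementation. The sketch counts tokens, keeps the most frequent ones, and assigns integer ids. It assumes that the `lower` and `top` settings printed in the training config below control lowercasing and the vocabulary cutoff. The special tokens are hypothetical placeholders. The sketch also omits the GloVe embedding lookup implied by `--glove_dir`.

```python
from collections import Counter

# Hypothetical special tokens; prepare_vocab.py may use different ones.
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<SOS>", "<EOS>"]


def build_vocab(token_lists, top=1000000, lower=True):
    """Map the `top` most frequent tokens to ids, after the special tokens.

    Ties in frequency keep first-seen order (Counter.most_common behaviour).
    """
    counter = Counter()
    for tokens in token_lists:
        counter.update(t.lower() if lower else t for t in tokens)
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tok, _ in counter.most_common(top):
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab
```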

#### Train Model

In [3]:
%run train.py --id IA_Model_v1 --data_dir dataset/ia-patients/utku_reduced_data/ --batch_size 2 --num_epoch 3 --lr 0.001 --background --decay_epoch 15 --emb_dim 300

Vocab size 417194 loaded from file
Loading data from dataset/ia-patients/utku_reduced_data/ with batch size 2...
74648 batches created for dataset/ia-patients/utku_reduced_data//trainu.jsonl.
9331 batches created for dataset/ia-patients/utku_reduced_data//devu.jsonl.
Config saved to file ./saved_models/IA_Model_v1/config.json
Overwriting old vocab file at ./saved_models/IA_Model_v1/vocab.pkl

Running with the following configs:
	data_dir : dataset/ia-patients/utku_reduced_data/
	vocab_dir : dataset/vocab
	hidden_dim : 200
	emb_dim : 300
	num_layers : 2
	emb_dropout : 0.5
	dropout : 0.5
	lower : True
	max_dec_len : 80
	beam_size : 5
	top : 1000000
	train_data : trainu
	dev_data : devu
	attn_type : mlp
	cov : False
	cov_alpha : 0
	cov_loss_epoch : 0
	background : True
	concat_background : False
	use_bleu : False
	sample_train : 1.0
	lr : 0.001
	lr_decay : 0.9
	decay_epoch : 15
	optim : adam
	num_epoch : 3
	batch_size : 2
	max_grad_norm : 5.0
	log_step : 20
	log : logs.txt
	save_dir : ./s

2019-05-28 03:55:35: step 1440/223944 (epoch 1/3), loss = 5.137422 (0.187 sec/batch), lr: 0.001000
2019-05-28 03:55:39: step 1460/223944 (epoch 1/3), loss = 5.846762 (0.178 sec/batch), lr: 0.001000
2019-05-28 03:55:43: step 1480/223944 (epoch 1/3), loss = 6.678418 (0.224 sec/batch), lr: 0.001000
2019-05-28 03:55:47: step 1500/223944 (epoch 1/3), loss = 4.164983 (0.102 sec/batch), lr: 0.001000
2019-05-28 03:55:51: step 1520/223944 (epoch 1/3), loss = 6.179486 (0.453 sec/batch), lr: 0.001000
2019-05-28 03:55:56: step 1540/223944 (epoch 1/3), loss = 5.960763 (0.224 sec/batch), lr: 0.001000
2019-05-28 03:56:00: step 1560/223944 (epoch 1/3), loss = 6.125066 (0.143 sec/batch), lr: 0.001000
2019-05-28 03:56:05: step 1580/223944 (epoch 1/3), loss = 4.908601 (0.176 sec/batch), lr: 0.001000
2019-05-28 03:56:09: step 1600/223944 (epoch 1/3), loss = 6.873186 (0.516 sec/batch), lr: 0.001000
2019-05-28 03:56:14: step 1620/223944 (epoch 1/3), loss = 4.950609 (0.150 sec/batch), lr: 0.001000
...

2019-05-28 04:01:09: step 3100/223944 (epoch 1/3), loss = 5.711883 (0.369 sec/batch), lr: 0.001000
2019-05-28 04:01:13: step 3120/223944 (epoch 1/3), loss = 4.896299 (0.153 sec/batch), lr: 0.001000
2019-05-28 04:01:17: step 3140/223944 (epoch 1/3), loss = 7.040502 (0.185 sec/batch), lr: 0.001000
2019-05-28 04:01:20: step 3160/223944 (epoch 1/3), loss = 5.858666 (0.136 sec/batch), lr: 0.001000
2019-05-28 04:01:25: step 3180/223944 (epoch 1/3), loss = 4.296470 (0.152 sec/batch), lr: 0.001000
2019-05-28 04:01:28: step 3200/223944 (epoch 1/3), loss = 5.193908 (0.312 sec/batch), lr: 0.001000
2019-05-28 04:01:32: step 3220/223944 (epoch 1/3), loss = 5.586099 (0.127 sec/batch), lr: 0.001000
2019-05-28 04:01:36: step 3240/223944 (epoch 1/3), loss = 4.987723 (0.198 sec/batch), lr: 0.001000
2019-05-28 04:01:39: step 3260/223944 (epoch 1/3), loss = 5.087118 (0.157 sec/batch), lr: 0.001000
2019-05-28 04:01:44: step 3280/223944 (epoch 1/3), loss = 5.295550 (0.252 sec/batch), lr: 0.001000
...

2019-05-28 04:06:42: step 4760/223944 (epoch 1/3), loss = 5.277436 (0.133 sec/batch), lr: 0.001000
2019-05-28 04:06:46: step 4780/223944 (epoch 1/3), loss = 3.849370 (0.122 sec/batch), lr: 0.001000
2019-05-28 04:06:49: step 4800/223944 (epoch 1/3), loss = 5.439433 (0.252 sec/batch), lr: 0.001000
2019-05-28 04:06:54: step 4820/223944 (epoch 1/3), loss = 5.001100 (0.337 sec/batch), lr: 0.001000
2019-05-28 04:06:57: step 4840/223944 (epoch 1/3), loss = 4.090434 (0.158 sec/batch), lr: 0.001000
2019-05-28 04:07:02: step 4860/223944 (epoch 1/3), loss = 5.108887 (0.272 sec/batch), lr: 0.001000
2019-05-28 04:07:05: step 4880/223944 (epoch 1/3), loss = 5.241382 (0.192 sec/batch), lr: 0.001000
2019-05-28 04:07:09: step 4900/223944 (epoch 1/3), loss = 4.954860 (0.171 sec/batch), lr: 0.001000
2019-05-28 04:07:13: step 4920/223944 (epoch 1/3), loss = 5.224489 (0.150 sec/batch), lr: 0.001000
2019-05-28 04:07:17: step 4940/223944 (epoch 1/3), loss = 4.813218 (0.126 sec/batch), lr: 0.001000
...

2019-05-28 04:12:14: step 6420/223944 (epoch 1/3), loss = 5.864668 (0.184 sec/batch), lr: 0.001000
2019-05-28 04:12:17: step 6440/223944 (epoch 1/3), loss = 5.246961 (0.167 sec/batch), lr: 0.001000
2019-05-28 04:12:21: step 6460/223944 (epoch 1/3), loss = 5.734756 (0.142 sec/batch), lr: 0.001000
2019-05-28 04:12:25: step 6480/223944 (epoch 1/3), loss = 4.550629 (0.143 sec/batch), lr: 0.001000
2019-05-28 04:12:29: step 6500/223944 (epoch 1/3), loss = 4.968910 (0.105 sec/batch), lr: 0.001000
2019-05-28 04:12:33: step 6520/223944 (epoch 1/3), loss = 5.133187 (0.161 sec/batch), lr: 0.001000
2019-05-28 04:12:37: step 6540/223944 (epoch 1/3), loss = 4.766873 (0.184 sec/batch), lr: 0.001000
2019-05-28 04:12:42: step 6560/223944 (epoch 1/3), loss = 5.393078 (0.211 sec/batch), lr: 0.001000
2019-05-28 04:12:46: step 6580/223944 (epoch 1/3), loss = 5.775028 (0.207 sec/batch), lr: 0.001000
2019-05-28 04:12:50: step 6600/223944 (epoch 1/3), loss = 5.857060 (0.492 sec/batch), lr: 0.001000
...

2019-05-28 04:17:47: step 8080/223944 (epoch 1/3), loss = 4.986622 (0.209 sec/batch), lr: 0.001000
2019-05-28 04:17:51: step 8100/223944 (epoch 1/3), loss = 5.318104 (0.156 sec/batch), lr: 0.001000
2019-05-28 04:17:55: step 8120/223944 (epoch 1/3), loss = 5.267671 (0.279 sec/batch), lr: 0.001000
2019-05-28 04:17:59: step 8140/223944 (epoch 1/3), loss = 3.780145 (0.130 sec/batch), lr: 0.001000
2019-05-28 04:18:03: step 8160/223944 (epoch 1/3), loss = 5.233938 (0.171 sec/batch), lr: 0.001000
2019-05-28 04:18:07: step 8180/223944 (epoch 1/3), loss = 4.835152 (0.270 sec/batch), lr: 0.001000
2019-05-28 04:18:12: step 8200/223944 (epoch 1/3), loss = 4.904494 (0.194 sec/batch), lr: 0.001000
2019-05-28 04:18:15: step 8220/223944 (epoch 1/3), loss = 4.622981 (0.151 sec/batch), lr: 0.001000
2019-05-28 04:18:19: step 8240/223944 (epoch 1/3), loss = 5.732751 (0.213 sec/batch), lr: 0.001000
2019-05-28 04:18:23: step 8260/223944 (epoch 1/3), loss = 5.265625 (0.265 sec/batch), lr: 0.001000
...

2019-05-28 04:23:13: step 9740/223944 (epoch 1/3), loss = 5.214248 (0.286 sec/batch), lr: 0.001000
2019-05-28 04:23:17: step 9760/223944 (epoch 1/3), loss = 5.054987 (0.252 sec/batch), lr: 0.001000
2019-05-28 04:23:21: step 9780/223944 (epoch 1/3), loss = 3.018207 (0.111 sec/batch), lr: 0.001000
2019-05-28 04:23:25: step 9800/223944 (epoch 1/3), loss = 5.018272 (0.382 sec/batch), lr: 0.001000
2019-05-28 04:23:29: step 9820/223944 (epoch 1/3), loss = 5.250906 (0.100 sec/batch), lr: 0.001000
2019-05-28 04:23:33: step 9840/223944 (epoch 1/3), loss = 4.964127 (0.218 sec/batch), lr: 0.001000
2019-05-28 04:23:38: step 9860/223944 (epoch 1/3), loss = 4.443533 (0.250 sec/batch), lr: 0.001000
2019-05-28 04:23:42: step 9880/223944 (epoch 1/3), loss = 3.038129 (0.218 sec/batch), lr: 0.001000
2019-05-28 04:23:46: step 9900/223944 (epoch 1/3), loss = 3.399281 (0.161 sec/batch), lr: 0.001000
2019-05-28 04:23:49: step 9920/223944 (epoch 1/3), loss = 4.698596 (0.118 sec/batch), lr: 0.001000
...

RuntimeError: CUDA out of memory. Tried to allocate 1.30 GiB (GPU 0; 15.75 GiB total capacity; 13.72 GiB already allocated; 798.88 MiB free; 280.86 MiB cached)
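The run above stopped with a CUDA out-of-memory error after roughly 9,900 of 223,944 steps (74,648 batches per epoch × 3 epochs). The batch size is already 2, so a single long example may be enough to trigger a 1.30 GiB allocation, although the log does not show which batch failed. One common workaround is to catch the error for that batch, free cached memory, and continue. This is sketched below as an assumption; it is not what `train.py` does. `run_step_skipping_oom` is a hypothetical helper, and `step_fn` stands in for whatever performs one training update.

```python
def run_step_skipping_oom(step_fn, batch, on_oom=None):
    """Call step_fn(batch); on a CUDA out-of-memory RuntimeError, run on_oom and return None.

    Any other exception is re-raised unchanged.
    """
    try:
        return step_fn(batch)
    except RuntimeError as err:
        # PyTorch's OOM message contains "out of memory"; recent releases raise
        # torch.cuda.OutOfMemoryError, which is still a RuntimeError subclass.
        if "out of memory" not in str(err):
            raise
        if on_oom is not None:
            on_oom()  # e.g. optimizer.zero_grad() followed by torch.cuda.empty_cache()
        return None
```

In a training loop, `on_oom` could clear the optimizer's gradients and call `torch.cuda.empty_cache()`. Skipping batches changes what the model sees, so it is worth counting how many are skipped. If many are, truncating very long inputs may be the better fix.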

In [2]:
!pip install git+https://github.com/tagucci/pythonrouge.git

Collecting git+https://github.com/tagucci/pythonrouge.git
  Cloning https://github.com/tagucci/pythonrouge.git to /tmp/pip-req-build-wo3fnghr
Building wheels for collected packages: pythonrouge
  Running setup.py bdist_wheel for pythonrouge ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-jaelxltv/wheels/fd/ff/be/6716935d513fa8656ab185cb0aa70aed382b72dda42bf09c95
Successfully built pythonrouge
sagemaker 1.18.5 has requirement requests<2.21,>=2.20.0, but you'll have requests 2.21.0 which is incompatible.
docker-compose 1.23.2 has requirement requests!=2.11.0,!=2.12.2,!=2.18.0,<2.21,>=2.6.1, but you'll have requests 2.21.0 which is incompatible.
Installing collected packages: pythonrouge
Successfully installed pythonrouge-0.2
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [1]:
!nvidia-smi

Tue Apr 16 15:58:04 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P0    25W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage    
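The process list in the `nvidia-smi` output above is cut off, so it does not show which PID is holding GPU memory before the `kill` cells below. `nvidia-smi` can print just the compute processes in CSV form, and the sketch below parses that output. The helper names are hypothetical. On some systems the memory value is reported as `[N/A]`, which the parser maps to `None`.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-compute-apps=pid,used_memory", "--format=csv,noheader,nounits"]


def parse_compute_apps(csv_text):
    """Turn 'pid, used_memory' CSV lines into (pid, MiB) tuples; MiB is None if not reported."""
    apps = []
    for line in csv_text.strip().splitlines():
        pid_field, mem_field = (field.strip() for field in line.split(",", 1))
        mem = int(mem_field) if mem_field.isdigit() else None
        apps.append((int(pid_field), mem))
    return apps


def gpu_compute_apps():
    """Return parsed compute processes, or [] if nvidia-smi is missing or fails."""
    try:
        result = subprocess.run(
            QUERY, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(result.stdout)


for pid, mib in gpu_compute_apps():
    print("PID {}: {} MiB".format(pid, mib))
```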

In [None]:
!kill -9 2480


In [None]:
!sudo kill -9 6154