# Model Building

In this notebook we will build different deep-learning-based models for text summarization. In particular, the architecture used here is an encoder-decoder network built from LSTM layers. Refer to the literature review notebook for further details on this architecture.

In this notebook we will experiment with settings such as the hidden dimension size, dropout, the size of the training-data vocabulary, the number of LSTM layers, etc.

We will use PyTorch to train the models and TensorBoard (integrated with PyTorch) for visualization.

## Mount Google Drive and Import Libraries

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import sys
import os
import torch
# for auto-reloading external modules (automatically reloads before using an imported module)
# %load_ext autoreload
# %autoreload 2

# Ensure that the Colab Python interpreter can import the project's Python files from the src directory
PATH_NAME = os.path.join('/', 'content', 'drive', 'My Drive', 'Colab Notebooks', 'UCSDX_MLE_Bootcamp', 'Text_Summarization_UCSD', 'ModelBuilding')
sys.path.append(os.path.join(PATH_NAME, 'src'))
print(sys.path)
%cd $PATH_NAME

print(f'Torch version {torch.__version__}') #1.8.1+cu101

['', '/content', '/env/python', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.7/dist-packages/IPython/extensions', '/root/.ipython', '/content/drive/My Drive/Colab Notebooks/UCSDX_MLE_Bootcamp/Text_Summarization_UCSD/ModelBuilding/src']
/content/drive/My Drive/Colab Notebooks/UCSDX_MLE_Bootcamp/Text_Summarization_UCSD/ModelBuilding
Torch version 1.8.1+cu101


In [39]:
!git config --global user.name "Amit Patel"
!git config --global user.email "amitpatel.gt@gmail.com"
!git config --global color.ui auto
!git config -l

user.name=“[Amit
user.email=“[amitpatel.gt@gmail.com]”
color.ui=auto
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
remote.origin.url=https://github.com/amitp-ai/Text_Summarization_UCSD.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.main.remote=origin
branch.main.merge=refs/heads/main


In [40]:
!git status

On branch model_building
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   ../LiteratureSurvey/Literature_Survey.ipynb
	modified:   ModelBuilding_step8.ipynb
	modified:   logs/evaluate.log
	modified:   logs/train.log
	modified:   src/models.py
	modified:   src/train.py
	modified:   ../tests/test_ModelBuilding1.py

no changes added to commit (use "git add" and/or "git commit -a")


In [3]:
!nvidia-smi

Sat Apr 24 21:40:38 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P0    23W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Load Data and Utility Functions
We will use the BigPatent data with cpc_codes 'de' (CPC sections D and E).

In [None]:
'''
import utils
data = utils.load_data_numpy(split_type='train', cpc_codes='de', fname='data0_np.npz')
for data_np in data:
    print(data_np['data'].shape, data_np['data'][0,0].shape[1], data_np['data'][0,1].shape[1])
    print(data_np['data'][0,1])
    break
del data, data_np
''';

### Mini Data: Generate vocabulary, word2idx, idx2word, and numpy array

This is needed because the vocabulary of the full dataset is too large for quick prototyping and debugging.

However, we try both: the full vocabulary of the 'de' dataset as well as the vocabulary built from the mini training set.
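
A minimal sketch of this step in plain Python, assuming whitespace tokenization and four special tokens. The token names (`<pad>`, `<unk>`, `<start>`, `<stop>`), their indices, and the helpers `build_vocab` and `encode` are hypothetical and may not match what `utils` actually does. The start/stop handling mirrors the log lines below, where descriptions get a stop token and abstracts get start and stop tokens before being padded to a fixed length (4000 and 150 in the printed data shapes).

```python
from collections import Counter

# Hypothetical special tokens; the notebook's utils module may use other names/indices.
PAD, UNK, START, STOP = "<pad>", "<unk>", "<start>", "<stop>"

def build_vocab(texts, max_size=None, min_freq=1):
    """Build word2idx / idx2word from whitespace-tokenized texts."""
    counts = Counter(tok for text in texts for tok in text.split())
    # Most frequent words first; ties broken alphabetically so the result is deterministic.
    words = sorted((w for w, c in counts.items() if c >= min_freq),
                   key=lambda w: (-counts[w], w))
    if max_size is not None:
        words = words[:max_size]
    idx2word = [PAD, UNK, START, STOP] + words
    word2idx = {w: i for i, w in enumerate(idx2word)}
    return word2idx, idx2word

def encode(text, word2idx, max_len, add_start=False):
    """Map words to indices (unknown -> <unk>), append <stop>, then pad/truncate to max_len."""
    ids = [word2idx.get(tok, word2idx[UNK]) for tok in text.split()]
    if add_start:
        ids = [word2idx[START]] + ids
    ids = ids[:max_len - 1] + [word2idx[STOP]]
    return ids + [word2idx[PAD]] * (max_len - len(ids))

w2i, i2w = build_vocab(["the cat sat", "the dog sat"])
print(encode("the bird sat", w2i, max_len=6))  # [5, 1, 4, 3, 0, 0]
```

Capping `max_size` is one way to obtain a smaller mini-set vocabulary; any word outside it maps to `<unk>`.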

## LSTM Based Encoder-Decoder

For further details, see:

https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/

http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html

In [4]:
!pip install rouge

Collecting rouge
  Downloading https://files.pythonhosted.org/packages/43/cc/e18e33be20971ff73a056ebdb023476b5a545e744e3fc22acd8c758f1e0d/rouge-1.0.0-py3-none-any.whl
Installing collected packages: rouge
Successfully installed rouge-1.0.0


In [5]:
#testing
!python -m pytest -s ../tests/

platform linux -- Python 3.7.10, pytest-3.6.4, py-1.10.0, pluggy-0.7.1
rootdir: /content/drive/My Drive/Colab Notebooks/UCSDX_MLE_Bootcamp/Text_Summarization_UCSD, inifile:
plugins: typeguard-2.7.1
collected 3 items

../tests/test_ModelBuilding1.py ...



### Speed Difference: Storing Data on Gdrive vs Locally on the GCP VM
Conclusion: No difference in speed was observed

In [None]:
!ls

images	ModelBuilding_step8.ipynb		  __pycache__  saved_models
logs	Model_Experimentation_step7_14-8-1.ipynb  runs	       src


In [None]:
%%timeit -r 1 -n 1
#From GDrive
''' MODEL_DELETE: 
'''
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 100 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_DELETE' --printEveryIters 200 --tbDescr 'MODEL_DELETE' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.2 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0
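
The printed `Namespace(...)` lines in later outputs suggest that `train.py` parses these flags with Python's `argparse`. One pitfall with flags such as `--loadBestModel False` is that `argparse` passes the string `'False'` to the `type` callable, and `bool('False')` is `True`. Since the logs show `loadBestModel=False`, `train.py` presumably handles this already. The sketch below shows one common way to do it. The `str2bool` helper and all default values are assumptions, not taken from `train.py`; only the flag names come from the commands in this notebook.

```python
import argparse

def str2bool(value):
    """Parse 'True'/'False'-style strings; argparse's type=bool would map 'False' to True."""
    if isinstance(value, bool):
        return value
    if value.lower() in ("true", "t", "yes", "y", "1"):
        return True
    if value.lower() in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {value!r}")

def build_parser():
    # Hypothetical sketch of train.py's CLI; defaults are illustrative only.
    p = argparse.ArgumentParser(description="Sketch of the train.py flags")
    p.add_argument("--hiddenDim", type=int, default=200)
    p.add_argument("--numLayers", type=int, default=2)
    p.add_argument("--batchSize", type=int, default=64)
    p.add_argument("--numEpochs", type=int, default=100)
    p.add_argument("--lr", type=float, default=3e-3)
    p.add_argument("--dropout", type=float, default=0.0)
    p.add_argument("--savedModelDir", type=str, default="./saved_models/MODEL_DELETE")
    p.add_argument("--printEveryIters", type=int, default=200)
    p.add_argument("--tbDescr", type=str, default="")
    p.add_argument("--modelType", type=str, default="models.Seq2SeqwithAttention")
    p.add_argument("--loadBestModel", type=str2bool, default=False)
    p.add_argument("--toTrain", type=str2bool, default=True)
    p.add_argument("--fullVocab", type=str2bool, default=True)
    p.add_argument("--trainSize", type=int, default=128)
    p.add_argument("--valSize", type=int, default=16)
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--tfThresh", type=float, default=0.0)
    p.add_argument("--beamSize", type=int, default=0)
    return p

args = build_parser().parse_args(["--loadBestModel", "False", "--lr", "4e-3"])
print(args.loadBestModel, args.lr)
```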

In [None]:
!mkdir '/content/Text_Summarization_UCSD'
!mkdir '/content/Text_Summarization_UCSD/ModelBuilding'
!mkdir '/content/Text_Summarization_UCSD/DataWrangling'
!mkdir '/content/Text_Summarization_UCSD/ModelBuilding/logs'
!mkdir '/content/Text_Summarization_UCSD/ModelBuilding/runs'
!mkdir '/content/Text_Summarization_UCSD/ModelBuilding/runs/seq2seqWithAtten'
!mkdir '/content/Text_Summarization_UCSD/ModelBuilding/saved_models'
!cp -r '../../Text_Summarization_UCSD/ModelBuilding/src' '/content/Text_Summarization_UCSD/ModelBuilding/'
!cp -r '../../Text_Summarization_UCSD/DataWrangling/bigPatentPreprocessedData' '/content/Text_Summarization_UCSD/DataWrangling/'
!cp -r ../../Text_Summarization_UCSD/DataWrangling/*.json /content/Text_Summarization_UCSD/DataWrangling/
!ls /content

drive  sample_data  Text_Summarization_UCSD


In [None]:
sys.path.pop() #remove the path in Gdrive
sys.path.append('/content/Text_Summarization_UCSD/ModelBuilding/src')
sys.path
%cd '/content/Text_Summarization_UCSD/ModelBuilding'
!ls

/content/Text_Summarization_UCSD/ModelBuilding
src


In [None]:
'''
Change input_path in utils.load_data_string() and load_data_numpy()
'''

In [None]:
%%timeit -r 1 -n 1
#From GCP VM
''' MODEL_DELETE: 
'''
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 100 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_DELETE' --printEveryIters 200 --tbDescr 'MODEL_DELETE' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.2 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

### Seq2Seq: lr=0.002, dropout=0.0, hiddim=200, numlyrs=2, full-de-vocab, train_size=128, val_size=16

In [None]:
%%timeit -r 1 -n 1
#without attention: train the model
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 700 --lr 2e-3 --savedModelDir './saved_models/seq2seq_200hid_2lyrs' \
                        --printEveryIters 4800 --tbDescr 'dropout-0_hiddim-200_numlyrs-2_full-de-data' \
                        --modelType 'models.Seq2Seq' --loadBestModel False --toTrain True
#but there is no improvement in rouge scores vs no beam search (greedy search seems to be ok for a well trained model)

Getting the training data...
Size of description vocab is 36828 and abstract vocab is 10769
tcmalloc: large alloc 2406391808 bytes == 0x55912d3c0000 @  0x7fb2f175c1e7 0x5590b2ec0f48 0x7fb2c82ce53e 0x7fb2c82cecd9 0x7fb2c82cefaf 0x7fb2c82cc4b4 0x5590b2e8f0e4 0x5590b2e8ede0 0x5590b2f036f5 0x5590b2e9069a 0x5590b2efec9e 0x5590b2e9069a 0x5590b2efec9e 0x5590b2e9069a 0x5590b2efec9e 0x5590b2e9069a 0x5590b2efec9e 0x5590b2efdb0e 0x5590b2dcfe2b 0x5590b2f001e6 0x5590b2efde0d 0x5590b2dcfe2b 0x5590b2f001e6 0x5590b2efde0d 0x5590b2e9077a 0x5590b2eff86a 0x5590b2f81858 0x5590b2efeee2 0x5590b2efdb0e 0x5590b2e9077a 0x5590b2eff86a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding start/stop tokens) is 147
(128, 4)
max length (before adding stop token) in mini_df.description is 3993 and in mini_df.abstract (before adding start/stop tokens) is 140
(16, 4)
Data shape is: torch.Size([128, 4000]), torch.Size([128, 150]), torch.Size([128])
Total data size 

In [None]:
%%timeit -r 1 -n 1
#without attention
#test above trained model with beamsize=5 
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 700 --lr 2e-3 --savedModelDir './saved_models/seq2seq_200hid_2lyrs' \
                        --printEveryIters 4800 --tbDescr 'dropout-0_hiddim-200_numlyrs-2_full-de-data' \
                        --modelType 'models.Seq2Seq' --loadBestModel True --toTrain False
#but there is no improvement in rouge scores vs no beam search (greedy search seems to be ok for a well trained model)
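
The comment above notes that beam search gave no ROUGE improvement over greedy decoding. The toy sketch below shows how the two differ on a hypothetical hand-written next-token table (`toy_next_probs`), not on the real model: greedy commits to the locally most likely token, while a beam keeps the `beam_size` best partial sequences by cumulative log-probability. It omits length normalization, which real implementations often add, and it is not necessarily how `--beamSize` is implemented in `train.py`.

```python
import math

def greedy_decode(next_probs, max_len, eos="</s>"):
    """Pick the single most likely next token at every step."""
    seq, logp = (), 0.0
    for _ in range(max_len):
        probs = next_probs(seq)
        tok = max(probs, key=probs.get)
        seq, logp = seq + (tok,), logp + math.log(probs[tok])
        if tok == eos:
            break
    return seq, logp

def beam_search(next_probs, beam_size, max_len, eos="</s>"):
    """Keep the beam_size best prefixes by cumulative log-probability."""
    beams, finished = [((), 0.0)], []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok, p in next_probs(seq).items():
                if p > 0:
                    candidates.append((seq + (tok,), logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, logp))   # completed hypothesis
            else:
                beams.append((seq, logp))      # still being extended
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len was reached
    return max(finished, key=lambda c: c[1])

def toy_next_probs(prefix):
    """Hypothetical next-token distributions keyed by prefix."""
    table = {(): {"a": 0.6, "b": 0.4},
             ("a",): {"a": 0.4, "b": 0.3, "</s>": 0.3},
             ("b",): {"</s>": 0.9, "a": 0.1}}
    return table.get(prefix, {"</s>": 1.0})

g_seq, g_lp = greedy_decode(toy_next_probs, max_len=5)
b_seq, b_lp = beam_search(toy_next_probs, beam_size=2, max_len=5)
print(g_seq, round(math.exp(g_lp), 2))  # ('a', 'a', '</s>') 0.24
print(b_seq, round(math.exp(b_lp), 2))  # ('b', '</s>') 0.36
```

Here greedy picks `a` first (0.6) and ends with sequence probability 0.24, while a beam of size 2 finds `b </s>` with probability 0.36. When a well-trained model's next-token distributions are sharply peaked, the two tend to pick the same sequence.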

### Seq2Seq with Atten: lr=0.004, dropout=0.1, hiddim=200, numlyrs=2, full-de-vocab, train_size=128, val_size=16

In [None]:
%%timeit -r 1 -n 1
#with attention
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 --savedModelDir './saved_models/seq2seq_withAtten_200hid_2lyrs' \
                        --printEveryIters 400 --tbDescr 'seq2seq_withAtten_dropout-0p1_hiddim-200_numlyrs-2_full-de-vocab' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True \
                        --dropout 0.1

Getting the training data...
Size of description vocab is 36828 and abstract vocab is 10769
tcmalloc: large alloc 2406391808 bytes == 0x558ffe044000 @  0x7fec669a51e7 0x558f8340ff48 0x7fec3d51753e 0x7fec3d517cd9 0x7fec3d517faf 0x7fec3d5154b4 0x558f833de0e4 0x558f833ddde0 0x558f834526f5 0x558f833df69a 0x558f8344dc9e 0x558f833df69a 0x558f8344dc9e 0x558f833df69a 0x558f8344dc9e 0x558f833df69a 0x558f8344dc9e 0x558f8344cb0e 0x558f8331ee2b 0x558f8344f1e6 0x558f8344ce0d 0x558f8331ee2b 0x558f8344f1e6 0x558f8344ce0d 0x558f833df77a 0x558f8344e86a 0x558f834d0858 0x558f8344dee2 0x558f8344cb0e 0x558f833df77a 0x558f8344e86a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding start/stop tokens) is 147
(128, 4)
max length (before adding stop token) in mini_df.description is 3993 and in mini_df.abstract (before adding start/stop tokens) is 140
(16, 4)
Data shape is: torch.Size([128, 4000]), torch.Size([128, 150]), torch.Size([128])
Total data size 

Training Loss |Training Data Rouge | Validation Data Rouge
--- | --- | ---
![](https://drive.google.com/uc?export=view&id=1d26RtIVDwygkh3A_Lq3v4eFR-RCK3ZbA) | ![](https://drive.google.com/uc?export=view&id=1xMBiCByW-57N7i-_Brd8o3Rmnc-pJFnA) | ![](https://drive.google.com/uc?export=view&id=1q4GQOM8M7fsxe4qaahFOaUEqDKqO6Xel)
| | Dark Blue: Rouge-1, Red: Rouge-2, Light Blue: Rouge-l | Pink: Rouge-1, Green: Rouge-2, Gray: Rouge-l |


Note that the initial loss should be approximately log(abstract_vocab_size) = -log(1/abstract_vocab_size), because a randomly initialized model assigns roughly uniform probability to every abstract token.
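
As a quick check, assume the training loss is the mean per-token cross-entropy with natural logarithms, as in PyTorch's `nn.CrossEntropyLoss` (whether `train.py` uses exactly that is an assumption). Under that assumption, a model that spreads probability uniformly over the V abstract tokens has loss -log(1/V) = log(V). With the 10,769-word abstract vocabulary from the log above, this is about 9.28. A randomly initialized model is only approximately uniform, so the first logged loss should be close to this value, not exactly equal to it.

```python
import math

def uniform_cross_entropy(vocab_size):
    """Per-token cross-entropy (natural log) when every token gets probability 1/vocab_size."""
    p = 1.0 / vocab_size
    return -math.log(p)  # equals log(vocab_size)

abstract_vocab_size = 10769  # from the "abstract vocab is 10769" log line
print(f"{uniform_cross_entropy(abstract_vocab_size):.2f}")  # 9.28
```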

Best checkpoint at 1200: Rouge-1 is 0.2941, Rouge-2 is 0.0498, and Rouge-l is 0.2024
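
These scores come from the `rouge` package installed earlier. To make them easier to interpret, here is a minimal ROUGE-N sketch based on clipped n-gram overlap between a generated and a reference summary. It uses plain whitespace tokenization, so it will not reproduce the package's numbers exactly (its preprocessing may differ). This notebook also does not state whether the reported values are F1, precision, or recall.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision / recall / F1 from clipped n-gram overlap."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per n-gram
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 0.0 if overlap == 0 else 2 * p * r / (p + r)
    return {"p": p, "r": r, "f": f}

print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1))
print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2))
```

In this example, 5 of the 6 unigrams match (Rouge-1 F1 = 5/6), but only 3 of the 5 bigrams do (Rouge-2 F1 = 0.6). Contiguous matches are harder to get, which is one reason Rouge-2 is much lower than Rouge-1 throughout these experiments.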


### Seq2Seq with Atten: lr=0.004, dropout=0.4, hiddim=200, numlyrs=2, full-de-vocab, train_size=128, val_size=16

In [None]:
%%timeit -r 1 -n 1
#with attention
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 --savedModelDir './saved_models/seq2seq_withAtten_200hid_2lyrs_0p4dropout' \
                        --printEveryIters 400 --tbDescr 'seq2seq_withAtten_dropout-0p4_hiddim-200_numlyrs-2_full-de-vocab' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True \
                        --dropout 0.4

Getting the training data...
Size of description vocab is 36828 and abstract vocab is 10769
tcmalloc: large alloc 2406391808 bytes == 0x55c6459b2000 @  0x7f66719d51e7 0x55c5caf76f48 0x7f664854753e 0x7f6648547cd9 0x7f6648547faf 0x7f66485454b4 0x55c5caf450e4 0x55c5caf44de0 0x55c5cafb96f5 0x55c5caf4669a 0x55c5cafb4c9e 0x55c5caf4669a 0x55c5cafb4c9e 0x55c5caf4669a 0x55c5cafb4c9e 0x55c5caf4669a 0x55c5cafb4c9e 0x55c5cafb3b0e 0x55c5cae85e2b 0x55c5cafb61e6 0x55c5cafb3e0d 0x55c5cae85e2b 0x55c5cafb61e6 0x55c5cafb3e0d 0x55c5caf4677a 0x55c5cafb586a 0x55c5cb037858 0x55c5cafb4ee2 0x55c5cafb3b0e 0x55c5caf4677a 0x55c5cafb586a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding start/stop tokens) is 147
(128, 4)
max length (before adding stop token) in mini_df.description is 3993 and in mini_df.abstract (before adding start/stop tokens) is 140
(16, 4)
Data shape is: torch.Size([128, 4000]), torch.Size([128, 150]), torch.Size([128])
Total data size 

Training Loss |Training Data Rouge | Validation Data Rouge
--- | --- | ---
![](https://drive.google.com/uc?export=view&id=1dX5GYaPXYXegue-OS67ak4sfSyFEV997) | ![](https://drive.google.com/uc?export=view&id=138SJejuTlbeqKcuNbjcek-bbwyILMSiq) | ![](https://drive.google.com/uc?export=view&id=1OplXxb3_m7rE-90CUwJ2w8wPQO6aDq9o)
| | Dark Blue: Rouge-1, Red: Rouge-2, Light Blue: Rouge-l | Pink: Rouge-1, Green: Rouge-2, Gray: Rouge-l |

Note that the initial loss should be approximately log(abstract_vocab_size) = -log(1/abstract_vocab_size), because a randomly initialized model assigns roughly uniform probability to every abstract token.

Best checkpoint at 1200: Rouge-1 is 0.2935, Rouge-2 is 0.0382, and Rouge-l is 0.2242

We did not see much difference even with a dropout of 0.75.
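
For reference, in training mode dropout zeroes each activation with probability p. In the inverted form used by PyTorch's `nn.Dropout`, it also scales the surviving activations by 1/(1-p), so the expected activation is unchanged. At evaluation time it is a no-op. The pure-Python sketch below shows this; the `dropout` helper is illustrative and not taken from `models.py`.

One possible reason for the small effect is unverified. If `--dropout` is forwarded to `nn.LSTM`'s `dropout` argument, PyTorch applies it only to the outputs of each LSTM layer except the last. With `numLayers=2`, it would then act on a single inter-layer connection.

```python
import random

def dropout(values, p, training=True, rng=random):
    """Inverted dropout: zero each element with prob p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # identity at evaluation time
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)
out = dropout([1.0] * 10000, p=0.75, rng=rng)
print(sum(out) / len(out))  # close to 1.0: the expected activation is preserved
```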

### Seq2Seq with Atten: lr=0.004, dropout=0.4 and 0.0, hiddim=200, numlyrs=2, full-de-vocab, train_size=1024, val_size=16

Did not notice much improvement in the Rouge-1/Rouge-2 scores.

### Seq2Seq with Atten: lr=0.004, dropout=0.4, hiddim=200, numlyrs=2, full-de-vocab

#### Experiment

In [None]:
%%timeit -r 1 -n 1
''' MODEL_1: 
--hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 \
--savedModelDir './saved_models/MODEL_1' --printEveryIters 400 --tbDescr 'MODEL_1' \
--modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.4 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0
'''

!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 \
                        --savedModelDir './saved_models/MODEL_1' --printEveryIters 400 --tbDescr 'MODEL_1' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=64, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=False, lr=0.004, modelType='models.Seq2SeqwithAttention', numEpochs=3000, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_1', seed=0, tbDescr='MODEL_1', toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x561ecc4ea000 @  0x7f2554b691e7 0x561e51f65f48 0x7f252b6db53e 0x7f252b6dbcd9 0x7f252b6dbfaf 0x7f252b6d94b4 0x561e51f340e4 0x561e51f33de0 0x561e51fa86f5 0x561e51f3569a 0x561e51fa3c9e 0x561e51f3569a 0x561e51fa3c9e 0x561e51f3569a 0x561e51fa3c9e 0x561e51f3569a 0x561e51fa3c9e 0x561e51fa2b0e 0x561e51e74e2b 0x561e51fa51e6 0x561e51fa2e0d 0x561e51e74e2b 0x561e51fa51e6 0x561e51fa2e0d 0x561e51f3577a 0x561e51fa486a 0x561e52026858 0x561e51fa3ee2 0x561e51fa2b0e 0x561e51f3577a 0x561e51fa486a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding start/stop tokens) is 147

#### Results

In [None]:
%%timeit -r 1 -n 1
#MODEL_1 evaluation using best model
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 \
                        --savedModelDir './saved_models/MODEL_1' --printEveryIters 400 --tbDescr 'MODEL_1' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel True --toTrain False --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=64, beamSize=0, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=True, lr=0.004, modelType='models.Seq2SeqwithAttention', numEpochs=3000, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_1', seed=0, tbDescr='MODEL_1', tfThresh=0.0, toTrain=False, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x55779a532000 @  0x7f7ac65921e7 0x55771fb42f48 0x7f7a9d10453e 0x7f7a9d104cd9 0x7f7a9d104faf 0x7f7a9d1024b4 0x55771fb110e4 0x55771fb10de0 0x55771fb856f5 0x55771fb1269a 0x55771fb80c9e 0x55771fb1269a 0x55771fb80c9e 0x55771fb1269a 0x55771fb80c9e 0x55771fb1269a 0x55771fb80c9e 0x55771fb7fb0e 0x55771fa51e2b 0x55771fb821e6 0x55771fb7fe0d 0x55771fa51e2b 0x55771fb821e6 0x55771fb7fe0d 0x55771fb1277a 0x55771fb8186a 0x55771fc03858 0x55771fb80ee2 0x55771fb7fb0e 0x55771fb1277a 0x55771fb8186a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding

Training Loss |Training Data Rouge | Validation Data Rouge
--- | --- | ---
![](https://drive.google.com/uc?export=view&id=1223BCcnrvhqmLikji7XxthSLdIilB78t) | ![](https://drive.google.com/uc?export=view&id=10HWKJoHLv0fO5VY1Y1s09cggZnQSrUeP) | ![](https://drive.google.com/uc?export=view&id=12EubeiW8liHUHx997qZp4OQUK8KxdgqS)
| | Pink: Rouge-1, Teal: Rouge-2, Gray: Rouge-l | Orange: Rouge-1, Blue: Rouge-2, Red: Rouge-l |

Best checkpoint at 1200: Rouge-1 is 0.2909, Rouge-2 is 0.0358, and Rouge-l is 0.1944
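
Rouge-l is based on the longest common subsequence (LCS) between the generated and reference token sequences, not on contiguous n-grams. It therefore rewards in-order matches even when they are not adjacent. A minimal sentence-level sketch with whitespace tokens follows; the `rouge` package's own implementation and preprocessing may give different numbers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best of left/up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L precision / recall / F1."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return {"p": 0.0, "r": 0.0, "f": 0.0}
    p, rec = lcs / len(c), lcs / len(r)
    return {"p": p, "r": rec, "f": 2 * p * rec / (p + rec)}

print(rouge_l("the cat sat on the mat", "the cat lay on the mat"))
```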

### Seq2Seq with Atten: lr=0.004, dropout=0.4, hiddim=200, numlyrs=2, full-de-vocab (same as Model 1, but with the attention layer implemented correctly by fixing the encoder and decoder masks)
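
The fix described in this heading concerns masking. In padded batches (descriptions are padded to 4000 tokens in the data shapes printed earlier), attention scores at padding positions must be excluded before the softmax. Otherwise, padding can absorb attention weight. The sketch below shows encoder-side masking for a single decoder step in plain Python. The helper names are hypothetical, `models.py` presumably does the equivalent on tensors, and this notebook does not show what the decoder mask covers.

```python
import math

def masked_attention_weights(scores, mask):
    """Softmax over encoder positions, ignoring padding (mask[i] is False for padded positions)."""
    if not any(mask):
        raise ValueError("at least one position must be unmasked")
    masked = [s if m else -math.inf for s, m in zip(scores, mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, encoder_states):
    """Attention-weighted sum of encoder hidden states (each a list of floats)."""
    dim = len(encoder_states[0])
    return [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]

# The last position is padding with a high raw score; masking gives it zero weight.
w = masked_attention_weights([2.0, 1.0, 5.0], [True, True, False])
print(w)
print(context_vector(w, [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]]))
```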

#### Experiment

In [None]:
%%timeit -r 1 -n 1
''' MODEL_1B: 
--hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 \
--savedModelDir './saved_models/MODEL_1B' --printEveryIters 400 --tbDescr 'MODEL_1B' \
--modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.4 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0
'''

!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 \
                        --savedModelDir './saved_models/MODEL_1B' --printEveryIters 400 --tbDescr 'MODEL_1B' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=64, beamSize=0, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=False, lr=0.004, modelType='models.Seq2SeqwithAttention', numEpochs=3000, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_1B', seed=0, tbDescr='MODEL_1B', tfThresh=0.0, toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x55b2d327e000 @  0x7ff816ec11e7 0x55b257caff48 0x7ff7eda3353e 0x7ff7eda33cd9 0x7ff7eda33faf 0x7ff7eda314b4 0x55b257c7e0e4 0x55b257c7dde0 0x55b257cf26f5 0x55b257c7f69a 0x55b257cedc9e 0x55b257c7f69a 0x55b257cedc9e 0x55b257c7f69a 0x55b257cedc9e 0x55b257c7f69a 0x55b257cedc9e 0x55b257cecb0e 0x55b257bbee2b 0x55b257cef1e6 0x55b257cece0d 0x55b257bbee2b 0x55b257cef1e6 0x55b257cece0d 0x55b257c7f77a 0x55b257cee86a 0x55b257d70858 0x55b257cedee2 0x55b257cecb0e 0x55b257c7f77a 0x55b257cee86a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before addi


#### Results

In [None]:
%%timeit -r 1 -n 1
#Evaluation Model1B
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 64 --numEpochs 3000 --lr 4e-3 \
                        --savedModelDir './saved_models/MODEL_1B' --printEveryIters 400 --tbDescr 'MODEL_1B' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel True --toTrain False --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=64, beamSize=0, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=True, lr=0.004, modelType='models.Seq2SeqwithAttention', numEpochs=3000, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_1B', seed=0, tbDescr='MODEL_1B', tfThresh=0.0, toTrain=False, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x5643c046a000 @  0x7f5a6bd9b1e7 0x56434517cf48 0x7f5a4290d53e 0x7f5a4290dcd9 0x7f5a4290dfaf 0x7f5a4290b4b4 0x56434514b0e4 0x56434514ade0 0x5643451bf6f5 0x56434514c69a 0x5643451bac9e 0x56434514c69a 0x5643451bac9e 0x56434514c69a 0x5643451bac9e 0x56434514c69a 0x5643451bac9e 0x5643451b9b0e 0x56434508be2b 0x5643451bc1e6 0x5643451b9e0d 0x56434508be2b 0x5643451bc1e6 0x5643451b9e0d 0x56434514c77a 0x5643451bb86a 0x56434523d858 0x5643451baee2 0x5643451b9b0e 0x56434514c77a 0x5643451bb86a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before addi

Training Loss |Training Data Rouge | Validation Data Rouge
--- | --- | ---
![](https://drive.google.com/uc?export=view&id=1Nhdd_63qZhLeZskmO8du9zFAoIwukuUC) | ![](https://drive.google.com/uc?export=view&id=1wySXpDRU3aLaXGyykwZxKjW0X4IGs3fx) | ![](https://drive.google.com/uc?export=view&id=1RW_kCTmM_xksgvA1V7xoRq2cG0QhSEfU)
| | Red: Rouge-1, Blue: Rouge-2, Pink: Rouge-l | Teal: Rouge-1, Gray: Rouge-2, Orange: Rouge-l |

Best checkpoint at 2000: Rouge-1 is 0.2695, Rouge-2 is 0.0450, and Rouge-l is 0.1773

### Seq2Seq with Atten: lr=0.003, dropout=0.4, hiddim=200, numlyrs=2, full-de-vocab, and teacher forcing only 70% of the time during training
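
Reading this heading together with the command below, `--tfThresh 0.3` corresponds to feeding the ground-truth token 70% of the time and the model's own prediction otherwise. By the same reading, `--tfThresh 0.0` in the other runs means teacher forcing is always used. Below is a sketch of that rule, assuming a per-step random draw. This notebook does not show whether `train.py` draws per step, per sequence, or per batch, and `decode_with_teacher_forcing` and `step_fn` are hypothetical names.

```python
import random

def decode_with_teacher_forcing(step_fn, targets, start_token, tf_thresh, rng=random):
    """Run a decoder for len(targets) steps.

    At each step the next input is the ground-truth token with probability
    1 - tf_thresh (teacher forcing); otherwise it is the model's own prediction.
    Returns the predictions and how many steps were teacher-forced.
    """
    inp, predictions, forced = start_token, [], 0
    for gold in targets:
        pred = step_fn(inp)  # model's predicted next token given the current input
        predictions.append(pred)
        if rng.random() >= tf_thresh:
            inp, forced = gold, forced + 1  # teacher forcing: feed the ground truth
        else:
            inp = pred  # feed the model's own prediction
    return predictions, forced

rng = random.Random(0)
_, forced = decode_with_teacher_forcing(lambda x: x, list(range(10000)), -1, 0.3, rng=rng)
print(forced / 10000)  # roughly 0.7
```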

#### Experiment

In [None]:
%%timeit -r 1 -n 1
''' MODEL_3: 
--hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 3000 --lr 3e-3 \
--savedModelDir './saved_models/MODEL_3' --printEveryIters 500 --tbDescr 'MODEL_3' \
--modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.4 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0
'''

!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 3000 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_3' --printEveryIters 500 --tbDescr 'MODEL_3' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0

Namespace(batchSize=16, beamSize=0, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=False, lr=0.003, modelType='models.Seq2SeqwithAttention', numEpochs=3000, numLayers=2, printEveryIters=500, savedModelDir='./saved_models/MODEL_3', seed=0, tbDescr='MODEL_3', tfThresh=0.3, toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x55fb23e28000 @  0x7fdc3547d1e7 0x55faa914ff48 0x7fdc0bfef53e 0x7fdc0bfefcd9 0x7fdc0bfeffaf 0x7fdc0bfed4b4 0x55faa911e0e4 0x55faa911dde0 0x55faa91926f5 0x55faa911f69a 0x55faa918dc9e 0x55faa911f69a 0x55faa918dc9e 0x55faa911f69a 0x55faa918dc9e 0x55faa911f69a 0x55faa918dc9e 0x55faa918cb0e 0x55faa905ee2b 0x55faa918f1e6 0x55faa918ce0d 0x55faa905ee2b 0x55faa918f1e6 0x55faa918ce0d 0x55faa911f77a 0x55faa918e86a 0x55faa9210858 0x55faa918dee2 0x55faa918cb0e 0x55faa911f77a 0x55faa918e86a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding

#### Results

In [None]:
#Model3 best model's evaluation
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 3000 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_3' --printEveryIters 500 --tbDescr 'MODEL_3' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel True --toTrain False --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0

Namespace(batchSize=16, beamSize=0, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=True, lr=0.003, modelType='models.Seq2SeqwithAttention', numEpochs=3000, numLayers=2, printEveryIters=500, savedModelDir='./saved_models/MODEL_3', seed=0, tbDescr='MODEL_3', tfThresh=0.3, toTrain=False, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x55fadc97c000 @  0x7f07536381e7 0x55fa62af9f48 0x7f072a1aa53e 0x7f072a1aacd9 0x7f072a1aafaf 0x7f072a1a84b4 0x55fa62ac80e4 0x55fa62ac7de0 0x55fa62b3c6f5 0x55fa62ac969a 0x55fa62b37c9e 0x55fa62ac969a 0x55fa62b37c9e 0x55fa62ac969a 0x55fa62b37c9e 0x55fa62ac969a 0x55fa62b37c9e 0x55fa62b36b0e 0x55fa62a08e2b 0x55fa62b391e6 0x55fa62b36e0d 0x55fa62a08e2b 0x55fa62b391e6 0x55fa62b36e0d 0x55fa62ac977a 0x55fa62b3886a 0x55fa62bba858 0x55fa62b37ee2 0x55fa62b36b0e 0x55fa62ac977a 0x55fa62b3886a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding

Training Loss |Training Data Rouge | Validation Data Rouge
--- | --- | ---
![](https://drive.google.com/uc?export=view&id=1lUTgOaOQGfVKERIXvRXgPXDyMEb7CkSH) | ![](https://drive.google.com/uc?export=view&id=1OIh1Zwl3Ojo0j0rykMatgrZzddH6yPHS) | ![](https://drive.google.com/uc?export=view&id=13C36MB6ZHdyjXf4Km9dt7MDYa79jf78X)
| | Teal: Rouge-1, Gray: Rouge-2, Orange: Rouge-l | Dark Blue: Rouge-1, Red: Rouge-2, Light Blue: Rouge-l |

Best checkpoint at 6000: Rouge-1 is 0.2704, Rouge-2 is 0.0312, and Rouge-l is 0.1798

### Seq2Seq with Atten: lr=0.006, dropout=0.4, hiddim=200, numlyrs=2, full-de-vocab, and teacher forcing only 70% of the time (fine-tuning from Model 1)

Continue training from the best checkpoint of Model 1

In [None]:
# !cp -r saved_models/MODEL_1 saved_models/MODEL_2

In [None]:
%%timeit -r 1 -n 1
''' MODEL_2: 
--hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 500 --lr 6e-3 \
--savedModelDir './saved_models/MODEL_2' --printEveryIters 100 --tbDescr 'MODEL_2' \
--modelType 'models.Seq2SeqwithAttention' --loadBestModel True --toTrain True --dropout 0.4 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0
'''

!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 500 --lr 6e-3 \
                        --savedModelDir './saved_models/MODEL_2' --printEveryIters 100 --tbDescr 'MODEL_2' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel True --toTrain True --dropout 0.4 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0

Namespace(batchSize=16, beamSize=0, dropout=0.4, fullVocab=True, hiddenDim=200, loadBestModel=True, lr=0.006, modelType='models.Seq2SeqwithAttention', numEpochs=500, numLayers=2, printEveryIters=100, savedModelDir='./saved_models/MODEL_2', seed=0, tbDescr='MODEL_2', tfThresh=0.3, toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x563b7e012000 @  0x7f166d0f11e7 0x563b02c69f48 0x7f1643c6353e 0x7f1643c63cd9 0x7f1643c63faf 0x7f1643c614b4 0x563b02c380e4 0x563b02c37de0 0x563b02cac6f5 0x563b02c3969a 0x563b02ca7c9e 0x563b02c3969a 0x563b02ca7c9e 0x563b02c3969a 0x563b02ca7c9e 0x563b02c3969a 0x563b02ca7c9e 0x563b02ca6b0e 0x563b02b78e2b 0x563b02ca91e6 0x563b02ca6e0d 0x563b02b78e2b 0x563b02ca91e6 0x563b02ca6e0d 0x563b02c3977a 0x563b02ca886a 0x563b02d2a858 0x563b02ca7ee2 0x563b02ca6b0e 0x563b02c3977a 0x563b02ca886a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding s

This does not seem to help much

### Seq2Seq with Atten: lr=0.003, dropout=0.6, hiddim=200, numlyrs=2, full-de-vocab, and teacher forcing only 70% of the time during training

#### Experiment

In [None]:
%%timeit -r 1 -n 1
''' MODEL_4: 
--hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 2000 --lr 3e-3 \
--savedModelDir './saved_models/MODEL_4' --printEveryIters 400 --tbDescr 'MODEL_4' \
--modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.6 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0
'''

!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 2000 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_4' --printEveryIters 400 --tbDescr 'MODEL_4' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel False --toTrain True --dropout 0.6 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0

Namespace(batchSize=16, beamSize=0, dropout=0.6, fullVocab=True, hiddenDim=200, loadBestModel=False, lr=0.003, modelType='models.Seq2SeqwithAttention', numEpochs=2000, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_4', seed=0, tbDescr='MODEL_4', tfThresh=0.3, toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x5652bc4f6000 @  0x7fd3cbb691e7 0x5652418c1f48 0x7fd3a26db53e 0x7fd3a26dbcd9 0x7fd3a26dbfaf 0x7fd3a26d94b4 0x5652418900e4 0x56524188fde0 0x5652419046f5 0x56524189169a 0x5652418ffc9e 0x56524189169a 0x5652418ffc9e 0x56524189169a 0x5652418ffc9e 0x56524189169a 0x5652418ffc9e 0x5652418feb0e 0x5652417d0e2b 0x5652419011e6 0x5652418fee0d 0x5652417d0e2b 0x5652419011e6 0x5652418fee0d 0x56524189177a 0x56524190086a 0x565241982858 0x5652418ffee2 0x5652418feb0e 0x56524189177a 0x56524190086a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding

#### Results

In [None]:
%%timeit -r 1 -n 1
#evaluation
!python ./src/train.py --hiddenDim 200 --numLayers 2 --batchSize 16 --numEpochs 2000 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_4' --printEveryIters 400 --tbDescr 'MODEL_4' \
                        --modelType 'models.Seq2SeqwithAttention' --loadBestModel True --toTrain False --dropout 0.6 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.3 --beamSize 0

Namespace(batchSize=16, beamSize=0, dropout=0.6, fullVocab=True, hiddenDim=200, loadBestModel=True, lr=0.003, modelType='models.Seq2SeqwithAttention', numEpochs=2000, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_4', seed=0, tbDescr='MODEL_4', tfThresh=0.3, toTrain=False, trainSize=128, valSize=16)
Getting the training and validation data...
tcmalloc: large alloc 2406391808 bytes == 0x55c4fdd94000 @  0x7f7c6d6b21e7 0x55c4843fbf48 0x7f7c4422453e 0x7f7c44224cd9 0x7f7c44224faf 0x7f7c442224b4 0x55c4843ca0e4 0x55c4843c9de0 0x55c48443e6f5 0x55c4843cb69a 0x55c484439c9e 0x55c4843cb69a 0x55c484439c9e 0x55c4843cb69a 0x55c484439c9e 0x55c4843cb69a 0x55c484439c9e 0x55c484438b0e 0x55c48430ae2b 0x55c48443b1e6 0x55c484438e0d 0x55c48430ae2b 0x55c48443b1e6 0x55c484438e0d 0x55c4843cb77a 0x55c48443a86a 0x55c4844bc858 0x55c484439ee2 0x55c484438b0e 0x55c4843cb77a 0x55c48443a86a
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding

| Training Loss | Training Data Rouge | Validation Data Rouge |
| --- | --- | --- |
| ![](https://drive.google.com/uc?export=view&id=1lzbamDmwVUwYI-e9eJ8cAyILiwcT4YE3) | ![](https://drive.google.com/uc?export=view&id=1SoTf8m895V3SHoz3teycJaYecuCMXYPb) | ![](https://drive.google.com/uc?export=view&id=1jFbvlfLUk41sZ9G97nc6ABl2VC6MAvNn) |
| | Teal: Rouge-1, Gray: Rouge-2, Orange: Rouge-L | Dark Blue: Rouge-1, Red: Rouge-2, Light Blue: Rouge-L |

Best checkpoint at iteration 14800: Rouge-1 is 0.2427, Rouge-2 is 0.0372, and Rouge-L is 0.1996.
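The notebook does not show which ROUGE implementation `train.py` uses to produce these scores. As a rough guide to what Rouge-1, Rouge-2, and Rouge-L measure, here is a minimal pure-Python sketch of sentence-level ROUGE F1 on whitespace tokens. It is an illustration, not the scorer used above: library implementations may lowercase, stem, tokenize differently, or use summary-level ROUGE-L, so their numbers will not match exactly.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1_score(overlap, n_candidate, n_reference):
    """Harmonic mean of precision (overlap/candidate) and recall (overlap/reference)."""
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference tokens."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum((cand & ref).values())  # `&` keeps the min count of each shared n-gram
    return f1_score(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1_score(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy example with whitespace tokenization (no stemming or lowercasing).
candidate = "the model summarizes the patent".split()
reference = "the patent is summarized by the model".split()
print(round(rouge_n(candidate, reference, 1), 4))  # 0.6667
print(round(rouge_n(candidate, reference, 2), 4))  # 0.4
print(round(rouge_l(candidate, reference), 4))     # 0.3333
```

Rouge-2 is much lower than Rouge-1 here for the same reason it is in the results above: matching word pairs in order is far harder than matching individual words.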

### Transformer-based Model 5: Dropout 0.3
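Models 5 and 6 differ only in the `--dropout` rate (0.3 vs 0.6). The notebook does not show how `train.py` applies this rate inside `models.Seq2SeqwithXfmr`; assuming it is passed to standard PyTorch dropout layers, the sketch below illustrates the inverted-dropout rule that `torch.nn.Dropout` documents. It is plain Python on a list of activations, not the project's code, and (unlike PyTorch) it rejects `p = 1`.

```python
import random


def inverted_dropout(activations, p, training=True, rng=random):
    """Apply inverted dropout to a list of activations.

    During training each value is zeroed with probability p and survivors are
    scaled by 1 / (1 - p), so the expected output equals the input. At
    evaluation time the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so a value survives with probability 1 - p.
    return [a * scale if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
for p in (0.3, 0.6):  # the rates used by MODEL_5 and MODEL_6
    out = inverted_dropout([1.0] * 10_000, p, rng=rng)
    zeroed = sum(v == 0.0 for v in out) / len(out)
    print(f"p={p}: fraction zeroed ~ {zeroed:.3f}, mean ~ {sum(out) / len(out):.3f}")
```

With p = 0.6 more than half of the activations are dropped on every training step, so each survivor is scaled by 2.5; that is a strong regularizer for a model with `hiddenDim` 48 trained on only 128 examples.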

#### Experiment

In [35]:
%%timeit -r 1 -n 1
''' MODEL_5: 
--hiddenDim 48 --numLayers 2 --batchSize 6 --numEpochs 500 --lr 3e-3 \
--savedModelDir './saved_models/MODEL_5' --printEveryIters 400 --tbDescr 'MODEL_5' \
--modelType 'models.Seq2SeqwithXfmr' --loadBestModel False --toTrain True --dropout 0.3 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0
'''
!python ./src/train.py --hiddenDim 48 --numLayers 2 --batchSize 6 --numEpochs 500 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_5' --printEveryIters 400 --tbDescr 'MODEL_5' \
                        --modelType 'models.Seq2SeqwithXfmr' --loadBestModel False --toTrain True --dropout 0.3 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=6, beamSize=0, dropout=0.3, fullVocab=True, hiddenDim=48, loadBestModel=False, lr=0.003, modelType='models.Seq2SeqwithXfmr', numEpochs=500, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_DEL', seed=0, tbDescr='MODEL_DEL', tfThresh=0.0, toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding sta

#### Results

In [36]:
%%timeit -r 1 -n 1

!python ./src/train.py --hiddenDim 48 --numLayers 2 --batchSize 6 --numEpochs 500 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_5' --printEveryIters 400 --tbDescr 'MODEL_5' \
                        --modelType 'models.Seq2SeqwithXfmr' --loadBestModel True --toTrain False --dropout 0.3 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=6, beamSize=0, dropout=0.3, fullVocab=True, hiddenDim=48, loadBestModel=True, lr=0.003, modelType='models.Seq2SeqwithXfmr', numEpochs=500, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_5', seed=0, tbDescr='MODEL_5', tfThresh=0.0, toTrain=False, trainSize=128, valSize=16)
Getting the training and validation data...
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding start/s

### Transformer-based Model 6: Dropout 0.6

#### Experiment

In [38]:
%%timeit -r 1 -n 1
''' MODEL_6: 
--hiddenDim 48 --numLayers 2 --batchSize 6 --numEpochs 500 --lr 3e-3 \
--savedModelDir './saved_models/MODEL_6' --printEveryIters 400 --tbDescr 'MODEL_6' \
--modelType 'models.Seq2SeqwithXfmr' --loadBestModel False --toTrain True --dropout 0.6 \
--fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0
'''
!python ./src/train.py --hiddenDim 48 --numLayers 2 --batchSize 6 --numEpochs 500 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_6' --printEveryIters 400 --tbDescr 'MODEL_6' \
                        --modelType 'models.Seq2SeqwithXfmr' --loadBestModel False --toTrain True --dropout 0.6 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0

Namespace(batchSize=6, beamSize=0, dropout=0.6, fullVocab=True, hiddenDim=48, loadBestModel=False, lr=0.003, modelType='models.Seq2SeqwithXfmr', numEpochs=500, numLayers=2, printEveryIters=400, savedModelDir='./saved_models/MODEL_6', seed=0, tbDescr='MODEL_6', tfThresh=0.0, toTrain=True, trainSize=128, valSize=16)
Getting the training and validation data...
max length (before adding stop token) in mini_df.description is 3943 and in mini_df.abstract (before adding start/s

#### Results

In [37]:
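%%timeit -r 1 -n 1
# evaluate the best saved checkpoint (no training)
!python ./src/train.py --hiddenDim 48 --numLayers 2 --batchSize 6 --numEpochs 500 --lr 3e-3 \
                        --savedModelDir './saved_models/MODEL_6' --printEveryIters 400 --tbDescr 'MODEL_6' \
                        --modelType 'models.Seq2SeqwithXfmr' --loadBestModel True --toTrain False --dropout 0.6 \
                        --fullVocab True --trainSize 128 --valSize 16 --seed 0 --tfThresh 0.0 --beamSize 0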


### Scratch: Training from Python Instead of the Command Line

In [None]:
# import models
# import train
# import utils  # assumed module in src/; needed for closeLoggerFileHandler below

# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# train_data, val_data, lang_train = train.get_data(use_full_vocab=True, cpc_codes='de', fname='data0_str_json.gz',
#                                                     train_size=128, val_size=16)

# encoder = models.EncoderLSTM(vocab_size=len(lang_train.desc_vocab), hidden_dim=50, num_layers=2, bidir=True)
# decoder = models.DecoderLSTM(vocab_size=len(lang_train.abs_vocab), hidden_dim=50, num_layers=2, bidir=False)
# model = models.Seq2Seq(encoder, decoder)

# train_data.shuffle(2)
# val_data.shuffle(2)

# train.train(model=model, train_data=train_data, val_data=val_data, abs_idx2word=lang_train.abs_idx2word, device=device, 
#             batch_size=128, num_epochs=1, lr=2e-3, print_every_iters=250, tb_descr='zzzdropout-0_hiddim-50_numlyrs-2_full-de-data')

# Only do this once you are done with this notebook:
# utils.closeLoggerFileHandler(train.logger)
# utils.closeLoggerFileHandler(train.evaluate.logger)

## Visualization Using TensorBoard

In [None]:
%load_ext tensorboard

In [None]:
%tensorboard --logdir='runs'

## Future Experimentation
1. Use GloVe embeddings to initialize the embedding layer.
2. For the decoder, initialize both h and c from the encoder output, or only h (with c initialized to zeros)?
3. See if the stop token can be removed from the description.
4. Weight the cross-entropy loss in proportion to the word counts in the abstract vocabulary.
5. Use beam search.
   - This didn't make much difference, so revisit it using the CS224n lecture slides on NMT (http://web.stanford.edu/class/cs224n/slides/cs224n-2021-lecture07-nmt.pdf).
6. Add attention and make the model larger (more LSTM layers and a larger hidden dimension).
7. Use Transformers.
8. Use teacher forcing only 50% of the time during training instead of 100%.
9. Use some of the ideas documented in my literature survey.
10. Other ideas to try:
    - http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/
    - https://towardsdatascience.com/abstractive-summarization-of-dialogues-f530c7d290be
    - https://www.analyticsvidhya.com/blog/2021/02/dialogue-summarization-a-deep-learning-approach/
    - https://www.analyticsvidhya.com/blog/2020/11/summarize-twitter-live-data-using-pretrained-nlp-models/
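Item 5 notes that beam search made little difference. How `train.py` implements `--beamSize` is not shown, so as a reference point for revisiting it, here is a minimal, framework-free sketch of beam search with length normalization. `step_fn` is a hypothetical stand-in for the decoder that maps a token prefix to next-token log-probabilities. Completed hypotheses are set aside rather than refilling the beam, which is a simplification of the usual bookkeeping.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=20):
    """Return the best (tokens, log_prob) hypothesis found by beam search.

    step_fn(prefix) is a hypothetical stand-in for the decoder: it returns a
    dict mapping each candidate next token to its log-probability given the
    prefix. Hypotheses are ranked by summed log-probability; the final pick
    is length-normalized so longer outputs are not penalized just for having
    more (negative) terms.
    """
    beams = [([start_token], 0.0)]  # live hypotheses: (tokens, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = [
            (tokens + [tok], score + logp)
            for tokens, score in beams
            for tok, logp in step_fn(tokens).items()
        ]
        # Keep the beam_size best expansions across all live beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))  # hypothesis is complete
            else:
                beams.append((tokens, score))
        if not beams:  # every surviving hypothesis has emitted end_token
            break
    finished.extend(beams)  # hypotheses still open when max_len was reached
    return max(finished, key=lambda c: c[1] / max(len(c[0]) - 1, 1))


# Toy next-token distributions: greedy decoding picks "a" first (0.6), but the
# best complete sequence goes through "b" (0.4 * 0.9 = 0.36 > 0.6 * 0.5 = 0.30).
probs = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def toy_step(tokens):
    """Hypothetical decoder step: log-probabilities conditioned on the last token."""
    return {tok: math.log(p) for tok, p in probs[tokens[-1]].items()}


print(beam_search(toy_step, "<s>", "</s>", beam_size=1)[0])  # ['<s>', 'a', 'x', '</s>'] (greedy)
print(beam_search(toy_step, "<s>", "</s>", beam_size=2)[0])  # ['<s>', 'b', 'z', '</s>']
```

With `beam_size=1` the search reduces to greedy decoding. Beam search only helps when the decoder's early high-probability choices lead to worse complete sequences, as in this toy table; if the model's distributions are very peaked, a wider beam may change little, which is one possible explanation for the observation in item 5.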