<a href="https://colab.research.google.com/github/aytlee/sentiment140-pytorch-lstm/blob/main/sentiment140_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis

The objective of this notebook is to gain a better understanding of how to preprocess text and apply an LSTM model for sentiment analysis using the Sentiment140 dataset. The dataset was downloaded from Kaggle for ease of use and contains 1.6M tweets that were extracted using Twitter's API. Additional information and the original dataset can be found [here](http://help.sentiment140.com/home). The original training dataset is unusual in that it was labeled automatically rather than manually. The notebook follows the code and tutorials from:

1. [LSTM Text Classification Using PyTorch](https://towardsdatascience.com/lstm-text-classification-using-pytorch-2c6c657f8fc0) by Raymond Chang
2. [Upgraded Sentiment Analysis](https://github.com/bentrevett/pytorch-sentiment-analysis) by Ben Trevett 


# Importing Libraries 

First, load the dataset and import all of the required libraries. While importing libraries, it was necessary to specify a version of torchtext in order to import the torchtext.legacy module. 

In [None]:
from google.colab import files

# Install Kaggle library
!pip install -q kaggle

# Upload kaggle API key file
uploaded = files.upload()

Saving kaggle.json to kaggle.json


In [None]:
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
! mkdir -p train
! kaggle datasets download -d kazanova/sentiment140 -p /content/train/ --unzip

Downloading sentiment140.zip to /content/train
 80% 65.0M/80.9M [00:00<00:00, 148MB/s]
100% 80.9M/80.9M [00:00<00:00, 145MB/s]


In [None]:
# There seems to be an issue with installing torchtext without specifying a version:
# the torchtext.legacy module cannot be imported otherwise
!pip install torchtext==0.10.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchtext==0.10.0
  Downloading torchtext-0.10.0-cp37-cp37m-manylinux1_x86_64.whl (7.6 MB)
     |████████████████████████████████| 7.6 MB 6.5 MB/s 
Collecting torch==1.9.0
  Downloading torch-1.9.0-cp37-cp37m-manylinux1_x86_64.whl (831.4 MB)
     |████████████████████████████████| 831.4 MB 2.7 kB/s 
Installing collected packages: torch, torchtext
  Attempting uninstall: torch
    Found existing installation: torch 1.11.0+cu113
    Uninstalling torch-1.11.0+cu113:
      Successfully uninstalled torch-1.11.0+cu113
  Attempting uninstall: torchtext
    Found existing installation: torchtext 0.12.0
    Uninstalling torchtext-0.12.0:
      Successfully uninstalled torchtext-0.12.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.12.0+

In [None]:
import os 
import pandas as pd 
import numpy as np
from sklearn import model_selection 

import torch
from torchtext.legacy import data
from torchtext.legacy import datasets
import torch.autograd as autograd
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
#import torch.nn.functional as F
import torch.optim as optim 
#from torch.utils.data import Dataset, DataLoader 

import seaborn as sns 
import matplotlib.pyplot as plt 

from sklearn.model_selection import train_test_split

from multiprocessing import cpu_count
from sklearn.metrics import classification_report, confusion_matrix 

import plotly.graph_objects as go

In [None]:
# TEXT = data.Field()
# LABEL = data.LabelField(dtype = torch.long)
# legacy_train, legacy_test = datasets.IMDB.splits(TEXT, LABEL)  # datasets here refers to torchtext.legacy.datasets

In [None]:
cd train

/content/train


# Exploration of the training dataset 

According to the metadata, there are 6 columns consisting of the target, id, date, flag, user and text for the tweet. Targets are labeled 0 (negative), 2 (neutral) and 4 (positive). The training dataset only includes the negative and positive tweets, while the test dataset (found on the Stanford website) also includes the neutral tweets. 

In addition, the training dataset only has the value "NO_QUERY" for the flag column, while the test dataset seems to have multiple values for that column. The "flag" column seems to denote a query keyword associated with the tweet. As this notebook is meant mainly to familiarize myself with NLP using PyTorch and LSTMs, I will not use the test dataset and will instead train and evaluate the model on a training/validation split of the training data. 



In [None]:
train_data = pd.read_csv(r'training.1600000.processed.noemoticon.csv', encoding='latin-1', header=None)

In [None]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1600000 entries, 0 to 1599999
Data columns (total 6 columns):
 #   Column  Non-Null Count    Dtype 
---  ------  --------------    ----- 
 0   0       1600000 non-null  int64 
 1   1       1600000 non-null  int64 
 2   2       1600000 non-null  object
 3   3       1600000 non-null  object
 4   4       1600000 non-null  object
 5   5       1600000 non-null  object
dtypes: int64(2), object(4)
memory usage: 73.2+ MB


In [None]:
# Add in the column headings
train_data.columns = ['target', 'id', 'date', 'flag', 'user', 'text']

train_data.head()

| | target | id | date | flag | user | text |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |


In [None]:
# Check the proportion of the targets in the dataset
print(train_data['target'].value_counts(normalize=True))
# Apparently there are no neutral comments in the dataset, the dataset is balanced


0    0.5
4    0.5
Name: target, dtype: float64


In [None]:
train_data['flag'].value_counts()
# No flag in this dataset


NO_QUERY    1600000
Name: flag, dtype: int64

In [None]:
print('Number of unique users in this dataset:', train_data['user'].nunique())

Number of unique users in this dataset: 659775


In [None]:
# Look at a sample of the negative tweets 
neg_tweets = train_data.query('target==0')[['text']].sample(10)

for tweet in neg_tweets['text']:
  print(tweet, '\n')


@chuckstar76  I'm obviously not meant to send this email out, as its crashed again, and I was so close to the send button  

@QthePirate go eat some for me  

on a all day training course. no Internet all day  

@christinelu kiddin?! I guess they're flying ANA not sure it's the same terminal  

Oh. 1password doesn't work with Chrome  

@dale_dale  but cheetoh's are good too! 

@Elliethinks i wish i done that, but i had like no money, and left my cash card at home  

sad about that missing plane  

Headache=  I think after work I'm going back to bed. There is a theme this weekend. I think my bed is confused by how much I've been in it 

ICT  eewwww  Save Me Please! xo. 



In [None]:
# Look at a sample of the positive tweets 
pos_tweets = train_data.query('target==4')[['text']].sample(10)

for tweet in pos_tweets['text']:
  print(tweet, '\n')


Going outside again today to enjoy the beautiful weather  Laundry can wait! 

@tinchystryder i love number 1. it is amazing! im also getting one of ur tshirts and sweatshirts  i cant wait till it gets delivered!! 

@MandyyJirouxx Hi mandy  hah 

Chemistry's nearly OVER, yes!  

Wooo got an interview on monday  goood times 

@pseud0random wow u really got hooked didnt you? LOL want another?  

@killaseze yeah! Still got a great hook from it!  

Happy worship Sunday! Gonna do some Word, then hit New River, then look @ a couple open houses. Sounds like a plan!  

@alexandramusic only if u say u love diversity?!!  

preparing to my trip to us, will visit Seattle and then WWDC in SF, start today afternoon  



In [None]:
# Relabel the positive tweets 
train_data['target'] = train_data['target'].map({4:1, 0:0})

In [None]:
train_data['target'].value_counts()

0    800000
1    800000
Name: target, dtype: int64

Based on the Stanford paper and website, the tweets were labeled automatically rather than reviewed manually. The tweets were collected by querying for emoticons, which then served as noisy negative or positive labels (distant supervision) and were removed from the text, hence "noemoticon" in the file name. In this notebook, we will use torchtext and an LSTM to process the tweets and predict their sentiment. 
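To make the idea of emoticon-based distant supervision concrete, here is a minimal sketch. The emoticon lists and the `distant_label` helper are illustrative assumptions; the exact lists and rules used to build Sentiment140 may differ.

```python
# Illustrative emoticon lists (assumption); the lists used for Sentiment140 may differ.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")

def distant_label(tweet):
    """Return (label, cleaned_text), or None when no usable label can be derived."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all, or conflicting emoticons: skip the tweet.
        return None
    cleaned = tweet
    for emoticon in POSITIVE + NEGATIVE:
        # Remove the emoticon so a model cannot simply memorize it.
        cleaned = cleaned.replace(emoticon, "")
    return (1 if has_pos else 0), " ".join(cleaned.split())

print(distant_label("Going outside to enjoy the weather :)"))
print(distant_label("sad about that missing plane :("))
print(distant_label("no emoticon here"))
```

Labels produced this way are noisy by construction, which is worth keeping in mind when interpreting the validation accuracy.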

# Process and set up the text data for PyTorch


The cell below declares the Fields that will be used in the model. A Field specifies the data type, holds a Vocab object and determines how the data will be converted to a Tensor. I will also split the dataset using train_test_split and write each split to a CSV file to be loaded into a TabularDataset. TabularDataset reads columns from TSV, CSV or JSON files and converts them to a Dataset object. 

In [None]:
TEXT = data.Field(tokenize='spacy', 
                  tokenizer_language='en_core_web_sm', 
                  include_lengths=True,
                  batch_first=True)

LABEL = data.LabelField(dtype=torch.float, batch_first=True)

fields = [('target', LABEL), ('text', TEXT)]

In [None]:
df_train, df_valid = train_test_split(train_data[['target', 'text']], test_size=0.2, random_state=123)

In [None]:
df_train.to_csv(r'df_train.csv', index=False)
df_valid.to_csv(r'df_valid.csv', index=False)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


In [None]:
source_folder = os.getcwd()
trainds, validds = data.TabularDataset.splits(path=source_folder, 
                                          format='CSV',
                                          train='df_train.csv', 
                                          validation='df_valid.csv',
                                          fields=fields,
                                          skip_header=True)

In [None]:
# Print some of the variables in the tabular dataset to see what they look like 
print(vars(trainds[0]))
print(vars(validds[-1]))



{'target': '1', 'text': ['withholding', 'information', 'am', 'I', 'Shae', '?', 'YUP', ' ']}
{'target': '1', 'text': ['Hi', ',', 'my', 'name', "'s", 'Doug', '...', 'my', 'owner', 'SQUIRREL!', '...', 'Hi', ',', 'my', 'name', "'s", 'Doug', '...']}


After creating the Dataset object, we can build a BucketIterator object using torchtext. Essentially, TabularDataset and BucketIterator are like the Dataset and DataLoader objects used in torch. However, BucketIterator will sort the data so that sequences with similar lengths are grouped together within a batch. This optimizes the batches when being input into the model and reduces the amount of padding needed. 
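The effect of bucketing on padding can be illustrated in plain Python. This is a simplified sketch, not torchtext's implementation: the real BucketIterator pools and shuffles examples differently, and `bucket_batches` and `count_padding` are hypothetical helper names.

```python
import random

def bucket_batches(sequences, batch_size, pad_token="<pad>", seed=0):
    """Group sequences of similar length into padded batches (simplified sketch)."""
    rng = random.Random(seed)
    # Sort by length so neighbours have similar lengths, then cut into batches.
    ordered = sorted(sequences, key=len)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    rng.shuffle(batches)  # shuffle the order of the batches, not their contents
    padded = []
    for batch in batches:
        batch = sorted(batch, key=len, reverse=True)  # longest first within a batch
        max_len = len(batch[0])
        padded.append([seq + [pad_token] * (max_len - len(seq)) for seq in batch])
    return padded

def count_padding(batches, pad_token="<pad>"):
    """Count how many padding tokens were added across all batches."""
    return sum(tok == pad_token for batch in batches for seq in batch for tok in seq)

seqs = [["w"] * n for n in (2, 9, 3, 10, 2, 8)]
bucketed = bucket_batches(seqs, batch_size=2)

# Naive batching in the original order, for comparison
naive = [seqs[i:i + 2] for i in range(0, len(seqs), 2)]
naive_padded = [[s + ["<pad>"] * (max(map(len, b)) - len(s)) for s in b] for b in naive]

print(count_padding(bucketed), count_padding(naive_padded))  # 6 20
```

On this toy input, grouping by length cuts the padding from 20 tokens to 6, which is the same effect BucketIterator aims for on real batches.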

According to [this](https://www.analyticsvidhya.com/blog/2021/09/sentiment-analysis-with-lstm-and-torchtext-with-code-and-explanation/) article, the build_vocab function generates the Vocab object, which maps each token to an integer ID; the iterator then uses this mapping to turn each tokenized sequence into a tensor of token IDs. If we take a look at the items in each batch from the BucketIterator, we see that it returns a Tensor with numerical values rather than word tokens. Although it's not done in this model, we can also pass pre-trained word embeddings to build_vocab. This is done in Ben Trevett's Upgraded Sentiment Analysis tutorial and is explained in more detail there. 
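As a rough picture of what a vocabulary with a minimum frequency does, here is a plain-Python sketch. `build_vocab_sketch` and `numericalize` are hypothetical stand-ins rather than torchtext functions, and the ordering rule (frequency, then alphabetical) is a simplifying assumption.

```python
from collections import Counter

def build_vocab_sketch(tokenized_texts, min_freq=1, specials=("<unk>", "<pad>")):
    """Map tokens to integer IDs: special tokens first, then tokens by frequency."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Keep tokens seen at least min_freq times; sort by descending frequency,
    # breaking ties alphabetically so the result is deterministic.
    kept = sorted((t for t, c in counts.items() if c >= min_freq),
                  key=lambda t: (-counts[t], t))
    itos = list(specials) + kept
    return {tok: idx for idx, tok in enumerate(itos)}

def numericalize(tokens, stoi):
    """Replace each token with its ID; rare or unseen tokens map to <unk>."""
    return [stoi.get(tok, stoi["<unk>"]) for tok in tokens]

corpus = [["good", "day"], ["bad", "day"], ["good", "good", "morning"]]
stoi = build_vocab_sketch(corpus, min_freq=2)
print(stoi)                                            # {'<unk>': 0, '<pad>': 1, 'good': 2, 'day': 3}
print(numericalize(["good", "evening", "day"], stoi))  # [2, 0, 3]
```

Tokens below the frequency threshold ("bad", "morning") never get their own ID, so they are encoded as the unknown token, just like words that were not in the training data at all.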

In [None]:
TEXT.build_vocab(trainds, min_freq = 3)
LABEL.build_vocab(trainds) 

In [None]:
# BucketIterator is an iterator that batches examples of similar lengths together 
# Minimizes the amount of padding and shuffles batches for each new epoch 

train_iter, valid_iter = data.BucketIterator.splits(
    (trainds, validds), 
    batch_size=64,
    sort_key = lambda x: len(x.text),
    sort=False,
    sort_within_batch=True,
    device=device)

In [None]:
# Loop over the bucket iterator to see what each batch looks like 
print('PyTorchText BucketIterator\n')

for batch_no, batch in enumerate(train_iter):
    text, batch_len = batch.text
    print(text, batch_len)
    print(batch.target)
    break

  # Only look at first batch. 

PyTorchText BucketIterator

tensor([[ 73541,     15,     15,  ...,     15,     15,     15],
        [     0,     15,     15,  ...,      1,      1,      1],
        [ 17810,     38,     85,  ...,      1,      1,      1],
        ...,
        [120888,  19758,    510,  ...,      1,      1,      1],
        [  1302,     50,    523,  ...,      1,      1,      1],
        [   795,    640,    755,  ...,      1,      1,      1]],
       device='cuda:0') tensor([105,  62,  43,  41,  40,  39,  39,  39,  39,  39,  38,  38,  38,  38,
         38,  38,  38,  38,  38,  38,  37,  37,  37,  37,  37,  37,  37,  37,
         37,  37,  37,  37,  36,  36,  36,  36,  36,  36,  36,  36,  36,  36,
         36,  36,  36,  36,  36,  36,  36,  36,  36,  36,  35,  35,  35,  35,
         35,  35,  35,  35,  35,  35,  35,  35], device='cuda:0')
tensor([0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 1.,
        0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0.,
        1., 

In [None]:
# In the below dictionary, you can see that stoi creates a dictionary with the token and the corresponding token ID 
# unknown and padding are 0 and 1 respectively and all other words come after 

TEXT.vocab.stoi.items()



The model below consists of:
- an embedding layer 
- a bidirectional LSTM  
- dropout 
- a fully connected layer 

The text is passed through an embedding layer to create the word embeddings, which are then packed (so that the padded positions are skipped) before being fed to the bidirectional LSTM. The final hidden states from the forward and backward passes of the LSTM are concatenated and passed through the dropout layer (for regularization) and then the fully connected layer. The model uses BCELoss, so the output of the fully connected layer must be passed through a sigmoid function.
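The shapes involved can be checked on a toy example. This is a minimal sketch with made-up dimensions and random input standing in for embedded text; it is not the notebook's model, and it omits packing and dropout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, emb_dim, hidden_dim, num_layers = 4, 7, 10, 8, 2

lstm = nn.LSTM(input_size=emb_dim, hidden_size=hidden_dim,
               num_layers=num_layers, batch_first=True, bidirectional=True)
fc = nn.Linear(2 * hidden_dim, 1)

x = torch.randn(batch_size, seq_len, emb_dim)  # stand-in for embedded text
output, (hidden, cell) = lstm(x)

# hidden has shape (num_layers * 2, batch, hidden_dim), even with batch_first=True
print(hidden.shape)  # torch.Size([4, 4, 8])

# For the last layer, index -2 is the forward direction and -1 the backward one
final = torch.cat((hidden[-2], hidden[-1]), dim=1)
print(final.shape)   # torch.Size([4, 16])

# BCELoss expects probabilities in (0, 1), hence the sigmoid
probs = torch.sigmoid(fc(final).squeeze(1))
labels = torch.tensor([0., 1., 1., 0.])
loss = nn.BCELoss()(probs, labels)
print(loss.item())
```

The concatenated tensor has `2 * hidden_dim` features per example, which is why the fully connected layer takes `2*hidden_dim` inputs.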

In [None]:
class LSTM(nn.Module):
  def __init__(self, output_dim, hidden_dim, input_dim, embedding_dim, num_layers, dropout):
    super().__init__()

    self.embedding = nn.Embedding(input_dim, embedding_dim) 
    self.hidden_dim = hidden_dim
    self.output_dim = output_dim
    self.lstm = nn.LSTM(input_size=embedding_dim, 
                        hidden_size=hidden_dim,
                        num_layers=num_layers,
                        batch_first=True,
                        bidirectional=True)
    self.drop = nn.Dropout(dropout)

    self.fc = nn.Linear(2*hidden_dim, output_dim)

  def forward(self, text, text_len):

    text_emb = self.embedding(text)

    # Packs a Tensor containing padded sequences of variable length 
    packed_input = pack_padded_sequence(text_emb, text_len, batch_first=True, enforce_sorted=False)

    packed_output, (hidden, cell) = self.lstm(packed_input) 
    # unpack the padded sequence 
    output, output_len = pad_packed_sequence(packed_output, batch_first=True)

    # out_reduced = torch.cat((out_forward, out_reverse), 1)
    hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
    text_fea = self.drop(hidden)

    text_fea = self.fc(text_fea)
    text_fea = torch.squeeze(text_fea, 1)
    text_out = torch.sigmoid(text_fea)

    return text_out

In [None]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 300
HIDDEN_DIM = 128
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5 


In [None]:
model = LSTM(OUTPUT_DIM, HIDDEN_DIM, INPUT_DIM, EMBEDDING_DIM, 
             N_LAYERS, DROPOUT).to(device)
optimizer = optim.Adam(model.parameters())
criterion = nn.BCELoss() 
#criterion = criterion.to(device) # criterion needs to be sent to GPU? 
# Check if packed inputs need to be sent to GPU?

In [None]:
def binary_accuracy(preds, y):
  '''
  Returns accuracy per batch
  '''

  rounded_preds = torch.round(preds) 
  correct = (rounded_preds == y).float() # convert to float for division
  acc = correct.sum()/len(correct)
  return acc 


A note on the train and evaluate functions below: the `text_len` variable must be a CPU tensor, not a GPU tensor like the text or labels. Because `train_iter` and `valid_iter` were set up on the GPU, the text and label tensors are already on the GPU, so only `text_len` needs to be moved back with `.cpu()`. 

In [None]:
# NOTE: The text_len variable must be a CPU tensor and not a GPU tensor like the other inputs. 

def train(model, iterator, optimizer, criterion):
  epoch_loss = 0
  epoch_acc = 0
  model.train()

  for batch in iterator:
    labels = batch.target
    text, text_len = batch.text 
    
    text_len = text_len.cpu()

    optimizer.zero_grad()

    predictions = model(text, text_len)
    loss = criterion(predictions, labels)
    acc = binary_accuracy(predictions, labels)
    loss.backward()
    optimizer.step()

    epoch_loss += loss.item()
    epoch_acc += acc.item()

  return epoch_loss/len(iterator), epoch_acc/len(iterator)

  

In [None]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text, text_len = batch.text
            labels = batch.target

            # labels.to(device)
            # text.to(device)
            # text_len.to(device)

            text_len = text_len.cpu()
            predictions = model(text, text_len)
            
            loss = criterion(predictions, labels)
            
            acc = binary_accuracy(predictions, labels)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [None]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [None]:
N_EPOCHS = 5
best_valid_loss = float('inf')
train_loss_list = []
valid_loss_list = []


for epoch in range(N_EPOCHS):
  start_time = time.time()
  train_loss, train_acc = train(model, train_iter, optimizer, criterion)
  valid_loss, valid_acc = evaluate(model, valid_iter, criterion)

  end_time = time.time()
  
  train_loss_list.append(train_loss)
  valid_loss_list.append(valid_loss)

  epoch_mins, epoch_secs = epoch_time(start_time, end_time) 

  if valid_loss < best_valid_loss:
    best_valid_loss = valid_loss
    torch.save(model.state_dict(), 'first-draft-model.pt')

  print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
  print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
  print(f'\t Val.Loss: {valid_loss:.3f} | Val.Acc: {valid_acc*100:.2f}%')


Epoch: 01 | Epoch Time: 8m 22s
	Train Loss: 0.375 | Train Acc: 83.21%
	 Val.Loss: 0.338 | Val.Acc: 85.30%
Epoch: 02 | Epoch Time: 8m 20s
	Train Loss: 0.303 | Train Acc: 87.15%
	 Val.Loss: 0.329 | Val.Acc: 85.79%
Epoch: 03 | Epoch Time: 8m 17s
	Train Loss: 0.263 | Train Acc: 89.08%
	 Val.Loss: 0.338 | Val.Acc: 85.68%
Epoch: 04 | Epoch Time: 8m 18s
	Train Loss: 0.229 | Train Acc: 90.65%
	 Val.Loss: 0.349 | Val.Acc: 85.48%
Epoch: 05 | Epoch Time: 8m 17s
	Train Loss: 0.200 | Train Acc: 91.94%
	 Val.Loss: 0.374 | Val.Acc: 85.20%


After 5 epochs, the training accuracy went from 83% to 92%; however, the validation accuracy stayed around 85% and dropped slightly by the last epoch, while the validation loss rose after the second epoch. This suggests that the model starts to overfit the training data as training continues. Another metric worth looking at is the classification report, which shows the precision and recall of the model. 
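One common response to this pattern is early stopping, which is not used in this notebook (the training loop only saves the checkpoint with the best validation loss). The sketch below applies a hypothetical `patience` rule to the validation losses printed above to show where training would have stopped.

```python
def best_epoch_with_patience(valid_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(valid_losses)

# Validation losses from the five epochs reported above
valid_losses = [0.338, 0.329, 0.338, 0.349, 0.374]
print(best_epoch_with_patience(valid_losses, patience=2))  # (2, 4)
```

With a patience of 2, training would stop after epoch 4 and keep the epoch 2 weights, saving roughly one 8-minute epoch in this run.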

A few things that could be tried to see if the metrics can improve:
- Use pre-trained word embeddings. GloVe has word embeddings trained specifically on tweets.
- Change the dropout value to see if increased regularization will reduce overfitting 
- Process the data more to remove certain values, e.g. usernames, links, that might not contribute to the sentiment analysis
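The last suggestion could look roughly like the following. This is a minimal sketch: the regular expressions and the `clean_tweet` helper are assumptions for illustration, and they were not used to produce the results above.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # http(s) links and bare www links
MENTION_RE = re.compile(r"@\w+")               # @username mentions

def clean_tweet(text):
    """Strip links and @mentions, then collapse repeated whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())

print(clean_tweet("@switchfoot http://twitpic.com/2y1zl - Awww, t..."))  # - Awww, t...
print(clean_tweet("@QthePirate go eat some for me  "))                   # go eat some for me
```

A standalone "@" followed by a space (as in "look @ a couple open houses") is left untouched, because the mention pattern requires at least one word character after the "@".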

It would also be interesting to try a simpler model, e.g. bag of words or TF-IDF features, to see if it would produce similar results. Each epoch took about 8 minutes to run for this bidirectional LSTM, so if a simpler model could provide similar results, it could substantially shorten training time. 

# Determining the Last Time Step of the Backward and Forward Pass

I checked whether `out_forward` matched `output[:, -1, :self.dimension]` or `hidden[-2, :, :]`. Most of the tensor values were identical, but in certain batches `output[:, -1, :self.dimension]` differed in some rows. 

out_forward: 

```
tensor([[ 0.0341, -0.0997, -0.0522,  ..., -0.0227,  0.0251,  0.3660],
        [ 0.0488,  0.0952,  0.0927,  ..., -0.1522,  0.0247,  0.2887],
        [ 0.1365,  0.0957, -0.0611,  ...,  0.0462,  0.0153, -0.2304],
        ...,
        [ 0.0477,  0.2365, -0.1074,  ...,  0.0071,  0.0043, -0.3155],
        [ 0.0067,  0.0112, -0.2509,  ..., -0.0420,  0.0087, -0.0415],
        [-0.0426,  0.3102,  0.0678,  ..., -0.0504,  0.0289, -0.3696]],
       device='cuda:0', grad_fn=<IndexBackward>)

out_forward shape: torch.Size([64, 128])
```

out_test: 
```
tensor([[ 0.0341, -0.0997, -0.0522,  ..., -0.0227,  0.0251,  0.3660],
        [ 0.0488,  0.0952,  0.0927,  ..., -0.1522,  0.0247,  0.2887],
        [ 0.1365,  0.0957, -0.0611,  ...,  0.0462,  0.0153, -0.2304],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]],
       device='cuda:0', grad_fn=<SliceBackward>)

out_test shape: torch.Size([64, 128])
```

hidden_forward: 

```
tensor([[ 0.0341, -0.0997, -0.0522,  ..., -0.0227,  0.0251,  0.3660],
        [ 0.0488,  0.0952,  0.0927,  ..., -0.1522,  0.0247,  0.2887],
        [ 0.1365,  0.0957, -0.0611,  ...,  0.0462,  0.0153, -0.2304],
        ...,
        [ 0.0477,  0.2365, -0.1074,  ...,  0.0071,  0.0043, -0.3155],
        [ 0.0067,  0.0112, -0.2509,  ..., -0.0420,  0.0087, -0.0415],
        [-0.0426,  0.3102,  0.0678,  ..., -0.0504,  0.0289, -0.3696]],
       device='cuda:0', grad_fn=<SliceBackward>)

hidden_forward shape: torch.Size([64, 128])
```

While `out_test`, which is `output[:, -1, :self.dimension]`, has identical values in the first part of the tensor, the later rows are all 0s. This does not occur in every batch. In other batches, all of the values in `out_test` are the same as both `out_forward` and `hidden_forward`. When `batch_first=True` in the LSTM model, the dimensions of output are (N, L, D*H_out), so `output[:, -1, :self.dimension]` should give the last time step of the batch, restricted to the first hidden-size block of features (the forward direction). At first it was unclear where this discrepancy came from. Given the confusion around extracting the last time step of the hidden states from output, it makes more sense to just take the concatenation of `hidden[-2, :, :]` and `hidden[-1, :, :]`. 

Since the output has dimensions of (batch size, sequence length, 2 * size of the hidden state), it's possible that the 0s are coming from the padded positions. `out_forward` only covers each sequence up to its actual length, and nothing past that. 

From checking the output tensor, the output lengths and the text lengths, we see the following code output:

```
Output shape: torch.Size([64, 35, 256])
output_len: tensor([35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35,
        35, 35, 35, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34,
        34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34,
        34, 34, 34, 34, 34, 34, 34, 34, 34, 34])
Text length minus 1: tensor([34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34,
        34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
        33, 33, 33, 33, 33, 33, 33, 33, 33, 33])
```

It seems that the output shape uses the maximum text length in the batch; however, the output lengths show that the sequences within a batch do not all have the same length. The sequences are sorted by text length in descending order. It's likely, then, that the 0s are coming from the padded positions when using `output[:, -1, :self.dimension]`, as this index does not necessarily correspond to a sequence's last time step. 
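This explanation can be checked on a small example. The sketch below uses a single-layer bidirectional LSTM with made-up sizes and random input, and `hidden_dim` plays the role of `self.dimension`; it is an illustration of the behavior, not the notebook's model.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
hidden_dim = 4
lstm = nn.LSTM(input_size=3, hidden_size=hidden_dim, batch_first=True, bidirectional=True)

x = torch.randn(2, 5, 3)        # batch of 2 sequences, padded to length 5
lengths = torch.tensor([5, 3])  # true lengths, kept on the CPU

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, (hidden, cell) = lstm(packed)
output, output_len = pad_packed_sequence(packed_out, batch_first=True)

# Forward half of the output at the final padded position
last_step = output[:, -1, :hidden_dim]
print(last_step[1])  # all zeros: position 4 is padding for the shorter sequence

# Index each sequence at its own last valid step instead
idx = torch.arange(output.size(0))
true_last = output[idx, output_len - 1, :hidden_dim]
print(torch.allclose(true_last, hidden[-2]))                  # True

# The backward direction finishes at time step 0
print(torch.allclose(output[:, 0, hidden_dim:], hidden[-1]))  # True
```

The shorter sequence shows zeros at the last padded position, while indexing at `length - 1` (forward) and at step 0 (backward) reproduces the final hidden states exactly.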

Given the confusion that comes from the output tensor in bidirectional LSTMs, it seems easiest to use `hidden[-2, :, :]` and `hidden[-1, :, :]` to get the last time steps of the forward and backward passes, especially when the sequences are not all the same length.




