# Neural Networks and Deep Learning


## Exercise 11.1

The goal of this exercise is to try to improve the network built in the previous class (namely the example, ready-made architecture; see the code below).

Test the following proposed modifications to the existing network and keep the ones that increase the model's predictive accuracy (train for 10 epochs or more with `batch_size = 32`).

1. Increasing the embedding dimension (`embed_dim`) may allow the model to capture more complex relationships between words.
2. Increasing the number of units in the LSTM layer (`rnn_hidden_size`) may improve the model's ability to remember long sequences.
3. Adding another LSTM layer may help the model better capture dependencies in sequences.
4. Adding Dropout layers between layers may help prevent overfitting.
5. Instead of ReLU, other activation functions such as LeakyReLU or ELU can be tried.

Remember to tune the learning rate.

In [None]:
!pip install torch==2.2.0

Collecting torch==2.2.0.
  Downloading torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m755.5/755.5 MB[0m [31m1.5 MB/s[0m eta [36m0:00:00[0m
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.0.)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.0.)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.0.)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0.)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.0.)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
Collecting nvidia-cufft-cu12==

In [None]:
import torch
import torch.nn as nn

In [None]:
!pip install torchtext==0.17.0

Collecting torchtext==0.17.0
  Downloading torchtext-0.17.0-cp310-cp310-manylinux1_x86_64.whl (2.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m20.4 MB/s[0m eta [36m0:00:00[0m
Collecting torchdata==0.7.1 (from torchtext==0.17.0)
  Downloading torchdata-0.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m4.7/4.7 MB[0m [31m48.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: torchdata, torchtext
  Attempting uninstall: torchtext
    Found existing installation: torchtext 0.18.0
    Uninstalling torchtext-0.18.0:
      Successfully uninstalled torchtext-0.18.0
Successfully installed torchdata-0.7.1 torchtext-0.17.0


In [None]:
!pip install portalocker

Collecting portalocker
  Downloading portalocker-2.8.2-py3-none-any.whl (17 kB)
Installing collected packages: portalocker
Successfully installed portalocker-2.8.2


# Preparing the review dataset

In [None]:
from torchtext.datasets import IMDB
from torch.utils.data.dataset import random_split

In [None]:
!pip install torchdata



In [None]:
## Load the data and split it into train/test

train_dataset = IMDB(split='train')
test_dataset = IMDB(split='test')

test_dataset = list(test_dataset)   # convert the DataPipe to a list

Each split contains 25,000 examples: a review plus a label (neg/pos).

In [None]:
from torch.utils.data import Subset
torch.manual_seed(1)

# Split the default training set into train and valid:
train_dataset, valid_dataset = random_split(
    list(train_dataset), [20000, 5000])
#train_dataset=Subset(train_dataset.dataset, train_dataset.indices[:1000])
#valid_dataset=Subset(valid_dataset.dataset, valid_dataset.indices[:250])
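`random_split` shuffles the indices with a seeded generator and cuts them into consecutive, disjoint chunks of the requested sizes. A minimal pure-Python sketch of that idea, using a hypothetical `split_indices` helper; it relies on `random.Random` rather than torch's generator, so it will not reproduce the exact split obtained after `torch.manual_seed(1)`:

```python
import random


def split_indices(n, lengths, seed=1):
    """Hypothetical helper: shuffle range(n) with a seeded RNG, then cut it into chunks.

    Illustrates the idea behind random_split; the indices differ from torch's.
    """
    if sum(lengths) != n:
        raise ValueError("lengths must sum to n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    chunks, start = [], 0
    for length in lengths:
        chunks.append(indices[start:start + length])
        start += length
    return chunks


train_idx, valid_idx = split_indices(25_000, [20_000, 5_000])
print(len(train_idx), len(valid_idx))          # 20000 5000
print(len(set(train_idx) & set(valid_idx)))    # 0: the two parts are disjoint

# Using less data (as in the commented-out Subset lines) is just slicing the indices:
small_train_idx = train_idx[:1000]
```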

In [None]:
## Encode the text data: find the unique words (tokens);
## the Counter class from the collections module can be used for this

import re
from collections import Counter, OrderedDict

token_counts = Counter()

# Ready-made tokenization function (it also cleans the text):
def tokenizer(text):
    text = re.sub(r'<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower())
    text = re.sub(r'[\W]+', ' ', text.lower()) +\
        ' '.join(emoticons).replace('-', '')
    tokenized = text.split()
    return tokenized


for label, line in train_dataset:
    tokens = tokenizer(line)
    token_counts.update(tokens)


print('Vocab-size:', len(token_counts))

Vocab-size: 69023
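To see what the tokenizer actually produces, here is a standalone copy applied to a few made-up strings. It also exposes two quirks of the function: the `D`/`P` alternatives of the emoticon pattern are matched against lowercased text, so they can never fire, and because the emoticons are appended without a separating space, an emoticon gets glued to the last word when the review does not end with punctuation.

```python
import re


def tokenizer(text):
    # Same steps as the tokenizer above, written with raw-string patterns.
    text = re.sub(r'<[^>]*>', '', text)                      # strip HTML tags such as <br />
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower())
    text = re.sub(r'[\W]+', ' ', text.lower()) + \
        ' '.join(emoticons).replace('-', '')                 # append emoticons without the "nose"
    return text.split()


print(tokenizer("This is <br />a test :-) Great!"))
# ['this', 'is', 'a', 'test', 'great', ':)']
print(tokenizer("Loved it :D"))   # ['loved', 'it', 'd']: 'D' never matches lowercased text
print(tokenizer("good :) film"))  # ['good', 'film:)']: emoticon glued to the last word
```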


In [None]:
## Map the unique words to integers (torchtext provides the Vocab class for this)
from torchtext.vocab import vocab as build_vocab

sorted_by_freq_tuples = sorted(token_counts.items(), key=lambda x: x[1], reverse=True)
ordered_dict = OrderedDict(sorted_by_freq_tuples)

vocab = build_vocab(ordered_dict)

vocab.insert_token("<pad>", 0)  # 0: padding placeholder
vocab.insert_token("<unk>", 1)  # 1: unknown tokens
vocab.set_default_index(1)

print([vocab[token] for token in ['this', 'is', 'an', 'example']])

[11, 7, 35, 457]
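The mapping built above follows a simple rule: tokens sorted by descending frequency (the torchtext `vocab` factory keeps the order of the `OrderedDict`), shifted by the two special tokens inserted at ids 0 and 1, with unknown words falling back to id 1. A pure-Python sketch of that rule on a toy corpus; `build_index` and `lookup` are hypothetical helpers, not torchtext functions:

```python
from collections import Counter


def build_index(token_counts, specials=("<pad>", "<unk>")):
    """Hypothetical helper: map tokens to ids by descending frequency.

    Special tokens take the first ids (0 for <pad>, 1 for <unk>), like the
    insert_token calls above; ties keep first-seen order because sorted() is stable.
    """
    by_freq = sorted(token_counts.items(), key=lambda kv: kv[1], reverse=True)
    itos = list(specials) + [tok for tok, _ in by_freq]
    return {tok: idx for idx, tok in enumerate(itos)}


def lookup(stoi, tokens, default_index=1):
    # Unknown tokens fall back to <unk>, like vocab.set_default_index(1).
    return [stoi.get(tok, default_index) for tok in tokens]


counts = Counter("the film the plot a film the".split())
stoi = build_index(counts)
print(lookup(stoi, ["the", "film", "a", "masterpiece"]))  # [2, 3, 5, 1]
```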


In [None]:
if not torch.cuda.is_available():
    print("Warning: this code may be very slow on CPU")

If it is too slow, use less data.

In [None]:
## Define the text transformation and the 0/1 label mapping:
import torchtext
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

text_pipeline = lambda x: [vocab[token] for token in tokenizer(x)]

from packaging.version import parse as parse_version  # pkg_resources is deprecated

if parse_version(torchtext.__version__) > parse_version("0.10"):
    label_pipeline = lambda x: 1. if x == 2 else 0.         # 1 ~ negative, 2 ~ positive review
else:
    label_pipeline = lambda x: 1. if x == 'pos' else 0.


## Combine everything into one function:
def collate_batch(batch):
    label_list, text_list, lengths = [], [], []
    for _label, _text in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text),
                                      dtype=torch.int64)
        text_list.append(processed_text)
        lengths.append(processed_text.size(0))
    label_list = torch.tensor(label_list)
    lengths = torch.tensor(lengths)
    padded_text_list = nn.utils.rnn.pad_sequence(
        text_list, batch_first=True)
    return padded_text_list.to(device), label_list.to(device), lengths.to(device)

In [None]:
## Test on 4 examples

from torch.utils.data import DataLoader

dataloader = DataLoader(train_dataset, batch_size=4, shuffle=False, collate_fn=collate_batch)
text_batch, label_batch, length_batch = next(iter(dataloader))

The word sequences have been converted into sequences of integers, and the labels into 1 or 0.

The `pad_sequence()` function padded the examples with zeros so that all examples in a batch have the same shape (which lets them be stored efficiently as tensors).
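Padding alone would make the LSTM read the trailing zeros, which is why the model's `forward` passes `lengths` to `pack_padded_sequence`. A pure-Python sketch of both steps on a toy batch (only the first row comes from the vocabulary example above, the other ids are made up): `pad_batch` imitates `pad_sequence(batch_first=True)` with the default padding value 0, and `packed_batch_sizes` computes the per-time-step batch sizes that a `PackedSequence` stores. Both helpers are hypothetical.

```python
def pad_batch(sequences, padding_value=0):
    """Right-pad integer sequences to the longest one (batch_first layout).

    Mirrors pad_sequence(..., batch_first=True) and also returns the lengths.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [padding_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def packed_batch_sizes(lengths):
    """Number of sequences still active at each time step.

    This is what pack_padded_sequence records after sorting by length,
    so the LSTM never processes the padded positions.
    """
    return [sum(1 for n in lengths if n > t) for t in range(max(lengths))]


batch = [[11, 7, 35, 457], [5, 9], [42, 8, 3]]
padded, lengths = pad_batch(batch)
print(padded)    # [[11, 7, 35, 457], [5, 9, 0, 0], [42, 8, 3, 0]]
print(lengths)   # [4, 2, 3]
print(packed_batch_sizes(lengths))  # [3, 3, 2, 1]
```

With packing, the final hidden state of each sequence is taken at its last real token rather than after the padding.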

In [None]:
## Split the datasets into batches of size 32:

batch_size = 32

train_dl = DataLoader(train_dataset, batch_size=batch_size,
                      shuffle=True, collate_fn=collate_batch)
valid_dl = DataLoader(valid_dataset, batch_size=batch_size,
                      shuffle=False, collate_fn=collate_batch)
test_dl = DataLoader(test_dataset, batch_size=batch_size,
                     shuffle=False, collate_fn=collate_batch)
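With `batch_size = 32`, the sizes of the three splits determine how many batches an epoch has. `DataLoader` keeps the last, smaller batch by default (`drop_last=False`); the `train` and `evaluate` functions below multiply each batch loss by `label_batch.size(0)` and divide by the dataset size, so that short batch is weighted correctly. A quick arithmetic check:

```python
import math

batch_size = 32
split_sizes = {"train": 20_000, "valid": 5_000, "test": 25_000}

# With drop_last=False an epoch has ceil(size / batch_size) batches.
batches = {}
for name, size in split_sizes.items():
    n_batches = math.ceil(size / batch_size)
    last_batch = size - (n_batches - 1) * batch_size
    batches[name] = (n_batches, last_batch)
    print(f"{name}: {n_batches} batches, last batch holds {last_batch} reviews")
# train: 625 batches, last batch holds 32 reviews
# valid: 157 batches, last batch holds 8 reviews
# test: 782 batches, last batch holds 8 reviews
```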

## RNN model for sentiment analysis of IMDb movie reviews

### 1. Increasing the embedding dimension (`embed_dim`) may allow the model to capture more complex relationships between words.
### 2. Increasing the number of units in the LSTM layer (`rnn_hidden_size`) may improve the model's ability to remember long sequences.

In [None]:
class RNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

vocab_size = len(vocab)
embed_dim = 20
# 1. Larger embedding dimension
embed_dim_1 = 50
# 2. More units in the LSTM layer
rnn_hidden_size = 64
rnn_hidden_size_1 = 128
fc_hidden_size = 64

torch.manual_seed(1)
model = RNN(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size)
model = model.to(device)
model_emb = RNN(vocab_size, embed_dim_1, rnn_hidden_size, fc_hidden_size)
model_emb = model_emb.to(device)
model_rnn = RNN(vocab_size, embed_dim, rnn_hidden_size_1, fc_hidden_size)
model_rnn = model_rnn.to(device)

In [None]:
def train(dataloader, model, optimizer):
    model.train()
    total_acc, total_loss = 0, 0
    for text_batch, label_batch, lengths in dataloader:
        optimizer.zero_grad()
        pred = model(text_batch, lengths)[:, 0]
        loss = loss_fn(pred, label_batch)
        loss.backward()
        optimizer.step()
        total_acc += ((pred>=0.5).float() == label_batch).float().sum().item()
        total_loss += loss.item()*label_batch.size(0)
    return total_acc/len(dataloader.dataset), total_loss/len(dataloader.dataset)

def evaluate(dataloader, model):
    model.eval()
    total_acc, total_loss = 0, 0
    with torch.no_grad():
        for text_batch, label_batch, lengths in dataloader:
            pred = model(text_batch, lengths)[:, 0]
            loss = loss_fn(pred, label_batch)
            total_acc += ((pred>=0.5).float() == label_batch).float().sum().item()
            total_loss += loss.item()*label_batch.size(0)
    return total_acc/len(dataloader.dataset), total_loss/len(dataloader.dataset)

In [None]:
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
optimizer_em = torch.optim.Adam(model_emb.parameters(), lr=0.001)
optimizer_rnn = torch.optim.Adam(model_rnn.parameters(), lr=0.001)
num_epochs = 10

torch.manual_seed(1)

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model, optimizer)
    acc_valid, loss_valid = evaluate(valid_dl, model)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

for epoch in range(num_epochs):
    acc_train_e, loss_train_e = train(train_dl, model_emb, optimizer_em)
    acc_valid_e, loss_valid_e = evaluate(valid_dl, model_emb)
    print(f'Epoch {epoch} accuracy: {acc_train_e:.4f} val_accuracy: {acc_valid_e:.4f}')

for epoch in range(num_epochs):
    acc_train_rnn, loss_train_rnn = train(train_dl, model_rnn, optimizer_rnn)
    acc_valid_rnn, loss_valid_rnn = evaluate(valid_dl, model_rnn)
    print(f'Epoch {epoch} accuracy: {acc_train_rnn:.4f} val_accuracy: {acc_valid_rnn:.4f}')

Epoch 0 accuracy: 0.6096 val_accuracy: 0.6852
Epoch 1 accuracy: 0.7257 val_accuracy: 0.7452
Epoch 2 accuracy: 0.7466 val_accuracy: 0.6284
Epoch 3 accuracy: 0.7253 val_accuracy: 0.5366
Epoch 4 accuracy: 0.7972 val_accuracy: 0.7492
Epoch 5 accuracy: 0.8619 val_accuracy: 0.7784
Epoch 6 accuracy: 0.8911 val_accuracy: 0.8040
Epoch 7 accuracy: 0.9162 val_accuracy: 0.8574
Epoch 8 accuracy: 0.9328 val_accuracy: 0.8598
Epoch 9 accuracy: 0.9504 val_accuracy: 0.8634
Epoch 0 accuracy: 0.5927 val_accuracy: 0.5598
Epoch 1 accuracy: 0.6872 val_accuracy: 0.7280
Epoch 2 accuracy: 0.7806 val_accuracy: 0.8040
Epoch 3 accuracy: 0.8616 val_accuracy: 0.8004
Epoch 4 accuracy: 0.8419 val_accuracy: 0.8588
Epoch 5 accuracy: 0.9280 val_accuracy: 0.8654
Epoch 6 accuracy: 0.9510 val_accuracy: 0.8790
Epoch 7 accuracy: 0.9706 val_accuracy: 0.8698
Epoch 8 accuracy: 0.9834 val_accuracy: 0.8734
Epoch 9 accuracy: 0.9891 val_accuracy: 0.8708
Epoch 0 accuracy: 0.5522 val_accuracy: 0.5796
Epoch 1 accuracy: 0.6383 val_accur

In [None]:
acc_test, _ = evaluate(test_dl, model)
acc_test_em, _ = evaluate(test_dl, model_emb)
acc_test_rnn, _ = evaluate(test_dl, model_rnn)
print(f'test_accuracy: {acc_test:.4f}')
print(f'test_accuracy embed_dim=50: {acc_test_em:.4f}')
print(f'test_accuracy rnn_hidden_size=128: {acc_test_rnn:.4f}')

test_accuracy: 0.8571
test_accuracy embed_dim=50: 0.8586
test_accuracy rnn_hidden_size=128: 0.8543


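The results are comparable for each option considered: `embed_dim=50` reaches 0.8586 and `rnn_hidden_size=128` reaches 0.8543 on the test set, versus 0.8571 for the baseline, so both differences are a fraction of a percentage point.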

### 3. Adding another LSTM layer may help the model better capture dependencies in sequences.


In [None]:
class RNN_1(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size, num_layers,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

# 2 LSTM layers
num_layers=2
vocab_size = len(vocab)
embed_dim = 20
rnn_hidden_size = 64
fc_hidden_size = 64



torch.manual_seed(1)
model_lstm = RNN_1(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers)
model_lstm = model_lstm.to(device)


In [None]:
# 3 LSTM layers
num_layers=3
vocab_size = len(vocab)
embed_dim = 20
rnn_hidden_size = 64
fc_hidden_size = 64



torch.manual_seed(1)
model_lstm3 = RNN_1(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers)
model_lstm3 = model_lstm3.to(device)
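It helps to know how much each variant actually grows the model. A sketch that counts trainable parameters from the layer shapes PyTorch documents for `nn.Embedding`, `nn.LSTM` (weights `weight_ih`, `weight_hh` and two bias vectors per layer) and `nn.Linear`, assuming `len(vocab)` is 69,025 (the 69,023 counted tokens plus `<pad>` and `<unk>`). The helper names are hypothetical; `sum(p.numel() for p in model.parameters())` on the real models is the authoritative check. In the stacked variants, `hidden[-1]` in `forward` is the final hidden state of the top LSTM layer.

```python
def lstm_params(input_size, hidden_size, num_layers=1):
    """Trainable parameters of a unidirectional nn.LSTM with bias=True.

    Each layer holds weight_ih (4H x in), weight_hh (4H x H) and two bias
    vectors b_ih, b_hh (4H each); layers after the first take H as input.
    """
    total = 0
    for layer in range(num_layers):
        in_size = input_size if layer == 0 else hidden_size
        total += 4 * hidden_size * (in_size + hidden_size + 2)
    return total


def linear_params(in_features, out_features):
    return in_features * out_features + out_features   # weight + bias


def rnn_params(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers=1):
    # Embedding table + LSTM + fc1 + fc2, matching the layers of RNN / RNN_1.
    return (vocab_size * embed_dim
            + lstm_params(embed_dim, rnn_hidden_size, num_layers)
            + linear_params(rnn_hidden_size, fc_hidden_size)
            + linear_params(fc_hidden_size, 1))


vocab_size = 69_023 + 2   # assumption: counted tokens plus <pad> and <unk>
variants = {
    "baseline":            (20, 64, 1),
    "embed_dim=50":        (50, 64, 1),
    "rnn_hidden_size=128": (20, 128, 1),
    "2 LSTM layers":       (20, 64, 2),
    "3 LSTM layers":       (20, 64, 3),
}
for name, (emb, hid, layers) in variants.items():
    total = rnn_params(vocab_size, emb, hid, 64, layers)
    print(f"{name:20s} total={total:>9,}  LSTM={lstm_params(emb, hid, layers):>6,}")
```

Under this assumption the embedding table holds about 98% of the baseline's 1,406,741 parameters, so `embed_dim=50` more than doubles the model, while each extra LSTM layer adds only 33,280 parameters.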

In [None]:
loss_fn = nn.BCELoss()
optimizer_lstm = torch.optim.Adam(model_lstm.parameters(), lr=0.001)
num_epochs = 10

torch.manual_seed(1)

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_lstm, optimizer_lstm)
    acc_valid, loss_valid = evaluate(valid_dl, model_lstm)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')


Epoch 0 accuracy: 0.5856 val_accuracy: 0.6708
Epoch 1 accuracy: 0.7030 val_accuracy: 0.7134
Epoch 2 accuracy: 0.7752 val_accuracy: 0.7766
Epoch 3 accuracy: 0.7782 val_accuracy: 0.8340
Epoch 4 accuracy: 0.8799 val_accuracy: 0.8558
Epoch 5 accuracy: 0.9139 val_accuracy: 0.8654
Epoch 6 accuracy: 0.9327 val_accuracy: 0.8606
Epoch 7 accuracy: 0.9504 val_accuracy: 0.8534
Epoch 8 accuracy: 0.9654 val_accuracy: 0.8764
Epoch 9 accuracy: 0.9735 val_accuracy: 0.8738


In [None]:
acc_test_lstm, _ = evaluate(test_dl, model_lstm)

In [None]:
loss_fn = nn.BCELoss()
optimizer_lstm3 = torch.optim.Adam(model_lstm3.parameters(), lr=0.001)
num_epochs = 10

torch.manual_seed(1)

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_lstm3, optimizer_lstm3)
    acc_valid, loss_valid = evaluate(valid_dl, model_lstm3)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

Epoch 0 accuracy: 0.5419 val_accuracy: 0.5966
Epoch 1 accuracy: 0.6599 val_accuracy: 0.7198
Epoch 2 accuracy: 0.7438 val_accuracy: 0.7158
Epoch 3 accuracy: 0.7943 val_accuracy: 0.7474
Epoch 4 accuracy: 0.8442 val_accuracy: 0.7984
Epoch 5 accuracy: 0.8758 val_accuracy: 0.8266
Epoch 6 accuracy: 0.8921 val_accuracy: 0.8376
Epoch 7 accuracy: 0.9085 val_accuracy: 0.8230
Epoch 8 accuracy: 0.9217 val_accuracy: 0.8600
Epoch 9 accuracy: 0.9311 val_accuracy: 0.8546


In [None]:
acc_test_lstm3, _ = evaluate(test_dl, model_lstm3)

In [None]:
print(f'test_accuracy default model: {acc_test:.4f}')
print(f'test_accuracy 2 lstm layers: {acc_test_lstm:.4f}')
print(f'test_accuracy 3 lstm layers: {acc_test_lstm3:.4f}')

test_accuracy default model: 0.8571
test_accuracy 2 lstm layers: 0.8686
test_accuracy 3 lstm layers: 0.8474


With two LSTM layers the model performs best on both the test set (0.8686) and the validation set (0.8738 after the last epoch).

### 4. Adding Dropout layers between layers may help prevent overfitting.

In [None]:
class RNN_2a(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, p1 , p2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.relu = nn.ReLU()
        self.dropout1 = nn.Dropout(p1)
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        self.dropout2 = nn.Dropout(p2)

    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.relu(out)
        out = self.dropout1(out)
        out = self.fc2(out)
        out = self.dropout2(out)
        out = self.sigmoid(out)
        return out

class RNN_2b(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, p2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        self.dropout2 = nn.Dropout(p2)

    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.dropout2(out)
        out = self.sigmoid(out)
        return out

vocab_size = len(vocab)
embed_dim = 20
rnn_hidden_size = 64
fc_hidden_size = 64
p1=0.3
p2=0.4
p=0.5
model_dropout1 = RNN_2a(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, p1, p2)
model_dropout1 = model_dropout1.to(device)

model_dropout2 = RNN_2b(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, p)
model_dropout2 = model_dropout2.to(device)


In [None]:
loss_fn = nn.BCELoss()
optimizer_dropout1 = torch.optim.Adam(model_dropout1.parameters(), lr=0.001)
optimizer_dropout2 = torch.optim.Adam(model_dropout2.parameters(), lr=0.001)
num_epochs = 10


for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_dropout1, optimizer_dropout1)
    acc_valid, loss_valid = evaluate(valid_dl, model_dropout1)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_dropout2, optimizer_dropout2)
    acc_valid, loss_valid = evaluate(valid_dl, model_dropout2)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')


Epoch 0 accuracy: 0.5526 val_accuracy: 0.6540
Epoch 1 accuracy: 0.6146 val_accuracy: 0.7086
Epoch 2 accuracy: 0.5846 val_accuracy: 0.6348
Epoch 3 accuracy: 0.6352 val_accuracy: 0.7476
Epoch 4 accuracy: 0.6875 val_accuracy: 0.8100
Epoch 5 accuracy: 0.7219 val_accuracy: 0.8334
Epoch 6 accuracy: 0.7313 val_accuracy: 0.8474
Epoch 7 accuracy: 0.7442 val_accuracy: 0.8544
Epoch 8 accuracy: 0.7605 val_accuracy: 0.8600
Epoch 9 accuracy: 0.7592 val_accuracy: 0.8622
Epoch 0 accuracy: 0.5336 val_accuracy: 0.6136
Epoch 1 accuracy: 0.5813 val_accuracy: 0.6652
Epoch 2 accuracy: 0.6119 val_accuracy: 0.7236
Epoch 3 accuracy: 0.6432 val_accuracy: 0.7830
Epoch 4 accuracy: 0.6603 val_accuracy: 0.7918
Epoch 5 accuracy: 0.6670 val_accuracy: 0.7872
Epoch 6 accuracy: 0.6770 val_accuracy: 0.7936
Epoch 7 accuracy: 0.6907 val_accuracy: 0.8190
Epoch 8 accuracy: 0.7014 val_accuracy: 0.8352
Epoch 9 accuracy: 0.6958 val_accuracy: 0.7940


In [None]:
acc_test_dropout1, _ = evaluate(test_dl, model_dropout1)
acc_test_dropout2, _ = evaluate(test_dl, model_dropout2)

In [None]:
print(f'test_accuracy default model: {acc_test:.4f}')
print(f'test_accuracy 2 dropout layers: {acc_test_dropout1:.4f}')
print(f'test_accuracy 1 dropout layer: {acc_test_dropout2:.4f}')

test_accuracy default model: 0.8571
test_accuracy 2 dropout layers: 0.8614
test_accuracy 1 dropout layer: 0.7986


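Two dropout layers give better results (0.8614 vs 0.8571 for the baseline), while a single dropout with p=0.5 clearly hurts (0.7986).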
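One likely reason the dropout models report much lower training than validation accuracy (0.7592 vs 0.8622 in the last epoch of `model_dropout1`) is that training accuracy is measured in train mode, and both `RNN_2a` and `RNN_2b` apply dropout to the single output logit just before the sigmoid. A dropped logit becomes 0, `sigmoid(0) = 0.5`, and `pred >= 0.5` then predicts "positive" regardless of the review. The pure-Python sketch below (hypothetical helper names) shows inverted dropout as `nn.Dropout` applies it, plus a back-of-the-envelope estimate of this effect assuming roughly balanced classes. It is an explanation consistent with the logs, not a measured result, and it suggests that dropout on hidden activations only (after the LSTM or after `fc1`) may be the more conventional placement.

```python
import random


def inverted_dropout(values, p, training=True, rng=random):
    """Illustrative pure-Python version of what nn.Dropout does.

    In training mode each element is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p); in eval mode the input is returned as is.
    """
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


def expected_train_accuracy(clean_accuracy, p, positive_fraction=0.5):
    """Train-mode accuracy when dropout sits on the single output logit.

    A dropped logit gives sigmoid(0) = 0.5 -> predicted positive, correct only
    for positive reviews. A kept logit is rescaled by a positive factor, so its
    prediction does not change.
    """
    return (1.0 - p) * clean_accuracy + p * positive_fraction


# If the network without dropout reached about 0.95 training accuracy
# (the baseline had 0.9504 at epoch 9):
print(expected_train_accuracy(0.95, 0.4))  # about 0.77
print(expected_train_accuracy(0.95, 0.5))  # about 0.725
```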

### 5. Instead of ReLU, try other activation functions such as LeakyReLU or ELU.

### Trying LeakyReLU and ELU
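The three activations differ only for negative inputs: ReLU outputs 0 (so no gradient flows through that unit), LeakyReLU keeps a small linear slope, and ELU approaches -alpha smoothly. A pure-Python sketch following the definitions documented for `nn.ReLU`, `nn.LeakyReLU` (default `negative_slope=0.01`; the models below use 0.1) and `nn.ELU` (default `alpha=1.0`):

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    # Negative inputs keep a small slope instead of being cut to 0.
    return x if x >= 0 else negative_slope * x


def elu(x, alpha=1.0):
    # Smooth for x < 0 and saturates at -alpha for large negative x.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={x:5.1f}  relu={relu(x):.4f}  leaky(0.1)={leaky_relu(x, 0.1):.4f}  elu={elu(x):.4f}")
```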

In [None]:
class RNN_3(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, slope):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.leakyrelu = nn.LeakyReLU(slope)
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.leakyrelu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

class RNN_4(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.elu = nn.ELU()
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.elu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

vocab_size = len(vocab)
embed_dim = 20
rnn_hidden_size = 64
fc_hidden_size = 64
slope=0.1

model_leaky = RNN_3(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, slope)
model_leaky = model_leaky.to(device)


model_elu = RNN_4(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size)
model_elu = model_elu.to(device)



In [None]:
loss_fn = nn.BCELoss()
optimizer_leaky = torch.optim.Adam(model_leaky.parameters(), lr=0.001)
optimizer_elu = torch.optim.Adam(model_elu.parameters(), lr=0.001)

num_epochs = 10

torch.manual_seed(1)

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_leaky, optimizer_leaky)
    acc_valid, loss_valid = evaluate(valid_dl, model_leaky)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')


Epoch 0 accuracy: 0.6095 val_accuracy: 0.6638
Epoch 1 accuracy: 0.7511 val_accuracy: 0.7786
Epoch 2 accuracy: 0.8141 val_accuracy: 0.8138
Epoch 3 accuracy: 0.8593 val_accuracy: 0.8206
Epoch 4 accuracy: 0.8720 val_accuracy: 0.8030
Epoch 5 accuracy: 0.9031 val_accuracy: 0.8386
Epoch 6 accuracy: 0.9250 val_accuracy: 0.7760
Epoch 7 accuracy: 0.9407 val_accuracy: 0.8530
Epoch 8 accuracy: 0.9543 val_accuracy: 0.8506
Epoch 9 accuracy: 0.9635 val_accuracy: 0.8582


In [None]:
for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_elu, optimizer_elu)
    acc_valid, loss_valid = evaluate(valid_dl, model_elu)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

Epoch 0 accuracy: 0.5842 val_accuracy: 0.6304
Epoch 1 accuracy: 0.6735 val_accuracy: 0.6938
Epoch 2 accuracy: 0.7474 val_accuracy: 0.7480
Epoch 3 accuracy: 0.8153 val_accuracy: 0.8094
Epoch 4 accuracy: 0.8566 val_accuracy: 0.8268
Epoch 5 accuracy: 0.8952 val_accuracy: 0.8450
Epoch 6 accuracy: 0.9041 val_accuracy: 0.8404
Epoch 7 accuracy: 0.9345 val_accuracy: 0.8520
Epoch 8 accuracy: 0.9484 val_accuracy: 0.8632
Epoch 9 accuracy: 0.9589 val_accuracy: 0.8648


In [None]:
acc_test_leaky, _ = evaluate(test_dl, model_leaky)
acc_test_elu, _ = evaluate(test_dl, model_elu)


In [None]:
#print(f'test_accuracy default model: {acc_test:.4f}')
print(f'test_accuracy activation function LeakyRelu negative slope=0.1: {acc_test_leaky:.4f}')
print(f'test_accuracy activation function ELU: {acc_test_elu:.4f}')

test_accuracy activation function LeakyRelu negative slope=0.1: 0.8499
test_accuracy activation function LeakyRelu negative slope=0.2: 0.8552


### Final model

We take the approaches above that improved on the baseline model and apply all of them at once, then test three learning rates: lr=0.01, 0.001 and 0.0001.


Quality was improved by:
- an embedding dimension of 50
- an additional LSTM layer (2 in total)
- two dropout layers

In [None]:
num_layers=2
vocab_size = len(vocab)
embed_dim = 50
rnn_hidden_size = 64
fc_hidden_size = 64
p1=0.3
p2=0.4


class RNN_opt(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers, p1,p2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embed_dim,
                                      padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size, num_layers,
                           batch_first=True)
        self.fc1 = nn.Linear(rnn_hidden_size, fc_hidden_size)
        self.relu = nn.ReLU()
        self.dropout1 = nn.Dropout(p1)
        self.fc2 = nn.Linear(fc_hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        self.dropout2 = nn.Dropout(p2)


    def forward(self, text, lengths):
        out = self.embedding(text)
        out = nn.utils.rnn.pack_padded_sequence(out, lengths.cpu().numpy(), enforce_sorted=False, batch_first=True)
        out, (hidden, cell) = self.rnn(out)
        out = hidden[-1, :, :]
        out = self.fc1(out)
        out = self.relu(out)
        out = self.dropout1(out)
        out = self.fc2(out)
        out = self.dropout2(out)
        out = self.sigmoid(out)
        return out


model_opt1 = RNN_opt(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers,p1,p2)
model_opt1 = model_opt1.to(device)

model_opt2 = RNN_opt(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers,p1,p2)
model_opt2 = model_opt2.to(device)

model_opt3 = RNN_opt(vocab_size, embed_dim, rnn_hidden_size, fc_hidden_size, num_layers,p1,p2)
model_opt3 = model_opt3.to(device)


In [None]:
loss_fn = nn.BCELoss()
optimizer1 = torch.optim.Adam(model_opt1.parameters(), lr=0.01)
optimizer2 = torch.optim.Adam(model_opt2.parameters(), lr=0.001)
optimizer3 = torch.optim.Adam(model_opt3.parameters(), lr=0.0001)

num_epochs = 10

torch.manual_seed(1)

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_opt1, optimizer1)
    acc_valid, loss_valid = evaluate(valid_dl, model_opt1)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_opt2, optimizer2)
    acc_valid, loss_valid = evaluate(valid_dl, model_opt2)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

for epoch in range(num_epochs):
    acc_train, loss_train = train(train_dl, model_opt3, optimizer3)
    acc_valid, loss_valid = evaluate(valid_dl, model_opt3)
    print(f'Epoch {epoch} accuracy: {acc_train:.4f} val_accuracy: {acc_valid:.4f}')

Epoch 0 accuracy: 0.6058 val_accuracy: 0.8520
Epoch 1 accuracy: 0.7287 val_accuracy: 0.8762
Epoch 2 accuracy: 0.7582 val_accuracy: 0.8842
Epoch 3 accuracy: 0.7685 val_accuracy: 0.8816
Epoch 4 accuracy: 0.7782 val_accuracy: 0.8830
Epoch 5 accuracy: 0.7833 val_accuracy: 0.8658
Epoch 6 accuracy: 0.7857 val_accuracy: 0.8592
Epoch 7 accuracy: 0.7922 val_accuracy: 0.8716
Epoch 8 accuracy: 0.7919 val_accuracy: 0.8722
Epoch 9 accuracy: 0.7903 val_accuracy: 0.8732
Epoch 0 accuracy: 0.5759 val_accuracy: 0.7006
Epoch 1 accuracy: 0.6624 val_accuracy: 0.6986
Epoch 2 accuracy: 0.6390 val_accuracy: 0.7928
Epoch 3 accuracy: 0.7059 val_accuracy: 0.8262
Epoch 4 accuracy: 0.7379 val_accuracy: 0.8690
Epoch 5 accuracy: 0.7540 val_accuracy: 0.8802
Epoch 6 accuracy: 0.7639 val_accuracy: 0.8790
Epoch 7 accuracy: 0.7780 val_accuracy: 0.8814
Epoch 8 accuracy: 0.7784 val_accuracy: 0.8774
Epoch 9 accuracy: 0.7852 val_accuracy: 0.8824
Epoch 0 accuracy: 0.5141 val_accuracy: 0.5640
Epoch 1 accuracy: 0.5722 val_accur

In [None]:
acc_test1, _ = evaluate(test_dl, model_opt1)
acc_test2, _ = evaluate(test_dl, model_opt2)
acc_test3, _ = evaluate(test_dl, model_opt3)

In [None]:
print(f'test_accuracy optimized model with lr=0.01: {acc_test1:.4f}')
print(f'test_accuracy optimized model with lr=0.001: {acc_test2:.4f}')
print(f'test_accuracy optimized model with lr=0.0001: {acc_test3:.4f}')

test_accuracy optimized model with lr=0.01: 0.8575
test_accuracy optimized model with lr=0.001: 0.8703
test_accuracy optimized model with lr=0.0001: 0.8146
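Two remarks on the learning-rate comparison. First, the learning rate is chosen here by test accuracy; strictly, it should be chosen on the validation set, with the test set used once for the final estimate. Second, with lr=0.01 the validation accuracy peaks early (0.8842 at epoch 2) and then drifts down, which early stopping would catch. A minimal sketch with a hypothetical `early_stopping` helper (the patience of 3 epochs is an arbitrary choice) applied to the validation logs above; in the real training loop one would keep a copy of the best weights, e.g. with `copy.deepcopy(model.state_dict())`:

```python
def early_stopping(val_accuracies, patience=3):
    """Return (best_epoch, best_accuracy, last_epoch_run) for one training run.

    Training stops once validation accuracy has not improved for `patience`
    consecutive epochs; the weights from best_epoch are the ones to keep.
    """
    best_acc, best_epoch = float("-inf"), -1
    waited = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, best_acc, epoch
    return best_epoch, best_acc, len(val_accuracies) - 1


# Validation accuracies copied from the logs above (the lr=0.0001 log is truncated).
val_logs = {
    0.01:  [0.8520, 0.8762, 0.8842, 0.8816, 0.8830, 0.8658, 0.8592, 0.8716, 0.8722, 0.8732],
    0.001: [0.7006, 0.6986, 0.7928, 0.8262, 0.8690, 0.8802, 0.8790, 0.8814, 0.8774, 0.8824],
}
results = {lr: early_stopping(accs) for lr, accs in val_logs.items()}
for lr, (best_epoch, best_acc, stopped) in results.items():
    print(f"lr={lr}: best val_accuracy {best_acc:.4f} at epoch {best_epoch}, stop after epoch {stopped}")
best_lr = max(results, key=lambda lr: results[lr][1])
print("learning rate chosen on validation:", best_lr)
```

On these logs validation would favor lr=0.01 stopped at epoch 2, but the gap (0.8842 vs 0.8824 on 5,000 reviews) is small compared with the sampling noise of a validation set that size, so the two settings are effectively tied.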


### We improved test-set accuracy by about 1.3 percentage points (from 0.8571 to 0.8703, with lr=0.001)
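Whether a difference of a fraction of a percentage point means anything depends on the noise level. A rough check, assuming each test accuracy is a binomial proportion over the 25,000 test reviews; `accuracy_z_score` is a hypothetical helper. It treats the two accuracies as independent even though both models see the same reviews (a paired test such as McNemar's would need per-example predictions), and it ignores seed-to-seed variation in training, which may well be larger. By this measure the `embed_dim=50` gain (z of about 0.5) is indistinguishable from noise, while the final model's gain (z of about 4.3) is unlikely to be test-sampling noise alone.

```python
import math


def accuracy_z_score(acc_a, acc_b, n):
    """Approximate z-score for the difference of two accuracies on n examples.

    Treats both accuracies as independent binomial proportions, which is
    only a rough guide when the models share the same test set.
    """
    se = math.sqrt(acc_a * (1 - acc_a) / n + acc_b * (1 - acc_b) / n)
    return (acc_b - acc_a) / se


n_test = 25_000
baseline = 0.8571
for name, acc in {"embed_dim=50": 0.8586, "2 LSTM layers": 0.8686,
                  "final model, lr=0.001": 0.8703}.items():
    z = accuracy_z_score(baseline, acc, n_test)
    print(f"{name}: +{100 * (acc - baseline):.2f} pp, z = {z:.1f}")
```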