# Exercise 5 (NLP): Very Deep Learning

**Natural language processing (NLP)** is the ability of a computer program to understand human language as it is spoken. It involves a pipeline of steps and by the end of the exercise, we would be able to classify the sentiment of a given review as POSITIVE or NEGATIVE.


Before starting, it is important to understand the need for RNNs and the lecture from Stanford is a must to see before starting the exercise:

https://www.youtube.com/watch?v=iX5V1WpxxkY

When done, let's begin. 

In [1]:
# In this exercise, we will import libraries when needed so that we understand the need for it. 
# However, this is a bad practice and don't get used to it.
import numpy as np

# read data from reviews and labels file.
with open('data/reviews.txt', 'r') as f:
    reviews_ = f.readlines()
with open('data/labels.txt', 'r') as f:
    
    labels = f.readlines()

In [2]:
# One of the most important task is to visualize data before starting with any ML task. 
for i in range(5):
    print(labels[i] + "\t: " + reviews_[i][:100] + "...")

positive
	: bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life...
negative
	: story of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terr...
positive
	: homelessness  or houselessness as george carlin stated  has been an issue for years but never a plan...
negative
	: airport    starts as a brand new luxury    plane is loaded up with valuable paintings  such belongin...
positive
	: brilliant over  acting by lesley ann warren . best dramatic hobo lady i have ever seen  and love sce...




We can see there are a lot of punctuation marks like fullstop(.), comma(,), new line (\n) and so on and we need to remove it. 

Here is a list of all the punctuation marks that needs to be removed 
```
(!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~)
```


## Task 1: Remove all the punctuation marks from the reviews.
Many ways of doing it: Regex, Spacy, import punctuation from string.

In [3]:
# Make everything lower case to make the whole dataset even. 
reviews = ''.join(reviews_).lower()


In [4]:
# complete the function below to remove punctuations and save it in no_punct_text
import re

def text_without_punct(reviews):
    spl_char = '[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]+'
    #return re.sub('[^A-Za-z0-9]+','',reviews)
    return re.sub(spl_char,'',reviews).strip()


no_punct_text = text_without_punct(reviews)
reviews_split = no_punct_text.split('\n')
print('labels-',len(labels),'reviews ',len(reviews_split))
reviews_split[0]

('labels-', 25000, 'reviews ', 25000)


'bromwell high is a cartoon comedy  it ran at the same time as some other programs about school life  such as  teachers   my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers   the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students  when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled          at           high  a classic line inspector i  m here to sack one of your teachers  student welcome to bromwell high  i expect that many adults of my age think that bromwell high is far fetched  what a pity that it isn  t   '

In [5]:
# split the formatted no_punct_text into words
def split_in_words(no_punct_text):
    return no_punct_text.split()

words = split_in_words(no_punct_text)

In [6]:
# once you are done print the ten words that should yield the following output
words[:10]

['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', 'it', 'ran', 'at', 'the']

In [7]:
# print the total length of the words
len(words)

6020196

In [8]:
# Total number of unique words
len(set(words))

74072


Next step is to create a vocabulary. This way every word is mapped to an integer number.
```
Example: 1: hello, 2: I, 3: am, 4: Robo and so on...
```


In [9]:
# Lets create a vocab out of it

# feel free to use this import 
from collections import Counter

## Let's keep a count of all the words and let's see how many words are there. 
def word_count(words):
    return Counter(words)

counts=word_count(words)

In [10]:
# If you did everything correct, this is what you should get as output. 
print (counts['wonderful'])

print (counts['bad'])


1658
9308


## Task 2: Word to Integer and Integer to word
The task is to map every word to an integer value and then vice-versa. 


In [11]:
# define a vocabulary for the words
def vocabulary(counts):
    vocab = []
    for c in counts:
        vocab.append(c)
    return vocab

vocab = vocabulary(counts)
print(len(vocab))
vocab[1]

74072


'tsukino'

In [12]:
# map each vocab word to an integer. Also, start the indexing with 1 as we will use 
# '0' for padding and we dont want to mix the two.
def vocabulary_to_integer(vocab):
    vocab_to_int = {}
    for i in range(len(vocab)):
        vocab_to_int[vocab[i]] = i
    return vocab_to_int

vocab_to_int = vocabulary_to_integer(vocab)

In [13]:
# verify if the length is same and if 'and' is mapped to the correct integer value.
print(len(vocab_to_int))
vocab_to_int['tsukino']

74072


1

Let's see what positve words in positive reviews we have and what we have in negative reviews. 

In [14]:
positive_counts = Counter()
negative_counts = Counter()

In [15]:
# loop over each sentence
for i in range(len(reviews_)):
    # if the sentence has positive review, all the words contained in the sentence contribute +1 to positive_counts
    if(labels[i] == 'positive\n'):
        for word in reviews_[i].split(" "):
            positive_counts[word] += 1
    # if the sentence has negative review, all the words contained in the sentence contribute +1 to negative_counts
    else:
        for word in reviews_[i].split(" "):
            negative_counts[word] += 1

In [16]:
labels

['positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive

In [17]:
positive_counts.most_common()

[('', 537968),
 ('the', 173324),
 ('.', 159654),
 ('and', 89722),
 ('a', 83688),
 ('of', 76855),
 ('to', 66746),
 ('is', 57245),
 ('in', 50215),
 ('br', 49235),
 ('it', 48025),
 ('i', 40743),
 ('that', 35630),
 ('this', 35080),
 ('s', 33815),
 ('as', 26308),
 ('with', 23247),
 ('for', 22416),
 ('was', 21917),
 ('film', 20937),
 ('but', 20822),
 ('movie', 19074),
 ('his', 17227),
 ('on', 17008),
 ('you', 16681),
 ('he', 16282),
 ('are', 14807),
 ('not', 14272),
 ('t', 13720),
 ('one', 13655),
 ('have', 12587),
 ('\n', 12500),
 ('be', 12416),
 ('by', 11997),
 ('all', 11942),
 ('who', 11464),
 ('an', 11294),
 ('at', 11234),
 ('from', 10767),
 ('her', 10474),
 ('they', 9895),
 ('has', 9186),
 ('so', 9154),
 ('like', 9038),
 ('about', 8313),
 ('very', 8305),
 ('out', 8134),
 ('there', 8057),
 ('she', 7779),
 ('what', 7737),
 ('or', 7732),
 ('good', 7720),
 ('more', 7521),
 ('when', 7456),
 ('some', 7441),
 ('if', 7285),
 ('just', 7152),
 ('can', 7001),
 ('story', 6780),
 ('time', 6515),
 ('

In [18]:
negative_counts.most_common()

[('', 548962),
 ('.', 167538),
 ('the', 163389),
 ('a', 79321),
 ('and', 74385),
 ('of', 69009),
 ('to', 68974),
 ('br', 52637),
 ('is', 50083),
 ('it', 48327),
 ('i', 46880),
 ('in', 43753),
 ('this', 40920),
 ('that', 37615),
 ('s', 31546),
 ('was', 26291),
 ('movie', 24965),
 ('for', 21927),
 ('but', 21781),
 ('with', 20878),
 ('as', 20625),
 ('t', 20361),
 ('film', 19218),
 ('you', 17549),
 ('on', 17192),
 ('not', 16354),
 ('have', 15144),
 ('are', 14623),
 ('be', 14541),
 ('he', 13856),
 ('one', 13134),
 ('they', 13011),
 ('\n', 12500),
 ('at', 12279),
 ('his', 12147),
 ('all', 12036),
 ('so', 11463),
 ('like', 11238),
 ('there', 10775),
 ('just', 10619),
 ('by', 10549),
 ('or', 10272),
 ('an', 10266),
 ('who', 9969),
 ('from', 9731),
 ('if', 9518),
 ('about', 9061),
 ('out', 8979),
 ('what', 8422),
 ('some', 8306),
 ('no', 8143),
 ('her', 7947),
 ('even', 7687),
 ('can', 7653),
 ('has', 7604),
 ('good', 7423),
 ('bad', 7401),
 ('would', 7036),
 ('up', 6970),
 ('only', 6781),
 ('m

The above is just to show the most common words in the positive and negative sentences. However, there are a lot of unnecessary words like `the`, `a`, `was`, and so on. Can you find a way to show the relevant words and not these words? 

```
Hint: Stop Words removal or normalizing each term.
```

In [19]:
words[:30]

['bromwell',
 'high',
 'is',
 'a',
 'cartoon',
 'comedy',
 'it',
 'ran',
 'at',
 'the',
 'same',
 'time',
 'as',
 'some',
 'other',
 'programs',
 'about',
 'school',
 'life',
 'such',
 'as',
 'teachers',
 'my',
 'years',
 'in',
 'the',
 'teaching',
 'profession',
 'lead',
 'me']

In [20]:
[vocab_to_int[word] for word in words[:30]]

[43732,
 59198,
 28537,
 62650,
 52828,
 3699,
 28540,
 69932,
 20610,
 23859,
 29728,
 56652,
 20607,
 63343,
 36918,
 9940,
 50681,
 9415,
 19130,
 56915,
 20607,
 33161,
 32588,
 1442,
 62078,
 23859,
 64734,
 5599,
 73870,
 32565]

In [21]:
vocab_to_int['bromwell']

43732

## One hot encoding

We need one hot encoding for the labels. Think of a reason why we need one hot encoded labels for classes?

## Task 3: Create one hot encoding for the labels. 

* Write the one hot encoding logic in the `one_hot` function.
* Use 1 for positive label and 0 for negative label.
* Save all the values in the `encoded_labels` function.

In [22]:
# 1 for positive label and 0 for negative label
#from sklearn.preprocessing import LabelEncoder
def one_hot(labels):
    encoded_labels = np.asarray(labels)    
    condlist = [encoded_labels=='positive\n', encoded_labels=='negative\n']
    choicelist = [1, 0]
    one_hot = np.select(condlist, choicelist)
    return one_hot
    
 
encoded_labels = one_hot(labels)
encoded_labels


array([1, 0, 1, ..., 0, 1, 0])

In [23]:
#print the length of your label and uncomment next line only if the encoded_labels size is 25001.
# If you dont get the intuition behind this step, print encoded_labels to see it.
#encoded_labels = encoded_labels[:25000]

In [24]:
len(encoded_labels)

25000

In [25]:
# reviews_ints: list like reviews_split but containing corresponding integer instead of word. contains 25000 reviews
reviews_ints = []
for review in reviews_split:
    reviews_ints.append([vocab_to_int[word] for word in review.split()])


In [26]:
# This step is to see if any review is empty and we remove it. Otherwise the input will be all zeroes.
# review_lens: how many similar length reviews occur and length of reviews
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))
review_lens[2514]

Zero-length reviews: 0
Maximum review length: 2514


1

In [27]:
print('Number of reviews before removing outliers: ', len(reviews_ints))

## remove any reviews/labels with zero length from the reviews_ints list.

# get indices of any reviews with length 0
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]

# remove 0-length reviews and their labels
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
encoded_labels = np.array([encoded_labels[ii] for ii in non_zero_idx])

print('Number of reviews after removing outliers: ', len(reviews_ints))

('Number of reviews before removing outliers: ', 25000)
('Number of reviews after removing outliers: ', 25000)


In [28]:
len(encoded_labels)


25000

## Task 4: Padding the data

> Define a function that returns an array `features` that contains the padded data, of a standard size, that we'll pass to the network. 
* The data should come from `review_ints`, since we want to feed integers to the network. 
* Each row should be `seq_length` elements long. 
* For reviews shorter than `seq_length` words, **left pad** with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. 
* For reviews longer than `seq_length`, use only the first `seq_length` words as the feature vector.

As a small example, if the `seq_length=10` and an input review is: 
```
[117, 18, 128]
```
The resultant, padded sequence should be: 

```
[0, 0, 0, 0, 0, 0, 0, 117, 18, 128]
```

**Your final `features` array should be a 2D array, with as many rows as there are reviews, and as many columns as the specified `seq_length`.**

In [29]:
# Write the logic for padding the data

def pad_features(reviews_ints, seq_length):
    padded = []
    for review in reviews_ints:
        
        if (len(review) >= seq_length):
            review = review[:seq_length]
        else:
            review = [0 for _ in range(seq_length-len(review))]+review[:]
        padded.append(review)
    
    return np.asarray(padded)

In [30]:
# Verify if everything till now is correct. 

seq_length = 200

features = pad_features(reviews_ints, seq_length=seq_length)

## test statements - do not change - ##
assert len(features)==len(reviews_ints), "Your features should have as many rows as reviews."
assert len(features[0])==seq_length, "Each feature row should contain seq_length values."

# print first 10 values of the first 30 batches 
print(features[:30,:10])

[[    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [54757 34566 45716 20607 71033 43935 20874   886 33120 20604]
 [30208 65137 20607 62650 69307 27812 42136 58102 28537 46955]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [14822 27396 42637 47823 32588  3213 32848 32565 56777 39300]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [23859 55573 46491 21663 62650 27812 73293 28753 62078 59825]
 [62078 59205 21153 51903 55651 37505 45643 34541 42661 34367]
 [    0     0     0     0     0     0     0     0     0

In [31]:
import random

a = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
# random.seed(101)
random.shuffle(a)
print(a)

[[5, 6], [7, 8, 9], [1, 2, 3, 4]]


Now we have everything ready. It's time to split our dataset into `Train`, `Test` and `Validate`. 

Read more about the train-test-split here : https://cs230-stanford.github.io/train-dev-test-split.html

## Task 5: Lets create train, test and val split in the ratio of 8:1:1.  

Hint: Either use shuffle and slicing in Python or use train-test-val split in Sklearn. 

In [32]:


import random

train_frac = 0.8
val_frac = 0.1
test_frac = 0.1


def train_test_val_split(features):
    random.seed(101)
    random.shuffle(features)
    split_1 = int(0.8 * len(features))
    split_2 = int(0.9 * len(features))
    train_x = features[:split_1]
    val_x = features[split_1:split_2]
    test_x = features[split_2:]
    return train_x, val_x, test_x

def train_test_val_labels(encoded_labels):
    random.seed(102)
    random.shuffle(features)
    split_1 = int(0.8 * len(encoded_labels))
    split_2 = int(0.9 * len(encoded_labels))
    train_y = encoded_labels[:split_1]
    val_y = encoded_labels[split_1:split_2]
    test_y = encoded_labels[split_2:]
    return train_y, val_y, test_y

train_x, val_x, test_x = train_test_val_split(features)
train_y, val_y, test_y = train_test_val_labels(encoded_labels)

In [33]:
## print out the shapes of your resultant feature data
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
('Train set: \t\t(20000, 200)', '\nValidation set: \t(2500, 200)', '\nTest set: \t\t(2500, 200)')


## DataLoaders and Batching

After creating training, test, and validation data, we can create DataLoaders for this data by following two steps:
1. Create a known format for accessing our data, using [TensorDataset](https://pytorch.org/docs/stable/data.html#) which takes in an input set of data and a target set of data with the same first dimension, and creates a dataset.
2. Create DataLoaders and batch our training, validation, and test Tensor datasets.

```
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=batch_size)
```

This is an alternative to creating a generator function for batching our data into full batches.

### Task 6: Create a generator function for the dataset. 
See the above link for more info.

In [34]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets for train, test and val
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
valid_data = TensorDataset(torch.from_numpy(val_x), torch.from_numpy(val_y))
test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))

# dataloaders
batch_size = 50 

# make sure to SHUFFLE your training data. Keep Shuffle=True.
train_loader = DataLoader(train_data, batch_size=batch_size,shuffle=True)
valid_loader = DataLoader(valid_data, batch_size=batch_size)
test_loader = DataLoader(test_data, batch_size=batch_size)

In [35]:
# obtain one batch of training data and label. 
dataiter = iter(train_loader)
sample_x, sample_y = dataiter.next()

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)

('Sample input size: ', torch.Size([50, 200]))
('Sample input: \n', tensor([[16556, 34549, 68646,  ..., 57194, 28537, 61723],
        [72284,  6998,  1856,  ..., 15938,  1856,  6083],
        [73948, 67567,  6778,  ..., 31063,  1595, 20048],
        ...,
        [    0,     0,     0,  ...,   489, 34549, 32454],
        [    0,     0,     0,  ..., 34027, 32588, 57672],
        [    0,     0,     0,  ...,  6902, 34027, 36345]]))
()
('Sample label size: ', torch.Size([50]))
('Sample label: \n', tensor([0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0,
        0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
        0, 0]))


In [36]:
# Check if GPU is available.
train_on_gpu=torch.cuda.is_available()

if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

Training on GPU.


## Creating the Model 

Here we are creating a simple RNN in PyTorch and pass the output to the a Linear layer and Sigmoid at the end to get the probability score and prediction as POSITIVE or NEGATIVE. 

The network is very similar to the CNN network created in Exercise 2. 

More info available at: https://pytorch.org/docs/0.3.1/nn.html?highlight=rnn#torch.nn.RNN

Read about the parameters that the RNN takes and see what will happen when `batch_first` is set as `True`.

In [37]:
import torch.nn as nn

class SentimentRNN(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentRNN, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # RNN layer
        self.rnn = nn.RNN(vocab_size, hidden_dim, n_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        

    def forward(self, x, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        batch_size = x.size(0)

        # RNN out layer
        print ('X-',x.shape, 'hidden',hidden.shape)
        rnn_out, hidden = self.rnn(x, hidden)
        print ('out-',rnn_out, 'hidden',hidden)
    
        # stack up lstm outputs
        rnn_out = rnn_out.view(-1, self.hidden_dim)
        
        # dropout and fully-connected layer
        out = self.dropout(rnn_out)
        out = self.fc(out)
        # sigmoid function
        sig_out = self.sig(out)
        
        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1] # get last batch of labels
        
        # return last sigmoid output and hidden state
        return sig_out, hidden
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden

    


## Task 7 : Know the shape

Given a batch of 64 and input size as 1 and a sequence length of 200 to a RNN with 2 stacked layers and 512 hidden layers, find the shape of input data (x) and the hidden dimension (hidden) specified in the forward pass of the network. Note, the batch_first is kept to be True. 



In [38]:
# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
hidden_dim = 256
n_layers = 2

#input shape = (64,200,vocab_size)
#hidden shape = (2*1,64,256)

net = SentimentRNN(vocab_size, output_size, hidden_dim, n_layers)

print(net)

SentimentRNN(
  (rnn): RNN(74073, 256, num_layers=2, batch_first=True, dropout=0.5)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)



## Task 8: LSTM 

Before we start creating the LSTM, it is important to understand LSTM and to know why we prefer LSTM over a Vanilla RNN for this task. 
> Here are some good links to know about LSTM:
* [Colah Blog](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [Understanding LSTM](http://blog.echen.me/2017/05/30/exploring-lstms/)
* [RNN effectiveness](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)


Now create a class named SentimentLSTM with `n_layers=2`, and rest all hyperparameters same as before. Also, create an embedding layer and feed the output of the embedding layer as input to the LSTM model. Dont forget to add a regularizer (dropout) layer after the LSTM layer with p=0.4 to prevent overfitting. 

In [86]:
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """
    The LSTM model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentLSTM, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # define embedding, LSTM, dropout and Linear layers here
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
       
        self.lstm= nn.LSTM(embedding_dim,hidden_dim,num_layers=self.n_layers)
       
        self.fc = nn.Linear(in_features=hidden_dim, out_features=output_size)
        
        self.dropout = nn.Dropout(p=drop_prob)
        # define embedding, LSTM, dropout and Linear layers here
        self.sig = nn.Sigmoid()

    def forward(self, input, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        # input = B x S . size(0) = B
        batch_size = input.size(0)
        seq_len = input.size(1)

        # input:  B x S  -- (transpose) --> S x B
        input = input.t()
        
        # Embedding Seq X Batch (200, 50) ---->  Seq X Batch x E (200, 50, 300) (embedding size)
        #print("  input", input.size())
        embedded = self.embedding(input)
        #print("  embedding", embedded.size()) # ('  embedding', torch.Size([200, 50, 300]))
        
       
        lstm_out, hidden = self.lstm(embedded, hidden)
        #print('lstm output',lstm_out.size()) # ('lstm output', torch.Size([200, 50, 256])))
        



    
        # stack up lstm outputs
        #out = lstm_out.view(-1, self.hidden_dim)
        
        # take the batch_size and features [ 50x256],drop the first dimension 
        out = lstm_out[-1]
        #print('shaped output',out.size()) #('shaped output', torch.Size([50, 256]))
        
        # dropout and fully-connected layer
        out = self.dropout(out)
        fc_output = self.fc(out)
        #print('fc_output',fc_output.size()) #('fc_output', torch.Size([50, 1]))

        # sigmoid function
        sig_out = self.sig(fc_output)
        sig_out = sig_out[:, -1] 
        #print('sig_out',sig_out.size(),sig_out) 
       
        # reshape to be batch_size first
#         sig_out = sig_out.view(batch_size, -1)
        
        # get last batch of labels
        #print('reshaped sig_out',sig_out.size(),sig_out)
        # return last sigmoid output and hidden state
        return sig_out, hidden
      
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden
        

## Instantiate the network

Here, we'll instantiate the network. First up, defining the hyperparameters.

* `vocab_size`: Size of our vocabulary or the range of values for our input, word tokens.
* `output_size`: Size of our desired output; the number of class scores we want to output (pos/neg).
* `embedding_dim`: Number of columns in the embedding lookup table; size of our embeddings.
* `hidden_dim`: Number of units in the hidden layers of our LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
* `n_layers`: Number of LSTM layers in the network. Typically between 1-3

In [99]:
# Instantiate the model with these hyperparameters
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 300
hidden_dim = 256
n_layers = 2

net = SentimentLSTM(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

print(net)

SentimentLSTM(
  (embedding): Embedding(74073, 300)
  (lstm): LSTM(300, 256, num_layers=2)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (dropout): Dropout(p=0.5)
  (sig): Sigmoid()
)


In [100]:
# loss and optimization functions
lr=0.001

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)


### Task 9: Loss Functions
We are using `BCELoss (Binary Cross Entropy Loss)` since we have two output classes. 

Can Cross Entropy Loss be used instead of BCELoss? 

If no, why not? If yes, how?

Is `NLLLoss()` and last layer as `LogSoftmax()` is same as using `CrossEntropyLoss()` with a Softmax final layer? Can you get the mathematical intuition behind it?

In [101]:
#Training and Validation

epochs = 4 # 3-4 is approx where I noticed the validation loss stop decreasing

counter = 0
print_every = 100
clip=5 # gradient clipping

# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

net.train()
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()
        #print (inputs.shape)
        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        print('output.squeeze()',output.squeeze())
        print('labels.float()',labels.float())
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
        #break
        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(batch_size)
            val_losses = []
            net.eval()
            for inputs, labels in valid_loader:

                # Creating new variables for the hidden state, otherwise
                # we'd backprop through the entire training history
                val_h = tuple([each.data for each in val_h])

                if(train_on_gpu):
                    inputs, labels = inputs.cuda(), labels.cuda()

                output, val_h = net(inputs, val_h)
                val_loss = criterion(output.squeeze(), labels.float())

                val_losses.append(val_loss.item())

            net.train()
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))
    #break

('output.squeeze()', tensor([0.4900, 0.4929, 0.4972, 0.5131, 0.4909, 0.4930, 0.4906, 0.5071, 0.4924,
        0.4965, 0.4993, 0.4839, 0.4939, 0.4860, 0.4846, 0.5003, 0.4883, 0.5145,
        0.4983, 0.5011, 0.4871, 0.4930, 0.5001, 0.4954, 0.4881, 0.4993, 0.4919,
        0.4903, 0.4907, 0.5020, 0.4909, 0.5058, 0.5028, 0.4996, 0.4842, 0.4916,
        0.4936, 0.4946, 0.4947, 0.4897, 0.4891, 0.4919, 0.4807, 0.4947, 0.5024,
        0.4873, 0.4902, 0.4892, 0.4874, 0.4820],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0.,
        1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0.,
        1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4855, 0.4809, 0.4748, 0.4839, 0.4614, 0.4686, 0.4798, 0.4920, 0.4960,
        0.4783, 0.4816, 0.4839, 0.4758, 0.4801, 0.4700, 0.5024, 0.4802, 0.4836,
        0.4900, 0.4844, 0.4

('output.squeeze()', tensor([0.4979, 0.5020, 0.5177, 0.4935, 0.4782, 0.4891, 0.4921, 0.4838, 0.4733,
        0.4995, 0.4923, 0.5055, 0.5071, 0.4924, 0.4794, 0.4770, 0.5262, 0.5215,
        0.4192, 0.5076, 0.4372, 0.4970, 0.4722, 0.5037, 0.5101, 0.4972, 0.4851,
        0.4893, 0.4957, 0.4675, 0.4955, 0.4922, 0.4937, 0.4847, 0.5047, 0.4942,
        0.5067, 0.4908, 0.4795, 0.4975, 0.4619, 0.4710, 0.4912, 0.4933, 0.4773,
        0.5081, 0.4874, 0.4713, 0.4960, 0.5343],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 0., 0., 0.,
        1., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5292, 0.5015, 0.5041, 0.4684, 0.5020, 0.4874, 0.4823, 0.4760, 0.5138,
        0.4693, 0.5077, 0.4915, 0.5059, 0.4757, 0.4655, 0.5056, 0.4859, 0.4816,
        0.4984, 0.4853, 0.5

('output.squeeze()', tensor([0.4646, 0.5400, 0.5196, 0.5161, 0.5152, 0.5082, 0.4974, 0.5101, 0.5012,
        0.4772, 0.5010, 0.5135, 0.5122, 0.5245, 0.5245, 0.5218, 0.5017, 0.5308,
        0.5113, 0.4902, 0.5165, 0.5077, 0.5037, 0.5167, 0.5135, 0.4984, 0.5135,
        0.4848, 0.4617, 0.5070, 0.5154, 0.5069, 0.4728, 0.4876, 0.5092, 0.4883,
        0.5108, 0.4653, 0.5163, 0.5047, 0.5048, 0.4888, 0.4867, 0.4935, 0.4968,
        0.4823, 0.4945, 0.4967, 0.5088, 0.4943],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1.,
        0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1.,
        1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4580, 0.5120, 0.4962, 0.5247, 0.5080, 0.5263, 0.5004, 0.5158, 0.4933,
        0.5224, 0.4277, 0.5045, 0.5195, 0.5042, 0.5055, 0.4905, 0.5297, 0.5108,
        0.5092, 0.4628, 0.5

('output.squeeze()', tensor([0.5149, 0.4709, 0.5030, 0.4596, 0.5034, 0.5057, 0.4987, 0.4782, 0.4887,
        0.4919, 0.5128, 0.4917, 0.5240, 0.5014, 0.4839, 0.5316, 0.4762, 0.5003,
        0.4740, 0.5086, 0.4710, 0.5229, 0.5264, 0.4910, 0.5209, 0.4806, 0.4954,
        0.4656, 0.5291, 0.5107, 0.4808, 0.4760, 0.5122, 0.4736, 0.5087, 0.4614,
        0.5055, 0.4921, 0.4928, 0.4883, 0.4594, 0.4872, 0.4929, 0.4939, 0.4813,
        0.5122, 0.5176, 0.4697, 0.4747, 0.5098],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0.,
        1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1.,
        0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4870, 0.5243, 0.4560, 0.5055, 0.4791, 0.5037, 0.4922, 0.5116, 0.5038,
        0.4994, 0.5198, 0.4973, 0.5183, 0.4663, 0.4943, 0.4644, 0.5255, 0.5054,
        0.4944, 0.4857, 0.4

('output.squeeze()', tensor([0.5132, 0.5330, 0.4927, 0.5252, 0.4926, 0.5227, 0.4831, 0.5056, 0.4895,
        0.4961, 0.5014, 0.4908, 0.4964, 0.5289, 0.4692, 0.4921, 0.4811, 0.4817,
        0.5087, 0.5010, 0.5091, 0.5142, 0.5147, 0.4621, 0.4817, 0.5314, 0.4611,
        0.4545, 0.5198, 0.5146, 0.4808, 0.4842, 0.4613, 0.4994, 0.5045, 0.5056,
        0.5169, 0.4885, 0.4798, 0.5119, 0.4682, 0.5406, 0.4863, 0.4719, 0.5208,
        0.5204, 0.4826, 0.5484, 0.4707, 0.4737],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1.,
        1., 1., 1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1.,
        1., 0., 0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4504, 0.5303, 0.4962, 0.5362, 0.5675, 0.5675, 0.5110, 0.4855, 0.4907,
        0.4873, 0.5336, 0.5356, 0.5105, 0.4621, 0.5020, 0.4975, 0.4496, 0.5337,
        0.4953, 0.4753, 0.4

('output.squeeze()', tensor([0.4686, 0.5045, 0.4795, 0.4623, 0.5264, 0.4790, 0.5179, 0.4566, 0.5077,
        0.4805, 0.4609, 0.4644, 0.5020, 0.5028, 0.5164, 0.5123, 0.5171, 0.4788,
        0.5129, 0.4961, 0.5191, 0.5237, 0.5216, 0.4851, 0.4813, 0.4631, 0.4856,
        0.5054, 0.4976, 0.5117, 0.4991, 0.4915, 0.5066, 0.5003, 0.5000, 0.4648,
        0.4692, 0.4890, 0.4952, 0.4857, 0.4944, 0.5163, 0.4747, 0.4981, 0.4492,
        0.4816, 0.5018, 0.5049, 0.4948, 0.4814],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0.,
        0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 1., 1.,
        1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4980, 0.5100, 0.5044, 0.4988, 0.4765, 0.5015, 0.5062, 0.4649, 0.4950,
        0.5062, 0.4967, 0.4916, 0.5129, 0.4952, 0.4757, 0.4892, 0.4706, 0.5034,
        0.4893, 0.5079, 0.4

('output.squeeze()', tensor([0.5007, 0.5449, 0.4837, 0.4793, 0.4882, 0.4846, 0.4022, 0.5399, 0.4866,
        0.5118, 0.5031, 0.5246, 0.5106, 0.5368, 0.5043, 0.4925, 0.5287, 0.4855,
        0.5116, 0.4994, 0.5122, 0.5269, 0.4836, 0.4777, 0.4868, 0.5098, 0.5039,
        0.4909, 0.5244, 0.5248, 0.4249, 0.4970, 0.4912, 0.4805, 0.4801, 0.5141,
        0.4950, 0.4942, 0.5102, 0.5058, 0.5325, 0.5268, 0.5290, 0.5198, 0.5121,
        0.4798, 0.5182, 0.4886, 0.4835, 0.5064],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 0., 0., 1., 1.,
        0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0.,
        1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4969, 0.4736, 0.4924, 0.5336, 0.5173, 0.5020, 0.4944, 0.5002, 0.5035,
        0.4747, 0.5046, 0.5057, 0.4954, 0.4738, 0.5075, 0.5326, 0.5127, 0.4926,
        0.5140, 0.4872, 0.5

('Epoch: 1/4...', 'Step: 100...', 'Loss: 0.709408...', 'Val Loss: 0.694542')
('output.squeeze()', tensor([0.5111, 0.4834, 0.4997, 0.5073, 0.5257, 0.5252, 0.5092, 0.5394, 0.5157,
        0.5441, 0.5157, 0.5294, 0.4966, 0.5125, 0.5134, 0.4945, 0.5608, 0.5077,
        0.5037, 0.5068, 0.4988, 0.5146, 0.5346, 0.5223, 0.4870, 0.5428, 0.5417,
        0.5357, 0.5154, 0.5114, 0.5197, 0.5148, 0.5006, 0.5520, 0.5161, 0.5318,
        0.5343, 0.5310, 0.4536, 0.5195, 0.5066, 0.5088, 0.5378, 0.4916, 0.5568,
        0.5109, 0.4868, 0.5023, 0.5196, 0.5278],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1.,
        1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1.,
        0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4920, 0.5021, 0.5407, 0.5368, 0.5145, 0.5409, 0.5159, 0.4968, 0.5404,
        0.5086, 0.5144, 0.5217

('output.squeeze()', tensor([0.4839, 0.4799, 0.5263, 0.5042, 0.4877, 0.5222, 0.5191, 0.4786, 0.4913,
        0.5032, 0.5042, 0.5005, 0.4586, 0.4934, 0.4934, 0.5131, 0.4506, 0.4824,
        0.4962, 0.5136, 0.4760, 0.4988, 0.4944, 0.4665, 0.5144, 0.5097, 0.4735,
        0.4765, 0.4947, 0.5118, 0.4758, 0.5009, 0.4899, 0.5053, 0.4885, 0.5267,
        0.4927, 0.4909, 0.4880, 0.5004, 0.4992, 0.4879, 0.5044, 0.5064, 0.4759,
        0.4597, 0.4960, 0.4952, 0.4734, 0.4919],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1.,
        0., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1.,
        0., 1., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4145, 0.4929, 0.4830, 0.4935, 0.5001, 0.5112, 0.4973, 0.5386, 0.5306,
        0.4675, 0.4892, 0.5071, 0.4756, 0.5047, 0.4702, 0.5090, 0.4832, 0.5114,
        0.5199, 0.5027, 0.4

('output.squeeze()', tensor([0.5029, 0.4927, 0.4629, 0.4883, 0.4903, 0.4999, 0.5188, 0.4921, 0.4743,
        0.5164, 0.4964, 0.4911, 0.4932, 0.5011, 0.5076, 0.4903, 0.4919, 0.4987,
        0.5094, 0.5008, 0.4937, 0.5373, 0.5052, 0.4949, 0.5004, 0.4983, 0.4860,
        0.4964, 0.4948, 0.5169, 0.5193, 0.4949, 0.5225, 0.5031, 0.4778, 0.4862,
        0.4688, 0.5084, 0.4905, 0.4831, 0.5062, 0.5050, 0.5251, 0.5012, 0.5017,
        0.4917, 0.4940, 0.5003, 0.4896, 0.4962],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1., 0., 1.,
        1., 0., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1.,
        0., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4927, 0.5155, 0.5334, 0.4921, 0.4873, 0.4807, 0.4921, 0.4963, 0.5048,
        0.5029, 0.5100, 0.5037, 0.5127, 0.4893, 0.5120, 0.4809, 0.5239, 0.5102,
        0.5006, 0.5265, 0.5

('output.squeeze()', tensor([0.5125, 0.4764, 0.5231, 0.4624, 0.5043, 0.4857, 0.4686, 0.5078, 0.5094,
        0.5272, 0.4930, 0.4905, 0.5122, 0.4775, 0.4991, 0.5066, 0.5049, 0.4987,
        0.5005, 0.5193, 0.4941, 0.5319, 0.5259, 0.5016, 0.5162, 0.5026, 0.4949,
        0.4730, 0.5238, 0.4954, 0.4901, 0.5186, 0.5168, 0.4932, 0.4891, 0.5262,
        0.5135, 0.4939, 0.5067, 0.5121, 0.4832, 0.4439, 0.4928, 0.5089, 0.5091,
        0.4934, 0.5027, 0.4962, 0.5214, 0.5027],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.,
        1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.,
        0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5129, 0.5369, 0.5254, 0.4662, 0.5333, 0.4773, 0.4905, 0.5312, 0.4765,
        0.5180, 0.5092, 0.5026, 0.5078, 0.5126, 0.4847, 0.5168, 0.5046, 0.5158,
        0.4956, 0.5146, 0.5

('output.squeeze()', tensor([0.5454, 0.4660, 0.4837, 0.5181, 0.5200, 0.4743, 0.4743, 0.4723, 0.4844,
        0.5271, 0.4847, 0.4609, 0.4533, 0.5279, 0.4657, 0.4405, 0.5340, 0.4986,
        0.4736, 0.4966, 0.4604, 0.5543, 0.4657, 0.5257, 0.5033, 0.4921, 0.5421,
        0.4904, 0.4698, 0.4793, 0.5226, 0.5025, 0.5062, 0.4803, 0.5338, 0.5426,
        0.5550, 0.4652, 0.5238, 0.4811, 0.4946, 0.4874, 0.4771, 0.5172, 0.5076,
        0.4992, 0.4895, 0.4819, 0.5288, 0.4316],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.,
        1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 1., 1., 1., 1.,
        1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4988, 0.5025, 0.5311, 0.5043, 0.4891, 0.4572, 0.4634, 0.5530, 0.4683,
        0.5356, 0.4987, 0.5052, 0.4373, 0.4983, 0.5083, 0.5009, 0.4845, 0.4211,
        0.5041, 0.5118, 0.4

('output.squeeze()', tensor([0.5249, 0.4833, 0.5441, 0.5103, 0.4990, 0.5062, 0.5060, 0.4924, 0.5095,
        0.4844, 0.4545, 0.4619, 0.5052, 0.4875, 0.4364, 0.4220, 0.4682, 0.4866,
        0.5091, 0.4781, 0.4344, 0.4771, 0.4817, 0.5241, 0.4807, 0.4411, 0.5173,
        0.5185, 0.4541, 0.4810, 0.4952, 0.4479, 0.4779, 0.4772, 0.4729, 0.4641,
        0.4971, 0.4905, 0.4762, 0.4489, 0.5079, 0.4739, 0.5024, 0.4751, 0.4538,
        0.5053, 0.4886, 0.4884, 0.5044, 0.4831],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0.,
        1., 0., 1., 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 1.,
        1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4676, 0.5149, 0.4825, 0.4985, 0.4722, 0.4790, 0.4745, 0.4929, 0.5061,
        0.5599, 0.4835, 0.4804, 0.5358, 0.4939, 0.4883, 0.4585, 0.4811, 0.4924,
        0.5189, 0.5102, 0.4

('output.squeeze()', tensor([0.5383, 0.5409, 0.5004, 0.4943, 0.5121, 0.5353, 0.4978, 0.4821, 0.5012,
        0.4893, 0.5017, 0.5124, 0.5039, 0.4524, 0.4860, 0.5297, 0.5037, 0.5320,
        0.4780, 0.5596, 0.4848, 0.5264, 0.5467, 0.4752, 0.5728, 0.5321, 0.4663,
        0.4770, 0.5200, 0.4763, 0.5260, 0.4498, 0.5188, 0.5067, 0.5613, 0.5171,
        0.4724, 0.4782, 0.4913, 0.5010, 0.4882, 0.5570, 0.4265, 0.4665, 0.5294,
        0.4691, 0.5052, 0.5379, 0.4691, 0.5178],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 1.,
        1., 1., 1., 0., 1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1.,
        0., 1., 0., 1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4675, 0.5270, 0.5376, 0.4505, 0.5540, 0.5121, 0.4993, 0.5685, 0.4923,
        0.4977, 0.5356, 0.5262, 0.4757, 0.5075, 0.5348, 0.5504, 0.4886, 0.5911,
        0.4993, 0.4667, 0.5

('output.squeeze()', tensor([0.5076, 0.4352, 0.4572, 0.4029, 0.4878, 0.4773, 0.5013, 0.5195, 0.4804,
        0.4907, 0.4666, 0.5253, 0.4482, 0.4368, 0.4921, 0.5251, 0.4902, 0.5013,
        0.4325, 0.5406, 0.4990, 0.4854, 0.5606, 0.5192, 0.4860, 0.4849, 0.4770,
        0.4427, 0.4684, 0.4230, 0.4964, 0.4218, 0.5017, 0.5308, 0.5055, 0.4717,
        0.5191, 0.5322, 0.5255, 0.4688, 0.5077, 0.5042, 0.4455, 0.4515, 0.4730,
        0.4760, 0.5270, 0.4441, 0.4497, 0.5253],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 1., 1.,
        1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 1., 0., 1., 0.,
        0., 1., 0., 1., 1., 1., 1., 0., 1., 1., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4631, 0.4034, 0.4962, 0.4886, 0.4712, 0.4421, 0.4463, 0.5597, 0.4390,
        0.4385, 0.4571, 0.5111, 0.4593, 0.4964, 0.5028, 0.4956, 0.4589, 0.4663,
        0.5101, 0.4942, 0.4

('output.squeeze()', tensor([0.4647, 0.4878, 0.5189, 0.5172, 0.4791, 0.5044, 0.5451, 0.5178, 0.5015,
        0.5066, 0.5001, 0.4989, 0.4741, 0.5352, 0.5124, 0.5122, 0.4789, 0.4625,
        0.4863, 0.4765, 0.4829, 0.5053, 0.5026, 0.5010, 0.5156, 0.5159, 0.5300,
        0.4738, 0.5155, 0.4924, 0.4753, 0.4622, 0.4770, 0.4428, 0.4982, 0.5219,
        0.5562, 0.4725, 0.4896, 0.4511, 0.5414, 0.4949, 0.5009, 0.4700, 0.4489,
        0.5427, 0.5160, 0.5103, 0.5112, 0.4867],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1.,
        0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 0.,
        0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4967, 0.4472, 0.4929, 0.5602, 0.4821, 0.4786, 0.4111, 0.4970, 0.4705,
        0.4888, 0.4594, 0.5643, 0.4753, 0.5067, 0.4511, 0.4959, 0.4552, 0.4935,
        0.5045, 0.4753, 0.4

('output.squeeze()', tensor([0.5156, 0.4873, 0.4854, 0.5294, 0.5100, 0.5022, 0.4911, 0.4679, 0.4984,
        0.5088, 0.4964, 0.5081, 0.5107, 0.5275, 0.5092, 0.5074, 0.5285, 0.5287,
        0.5192, 0.5035, 0.4992, 0.4667, 0.4607, 0.4805, 0.4983, 0.4767, 0.5211,
        0.4843, 0.5687, 0.5039, 0.4887, 0.5024, 0.4843, 0.5088, 0.5038, 0.4884,
        0.4722, 0.5097, 0.4633, 0.5044, 0.4675, 0.5267, 0.4879, 0.5036, 0.5328,
        0.5095, 0.5086, 0.4696, 0.5116, 0.5559],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1.,
        1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1.,
        1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4632, 0.5166, 0.5072, 0.4972, 0.5047, 0.4993, 0.5058, 0.5198, 0.4946,
        0.5261, 0.4938, 0.5169, 0.5261, 0.5232, 0.4715, 0.5042, 0.4848, 0.4776,
        0.5077, 0.4708, 0.5

('output.squeeze()', tensor([0.4738, 0.5368, 0.4958, 0.5356, 0.5081, 0.4940, 0.4694, 0.5103, 0.4909,
        0.5365, 0.5066, 0.5246, 0.4545, 0.5622, 0.5351, 0.4894, 0.4859, 0.4927,
        0.4763, 0.5045, 0.4703, 0.4596, 0.5233, 0.4968, 0.5407, 0.5436, 0.4931,
        0.5107, 0.5085, 0.5227, 0.5312, 0.5032, 0.5163, 0.4878, 0.4871, 0.5050,
        0.4866, 0.4907, 0.5524, 0.5070, 0.4782, 0.4803, 0.5214, 0.4847, 0.5007,
        0.5134, 0.5055, 0.4821, 0.5105, 0.5266],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0.,
        0., 1., 0., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1.,
        0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4825, 0.4940, 0.5223, 0.4525, 0.5118, 0.4863, 0.4637, 0.4295, 0.4573,
        0.5239, 0.5009, 0.4852, 0.4229, 0.5190, 0.4855, 0.5130, 0.5006, 0.5189,
        0.4910, 0.5004, 0.5

('output.squeeze()', tensor([0.5157, 0.4910, 0.4866, 0.4819, 0.5548, 0.4982, 0.5235, 0.4351, 0.5228,
        0.5249, 0.5272, 0.5691, 0.4986, 0.5059, 0.4485, 0.5222, 0.5190, 0.5046,
        0.5070, 0.5010, 0.5101, 0.5119, 0.4959, 0.5608, 0.4778, 0.4901, 0.5098,
        0.5085, 0.4750, 0.5134, 0.5105, 0.4972, 0.5215, 0.5060, 0.4850, 0.4889,
        0.4919, 0.5078, 0.5030, 0.4933, 0.5087, 0.5011, 0.5153, 0.5094, 0.4588,
        0.5047, 0.4824, 0.5027, 0.4996, 0.5048],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1.,
        0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5315, 0.5236, 0.5100, 0.5389, 0.4940, 0.4940, 0.5174, 0.5460, 0.5313,
        0.5095, 0.6037, 0.5446, 0.5219, 0.5118, 0.5266, 0.5035, 0.4440, 0.5360,
        0.4908, 0.5327, 0.4

('output.squeeze()', tensor([0.4974, 0.5168, 0.4998, 0.4898, 0.4828, 0.4497, 0.4778, 0.4648, 0.4989,
        0.4993, 0.5116, 0.4698, 0.4890, 0.5131, 0.4532, 0.4578, 0.5024, 0.4892,
        0.5171, 0.4793, 0.5078, 0.5169, 0.5043, 0.5615, 0.5359, 0.4684, 0.5145,
        0.5256, 0.5192, 0.4906, 0.5179, 0.5296, 0.5215, 0.5819, 0.4925, 0.5265,
        0.4464, 0.4096, 0.5455, 0.5115, 0.4806, 0.5288, 0.5123, 0.4885, 0.4976,
        0.5228, 0.5713, 0.4680, 0.4848, 0.4965],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
        1., 0., 1., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1.,
        0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4940, 0.4788, 0.5022, 0.4705, 0.4623, 0.5106, 0.4887, 0.5300, 0.4839,
        0.4984, 0.5035, 0.4706, 0.5150, 0.5370, 0.4609, 0.4889, 0.4670, 0.5128,
        0.5468, 0.4668, 0.5

('output.squeeze()', tensor([0.4909, 0.5096, 0.4908, 0.4763, 0.5059, 0.4739, 0.4904, 0.4864, 0.5222,
        0.4581, 0.4703, 0.5065, 0.4863, 0.4756, 0.4745, 0.5177, 0.5063, 0.4536,
        0.5084, 0.5080, 0.5103, 0.5003, 0.4961, 0.4947, 0.4825, 0.5156, 0.5011,
        0.5182, 0.4742, 0.4950, 0.4967, 0.5127, 0.4679, 0.4854, 0.4746, 0.4676,
        0.5067, 0.5027, 0.5280, 0.4978, 0.4650, 0.4351, 0.5075, 0.5247, 0.4647,
        0.5055, 0.5173, 0.4549, 0.4964, 0.5159],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0.,
        1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 0.,
        1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4827, 0.5007, 0.4561, 0.4975, 0.5178, 0.4896, 0.4696, 0.5113, 0.5077,
        0.4727, 0.5001, 0.4959, 0.4711, 0.4732, 0.5209, 0.4795, 0.4422, 0.5093,
        0.5063, 0.4910, 0.5

('output.squeeze()', tensor([0.4986, 0.5135, 0.5075, 0.5247, 0.5046, 0.4777, 0.5182, 0.5068, 0.4850,
        0.5023, 0.4693, 0.5099, 0.5176, 0.4928, 0.4859, 0.4735, 0.5150, 0.4787,
        0.4935, 0.5032, 0.4871, 0.4750, 0.5271, 0.4587, 0.4557, 0.5227, 0.4985,
        0.4798, 0.4813, 0.4733, 0.4806, 0.5009, 0.5026, 0.4857, 0.4758, 0.4901,
        0.5264, 0.4977, 0.4925, 0.4700, 0.4866, 0.4833, 0.5112, 0.5016, 0.4913,
        0.4944, 0.4780, 0.5147, 0.4818, 0.4897],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1.,
        1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1.,
        0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4702, 0.5013, 0.4981, 0.4982, 0.4723, 0.5087, 0.5020, 0.4930, 0.4969,
        0.4659, 0.4475, 0.5165, 0.5198, 0.5146, 0.4684, 0.5177, 0.5034, 0.4687,
        0.4610, 0.4995, 0.4

('output.squeeze()', tensor([0.5030, 0.4797, 0.4317, 0.5080, 0.5141, 0.4802, 0.4428, 0.4786, 0.5241,
        0.4836, 0.5191, 0.5303, 0.4821, 0.5153, 0.5249, 0.4968, 0.4589, 0.4836,
        0.5333, 0.4634, 0.5091, 0.4809, 0.5104, 0.4768, 0.5195, 0.5013, 0.4665,
        0.5005, 0.4831, 0.4879, 0.4799, 0.4797, 0.5416, 0.5023, 0.4983, 0.5060,
        0.4863, 0.5042, 0.5032, 0.4998, 0.4993, 0.4788, 0.5181, 0.5050, 0.5387,
        0.5129, 0.4865, 0.5088, 0.5217, 0.5105],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 1., 0., 0.,
        0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0.,
        1., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5057, 0.5336, 0.4764, 0.4692, 0.4543, 0.5118, 0.5310, 0.5044, 0.5097,
        0.5143, 0.4866, 0.5078, 0.4798, 0.5095, 0.5169, 0.4978, 0.4976, 0.4705,
        0.4913, 0.4808, 0.4

('output.squeeze()', tensor([0.4178, 0.4885, 0.4940, 0.5198, 0.4743, 0.4324, 0.5177, 0.4927, 0.5009,
        0.4926, 0.4977, 0.4487, 0.5068, 0.5218, 0.5384, 0.4914, 0.5227, 0.4934,
        0.4994, 0.4540, 0.4847, 0.4944, 0.5198, 0.4884, 0.5112, 0.5067, 0.4864,
        0.4973, 0.4754, 0.5081, 0.4633, 0.4836, 0.4764, 0.4887, 0.4980, 0.5026,
        0.5093, 0.4937, 0.4959, 0.5193, 0.5005, 0.5007, 0.4757, 0.4989, 0.4883,
        0.4506, 0.5236, 0.4715, 0.5134, 0.5330],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 1., 1.,
        0., 0., 0., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 1.,
        1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5013, 0.4686, 0.4935, 0.5246, 0.5161, 0.5215, 0.4692, 0.5166, 0.4888,
        0.5294, 0.4838, 0.4847, 0.4892, 0.4907, 0.5178, 0.4941, 0.4679, 0.5073,
        0.4678, 0.5236, 0.4

('output.squeeze()', tensor([0.4963, 0.5412, 0.4606, 0.4594, 0.5044, 0.4711, 0.4791, 0.5034, 0.4945,
        0.4875, 0.4756, 0.4965, 0.5252, 0.5276, 0.4718, 0.5098, 0.5267, 0.4660,
        0.4367, 0.5236, 0.4954, 0.5258, 0.4869, 0.5187, 0.5090, 0.4822, 0.5081,
        0.4351, 0.4937, 0.4978, 0.5056, 0.5144, 0.5092, 0.5076, 0.4720, 0.5051,
        0.5021, 0.4331, 0.5111, 0.5057, 0.5029, 0.4884, 0.5020, 0.5440, 0.4578,
        0.4385, 0.5149, 0.5315, 0.5083, 0.4740],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
        0., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 0., 1.,
        1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5201, 0.4930, 0.4786, 0.4921, 0.4877, 0.5111, 0.4883, 0.5108, 0.4784,
        0.5297, 0.5062, 0.5211, 0.4377, 0.4952, 0.5062, 0.5192, 0.4442, 0.5209,
        0.4998, 0.5020, 0.4

('output.squeeze()', tensor([0.4936, 0.4860, 0.4761, 0.5222, 0.5544, 0.5609, 0.4923, 0.5446, 0.4899,
        0.4672, 0.5120, 0.5166, 0.5402, 0.4876, 0.4809, 0.5319, 0.4661, 0.5226,
        0.5178, 0.5352, 0.5253, 0.5221, 0.4919, 0.5333, 0.5117, 0.5260, 0.4943,
        0.5165, 0.5021, 0.4329, 0.4541, 0.4380, 0.4995, 0.5482, 0.5011, 0.4729,
        0.5103, 0.5458, 0.5225, 0.5207, 0.5262, 0.4963, 0.5168, 0.5131, 0.5075,
        0.5449, 0.5182, 0.5496, 0.5097, 0.5401],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 1.,
        1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5701, 0.5025, 0.5308, 0.5382, 0.5026, 0.5457, 0.4933, 0.4865, 0.4630,
        0.4799, 0.5200, 0.5167, 0.5293, 0.4762, 0.5200, 0.5499, 0.5219, 0.5211,
        0.5326, 0.5609, 0.5

('output.squeeze()', tensor([0.5142, 0.5449, 0.5310, 0.5288, 0.5675, 0.5121, 0.5438, 0.5110, 0.5665,
        0.5534, 0.5658, 0.5191, 0.5517, 0.5468, 0.5915, 0.5609, 0.5501, 0.5717,
        0.5372, 0.5555, 0.5152, 0.5237, 0.5858, 0.5453, 0.5362, 0.5611, 0.5627,
        0.5338, 0.5264, 0.5083, 0.5269, 0.5440, 0.5472, 0.5401, 0.5548, 0.5249,
        0.5325, 0.5359, 0.5304, 0.5704, 0.5559, 0.5459, 0.5277, 0.5657, 0.5112,
        0.5320, 0.5510, 0.5296, 0.4961, 0.6032],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 0.,
        1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1.,
        1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5520, 0.5683, 0.5177, 0.5736, 0.5621, 0.5526, 0.5421, 0.5230, 0.5454,
        0.5438, 0.5092, 0.5406, 0.5550, 0.5309, 0.5120, 0.5493, 0.5565, 0.5496,
        0.5388, 0.5939, 0.5

('output.squeeze()', tensor([0.5163, 0.5127, 0.5210, 0.5129, 0.4964, 0.5250, 0.5217, 0.5476, 0.5068,
        0.5303, 0.5086, 0.4823, 0.5145, 0.4800, 0.5397, 0.5331, 0.5011, 0.5474,
        0.5002, 0.5330, 0.5202, 0.4984, 0.4799, 0.5137, 0.4898, 0.5623, 0.5121,
        0.5082, 0.5074, 0.4984, 0.5213, 0.4907, 0.4928, 0.5177, 0.4841, 0.4955,
        0.5226, 0.4912, 0.5111, 0.4958, 0.5219, 0.5024, 0.4892, 0.4857, 0.5275,
        0.5375, 0.5360, 0.5249, 0.4702, 0.5074],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1., 0., 0., 0.,
        0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0.,
        1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5241, 0.4603, 0.5301, 0.5056, 0.4951, 0.5066, 0.5286, 0.5154, 0.4791,
        0.5024, 0.5101, 0.4864, 0.4909, 0.5087, 0.5183, 0.5214, 0.5268, 0.5071,
        0.5128, 0.5021, 0.4

('output.squeeze()', tensor([0.4862, 0.5011, 0.4990, 0.4892, 0.5152, 0.4972, 0.5267, 0.4815, 0.4907,
        0.5335, 0.4594, 0.5004, 0.4875, 0.5087, 0.5141, 0.5104, 0.4763, 0.4857,
        0.5068, 0.5203, 0.5197, 0.5018, 0.4924, 0.5056, 0.5333, 0.5311, 0.4898,
        0.4804, 0.5143, 0.4723, 0.5234, 0.4954, 0.4959, 0.4984, 0.5021, 0.4868,
        0.5189, 0.5059, 0.5185, 0.5152, 0.4850, 0.4898, 0.4620, 0.5028, 0.4906,
        0.5097, 0.4886, 0.5204, 0.4966, 0.4674],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0.,
        0., 0., 0., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 0.,
        0., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5211, 0.5083, 0.5153, 0.5097, 0.4969, 0.5100, 0.4925, 0.5119, 0.4982,
        0.5147, 0.5085, 0.4770, 0.5109, 0.5255, 0.5075, 0.5103, 0.5231, 0.5104,
        0.4969, 0.4827, 0.4

('output.squeeze()', tensor([0.4915, 0.4945, 0.5361, 0.5048, 0.5179, 0.4936, 0.5122, 0.5060, 0.5288,
        0.4881, 0.5344, 0.5197, 0.5124, 0.5184, 0.4762, 0.4801, 0.4920, 0.5015,
        0.5226, 0.5052, 0.5068, 0.4939, 0.5113, 0.4988, 0.5327, 0.5037, 0.5196,
        0.5071, 0.5250, 0.5091, 0.5219, 0.4836, 0.4998, 0.5122, 0.5172, 0.4948,
        0.4863, 0.4939, 0.5006, 0.5131, 0.4879, 0.4895, 0.4910, 0.4776, 0.4937,
        0.5025, 0.4954, 0.5004, 0.5460, 0.4919],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1.,
        0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 1., 0., 0.,
        0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5048, 0.4737, 0.5007, 0.4933, 0.4963, 0.4974, 0.5196, 0.5223, 0.4897,
        0.4874, 0.4943, 0.5047, 0.4804, 0.4842, 0.5293, 0.4973, 0.5252, 0.4950,
        0.5241, 0.5292, 0.5

('output.squeeze()', tensor([0.5276, 0.5416, 0.4987, 0.4643, 0.5014, 0.5181, 0.5318, 0.5126, 0.4852,
        0.5067, 0.5162, 0.4778, 0.4944, 0.4829, 0.5046, 0.5340, 0.5139, 0.5187,
        0.4792, 0.5086, 0.5189, 0.5150, 0.4943, 0.5018, 0.4934, 0.4891, 0.4771,
        0.5028, 0.4993, 0.5078, 0.5013, 0.5211, 0.5567, 0.4800, 0.4628, 0.4902,
        0.5131, 0.5321, 0.5081, 0.5047, 0.5048, 0.4937, 0.5331, 0.5048, 0.5210,
        0.5418, 0.5141, 0.5219, 0.5449, 0.5297],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0.,
        0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0.,
        0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5301, 0.4776, 0.5446, 0.5011, 0.5076, 0.5354, 0.5124, 0.5267, 0.5146,
        0.5446, 0.5243, 0.5114, 0.5244, 0.5391, 0.5175, 0.5009, 0.5253, 0.4969,
        0.5095, 0.4886, 0.4

('output.squeeze()', tensor([0.4772, 0.4927, 0.4492, 0.4616, 0.5254, 0.4671, 0.4930, 0.4872, 0.5097,
        0.4277, 0.5166, 0.5385, 0.4995, 0.4479, 0.4861, 0.5817, 0.5042, 0.5639,
        0.4985, 0.5699, 0.4747, 0.5718, 0.4661, 0.4990, 0.4857, 0.5138, 0.5160,
        0.5115, 0.5150, 0.4929, 0.5369, 0.5051, 0.5040, 0.5956, 0.4797, 0.4237,
        0.5833, 0.5405, 0.5080, 0.5029, 0.4842, 0.4993, 0.4772, 0.4772, 0.4893,
        0.5390, 0.5029, 0.5356, 0.4704, 0.5031],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0., 0.,
        1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1.,
        0., 1., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4719, 0.4891, 0.4905, 0.5257, 0.4668, 0.4570, 0.4839, 0.4546, 0.5293,
        0.4870, 0.5368, 0.4798, 0.4027, 0.4950, 0.4834, 0.5460, 0.5405, 0.4919,
        0.5201, 0.4647, 0.4

('output.squeeze()', tensor([0.5424, 0.5288, 0.5094, 0.4150, 0.4391, 0.5577, 0.5210, 0.4773, 0.5109,
        0.5818, 0.4745, 0.4945, 0.4936, 0.4322, 0.5088, 0.4517, 0.5141, 0.5767,
        0.5780, 0.4753, 0.5219, 0.4668, 0.4825, 0.5172, 0.5444, 0.4008, 0.4415,
        0.4522, 0.5054, 0.5397, 0.4888, 0.4914, 0.4484, 0.4517, 0.5356, 0.4804,
        0.3818, 0.6233, 0.4870, 0.4090, 0.5509, 0.4939, 0.5681, 0.4779, 0.4404,
        0.5257, 0.4214, 0.4631, 0.5382, 0.5611],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0.,
        0., 0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.,
        0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5139, 0.6279, 0.4750, 0.4279, 0.5239, 0.5266, 0.5145, 0.5185, 0.5457,
        0.4606, 0.5206, 0.5021, 0.4540, 0.5482, 0.5075, 0.5047, 0.5232, 0.5355,
        0.3795, 0.4259, 0.4

('output.squeeze()', tensor([0.5134, 0.4257, 0.5099, 0.4560, 0.5150, 0.4856, 0.4604, 0.3677, 0.5271,
        0.3986, 0.5456, 0.5012, 0.4602, 0.5310, 0.4356, 0.5103, 0.4921, 0.5556,
        0.4284, 0.4134, 0.4308, 0.4095, 0.5221, 0.4607, 0.4938, 0.4879, 0.6114,
        0.4616, 0.4405, 0.4275, 0.4895, 0.5030, 0.5535, 0.5505, 0.5786, 0.5308,
        0.5585, 0.5186, 0.3951, 0.4028, 0.4416, 0.4833, 0.4755, 0.5771, 0.4045,
        0.5303, 0.5330, 0.5968, 0.5281, 0.5648],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 0., 0.,
        1., 1., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 1., 1., 1.,
        1., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4802, 0.5357, 0.4234, 0.5407, 0.4571, 0.5198, 0.4422, 0.5507, 0.3587,
        0.5122, 0.4977, 0.4382, 0.5645, 0.6286, 0.3976, 0.5436, 0.5809, 0.5534,
        0.5844, 0.4591, 0.3

('output.squeeze()', tensor([0.4122, 0.3788, 0.5849, 0.3724, 0.5862, 0.5955, 0.6281, 0.5448, 0.5613,
        0.5227, 0.5580, 0.5103, 0.5548, 0.5433, 0.4636, 0.4387, 0.4435, 0.6085,
        0.5678, 0.5799, 0.5731, 0.5486, 0.5468, 0.4222, 0.4898, 0.4976, 0.5255,
        0.4628, 0.5614, 0.4931, 0.5431, 0.6122, 0.5334, 0.5408, 0.5196, 0.6266,
        0.5062, 0.5351, 0.5488, 0.4713, 0.5718, 0.4530, 0.4713, 0.4952, 0.5267,
        0.5522, 0.4937, 0.5360, 0.6178, 0.4854],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0.,
        1., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0.,
        1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4825, 0.5844, 0.5896, 0.5038, 0.5207, 0.5140, 0.6011, 0.5404, 0.5353,
        0.5491, 0.5255, 0.4363, 0.5396, 0.4627, 0.4782, 0.5230, 0.5235, 0.5129,
        0.5289, 0.6069, 0.4

('output.squeeze()', tensor([0.4208, 0.5038, 0.4763, 0.5455, 0.5132, 0.5188, 0.5158, 0.4019, 0.5702,
        0.5171, 0.4394, 0.5607, 0.6455, 0.5080, 0.4722, 0.4717, 0.5717, 0.4711,
        0.5779, 0.5356, 0.6160, 0.5917, 0.4923, 0.5371, 0.4269, 0.5970, 0.5433,
        0.5766, 0.5282, 0.5422, 0.5068, 0.5688, 0.5669, 0.5115, 0.5385, 0.4960,
        0.5185, 0.5022, 0.5909, 0.5550, 0.5143, 0.5435, 0.5322, 0.5286, 0.4581,
        0.4057, 0.4019, 0.5585, 0.4889, 0.5843],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0.,
        1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0.,
        1., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3820, 0.5114, 0.5695, 0.5287, 0.4922, 0.5821, 0.4815, 0.4752, 0.4988,
        0.5493, 0.4815, 0.5813, 0.4791, 0.4684, 0.5042, 0.5619, 0.5075, 0.5333,
        0.5404, 0.5474, 0.5

('Epoch: 2/4...', 'Step: 500...', 'Loss: 0.718177...', 'Val Loss: 0.698546')
('output.squeeze()', tensor([0.4927, 0.5176, 0.5186, 0.5106, 0.4815, 0.4358, 0.5707, 0.6215, 0.5080,
        0.4018, 0.5882, 0.4826, 0.5471, 0.5597, 0.4018, 0.5096, 0.4947, 0.5894,
        0.5418, 0.6004, 0.5400, 0.4885, 0.4161, 0.5015, 0.5668, 0.4314, 0.4395,
        0.4867, 0.4849, 0.4737, 0.4927, 0.5321, 0.5953, 0.5564, 0.3814, 0.5329,
        0.4766, 0.5354, 0.4190, 0.3886, 0.4915, 0.4682, 0.4303, 0.4902, 0.4355,
        0.5055, 0.5554, 0.5262, 0.4121, 0.4756],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0., 1.,
        0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0.,
        0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5509, 0.6085, 0.3952, 0.5546, 0.4647, 0.5457, 0.4498, 0.5423, 0.5348,
        0.4578, 0.4395, 0.4634

('output.squeeze()', tensor([0.4962, 0.4609, 0.4902, 0.4915, 0.5496, 0.4313, 0.4070, 0.4800, 0.4508,
        0.4368, 0.4915, 0.5395, 0.4443, 0.4802, 0.5320, 0.4900, 0.4541, 0.4385,
        0.4840, 0.4972, 0.4446, 0.4584, 0.4201, 0.4607, 0.4635, 0.5244, 0.5107,
        0.5090, 0.3799, 0.5348, 0.5499, 0.4897, 0.5246, 0.4419, 0.6278, 0.4992,
        0.6399, 0.4924, 0.4761, 0.5184, 0.4415, 0.4561, 0.4868, 0.5806, 0.6316,
        0.4373, 0.5083, 0.4864, 0.5209, 0.4882],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.,
        1., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0.,
        1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5229, 0.5281, 0.4609, 0.4631, 0.4195, 0.5149, 0.4205, 0.4113, 0.5223,
        0.5026, 0.4382, 0.5387, 0.5016, 0.4703, 0.4793, 0.5157, 0.5010, 0.4600,
        0.5106, 0.4637, 0.5

('output.squeeze()', tensor([0.5267, 0.5300, 0.5301, 0.4506, 0.4667, 0.6202, 0.5546, 0.4705, 0.5046,
        0.4686, 0.4562, 0.6116, 0.5405, 0.6065, 0.4816, 0.6257, 0.5349, 0.4988,
        0.3618, 0.5247, 0.5976, 0.4603, 0.6243, 0.3688, 0.5449, 0.4661, 0.5458,
        0.6673, 0.5005, 0.5239, 0.4593, 0.5610, 0.5317, 0.5150, 0.5386, 0.3982,
        0.5228, 0.4000, 0.4187, 0.5364, 0.4793, 0.5342, 0.5355, 0.4809, 0.4795,
        0.5234, 0.5357, 0.5347, 0.5746, 0.6153],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0.,
        0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0., 0., 1., 1.,
        1., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5075, 0.4562, 0.5790, 0.3986, 0.5732, 0.5266, 0.5679, 0.4599, 0.4951,
        0.5821, 0.5575, 0.4864, 0.4152, 0.5358, 0.5156, 0.5553, 0.3486, 0.5305,
        0.4891, 0.5223, 0.5

('output.squeeze()', tensor([0.4797, 0.6123, 0.5375, 0.4860, 0.4905, 0.5462, 0.3818, 0.4981, 0.4617,
        0.4586, 0.6342, 0.3308, 0.5073, 0.5005, 0.4758, 0.3912, 0.5729, 0.5590,
        0.5048, 0.4461, 0.2883, 0.5447, 0.3731, 0.4591, 0.4539, 0.5338, 0.4742,
        0.6015, 0.5112, 0.5426, 0.6829, 0.6063, 0.4714, 0.4793, 0.4282, 0.4496,
        0.5375, 0.5285, 0.5628, 0.4466, 0.4269, 0.4555, 0.4870, 0.4497, 0.4412,
        0.5696, 0.4822, 0.5071, 0.5020, 0.3749],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1.,
        1., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0.,
        1., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3462, 0.5501, 0.4929, 0.5455, 0.5185, 0.5560, 0.5093, 0.4983, 0.4316,
        0.5662, 0.4231, 0.5461, 0.5247, 0.5410, 0.4055, 0.6170, 0.4883, 0.4959,
        0.5045, 0.4348, 0.5

('output.squeeze()', tensor([0.4931, 0.4715, 0.5730, 0.4614, 0.4826, 0.4936, 0.4885, 0.4555, 0.4016,
        0.5824, 0.4208, 0.5953, 0.4958, 0.2996, 0.4746, 0.5041, 0.5987, 0.4926,
        0.5063, 0.6356, 0.3583, 0.4390, 0.3750, 0.6033, 0.4824, 0.3496, 0.5052,
        0.4228, 0.4537, 0.4216, 0.5379, 0.4941, 0.5363, 0.4808, 0.4424, 0.4468,
        0.5241, 0.4428, 0.6474, 0.4604, 0.3053, 0.6104, 0.4852, 0.5378, 0.4734,
        0.5149, 0.4916, 0.6262, 0.5293, 0.6654],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 1., 1.,
        0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0.,
        1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5658, 0.4357, 0.4338, 0.4790, 0.2958, 0.4901, 0.2933, 0.6940, 0.5847,
        0.3965, 0.5906, 0.5361, 0.6091, 0.5538, 0.6645, 0.5080, 0.4662, 0.2475,
        0.4333, 0.2530, 0.6

('output.squeeze()', tensor([0.4660, 0.3862, 0.5316, 0.4708, 0.5143, 0.3916, 0.5197, 0.5067, 0.3668,
        0.4581, 0.4687, 0.4850, 0.5407, 0.5926, 0.5082, 0.4393, 0.5395, 0.3935,
        0.4181, 0.5070, 0.5116, 0.5641, 0.4940, 0.5531, 0.3702, 0.3423, 0.4981,
        0.6497, 0.5267, 0.5510, 0.6249, 0.4637, 0.3611, 0.4811, 0.4881, 0.5474,
        0.5534, 0.6147, 0.4985, 0.4543, 0.4313, 0.6353, 0.3750, 0.4302, 0.5202,
        0.4184, 0.5413, 0.5453, 0.3631, 0.5205],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0.,
        0., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 1.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5186, 0.4969, 0.4509, 0.5770, 0.4376, 0.4017, 0.7066, 0.4517, 0.5828,
        0.5342, 0.4525, 0.4578, 0.5219, 0.5252, 0.4238, 0.4578, 0.4388, 0.5782,
        0.3391, 0.3501, 0.6

('output.squeeze()', tensor([0.5778, 0.4671, 0.3967, 0.4765, 0.4017, 0.3587, 0.5045, 0.6429, 0.4672,
        0.4791, 0.5282, 0.4449, 0.4337, 0.5506, 0.5835, 0.4725, 0.5168, 0.5266,
        0.4103, 0.4688, 0.4612, 0.5465, 0.4427, 0.5008, 0.4481, 0.4973, 0.5588,
        0.5832, 0.4716, 0.5311, 0.4624, 0.5068, 0.5963, 0.5074, 0.5398, 0.4344,
        0.4164, 0.5144, 0.4574, 0.5463, 0.4901, 0.6165, 0.3473, 0.5068, 0.5409,
        0.5340, 0.5801, 0.6694, 0.4470, 0.3634],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0.,
        0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 1.,
        0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5009, 0.5307, 0.5292, 0.5090, 0.5525, 0.5255, 0.5266, 0.4464, 0.4599,
        0.4698, 0.5079, 0.4460, 0.4440, 0.4896, 0.4611, 0.4116, 0.5691, 0.6555,
        0.6578, 0.5319, 0.4

('Epoch: 2/4...', 'Step: 600...', 'Loss: 0.708087...', 'Val Loss: 0.701554')
('output.squeeze()', tensor([0.5178, 0.4569, 0.4770, 0.6032, 0.4505, 0.5116, 0.4760, 0.4766, 0.4763,
        0.5190, 0.5321, 0.5431, 0.3950, 0.4658, 0.5857, 0.4131, 0.4641, 0.4514,
        0.4852, 0.4152, 0.5273, 0.4728, 0.4960, 0.4256, 0.4885, 0.4489, 0.5088,
        0.6030, 0.4861, 0.4156, 0.4766, 0.4868, 0.5058, 0.4394, 0.5053, 0.4219,
        0.5210, 0.5623, 0.4648, 0.4638, 0.4367, 0.4026, 0.4974, 0.4594, 0.5213,
        0.3660, 0.5255, 0.4853, 0.5393, 0.5496],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
        0., 1., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 0.,
        1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4838, 0.5070, 0.4445, 0.4643, 0.4634, 0.4966, 0.4740, 0.4240, 0.4913,
        0.5679, 0.4924, 0.5192

('output.squeeze()', tensor([0.4529, 0.5207, 0.4528, 0.4894, 0.5532, 0.4440, 0.5679, 0.4069, 0.4838,
        0.3815, 0.6265, 0.5085, 0.5077, 0.4286, 0.4718, 0.4473, 0.4878, 0.5506,
        0.5274, 0.5320, 0.5560, 0.5242, 0.5652, 0.4555, 0.5140, 0.4994, 0.3384,
        0.4321, 0.4921, 0.5099, 0.4375, 0.4491, 0.5609, 0.4342, 0.4806, 0.3457,
        0.5962, 0.5336, 0.4931, 0.4968, 0.4492, 0.5076, 0.3830, 0.4894, 0.4674,
        0.5211, 0.4460, 0.4364, 0.4391, 0.5010],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0.,
        1., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1.,
        1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5456, 0.4317, 0.4721, 0.4756, 0.4951, 0.5259, 0.4812, 0.5369, 0.5116,
        0.4164, 0.5412, 0.4534, 0.5401, 0.4702, 0.5806, 0.5415, 0.4344, 0.4831,
        0.5309, 0.5313, 0.5

('output.squeeze()', tensor([0.5744, 0.4440, 0.5132, 0.4380, 0.5970, 0.4852, 0.5034, 0.4793, 0.4680,
        0.5029, 0.4759, 0.4606, 0.4936, 0.5059, 0.4240, 0.5237, 0.5999, 0.5117,
        0.5145, 0.5312, 0.4764, 0.5212, 0.4821, 0.4272, 0.5105, 0.4933, 0.4881,
        0.5295, 0.5269, 0.5341, 0.5005, 0.4705, 0.5637, 0.4844, 0.4442, 0.4686,
        0.5256, 0.4916, 0.4902, 0.5826, 0.5240, 0.5285, 0.5065, 0.4883, 0.5343,
        0.5053, 0.4787, 0.5084, 0.4832, 0.4876],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 0.,
        0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 0.,
        0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4085, 0.5701, 0.4660, 0.5824, 0.3203, 0.4507, 0.4550, 0.4881, 0.4781,
        0.5726, 0.4553, 0.5191, 0.4967, 0.5199, 0.4394, 0.4829, 0.4756, 0.5053,
        0.5217, 0.4613, 0.4

('output.squeeze()', tensor([0.5488, 0.5353, 0.5059, 0.5272, 0.4983, 0.5322, 0.5586, 0.5310, 0.5288,
        0.5693, 0.5264, 0.5000, 0.5639, 0.5717, 0.4639, 0.4755, 0.5387, 0.5328,
        0.5656, 0.5765, 0.4921, 0.5007, 0.4728, 0.4931, 0.5104, 0.5171, 0.5900,
        0.5128, 0.5149, 0.4511, 0.4816, 0.5180, 0.4572, 0.5355, 0.5303, 0.6059,
        0.5861, 0.5046, 0.4602, 0.4867, 0.4836, 0.5487, 0.4934, 0.4738, 0.5426,
        0.5068, 0.4998, 0.5789, 0.4841, 0.5384],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 1.,
        0., 1., 0., 0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 1., 1.,
        1., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5233, 0.5063, 0.4732, 0.4744, 0.5397, 0.5445, 0.5259, 0.4796, 0.4662,
        0.4759, 0.4606, 0.4940, 0.5812, 0.4269, 0.4661, 0.4818, 0.5329, 0.5523,
        0.5021, 0.6415, 0.4

('output.squeeze()', tensor([0.5042, 0.5555, 0.5334, 0.5163, 0.5232, 0.5166, 0.5462, 0.5144, 0.5053,
        0.5425, 0.5380, 0.5237, 0.4651, 0.5541, 0.5790, 0.5775, 0.5410, 0.4069,
        0.5210, 0.5443, 0.5073, 0.4970, 0.5298, 0.4713, 0.3996, 0.5916, 0.4668,
        0.5407, 0.5181, 0.5880, 0.5029, 0.5731, 0.6171, 0.6134, 0.4812, 0.5603,
        0.5058, 0.5002, 0.5358, 0.4957, 0.4542, 0.5891, 0.5589, 0.5078, 0.5229,
        0.5565, 0.4872, 0.5610, 0.6152, 0.5298],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0.,
        1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 1., 0.,
        0., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4952, 0.5151, 0.5151, 0.5840, 0.5998, 0.5207, 0.5924, 0.4883, 0.4703,
        0.5315, 0.5687, 0.5320, 0.5260, 0.4884, 0.5584, 0.5508, 0.5264, 0.5093,
        0.5416, 0.4452, 0.6

('output.squeeze()', tensor([0.4702, 0.5180, 0.5340, 0.6337, 0.5336, 0.4758, 0.5831, 0.5299, 0.5172,
        0.5637, 0.4773, 0.4933, 0.4959, 0.4820, 0.5154, 0.3739, 0.4623, 0.5091,
        0.5485, 0.4861, 0.4924, 0.5455, 0.5098, 0.4776, 0.4751, 0.4918, 0.4989,
        0.4695, 0.4648, 0.5238, 0.4751, 0.4741, 0.5333, 0.5983, 0.6349, 0.5300,
        0.4839, 0.5003, 0.5138, 0.5061, 0.4960, 0.5338, 0.5140, 0.4683, 0.5192,
        0.4790, 0.5048, 0.5459, 0.5300, 0.5848],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 1.,
        0., 0., 0., 0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 1.,
        0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4294, 0.4890, 0.5755, 0.5484, 0.5087, 0.4486, 0.4582, 0.4775, 0.5415,
        0.4044, 0.5060, 0.4939, 0.5001, 0.4392, 0.4508, 0.3588, 0.4904, 0.4913,
        0.4498, 0.4539, 0.4

('output.squeeze()', tensor([0.5003, 0.5001, 0.4356, 0.4995, 0.5117, 0.5046, 0.5319, 0.4226, 0.5226,
        0.5125, 0.4512, 0.5367, 0.4652, 0.5034, 0.4396, 0.4650, 0.4199, 0.5345,
        0.4290, 0.4740, 0.5194, 0.4565, 0.4941, 0.4795, 0.4201, 0.4064, 0.4961,
        0.5554, 0.5463, 0.4688, 0.4406, 0.3873, 0.4462, 0.4496, 0.5084, 0.4998,
        0.4373, 0.2937, 0.4196, 0.3793, 0.4905, 0.4421, 0.5131, 0.4074, 0.5435,
        0.4508, 0.4452, 0.4655, 0.4601, 0.4804],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0.,
        1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0.,
        1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4845, 0.4666, 0.4539, 0.3885, 0.5376, 0.4550, 0.4948, 0.5008, 0.4939,
        0.4749, 0.4649, 0.4893, 0.4923, 0.5072, 0.4531, 0.4751, 0.3680, 0.3598,
        0.4744, 0.5286, 0.2

('output.squeeze()', tensor([0.4215, 0.6350, 0.5308, 0.4285, 0.4989, 0.4282, 0.4691, 0.4271, 0.5754,
        0.4670, 0.4754, 0.4562, 0.5043, 0.4441, 0.4806, 0.4657, 0.4365, 0.5285,
        0.4866, 0.4430, 0.5281, 0.4591, 0.4675, 0.5132, 0.5154, 0.4905, 0.4798,
        0.4969, 0.4363, 0.4145, 0.5197, 0.4507, 0.5840, 0.4692, 0.4124, 0.5490,
        0.4965, 0.3000, 0.4149, 0.4596, 0.4906, 0.3789, 0.4886, 0.4724, 0.4814,
        0.4768, 0.4732, 0.4768, 0.4626, 0.4167],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1.,
        0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 0., 0.,
        1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4308, 0.5138, 0.4961, 0.4773, 0.5043, 0.4739, 0.5064, 0.4951, 0.4753,
        0.4726, 0.4864, 0.5148, 0.5086, 0.4795, 0.4930, 0.5252, 0.4441, 0.4676,
        0.4448, 0.4705, 0.4

('output.squeeze()', tensor([0.4608, 0.4445, 0.5137, 0.4736, 0.5010, 0.4865, 0.5129, 0.4742, 0.4899,
        0.4724, 0.5685, 0.4301, 0.5051, 0.4503, 0.4575, 0.5508, 0.4541, 0.5821,
        0.4610, 0.5100, 0.5200, 0.4939, 0.4578, 0.4994, 0.5585, 0.4067, 0.5169,
        0.4613, 0.5127, 0.5100, 0.4808, 0.3797, 0.4551, 0.5075, 0.5070, 0.5578,
        0.5154, 0.5097, 0.5856, 0.5211, 0.5018, 0.4528, 0.4912, 0.5025, 0.4699,
        0.5349, 0.4278, 0.4978, 0.4825, 0.5119],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
        1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0.,
        0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5401, 0.5183, 0.4096, 0.4029, 0.4941, 0.4594, 0.5260, 0.5141, 0.5997,
        0.4843, 0.5176, 0.5090, 0.5107, 0.4493, 0.4767, 0.4881, 0.4836, 0.5131,
        0.5014, 0.5263, 0.5

('output.squeeze()', tensor([0.5672, 0.4403, 0.5287, 0.5337, 0.5448, 0.5171, 0.4391, 0.5235, 0.5143,
        0.4162, 0.5432, 0.5082, 0.5018, 0.5598, 0.5050, 0.4389, 0.5143, 0.4449,
        0.4746, 0.4658, 0.5442, 0.5528, 0.5395, 0.4924, 0.4505, 0.5372, 0.5200,
        0.5546, 0.5027, 0.5472, 0.5145, 0.5622, 0.5377, 0.5186, 0.4849, 0.5429,
        0.5147, 0.4225, 0.5623, 0.5655, 0.5236, 0.5094, 0.4430, 0.4821, 0.5304,
        0.3310, 0.4471, 0.4922, 0.5125, 0.5111],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 1., 1., 0.,
        1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 1., 1.,
        1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5168, 0.4284, 0.5128, 0.4837, 0.5010, 0.5123, 0.4632, 0.5202, 0.5617,
        0.5052, 0.4783, 0.5287, 0.5291, 0.5178, 0.5550, 0.5109, 0.4969, 0.4615,
        0.5358, 0.4201, 0.6

('output.squeeze()', tensor([0.5988, 0.4912, 0.5170, 0.3894, 0.3961, 0.5691, 0.5105, 0.4485, 0.4724,
        0.5671, 0.5040, 0.6004, 0.6127, 0.4033, 0.5161, 0.5129, 0.4706, 0.5005,
        0.5449, 0.3630, 0.5580, 0.4377, 0.5533, 0.5169, 0.5132, 0.5265, 0.5291,
        0.5175, 0.5017, 0.4916, 0.5956, 0.5197, 0.5748, 0.4828, 0.5642, 0.4964,
        0.5061, 0.5023, 0.5589, 0.5697, 0.4840, 0.5396, 0.5053, 0.5267, 0.4854,
        0.5092, 0.4631, 0.4437, 0.5291, 0.5759],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1.,
        0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
        0., 0., 0., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5827, 0.4755, 0.6070, 0.2916, 0.5938, 0.4599, 0.5788, 0.5017, 0.5015,
        0.4942, 0.4078, 0.4401, 0.4705, 0.5403, 0.4823, 0.4806, 0.5467, 0.5333,
        0.4939, 0.4405, 0.5

('output.squeeze()', tensor([0.6216, 0.5043, 0.5021, 0.6299, 0.4902, 0.4852, 0.4435, 0.5729, 0.4858,
        0.4351, 0.5866, 0.5060, 0.4833, 0.5214, 0.5011, 0.5783, 0.5201, 0.4641,
        0.5642, 0.4850, 0.5315, 0.6478, 0.5149, 0.5053, 0.4929, 0.5517, 0.5259,
        0.5211, 0.4875, 0.3989, 0.4256, 0.5698, 0.5754, 0.4694, 0.4706, 0.4162,
        0.4708, 0.5902, 0.2683, 0.5200, 0.5877, 0.4952, 0.5444, 0.4758, 0.5703,
        0.4870, 0.4862, 0.4708, 0.5397, 0.4584],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0.,
        0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0.,
        1., 1., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5498, 0.5364, 0.4158, 0.5424, 0.4928, 0.5863, 0.4987, 0.3988, 0.5756,
        0.4463, 0.4400, 0.5183, 0.4797, 0.4610, 0.5656, 0.4677, 0.5246, 0.5804,
        0.5043, 0.5061, 0.4

('output.squeeze()', tensor([0.5636, 0.4994, 0.4326, 0.5095, 0.5587, 0.4659, 0.5049, 0.4973, 0.4625,
        0.5018, 0.4834, 0.5282, 0.6295, 0.5521, 0.6097, 0.4538, 0.4676, 0.4924,
        0.4912, 0.4973, 0.5236, 0.5432, 0.3470, 0.5489, 0.6022, 0.4316, 0.4949,
        0.4932, 0.4025, 0.4420, 0.4784, 0.5743, 0.4360, 0.5409, 0.4359, 0.5446,
        0.5423, 0.4603, 0.6159, 0.4736, 0.5346, 0.4784, 0.4273, 0.5112, 0.3989,
        0.4838, 0.5232, 0.5336, 0.6512, 0.4728],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1.,
        1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1., 1.,
        1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4189, 0.4946, 0.4954, 0.4429, 0.4516, 0.5115, 0.5486, 0.5850, 0.5742,
        0.5745, 0.5130, 0.5331, 0.5264, 0.5438, 0.5756, 0.4886, 0.4730, 0.4886,
        0.4344, 0.4742, 0.5

('output.squeeze()', tensor([0.4770, 0.4405, 0.5136, 0.5708, 0.5168, 0.5368, 0.5362, 0.5566, 0.5522,
        0.4724, 0.6673, 0.5254, 0.5138, 0.4746, 0.5574, 0.5042, 0.5104, 0.5216,
        0.5706, 0.4827, 0.4758, 0.4792, 0.4796, 0.4503, 0.4901, 0.6085, 0.4852,
        0.5304, 0.5019, 0.4666, 0.5191, 0.5199, 0.4923, 0.5857, 0.4754, 0.3768,
        0.4702, 0.5071, 0.4952, 0.5660, 0.4567, 0.4978, 0.4732, 0.5722, 0.5106,
        0.4901, 0.4760, 0.5167, 0.4781, 0.5287],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1., 0.,
        0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 0., 1.,
        0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5888, 0.4972, 0.5178, 0.5172, 0.4656, 0.5145, 0.5302, 0.5185, 0.5768,
        0.5484, 0.5123, 0.5369, 0.5639, 0.5155, 0.4315, 0.4640, 0.5246, 0.4940,
        0.5311, 0.4563, 0.5

('Epoch: 2/4...', 'Step: 800...', 'Loss: 0.698850...', 'Val Loss: 0.697738')
('output.squeeze()', tensor([0.5473, 0.5155, 0.5411, 0.5095, 0.5110, 0.5029, 0.4723, 0.4626, 0.4978,
        0.4793, 0.4655, 0.4949, 0.4857, 0.5087, 0.5288, 0.5320, 0.5014, 0.5391,
        0.5206, 0.5381, 0.4653, 0.5366, 0.4472, 0.4766, 0.4943, 0.5218, 0.4404,
        0.5087, 0.5235, 0.4528, 0.5377, 0.4656, 0.5230, 0.4444, 0.4448, 0.5355,
        0.4600, 0.5255, 0.4618, 0.5256, 0.4742, 0.4920, 0.5019, 0.4518, 0.4844,
        0.4937, 0.4731, 0.4701, 0.5327, 0.5180],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0.,
        0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1.,
        0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4583, 0.4993, 0.5101, 0.4765, 0.4866, 0.5591, 0.5654, 0.5127, 0.5144,
        0.5099, 0.6008, 0.5089

('output.squeeze()', tensor([0.5166, 0.5448, 0.5490, 0.5413, 0.4446, 0.5166, 0.5317, 0.4318, 0.4616,
        0.4632, 0.4832, 0.4528, 0.5076, 0.4335, 0.5557, 0.3985, 0.4422, 0.5184,
        0.4914, 0.5066, 0.5061, 0.4533, 0.5677, 0.4559, 0.5171, 0.4423, 0.4719,
        0.4590, 0.3538, 0.4722, 0.5065, 0.5020, 0.5499, 0.3810, 0.4930, 0.5005,
        0.5385, 0.3985, 0.5330, 0.4835, 0.5465, 0.5013, 0.5054, 0.4679, 0.5318,
        0.4655, 0.4778, 0.4702, 0.5575, 0.5304],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0.,
        0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1.,
        1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4638, 0.4670, 0.5503, 0.5712, 0.4999, 0.4417, 0.4314, 0.4795, 0.5302,
        0.5404, 0.4761, 0.4466, 0.4333, 0.4855, 0.5013, 0.4122, 0.4775, 0.4477,
        0.4670, 0.5101, 0.5

('output.squeeze()', tensor([0.5921, 0.4552, 0.3699, 0.4408, 0.5902, 0.5023, 0.5631, 0.5930, 0.5043,
        0.6754, 0.4883, 0.4917, 0.4389, 0.5689, 0.5300, 0.4018, 0.5443, 0.4703,
        0.6581, 0.5034, 0.5507, 0.3962, 0.4503, 0.5271, 0.4932, 0.4722, 0.5938,
        0.5175, 0.3650, 0.4798, 0.4613, 0.5369, 0.4701, 0.5485, 0.4055, 0.4514,
        0.5518, 0.6824, 0.5270, 0.4680, 0.6081, 0.6620, 0.5266, 0.4751, 0.3796,
        0.4529, 0.4766, 0.4506, 0.6547, 0.4737],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0.,
        1., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0.,
        0., 1., 1., 1., 0., 1., 1., 0., 0., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4269, 0.3152, 0.5380, 0.5807, 0.4101, 0.5298, 0.3260, 0.3278, 0.5386,
        0.4746, 0.3741, 0.5711, 0.4945, 0.4653, 0.4350, 0.5144, 0.4052, 0.4001,
        0.5312, 0.4775, 0.6

('output.squeeze()', tensor([0.4512, 0.3316, 0.4752, 0.7315, 0.4590, 0.5139, 0.6374, 0.5715, 0.3873,
        0.7724, 0.6815, 0.5194, 0.4122, 0.5930, 0.3439, 0.3154, 0.6445, 0.4784,
        0.5569, 0.6775, 0.5091, 0.2560, 0.4821, 0.5487, 0.3278, 0.7633, 0.2435,
        0.4692, 0.3877, 0.6838, 0.6898, 0.4908, 0.3014, 0.7978, 0.4758, 0.3868,
        0.6711, 0.4492, 0.7447, 0.5088, 0.3181, 0.6325, 0.5102, 0.4310, 0.4056,
        0.5957, 0.3866, 0.4765, 0.4783, 0.3258],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0.,
        0., 1., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0.,
        0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5854, 0.5570, 0.7231, 0.6303, 0.4744, 0.3697, 0.2922, 0.5331, 0.4710,
        0.3862, 0.3879, 0.7146, 0.4285, 0.5847, 0.6656, 0.4105, 0.4371, 0.5709,
        0.2492, 0.5469, 0.5

('output.squeeze()', tensor([0.4895, 0.5041, 0.7234, 0.4239, 0.5606, 0.3288, 0.7414, 0.5435, 0.4249,
        0.4984, 0.5838, 0.4123, 0.7527, 0.8122, 0.7140, 0.7453, 0.3103, 0.5254,
        0.5166, 0.5988, 0.7168, 0.4002, 0.7594, 0.4361, 0.6392, 0.6600, 0.7244,
        0.6208, 0.3238, 0.5834, 0.7431, 0.2945, 0.7055, 0.2137, 0.5328, 0.5885,
        0.3333, 0.3774, 0.6474, 0.4315, 0.3911, 0.5740, 0.5198, 0.7536, 0.3489,
        0.5601, 0.3761, 0.5648, 0.5352, 0.5144],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0.,
        0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 1.,
        0., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.6898, 0.8456, 0.4042, 0.5423, 0.3686, 0.5011, 0.5760, 0.8038, 0.3987,
        0.5006, 0.1293, 0.5984, 0.5423, 0.5734, 0.5716, 0.7365, 0.5744, 0.4950,
        0.6138, 0.7126, 0.5

('output.squeeze()', tensor([0.4960, 0.5460, 0.3893, 0.5195, 0.5097, 0.4292, 0.4363, 0.4995, 0.5328,
        0.4279, 0.5320, 0.4216, 0.5196, 0.2772, 0.4560, 0.4202, 0.5563, 0.5810,
        0.5059, 0.5869, 0.4487, 0.6439, 0.5394, 0.4527, 0.5254, 0.5410, 0.5556,
        0.6378, 0.5339, 0.5547, 0.5620, 0.3897, 0.5146, 0.4823, 0.6137, 0.4886,
        0.4254, 0.4530, 0.4249, 0.4471, 0.6537, 0.5543, 0.3259, 0.4673, 0.5714,
        0.3652, 0.6135, 0.3799, 0.7156, 0.4336],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1.,
        0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1.,
        0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4716, 0.5276, 0.5683, 0.3854, 0.5094, 0.3804, 0.5282, 0.3915, 0.6081,
        0.4558, 0.6244, 0.5466, 0.4982, 0.3869, 0.5736, 0.6591, 0.7308, 0.5119,
        0.6062, 0.5931, 0.5

('output.squeeze()', tensor([0.3425, 0.5023, 0.4721, 0.6326, 0.4850, 0.3765, 0.4586, 0.4015, 0.7293,
        0.6355, 0.3375, 0.5808, 0.5811, 0.6376, 0.4906, 0.5876, 0.4431, 0.4146,
        0.5992, 0.4804, 0.5182, 0.5072, 0.3884, 0.4701, 0.5152, 0.5614, 0.4762,
        0.5549, 0.4502, 0.6143, 0.4366, 0.4357, 0.4593, 0.4641, 0.6968, 0.6392,
        0.3857, 0.5164, 0.4564, 0.3688, 0.5249, 0.7252, 0.5343, 0.4197, 0.5702,
        0.5574, 0.5373, 0.4211, 0.5595, 0.5779],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0.,
        1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0.,
        1., 0., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5879, 0.5530, 0.5377, 0.5597, 0.3688, 0.4529, 0.5033, 0.7199, 0.6411,
        0.5530, 0.3451, 0.3300, 0.6232, 0.5954, 0.4900, 0.5677, 0.4008, 0.5011,
        0.5225, 0.5182, 0.3

('output.squeeze()', tensor([0.6990, 0.4718, 0.4667, 0.4681, 0.4293, 0.6259, 0.3788, 0.5144, 0.5261,
        0.4944, 0.5699, 0.5072, 0.4704, 0.6871, 0.5207, 0.5697, 0.4800, 0.5481,
        0.4528, 0.4910, 0.4341, 0.4946, 0.4527, 0.6114, 0.7096, 0.3670, 0.5288,
        0.4184, 0.5647, 0.4497, 0.5377, 0.3811, 0.5997, 0.3933, 0.5235, 0.5198,
        0.5081, 0.5284, 0.4211, 0.5369, 0.5090, 0.4443, 0.5165, 0.5956, 0.5352,
        0.5579, 0.6145, 0.4931, 0.4283, 0.5295],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1.,
        0., 0., 1., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 0.,
        0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4133, 0.6824, 0.4594, 0.5155, 0.7818, 0.4725, 0.4252, 0.4671, 0.4696,
        0.5058, 0.5455, 0.5067, 0.7837, 0.3892, 0.6562, 0.5388, 0.5673, 0.4794,
        0.4119, 0.3585, 0.4

('output.squeeze()', tensor([0.5212, 0.4590, 0.4193, 0.4134, 0.6890, 0.3425, 0.3565, 0.6053, 0.4750,
        0.5083, 0.5598, 0.4212, 0.3744, 0.5630, 0.4984, 0.4042, 0.4330, 0.4420,
        0.4453, 0.4100, 0.5245, 0.4530, 0.3425, 0.7039, 0.6258, 0.5425, 0.4505,
        0.5450, 0.4211, 0.4843, 0.4725, 0.6087, 0.4901, 0.5164, 0.4834, 0.6136,
        0.4372, 0.6653, 0.5305, 0.4224, 0.7069, 0.6315, 0.3683, 0.4973, 0.4407,
        0.3644, 0.6261, 0.7165, 0.5598, 0.5020],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0.,
        0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0., 1., 0., 0., 1.,
        1., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4555, 0.6082, 0.4682, 0.4671, 0.5309, 0.5064, 0.5847, 0.4232, 0.4510,
        0.5989, 0.3593, 0.6334, 0.3750, 0.5243, 0.5752, 0.6539, 0.6718, 0.4469,
        0.7252, 0.4526, 0.6

('output.squeeze()', tensor([0.3569, 0.4387, 0.6533, 0.5877, 0.4691, 0.3821, 0.5229, 0.5099, 0.4284,
        0.5325, 0.6230, 0.4572, 0.4769, 0.4418, 0.5147, 0.5451, 0.5210, 0.4586,
        0.4649, 0.5128, 0.6489, 0.3980, 0.3482, 0.5465, 0.6708, 0.4628, 0.4318,
        0.4253, 0.5460, 0.6951, 0.4165, 0.5306, 0.5424, 0.5096, 0.5006, 0.5138,
        0.5582, 0.5040, 0.5961, 0.5941, 0.4861, 0.5488, 0.4514, 0.3547, 0.6102,
        0.6302, 0.3473, 0.4291, 0.3442, 0.4402],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0.,
        1., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0.,
        1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.6111, 0.4972, 0.4397, 0.5439, 0.3974, 0.5101, 0.4345, 0.4901, 0.4866,
        0.4925, 0.5957, 0.5326, 0.5277, 0.5881, 0.4453, 0.4938, 0.4340, 0.2778,
        0.3685, 0.4494, 0.5

('output.squeeze()', tensor([0.4908, 0.5897, 0.5943, 0.4368, 0.4335, 0.5897, 0.3287, 0.3614, 0.4567,
        0.4919, 0.5409, 0.4807, 0.5938, 0.5258, 0.4358, 0.3318, 0.5433, 0.3879,
        0.5449, 0.7453, 0.4973, 0.4964, 0.5193, 0.4611, 0.4643, 0.4663, 0.5747,
        0.5640, 0.3532, 0.4118, 0.5459, 0.3913, 0.4225, 0.4229, 0.4710, 0.5513,
        0.4696, 0.4992, 0.7241, 0.5719, 0.4670, 0.6123, 0.5542, 0.4475, 0.3738,
        0.4441, 0.5114, 0.6566, 0.4964, 0.4471],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 0.,
        1., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1.,
        1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4244, 0.4411, 0.5679, 0.5783, 0.6035, 0.6029, 0.5807, 0.4864, 0.3570,
        0.5557, 0.4602, 0.4350, 0.5112, 0.6127, 0.3020, 0.4587, 0.6625, 0.4943,
        0.5468, 0.4984, 0.4

('output.squeeze()', tensor([0.5923, 0.5117, 0.6018, 0.6104, 0.4372, 0.6511, 0.5318, 0.5383, 0.4734,
        0.5587, 0.5005, 0.5280, 0.5696, 0.4022, 0.4680, 0.5802, 0.6043, 0.4387,
        0.6052, 0.5280, 0.6212, 0.5725, 0.5229, 0.5937, 0.5554, 0.4634, 0.5427,
        0.4489, 0.6877, 0.5802, 0.5380, 0.6311, 0.5381, 0.3864, 0.6455, 0.3331,
        0.5554, 0.5660, 0.4555, 0.4599, 0.6556, 0.5732, 0.5307, 0.4817, 0.2772,
        0.6070, 0.5616, 0.6184, 0.5266, 0.6287],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1.,
        1., 1., 0., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5948, 0.4469, 0.5919, 0.4961, 0.6434, 0.5544, 0.3831, 0.4949, 0.6533,
        0.5963, 0.6059, 0.6397, 0.4676, 0.4910, 0.4790, 0.5594, 0.3879, 0.3853,
        0.4989, 0.5583, 0.4

('output.squeeze()', tensor([0.5415, 0.3990, 0.5847, 0.5564, 0.5192, 0.5714, 0.4595, 0.5517, 0.5484,
        0.6027, 0.5145, 0.1942, 0.6862, 0.5746, 0.5285, 0.4487, 0.4989, 0.5011,
        0.5782, 0.4936, 0.4488, 0.5007, 0.4850, 0.4112, 0.5125, 0.5540, 0.6365,
        0.5286, 0.5535, 0.4081, 0.5021, 0.7797, 0.3908, 0.3364, 0.5161, 0.4319,
        0.5368, 0.5772, 0.5867, 0.5897, 0.6362, 0.6020, 0.5136, 0.3937, 0.4264,
        0.3934, 0.4212, 0.5487, 0.6472, 0.5594],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 0., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1.,
        1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 1., 1., 0., 1., 0., 1.,
        0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3821, 0.3828, 0.5559, 0.4594, 0.6012, 0.5155, 0.4462, 0.4284, 0.5909,
        0.4612, 0.4092, 0.5656, 0.5910, 0.5673, 0.4684, 0.3998, 0.5542, 0.5256,
        0.4265, 0.3169, 0.4

('output.squeeze()', tensor([0.3684, 0.5080, 0.5178, 0.4816, 0.3338, 0.3480, 0.7003, 0.4039, 0.5703,
        0.5244, 0.5048, 0.4862, 0.4765, 0.7393, 0.6271, 0.4303, 0.5027, 0.5170,
        0.3631, 0.5015, 0.4359, 0.4019, 0.6024, 0.4854, 0.4889, 0.5534, 0.4327,
        0.5137, 0.6269, 0.3371, 0.5746, 0.5834, 0.4900, 0.5864, 0.5230, 0.2922,
        0.3057, 0.4445, 0.6604, 0.4752, 0.4607, 0.4659, 0.4172, 0.3851, 0.5302,
        0.6872, 0.5181, 0.4536, 0.6052, 0.4460],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1.,
        0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 1.,
        0., 0., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4705, 0.6001, 0.5783, 0.5372, 0.5931, 0.4788, 0.3034, 0.3808, 0.6188,
        0.5023, 0.4414, 0.4700, 0.6086, 0.6346, 0.5962, 0.4509, 0.6220, 0.4339,
        0.4972, 0.3534, 0.4

('output.squeeze()', tensor([0.4055, 0.3568, 0.4791, 0.2498, 0.6180, 0.3586, 0.4742, 0.6213, 0.5744,
        0.5815, 0.2090, 0.5989, 0.2691, 0.4215, 0.3075, 0.5086, 0.1912, 0.4985,
        0.5190, 0.5026, 0.5997, 0.5057, 0.4236, 0.3200, 0.5804, 0.3705, 0.5362,
        0.1496, 0.4265, 0.4118, 0.5478, 0.3949, 0.2986, 0.4318, 0.4203, 0.7126,
        0.5465, 0.4564, 0.6092, 0.4250, 0.3564, 0.4695, 0.5188, 0.4335, 0.4385,
        0.4910, 0.4463, 0.3680, 0.5111, 0.3869],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 1.,
        1., 0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 1.,
        0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('Epoch: 3/4...', 'Step: 1000...', 'Loss: 0.648163...', 'Val Loss: 0.717469')
('output.squeeze()', tensor([0.4033, 0.6311, 0.3678, 0.2948, 0.3980, 0.4025, 0.6240, 0.6692, 0.3005,
        0.6336, 0.4523, 0.512

('output.squeeze()', tensor([0.5393, 0.3754, 0.4505, 0.5757, 0.5322, 0.5936, 0.4402, 0.4070, 0.2301,
        0.2291, 0.4849, 0.4392, 0.3608, 0.2814, 0.5603, 0.4109, 0.4788, 0.2944,
        0.4336, 0.4687, 0.4028, 0.4099, 0.4277, 0.5364, 0.5395, 0.6287, 0.5962,
        0.3332, 0.6625, 0.4683, 0.4694, 0.4496, 0.6459, 0.5562, 0.5783, 0.3857,
        0.5380, 0.5246, 0.2529, 0.4605, 0.5419, 0.5181, 0.6843, 0.5984, 0.3817,
        0.4149, 0.6330, 0.4922, 0.4487, 0.5498],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1., 0., 1., 1., 1., 1.,
        0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5454, 0.4644, 0.4413, 0.5028, 0.6297, 0.4324, 0.5291, 0.3556, 0.5729,
        0.6601, 0.3433, 0.5943, 0.5268, 0.5546, 0.5297, 0.5509, 0.2500, 0.5265,
        0.5596, 0.5290, 0.5

('output.squeeze()', tensor([0.5698, 0.2749, 0.6516, 0.3621, 0.4953, 0.4809, 0.4551, 0.4816, 0.5244,
        0.5917, 0.2596, 0.6836, 0.4650, 0.5011, 0.3962, 0.3915, 0.5658, 0.3300,
        0.5349, 0.5876, 0.4050, 0.6085, 0.5460, 0.3970, 0.5947, 0.5424, 0.3233,
        0.2764, 0.3658, 0.6546, 0.4237, 0.5018, 0.3763, 0.2088, 0.5808, 0.4743,
        0.6550, 0.6455, 0.5177, 0.4615, 0.6760, 0.7375, 0.5151, 0.6040, 0.4712,
        0.4764, 0.4920, 0.3750, 0.4251, 0.4173],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0.,
        1., 1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0.,
        1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4968, 0.3665, 0.5047, 0.5646, 0.4520, 0.5137, 0.5151, 0.4597, 0.5783,
        0.5962, 0.5904, 0.2362, 0.3411, 0.5837, 0.4176, 0.2949, 0.6910, 0.5765,
        0.4206, 0.4279, 0.4

('output.squeeze()', tensor([0.6002, 0.4335, 0.5340, 0.4357, 0.3902, 0.6607, 0.6501, 0.4337, 0.4709,
        0.5730, 0.3855, 0.5052, 0.4912, 0.4982, 0.5774, 0.3989, 0.4759, 0.4905,
        0.4221, 0.3274, 0.5369, 0.5102, 0.5070, 0.5485, 0.5410, 0.6028, 0.4842,
        0.4998, 0.5011, 0.4935, 0.5148, 0.6026, 0.5636, 0.3533, 0.4917, 0.6750,
        0.4874, 0.6511, 0.4244, 0.5618, 0.5077, 0.5443, 0.7249, 0.4806, 0.4748,
        0.5628, 0.5513, 0.5174, 0.6857, 0.4709],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 1., 1., 0., 0., 0.,
        0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1.,
        1., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4987, 0.4887, 0.3457, 0.3726, 0.4998, 0.3903, 0.2685, 0.4881, 0.4837,
        0.5597, 0.5525, 0.5007, 0.5571, 0.4855, 0.4943, 0.3325, 0.6803, 0.3710,
        0.3720, 0.4928, 0.4

('output.squeeze()', tensor([0.5260, 0.5521, 0.5157, 0.5499, 0.5032, 0.3794, 0.6311, 0.6061, 0.5112,
        0.5353, 0.5039, 0.6502, 0.5097, 0.3167, 0.5542, 0.4973, 0.5512, 0.4693,
        0.6813, 0.5771, 0.3758, 0.3699, 0.4122, 0.4397, 0.4910, 0.5423, 0.4678,
        0.5566, 0.4382, 0.4370, 0.6049, 0.5054, 0.3741, 0.7775, 0.2925, 0.4253,
        0.6627, 0.5944, 0.5210, 0.3248, 0.5603, 0.3288, 0.3753, 0.4914, 0.4037,
        0.5534, 0.6053, 0.5734, 0.7323, 0.5223],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0., 0.,
        0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.,
        0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4601, 0.7448, 0.5099, 0.5675, 0.3784, 0.5124, 0.4887, 0.5139, 0.4925,
        0.5342, 0.4758, 0.3408, 0.4885, 0.3502, 0.3988, 0.6310, 0.5139, 0.4423,
        0.3265, 0.4767, 0.4

('output.squeeze()', tensor([0.5861, 0.5620, 0.5281, 0.4688, 0.4619, 0.3912, 0.4714, 0.3886, 0.5618,
        0.5081, 0.4706, 0.5524, 0.4921, 0.5526, 0.5605, 0.5428, 0.5495, 0.6833,
        0.3026, 0.5295, 0.6162, 0.6192, 0.4736, 0.5335, 0.5058, 0.3392, 0.5030,
        0.3380, 0.4904, 0.4973, 0.3904, 0.4897, 0.2714, 0.1487, 0.3952, 0.5155,
        0.5184, 0.7178, 0.5646, 0.6258, 0.5699, 0.4369, 0.3500, 0.3352, 0.4042,
        0.5034, 0.4961, 0.4262, 0.6697, 0.6318],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 0., 1., 1.,
        0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0.,
        1., 1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.6555, 0.6042, 0.4113, 0.6294, 0.6737, 0.5785, 0.4185, 0.5398, 0.4378,
        0.6609, 0.4395, 0.3572, 0.4586, 0.5504, 0.5818, 0.4888, 0.5970, 0.5108,
        0.4328, 0.3649, 0.5

('output.squeeze()', tensor([0.5180, 0.5400, 0.3709, 0.4568, 0.3955, 0.5176, 0.5364, 0.6551, 0.5305,
        0.3689, 0.5375, 0.5758, 0.5138, 0.6419, 0.5396, 0.6613, 0.3800, 0.6173,
        0.3205, 0.4538, 0.3633, 0.4750, 0.5897, 0.3919, 0.5556, 0.4997, 0.5297,
        0.3831, 0.4241, 0.5098, 0.6526, 0.5017, 0.5291, 0.6560, 0.5505, 0.6908,
        0.4743, 0.5386, 0.3944, 0.6770, 0.6434, 0.6218, 0.5985, 0.4431, 0.5527,
        0.6254, 0.4761, 0.4267, 0.6004, 0.4529],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1.,
        0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 1., 0.,
        0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5157, 0.4198, 0.6038, 0.6187, 0.6934, 0.4875, 0.5647, 0.6363, 0.3924,
        0.6988, 0.4896, 0.4693, 0.5069, 0.6010, 0.4837, 0.5552, 0.4774, 0.4904,
        0.4156, 0.4863, 0.4

('Epoch: 3/4...', 'Step: 1100...', 'Loss: 0.670285...', 'Val Loss: 0.714103')
('output.squeeze()', tensor([0.4249, 0.3664, 0.4440, 0.4814, 0.4255, 0.4979, 0.4980, 0.5569, 0.5638,
        0.6545, 0.5663, 0.4608, 0.4025, 0.4158, 0.6174, 0.3987, 0.4888, 0.6603,
        0.4801, 0.5285, 0.6006, 0.6336, 0.3308, 0.5014, 0.6333, 0.3900, 0.3986,
        0.5919, 0.5147, 0.2644, 0.6087, 0.4084, 0.4834, 0.6437, 0.4813, 0.3780,
        0.6139, 0.4408, 0.5243, 0.7214, 0.4332, 0.4563, 0.3752, 0.4744, 0.4625,
        0.3560, 0.3939, 0.5061, 0.6474, 0.5252],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
        1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0.,
        0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4461, 0.6283, 0.5639, 0.5753, 0.6377, 0.4159, 0.4987, 0.2707, 0.3008,
        0.5571, 0.4129, 0.545

('output.squeeze()', tensor([0.4957, 0.6435, 0.5707, 0.4596, 0.7055, 0.6063, 0.5388, 0.5801, 0.4574,
        0.4774, 0.6815, 0.5658, 0.3978, 0.5646, 0.4633, 0.5837, 0.5910, 0.5180,
        0.5750, 0.6488, 0.3693, 0.2943, 0.4979, 0.4547, 0.6552, 0.4044, 0.6859,
        0.6783, 0.5797, 0.3277, 0.4613, 0.5104, 0.5001, 0.4956, 0.5461, 0.7076,
        0.5784, 0.3824, 0.6345, 0.6152, 0.5033, 0.5067, 0.5341, 0.5146, 0.5285,
        0.3435, 0.4607, 0.4705, 0.6101, 0.4198],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1.,
        1., 1., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 1.,
        1., 1., 1., 0., 0., 1., 1., 0., 0., 1., 0., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4965, 0.4138, 0.4892, 0.5806, 0.4286, 0.6278, 0.6745, 0.4660, 0.5927,
        0.2941, 0.4301, 0.4866, 0.4238, 0.4368, 0.4447, 0.5619, 0.5388, 0.6605,
        0.4885, 0.7678, 0.5

('output.squeeze()', tensor([0.5135, 0.6315, 0.5057, 0.4019, 0.4584, 0.5314, 0.6121, 0.4720, 0.6686,
        0.4263, 0.3498, 0.5931, 0.6718, 0.4043, 0.4944, 0.3743, 0.3714, 0.4819,
        0.6726, 0.6644, 0.5160, 0.4966, 0.6964, 0.4518, 0.3780, 0.5148, 0.4974,
        0.6239, 0.5416, 0.7456, 0.5347, 0.3907, 0.6167, 0.4422, 0.5635, 0.4265,
        0.5884, 0.6790, 0.4950, 0.6568, 0.4536, 0.3759, 0.4269, 0.2920, 0.3550,
        0.2431, 0.3668, 0.5812, 0.4147, 0.3721],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0.,
        0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1.,
        1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5942, 0.3927, 0.7393, 0.4498, 0.7298, 0.4514, 0.3888, 0.3847, 0.5357,
        0.5197, 0.4077, 0.5491, 0.4766, 0.5257, 0.4077, 0.5930, 0.4727, 0.4335,
        0.7160, 0.3055, 0.3

('output.squeeze()', tensor([0.5495, 0.4594, 0.3219, 0.5109, 0.5014, 0.4334, 0.5578, 0.3180, 0.4332,
        0.5411, 0.4047, 0.4963, 0.3792, 0.4299, 0.4544, 0.6699, 0.4800, 0.3891,
        0.6853, 0.4051, 0.4603, 0.3934, 0.3538, 0.4211, 0.4052, 0.3814, 0.6473,
        0.4140, 0.6186, 0.4439, 0.6567, 0.6144, 0.4788, 0.5019, 0.5151, 0.6195,
        0.5627, 0.5415, 0.4687, 0.3671, 0.3948, 0.5543, 0.5705, 0.4436, 0.3991,
        0.4593, 0.6501, 0.7157, 0.4010, 0.3470],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1.,
        1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0.,
        1., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4894, 0.4669, 0.3797, 0.5149, 0.6542, 0.4447, 0.5153, 0.4512, 0.6625,
        0.3910, 0.3997, 0.8082, 0.4155, 0.3697, 0.7055, 0.5760, 0.4505, 0.4738,
        0.5353, 0.4479, 0.5

('output.squeeze()', tensor([0.5350, 0.3728, 0.4879, 0.3006, 0.4133, 0.4734, 0.4428, 0.3053, 0.4315,
        0.4885, 0.4225, 0.6265, 0.5001, 0.4753, 0.4934, 0.4811, 0.5026, 0.3776,
        0.7245, 0.4241, 0.4496, 0.3711, 0.4333, 0.4314, 0.5941, 0.4177, 0.5094,
        0.4360, 0.5783, 0.4729, 0.4677, 0.4084, 0.5592, 0.5500, 0.4329, 0.4461,
        0.4130, 0.7227, 0.4843, 0.6901, 0.4196, 0.3406, 0.4242, 0.4164, 0.4448,
        0.4170, 0.3017, 0.5601, 0.5390, 0.6072],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0.,
        0., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 1.,
        1., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4922, 0.4331, 0.5146, 0.5218, 0.6044, 0.5619, 0.5504, 0.4656, 0.4682,
        0.4060, 0.4601, 0.6030, 0.6347, 0.4698, 0.5231, 0.5619, 0.4550, 0.6044,
        0.3788, 0.3742, 0.5

('output.squeeze()', tensor([0.4805, 0.5342, 0.4405, 0.4009, 0.4444, 0.3869, 0.3897, 0.4787, 0.7066,
        0.4975, 0.4900, 0.4369, 0.4394, 0.4564, 0.3758, 0.4995, 0.4529, 0.3364,
        0.4700, 0.4632, 0.4028, 0.4342, 0.5340, 0.6361, 0.4024, 0.4552, 0.5275,
        0.6540, 0.4654, 0.4138, 0.4100, 0.4595, 0.4824, 0.5330, 0.3983, 0.4981,
        0.4819, 0.4610, 0.3562, 0.5748, 0.4628, 0.4864, 0.4111, 0.4986, 0.4880,
        0.4870, 0.5126, 0.4460, 0.4019, 0.4814],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0.,
        0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5067, 0.5047, 0.5753, 0.4701, 0.4982, 0.5354, 0.5432, 0.5103, 0.4998,
        0.6036, 0.3391, 0.4266, 0.5392, 0.5557, 0.4601, 0.3926, 0.4897, 0.5338,
        0.4124, 0.3965, 0.4

('output.squeeze()', tensor([0.4699, 0.4539, 0.5063, 0.5335, 0.4697, 0.4618, 0.5356, 0.4563, 0.3466,
        0.4771, 0.6453, 0.3256, 0.6504, 0.5125, 0.3280, 0.5466, 0.5188, 0.5480,
        0.5416, 0.5328, 0.3747, 0.3973, 0.6293, 0.4398, 0.4159, 0.5474, 0.5428,
        0.3675, 0.4302, 0.5453, 0.4816, 0.4535, 0.4191, 0.4608, 0.4661, 0.4331,
        0.5895, 0.4950, 0.4037, 0.4159, 0.5377, 0.4747, 0.4532, 0.5847, 0.4345,
        0.4202, 0.4398, 0.5045, 0.5400, 0.5197],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 0., 0.,
        1., 0., 1., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1.,
        0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5995, 0.3684, 0.5630, 0.5255, 0.4682, 0.5045, 0.4927, 0.5036, 0.4522,
        0.6531, 0.5531, 0.3465, 0.4508, 0.5242, 0.4631, 0.5239, 0.3777, 0.7568,
        0.4435, 0.3332, 0.5

('Epoch: 3/4...', 'Step: 1200...', 'Loss: 0.676826...', 'Val Loss: 0.704125')
('output.squeeze()', tensor([0.5438, 0.6175, 0.5126, 0.4979, 0.4583, 0.5047, 0.4733, 0.5231, 0.4406,
        0.4818, 0.4459, 0.4414, 0.4481, 0.5994, 0.4511, 0.4656, 0.4656, 0.4733,
        0.4212, 0.5229, 0.5169, 0.7297, 0.4831, 0.4685, 0.4323, 0.5529, 0.4728,
        0.4821, 0.4507, 0.5419, 0.4938, 0.4014, 0.4507, 0.6731, 0.4096, 0.6860,
        0.6445, 0.4517, 0.5622, 0.4337, 0.4411, 0.6225, 0.4901, 0.5043, 0.6438,
        0.5148, 0.5112, 0.4946, 0.4479, 0.5709],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.,
        0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1.,
        0., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4387, 0.4615, 0.7141, 0.6339, 0.4032, 0.4530, 0.4656, 0.5795, 0.5024,
        0.3673, 0.4211, 0.521

('output.squeeze()', tensor([0.5841, 0.4978, 0.4759, 0.4900, 0.5743, 0.3910, 0.4532, 0.4429, 0.5378,
        0.5413, 0.5454, 0.6958, 0.5229, 0.4978, 0.4199, 0.4156, 0.6558, 0.4895,
        0.6174, 0.6306, 0.4768, 0.5021, 0.3202, 0.5643, 0.5377, 0.4711, 0.6857,
        0.3669, 0.6371, 0.4520, 0.5181, 0.4804, 0.5668, 0.4734, 0.6308, 0.6232,
        0.3683, 0.6284, 0.4808, 0.4669, 0.5181, 0.5765, 0.5625, 0.5657, 0.3548,
        0.4923, 0.5452, 0.4729, 0.6175, 0.4452],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0.,
        1., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1.,
        1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5459, 0.4434, 0.6228, 0.6178, 0.5244, 0.4075, 0.4803, 0.4133, 0.6063,
        0.4848, 0.5124, 0.5545, 0.4386, 0.4174, 0.5933, 0.3542, 0.4650, 0.4875,
        0.4434, 0.6186, 0.3

('output.squeeze()', tensor([0.7279, 0.4176, 0.3663, 0.5606, 0.4639, 0.3402, 0.3766, 0.4229, 0.3550,
        0.4346, 0.4511, 0.3408, 0.5539, 0.4971, 0.4574, 0.7415, 0.5280, 0.8864,
        0.5347, 0.3527, 0.8289, 0.4268, 0.4506, 0.6192, 0.5137, 0.5951, 0.2164,
        0.7622, 0.5299, 0.4669, 0.7823, 0.6144, 0.4437, 0.6206, 0.4834, 0.3991,
        0.3000, 0.3946, 0.4444, 0.4982, 0.3008, 0.4915, 0.3948, 0.4902, 0.4682,
        0.4100, 0.2555, 0.4261, 0.3901, 0.5513],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 1.,
        0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0.,
        0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4343, 0.3893, 0.6007, 0.6937, 0.3860, 0.8072, 0.3523, 0.4494, 0.3737,
        0.5294, 0.4292, 0.4150, 0.8377, 0.2898, 0.3832, 0.6529, 0.8146, 0.6462,
        0.4431, 0.5501, 0.2

('output.squeeze()', tensor([0.4435, 0.4932, 0.5836, 0.4414, 0.4841, 0.3538, 0.6090, 0.3951, 0.5512,
        0.8124, 0.5850, 0.7841, 0.7913, 0.3319, 0.7285, 0.6505, 0.7742, 0.5849,
        0.1892, 0.6729, 0.4635, 0.8713, 0.8219, 0.3456, 0.6117, 0.4030, 0.5128,
        0.5570, 0.5140, 0.3179, 0.3281, 0.5701, 0.5538, 0.3594, 0.1941, 0.5051,
        0.8023, 0.6625, 0.4308, 0.3249, 0.5851, 0.5738, 0.2595, 0.3625, 0.2696,
        0.4842, 0.6996, 0.9592, 0.3476, 0.5673],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
        0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1.,
        0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3791, 0.4850, 0.6452, 0.8399, 0.5250, 0.5652, 0.2368, 0.8794, 0.8562,
        0.2698, 0.3948, 0.5134, 0.7124, 0.4699, 0.8394, 0.7288, 0.6202, 0.4094,
        0.3772, 0.2707, 0.7

('output.squeeze()', tensor([0.4281, 0.3927, 0.8338, 0.4847, 0.4264, 0.6933, 0.6388, 0.7423, 0.4733,
        0.4897, 0.0874, 0.4228, 0.7410, 0.1661, 0.6646, 0.5202, 0.3938, 0.1807,
        0.2010, 0.5556, 0.8665, 0.1418, 0.4570, 0.5701, 0.7815, 0.8091, 0.5474,
        0.3781, 0.5100, 0.6773, 0.3850, 0.4664, 0.2053, 0.3228, 0.5199, 0.3589,
        0.7740, 0.7053, 0.6181, 0.4992, 0.5685, 0.3533, 0.4188, 0.6336, 0.3615,
        0.5836, 0.8387, 0.4701, 0.4659, 0.7094],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0.,
        0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0.,
        1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4235, 0.7208, 0.3832, 0.4291, 0.5921, 0.7598, 0.4756, 0.4386, 0.4901,
        0.4159, 0.5191, 0.5790, 0.7184, 0.7026, 0.4514, 0.4555, 0.6576, 0.6627,
        0.4626, 0.6213, 0.8

('output.squeeze()', tensor([0.4918, 0.6609, 0.1215, 0.4128, 0.2697, 0.4822, 0.5369, 0.4132, 0.3666,
        0.4400, 0.5508, 0.2997, 0.2675, 0.4583, 0.4747, 0.4594, 0.3045, 0.5739,
        0.5005, 0.5563, 0.7014, 0.4617, 0.3569, 0.6088, 0.5218, 0.4739, 0.4788,
        0.4980, 0.4478, 0.3782, 0.3575, 0.6650, 0.4712, 0.3212, 0.4667, 0.5181,
        0.3193, 0.5977, 0.6128, 0.4337, 0.3845, 0.5750, 0.5700, 0.2647, 0.6574,
        0.7185, 0.5401, 0.3564, 0.4188, 0.3681],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1.,
        1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1.,
        0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5779, 0.3970, 0.2311, 0.7548, 0.4193, 0.5278, 0.7262, 0.3677, 0.2118,
        0.3889, 0.3034, 0.5030, 0.5033, 0.5372, 0.3441, 0.5947, 0.4296, 0.4994,
        0.4834, 0.3590, 0.3

('output.squeeze()', tensor([0.7768, 0.6541, 0.4606, 0.1586, 0.7052, 0.3923, 0.5929, 0.4216, 0.5671,
        0.3460, 0.5094, 0.2381, 0.6659, 0.2801, 0.5550, 0.7143, 0.5383, 0.5783,
        0.7787, 0.5388, 0.4545, 0.7523, 0.2573, 0.7912, 0.7819, 0.5380, 0.4940,
        0.5338, 0.7266, 0.4177, 0.6505, 0.4834, 0.3979, 0.3376, 0.4324, 0.4812,
        0.4884, 0.4115, 0.4159, 0.4762, 0.4623, 0.4728, 0.4378, 0.7145, 0.8345,
        0.3926, 0.4182, 0.3015, 0.4462, 0.3923],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1.,
        0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.,
        0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3800, 0.4205, 0.4247, 0.5089, 0.3330, 0.5302, 0.5458, 0.5664, 0.6707,
        0.6022, 0.5189, 0.5507, 0.4975, 0.4496, 0.3960, 0.1513, 0.3741, 0.7040,
        0.1640, 0.3972, 0.4

('Epoch: 4/4...', 'Step: 1300...', 'Loss: 0.711657...', 'Val Loss: 0.740445')
('output.squeeze()', tensor([0.4334, 0.4639, 0.2941, 0.4997, 0.4159, 0.5980, 0.5203, 0.4971, 0.3891,
        0.5813, 0.3433, 0.2797, 0.3800, 0.5745, 0.3401, 0.3169, 0.3621, 0.3947,
        0.7132, 0.2576, 0.7298, 0.3557, 0.3402, 0.5256, 0.5186, 0.6494, 0.4540,
        0.4913, 0.4893, 0.3864, 0.6905, 0.5253, 0.6008, 0.1388, 0.3485, 0.4074,
        0.7834, 0.3241, 0.6928, 0.6312, 0.8194, 0.7778, 0.4834, 0.6227, 0.4997,
        0.7578, 0.4719, 0.5990, 0.5423, 0.5315],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1.,
        0., 0., 0., 1., 0., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0.,
        0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4803, 0.4627, 0.2547, 0.4068, 0.5497, 0.5924, 0.3381, 0.5797, 0.7303,
        0.4797, 0.4719, 0.486

('output.squeeze()', tensor([0.5169, 0.4615, 0.4127, 0.3915, 0.4186, 0.5322, 0.5537, 0.4539, 0.4683,
        0.6909, 0.3630, 0.7095, 0.4818, 0.6291, 0.8110, 0.6093, 0.2324, 0.4559,
        0.4833, 0.6346, 0.4876, 0.5892, 0.7106, 0.5548, 0.7051, 0.4833, 0.3473,
        0.5222, 0.4548, 0.4046, 0.7285, 0.3906, 0.6849, 0.5600, 0.4732, 0.1904,
        0.5876, 0.4561, 0.6059, 0.4553, 0.3850, 0.3934, 0.6360, 0.4231, 0.4558,
        0.4646, 0.5779, 0.4590, 0.4802, 0.5067],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0.,
        0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0.,
        1., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.7717, 0.6412, 0.4142, 0.5975, 0.6194, 0.4889, 0.4606, 0.4516, 0.5111,
        0.5763, 0.4270, 0.5062, 0.4538, 0.5264, 0.6543, 0.4978, 0.5191, 0.4915,
        0.5195, 0.6919, 0.5

('output.squeeze()', tensor([0.7716, 0.4769, 0.6948, 0.3853, 0.7090, 0.4371, 0.2810, 0.3570, 0.4096,
        0.4030, 0.7569, 0.4229, 0.3935, 0.5571, 0.6465, 0.6189, 0.5633, 0.3468,
        0.5059, 0.7062, 0.5202, 0.3643, 0.5372, 0.3712, 0.4072, 0.5949, 0.4749,
        0.5280, 0.5129, 0.4386, 0.3787, 0.6477, 0.5593, 0.4596, 0.7522, 0.4737,
        0.4396, 0.4141, 0.3684, 0.2905, 0.2756, 0.3851, 0.6018, 0.6325, 0.4038,
        0.5967, 0.5278, 0.4322, 0.3712, 0.4995],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 0.,
        1., 0., 0., 1., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0.,
        1., 1., 1., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3529, 0.3441, 0.5365, 0.3263, 0.6844, 0.5627, 0.3934, 0.7937, 0.4621,
        0.5479, 0.4867, 0.4865, 0.4453, 0.5106, 0.4718, 0.4990, 0.3302, 0.4851,
        0.2317, 0.4753, 0.6

('output.squeeze()', tensor([0.4336, 0.5239, 0.3623, 0.8162, 0.4170, 0.4859, 0.5344, 0.5352, 0.4738,
        0.4895, 0.5226, 0.4863, 0.7446, 0.6845, 0.6063, 0.2307, 0.7851, 0.5915,
        0.4773, 0.6992, 0.3988, 0.3671, 0.7418, 0.4595, 0.5423, 0.6002, 0.3713,
        0.5880, 0.5028, 0.3921, 0.5597, 0.3257, 0.4432, 0.4499, 0.4910, 0.3814,
        0.7067, 0.4964, 0.4358, 0.5513, 0.6848, 0.5314, 0.6430, 0.5259, 0.6329,
        0.6448, 0.6377, 0.3590, 0.6139, 0.4320],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 1., 1., 1., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 1.,
        1., 1., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1.,
        1., 1., 1., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.7030, 0.6319, 0.4733, 0.5474, 0.4109, 0.6897, 0.2720, 0.4071, 0.6550,
        0.5143, 0.6527, 0.5393, 0.6383, 0.6082, 0.2757, 0.4844, 0.2685, 0.5097,
        0.5481, 0.4742, 0.7

('output.squeeze()', tensor([0.6415, 0.5672, 0.5231, 0.5575, 0.5328, 0.6110, 0.4602, 0.5601, 0.5497,
        0.5038, 0.2896, 0.4364, 0.4999, 0.5970, 0.5646, 0.8049, 0.4062, 0.6292,
        0.4492, 0.4467, 0.5625, 0.3640, 0.5304, 0.4139, 0.5738, 0.5371, 0.6554,
        0.6409, 0.5200, 0.5583, 0.7514, 0.3389, 0.6286, 0.2947, 0.5042, 0.4591,
        0.5576, 0.5697, 0.4996, 0.4800, 0.6493, 0.3983, 0.5426, 0.5288, 0.6396,
        0.5543, 0.5001, 0.7213, 0.5188, 0.3596],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1.,
        0., 0., 0., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0.,
        0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5320, 0.4709, 0.6460, 0.4288, 0.7090, 0.4284, 0.5149, 0.3640, 0.5432,
        0.2595, 0.1050, 0.4585, 0.6158, 0.7040, 0.4302, 0.4676, 0.6447, 0.8099,
        0.4091, 0.8371, 0.3

('output.squeeze()', tensor([0.3825, 0.6525, 0.3446, 0.4865, 0.4344, 0.5643, 0.2890, 0.3964, 0.3804,
        0.3927, 0.2927, 0.4953, 0.4737, 0.5111, 0.5430, 0.6013, 0.4199, 0.5887,
        0.3277, 0.5887, 0.3323, 0.6298, 0.6717, 0.7046, 0.5343, 0.3893, 0.4752,
        0.3967, 0.4680, 0.6144, 0.5723, 0.5207, 0.5686, 0.3191, 0.9196, 0.3409,
        0.6133, 0.4901, 0.5102, 0.7046, 0.2735, 0.5654, 0.5342, 0.4282, 0.4656,
        0.7244, 0.4244, 0.5779, 0.5221, 0.4439],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1.,
        0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1.,
        1., 1., 0., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3379, 0.3136, 0.2992, 0.6451, 0.4204, 0.5047, 0.4580, 0.5909, 0.5618,
        0.5334, 0.3764, 0.5709, 0.3451, 0.1663, 0.5407, 0.3266, 0.5385, 0.4057,
        0.5157, 0.4703, 0.6

('output.squeeze()', tensor([0.2844, 0.5941, 0.7068, 0.3139, 0.7864, 0.2335, 0.4377, 0.6095, 0.4393,
        0.4324, 0.5427, 0.6882, 0.2205, 0.3988, 0.3032, 0.2910, 0.7575, 0.3141,
        0.7497, 0.9240, 0.3825, 0.5674, 0.2976, 0.6035, 0.6067, 0.6685, 0.4607,
        0.3996, 0.4772, 0.3434, 0.4696, 0.3828, 0.3996, 0.4024, 0.4059, 0.4286,
        0.4870, 0.5790, 0.5722, 0.7480, 0.3785, 0.7239, 0.4691, 0.8755, 0.2576,
        0.2416, 0.8756, 0.6457, 0.4857, 0.7843],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.,
        0., 1., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0.,
        0., 1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5299, 0.4111, 0.6138, 0.7642, 0.2210, 0.3469, 0.3621, 0.7934, 0.5070,
        0.2950, 0.2768, 0.5802, 0.5704, 0.5472, 0.5310, 0.3423, 0.8007, 0.2144,
        0.8849, 0.4033, 0.3

('Epoch: 4/4...', 'Step: 1400...', 'Loss: 0.690795...', 'Val Loss: 0.767156')
('output.squeeze()', tensor([0.9464, 0.4205, 0.3522, 0.3750, 0.2539, 0.4447, 0.7756, 0.3809, 0.3976,
        0.3123, 0.4483, 0.6733, 0.4050, 0.4690, 0.2678, 0.3975, 0.3947, 0.6876,
        0.3521, 0.3811, 0.3120, 0.3251, 0.4057, 0.6926, 0.3464, 0.3684, 0.5295,
        0.4784, 0.2493, 0.2574, 0.3885, 0.5630, 0.4277, 0.4330, 0.6576, 0.6713,
        0.4474, 0.5105, 0.5651, 0.2981, 0.5327, 0.3909, 0.7702, 0.3635, 0.4579,
        0.4041, 0.4467, 0.3446, 0.4662, 0.4410],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1.,
        0., 1., 1., 0., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1.,
        1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.6053, 0.4953, 0.7348, 0.4104, 0.3408, 0.7852, 0.4621, 0.5333, 0.2877,
        0.3656, 0.5961, 0.335

('output.squeeze()', tensor([0.4970, 0.3198, 0.5134, 0.4898, 0.4075, 0.4330, 0.5370, 0.3659, 0.4944,
        0.4365, 0.3094, 0.7726, 0.5658, 0.5156, 0.4193, 0.4272, 0.4226, 0.3260,
        0.5132, 0.4359, 0.8621, 0.6124, 0.4385, 0.2451, 0.6159, 0.5395, 0.4288,
        0.4638, 0.6359, 0.1907, 0.5672, 0.6436, 0.3574, 0.5527, 0.3685, 0.4258,
        0.3838, 0.4654, 0.7438, 0.4749, 0.5970, 0.4729, 0.8308, 0.8691, 0.6732,
        0.3689, 0.7580, 0.4170, 0.4056, 0.4834],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 1.,
        1., 1., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 0.,
        0., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4702, 0.3183, 0.5398, 0.4086, 0.3416, 0.7946, 0.2588, 0.3720, 0.7201,
        0.7976, 0.2215, 0.5845, 0.5795, 0.5065, 0.2415, 0.5376, 0.5345, 0.3912,
        0.4657, 0.7992, 0.4

('output.squeeze()', tensor([0.4629, 0.5428, 0.4575, 0.3388, 0.5174, 0.3838, 0.6886, 0.4616, 0.6123,
        0.5266, 0.5369, 0.7569, 0.4121, 0.5999, 0.3533, 0.3104, 0.6972, 0.4881,
        0.6684, 0.6096, 0.3818, 0.6753, 0.4359, 0.4971, 0.4349, 0.6703, 0.5594,
        0.6405, 0.4605, 0.5363, 0.3531, 0.5442, 0.5403, 0.5893, 0.5781, 0.3830,
        0.6736, 0.3211, 0.4463, 0.6687, 0.7832, 0.3944, 0.7276, 0.3382, 0.5731,
        0.5066, 0.4964, 0.5828, 0.3326, 0.5312],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0.,
        1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0., 0., 1., 1.,
        1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5400, 0.5847, 0.5960, 0.4894, 0.3717, 0.5534, 0.3537, 0.6114, 0.3362,
        0.7574, 0.4887, 0.4886, 0.5615, 0.3106, 0.4252, 0.5059, 0.5070, 0.4033,
        0.3721, 0.5348, 0.3

('output.squeeze()', tensor([0.5842, 0.5587, 0.4984, 0.5699, 0.4442, 0.3585, 0.5266, 0.6475, 0.5244,
        0.5697, 0.4609, 0.5340, 0.3707, 0.5250, 0.5145, 0.6542, 0.5935, 0.4807,
        0.4832, 0.7120, 0.5001, 0.4768, 0.3801, 0.6223, 0.2753, 0.4131, 0.6722,
        0.3462, 0.6437, 0.6381, 0.5095, 0.4776, 0.4648, 0.4626, 0.3857, 0.7062,
        0.5180, 0.4196, 0.5176, 0.3318, 0.3919, 0.4851, 0.4624, 0.4709, 0.6854,
        0.5756, 0.3400, 0.5706, 0.4848, 0.5215],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
        1., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 1.,
        1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4834, 0.7983, 0.5806, 0.6958, 0.5967, 0.5520, 0.5946, 0.5314, 0.5800,
        0.3427, 0.2748, 0.8499, 0.4723, 0.5177, 0.6589, 0.4113, 0.4187, 0.4200,
        0.4918, 0.4558, 0.3

('output.squeeze()', tensor([0.5317, 0.6002, 0.6355, 0.8096, 0.3645, 0.7581, 0.3110, 0.5215, 0.3835,
        0.4761, 0.3614, 0.5714, 0.4645, 0.7065, 0.3823, 0.7211, 0.6775, 0.4520,
        0.4803, 0.5313, 0.2966, 0.4895, 0.6218, 0.3483, 0.5863, 0.7112, 0.6271,
        0.7173, 0.5007, 0.4588, 0.6683, 0.5882, 0.5387, 0.7389, 0.4594, 0.4712,
        0.5335, 0.5797, 0.2661, 0.4887, 0.2582, 0.4198, 0.3413, 0.5380, 0.5183,
        0.4852, 0.4717, 0.4319, 0.2369, 0.4258],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1.,
        0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 1.,
        1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.6349, 0.4395, 0.5228, 0.6132, 0.6673, 0.5379, 0.3907, 0.6752, 0.4080,
        0.5938, 0.6657, 0.5718, 0.5290, 0.4056, 0.4464, 0.6252, 0.3980, 0.4825,
        0.5834, 0.4213, 0.3

('output.squeeze()', tensor([0.2919, 0.5410, 0.5919, 0.7639, 0.3000, 0.5355, 0.4313, 0.3332, 0.6270,
        0.3699, 0.6181, 0.3670, 0.5043, 0.4975, 0.8314, 0.8326, 0.6205, 0.4869,
        0.6236, 0.7639, 0.3805, 0.5700, 0.7941, 0.5561, 0.2572, 0.6954, 0.2226,
        0.5051, 0.5948, 0.7056, 0.5862, 0.4613, 0.5455, 0.7084, 0.6316, 0.3947,
        0.5181, 0.4358, 0.7173, 0.6753, 0.6495, 0.6908, 0.3883, 0.7791, 0.4841,
        0.4511, 0.3022, 0.8112, 0.3579, 0.3405],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 0., 0.,
        1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0.,
        0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.6866, 0.4771, 0.7661, 0.4984, 0.2993, 0.5700, 0.4334, 0.5664, 0.5886,
        0.4919, 0.5568, 0.5333, 0.2046, 0.5017, 0.4113, 0.5654, 0.2789, 0.5532,
        0.5214, 0.3247, 0.4

('output.squeeze()', tensor([0.3747, 0.5423, 0.2642, 0.4907, 0.6097, 0.2289, 0.7188, 0.4489, 0.6492,
        0.5017, 0.5907, 0.5674, 0.6091, 0.5117, 0.4607, 0.5201, 0.4964, 0.4966,
        0.2146, 0.6747, 0.5932, 0.7203, 0.3103, 0.3530, 0.6183, 0.3776, 0.7337,
        0.5736, 0.6478, 0.6562, 0.4235, 0.6575, 0.6383, 0.3798, 0.4474, 0.6845,
        0.4552, 0.3788, 0.7631, 0.4170, 0.6867, 0.2417, 0.3364, 0.5747, 0.5845,
        0.6124, 0.3299, 0.4792, 0.4901, 0.4288],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0.,
        0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1.,
        0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4462, 0.5161, 0.6973, 0.5455, 0.6791, 0.5879, 0.6559, 0.3604, 0.5627,
        0.3030, 0.4667, 0.4020, 0.6649, 0.4387, 0.4689, 0.5082, 0.4889, 0.4024,
        0.4155, 0.5848, 0.4

('Epoch: 4/4...', 'Step: 1500...', 'Loss: 0.717787...', 'Val Loss: 0.729108')
('output.squeeze()', tensor([0.4223, 0.5152, 0.3904, 0.3698, 0.4538, 0.3473, 0.5065, 0.4307, 0.6251,
        0.6222, 0.6126, 0.6059, 0.2866, 0.5396, 0.6499, 0.4261, 0.4324, 0.5879,
        0.4228, 0.5782, 0.4354, 0.4952, 0.5459, 0.5081, 0.4766, 0.4254, 0.3843,
        0.4290, 0.5555, 0.6020, 0.3074, 0.4875, 0.3027, 0.4439, 0.4328, 0.6653,
        0.5591, 0.5007, 0.5474, 0.5217, 0.3894, 0.4469, 0.3537, 0.3365, 0.6075,
        0.7209, 0.5854, 0.6409, 0.1903, 0.7693],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1., 1., 1., 0., 0., 1., 1., 0.,
        1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0.,
        0., 0., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5254, 0.6175, 0.3353, 0.3137, 0.6513, 0.5021, 0.5255, 0.7191, 0.3115,
        0.4539, 0.4293, 0.596

('output.squeeze()', tensor([0.6493, 0.7117, 0.5841, 0.3951, 0.4815, 0.5564, 0.6466, 0.5237, 0.4953,
        0.3599, 0.4987, 0.4578, 0.4739, 0.6810, 0.3528, 0.7157, 0.5164, 0.4050,
        0.4179, 0.4821, 0.7712, 0.4123, 0.4039, 0.7292, 0.5361, 0.3786, 0.5575,
        0.8534, 0.4985, 0.4854, 0.3091, 0.3248, 0.3829, 0.5844, 0.4538, 0.4192,
        0.7300, 0.5221, 0.4278, 0.4823, 0.4913, 0.6279, 0.4397, 0.5024, 0.4310,
        0.3825, 0.1746, 0.5485, 0.6057, 0.5467],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0., 1.,
        1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 1.,
        1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4409, 0.5193, 0.2409, 0.6841, 0.4415, 0.6685, 0.5809, 0.6155, 0.5571,
        0.4442, 0.5224, 0.5804, 0.5839, 0.5047, 0.2685, 0.2841, 0.5849, 0.2199,
        0.4038, 0.2376, 0.4

('output.squeeze()', tensor([0.3241, 0.6285, 0.5671, 0.3677, 0.5285, 0.5652, 0.6904, 0.4261, 0.4037,
        0.4699, 0.6201, 0.2997, 0.3879, 0.4595, 0.5107, 0.5687, 0.5773, 0.3449,
        0.5509, 0.4349, 0.4219, 0.4211, 0.4818, 0.2539, 0.8061, 0.2770, 0.4634,
        0.5514, 0.5720, 0.4034, 0.5061, 0.4387, 0.4636, 0.4862, 0.5012, 0.5787,
        0.4799, 0.5305, 0.5682, 0.4234, 0.6347, 0.5148, 0.3972, 0.5727, 0.3657,
        0.3769, 0.5703, 0.4829, 0.4557, 0.4294],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0.,
        0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 1., 1.,
        1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.4261, 0.3021, 0.6125, 0.6470, 0.6506, 0.5049, 0.6737, 0.7294, 0.5589,
        0.5162, 0.4485, 0.6266, 0.6520, 0.5119, 0.4349, 0.4174, 0.4075, 0.4242,
        0.7475, 0.4738, 0.6

('output.squeeze()', tensor([0.7139, 0.5778, 0.4807, 0.5103, 0.5649, 0.6992, 0.3539, 0.4864, 0.4833,
        0.6050, 0.4940, 0.6645, 0.4685, 0.6421, 0.6195, 0.5981, 0.5543, 0.4505,
        0.5540, 0.4722, 0.4387, 0.4580, 0.3918, 0.3952, 0.5123, 0.7564, 0.3650,
        0.4594, 0.5389, 0.6113, 0.4831, 0.6503, 0.4710, 0.3728, 0.5546, 0.5419,
        0.3769, 0.7163, 0.6776, 0.5407, 0.4665, 0.4363, 0.6308, 0.5726, 0.3878,
        0.5715, 0.5233, 0.5654, 0.3987, 0.5447],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 1., 1., 0., 1., 0.,
        1., 1., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0.,
        0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5185, 0.6146, 0.5478, 0.3533, 0.4383, 0.7957, 0.4791, 0.4782, 0.2962,
        0.4780, 0.6427, 0.7312, 0.6484, 0.5812, 0.4868, 0.6932, 0.3388, 0.3941,
        0.6355, 0.5555, 0.5

('output.squeeze()', tensor([0.8521, 0.7131, 0.4522, 0.5126, 0.6309, 0.6899, 0.5790, 0.5306, 0.3343,
        0.7514, 0.4070, 0.5231, 0.4725, 0.5968, 0.4367, 0.4611, 0.4468, 0.3750,
        0.4758, 0.3590, 0.5069, 0.7111, 0.7232, 0.5895, 0.4271, 0.4439, 0.7499,
        0.3397, 0.7714, 0.4750, 0.6586, 0.3264, 0.6292, 0.6009, 0.5158, 0.9031,
        0.5145, 0.4326, 0.6362, 0.4916, 0.6000, 0.4308, 0.5188, 0.6221, 0.6400,
        0.4290, 0.2220, 0.5490, 0.5060, 0.4125],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 1., 0.,
        1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 1.,
        0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5040, 0.6444, 0.4494, 0.6223, 0.5638, 0.5198, 0.4699, 0.4707, 0.1597,
        0.8426, 0.5585, 0.5673, 0.8212, 0.2461, 0.6782, 0.2591, 0.2303, 0.6883,
        0.4399, 0.2939, 0.4

('output.squeeze()', tensor([0.3912, 0.4861, 0.4447, 0.4198, 0.5848, 0.3752, 0.5315, 0.4938, 0.6595,
        0.5836, 0.7242, 0.6502, 0.4430, 0.5712, 0.4593, 0.5092, 0.3236, 0.5695,
        0.4037, 0.5959, 0.5663, 0.6023, 0.3106, 0.5063, 0.4591, 0.4823, 0.5980,
        0.5787, 0.5769, 0.5087, 0.4963, 0.5685, 0.4750, 0.6176, 0.4646, 0.1815,
        0.5508, 0.4211, 0.3376, 0.5127, 0.5176, 0.5114, 0.5804, 0.4918, 0.5899,
        0.5120, 0.3322, 0.6382, 0.7590, 0.5406],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 1., 1.,
        1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 1., 0.,
        0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1.],
       device='cuda:0'))
('output.squeeze()', tensor([0.5158, 0.2229, 0.4408, 0.5090, 0.3184, 0.5041, 0.3697, 0.5901, 0.5376,
        0.5281, 0.6628, 0.7339, 0.2827, 0.8085, 0.5030, 0.4711, 0.5329, 0.5099,
        0.2711, 0.6546, 0.7

('output.squeeze()', tensor([0.4096, 0.6178, 0.6720, 0.5829, 0.5772, 0.5066, 0.4609, 0.4770, 0.5592,
        0.6180, 0.5242, 0.3924, 0.6307, 0.4704, 0.5319, 0.4873, 0.5184, 0.5114,
        0.5800, 0.5477, 0.4844, 0.6199, 0.4707, 0.5009, 0.5721, 0.5703, 0.5622,
        0.8188, 0.4212, 0.3202, 0.6953, 0.2779, 0.2552, 0.4823, 0.4633, 0.3052,
        0.5848, 0.4747, 0.3695, 0.3497, 0.5069, 0.5710, 0.7460, 0.4340, 0.4530,
        0.6798, 0.2286, 0.4684, 0.4144, 0.4942],
       device='cuda:0', grad_fn=<SqueezeBackward0>))
('labels.float()', tensor([1., 1., 0., 1., 0., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 0., 0., 0.,
        0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0.,
        1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 0.],
       device='cuda:0'))
('output.squeeze()', tensor([0.3646, 0.3905, 0.4847, 0.7052, 0.5742, 0.3810, 0.3114, 0.5667, 0.6098,
        0.2617, 0.6058, 0.6183, 0.6141, 0.6675, 0.3925, 0.2971, 0.4512, 0.5062,
        0.6476, 0.6827, 0.3

('Epoch: 4/4...', 'Step: 1600...', 'Loss: 0.671310...', 'Val Loss: 0.734837')


## Inference
Once we are done with training and validating, we can improve training loss and validation loss by playing around with the hyperparameters. Can you find a better set of hyperparams? Play around with it. 

### Task 10: Prediction Function
Now write a prediction function to predict the output for the test set created. Save the results in a CSV file with one column as the reviews and the prediction in the next column. Calculate the accuracy of the test set.

In [109]:
import csv

def predict():
    net.eval()
    label_dict = {}
    label_dict[0] = 'positive'
    label_dict[1] = 'negative'
    test_h = net.init_hidden(batch_size)
    prediction = []
    acc=0
    avg_acc=0
    with open('sentiment.csv', mode='w') as csv_file:
        fieldnames = ['Ground_Truth', 'Prediction']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        
        for inputs,labels in test_loader:
            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            correct=0.0

            #lbl=labels.float()

            #test_h = tuple([each.data for each in test_h])
            if(train_on_gpu):
                inputs, labels = inputs.cuda(), labels.cuda()
            output, test_h = net(inputs, test_h)
            #print(output)
            #prediction.append(label_dict[])
            for op in range(batch_size):
                
                if(output[op]>=0.5 and labels[op]==1):    
                    correct+=1
                
                elif(output[op]<0.5 and labels[op]==0):
                    correct+=1
                    
                if(output[op]>=0.5):
                    pred = 'positive'
                else:
                    pred = 'negative'
                    
                if(labels[op]>=1):
                    gt = 'positive'
                else:
                    gt = 'negative'
                    
                
                writer.writerow({'Ground_Truth': gt, 'Prediction': pred})

            #print('correct',correct)       
            acc=correct/batch_size       
            print("Accuracy=",acc)
            avg_acc+=acc

        print("Avg accuracy = ",avg_acc/len(test_loader))
                       

predict()                

       

('Accuracy=', 0.52)
('Accuracy=', 0.52)
('Accuracy=', 0.5)
('Accuracy=', 0.58)
('Accuracy=', 0.58)
('Accuracy=', 0.52)
('Accuracy=', 0.36)
('Accuracy=', 0.46)
('Accuracy=', 0.48)
('Accuracy=', 0.36)
('Accuracy=', 0.48)
('Accuracy=', 0.46)
('Accuracy=', 0.38)
('Accuracy=', 0.64)
('Accuracy=', 0.4)
('Accuracy=', 0.4)
('Accuracy=', 0.48)
('Accuracy=', 0.5)
('Accuracy=', 0.46)
('Accuracy=', 0.4)
('Accuracy=', 0.46)
('Accuracy=', 0.6)
('Accuracy=', 0.54)
('Accuracy=', 0.44)
('Accuracy=', 0.52)
('Accuracy=', 0.44)
('Accuracy=', 0.4)
('Accuracy=', 0.66)
('Accuracy=', 0.44)
('Accuracy=', 0.48)
('Accuracy=', 0.54)
('Accuracy=', 0.6)
('Accuracy=', 0.46)
('Accuracy=', 0.52)
('Accuracy=', 0.56)
('Accuracy=', 0.52)
('Accuracy=', 0.56)
('Accuracy=', 0.48)
('Accuracy=', 0.52)
('Accuracy=', 0.58)
('Accuracy=', 0.54)
('Accuracy=', 0.44)
('Accuracy=', 0.56)
('Accuracy=', 0.5)
('Accuracy=', 0.58)
('Accuracy=', 0.42)
('Accuracy=', 0.44)
('Accuracy=', 0.48)
('Accuracy=', 0.58)
('Accuracy=', 0.46)
('Avg acc

## Bonus Question: Create an app using Flask

> Extra bonus points if someone attempts this question:
* Save the trained model checkpoints.
* Create a Flask app and load the model. A similar work in the field of CNN has been done here : https://github.com/kumar-shridhar/Business-Card-Detector (Check `app.py`)
* You can use hosting services like Heroku and/or with Docker to host your app and show it to everyone. 
Example here: https://github.com/selimrbd/sentiment_analysis/blob/master/Dockerfile
