# Notes Adrian



- Modified the learning rate in Adam (both models); some further tuning can be done.
- Set log_every_n_steps=100.
- Since the optimization here is so sensitive to initialization, I also put the seed_everything(42) call before the second model's training, in case someone copies just the second model and runs it on its own.
- Reduced max_epochs=3000 (6k steps) to just max_steps=600.
- Note on reproducibility: always run the notebook from the start for sanity. Don't rerun cells with the previous model still instantiated; Jupyter notebooks are dangerous!
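
The effect of re-seeding can be checked directly. The sketch below uses plain `torch.manual_seed()` as a stand-in for `seed_everything()` (which additionally seeds Python's and NumPy's generators), and the helper name `init_weight` is made up for illustration.

```python
import torch

def init_weight():
    # Same kind of draw the handmade LSTM uses for each weight: one sample from N(0, 1).
    return torch.normal(mean=torch.tensor(0.0), std=torch.tensor(1.0))

torch.manual_seed(42)
first_run = [init_weight() for _ in range(3)]

# Without re-seeding, the generator has moved on, so a "second model"
# built in the same session would start from different weights.
unseeded_run = [init_weight() for _ in range(3)]

# Re-seeding right before building the second model restores the same draws.
torch.manual_seed(42)
reseeded_run = [init_weight() for _ in range(3)]

print(first_run)
print(unseeded_run)
print(reseeded_run)
```

The first and third lists match; the second does not, which is the situation you get when a cell is rerun without the seed call.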


# ------------




# 3 MAIN IDEAS FOR THE LSTM STATQUEST
- How to use TensorBoard to see how the model trained and decide if you should try adding more epochs to training
- How to add extra epochs without having to start over
- How to use PyTorch LSTM class torch.nn.LSTM()

# Questions
- How do we label runs for logging and TensorBoard? (Right now TensorBoard shows "version_0" vs "version_1",
  and it would be cool if we could have it say "homemade_lstm" vs "nn.LSTM()".)
- Is there an easy way to clean up the "lightning_logs" (delete them etc.)?
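
On the cleanup question: each run is written to its own `version_N` subdirectory, so removing old runs comes down to deleting directories. Here is a minimal sketch using only the standard library. The helper name `clear_lightning_logs` and its `keep_latest` parameter are made up for illustration, and it assumes the default `lightning_logs/version_N` layout.

```python
import shutil
import tempfile
from pathlib import Path

def clear_lightning_logs(log_dir="lightning_logs", keep_latest=0):
    """Delete version_N run directories under log_dir, optionally keeping the newest ones."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    prefix = "version_"
    runs = [p for p in root.iterdir()
            if p.is_dir() and p.name.startswith(prefix) and p.name[len(prefix):].isdigit()]
    # Sort numerically so version_10 comes after version_9.
    runs.sort(key=lambda p: int(p.name[len(prefix):]))
    cutoff = max(len(runs) - keep_latest, 0)
    to_delete = runs[:cutoff]
    for run in to_delete:
        shutil.rmtree(run)
    return [run.name for run in to_delete]

# Demo on a throwaway directory so nothing real gets deleted.
with tempfile.TemporaryDirectory() as tmp:
    for n in (0, 1, 10):
        (Path(tmp) / f"version_{n}" / "checkpoints").mkdir(parents=True)
    deleted = clear_lightning_logs(tmp, keep_latest=1)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(deleted)    # ['version_0', 'version_1']
print(remaining)  # ['version_10']
```

In the notebook's directory, `clear_lightning_logs(keep_latest=1)` would keep only the most recent run. Note that this also removes the checkpoints stored inside the deleted runs.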

This is, in theory, a super simple example of how Long Short-Term Memory Neural Networks work. We'll start by implementing a single "memory cell" that we'll use (reusing all the weights and biases) for each element in the input.
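
Written out, the memory cell computes the following for each input value $x_t$, starting from $c_0 = h_0 = 0$. The symbols are shorthand introduced here rather than notation from the StatQuest: $c_t$ is the long-term memory, $h_t$ is the short-term memory, and each $w$ and $b$ is one of the twelve scalar Weights and Biases.

```latex
\begin{aligned}
f_t &= \sigma(w_{f,h}\, h_{t-1} + w_{f,x}\, x_t + b_f) && \text{(stage 1: fraction of long-term memory to keep)} \\
i_t &= \sigma(w_{i,h}\, h_{t-1} + w_{i,x}\, x_t + b_i) && \text{(stage 2: fraction of potential memory to add)} \\
g_t &= \tanh(w_{g,h}\, h_{t-1} + w_{g,x}\, x_t + b_g) && \text{(stage 2: potential memory)} \\
o_t &= \sigma(w_{o,h}\, h_{t-1} + w_{o,x}\, x_t + b_o) && \text{(stage 3: fraction of long-term memory to output)} \\
c_t &= f_t\, c_{t-1} + i_t\, g_t \\
h_t &= o_t \tanh(c_t)
\end{aligned}
```

The same twelve values are used at every step $t$, which is the weight reuse mentioned above.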

First, import the modules...

In [2]:
import torch # torch will allow us to create tensors.
import torch.nn as nn # torch.nn allows us to create a neural network.
import torch.nn.functional as F # nn.functional gives us access to the activation and loss functions.
# from torch.optim import SGD # optim contains many optimizers. Here, we're using SGD, stochastic gradient descent.
from torch.optim import Adam, SGD # optim contains many optimizers. Here, we're using Adam

import lightning as L # lightning has tons of cool tools that make neural networks easier
from torch.utils.data import TensorDataset, DataLoader # these are needed for the training data

import matplotlib.pyplot as plt ## matplotlib allows us to draw graphs.
import seaborn as sns ## seaborn makes it easier to draw nice-looking graphs.

## Set the seed so that, hopefully, everyone will get the same results as me.
from lightning.pytorch import seed_everything # import from the same "lightning" package used above

In [11]:
## Here we are implementing an LSTM network by hand...
class BasicLightningTrain(L.LightningModule):

    def __init__(self):
        
        super().__init__()
        
        ###################
        ##
        ## Initialize the tensors for the LSTM
        ##
        ###################
        seed_everything(seed=42)
        
        ## NOTE: nn.LSTM() uses random values from a uniform distribution to initialize the tensors
        ## Here we can do it 2 different ways 1) Normal Distribution and 2) Uniform Distribution
        ## We'll start with the Normal Distribution...
        mean = torch.tensor(0.0)
        std = torch.tensor(1.0)        
        
        ## NOTE: We can initialize the Weights using the normal distribution (here), 
        ## or the uniform distribution (below). In this case all Biases are initialized to 0.
        self.stage1shortW = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage1inputW = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage1B = nn.Parameter(torch.tensor(0.), requires_grad=True)

        self.stage2shortW1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage2inputW1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage2B1 = nn.Parameter(torch.tensor(0.), requires_grad=True)

        self.stage2shortW2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage2inputW2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage2B2 = nn.Parameter(torch.tensor(0.), requires_grad=True)
        
        self.stage3shortW = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage3inputW = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.stage3B = nn.Parameter(torch.tensor(0.), requires_grad=True)
        
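        ## We can also initialize all Weights and Biases using a uniform distribution. This is
        ## similar to how nn.LSTM() does it. (NOTE: torch.rand() draws from [0, 1), while nn.LSTM()
        ## draws from -sqrt(k) to sqrt(k), where k = 1/hidden_size.)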
#         self.stage1shortW = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage1inputW = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage1B = nn.Parameter(torch.rand(1), requires_grad=True)

#         self.stage2shortW1 = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage2inputW1 = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage2B1 = nn.Parameter(torch.rand(1), requires_grad=True)

#         self.stage2shortW2 = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage2inputW2 = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage2B2 = nn.Parameter(torch.rand(1), requires_grad=True)
        
#         self.stage3shortW = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage3inputW = nn.Parameter(torch.rand(1), requires_grad=True)
#         self.stage3B = nn.Parameter(torch.rand(1), requires_grad=True)
        
        ## The default learning rate for Adam, the optimizer we are using instead of Stochastic Gradient Descent,
        ## is 0.001, which, in this case, means it takes a relatively long time to optimize the LSTM.
        ## The advantage, however, is that we will end up with the exact same Weights and Biases that I used
        ## in the "LSTMs Clearly Explained" StatQuest, which is cool. However, we'll also show how we can
        ## speed up training a ton by setting the learning rate to 0.1.
        self.learning_rate = 0.001
    
        ## Lastly, for the logger, we will keep track of which output we are trying to predict
        self.state = 0
        
        
    def lstm_unit(self, input_value, long_memory, short_memory):
        ## NOTES:
        ##  - long term memory is also called "cell state"
        ##  - short term memory is also called "hidden state"
        
        ## The first stage of the LSTM determines what percentage of the old long term memory we need to retain
        old_long_remember_percent = torch.sigmoid((short_memory * self.stage1shortW) + (input_value * self.stage1inputW) + self.stage1B)
        
        ## The second stage of the LSTM determines a potential memory and what percentage should be
        ## added to the current long term memory
        potential_remember_percent = torch.sigmoid((short_memory * self.stage2shortW1) + (input_value * self.stage2inputW1) + self.stage2B1)
        potential_memory = torch.tanh((short_memory * self.stage2shortW2) + (input_value * self.stage2inputW2) + self.stage2B2)
        
        ## The third, and final, stage of the LSTM determines what percentage of the long-term memory should be
        ## used to create a new short term memory.
        output_percent = torch.sigmoid((short_memory * self.stage3shortW) + (input_value * self.stage3inputW) + self.stage3B)
        
        long_memory = (long_memory * old_long_remember_percent) + (potential_remember_percent * potential_memory)
        short_memory = torch.tanh(long_memory) * output_percent
        return([long_memory, short_memory])
        
    
    def forward(self, input): 
        
        long_memory = 0 # long term memory is also called "cell state" and indexed with c0, c1, ..., cN
        short_memory = 0 # short term memory is also called "hidden state" and indexed with h0, h1, ..., hN
        day1 = input[0]
        day2 = input[1]
        day3 = input[2]
        day4 = input[3]
        
        ## Day 1
        long_memory, short_memory = self.lstm_unit(day1, long_memory, short_memory)
        
        ## Day 2
        long_memory, short_memory = self.lstm_unit(day2, long_memory, short_memory)
        
        ## Day 3
        long_memory, short_memory = self.lstm_unit(day3, long_memory, short_memory)
        
        ## Day 4
        long_memory, short_memory = self.lstm_unit(day4, long_memory, short_memory)
        
        ##### The "output value" (or values) from an LSTM come from the last short term memory
        return short_memory # final value for h4
        
        
    def configure_optimizers(self): # this configures the optimizer we want to use for backpropagation.
        # return Adam(self.parameters(), lr=0.1) # setting the learning rate to 0.1 trains way faster than
                                                 # using the default learning rate, lr=0.001, which requires a lot more 
                                                 # training. However, if we use the default value, we get 
                                                 # the exact same Weights and Biases that I used in
                                                 # the LSTM Clearly Explained StatQuest video...
        return Adam(self.parameters(), lr=self.learning_rate) 
    
    
    def training_step(self, batch, batch_idx): # take a step during gradient descent.
        input_i, label_i = batch # collect input
        output_i = self.forward(input_i[0]) # run input through the neural network
        loss = (output_i - label_i)**2 ## loss = squared residual
                
        ## logging...
        self.log("train_loss", loss)
        ## NOTE: Our dataset consists of two sequences of values representing Company A and Company B
        ## For Company A, the goal is to predict that the value on Day 5 = 0, and for Company B,
        ## the goal is to predict that the value on Day 5 = 1. We use "self.state" to keep track of
        ## which company we just made a prediction for and log that output value so we can see how
        ## well we are predicting each company's value.
        if (self.state == 0):
            self.state = 1
            self.log("out_0", output_i)
        else:
            self.state = 0
            self.log("out_1", output_i)
            
        return loss
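
The four explicit calls in forward() apply the same unit over and over, so they can also be written as a loop that handles a sequence of any length. The sketch below does that with plain Python floats (no autograd, so it is only useful for checking values, not for training). The function names and the dictionary of weights are hypothetical; the keys reuse the parameter names from the class above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_unit(x, long_memory, short_memory, w):
    """One step of the unit; w maps the parameter names used in the class to floats."""
    keep = sigmoid(short_memory * w["stage1shortW"] + x * w["stage1inputW"] + w["stage1B"])
    add = sigmoid(short_memory * w["stage2shortW1"] + x * w["stage2inputW1"] + w["stage2B1"])
    potential = math.tanh(short_memory * w["stage2shortW2"] + x * w["stage2inputW2"] + w["stage2B2"])
    out = sigmoid(short_memory * w["stage3shortW"] + x * w["stage3inputW"] + w["stage3B"])
    long_memory = long_memory * keep + add * potential
    short_memory = math.tanh(long_memory) * out
    return long_memory, short_memory

def run_sequence(values, w):
    # Both memories start at 0 and are carried from one value to the next.
    long_memory, short_memory = 0.0, 0.0
    for x in values:
        long_memory, short_memory = lstm_unit(x, long_memory, short_memory, w)
    return short_memory

names = ["stage1shortW", "stage1inputW", "stage1B",
         "stage2shortW1", "stage2inputW1", "stage2B1",
         "stage2shortW2", "stage2inputW2", "stage2B2",
         "stage3shortW", "stage3inputW", "stage3B"]

# Sanity check: with every parameter at 0, the potential memory is tanh(0) = 0,
# so both memories stay at 0 no matter what the inputs are.
zero_w = {name: 0.0 for name in names}
print(run_sequence([0.0, 0.5, 0.25, 1.0], zero_w))  # 0.0
```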

In [12]:
## create the training data for the neural network.
inputs = torch.tensor([[0., 0.5, 0.25, 1.], [1., 0.5, 0.25, 1.]])
labels = torch.tensor([0., 1.])

dataset = TensorDataset(inputs, labels) 
dataloader = DataLoader(dataset)
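
DataLoader's default batch size is 1, so each batch holds one company's sequence with an extra leading batch dimension. That is why training_step() uses input_i[0], and why one epoch here is 2 optimizer steps. A quick, self-contained shape check:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

inputs = torch.tensor([[0., 0.5, 0.25, 1.], [1., 0.5, 0.25, 1.]])
labels = torch.tensor([0., 1.])
dataloader = DataLoader(TensorDataset(inputs, labels))  # batch_size defaults to 1, shuffle to False

batches = list(dataloader)
for input_i, label_i in batches:
    print(input_i.shape, label_i.shape)  # torch.Size([1, 4]) torch.Size([1])

steps_per_epoch = len(dataloader)
print(steps_per_epoch)  # 2
```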

In [13]:
## Create the model object, print out parameters and see how well
## the untrained LSTM can make predictions...
model = BasicLightningTrain() 

print("Before...")
print("Parameters...")
for name, param in model.named_parameters():
    print(name, param.data)

print("\nOutput Values (Predictions)...")
print(model(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print(model(torch.tensor([1., 0.5, 0.25, 1.])).detach())

Global seed set to 42


Before...
Parameters...
stage1shortW tensor(0.3367)
stage1inputW tensor(0.1288)
stage1B tensor(0.)
stage2shortW1 tensor(0.2345)
stage2inputW1 tensor(0.2303)
stage2B1 tensor(0.)
stage2shortW2 tensor(-1.1229)
stage2inputW2 tensor(-0.1863)
stage2B2 tensor(0.)
stage3shortW tensor(2.2082)
stage3inputW tensor(-0.6380)
stage3B tensor(0.)

Output Values (Predictions)...
tensor(-0.0316)
tensor(-0.0323)


In [14]:
## Since the initial predictions are bad, we will train the model
## using a trainer.
model = BasicLightningTrain() 

# trainer = L.Trainer(max_steps=600, log_every_n_steps=1)
trainer = L.Trainer(max_epochs=4000)
trainer.fit(model, train_dataloaders=dataloader)
print("\nAfter...")
## print out the name and value for each parameter
print("Parameters...")
for name, param in model.named_parameters():
    print(name, param.data)

print("\nOutput Values...")
print(model(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print(model(torch.tensor([1., 0.5, 0.25, 1.])).detach())

Global seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name | Type | Params
------------------------------
------------------------------
12        Trainable params
0         Non-trainable params
12        Total params
0.000     Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]


After...
Parameters...
stage1shortW tensor(2.6675)
stage1inputW tensor(1.5465)
stage1B tensor(1.5411)
stage2shortW1 tensor(1.8835)
stage2inputW1 tensor(1.5822)
stage2B1 tensor(0.5024)
stage2shortW2 tensor(1.2124)
stage2inputW2 tensor(0.8829)
stage2B2 tensor(-0.3179)
stage3shortW tensor(4.2505)
stage3inputW tensor(-0.3005)
stage3B tensor(0.4795)

Output Values...
tensor(-0.0511)
tensor(0.9308)


In [None]:
## NOTE: We can run tensorboard inside this notebook or in its own browser window.
## When we run it in the notebook, it sometimes behaves funny. However, when we run it in its own browser window,
## it works every time, so we'll give it its own browser window.
##
## To run tensorboard in its own browser window...
## Go to the "File" menu and select "New Launcher". Then scroll down and click on "Terminal".
## In the terminal, navigate to the directory that contains the "lightning_logs" directory.
## Then in the terminal, enter...
## 
## tensorboard --logdir=lightning_logs/
##
## ...this will then start the tensorboard server and print out a URL (e.g. http://localhost:6007/ ). Copy the URL
## and paste it into a new browser window and then you are good to go!!!
##
## NOTE: If you are feeling daring and want to run tensorboard inside this notebook just uncomment the code below:
# %reload_ext tensorboard
# %tensorboard --logdir=lightning_logs/

In [15]:
## The logs suggest that maybe more training might help.
## Maybe adding 1000 more epochs will improve the model a little bit more.
path_to_best_checkpoint = trainer.checkpoint_callback.best_model_path ## By default, "best" = "most recent"
print("The new trainer will start where the last left off, and the check point data is here: " + 
      path_to_best_checkpoint + "\n")

trainer = L.Trainer(max_epochs=5000) # previously max_epochs was 4000, so we're adding 1000 more
trainer.fit(model, train_dataloaders=dataloader, ckpt_path=path_to_best_checkpoint)
print("\nAfter...")
## print out the name and value for each parameter
print("Parameters...")
for name, param in model.named_parameters():
    print(name, param.data)

print("\nOutput Values...")
print(model(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print(model(torch.tensor([1., 0.5, 0.25, 1.])).detach())

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at /Users/joshstarmer/My Drive/stat_quests/jupyter_notebooks_python_grid/lstm_demo/lightning_logs/version_8/checkpoints/epoch=3999-step=8000.ckpt

  | Name | Type | Params
------------------------------
------------------------------
12        Trainable params
0         Non-trainable params
12        Total params
0.000     Total estimated model params size (MB)

The new trainer will start where the last left off, and the check point data is here: /Users/joshstarmer/My Drive/stat_quests/jupyter_notebooks_python_grid/lstm_demo/lightning_logs/version_8/checkpoints/epoch=3999-step=8000.ckpt



Training: 2it [00:00, ?it/s]


After...
Parameters...
stage1shortW tensor(2.7043)
stage1inputW tensor(1.6307)
stage1B tensor(1.6234)
stage2shortW1 tensor(1.9983)
stage2inputW1 tensor(1.6525)
stage2B1 tensor(0.6204)
stage2shortW2 tensor(1.4122)
stage2inputW2 tensor(0.9393)
stage2B2 tensor(-0.3217)
stage3shortW tensor(4.3848)
stage3inputW tensor(-0.1943)
stage3B tensor(0.5935)

Output Values...
tensor(-0.0781)
tensor(0.9687)


The predictions, -0.08 for Company A and 0.97 for Company B, are pretty good considering the observed values were 0 and 1. However, we could probably improve the predictions even more if we ran the output of the LSTM through a fully connected neural network, where "fully connected neural network" is the most basic type of neural network and what comes to mind when most people say "neural network".
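
As a rough sketch of that idea (not something trained in this notebook): give nn.LSTM() a few hidden units and feed the last short-term memory into an nn.Linear() layer. The class name `LSTMWithLinear` and hidden_size=4 are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class LSTMWithLinear(nn.Module):
    def __init__(self, hidden_size=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size)
        self.fc = nn.Linear(in_features=hidden_size, out_features=1)

    def forward(self, values):
        # Reshape to (sequence length, batch size = 1, input size = 1).
        seq = values.view(len(values), 1, -1)
        lstm_out, _ = self.lstm(seq)             # shape: (sequence length, 1, hidden_size)
        # Use the last short-term memory as input to the fully connected layer.
        return self.fc(lstm_out[-1]).squeeze()   # a single scalar prediction

torch.manual_seed(42)
net = LSTMWithLinear()
prediction = net(torch.tensor([0., 0.5, 0.25, 1.]))
print(prediction.shape)  # torch.Size([])
```

Because the final layer is linear, the prediction is no longer confined to the range between -1 and 1 that the LSTM's own output is limited to.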

NOTE: These are the Weights and Biases used in the "LSTMs Clearly Explained" StatQuest...

After...
Parameters...
stage1shortW tensor(2.7043)
stage1inputW tensor(1.6307)
stage1B tensor(1.6234)
stage2shortW1 tensor(1.9983)
stage2inputW1 tensor(1.6525)
stage2B1 tensor(0.6204)
stage2shortW2 tensor(1.4122)
stage2inputW2 tensor(0.9393)
stage2B2 tensor(-0.3217)
stage3shortW tensor(4.3848)
stage3inputW tensor(-0.1943)
stage3B tensor(0.5935)

Output Values...
tensor(-0.0781)
tensor(0.9687)

In [18]:
## Now let's demonstrate how much faster we can train the LSTM by setting the learning rate to 0.1
## This will result in different Weights and Biases than I used in the "LSTMs Clearly Explained" StatQuest, but the
## predictions are just as good, if not better.
model = BasicLightningTrain() 
model.learning_rate = 0.1 # set the learning rate for Adam to 0.1

# trainer = L.Trainer(max_steps=600, log_every_n_steps=1) 
trainer = L.Trainer(max_epochs=300, log_every_n_steps=1) # NOTE: By default L.Trainer() logs every 50 steps, and since we are only
## doing 300 epochs, or 600 steps, that would mean logging only 12 times. So we tell L.Trainer() to log every step.

## NOTE: Before, we set max_epochs to 4000 and then did another 1000 epochs after that. Now we are only doing 600 steps, or 
## 300 epochs.
# trainer = L.Trainer(max_epochs=4000)
trainer.fit(model, train_dataloaders=dataloader)
print("\nAfter...")
## print out the name and value for each parameter
print("Parameters...")
for name, param in model.named_parameters():
    print(name, param.data)

print("\nOutput Values...")
print(model(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print(model(torch.tensor([1., 0.5, 0.25, 1.])).detach())


Global seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name | Type | Params
------------------------------
------------------------------
12        Trainable params
0         Non-trainable params
12        Total params
0.000     Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]


After...
Parameters...
stage1shortW tensor(2.5577)
stage1inputW tensor(1.5359)
stage1B tensor(1.5848)
stage2shortW1 tensor(1.4833)
stage2inputW1 tensor(1.9231)
stage2B1 tensor(0.3202)
stage2shortW2 tensor(1.8213)
stage2inputW2 tensor(1.5784)
stage2B2 tensor(-0.5503)
stage3shortW tensor(4.5694)
stage3inputW tensor(-0.1127)
stage3B tensor(0.5870)

Output Values...
tensor(-0.1021)
tensor(0.9868)


BAM!!! By increasing the learning rate from 0.001 to 0.1, we went from needing 5000 epochs to get decent predictions to only needing 300 epochs. The predictions now are a little different, but just as good. We wanted 0 for Company A and got -0.1, and we wanted 1 for Company B and got 0.99.
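
It can help to think in optimizer steps rather than epochs. With two training sequences and a batch size of 1, each epoch is 2 steps. The helper name `training_steps` below is made up for this small calculation.

```python
def training_steps(epochs, num_sequences=2, batch_size=1):
    # Ceiling division: a final, partially filled batch still counts as a step.
    steps_per_epoch = -(-num_sequences // batch_size)
    return epochs * steps_per_epoch

print(training_steps(4000))       # 8000, matching "epoch=3999-step=8000" in the checkpoint name
print(training_steps(5000))       # 10000 steps in total with lr=0.001
print(training_steps(300))        # 600 steps with lr=0.1
print(training_steps(300) // 50)  # 12 log entries at the default of logging every 50 steps
```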

In [8]:
# %reload_ext tensorboard
# %tensorboard --logdir=lightning_logs/

In [19]:
## now, instead of coding an LSTM by hand, let's see what we can do with PyTorch's nn.LSTM()

class LightningLSTM(L.LightningModule):

    def __init__(self): # __init__() is the class constructor function, and we use it to initialize the weights and biases.
        
        super().__init__() # initialize an instance of the parent class, LightningModule.

        seed_everything(seed=42)
        
        ## input_size = number of features (or variables) in the data. In our example
        ##              we only have a single feature (value)
        ## hidden_size = this determines the dimension of the output
        ##               in other words, if we set hidden_size=1, then we have 1 output node
        ##               if we set hidden_size=50, then we have 50 output nodes (that can then be 50 input
        ##               nodes to a subsequent fully connected neural network).
        self.lstm = nn.LSTM(input_size=1, hidden_size=1) 
        
        self.hidden = (torch.zeros(1,1,1), # init hidden state (short-term memory) to 0
                       torch.zeros(1,1,1)) # init cell state (long-term memory) to 0.

        
        self.state = 0 # this keeps track of which output we are trying to predict for logging
    
    def forward(self, input):
        ## reshape the input vector to (sequence length, batch size = 1, input size = 1), the layout nn.LSTM() expects
        input_trans = input.view(len(input),1,-1)
        
        # print("input:", str(input) + str(input.shape))
        # print("input_trans:", str(input_trans) + str(input_trans.shape))
        
        ## run it through the LSTM unit (which automatically unrolls for us)
        # lstm_out, self.hidden = self.lstm(input_trans, self.hidden)
        lstm_out, self.hidden = self.lstm(input_trans)
        
        ## lstm_out has the short-term memories for all inputs. We make our prediction with the last one
        prediction = lstm_out[-1] 
        return prediction
        
        
    def configure_optimizers(self): # this configures the optimizer we want to use for backpropagation.
        return Adam(self.parameters(), lr=0.1) ## we'll just go ahead and set the learning rate to 0.1

    
    def training_step(self, batch, batch_idx): # take a step during gradient descent.
        input_i, label_i = batch # collect input
        output_i = self.forward(input_i[0]) # run input through the neural network
        loss = (output_i - label_i)**2 ## loss = squared residual
        
        ## logging...
        self.log("train_loss", loss)
        if (self.state == 0):
            self.state = 1
            self.log("out_0", output_i)
        else:
            self.state = 0
            self.log("out_1", output_i)
            
        return loss

In [20]:
# 
model_lstm = LightningLSTM() # First, make model from the class
model_lstm(torch.tensor([0., 0.5, 0.25, 0.75]))
print("Before...")
## print out the name and value for each parameter
print("Parameters...")
for name, param in model_lstm.named_parameters():
    print(name, param.data)

print("\nOutput Values...")
print(model_lstm(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print(model_lstm(torch.tensor([1., 0.5, 0.25, 1.])).detach())

Before...
Parameters...
lstm.weight_ih_l0 tensor([[ 0.1353],
        [-0.8037],
        [-0.3339],
        [ 0.9626]])
lstm.weight_hh_l0 tensor([[-0.2466],
        [-0.0502],
        [-0.8303],
        [-0.5594]])
lstm.bias_ih_l0 tensor([-0.0204, -0.6212, -0.1240,  0.4070])
lstm.bias_hh_l0 tensor([-0.9782,  0.2970, -0.6612, -0.4881])

Output Values...
tensor([[-0.1920]])
tensor([[-0.1927]])


In [11]:
seed_everything(seed=42)

model_lstm = LightningLSTM() # First, make model from the class

## create the training data for the neural network.
inputs = torch.tensor([[0., 0.5, 0.25, 1.], [1., 0.5, 0.25, 1.]])
labels = torch.tensor([0., 1.])

dataset = TensorDataset(inputs, labels) 
dataloader = DataLoader(dataset)


trainer = L.Trainer(max_steps=600, log_every_n_steps=1)
trainer.fit(model_lstm, train_dataloaders=dataloader)
print("\nAfter...")
## print out the name and value for each parameter
print("Parameters...")
for name, param in model_lstm.named_parameters():
    print(name, param.data)

print("\nOutput Values...")
print(model_lstm(torch.tensor([0., 0.5, 0.25, 1.])).detach())
print(model_lstm(torch.tensor([1., 0.5, 0.25, 1.])).detach())

Global seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name | Type | Params
------------------------------
0 | lstm | LSTM | 16    
------------------------------
16        Trainable params
0         Non-trainable params
16        Total params
0.000     Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]

`Trainer.fit` stopped: `max_steps=600` reached.



After...
Parameters...
lstm.weight_ih_l0 tensor([[3.5364],
        [1.3869],
        [1.5390],
        [1.2488]])
lstm.weight_hh_l0 tensor([[5.2070],
        [2.9577],
        [3.2652],
        [2.0678]])
lstm.bias_ih_l0 tensor([-0.9143,  0.3724, -0.1815,  0.6376])
lstm.bias_hh_l0 tensor([-1.0570,  1.2414, -0.5685,  0.3092])

Output Values...
tensor([[-0.1887]])
tensor([[0.9752]])


In [14]:
# %reload_ext tensorboard
# %tensorboard --logdir=lightning_logs/

Reusing TensorBoard on port 6006 (pid 63773), started 13:27:43 ago. (Use '!kill 63773' to kill it.)

In [None]:
# tensorboard --logdir=lightning_logs/

In [15]:
test = nn.LSTM(input_size=1, hidden_size=1)
print("Parameters...")
for name, param in test.named_parameters():
    print(name, param.data)

Parameters...
weight_ih_l0 tensor([[-0.1648],
        [-0.5693],
        [-0.1617],
        [ 0.8111]])
weight_hh_l0 tensor([[-0.7420],
        [ 0.2270],
        [-0.9828],
        [ 0.5243]])
bias_ih_l0 tensor([0.3695, 0.0424, 0.4292, 0.0011])
bias_hh_l0 tensor([ 0.5534, -0.7916, -0.1469,  0.4436])
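
Each of these tensors has four rows because nn.LSTM() stacks the parameters for its four gates. According to the PyTorch documentation the row order is input gate, forget gate, cell (potential memory), output gate. In terms of the handmade unit that is stage 2 (percentage), stage 1, stage 2 (potential memory), and stage 3. nn.LSTM() also keeps two bias vectors that are simply added together, which is why it reports 16 parameters instead of 12. The sketch below copies a set of made-up handmade-style values into nn.LSTM() and compares the outputs; the numbers and the `handmade_forward` helper are arbitrary illustrations, not values from training.

```python
import torch
import torch.nn as nn

# Made-up values: one (short-memory weight, input weight, bias) triple per gate.
gates = {
    "input":  (0.3, -0.2, 0.1),    # stage 2, percentage of the potential memory to add
    "forget": (0.5, 0.4, -0.3),    # stage 1, percentage of long-term memory to keep
    "cell":   (-0.6, 0.7, 0.2),    # stage 2, the potential memory itself
    "output": (0.8, -0.1, 0.05),   # stage 3, percentage to output
}

def handmade_forward(values):
    c = torch.tensor(0.0)  # long-term memory (cell state)
    h = torch.tensor(0.0)  # short-term memory (hidden state)
    for x in values:
        z = {k: h * ws + x * wi + b for k, (ws, wi, b) in gates.items()}
        c = c * torch.sigmoid(z["forget"]) + torch.sigmoid(z["input"]) * torch.tanh(z["cell"])
        h = torch.tanh(c) * torch.sigmoid(z["output"])
    return h

lstm = nn.LSTM(input_size=1, hidden_size=1)
order = ["input", "forget", "cell", "output"]  # row order used by nn.LSTM()
with torch.no_grad():
    lstm.weight_hh_l0.copy_(torch.tensor([[gates[k][0]] for k in order]))
    lstm.weight_ih_l0.copy_(torch.tensor([[gates[k][1]] for k in order]))
    lstm.bias_ih_l0.copy_(torch.tensor([gates[k][2] for k in order]))
    lstm.bias_hh_l0.zero_()  # put the whole bias in bias_ih_l0 and zero the second vector

values = torch.tensor([0., 0.5, 0.25, 1.])
with torch.no_grad():
    out, _ = lstm(values.view(-1, 1, 1))

# The two printed values should agree up to floating point error.
print(handmade_forward(values), out[-1].squeeze())
```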
