# Quick Tutorial for the Revised Tensor Network Code
Here's a quick overview of how to use the revised code for optimizing a tensor network (TN) with fixed ranks _(credit: Michelle for writing the initial version of the code)_. The current version is a bit more modular, so that we can use the same continuous optimization routine for each of the problems being solved. The ingredients that still need to be implemented are:

1. Code implementing the different discrete optimization procedures
2. Loss function for the tensor completion task, which takes in our TN and a dataset of known tensor elements, and returns the average loss over these known elements
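
One possible shape for the completion loss in item 2 is sketched below. It assumes the TN has already been expanded into a dense array (e.g. with `cc.expand_network`), and that the dataset is an integer index array plus a value array. Both that data format and the name `completion_loss_sketch` are hypothetical. Mean squared error is just one choice of per-element loss, and a final version for large TNs would probably want to avoid expanding the full tensor.

```python
import numpy as np

def completion_loss_sketch(dense, indices, values):
    """Mean squared error over the known entries of a tensor (hypothetical).

    dense   : array of shape (d_0, ..., d_{N-1}), e.g. an expanded TN
    indices : int array of shape (n, N), one row per known element
    values  : array of shape (n,), the known element values
    """
    indices = np.asarray(indices)
    # One index array per axis picks out the n known entries at once
    predicted = dense[tuple(indices.T)]
    return np.mean((predicted - np.asarray(values)) ** 2)

# Toy check: a 2x3 "model" tensor and two known entries
dense = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
indices = [[0, 1], [1, 2]]             # entries (0, 1) and (1, 2)
values = [1.0, 7.0]                    # the model predicts 1 and 5 there
print(completion_loss_sketch(dense, indices, values))  # (0**2 + 2**2) / 2 = 2.0
```

The same tuple-of-index-arrays indexing should carry over to torch tensors (with a `torch.LongTensor` of indices), which would keep the loss differentiable for `continuous_optim`.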

Before starting on those pieces, though, it's helpful to first understand how to interface with the continuous optimization code.

In [1]:
import sys

import torch
import tensornetwork as tn

# Make sure notebook can find the core code
sys.path.append("..")
import core_code as cc

# Set random seed for reproducibility
_ = torch.manual_seed(0)

## Initializing TNs
Tensor networks of arbitrary rank are initialized with the `random_tn` function, and the rank can be specified in one of several ways. A scalar `rank` argument gives a constant-rank TN (default: rank=1), while a list lets you set the rank of each edge individually. Note how the ranks are displayed: the entries above the diagonal give the TN ranks, and the diagonal entries give the input dimension of each core _(credit: Meraj for the idea of putting the input dims on the diagonals)_.
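
To make the display convention concrete, here's a plain-Python sketch that builds the same upper-triangular matrix from `input_dims` and a `rank` argument (either one integer or a nested list whose row i holds the ranks of edges (i, i+1), ..., (i, N-1)). The function `rank_matrix` is hypothetical and only mirrors the printed format; `cc.print_ranks` may construct it differently.

```python
def rank_matrix(input_dims, rank=1):
    """Upper-triangular rank matrix in the display format used by cc.print_ranks.

    Entry (i, i) is the input dimension of core i; entry (i, j) with i < j is
    the rank of the edge between cores i and j; entries below the diagonal are 0.
    """
    n = len(input_dims)
    mat = [[0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = input_dims[i]
        for j in range(i + 1, n):
            # Scalar rank: same rank on every edge; list: row i, column j-i-1
            mat[i][j] = rank if isinstance(rank, int) else rank[i][j - i - 1]
    return mat

for row in rank_matrix([2, 4, 5, 6], [[1, 2, 3], [5, 8], [13]]):
    print(row)
```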

In [2]:
# The dimensions of each of the inputs to our tensor network (TN)
input_dims = [2, 4, 5, 6]

# Initialize a random rank-1 TN (default rank is 1)
example_tn = cc.random_tn(input_dims)

print("Rank-1 TN has ranks")
cc.print_ranks(example_tn)
print("...and input dimensions")
print(cc.get_indims(example_tn))

# TNs can be expanded into dense tensors with expand_network
print(f"\nShape of TN is {cc.expand_network(example_tn).shape}")

Rank-1 TN has ranks
tensor([[2, 1, 1, 1],
        [0, 4, 1, 1],
        [0, 0, 5, 1],
        [0, 0, 0, 6]])
...and input dimensions
(2, 4, 5, 6)

Shape of TN is torch.Size([2, 4, 5, 6])
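
For intuition about what `expand_network` computes, here's a numpy sketch of the full contraction of a fully-connected TN stored as a plain list of arrays. The core layout (axis i of core i is its input, axis j is the edge shared with core j) and the helpers `expand_cores` and `random_cores` are assumptions for illustration; `core_code` likely builds on the `tensornetwork` library imported above and may lay its cores out differently.

```python
import string
import numpy as np

def expand_cores(cores):
    """Contract a fully-connected TN (list of arrays) into a dense tensor.

    Assumed layout: core i has one axis per core, where axis i (size d_i) is
    its input and axis j != i (size r_ij) is the edge shared with core j.
    """
    n = len(cores)
    letters = iter(string.ascii_letters)  # 52 labels, enough for up to 9 cores
    # One einsum label per pair i <= j: (i, i) is an output axis, (i, j) an edge
    label = {(i, j): next(letters) for i in range(n) for j in range(i, n)}
    inputs = ["".join(label[min(i, j), max(i, j)] for j in range(n))
              for i in range(n)]
    output = "".join(label[i, i] for i in range(n))
    return np.einsum(",".join(inputs) + "->" + output, *cores, optimize=True)

def random_cores(input_dims, rank, seed=0):
    """Random cores with the same rank on every edge (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(input_dims)
    return [rng.standard_normal(tuple(input_dims[i] if i == j else rank
                                      for j in range(n)))
            for i in range(n)]

cores = random_cores([2, 4, 5, 6], rank=3)
print(expand_cores(cores).shape)  # (2, 4, 5, 6)
```

Every edge label appears in exactly two cores, so einsum sums it out, while each diagonal label appears once and survives as one of the dense tensor's axes.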


In [3]:
# Random TNs with higher ranks can also be defined
big_tn = cc.random_tn(input_dims, rank=10)

print("Rank-10 TN has ranks")
cc.print_ranks(big_tn)
print(f"...and has {cc.num_params(big_tn)} parameters")
print()

# As our trainable model, let's use a rank-3 TN
base_tn = cc.random_tn(input_dims, rank=3)

# Individual ranks of TN edges can be set with a list in upper-triangular
# format, where row i holds the ranks of edges (i, i+1), ..., (i, N-1)
# (this is the TN used as a target in the following)
rank_list = [[1, 2, 3],
                [5, 8],
                   [13]]
goal_tn = cc.random_tn(input_dims, rank=rank_list)

print("Irregularly-shaped goal TN has ranks")
cc.print_ranks(goal_tn)
print(f"This TN has {cc.num_params(goal_tn)} parameters")

Rank-10 TN has ranks
tensor([[ 2, 10, 10, 10],
        [ 0,  4, 10, 10],
        [ 0,  0,  5, 10],
        [ 0,  0,  0,  6]])
...and has 17000 parameters

Irregularly-shaped goal TN has ranks
tensor([[ 2,  1,  2,  3],
        [ 0,  4,  5,  8],
        [ 0,  0,  5, 13],
        [ 0,  0,  0,  6]])
This TN has 2694 parameters
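
The parameter counts follow directly from the core shapes: core i holds d_i times the product of the ranks of its edges. For the goal TN, for instance, core 3 contributes 6 × 3 × 8 × 13 = 1872 of the 2694 parameters. The plain-Python `count_params` below is a hypothetical helper (not part of `core_code`) that reproduces the printed totals under this assumption about the core layout.

```python
from math import prod

def count_params(input_dims, rank=1):
    """Parameter count of a fully-connected TN (hypothetical helper).

    Core i stores d_i * prod_{j != i} r_ij numbers. `rank` is an int or an
    upper-triangular nested list like rank_list above.
    """
    n = len(input_dims)

    def edge_rank(i, j):
        i, j = min(i, j), max(i, j)
        return rank if isinstance(rank, int) else rank[i][j - i - 1]

    return sum(input_dims[i] * prod(edge_rank(i, j) for j in range(n) if j != i)
               for i in range(n))

dims = [2, 4, 5, 6]
print(count_params(dims, rank=10))                         # 17000
print(count_params(dims, rank=[[1, 2, 3], [5, 8], [13]]))  # 2694
print(count_params(dims, rank=3))                          # 459 (base_tn)
```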


## Tensor Recovery

Let's now see how to use continuous optimization in a tensor recovery problem. The key difference from the earlier version of the code is that a loss function is passed to `continuous_optim`, and this loss function specifies the type of problem being solved.

In this case we use `tensor_recovery_loss`, but in general the loss function must take in the TN being trained and the target data (in a problem-dependent format), and return a loss value. In other words:

`loss_value = loss_fun(our_tn, target_data)`
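
As an illustration of this contract, the sketch below defines a hypothetical relative-error recovery loss and a bare-bones SGD loop that relies only on the `loss_fun(our_tn, target_data)` signature. To stay self-contained it stores the TN as a plain list of torch cores and uses a dense target tensor; `core_code` TNs are likely `tensornetwork` objects, `goal_tn` is itself a TN, and the exact definition of `cc.tensor_recovery_loss` isn't shown here, so the names `expand`, `relative_recovery_loss`, `train`, and `random_cores` are all assumptions.

```python
import string
import torch

def expand(cores):
    """Dense tensor from a fully-connected TN given as a list of cores.

    Assumed layout: axis i of core i is its input, axis j is the edge to core j.
    """
    n = len(cores)
    letters = iter(string.ascii_letters)
    label = {(i, j): next(letters) for i in range(n) for j in range(i, n)}
    ins = ["".join(label[min(i, j), max(i, j)] for j in range(n)) for i in range(n)]
    out = "".join(label[i, i] for i in range(n))
    return torch.einsum(",".join(ins) + "->" + out, *cores)

def relative_recovery_loss(cores, target):
    """Hypothetical loss: squared Frobenius error divided by ||target||^2."""
    return torch.sum((expand(cores) - target) ** 2) / torch.sum(target ** 2)

def train(cores, target_data, loss_fun, steps=200, lr=1e-2):
    """Bare-bones SGD loop that only relies on loss_fun(cores, target_data)."""
    cores = [c.detach().clone().requires_grad_(True) for c in cores]
    optimizer = torch.optim.SGD(cores, lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fun(cores, target_data).backward()
        optimizer.step()
    with torch.no_grad():
        final_loss = loss_fun(cores, target_data).item()
    return cores, final_loss

def random_cores(dims, rank):
    """Random cores with the same rank on every edge."""
    n = len(dims)
    return [torch.randn([dims[i] if i == j else rank for j in range(n)])
            for i in range(n)]

torch.manual_seed(0)
dims, rank = [2, 3, 4], 2
target = expand(random_cores(dims, rank))   # dense target tensor
init_cores = random_cores(dims, rank)       # model to be trained
trained_cores, final_loss = train(init_cores, target, relative_recovery_loss)
```

Because the loop never inspects the loss or data format, swapping in a regression or completion loss (with matching data) changes the learning task without touching the optimization code, which is the design `continuous_optim` follows.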

In [4]:
# To train, the tensor network cores must first be made trainable
base_tn = cc.make_trainable(base_tn)

# continuous_optim requires the following as input:
# (1) A tensor network model, base_tn
# (2) A target dataset, in this case just goal_tn
# (3) A loss function of the form loss_fun(base_tn, batch), where 
#     batch is a minibatch of training data (just goal_tn here)
# The choice of (2)+(3) fully determines the learning task, with other 
# problems taking regular datasets as inputs

# For tensor recovery, use tensor_recovery_loss from core_code module
# Remember, goal_tn is the weirdly-shaped TN and base_tn has rank 3
loss_fun = cc.tensor_recovery_loss
trained_tn, init_loss, final_loss = cc.continuous_optim(base_tn, goal_tn, 
                                                        loss_fun)
print(f"Train loss went from {init_loss:.3f} to {final_loss:.3f} in 10 epochs\n")

# Note that trained_tn gives the model after training, which will be
# needed for the discrete optimization algorithms. To continue training
# the trained model, just rerun the code above with trained_tn as input
_, init_loss, final_loss = cc.continuous_optim(trained_tn, goal_tn, loss_fun)
print("Note how the loss continued decreasing from where it had left off")

  EPOCH 1 
    Train loss: 10856.338
  EPOCH 2 
    Train loss: 10659.803
  EPOCH 3 
    Train loss: 10486.983
  EPOCH 4 
    Train loss: 10333.783
  EPOCH 5 
    Train loss: 10196.966
  EPOCH 6 
    Train loss: 10073.937
  EPOCH 7 
    Train loss: 9962.594
  EPOCH 8 
    Train loss: 9861.219
  EPOCH 9 
    Train loss: 9768.391
  EPOCH 10 
    Train loss: 9682.927

Train loss went from 10856.338 to 9682.927 in 10 epochs

  EPOCH 1 
    Train loss: 9603.833
  EPOCH 2 
    Train loss: 9530.271
  EPOCH 3 
    Train loss: 9461.524
  EPOCH 4 
    Train loss: 9396.978
  EPOCH 5 
    Train loss: 9336.105
  EPOCH 6 
    Train loss: 9278.445
  EPOCH 7 
    Train loss: 9223.598
  EPOCH 8 
    Train loss: 9171.211
  EPOCH 9 
    Train loss: 9120.974
  EPOCH 10 
    Train loss: 9072.611

Note how the loss continued decreasing from where it had left off


## Customizing the Continuous Optimization Procedure

Although we can't tweak the `continuous_optim` code itself for each problem type, the `other_args` argument (a dictionary) still gives us a lot of flexibility. Let's explore the options `other_args` currently supports; more can be added later if needed.
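
One way an options dictionary like `other_args` can be handled is sketched below: missing keys are filled from defaults, and the optimizer class is looked up in `torch.optim` by name. The default values are the ones listed in this notebook, but `DEFAULTS`, `resolve_args`, and `make_optimizer` are hypothetical names, and `continuous_optim` may handle its options differently (for example, it may not reject unknown keys).

```python
import torch

# Defaults as documented in this notebook (hypothetical storage format)
DEFAULTS = {"optim": "SGD", "lr": 1e-3, "batch": 100, "print": True, "reps": 1}

def resolve_args(other_args=None):
    """Fill in any option the caller left out with its default value."""
    other_args = other_args or {}
    unknown = set(other_args) - set(DEFAULTS)
    if unknown:
        # Catch typos like 'lrr' instead of silently ignoring them
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    return {**DEFAULTS, **other_args}

def make_optimizer(params, args):
    """Look up the optimizer class in torch.optim by name, e.g. 'Adam'."""
    optim_class = getattr(torch.optim, args["optim"])
    return optim_class(params, lr=args["lr"])

args = resolve_args({"optim": "Adam"})
weights = [torch.zeros(3, requires_grad=True)]
optimizer = make_optimizer(weights, args)
print(type(optimizer).__name__, args["lr"], args["batch"])  # Adam 0.001 100
```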

In [5]:
# 10 epochs is the default, but you can change this
cc.continuous_optim(base_tn, goal_tn, loss_fun, epochs=2)

# Feeding a dictionary as other_args arg of continuous_optim lets you 
# control the optimizer (chosen from torch.optim), and lots else
print("Does Adam do any better than SGD? Let's find out!")
adam = {'optim': 'Adam'}   # Default: 'SGD'

_ = cc.continuous_optim(base_tn, goal_tn, loss_fun, other_args=adam)
print("Nope\n")

# Other important arguments are learning rate and batch size, shown
# below with their default values (this dict is only for illustration
# and isn't passed to continuous_optim in this cell)
other_args = {'lr':    1e-3,
              'batch': 100}

# You can also run optimization silently via the `print` argument
silent = {'print': False}      # Default: True
print("Beginning silent training...")
_, init_loss, final_loss = cc.continuous_optim(base_tn, goal_tn, loss_fun, 
                                               other_args=silent)
print("Silent training finished")
print(f"Train loss went from {init_loss:.3f} to {final_loss:.3f} in 10 epochs\n")

# For tensor recovery, there is only one item in our dataset (goal_tn),
# which gives only one gradient step per epoch. The `reps` argument makes
# each epoch pass through the training data several times, which cuts
# down on printing
print("For recovery, it's useful to go through the dataset many times per epoch")
print("Note: This is just a trick to avoid excessive printing")
lotsa_reps = {'reps': 10}   # Default: 1
_ = cc.continuous_optim(base_tn, goal_tn, loss_fun, other_args=lotsa_reps)

  EPOCH 1 
    Train loss: 10856.338
  EPOCH 2 
    Train loss: 10659.803

Does Adam do any better than SGD? Let's find out!
  EPOCH 1 
    Train loss: 10856.338
  EPOCH 2 
    Train loss: 10849.237
  EPOCH 3 
    Train loss: 10842.151
  EPOCH 4 
    Train loss: 10835.081
  EPOCH 5 
    Train loss: 10828.027
  EPOCH 6 
    Train loss: 10820.989
  EPOCH 7 
    Train loss: 10813.968
  EPOCH 8 
    Train loss: 10806.963
  EPOCH 9 
    Train loss: 10799.975
  EPOCH 10 
    Train loss: 10793.005

Nope

Beginning silent training...
Silent training finished
Train loss went from 10856.338 to 9682.927 in 10 epochs

For recovery, it's useful to go through the dataset many times per epoch
Note: This is just a trick to avoid excessive printing
  EPOCH 1 (10 reps)
    Train loss: 10188.294
  EPOCH 2 (10 reps)
    Train loss: 9319.555
  EPOCH 3 (10 reps)
    Train loss: 8833.026
  EPOCH 4 (10 reps)
    Train loss: 8426.848
  EPOCH 5 (10 reps)
    Train loss: 7988.277
  EPOCH 6 (10 reps)
    Train lo
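
To see how `batch` (default 100) and `reps` (default 1) could jointly set the number of gradient steps per epoch, here is a minimal sketch. The generator `epoch_batches` is hypothetical and assumes the dataset is a sliceable sequence that is not shuffled; the real `continuous_optim` may batch or shuffle differently.

```python
def epoch_batches(dataset, batch=100, reps=1):
    """Yield the minibatches making up one epoch (hypothetical helper).

    The dataset is traversed `reps` times in chunks of at most `batch` items,
    so an epoch has reps * ceil(len(dataset) / batch) gradient steps.
    """
    for _ in range(reps):
        for start in range(0, len(dataset), batch):
            yield dataset[start:start + batch]

# Tensor recovery: a one-item "dataset" gives one step per pass
print(len(list(epoch_batches(["goal_tn"], reps=1))))   # 1
print(len(list(epoch_batches(["goal_tn"], reps=10))))  # 10
# 10,000 regression samples with batch=100 give 100 steps per epoch
print(len(list(epoch_batches(list(range(10000))))))    # 100
```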

## Regression of Scalar-Valued Function

Continuous optimization for function regression works exactly as it does for tensor recovery, just with a different loss function and target dataset format. To generate this data from a TN, use `generate_regression_data`.
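
The notebook doesn't spell out which scalar function a TN defines for regression. A common convention, assumed here, is a multilinear map: contract axis i of the (expanded) tensor with an input vector x_i, and add Gaussian noise with standard deviation `noise` to the result. The sketch below follows that assumption with standard-normal inputs; `generate_regression_data_sketch` is hypothetical, and the real `generate_regression_data` may use a different input distribution or data format.

```python
import numpy as np

def generate_regression_data_sketch(dense, num, noise=1e-6, seed=0):
    """Random inputs and noisy scalar outputs of a multilinear function.

    Assumes the TN (already expanded to `dense`, shape (d_0, ..., d_{N-1}))
    defines f(x_0, ..., x_{N-1}) by contracting axis i with the vector x_i,
    and that every input vector is standard normal.
    """
    rng = np.random.default_rng(seed)
    inputs = [rng.standard_normal((num, d)) for d in dense.shape]
    axes = "abcdefghijklmnopqrstuvwxy"[: dense.ndim]  # one letter per input
    # e.g. "abc,za,zb,zc->z" for N = 3: contract each axis, keep sample axis z
    spec = axes + "," + ",".join("z" + a for a in axes) + "->z"
    outputs = np.einsum(spec, dense, *inputs)
    outputs = outputs + noise * rng.standard_normal(num)  # Gaussian noise
    return inputs, outputs

# Toy target: a 2x3 matrix A, so that f(x_0, x_1) = x_0^T A x_1
A = np.arange(6.0).reshape(2, 3)
inputs, outputs = generate_regression_data_sketch(A, num=5, noise=0.0)
print(np.allclose(outputs, np.einsum("ni,ij,nj->n", inputs[0], A, inputs[1])))  # True
```

Under this convention, `regression_loss` would compare the model TN's outputs on the same inputs against these noisy targets.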

In [6]:
# The generate_regression_data function takes in a target TN and a dataset
# size, and produces a pair of random inputs and associated (noisy) outputs
num_train = 10000
# The noise argument sets the StDev of Gaussian noise added to outputs
# (default: 1e-6)
train_data = cc.generate_regression_data(goal_tn, num_train, noise=1e-6)

# For regression, use cc.regression_loss
loss_fun = cc.regression_loss
_ = cc.continuous_optim(base_tn, train_data, loss_fun)

# Since we're doing machine learning, it's good to have a held-out
# validation set for estimating generalization, early stopping, etc.
# This is easy to do:
num_val = 1000
val_data = cc.generate_regression_data(goal_tn, num_val)
print("Same training process, but with validation data")
_ = cc.continuous_optim(base_tn, train_data, loss_fun, val_data=val_data)

# It appears there's a lot of overfitting going on! In this case, we can
# use the validation loss to choose the stopping time, which is done by
# setting epochs=None in continuous_optim
_, _, best_loss = cc.continuous_optim(base_tn, train_data, loss_fun, 
                                      val_data=val_data, epochs=None)
print(f"Lowest validation error was {best_loss:.3f}")

  EPOCH 1 
    Train loss: 9357.021
  EPOCH 2 
    Train loss: 8719.072
  EPOCH 3 
    Train loss: 8234.186
  EPOCH 4 
    Train loss: 7839.052
  EPOCH 5 
    Train loss: 7498.393
  EPOCH 6 
    Train loss: 7191.073
  EPOCH 7 
    Train loss: 6903.963
  EPOCH 8 
    Train loss: 6629.097
  EPOCH 9 
    Train loss: 6362.363
  EPOCH 10 
    Train loss: 6102.885

Same training process, but with validation data
    Val. loss:  10309.398
  EPOCH 1 
    Train loss: 9357.021
    Val. loss:  10387.790
  EPOCH 2 
    Train loss: 8719.072
    Val. loss:  10455.533
  EPOCH 3 
    Train loss: 8234.186
    Val. loss:  10514.973
  EPOCH 4 
    Train loss: 7839.052
    Val. loss:  10568.012
  EPOCH 5 
    Train loss: 7498.393
    Val. loss:  10616.134
  EPOCH 6 
    Train loss: 7191.073
    Val. loss:  10660.522
  EPOCH 7 
    Train loss: 6903.963
    Val. loss:  10702.162
  EPOCH 8 
    Train loss: 6629.097
    Val. loss:  10741.914
  EPOCH 9 
    Train loss: 6362.363
    Val. loss:  10780.555
  EPOC

## Weird Behavior in Regression

Strangely, the validation error for function regression increases from the very first epoch, even as the training loss decreases, which shouldn't be happening. I played around with different target tensors, and this doesn't always occur. However, the validation loss is consistently much larger than the training loss for function regression. Some experimentation is needed here!

Another issue comes at the end of training, where the loss seems to oscillate a lot around a final value. A learning rate that decreases over the course of training would likely help here, and might also improve the problem mentioned above. This isn't currently supported, but it would be easy to implement by passing scheduler-specific arguments via `other_args` and initializing one of the schedulers in `torch.optim.lr_scheduler`.
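
The scheduler idea could look roughly like the following sketch. The `other_args` keys `scheduler` and `scheduler_args` are hypothetical (not currently supported by `continuous_optim`), and `StepLR`, which multiplies the learning rate by `gamma` every `step_size` epochs, is just one of the classes available in `torch.optim.lr_scheduler`.

```python
import torch

# Hypothetical scheduler options that could be passed through other_args
other_args = {"lr": 1e-3, "scheduler": "StepLR",
              "scheduler_args": {"step_size": 5, "gamma": 0.5}}

params = [torch.zeros(3, requires_grad=True)]
optimizer = torch.optim.SGD(params, lr=other_args["lr"])
scheduler_class = getattr(torch.optim.lr_scheduler, other_args["scheduler"])
scheduler = scheduler_class(optimizer, **other_args["scheduler_args"])

lrs = []
for epoch in range(12):
    # ... one epoch of gradient steps would go here; with no gradients
    # computed, optimizer.step() leaves the parameters unchanged ...
    optimizer.step()
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()  # update the learning rate once per epoch

# lrs: 1e-3 for epochs 0-4, 5e-4 for epochs 5-9, 2.5e-4 for epochs 10-11
```

Calling `scheduler.step()` once per epoch, after the optimizer steps, is the ordering PyTorch expects; per-iteration schedulers would instead be stepped after every minibatch.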

Lastly, here's a quick example of how you can set up an experiment to test some behavior. I'm comparing the performance of different optimizers, measured by how much each one decreases the tensor recovery loss over 200 epochs of training. Surprisingly, many of the more sophisticated optimizers do very poorly, suggesting that what works for gradient descent with neural networks might not be a good fit for tensor networks (at least, the fully-connected tensor networks we have here).

In [7]:
# Example mini-experiment to compare performance of different optimizers

candidate_optims = ['Adadelta', 'Adagrad', 'Adam', 'AdamW', 
                    'Adamax', 'RMSprop', 'Rprop', 'SGD']
loss_fun = cc.tensor_recovery_loss
percent_dec = {}
my_args = {'lr': 1e-3, 'print': False}
print("Testing optimizers...")
for optim in candidate_optims:
    print("  " + optim)
    my_args['optim'] = optim
    _, loss_i, loss_f = cc.continuous_optim(base_tn, goal_tn, loss_fun, 
                                            epochs=200, other_args=my_args)
    percent_dec[optim] = 100 * (loss_i-loss_f) / (loss_i)

print("The ranking in loss decrease is (max is 100%)")
ranking = sorted(percent_dec.items(), key=lambda item: item[1])
for optim, p_dec in ranking:
    print(f"  {optim+':':<9} {p_dec:.2f}% decrease")

Testing optimizers...
  Adadelta
  Adagrad
  Adam
  AdamW
  Adamax
  RMSprop
  Rprop
  SGD
The ranking in loss decrease is (max is 100%)
  Adadelta: 0.06% decrease
  Adagrad:  1.69% decrease
  Adamax:   10.11% decrease
  Adam:     10.20% decrease
  AdamW:    10.29% decrease
  RMSprop:  14.90% decrease
  SGD:      94.69% decrease
  Rprop:    99.95% decrease
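
One caveat when reading this ranking: every optimizer shares `lr=1e-3`. Adaptive methods like Adam normalize each parameter's update, so their steps stay roughly `lr` in size no matter how large the gradient is, whereas SGD's step is `lr` times the gradient. With recovery losses around 10^4, the gradients may well be large, which could partly explain SGD's lead; tuning the learning rate per optimizer would make the comparison fairer. The sketch below isolates this difference on a hypothetical one-parameter toy loss (not the TN problem); `first_step_size` is an illustrative helper.

```python
import torch

def first_step_size(optim_name, grad_scale, lr=1e-3):
    """Size of the first update an optimizer makes on loss = grad_scale * w."""
    w = torch.zeros(1, requires_grad=True)
    optimizer = getattr(torch.optim, optim_name)([w], lr=lr)
    loss = (grad_scale * w).sum()  # gradient is exactly grad_scale
    loss.backward()
    optimizer.step()
    return w.abs().item()

for scale in [1.0, 1000.0]:
    # SGD's step grows with the gradient (about 1e-3, then about 1.0),
    # while Adam's first step stays close to lr = 1e-3 in both cases
    print(scale, first_step_size("SGD", scale), first_step_size("Adam", scale))
```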
