<a href="https://colab.research.google.com/github/Jan-Agatz/loss-landscape/blob/no_mpi/MML_Paper_04_Implementations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Mounting Google Drive to access and save files

In [None]:
NAME = "JHA"

if NAME == "JHA":
  MOUNT_PATH = "/content/drive"
  CODE_PATH  = "/content/drive/MyDrive/Studium/WS 2023/MML/Paper 04/code"

In [None]:
from google.colab import drive
# Mount the Drive root; CODE_PATH is a folder inside the mounted Drive
drive.mount(MOUNT_PATH)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%cd $CODE_PATH
%ls

/content/drive/MyDrive/Studium/WS 2023/MML/Paper 04/code
 cifar10/            mpi4pytorch.py          __pycache__/
 dataloader.py       net_plotter.py          README.md
 evaluation.py       pbs_job_launch.py       scheduler.py
 h52vtp.py           plot_1D.py              script/
 h5_util.py          plot_2D.py              test.py
 hess_vec_prod.py    plot_hessian_eigen.py   _weights.h5
 launch_cluster.py   plot_surface.py        '_weights.h5_[-1.0,1.0,101]x[-1.0,1.0,101].h5'
 LICENSE             plot_trajectory.py
 model_loader.py     projection.py


## Installing libraries (with the newest versions)

In [None]:
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118


In [None]:
!pip install mpi4py



In [None]:
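# Checks whether the necessary libraries have already been installed
try:
  import os
  import time
  import torch
  import torchvision
  import numpy as np
  import mpi4py
  import h5py
  import matplotlib
  import scipy
  # Submodules used in the appendix; import them explicitly
  import scipy.sparse
  import scipy.sparse.linalg
  from sklearn import datasets
except ImportError as error:
  print(f"Install the necessary libraries (missing: {error.name}).")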

In [None]:
# Checks whether CUDA is available in the current environment and adjusts the
# shell commands accordingly
CUDA_AVAILABLE = torch.cuda.is_available()

if CUDA_AVAILABLE:
  CUDA_FLAG = "--cuda"
else:
  CUDA_FLAG = ""

## Creating 1D linear interpolations

In [None]:
!python plot_surface.py --mpi $CUDA_FLAG --model vgg9 --x=-0.5:1.5:51 --dir_type states \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 \
--plot --timestamp --fileformat png

Using manual seed: 123
The current seed is 123.
Rank 0 use GPU 0 of 1 GPUs on 7dc99e10bf0f
-------------------------------------------------------------------
setup_direction
-------------------------------------------------------------------
cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1_model_300.t7_vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1_model_300.t7_states.h5 is already set up
cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1_model_300.t7_vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1_model_300.t7_states.h5_[-0.5,1.5,51].h5 is already set up
True
Files already downloaded and verified
Files already downloaded and verified
Computing 0 values for rank 0
Rank 0 done!  Total time: 0.00 Sync: 0.00
------------------------------------------------------------------
plot_1d_loss_err
------------------------------------------------------------------
<KeysViewHDF5 ['test_acc', 'test_loss', 'train_acc', 'train_loss', 'xcoordinates']>
train_loss
[6.18976120e+00 4

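With `--dir_type states` and two model files, the loss is evaluated along the straight line between the two solutions, θ(α) = (1 − α)·θ_A + α·θ_B. As we understand the script, α = 0 corresponds to `--model_file` (`bs=128`) and α = 1 to `--model_file2` (`bs=8192`). The following is a minimal NumPy sketch of that idea on a hypothetical linear model with two hypothetical parameter sets; `interpolate_params` and `mse_loss` are illustrative names, not functions from the repository.

```python
import numpy as np

def interpolate_params(theta_a, theta_b, alpha):
    """Return (1 - alpha) * theta_a + alpha * theta_b for every parameter.

    Both dictionaries must have identical keys and array shapes.
    """
    return {name: (1.0 - alpha) * theta_a[name] + alpha * theta_b[name]
            for name in theta_a}

def mse_loss(params, x, y):
    # Hypothetical toy model: linear regression y_hat = x @ w + b
    y_hat = x @ params["w"] + params["b"]
    return float(np.mean((y_hat - y) ** 2))

rng = np.random.default_rng(123)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.3

# Two hypothetical solutions standing in for the bs=128 and bs=8192 models
theta_a = {"w": np.array([1.0, -2.0, 0.5]), "b": np.array(0.3)}
theta_b = {"w": np.array([0.8, -1.7, 0.6]), "b": np.array(0.1)}

# Same grid as --x=-0.5:1.5:51
alphas = np.linspace(-0.5, 1.5, 51)
losses = [mse_loss(interpolate_params(theta_a, theta_b, a), x, y)
          for a in alphas]
```

With `--x=-0.5:1.5:51` the grid step is 2/50 = 0.04, so α = 0 and α = 1 fall between grid points; a count such as 401 (step 0.005) places both endpoints exactly on the grid.
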
## Producing plots along random normalized directions

In [None]:
!python plot_surface.py --mpi $CUDA_FLAG --model vgg9 --x=-1:1:51 \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn \
--plot --timestamp --fileformat png

Using manual seed: 123
The current seed is 123.
Rank 0 use GPU 0 of 1 GPUs on 7dc99e10bf0f
-------------------------------------------------------------------
setup_direction
-------------------------------------------------------------------
cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7_weights_xignore=biasbn_xnorm=filter.h5 is already set up
cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7_weights_xignore=biasbn_xnorm=filter.h5_[-1.0,1.0,51].h5 is already set up
True
Files already downloaded and verified
Files already downloaded and verified
Computing 0 values for rank 0
Rank 0 done!  Total time: 0.00 Sync: 0.00
------------------------------------------------------------------
plot_1d_loss_err
------------------------------------------------------------------
<KeysViewHDF5 ['dir_file', 'test_acc', 'test_loss', 'train_acc', 'train_loss', 'xcoordinates']>
train_loss
[8.31840509e+00 7.55533535e+00 6.80431453e+00 6.06600893e+00

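With `--dir_type weights --xnorm filter --xignore biasbn`, a random Gaussian direction is rescaled filter by filter so that each filter has the same norm as the corresponding filter of the trained weights, while one-dimensional parameters (biases and batch-norm parameters) are excluded by zeroing their direction. The NumPy sketch below mirrors our reading of `net_plotter.py`; the function name, parameter names and shapes are hypothetical stand-ins for a real `state_dict`.

```python
import numpy as np

def normalize_direction(direction, weights, ignore_biasbn=True, eps=1e-10):
    """Filter-normalize a random direction, tensor by tensor.

    For every parameter tensor with more than one dimension, each filter
    d_i (one slice along the first axis) is rescaled so that
    ||d_i|| = ||w_i||. One-dimensional tensors (biases, BN parameters)
    are zeroed when ignore_biasbn is True, otherwise copied from the weights.
    """
    normalized = {}
    for name, w in weights.items():
        if w.ndim <= 1:
            normalized[name] = np.zeros_like(w) if ignore_biasbn else w.copy()
            continue
        d = direction[name].copy()
        for i in range(w.shape[0]):  # one filter per output channel
            d[i] *= np.linalg.norm(w[i]) / (np.linalg.norm(d[i]) + eps)
        normalized[name] = d
    return normalized

rng = np.random.default_rng(123)
# Hypothetical layer: 4 conv filters of shape 3x3x3 plus a bias vector
weights = {"conv.weight": rng.normal(size=(4, 3, 3, 3)),
           "conv.bias": rng.normal(size=4)}
direction = {name: rng.normal(size=w.shape) for name, w in weights.items()}
normalized = normalize_direction(direction, weights)
```

Matching norms per filter removes the scale invariance of ReLU networks from the plot, so that directions are comparable across layers with very different weight magnitudes.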

## Visualizing 2D loss contours

Calls without CUDA (kept here for copying and pasting):


```
python plot_surface.py --mpi --model resnet56 --x=-1:1:5 --y=-1:1:5 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot
```


```
python plot_surface.py --mpi --model resnet56 --x=-1:1:5 --y=-1:1:5 \
--xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot
```


```
mpirun -n 4 python plot_surface.py --mpi --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot
```
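
The `--x` and `--y` arguments have the form `min:max:num`. Judging by the `[-1.0,1.0,51]` suffix of the surface file names, each range appears to be expanded into `num` evenly spaced coordinates, so the number of loss evaluations is the product of both counts. A small sketch of that expansion, where the helper `parse_range` is hypothetical:

```python
import numpy as np

def parse_range(spec):
    """Turn a 'min:max:num' string such as '-1:1:51' into grid coordinates.

    Assumption: the spec is split on ':' and expanded with np.linspace.
    """
    lo, hi, num = spec.split(":")
    return np.linspace(float(lo), float(hi), num=int(num))

xcoordinates = parse_range("-1:1:51")
ycoordinates = parse_range("-1:1:51")
# Every (x, y) pair is one loss evaluation
xx, yy = np.meshgrid(xcoordinates, ycoordinates)
print(xx.size)  # 2601 evaluations for a 51 x 51 grid
```

For `--x=-1:1:5 --y=-1:1:5` this gives 25 points, consistent with the "Computing 25 values for rank 0" line printed by the eigenvalue run; the 51 x 51 grid needs 2601 evaluations, which is presumably why the `mpirun -n 4` variant splits them across four ranks.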





In [None]:
!python plot_surface.py --mpi $CUDA_FLAG --model resnet56 --x=-1:1:5 --y=-1:1:5 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn \
--plot --timestamp --fileformat png

Using manual seed: 123
The current seed is 123.
Rank 0 use GPU 0 of 1 GPUs on 7dc99e10bf0f
-------------------------------------------------------------------
setup_direction
-------------------------------------------------------------------
cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5 is already set up
cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,5]x[-1.0,1.0,5].h5 is already set up
True
cosine similarity between x-axis and y-axis: 0.003037
Files already downloaded and verified
Files already downloaded and verified
Computing 0 values for rank 0
Rank 0 done!  Total time: 0.00 Sync: 0.00
------------------------------------------------------------------
plot_2d_contour
------------------------------------------------------------------
loading surface file: cifar10/trained_nets/resnet56_sgd_lr=

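The 2D contour plot evaluates L(θ* + α·δ + β·η) on the grid, with two random filter-normalized directions δ and η. The reported cosine similarity of 0.003037 reflects the fact that independent random vectors in high dimension are nearly orthogonal: for Gaussian vectors with n entries, the cosine similarity concentrates around 0 with a standard deviation of roughly 1/√n. The sketch below illustrates both points; the quadratic toy loss, the parameter count, and the whole-vector normalization (used instead of per-filter normalization for brevity) are assumptions.

```python
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(123)
n_params = 100_000  # hypothetical model size

# Hypothetical trained parameters and a toy quadratic loss minimized at theta_star
theta_star = rng.normal(size=n_params)
curvature = rng.uniform(0.1, 2.0, size=n_params)

def toy_loss(theta):
    return 0.5 * float(np.sum(curvature * (theta - theta_star) ** 2))

# Two random Gaussian directions rescaled to the norm of theta_star
# (a whole-vector stand-in for filter normalization)
delta = rng.normal(size=n_params)
eta = rng.normal(size=n_params)
delta *= np.linalg.norm(theta_star) / np.linalg.norm(delta)
eta *= np.linalg.norm(theta_star) / np.linalg.norm(eta)

print(f"cosine similarity between x-axis and y-axis: "
      f"{cosine_similarity(delta, eta):.6f}")

# Same grid as --x=-1:1:5 --y=-1:1:5; rows index beta, columns index alpha
alphas = np.linspace(-1, 1, 5)
betas = np.linspace(-1, 1, 5)
surface = np.array([[toy_loss(theta_star + a * delta + b * eta) for a in alphas]
                    for b in betas])
```
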
## Plotting the eigenvalue ratio heatmaps

In [None]:
!python plot_hessian_eigen.py --mpi $CUDA_FLAG --model resnet56 --x=-1:1:5 --y=-1:1:5 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn \
--plot --timestamp --fileformat png

Using manual seed: 123
The current seed is 123.
Rank 0 use GPU 0 of 1 GPUs on e656efa94441
-------------------------------------------------------------------
setup_direction
-------------------------------------------------------------------
cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5 is already set up
cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,5]x[-1.0,1.0,5].h5 is already set up
True
cosine similarity between x-axis and y-axis: 0.003037
Files already downloaded and verified
Files already downloaded and verified
Computing 25 values for rank 0
The number of parameters is: 851504
Rank 0: computing max eigenvalue
   Iter: 1  time: 104.078464
   Iter: 2  time: 108.540136
   Iter: 3  time: 108.432164
   Iter: 4  time: 108.901910
   Iter: 5  time: 108.583978
   Iter: 6  time: 108.356730
   Iter

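`plot_hessian_eigen.py` computes, at every grid point, extreme eigenvalues of the Hessian using only Hessian-vector products, so the full Hessian (851504 x 851504 here) is never formed. In the loss-landscape paper the heatmaps show the ratio |λ_min / λ_max|; a value near zero roughly means that any negative curvature is negligible compared with the positive curvature, i.e. the region looks nearly convex. The sketch below uses SciPy's `eigsh` on a `LinearOperator` that only exposes a matrix-vector product, with a hypothetical 50 x 50 symmetric matrix standing in for the Hessian. It queries the smallest algebraic eigenvalue directly with `which="SA"`; this is a swapped-in choice, and the repository's own `hess_vec_prod.py` may compute λ_min differently.

```python
import numpy as np
import scipy.sparse.linalg

rng = np.random.default_rng(0)
n = 50

# Hypothetical symmetric "Hessian" with known eigenvalues in [-0.5, 10]
q, _ = np.linalg.qr(rng.normal(size=(n, n)))
true_eigs = np.linspace(-0.5, 10.0, n)
hessian = q @ np.diag(true_eigs) @ q.T
hessian = (hessian + hessian.T) / 2  # remove rounding asymmetry

def toy_hess_vec_prod(vector):
    # Stand-in for an autograd Hessian-vector product: the solver never
    # sees the matrix itself, only this function
    return hessian @ vector

op = scipy.sparse.linalg.LinearOperator((n, n), matvec=toy_hess_vec_prod,
                                        dtype=np.float64)
max_eig = scipy.sparse.linalg.eigsh(op, k=1, which="LA",
                                    return_eigenvectors=False)[0]
min_eig = scipy.sparse.linalg.eigsh(op, k=1, which="SA",
                                    return_eigenvectors=False)[0]
ratio = abs(min_eig / max_eig)
print(f"max eig: {max_eig:.4f}, min eig: {min_eig:.4f}, |min/max|: {ratio:.4f}")
```

For a real network each Hessian-vector product requires passes over the training data, so the number of products the solver needs dominates the runtime; the Lanczos-counting experiment in the appendix probes that number.
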
# Appendix

## Reverse engineering the structure of .h5 files

In [None]:
import os

import h5py

# The trained nets live below CODE_PATH (FILE_PATH was never defined)
FOLDER_PATH = os.path.join(CODE_PATH, "cifar10", "trained_nets",
                           "resnet56_noshort_sgd_lr=0.1_bs=128_wd=0.0005")

# Keys whose contents are printed in addition to their names
interesting_attributes = ["min_eig", "max_eig"]

for file_name in os.listdir(FOLDER_PATH):
    if file_name.endswith(".h5"):
        print(f"File name: {file_name}")

        with h5py.File(os.path.join(FOLDER_PATH, file_name), "r") as f:
            for key in f:
                print(f"Key: {key}")

                if key in interesting_attributes:
                    print("Data:")
                    try:
                        # [()] reads the whole dataset, including scalar ones
                        print(f[key][()])
                    except Exception:
                        print("Not printable")

        print("---------------------------------------")


File name: model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5
Key: xdirection
Key: ydirection
---------------------------------------
File name: model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,201]x[-1.0,1.0,201].h5
Key: dir_file
Key: test_loss
Key: train_acc
Key: train_loss
Key: xcoordinates
Key: ycoordinates
---------------------------------------
File name: model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter_idx=1.h5
Key: xdirection
Key: ydirection
---------------------------------------
File name: model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter_idx=1.h5_[-1.0,1.0,51]x[-1.0,1.0,51].h5
Key: dir_file
Key: loss_vals
Key: test_loss
Key: train_acc
Key: train_loss
Key: xcoordinates
Key: ycoordinates
---------------------------------------

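Instead of printing only top-level keys, `h5py` can walk a file recursively with `visititems`, reporting the shape and dtype of every dataset; this shows the layout of the direction and surface files without guessing which keys are printable. The sketch builds a small hypothetical surface file in a temporary directory so it runs on its own; the key names only imitate those listed above, and `describe_h5` is an illustrative helper.

```python
import os
import tempfile

import h5py
import numpy as np

def describe_h5(path):
    """Return one line per group or dataset, walking the file recursively."""
    lines = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name}: dataset shape={obj.shape} dtype={obj.dtype}")
        else:
            lines.append(f"{name}: group")
        # Returning None lets visititems continue with the next object

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines

# Build a small hypothetical surface file with keys like those listed above
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_surface.h5")
    with h5py.File(path, "w") as f:
        f["xcoordinates"] = np.linspace(-1, 1, 5)
        f["ycoordinates"] = np.linspace(-1, 1, 5)
        f["train_loss"] = np.zeros((5, 5))
        f["dir_file"] = "toy_directions.h5"
    report = describe_h5(path)

for line in report:
    print(line)
```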

## Counting the number of iterations in the Implicitly Restarted Lanczos Method

In [None]:
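import time

import scipy.sparse
import scipy.sparse.linalg

dimension = 1
base = 2
exponent = 0

current_time = 0
max_time = 60 * 10 # 10 minutes in seconds, the unit of time.perf_counter()

while current_time <= max_time:
  exponent += 1
  dimension *= base

  print(f"Current exponent: {exponent}")
  print(f"Current dimension: {dimension}x{dimension}")

  # 4 * I is symmetric positive definite (every eigenvalue equals 4); it is
  # deterministic rather than random and cheap to store in sparse format
  start = time.perf_counter()
  spd_matrix = 4 * scipy.sparse.identity(dimension)
  end = time.perf_counter()

  print(f"Generated a s.p.d. matrix of dimensions {dimension}x{dimension}\n\
          in {end - start} seconds.")

  # Counts matrix-vector products; each Lanczos step needs one, so the count
  # serves as a proxy for the number of iterations
  def mat_vec_prod(vector):
    mat_vec_prod.count += 1
    return spd_matrix @ vector

  mat_vec_prod.count = 0
  # Without an explicit dtype, LinearOperator calls matvec once on a zero
  # vector to infer it, so the printed count includes that probe call
  lin_op = scipy.sparse.linalg.LinearOperator((dimension, dimension),
                                               matvec = mat_vec_prod)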

  start = time.perf_counter()
  eigvals, eigvecs = scipy.sparse.linalg.eigsh(lin_op, k = 1, tol = 1e-10)
  end = time.perf_counter()

  current_time = end - start

  print(f"Calculated eigenvalues and eigenvectors for a s.p.d. matrix of\n\
         dimensions {dimension}x{dimension} using {mat_vec_prod.count}\n\
         iterations and {current_time} seconds.")

  print("---------------------------------------------------------------------")




Current exponent: 1
Current dimension: 2x2
Generated a s.p.d. matrix of dimensions 2x2
          in 0.0004720729998552997 seconds.
Calculated eigenvalues and eigenvectors for a s.p.d. matrix of
         dimensions 2x2 using 3
         iterations and 0.020444603999976607 seconds.
---------------------------------------------------------------------
Current exponent: 2
Current dimension: 4x4
Generated a s.p.d. matrix of dimensions 4x4
          in 0.00035630000002129236 seconds.
Calculated eigenvalues and eigenvectors for a s.p.d. matrix of
         dimensions 4x4 using 5
         iterations and 0.0006658349998360791 seconds.
---------------------------------------------------------------------
Current exponent: 3
Current dimension: 8x8
Generated a s.p.d. matrix of dimensions 8x8
          in 0.00026480699989406276 seconds.
Calculated eigenvalues and eigenvectors for a s.p.d. matrix of
         dimensions 8x8 using 9
         iterations and 0.0009655869998823619 seconds.
----------------

KeyboardInterrupt: ignored