# ADAM Optimizer

ADAM is a widely used optimizer in machine learning. We have defined its components in a separate file, `ADAM.json`, and load that file here to walk through the algorithm.

In [1]:
import json
from pybdp import load_project
from IPython.display import Markdown, display  # display is imported explicitly so the cells also run outside a notebook kernel
from copy import deepcopy
from pprint import pprint

# Load the project JSON from file
with open("ADAM.json", "r") as f:
    project_json = json.load(f)

# Load the project
project = load_project(project_json)

## High Level

At a high level, the ADAM algorithm initializes its state (theta, m, v, and t) and then loops, updating theta on each pass.
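
As a rough illustration of the initialization half, here is a minimal plain-Python sketch. It is not the pybdp model and does not read `ADAM.json`; the function name `adam_initialize` is hypothetical, and the zero starting values for m, v, and t follow the usual textbook statement of ADAM rather than anything confirmed by the project file.

```python
def adam_initialize(theta0):
    """Return an initial (theta, m, v, t) state for an ADAM run.

    Assumption: m, v, and t all start at zero, as in the standard
    description of the algorithm.
    """
    theta = [float(x) for x in theta0]  # parameters to optimize
    m = [0.0] * len(theta)              # first-moment estimate
    v = [0.0] * len(theta)              # second-moment estimate
    t = 0                               # timestep counter
    return theta, m, v, t


theta, m, v, t = adam_initialize([1.0, -2.0])
```

The four returned values line up with the four terminals of the ADAM Initialization block in the zoomed-in diagram.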

In [2]:
print("Zoomed out:")
display(Markdown(project.processors_map["ADAM"].create_mermaid_graphic()[0]))
print()
print("Zoomed in:")
display(Markdown(project.processors_map["ADAM"].create_mermaid_graphic_composite()[0]))

Zoomed out:


```mermaid
---
config:
    layout: elk
---
graph LR
subgraph G0[ADAM - ADAM Block Block]
direction LR
X0[ADAM]
subgraph G0P[Ports]
direction TB
XX0P0[theta]
end
XX0P0[theta] o--o X0
subgraph G0T[Terminals]
direction TB
XX0T0[theta]
end
X0 o--o XX0T0[theta]
end

```


Zoomed in:


```mermaid
---
config:
    layout: elk
---
graph LR
subgraph GC0[ADAM - ADAM Block Block]
direction LR
subgraph GS0[ADAM System]
subgraph G1[ADAM Initialization - ADAM Initialization Block Block]
direction LR
X1[ADAM Initialization]
subgraph G1P[Ports]
direction TB
XX1P0[theta]
end
XX1P0[theta] o--o X1
subgraph G1T[Terminals]
direction TB
XX1T0[theta]
XX1T1[m]
XX1T2[v]
XX1T3[t]
end
X1 o--o XX1T0[theta]
X1 o--o XX1T1[m]
X1 o--o XX1T2[v]
X1 o--o XX1T3[t]
end
subgraph G2[ADAM Update Loop - ADAM Update Loop Block Block]
direction LR
X2[ADAM Update Loop]
subgraph G2P[Ports]
direction TB
XX2P0[theta]
XX2P1[m]
XX2P2[v]
XX2P3[t]
end
XX2P0[theta] o--o X2
XX2P1[m] o--o X2
XX2P2[v] o--o X2
XX2P3[t] o--o X2
subgraph G2T[Terminals]
direction TB
XX2T0[theta]
end
X2 o--o XX2T0[theta]
end
XX1T0[theta] ---> XX2P0[theta]
XX1T1[m] ---> XX2P1[m]
XX1T2[v] ---> XX2P2[v]
XX1T3[t] ---> XX2P3[t]
end
subgraph GC0P[Ports]
direction TB
X1P0[theta]
end
X1P0[theta] o--o XX1P0[theta]
subgraph GC0T[Terminals]
direction TB
X1T0[theta]
end
XX2T0[theta] o--o X1T0[theta]
end

```

## Current Issue

An open question is how the ADAM update loop can take its ports from the initialization block while an interior subsystem does the looping. For now we set this issue aside and instead show what the inside of the ADAM update loop should look like.

At a high level, there should be two components:
1. Convergence checking, which determines whether the algorithm should continue or terminate
2. An optimization step composite processor, which performs a single step
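
The interplay of these two components can be sketched in plain Python. This is only an illustration under stated assumptions: `run_until_converged`, `tol`, and `max_iter` are hypothetical names, the convergence test (largest change in theta below a tolerance) is one common choice rather than the criterion defined in `ADAM.json`, and `step` stands in for whatever performs a single optimization step.

```python
def run_until_converged(step, state, tol=1e-8, max_iter=10_000):
    """Alternate an optimization step with a convergence check on theta.

    step:  callable (theta, m, v, t) -> (theta, m, v, t); a stand-in for
           the optimization step processor.
    state: initial (theta, m, v, t) tuple.
    """
    theta, m, v, t = state
    for _ in range(max_iter):
        new_theta, m, v, t = step(theta, m, v, t)
        # Assumed criterion: stop once no component of theta moved by tol or more.
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta, m, v, t


# Toy step that halves theta each pass, used only to exercise the loop.
def halve(theta, m, v, t):
    return [0.5 * x for x in theta], m, v, t + 1


final_theta, _, _, steps_taken = run_until_converged(halve, ([1.0], [0.0], [0.0], 0))
```

The `max_iter` cap is there so the loop still terminates if the criterion is never met.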

In [5]:
display(Markdown(project.systems_map["ADAM Optimization Loop System"].create_mermaid_graphic()[0]))

```mermaid
---
config:
    layout: elk
---
graph LR
subgraph GS0[ADAM Optimization Loop System]
subgraph G0[ADAM Update Step - Optimization Step Block]
direction LR
X0[ADAM Update Step]
subgraph G0P[Ports]
direction TB
XX0P0[theta]
XX0P1[m]
XX0P2[v]
XX0P3[t]
XX0P4[theta]
end
XX0P0[theta] o--o X0
XX0P1[m] o--o X0
XX0P2[v] o--o X0
XX0P3[t] o--o X0
XX0P4[theta] o--o X0
subgraph G0T[Terminals]
direction TB
XX0T0[theta]
XX0T1[m]
XX0T2[v]
XX0T3[t]
end
X0 o--o XX0T0[theta]
X0 o--o XX0T1[m]
X0 o--o XX0T2[v]
X0 o--o XX0T3[t]
end
subgraph G1[Theta Convergence Criteria - Convergence Criteria Block]
direction LR
X1[Theta Convergence Criteria]
subgraph G1P[Ports]
direction TB
XX1P0[m]
XX1P1[v]
XX1P2[t]
XX1P3[theta]
end
XX1P0[m] o--o X1
XX1P1[v] o--o X1
XX1P2[t] o--o X1
XX1P3[theta] o--o X1
subgraph G1T[Terminals]
direction TB
XX1T0[m]
XX1T1[v]
XX1T2[t]
XX1T3[theta]
end
X1 o--o XX1T0[m]
X1 o--o XX1T1[v]
X1 o--o XX1T2[t]
X1 o--o XX1T3[theta]
end
XX0T0[theta] ---> XX1P3[theta]
XX0T1[m] ---> XX1P0[m]
XX0T2[v] ---> XX1P1[v]
XX0T3[t] ---> XX1P2[t]
XX1T3[theta] ---> XX0P0[theta]
XX1T0[m] ---> XX0P1[m]
XX1T1[v] ---> XX0P2[v]
XX1T2[t] ---> XX0P3[t]
end

```

## The Update Step

The update step is a composite subsystem that updates the state variables (m, v, and t) and returns the latest iterate of theta.

In [6]:
display(Markdown(project.processors_map["ADAM Update Step"].create_mermaid_graphic_composite()[0]))

```mermaid
---
config:
    layout: elk
---
graph LR
subgraph GC0[ADAM Update Step - Optimization Step Block]
direction LR
subgraph GS0[ADAM Update Step System]
subgraph G1[Get Function Gradients - Get Gradients Block]
direction LR
X1[Get Function Gradients]
subgraph G1P[Ports]
direction TB
XX1P0[theta]
end
XX1P0[theta] o--o X1
subgraph G1T[Terminals]
direction TB
XX1T0[g]
end
X1 o--o XX1T0[g]
end
subgraph G2[Exponential Smoothing First Moment - Update Biased First Moment Block]
direction LR
X2[Exponential Smoothing First Moment]
subgraph G2P[Ports]
direction TB
XX2P0[m]
XX2P1[g]
end
XX2P0[m] o--o X2
XX2P1[g] o--o X2
subgraph G2T[Terminals]
direction TB
XX2T0[m]
end
X2 o--o XX2T0[m]
end
subgraph G3[Exponential Smoothing Second Moment - Update Biased Second Moment Block]
direction LR
X3[Exponential Smoothing Second Moment]
subgraph G3P[Ports]
direction TB
XX3P0[v]
XX3P1[g]
end
XX3P0[v] o--o X3
XX3P1[g] o--o X3
subgraph G3T[Terminals]
direction TB
XX3T0[v]
end
X3 o--o XX3T0[v]
end
subgraph G4[Increment Timestep - Update Timestep Block]
direction LR
X4[Increment Timestep]
subgraph G4P[Ports]
direction TB
XX4P0[t]
end
XX4P0[t] o--o X4
subgraph G4T[Terminals]
direction TB
XX4T0[t]
end
X4 o--o XX4T0[t]
end
subgraph G5[Exponential Decay First Moment Bias Correction - Compute Bias-Corrected First Moment Block]
direction LR
X5[Exponential Decay First Moment Bias Correction]
subgraph G5P[Ports]
direction TB
XX5P0[m]
XX5P1[t]
end
XX5P0[m] o--o X5
XX5P1[t] o--o X5
subgraph G5T[Terminals]
direction TB
XX5T0[m]
end
X5 o--o XX5T0[m]
end
subgraph G6[Exponential Decay Second Moment Bias Correction - Compute Bias-Corrected Second Moment Block]
direction LR
X6[Exponential Decay Second Moment Bias Correction]
subgraph G6P[Ports]
direction TB
XX6P0[v]
XX6P1[t]
end
XX6P0[v] o--o X6
XX6P1[t] o--o X6
subgraph G6T[Terminals]
direction TB
XX6T0[v]
end
X6 o--o XX6T0[v]
end
subgraph G7[ADAM Theta Update - Update Theta Block]
direction LR
X7[ADAM Theta Update]
subgraph G7P[Ports]
direction TB
XX7P0[m]
XX7P1[v]
XX7P2[theta]
end
XX7P0[m] o--o X7
XX7P1[v] o--o X7
XX7P2[theta] o--o X7
subgraph G7T[Terminals]
direction TB
XX7T0[theta]
end
X7 o--o XX7T0[theta]
end
XX5T0[m] ---> XX7P0[m]
XX6T0[v] ---> XX7P1[v]
XX4T0[t] ---> XX5P1[t]
XX4T0[t] ---> XX6P1[t]
XX2T0[m] ---> XX5P0[m]
XX3T0[v] ---> XX6P0[v]
XX1T0[g] ---> XX2P1[g]
XX1T0[g] ---> XX3P1[g]
end
subgraph GC0P[Ports]
direction TB
X1P0[theta]
X1P1[m]
X1P2[v]
X1P3[t]
X1P4[theta]
end
X1P0[theta] o--o XX1P0[theta]
X1P1[m] o--o XX2P0[m]
X1P2[v] o--o XX3P0[v]
X1P3[t] o--o XX4P0[t]
X1P4[theta] o--o XX7P2[theta]
subgraph GC0T[Terminals]
direction TB
X1T0[theta]
X1T1[m]
X1T2[v]
X1T3[t]
end
XX7T0[theta] o--o X1T0[theta]
XX2T0[m] o--o X1T1[m]
XX3T0[v] o--o X1T2[v]
XX4T0[t] o--o X1T3[t]
end

```
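
To make the diagram concrete, the blocks can be read as the standard ADAM update equations. The following is a minimal plain-Python sketch, not the pybdp implementation: the function name `adam_update_step`, the `grad_fn` argument, and the hyperparameter names and defaults (`lr`, `beta1`, `beta2`, `eps`) are assumptions taken from the usual statement of the algorithm and do not appear in the project shown here.

```python
import math


def adam_update_step(theta, m, v, t, grad_fn,
                     lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM step; comments name the diagram block each line mirrors."""
    # Get Function Gradients
    g = grad_fn(theta)
    # Exponential Smoothing First Moment
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    # Exponential Smoothing Second Moment
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Increment Timestep
    t = t + 1
    # Exponential Decay First / Second Moment Bias Correction
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # ADAM Theta Update
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    # As in the diagram, the biased m and v are passed on, not the corrected ones.
    return theta, m, v, t


# Example: one step on f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v, t = adam_update_step([1.0], [0.0], [0.0], 0,
                                  grad_fn=lambda th: [2.0 * x for x in th])
```

Note that the bias-corrected moments are only used inside the step to compute the new theta, which matches the composite's terminals being wired to the smoothing blocks rather than the bias-correction blocks.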