# Hardware For ML Class Project

# Modeling Albireo
To model Albireo, we divide the dot product kernel into several steps:
- Input Conversion:
    - Handles conversion from DE -> AE -> AO.
    - Accounts for the losses/noises that occur along the way.
- Weight Conversion:
    - Handles conversion from DE -> AE.
    - Accounts for normalization to [-1, 1].
- Dot Product:
    - Performs the AE/AO dot product.
    - Handles the conversion from AO to AE in the PD.
- Output Conversion:
    - Handles conversion from AE to quantized DE.
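
The steps above can be sketched as a chain of plain functions. This is a minimal, noise-free sketch: the function names, the 8-bit input range, and the unit gain are illustrative assumptions, not the interface of `src.kernels`, and output quantization is left out.

```python
from typing import List, Tuple

FULL_SCALE = 255  # assumption: 8-bit unsigned inputs


def convert_inputs(x_int: List[int], gain_w_per_v: float = 1.0) -> List[float]:
    """DE -> AE -> AO: integers -> voltages in [0, 1] -> optical powers."""
    voltages = [x / FULL_SCALE for x in x_int]       # DE -> AE
    return [gain_w_per_v * v for v in voltages]      # AE -> AO


def convert_weights(w: List[float]) -> Tuple[List[float], float]:
    """DE -> AE: normalize to [-1, 1]; also return the scale that was used."""
    scale = max(abs(v) for v in w) or 1.0            # all-zero weights: keep scale 1
    return [v / scale for v in w], scale


def dot_product(optical: List[float], w_norm: List[float]) -> float:
    """AE/AO multiply-accumulate; the PD turns the optical sum back into AE."""
    return sum(p * w for p, w in zip(optical, w_norm))


def convert_output(analog: float, weight_scale: float,
                   gain_w_per_v: float = 1.0) -> float:
    """AE -> DE: undo the input and weight scaling (quantization omitted)."""
    return analog * weight_scale * FULL_SCALE / gain_w_per_v


x = [135, 134, 71]
w = [1.0, 0.5, -2.0]
w_norm, scale = convert_weights(w)
y = convert_output(dot_product(convert_inputs(x), w_norm), scale)
print(y)  # should agree with 135*1.0 + 134*0.5 + 71*(-2.0) up to float error
```

Without noise the chain is just a rescaled dot product, which makes it a convenient baseline: each noise source can be added to one stage and compared against the plain result.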


I don't know if this is the expected level of detail, but it's a good start toward actually understanding what the accelerator is doing.


There are many things I am very unsure about.
I have left them as `TODO(Ask)` in the code. We should ask about them in office hours.
Feel free to modify the code or add your own questions.

Once we have clarified these points, we can turn these classes into PyTorch operations and run the DNNs.

# Outline of Clarifications to Ask
## General Questions
1. Level of Detail:
    - It seems impossible to capture the noise without a semi-detailed step by step computation.
    - Is this overkill? What's the alternative? Answer: the proposal above looks basically good to go.
2. Parameter Values:
    - The paper does not specify all the values (e.g., feedback resistance at the PD, or crosstalk noise in PLCU MRRs)
        - Do you know where to find them? TBD.
        - Or can they be derived from the provided ones (e.g., MRR crosstalk from $k^2$ and FSR)? Look below for cross-talk for specifics.
        - Or can we assume some 'ideal' default (e.g., the feedback resistance that would allow lossless computation)? Answer: yes, we can assume the ideal to begin with.
3. Losses:
    - In addition to noise, there are also losses.
    - Do we ignore them, or do we take them into account? Answer: ignore them, which we can justify by noting that the losses are predictable. If they turn out not to be predictable, we may need to model them.
4. Cross-talk:
    - Cross-talk seems input-dependent: the amount of noise depends on the surrounding values (i.e., the receptive fields multiplexed in the same waveguide).
    - Should we derive cross-talk for the micro-ring resonators (MRRs)? Answer: we should try; if it proves infeasible, we may leave it out. Cross-talk is important.
5. Do we assume constants or make things parameterized? Answer: parameterize. Do not just hard-code values.
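
One way to follow the "do not hard-code" answer is to collect every physical parameter in a single config object. The sketch below is an assumption about how that could look: the class name, field names, and defaults are hypothetical, and only the 5 GHz bandwidth comes from these notes. `None` stands for "noise source disabled (ideal)".

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AcceleratorParams:
    """Hypothetical parameter bundle; unknown values default to the ideal case."""
    input_bits: int = 8                         # assumed input precision
    bandwidth_hz: float = 5e9                   # 5 GHz, as quoted in these notes
    rin_dbc_per_hz: Optional[float] = None      # None = no RIN noise
    awg_crosstalk_db: Optional[float] = None    # None = no AWG crosstalk
    feedback_resistance_ohm: float = 1.0        # placeholder "ideal" value
    model_losses: bool = False                  # losses ignored by default


ideal = AcceleratorParams()
# -140.0 is a placeholder for illustration, not a value taken from the paper.
noisy = replace(ideal, rin_dbc_per_hz=-140.0)
print(ideal)
print(noisy)
```

Keeping the object frozen means a run cannot silently mutate its parameters, and `replace` makes each experiment an explicit variation on the ideal baseline.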
    

## Specific Questions
### Input Conversion
- I understand that quantized inputs are turned into voltages.
    - With what precision? In what range? If the paper does not define this, just assume an ideal conversion (i.e., don't model it).
    - Like [0, 1.0]?
- The voltage is then turned into an optical signal, after being multiplied by a 'gain' in (W/V).
    - I can't find this value.
    - Can I assume defaults that match the output conversion?
- AWG (Arrayed Waveguide Grating) Crosstalk.
    - This is given as a fixed value in the paper.
    - Can we assume it?
    - Isn't crosstalk input-dependent?
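
A fixed crosstalk figure and input-dependent noise are not necessarily in conflict. The sketch below assumes the fixed value is a per-channel power ratio in dB and that every channel leaks that fraction of its power into every other channel. Both are modeling assumptions to confirm in office hours; the function name and the -20 dB value are placeholders.

```python
from typing import List


def apply_crosstalk(powers: List[float], crosstalk_db: float) -> List[float]:
    """Add to each channel a fixed fraction of the power in all other channels."""
    ratio = 10.0 ** (crosstalk_db / 10.0)   # dB power ratio -> linear ratio
    total = sum(powers)
    return [p + ratio * (total - p) for p in powers]


# Same crosstalk coefficient, different neighbors.
quiet = apply_crosstalk([1.0, 0.0, 0.0], crosstalk_db=-20.0)
busy = apply_crosstalk([1.0, 1.0, 1.0], crosstalk_db=-20.0)
print(quiet[0] - 1.0)  # leakage into channel 0 when its neighbors are dark
print(busy[0] - 1.0)   # leakage into channel 0 when its neighbors are lit
```

Under this model the coefficient is constant, but the leaked power depends on what the neighboring channels carry.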

### Weight Conversion
- The paper expects weights to be in [-1, 1], so I assume we have to manually scale them down, then scale the result back up, right?
- When the weights become voltages, can we assume a perfect conversion?
    - E.g., if the weight is $0.378934373$, the voltage can exactly match that.
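
To see what is at stake in the perfect-conversion question, the sketch below compares an ideal weight-to-voltage conversion with a uniform DAC of finite resolution over [-1, 1]. The function name and the 8-bit resolution are assumptions for illustration; the paper may specify something different.

```python
from typing import Optional


def weight_to_voltage(w: float, bits: Optional[int] = None) -> float:
    """Map a normalized weight to a voltage; bits=None means a perfect conversion."""
    if not -1.0 <= w <= 1.0:
        raise ValueError("weight must be normalized to [-1, 1]")
    if bits is None:
        return w
    levels = 2 ** bits - 1          # number of steps across [-1, 1]
    step = 2.0 / levels
    return round((w + 1.0) / step) * step - 1.0


w = 0.378934373
ideal = weight_to_voltage(w)
coarse = weight_to_voltage(w, bits=8)
print(ideal, coarse, abs(coarse - w))  # the error is at most half a step
```

If the conversion is assumed perfect, `bits=None` keeps the weight unchanged. Otherwise the rounding error becomes one more noise term in the dot product.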

### Optical Dot Product
- How to compute MRR cross-talk?
    - We are given $k^2$ (cross-coupling factor) and FSR (free spectral range).
    - Should it be input-dependent?
- How to capture RIN (relative intensity noise)?
    - The units we are given are decibels relative to the carrier per hertz (dBc/Hz)?
        - The bandwidth (frequency?) is later given as 5GHz.
- How to get the "feedback resistance"?
    - Allows converting current to voltage.
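
For the RIN and feedback-resistance questions, a minimal sketch under standard definitions: RIN in dBc/Hz is converted to a linear value per hertz, multiplied by the bandwidth to get the relative power variance, and the photodetector current is turned into a voltage by the feedback resistance. The function names, the responsivity, and the -140 dBc/Hz figure are placeholders. Whether 5 GHz is the right bandwidth to integrate over is still an open question.

```python
import math
import random


def rin_sigma(power_w: float, rin_dbc_per_hz: float, bandwidth_hz: float) -> float:
    """Standard deviation of optical power caused by RIN over the given bandwidth."""
    rin_linear = 10.0 ** (rin_dbc_per_hz / 10.0)      # dBc/Hz -> 1/Hz
    return power_w * math.sqrt(rin_linear * bandwidth_hz)


def pd_voltage(power_w: float, responsivity_a_per_w: float = 1.0,
               feedback_resistance_ohm: float = 1.0) -> float:
    """Optical power -> photocurrent (A) -> voltage across the feedback resistor."""
    current_a = responsivity_a_per_w * power_w
    return current_a * feedback_resistance_ohm


rng = random.Random(47)
power = 1e-3                                           # 1 mW, placeholder
sigma = rin_sigma(power, rin_dbc_per_hz=-140.0, bandwidth_hz=5e9)
noisy_power = rng.gauss(power, sigma)                  # Gaussian noise assumption
print(sigma / power)                                   # relative noise level
print(pd_voltage(noisy_power))
```

Because the noise scales with the optical power itself, RIN is input-dependent even though the RIN figure is a constant.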

### Output Conversion
- How do we map voltages back to integers?
- For example:
    - Can we assume a uniform mapping, from (V_min -> 0) to (V_max -> int_max)?
    - Are V_min and V_max fixed parameters, or do they change input by input?
        - I.e., does 1V always correspond to the same integer, or is it relative to the other voltage values in the output?
- Same question about voltage precision.
    - Can we assume perfect voltage precision, or is something lost?
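
The two options for V_min and V_max can be compared with a small uniform quantizer. This is a sketch under assumed conventions (8-bit output, clipping outside the range); the function name is hypothetical.

```python
from typing import List


def voltage_to_int(v: float, v_min: float, v_max: float, bits: int = 8) -> int:
    """Uniformly map [v_min, v_max] onto the integers 0..2**bits - 1, with clipping."""
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    int_max = 2 ** bits - 1
    clipped = min(max(v, v_min), v_max)
    return round((clipped - v_min) / (v_max - v_min) * int_max)


voltages: List[float] = [0.0, 0.2, 1.0, 1.3]

# Option 1: fixed range, so 1 V always maps to the same integer.
fixed = [voltage_to_int(v, 0.0, 1.0) for v in voltages]

# Option 2: range taken from the outputs themselves, so the mapping is relative.
relative = [voltage_to_int(v, min(voltages), max(voltages)) for v in voltages]

print(fixed)
print(relative)
```

With a fixed range, values above V_max saturate. With a relative range nothing saturates, but the same voltage maps to different integers depending on the other outputs.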

In [1]:
import torch
import typing as t
import math
import numpy as np
from src.kernels import *
import torch.nn.functional as F  # used explicitly in the conv2d reference check

In [2]:
# Regular dot product.
seed = 47
torch.manual_seed(seed)
np.random.seed(seed)
input_tensor = torch.randint(0, 256, (9,), dtype=torch.float)
print(input_tensor)
weight_tensor = (torch.rand((9,), dtype=torch.float) - 0.5) * 2
weight_tensor[0] = 1 # For max to be 1
print(weight_tensor)

reference_result = torch.dot(input_tensor, weight_tensor)

optical_dot_product = OpticalDotProduct(
    weight_tensor
)
print("Reference result: ", reference_result)

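# Draw a fresh, non-deterministic seed; mask to 32 bits because np.random.seed only accepts values below 2**32.
seed = torch.seed() & (2**32 - 1)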
torch.manual_seed(seed)
np.random.seed(seed)
print("Final output: ", optical_dot_product(input_tensor.unsqueeze(0)))

tensor([135., 134.,  71.,   8.,  72., 179.,  80.,  23.,  59.])
tensor([ 1.0000,  0.3976,  0.7766, -0.4723, -0.4684, -0.7250, -0.0781,  0.4879,
        -0.9299])
Reference result:  tensor(26.2492)
Final output:  tensor([26.3235])


In [3]:
# Regular convolution.
seed = 48
torch.manual_seed(seed)
np.random.seed(seed)
input_tensor = torch.randint(0, 256, (1,2,9,9), dtype=torch.float)
weight_tensor = (torch.rand((1,2,3,3), dtype=torch.float) - 0.5) * 2


#print(input_tensor)
#print(weight_tensor)

conv = OpticalConvolution(weight_tensor, stride=2)
print(conv(input_tensor))
print("////")
print(F.conv2d(input_tensor, weight_tensor, stride=2))

tensor([[[[199.8896, 703.5664, 461.2446, 116.5387],
          [631.2899, 872.0850, 441.4659, 520.9825],
          [357.9944, 635.6241, 538.0027, 896.3140],
          [131.2198, 680.6720, 430.2575, 544.4305]]]])
////
tensor([[[[200.0725, 700.8811, 461.5175, 115.8915],
          [631.2335, 870.1544, 440.6736, 520.6542],
          [357.0515, 633.6689, 537.3054, 893.9677],
          [131.5579, 680.0823, 429.4817, 543.5601]]]])


In [4]:
seed = 48
torch.manual_seed(seed)
np.random.seed(seed)


batch_size=2
N=16
O=3
input_tensor = torch.randint(0, 256, (batch_size, N), dtype=torch.float)
weight_tensor = (torch.rand((O, N), dtype=torch.float) - 0.5) * 2
bias_tensor = (torch.rand((O,), dtype=torch.float) - 0.5) * 2

# PyTorch fully connected (FC) reference layer
linear = torch.nn.Linear(N, O)
linear.weight.data = weight_tensor.clone()
linear.bias.data = bias_tensor.clone()

fc = OpticalFC(weight_tensor, bias_tensor)
print(fc(input_tensor))
print("////")
print(linear(input_tensor))  

tensor([[907.1793, 322.1807,   0.0000],
        [558.5370, 562.3074, 102.1137]])
////
tensor([[ 906.1716,  322.7424, -159.4286],
        [ 556.2352,  561.4160,  101.3292]], grad_fn=<AddmmBackward0>)
