# 1. Function Approximation

We want to approximate a function with two inputs and one output.
This function can be visualized as a three-dimensional surface $t=f(x_1,x_2)$.
We first provide a plotting function to visualize the data and the resulting network output.

```python
import torch
from matplotlib import pyplot
import numpy

# a plotting function to show the data and the result from two viewpoints
def plot_3d(X, Y, Z):
  # two subplots next to each other, so the figure is wide rather than tall
  f = pyplot.figure(figsize=(14,5))
  # left: view from the top onto the (x_1, x_2) plane
  ax = f.add_subplot(121, projection='3d', azim = -90, elev=90)
  ax.plot_surface(X,Y,Z, cmap="hot", alpha=.8)
  ax.set_xlabel("$x_1$")
  ax.set_ylabel("$x_2$")
  ax.set_zlabel("$t$")

  # right: oblique view with the output range fixed to [-1,1]
  ax = f.add_subplot(122, projection='3d', azim = -60, elev=30)
  ax.plot_surface(X,Y,Z, cmap="hot", alpha=.8)
  ax.set_zlim(-1,1)
  ax.set_xlabel("$x_1$")
  ax.set_ylabel("$x_2$")
  ax.set_zlabel("$t$")
```

Now we define the function that generates our data and plot it. You do not need to modify this cell.

```python
# our data function that takes two parameters, x_1 and x_2
def data(x_1, x_2):
  return torch.cos(x_1) * torch.sin(x_2)

# display our function on a grid over the input range [-3,3)x[-3,3)
rge = numpy.arange(-3,3,0.1)
X,Y = numpy.meshgrid(rge, rge)
Z = data(torch.tensor(X), torch.tensor(Y)).numpy()
plot_3d(X, Y, Z)
```
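To make the grid construction concrete, the following small check (a sketch using only NumPy and PyTorch, independent of the plotting function) shows that `numpy.meshgrid` repeats `rge` along the rows of `X` and along the columns of `Y`, so that `Z[i,j]` is the target at the point `(X[i,j], Y[i,j])`. The indices `i, j = 5, 17` are arbitrary choices for illustration.

```python
import numpy
import torch

# same grid as above
rge = numpy.arange(-3, 3, 0.1)
X, Y = numpy.meshgrid(rge, rge)
print(X.shape, Y.shape)  # both (len(rge), len(rge))

# X varies along the columns (x_1), Y varies along the rows (x_2)
print(X[0, :3])  # first three x_1 values
print(Y[:3, 0])  # first three x_2 values

# evaluating the data function element-wise gives one target per grid point
Z = (torch.cos(torch.tensor(X)) * torch.sin(torch.tensor(Y))).numpy()
i, j = 5, 17
print(numpy.isclose(Z[i, j], numpy.cos(X[i, j]) * numpy.sin(Y[i, j])))
```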

### 1. (e) Autograd Function

We want to implement the gradient that we computed in (b) as a `torch.autograd.Function`.
Remember that `backward` must return one gradient for each input of `forward`, in the same order as the inputs; for an input that does not require a gradient, it can return `None` instead.
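For reference, here is the gradient under the assumption that the first layer computes the squared Euclidean distance between the input $\mathbf x$ and each row $\mathbf w_k$ of the weight matrix, which is the layer whose local derivative is $-2(\mathbf x - \mathbf w)$ as in the code. The `grad` argument of `backward` holds $\partial \mathcal J / \partial a_k$, and since $\partial a_j / \partial w_{kd} = 0$ for $j \neq k$, the chain rule gives:

$$
\begin{aligned}
a_k &= \sum_{d=1}^{D} \left(x_d - w_{kd}\right)^2, \qquad k = 1,\dots,K\\
\frac{\partial a_k}{\partial w_{kd}} &= -2\left(x_d - w_{kd}\right), \qquad
\frac{\partial a_k}{\partial x_d} = 2\left(x_d - w_{kd}\right)\\
\frac{\partial \mathcal J}{\partial w_{kd}} &= \frac{\partial \mathcal J}{\partial a_k}\cdot\left(-2\left(x_d - w_{kd}\right)\right), \qquad
\frac{\partial \mathcal J}{\partial x_d} = \sum_{k=1}^{K}\frac{\partial \mathcal J}{\partial a_k}\cdot 2\left(x_d - w_{kd}\right)
\end{aligned}
$$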

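```python
class MyFunction(torch.autograd.Function):

  # implement the forward propagation
  @staticmethod
  def forward(ctx, x, w):
    # x: input of shape (D,), w: weight matrix of shape (K, D)
    # compute the output a_k = sum_d (x_d - w_kd)^2, shape (K,)
    output = torch.sum((x - w)**2, dim=1)
    # save required parameters for backward pass
    ctx.save_for_backward(x, w)
    return output

  # implement the backward pass (Jacobian times incoming gradient)
  @staticmethod
  def backward(ctx, grad):
    # grad holds dJ/da with shape (K,)
    # get results stored from forward pass
    x, w = ctx.saved_tensors
    diff = x - w  # shape (K, D)
    grad_x = grad_w = None
    # da_k/dx_d = 2(x_d - w_kd); usually not needed since x is the data
    if ctx.needs_input_grad[0]:
      grad_x = torch.sum(grad[:, None] * 2 * diff, dim=0)
    # da_k/dw_kd = -2(x_d - w_kd), multiplied with the incoming gradient
    if ctx.needs_input_grad[1]:
      grad_w = grad[:, None] * -2 * diff
    # one gradient per input of forward, in the same order (x, w)
    return grad_x, grad_w
```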
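Before using the custom function in a network, one way to check a hand-written `backward` is `torch.autograd.gradcheck`, which compares it against finite differences. The sketch below is a stand-alone copy of the layer, assuming it computes $a_k = \sum_d (x_d - w_{kd})^2$; the class name `SquaredDistance` and the sizes `K=4`, `D=2` are hypothetical. Note that `gradcheck` expects double-precision inputs with `requires_grad=True`.

```python
import torch

class SquaredDistance(torch.autograd.Function):
  """Hypothetical stand-alone copy of the first layer: a_k = sum_d (x_d - w_kd)^2."""

  @staticmethod
  def forward(ctx, x, w):
    ctx.save_for_backward(x, w)
    return torch.sum((x - w) ** 2, dim=1)

  @staticmethod
  def backward(ctx, grad):
    x, w = ctx.saved_tensors
    diff = x - w  # shape (K, D)
    grad_x = grad_w = None
    if ctx.needs_input_grad[0]:
      grad_x = torch.sum(grad[:, None] * 2 * diff, dim=0)  # shape (D,)
    if ctx.needs_input_grad[1]:
      grad_w = grad[:, None] * -2 * diff  # shape (K, D)
    return grad_x, grad_w

torch.manual_seed(0)
# gradcheck perturbs every input entry and compares with the analytic backward
x = torch.rand(2, dtype=torch.float64, requires_grad=True)
w = torch.rand(4, 2, dtype=torch.float64, requires_grad=True)
ok = torch.autograd.gradcheck(SquaredDistance.apply, (x, w))
print(ok)  # gradcheck raises an error instead if the gradients disagree
```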


### 1. (f) Network Implementation

We implement our network by combining the first layer with an activation function and a fully-connected layer that produces the desired output.

```python
class Network(torch.nn.Module):
  def __init__(self, K, D):
    super(Network, self).__init__()
    # we select our function defined above as the first layer that we apply
    self.first_layer = MyFunction.apply

    # we need to instantiate and initialize our weights, shape (K, D)
    self.W = torch.nn.Parameter(torch.empty((K,D)))
    # initialize the matrix between -3 and 3 (since the range of the inputs is [-3,3))
    torch.nn.init.uniform_(self.W, -3, 3)

    # we then instantiate the second, fully-connected layer
    self.second_layer = torch.nn.Linear(K, 1) # since this is a regression, the output dimension is O=1

    # the activation function applied between both layers
    self.activation = torch.nn.Tanh()

  def forward(self, x):
    # forward input and weights through our custom function
    a = self.first_layer(x, self.W)

    # apply the activation function
    h = self.activation(a)

    # apply second layer
    z = self.second_layer(h)

    return z

# instantiate a network with K=6 hidden units and D=2 inputs (x_1, x_2)
network = Network(K=6, D=2)
```
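To see why `D` must equal the number of inputs and why the first layer has to produce `K` values for `torch.nn.Linear(K, 1)`, the shapes can be traced with plain tensor operations. This sketch assumes the squared-distance first layer and the `tanh` activation used in this notebook; it does not use the `Network` class itself, and the random values are only placeholders.

```python
import torch

torch.manual_seed(0)

K, D = 6, 2                            # K hidden units, D = 2 inputs (x_1, x_2)
x = torch.rand(D) * 6 - 3              # one input sample, shape (D,)
W = torch.empty(K, D).uniform_(-3, 3)  # first-layer weights, shape (K, D)

# x - W broadcasts x against every row of W -> shape (K, D)
diff = x - W
# summing the squares over the input dimension gives one value per unit -> (K,)
a = torch.sum(diff ** 2, dim=1)
# the activation keeps the shape (K,)
h = torch.tanh(a)
# the fully-connected layer maps K values to a single regression output -> (1,)
z = torch.nn.Linear(K, 1)(h)

print(diff.shape, a.shape, h.shape, z.shape)
```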


We implement a function that draws random inputs uniformly from the range $[-3,3)\times[-3,3)$ and computes the corresponding target.
You can use this function as-is; there is no need to modify it.

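```python
# provides a sample from the function that we want to approximate
def sample():
  # get a random input, uniformly distributed in [-3,3)
  x = torch.rand(2) * 6 - 3
  # compute the target; [None] adds a dimension so that t has shape (1,) like the network output
  t = data(x[0], x[1])[None]
  # return both
  return x, t
```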
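A quick sanity check of the sampling, written as a self-contained sketch that repeats `data` and `sample` from above: the inputs stay within $[-3,3]$, and every target has shape `(1,)` and matches $\cos(x_1)\sin(x_2)$. The number of 10000 draws is an arbitrary choice.

```python
import torch

def data(x_1, x_2):
  return torch.cos(x_1) * torch.sin(x_2)

def sample():
  x = torch.rand(2) * 6 - 3
  t = data(x[0], x[1])[None]
  return x, t

torch.manual_seed(0)
# draw many samples and stack them for inspection
xs, ts = zip(*(sample() for _ in range(10000)))
xs = torch.stack(xs)  # shape (10000, 2)
ts = torch.cat(ts)    # shape (10000,)

print(xs.min().item(), xs.max().item())  # close to -3 and 3
print(ts.shape)
```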

### 1. (g) Network Training

Now we train our network on 100000 samples, using a batch size of 1. After every 10000 samples, we report the average loss over these 10000 samples.

```python
# instantiate loss function and optimizer
loss = torch.nn.MSELoss()
optimizer = torch.optim.Adam(network.parameters(), lr=0.0005, weight_decay=1e-05)

# accumulate the training loss over blocks of 10000 samples
train_loss = 0.

# iterate over 100000 samples
for i in range(100000):
  # obtain a sample
  x, t = sample()
  # reset the gradients from the previous step
  optimizer.zero_grad()
  # forward the sample through the network
  z = network(x)
  # compute the loss between network output and target
  J = loss(z, t)
  # compute the gradients
  J.backward()
  # perform the parameter update
  optimizer.step()
  # remember the loss
  train_loss += J.item()

  # report the average loss after every 10000 iterations and start a new block
  if i % 10000 == 9999:
    print(f"Iteration {i+1}: average loss {train_loss / 10000:.5f}")
    train_loss = 0.
```
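With a batch size of 1, `torch.nn.MSELoss` (with its default mean reduction) reduces to the squared error $(z-t)^2$ of the single output. The values below are made up for illustration. Both tensors have shape `(1,)`, which is why `sample` adds a dimension to the target; with mismatched shapes, `MSELoss` warns that broadcasting may lead to incorrect results.

```python
import torch

loss = torch.nn.MSELoss()
z = torch.tensor([0.25])  # hypothetical network output, shape (1,)
t = torch.tensor([-0.5])  # hypothetical target, shape (1,)

J = loss(z, t)
print(J.item())  # (0.25 - (-0.5))**2 = 0.5625
```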

Finally, we plot the output of the network to check visually whether the function has been approximated well.
We extend the input range to $[-5,5]$ to see whether the network has learned to extrapolate the function beyond the training range $[-3,3]$.
Note that this is only for visualization purposes; you do not need to change this part.

```python
# define a range a little larger than our input range
rge = numpy.arange(-5,5,0.1)
X,Y = numpy.meshgrid(rge, rge)
# compute the result of our network for the given range [-5,5]x[-5,5]; no gradients are needed here
with torch.no_grad():
  Z = numpy.array([[network(torch.tensor((X[i,j], Y[i,j]), dtype=torch.float32)).item() for j in range(len(rge))] for i in range(len(rge))])
# plot the results
plot_3d(X,Y,Z)
```
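The plot gives a visual impression only. To compare the training range with the extended range numerically, one could compute the mean squared error against the data function on each grid. `grid_error` below is a hypothetical helper, not part of the original exercise; it works with any callable that maps a `(2,)` input tensor to a `(1,)` output, such as the trained `network`. The two toy models only serve as a sanity check.

```python
import numpy
import torch

def data(x_1, x_2):
  return torch.cos(x_1) * torch.sin(x_2)

def grid_error(model, lo, hi, step=0.1):
  """Hypothetical helper: mean squared error between model(x) and data(x) on a grid."""
  rge = numpy.arange(lo, hi, step)
  X, Y = numpy.meshgrid(rge, rge)
  errors = []
  with torch.no_grad():
    for x_1, x_2 in zip(X.ravel(), Y.ravel()):
      x = torch.tensor((x_1, x_2), dtype=torch.float32)
      t = data(x[0], x[1])
      errors.append((model(x).item() - t.item()) ** 2)
  return float(numpy.mean(errors))

# a "model" that computes the data function exactly has zero error
def exact_model(x):
  return data(x[0], x[1])[None]

# a model that always predicts 0 has a positive error
def zero_model(x):
  return torch.zeros(1)

print(grid_error(exact_model, -3, 3))  # 0.0
print(grid_error(zero_model, -3, 3))
# with the trained network, one would compare:
#   grid_error(network, -3, 3) and grid_error(network, -5, 5)
```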