# Algorithms of the Mind

**Instructions:** Answer all questions below. Be sure to show all intermediate steps and equations that you used to arrive at each answer. Please type your answers (including your equations). For coding questions, your code and its execution will do.

**How to submit?:** Execute all blocks of your Jupyter notebook, save it, and submit your assignment using Canvas.

<div class="alert alert-info">
    <strong>Note</strong>

Your answers in each question can be a combination of markdown and Julia code.
</div>

## Preliminaries

In [None]:
NAME = ""
NETID = ""

Next, please take the honor pledge by reordering the following phrases so that it makes sense to you, and then typing the resulting full sentence.

- and that this work is my own.
- or received
- I have not given
- I affirm that
- on this assignment,
- any unauthorized help 

In [None]:
HONOR_PLEDGE = ""

---

# Problem Set 3

<div class="alert alert-info" markdown="1">
    <strong>A note on resources:</strong>

For this problem set, on Grace, request a job with 2 or 4 cores (instead of 1), as it has certain computationally-heavy portions. 
</div>

In [None]:
import Pkg
# Pkg.activate("psyc261")
Pkg.add(["Distributions"])
# load necessary packages for this problem set
# Note that running this for the first time might take a good 15 minutes, so plan ahead
using Random
using Gen
using Plots
using DelimitedFiles

## Question 1: Perception in a rectangle world

In this question, you will develop a perception system that operates in a two-dimensional grayscale world where all objects are axis-aligned rectangular frames (i.e., unfilled rectangles) and there is just one such object in a given scene. An example scene in this world is illustrated below. Given such an input, the perception system should provide a posterior over the object's position, contrast, and size.

<img src="./images/examples.png"  width="500"/>

### Q 1A [4 pts]

Your first task is to write a generative model of this process. You will do this in the generative function `two_d_world`, below.

Here are the basic assumptions your generative model should reflect.

* Assume that the world size is 10x10 pixels.
* Assume that there is one object in each scene in this world. 
* An object's position, in particular its bottom-left corner, can be anywhere in the world. So in a lot of the scenes, the object will only be partially visible. 
* Each dimension of the object (width and height) follows a uniform distribution between 3 and 7 pixels. Notice that a rectangle cannot have a negative dimension.
* An object's brightness varies uniformly between 0.1 and 1, and the background brightness is set to 0.
* Finally, assume that the observations are corrupted by a small amount of Gaussian noise (std = 0.05).
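The rendering and noise assumptions above can be sketched in plain Julia, outside of Gen. This is only an illustration, not the required solution: the helper names `render_frame` and `add_noise` are hypothetical, the frame is assumed to be one pixel thick, and row indices are assumed to grow upward from the bottom-left corner.

```julia
using Random

# Hypothetical helper: draw an unfilled rectangle (a frame) into an image.
# (row, col) is the bottom-left corner; parts outside the world are clipped.
function render_frame(row::Int, col::Int, h::Int, w::Int, brightness::Real;
                      n_rows::Int=10, n_cols::Int=10)
    img = zeros(n_rows, n_cols)          # background brightness is 0
    top = row + h - 1                    # last row the object would occupy
    right = col + w - 1                  # last column the object would occupy
    for r in row:min(top, n_rows), c in col:min(right, n_cols)
        # a pixel belongs to the frame if it lies on one of the four edges
        if r == row || r == top || c == col || c == right
            img[r, c] = brightness
        end
    end
    return img
end

# Hypothetical helper: add i.i.d. Gaussian pixel noise (std = 0.05 by default).
add_noise(img::Matrix{Float64}; noise_std::Real=0.05) =
    img .+ noise_std .* randn(size(img)...)

Random.seed!(1)
clean = render_frame(9, 9, 5, 5, 0.8)    # corner near the edge: mostly clipped
noisy = add_noise(clean)
```

In the actual generative function, the noise would typically be expressed as a traced random choice rather than a direct call to `randn`, so that the observation has a density Gen can score.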

<details class="alert alert-info" markdown="1">
    <summary><strong>Hint</strong></summary>

To make our variational approximation less of a pain, we recommend setting up each of your priors to be uniform distributions `[0, 1]`, then scaling them before "rendering" your object.
</details>

We provide examples for two of the relevant random variables &ndash; the y-coordinate of the south-west (bottom-left) corner of the object and the height of the object.

```julia
# draw where the object's y coordinate will be
SW_row ~ uniform(0, 1)
# draw the height of the object
h ~ uniform(0, 1)

# scale the y-coordinate to an integer between 1 and 10 (we will use it to index into a 10x10 Matrix)
scaled_SW_row = ceil(Int, SW_row * 10)
# scale the height so that it lies between 3 and 7 and is an integer
scaled_h = round(Int, h * 4 + 3)

```
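The same scaling idea can be written as small standalone functions, which makes the ranges easy to check. The helper names below are hypothetical, and the brightness mapping is an assumption derived from the stated range of 0.1 to 1.

```julia
# Hypothetical helpers mapping unit-interval draws to world quantities.

# integer index in 1..n for u in (0, 1]
scale_position(u::Real, n::Int=10) = ceil(Int, u * n)

# integer size in 3..7
scale_size(u::Real) = round(Int, u * 4 + 3)

# real-valued brightness in [0.1, 1] (assumed linear mapping)
scale_brightness(u::Real) = 0.1 + 0.9 * u

# a few spot checks of the ranges
sizes = [scale_size(u) for u in 0.0:0.01:1.0]
positions = [scale_position(u) for u in 0.01:0.01:1.0]
```

One design point worth noticing: with `round`, the end values 3 and 7 each cover only half as much of the unit interval as 4, 5, and 6, so the resulting sizes are not exactly uniform over the integers.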

In [None]:
N_COLS = 10
N_ROWS = 10

@gen function two_d_world()
    # your code here
    throw(Exception("Not Implemented."))
end
;

Below is a function to visualize a given draw from your generative model.

In [None]:
function visualize(input::Matrix{<:Real})
    heatmap(input, clim=(0,1), thickness_scaling=3.5, size=(1600, 1300), aspect=:equal)
end
;

Draw samples from your generative model and visualize them (using the `visualize` function above).

In [None]:
Random.seed!(42)
# your code here
throw(Exception("Not Implemented."))

### Q 1B [6 pts]

Now implement an amortized variational approximation of this generative model, parametrized by a neural network that conditions the approximation on the input observations. You will do this in the generative function `neural_amortized_inference`, below.

Assume that the neural network takes as input a vector &ndash; so, the observations should be flattened to vectors (from 2D matrices). Your network architecture should be rather simple: one hidden layer and one output layer. The hidden layer should be activated with a `tanh` non-linearity (provided in the code block below).

The output layer should consist of all of the variational family parameters. 

<details class="alert alert-info" markdown="1">
    <summary>Hint</summary>

As for your variational approximation, for a random variable `x ~ uniform(0, 1)` in your generative model (`two_d_world`), a reasonable choice would be `x ~ beta(a, b)`. Your neural network would output the two shape parameters `a` and `b` of each beta distribution, and you need to ensure that these parameters are strictly positive.
</details>
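A minimal sketch of the forward pass described above, written in plain Julia without Gen. It is one possible approach, not the required one: the names `softplus` and `forward` are hypothetical, the layer sizes (100 inputs, 200 hidden units, 10 outputs) are taken from later parts of this problem set, and softplus is just one way of keeping the beta parameters positive (`exp` is another).

```julia
using Random

# Numerically stable softplus: maps any real number to a non-negative one.
softplus(x::Real) = max(x, zero(x)) + log1p(exp(-abs(x)))

# Hypothetical forward pass: one tanh hidden layer, then a linear output
# layer whose entries are made positive so they can act as beta parameters.
function forward(x::Vector{Float64}, W1, b1, W2, b2)
    hidden = tanh.(W1 * x .+ b1)      # non-linear hidden layer
    out = W2 * hidden .+ b2           # linear output layer
    return softplus.(out) .+ 1e-6     # small offset (an assumption) avoids exact zeros
end

Random.seed!(0)
x = rand(100)                                        # a flattened 10x10 observation
W1, b1 = randn(200, 100) ./ sqrt(100), zeros(200)
W2, b2 = randn(10, 200) ./ sqrt(200), zeros(10)
beta_params = forward(x, W1, b1, W2, b2)             # 10 positive numbers
```

Each consecutive pair of outputs could then serve as the two parameters of one beta distribution, one pair per latent variable.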

In [None]:
σ(x) = tanh.(x)

@gen function neural_amortized_inference(input::Vector{Float64})
    # write out the architecture of your model using `@param` to declare weights and biases
    # and then the forward pass (the matrix multiplications, non-linearity etc.)
    # your code here
    throw(Exception("Not Implemented."))

    # non-linear hidden layer
    # your code here
    throw(Exception("Not Implemented."))
    # output layer
    # your code here
    throw(Exception("Not Implemented."))

    # collect the variational approximation parameters from the output layer
    # your code here
    throw(Exception("Not Implemented."))

    # make the relevant random choices with these parameters
    # your code here
    throw(Exception("Not Implemented."))

    # your code here
    throw(Exception("Not Implemented."))
    return
end
;

### Q 1C [1.5 pts]

Next, create a data generator function called `data_generator`. Notice that this function takes no arguments. Each call simulates the generative model of our world once, which yields one input-output pair for training the neural-network-based estimator `neural_amortized_inference`.

In [None]:
function data_generator()
    tr = Gen.simulate(two_d_world, ())

    # record the "observations" (inputs to the NN model)
    # your code here
    throw(Exception("Not Implemented."))
    
    # record the random choices (outputs of the NN model)
    # your code here
    throw(Exception("Not Implemented."))
    
    return ((obs,), choices)
end
;

### Q 1D [3 pts]

Initialize the `params` in the `neural_amortized_inference`. You will have to pay attention to your dimensions.

Choose the dimensionality of the hidden layer to be 200. Use the `init_weight` function (provided in the code block below) to initialize your weight matrices. 
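To make the dimension bookkeeping concrete, here is a hedged sketch of how the weight shapes line up. It assumes a 100-dimensional input (a flattened 10x10 scene) and a 10-dimensional output (matching the length of `b2` in the cell below); `num_units_input` and `num_units_output` are hypothetical names.

```julia
using Random, Statistics

# same initializer as provided in this problem set: scale by 1 / sqrt(fan-in)
init_weight(shape...) = (1. / sqrt(shape[2])) * randn(shape...)

Random.seed!(42)
num_units_input = 100          # assumption: flattened 10x10 observation
num_units_hidden_layer = 200   # as specified in the text
num_units_output = 10          # assumption: matches zeros(10) used for b2

# each weight matrix is (units in this layer) x (units in the previous layer)
init_W1 = init_weight(num_units_hidden_layer, num_units_input)
init_W2 = init_weight(num_units_output, num_units_hidden_layer)

# the product W2 * tanh(W1 * x) is only defined when these shapes agree
x = rand(num_units_input)
out = init_W2 * tanh.(init_W1 * x)
```

Because `init_weight` divides by the square root of the second dimension, the entries of `init_W1` have a standard deviation of about 0.1 here, which keeps the hidden pre-activations at a moderate scale.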

In [None]:
Random.seed!(42)
# a function for randomly initializing the weight matrices
init_weight(shape...) = (1. / sqrt(shape[2])) * randn(shape...)

# choose the number of units in each layer of the network
# your code here
throw(Exception("Not Implemented."))

# create and initialize W1 and W2
# your code here
throw(Exception("Not Implemented."))

# now initialize the params of the data-driven proposal function
init_param!(neural_amortized_inference, :W1, init_W1)
init_param!(neural_amortized_inference, :b1, zeros(num_units_hidden_layer))
init_param!(neural_amortized_inference, :W2, init_W2)
init_param!(neural_amortized_inference, :b2, zeros(10))
;

### Q 1E [1 pt]

Create an optimizer for updating the weights using `Gen.FixedStepGradientDescent` with a learning rate of `1e-5`. 

Then train your amortized estimator with this optimizer by calling `Gen.train!`.

Use the following arguments for the `train!` function:
```
num_epoch=200
epoch_size=1000
num_minibatch=100
minibatch_size=10
evaluation_size=100
verbose=true
```
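As a rough guide to what these settings control, and assuming the behavior described in Gen's documentation for `train!` (worth checking against your installed version), training adjusts the network parameters to maximize the expected log density that the amortized model assigns to the latent choices of simulated scenes. The symbols below are our own notation: $p$ is the joint distribution defined by `two_d_world` over observations $x$ and latent choices $z$, and $q_{\theta}$ is `neural_amortized_inference` with parameters $\theta$.

$$
\max_{\theta} \; \mathbb{E}_{(x, z) \sim p}\left[ \log q_{\theta}(z \mid x) \right]
$$

Under this reading, each epoch simulates 1000 fresh training pairs and takes 100 gradient steps, each on a minibatch of 10 pairs, so the full run uses

$$
200 \times 1000 = 200{,}000 \ \text{simulations}, \qquad 200 \times 100 = 20{,}000 \ \text{gradient steps}.
$$

The value reported per epoch would then be an estimate of this objective computed on 100 evaluation examples, so higher values indicate a better fit.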



In [None]:
# get a gradient-based optimizer and train!
# your code here
throw(Exception("Not Implemented."))

Plot the loss, i.e., the return value of the `train!` function.

In [None]:
# your code here
throw(Exception("Not Implemented."))

The following code loads a test observation and visualizes it. 

In [None]:
obs_matrix = readdlm("test-scene.txt")
obs = vec(obs_matrix)
obs = convert(Vector{Float64}, obs)
p1 = visualize(obs_matrix)

### Q 1F [2.5 pts] <a id="q-1f"></a>

Input this observation to your amortized inference module, `neural_amortized_inference`. Store the trace (call this inference trace).

Generate a new trace with your generative model `two_d_world` constrained by the choices in your inference trace.

Visualize the `:pred` in that new trace (`new_tr`), which is the visualization of a sample from your amortized posterior. 

In [None]:
# your code here
throw(Exception("Not Implemented."))

p2 = visualize(new_tr[:pred])
plot(p1, p2, size=(3000, 1200))

### Q 1G [1 pt]

Run your code from your answer above ([Q 1F](#q-1f)) to visualize multiple samples from your posterior. In English, describe what your approximate posterior gets and what it doesn't get: Is it generally in the right ballpark? In what ways do its estimates vary from sample to sample? (1-2 sentences)

YOUR ANSWER HERE

### Q 1H [1 pt]

Suggest two ways in which you could improve the estimated posterior. 1 sentence per suggestion.

YOUR ANSWER HERE

## Question 2

One of the readings we went over in class was the following. 

Dasgupta, I., Schulz, E., Tenenbaum, J. B., & Gershman, S. J. (2020). A theory of learning to infer. *Psychological Review, 127*(3), 412. [\[link\]](https://paperpile.com/app/p/45edc04e-8c3e-0fc4-9c19-4e772cc8c079)

### Q 2A [2 pts]
What does "underreaction to prior" mean in the context of probabilistic reasoning? (1 sentence)

YOUR ANSWER HERE

### Q 2B [3 pts]

Why is it the case that the Learned Inference Model (LIM) is more accurate near the query distribution it is trained on? (1-2 sentences)

YOUR ANSWER HERE