# Homework 4

In this homework, you'll get a chance to:

1. Practice applying the definitions of probability and entropy described in class 
2. Experiment with Olshausen & Field's sparse coding model 

## Setting up the notebook

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from tqdm.notebook import tqdm
import scipy.io as sio
import os

# Importing functions from class
from OF import *
```

```
Downloading O&F images and circuit board...
http://www.rctn.org/bruno/sparsenet/IMAGES.mat has already been downloaded.
http://www.rctn.org/bruno/sparsenet/IMAGES_RAW.mat has already been downloaded.
Starting to download https://dz2cdn1.dzone.com/storage/temp/3542733-printed-circuit-boards.jpg...
...download complete.
...all downloads complete.
Importing natural_imgs, natural_imgs_raw, circuit_imgs_raw.
```


## Question 1: Probability Definitions

Below is a joint probability distribution for rain ($R$) and cloudiness ($C$) on any given spring day in Pittsburgh. For simplicity, we treat each as binary:

$$
\begin{align}
    R & = \{\textrm{no rain}, \textrm{rain}\} \\
    C & = \{\textrm{sunny}, \textrm{cloudy}\} \\
\end{align}
$$

Below is a table of values for $p(R, C)$:

| $R \backslash C$  | $\textrm{sunny}$ |  $\textrm{cloudy}$ |
|:------:|:----------------:|:------------------:|
| $\textrm{no rain}$ | 0.3 | 0.2 |
| $\textrm{rain}$ | 0.1 | 0.4 |
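A joint table like this one maps directly onto a 2-D array, and marginals, conditionals and an independence check become one-line array operations. The sketch below uses a **hypothetical** table with made-up numbers (not the Pittsburgh table), so it illustrates the operations without computing the answers; you can substitute the values above to check your own work.

```python
import numpy as np

# Hypothetical joint table p(X, Y): rows index X, columns index Y.
# These numbers are made up for illustration; substitute the table above.
p_xy = np.array([[0.25, 0.15],
                 [0.20, 0.40]])
assert np.isclose(p_xy.sum(), 1.0)  # a valid joint distribution sums to 1

# Marginals: sum out the other variable.
p_x = p_xy.sum(axis=1)  # p(X): sum across each row
p_y = p_xy.sum(axis=0)  # p(Y): sum down each column

# Conditional p(X | Y) = p(X, Y) / p(Y); each column sums to 1.
p_x_given_y = p_xy / p_y[np.newaxis, :]

# X and Y are independent iff p(X, Y) = p(X) p(Y) in every cell.
independent = np.allclose(p_xy, np.outer(p_x, p_y))

print("p(X) =", p_x)
print("p(Y) =", p_y)
print("p(X|Y) =\n", p_x_given_y)
print("independent:", independent)
```

Note that the marginals alone cannot settle independence: the check compares every joint entry against the product of the corresponding marginals.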


### 1.1 

In class, we discussed how all probability distributions are _implicitly_ conditional distributions. What are some of the unwritten variables we are conditioning on here?

_Type your answer here or include as a separate scan_

### 1.2 

Calculate the marginal probabilities of rain ($p(R)$, two values) and cloudiness ($p(C)$, two values). Based on these values and the joint table, does $R$ appear to be _statistically independent_ of $C$?

_Type your answer here or include as a separate scan_

### 1.3

Calculate the _conditional_ probability distribution $p(R|C)$ (four values).

_Type your answer here or include as a separate scan_

### 1.4

Use Bayes' rule and your previous answers to calculate $p(C|R)$ (four values).

_Type your answer here or include as a separate scan_
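Bayes' rule, $p(Y|X) = p(X|Y)\,p(Y)/p(X)$, inverts a conditional using the marginals. A minimal sketch on the same **hypothetical** table used for illustration (not the homework table) is below; the result can be checked against conditioning the joint table directly on $X$.

```python
import numpy as np

# Hypothetical joint table (rows: X, columns: Y); made-up numbers.
p_xy = np.array([[0.25, 0.15],
                 [0.20, 0.40]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Start from the conditional p(X | Y); each column sums to 1.
p_x_given_y = p_xy / p_y[np.newaxis, :]

# Bayes' rule: p(Y | X) = p(X | Y) p(Y) / p(X).
# Entry [i, j] holds p(Y = j | X = i), so each row sums to 1.
p_y_given_x = (p_x_given_y * p_y[np.newaxis, :]) / p_x[:, np.newaxis]

# Sanity check: same result as conditioning the joint table on X directly.
assert np.allclose(p_y_given_x, p_xy / p_x[:, np.newaxis])
print("p(Y|X) =\n", p_y_given_x)
```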

## Question 2: Intro to Information Theory

In this question, you'll use the probability distribution from Question 1 to calculate some relevant quantities from information theory.

Recall the following expressions for the _entropy_ ($H(X)$, $H(X,Y)$), the _conditional entropy_ ($H(X|Y)$), and the _mutual information_ ($I(X;Y)$):

$$
\begin{align}
    H(X) & = \sum_x - p(x) \log_2 p(x) \\
    H(X,Y) & = \sum_{x,y} - p(x,y) \log_2 p(x, y) \\
    H(X|Y) & = \sum_{x,y} - p(x,y) \log_2 p(x|y) \\
    I(X;Y)  & = D_{KL}(p(x,y) || p(x)p(y)) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)} \\
            & = H(X) + H(Y) - H(X,Y) \\
            & = H(X) - H(X|Y) = H(Y) - H(Y|X)  
\end{align}
$$
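Each formula above is a weighted sum over the cells of a joint table, so each translates directly into a few lines of NumPy. The sketch below implements them in bits and checks that the alternative expressions for $I(X;Y)$ agree. It uses a **hypothetical** table rather than the one from Question 1, and assumes every column of the table has nonzero total probability (so $p(x|y)$ is defined).

```python
import numpy as np

def entropy(p):
    """H = -sum p log2 p, in bits; cells with p = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def conditional_entropy(p_xy):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y); rows index X, columns index Y."""
    p_y = p_xy.sum(axis=0)
    p_x_given_y = p_xy / p_y[np.newaxis, :]
    nz = p_xy > 0
    return -np.sum(p_xy[nz] * np.log2(p_x_given_y[nz]))

def mutual_information(p_xy):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [p(x,y) / (p(x) p(y))]."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    prod = np.outer(p_x, p_y)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / prod[nz]))

# Hypothetical joint table (not the one from Question 1).
p_xy = np.array([[0.25, 0.15],
                 [0.20, 0.40]])
H_X = entropy(p_xy.sum(axis=1))
H_Y = entropy(p_xy.sum(axis=0))
H_XY = entropy(p_xy)                  # joint entropy: entropy over all cells
H_X_given_Y = conditional_entropy(p_xy)
I_XY = mutual_information(p_xy)

# The identities listed above should agree numerically.
assert np.isclose(I_XY, H_X + H_Y - H_XY)
assert np.isclose(I_XY, H_X - H_X_given_Y)
print(f"H(X)={H_X:.4f}  H(Y)={H_Y:.4f}  H(X,Y)={H_XY:.4f}")
print(f"H(X|Y)={H_X_given_Y:.4f}  I(X;Y)={I_XY:.4f}")
```

Passing the transposed table, `conditional_entropy(p_xy.T)`, gives $H(Y|X)$, which lets you check the last identity as well.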


### 2.1

Calculate the entropy of $R$ ($H(R)$) and the entropy of $C$ ($H(C)$). Before the day begins, are you more uncertain about whether it will rain or whether it will be cloudy?

_Type your answer here or include as a separate scan_

### 2.2 

Calculate the conditional entropy of rain given that you know whether it is cloudy ($H(R|C)$). Compare this to the uncertainty about whether it will rain _without_ knowing whether it is cloudy ($H(R)$). Does seeing whether it is cloudy teach you anything about whether it will rain?

_Type your answer here or include as a separate scan_

### 2.3 

Calculate the _mutual information_ between rain ($R$) and cloudiness ($C$) using any of the formulas above. Before you apply your chosen formula, interpret it. How much does knowing $C$ teach you about $R$? How much does knowing $R$ teach you about $C$?

_Type your answer here or include as a separate scan_

## Question 3: Entropy of the Poisson distribution

In class, we discussed the _Poisson distribution_, which gives the probability of observing $x$ events in a fixed period of time, such as the number of times a neuron fires (assuming that whether an event occurs at one time is independent of whether it occurs at any other time). If the mean number of events is $E(X) = \mu$, then

$$
p(x) = \frac{\mu^x e^{-\mu}}{x!}
$$

Calculate the entropy of the Poisson distribution using _Stirling's approximation_:

$$
\log_2 x! \approx x \log_2 x - \frac{x}{\ln{2}}.
$$
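Before relying on this approximation, it can help to see how good it is. The sketch below compares it with the exact value, computed from Python's `math.lgamma` (since $\ln x! = \ln\Gamma(x+1)$). The convention $0 \log_2 0 = 0$ at $x = 0$ and the particular test points are choices made for this sketch.

```python
import math

def log2_factorial(x):
    """Exact log2(x!) via the log-gamma function: ln(x!) = lgamma(x + 1)."""
    return math.lgamma(x + 1) / math.log(2)

def log2_factorial_stirling(x):
    """The approximation above: log2 x! ~ x log2 x - x / ln 2 (0 log 0 = 0)."""
    if x == 0:
        return 0.0
    return x * math.log2(x) - x / math.log(2)

rel_err = {}
for x in [2, 5, 10, 100, 1000]:
    exact = log2_factorial(x)
    approx = log2_factorial_stirling(x)
    rel_err[x] = (exact - approx) / exact
    print(f"x={x:5d}  exact={exact:10.3f}  approx={approx:10.3f}  "
          f"rel. error={rel_err[x]:.3%}")
```

The relative error shrinks as $x$ grows, but the approximation is poor for small $x$, which matters for a Poisson distribution with small $\mu$.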

Remember that $x$ can take any non-negative integer value $x = 0, 1, 2, \dots$, so the sum defining the entropy runs from $x = 0$ to $\infty$.

_Type your answer here or include as a separate scan_
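One way to check an analytic result is to evaluate the entropy sum numerically, truncating it once the remaining probability is negligible. The sketch below does this with the standard library only. The truncation point (mean plus 20 standard deviations plus a margin of 20) is an assumption of this sketch, and `mu` must be positive.

```python
import math

def poisson_pmf(x, mu):
    """p(x) = mu^x e^{-mu} / x!, computed in log space to avoid overflow."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def poisson_entropy_bits(mu, tail_sd=20):
    """H = -sum_x p(x) log2 p(x), with the infinite sum truncated in the tail.

    Returns (entropy in bits, total probability actually summed); the second
    value should be very close to 1 if the truncation point is adequate.
    """
    x_max = int(mu + tail_sd * math.sqrt(mu) + 20)  # assumed cutoff
    total_prob, H = 0.0, 0.0
    for x in range(x_max + 1):
        p = poisson_pmf(x, mu)
        total_prob += p
        if p > 0:                  # skip terms that underflow to 0 (0 log 0 = 0)
            H -= p * math.log2(p)
    return H, total_prob

for mu in [0.5, 1, 5, 20, 100]:
    H, total = poisson_entropy_bits(mu)
    print(f"mu={mu:6}  H={H:.4f} bits  (summed probability {total:.12f})")
```

Comparing these numbers with your formula at several values of $\mu$ shows where the Stirling-based result is accurate and where it breaks down.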

## Question 4: O&F sparse coding model

In this question, you will get a chance to investigate the O&F model. Specifically, we'll look at the role of two factors in producing the observed receptive fields:

1. The size of the network (`num_units`)
2. The weight given to sparsity (`lmda`)
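Both factors enter the model's energy function: `num_units` sets how many columns (receptive fields) the dictionary $\Phi$ has, and `lmda` scales the sparsity penalty relative to the reconstruction error. The sketch below evaluates such an energy, $E = \tfrac{1}{2}\lVert I - \Phi a\rVert^2 + \lambda \sum_i |a_i|$, on random stand-in data. The L1 penalty, the unit-norm columns, and the function name `sparse_coding_energy` are assumptions of this sketch; O&F also used smooth penalties such as $\log(1 + a^2)$, and the class's `OF` module may define things differently.

```python
import numpy as np

def sparse_coding_energy(patch, Phi, a, lmda):
    """Hypothetical helper: E = 1/2 ||patch - Phi a||^2 + lmda * sum_i |a_i|.

    patch: (patch_size**2,) flattened image patch
    Phi:   (patch_size**2, num_units) dictionary; column i is unit i's field
    a:     (num_units,) coefficients (the units' activities)
    Returns (total energy, reconstruction term, sparsity term).
    """
    residual = patch - Phi @ a
    reconstruction = 0.5 * np.sum(residual ** 2)
    sparsity_term = lmda * np.sum(np.abs(a))
    return reconstruction + sparsity_term, reconstruction, sparsity_term

rng = np.random.default_rng(0)
patch_size, num_units, lmda = 16, 100, 5e-3
Phi = rng.standard_normal((patch_size ** 2, num_units))
Phi /= np.linalg.norm(Phi, axis=0)        # unit-norm columns (assumed convention)
patch = rng.standard_normal(patch_size ** 2)   # random stand-in for an image patch
a = np.zeros(num_units)                        # all units silent
E, recon, sparsity_term = sparse_coding_energy(patch, Phi, a, lmda)
print(f"E={E:.3f}  reconstruction={recon:.3f}  sparsity={sparsity_term:.3f}")
```

With $16 \times 16 = 256$-pixel patches, `num_units = 500` makes $\Phi$ overcomplete (more units than pixels), while `num_units = 20` forces a heavily compressed code.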


### 4.1: Size of network (`num_units`)

Generate receptive fields for networks of three sizes $n = 20, 100, 500$ and compare the results. You should keep all other constants fixed to the defaults used in class.

```python
# Default simulation constants
# (num_units is the variable under study here: try 20, 100, 500)
patch_size = 16 # image patch size

num_iter = 500 # number of iterations
batch_size = 250 # batch size

lmda = 5e-3 # sparsity weight

# Image set
image_set = natural_imgs
```

### 4.2: Sparsity (`lmda`)

Generate receptive fields for networks with three sparsity weights, $\lambda = 2.5 \times 10^{-3}, 5 \times 10^{-3}, 10 \times 10^{-3}$, and compare the results. This parameter controls the extent to which sparsity is valued over reconstruction error: larger values of `lmda` encourage the model to prioritize sparsity over reconstruction error. You may wish to compare these to the results produced by PCA, which minimizes squared reconstruction error without any regard for sparsity. You should keep all other constants fixed to the defaults used in class.
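The trade-off that `lmda` controls can be seen on a single patch with a fixed dictionary: as $\lambda$ grows, fewer units are active and the reconstruction error rises. The sketch below infers coefficients with ISTA (iterative soft-thresholding), a standard proximal-gradient method for the L1-penalized energy. ISTA, the toy sizes and the $\lambda$ values are assumptions chosen for this illustration; they are not necessarily what the class's `OF` module uses, and the $\lambda$ values are not comparable to the homework's because the data scale differs.

```python
import numpy as np

def infer_ista(x, Phi, lmda, n_steps=500):
    """Minimize 1/2 ||x - Phi a||^2 + lmda ||a||_1 with ISTA."""
    L = np.linalg.norm(Phi, ord=2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        grad = -Phi.T @ (x - Phi @ a)     # gradient of the reconstruction term
        z = a - grad / L                  # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lmda / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
n_pixels, num_units = 64, 128             # toy sizes, not the homework defaults
Phi = rng.standard_normal((n_pixels, num_units))
Phi /= np.linalg.norm(Phi, axis=0)        # unit-norm dictionary columns
x = rng.standard_normal(n_pixels)         # random stand-in for an image patch

results = {}
for lmda in [0.01, 0.1, 1.0]:
    a = infer_ista(x, Phi, lmda)
    err = 0.5 * np.sum((x - Phi @ a) ** 2)
    frac_active = np.mean(np.abs(a) > 1e-8)
    results[lmda] = (err, frac_active)
    print(f"lmda={lmda:<5}  recon. error={err:.4f}  active units={frac_active:.0%}")
```

At the extreme, once $\lambda \ge \max_i |\phi_i^\top x|$ the optimal code is all zeros: sparsity wins completely and nothing is reconstructed.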

```python
# Default simulation constants
# (lmda is the variable under study here: try 2.5e-3, 5e-3, 1e-2)
num_units = 100 # number of neurons (units)
patch_size = 16 # image patch size

num_iter = 500 # number of iterations
batch_size = 250 # batch size

# Image set
image_set = natural_imgs
```
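For the suggested PCA comparison, principal components of image patches can be computed with an SVD of the mean-centered patch matrix; each component, reshaped to `(patch_size, patch_size)`, can be displayed like a receptive field (for example with `plt.imshow`). The sketch below uses random stand-in patches because how patches are sampled from `image_set` depends on the class's helper functions; the function name `pca_basis` is hypothetical. Note that PCA yields at most `patch_size**2 = 256` components, so there is no PCA counterpart to an overcomplete network.

```python
import numpy as np

def pca_basis(patches, num_components):
    """Hypothetical helper: principal components of image patches via SVD.

    patches: (num_patches, patch_size**2) array, one flattened patch per row.
    Returns (components, singular_values); components has shape
    (num_components, patch_size**2), one component per row.
    """
    centered = patches - patches.mean(axis=0)   # PCA assumes zero-mean data
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:num_components], s[:num_components]

# Random stand-in data; replace with patches sampled from image_set.
rng = np.random.default_rng(0)
patch_size, num_units = 16, 100
patches = rng.standard_normal((2000, patch_size ** 2))
components, singular_values = pca_basis(patches, num_units)
receptive_fields = components.reshape(num_units, patch_size, patch_size)
```

On natural images, the leading components typically look like global, non-localized gratings, which makes a useful contrast with the localized fields produced by the sparse coding model.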