# Homework 2

In [None]:
%mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/

%maven ai.djl:api:0.7.0-SNAPSHOT
%maven org.slf4j:slf4j-api:1.7.26
%maven org.slf4j:slf4j-simple:1.7.26

%maven ai.djl.mxnet:mxnet-engine:0.7.0-SNAPSHOT
%maven ai.djl.mxnet:mxnet-native-auto:1.7.0-a

In [None]:
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.types.Shape;

NDManager manager = NDManager.newBaseManager();

# 1. Multinomial Sampling

Implement a sampler for a discrete distribution from scratch, mimicking the function `manager.randomMultinomial()`. Its arguments should be a vector of probabilities $p$ and an output shape. You can assume that the probabilities are normalized, i.e. that they sum to $1$. Make the call signature as follows:

```
samples = sampler(probs, shape) 

probs   : A float array of size n of nonnegative numbers summing up to 1
shape   : Shape object declaring dimensions for the output
samples : Samples from probs with shape matching shape
```

Hints:

1. Use `manager.randomUniform()` to get a sample from $U[0,1]$.
1. You can simplify things for `probs` by computing the cumulative sum over `probs`.
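
The two hints combine into inverse-CDF sampling. The following is a minimal sketch in plain Java, not the expected DJL solution: the class `CdfSamplerSketch` and its methods are hypothetical names, `java.util.Random` stands in for `manager.randomUniform()`, and `int[]` stands in for an `NDArray`.

```java
import java.util.Random;

// Hypothetical helper class; plain Java arrays stand in for NDArray.
class CdfSamplerSketch {
    // Cumulative sums of probs: cdf[k] = probs[0] + ... + probs[k].
    static float[] cumulativeSum(float[] probs) {
        float[] cdf = new float[probs.length];
        float running = 0f;
        for (int k = 0; k < probs.length; k++) {
            running += probs[k];
            cdf[k] = running;
        }
        return cdf;
    }

    // Map a uniform draw u in [0, 1) to the first index whose cumulative sum exceeds u.
    static int indexFor(float u, float[] cdf) {
        for (int k = 0; k < cdf.length; k++) {
            if (u < cdf[k]) {
                return k;
            }
        }
        // Guard against rounding: the last cumulative sum may fall slightly below 1.
        return cdf.length - 1;
    }

    // Draw `count` indices using a seeded generator (assumption: seeding is only for repeatability).
    static int[] sample(float[] probs, int count, long seed) {
        Random rng = new Random(seed);
        float[] cdf = cumulativeSum(probs);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = indexFor(rng.nextFloat(), cdf);
        }
        return out;
    }
}
```

Index `k` is returned exactly when the draw lands in an interval of length `probs[k]`, so it is chosen with probability `probs[k]`. Your `sampler` still has to produce an `NDArray` of the requested `shape`.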

In [None]:
NDArray sampler(float[] probs, Shape shape) {
    // Add your code here
    NDManager manager = NDManager.newBaseManager();
    return manager.zeros(shape);
}

// A simple test
sampler(new float[]{0.2f, 0.3f, 0.5f}, new Shape(2,3));

# 2. Central Limit Theorem

Let's explore the Central Limit Theorem when applied to text processing. 

* Download [https://www.gutenberg.org/ebooks/84](https://www.gutenberg.org/files/84/84-0.txt) from Project Gutenberg 
* Remove punctuation, convert everything to lowercase, and split the text into individual tokens (words).
* For the words `a`, `and`, `the`, `i`, `is` compute their running counts as the book progresses, i.e. 
    $$n_\mathrm{the}[i] = \sum_{j = 1}^i \mathbf{1}\{w_j = \mathrm{the}\}$$
* Plot the proportions $n_\mathrm{word}[i] / i$ over the document in one plot.
* Find an envelope of the shape $O(1/\sqrt{i})$ for each of these five words. (Hint: check the last page of the [sampling notebook](http://courses.d2l.ai/berkeley-stat-157/slides/1_24/sampling.pdf).)
* Why can we **not** apply the Central Limit Theorem directly? 
* How would we have to change the text for it to apply? 
* Why does it still work quite well?
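
The preprocessing and counting steps above can be sketched in plain Java as follows. This is one possible approach under stated assumptions: the class `RunningProportionSketch` and its methods are hypothetical names, and replacing every non-letter with a space is a crude rule (it splits contractions such as "don't" into two tokens).

```java
import java.util.Locale;

// Hypothetical helper class illustrating tokenization and running proportions.
class RunningProportionSketch {
    // Lowercase the text, turn everything except a-z and whitespace into spaces, split on whitespace.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z\\s]", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // counts[i] = number of occurrences of `word` among the first i + 1 tokens.
    static int[] runningCounts(String[] tokens, String word) {
        int[] counts = new int[tokens.length];
        int seen = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(word)) {
                seen++;
            }
            counts[i] = seen;
        }
        return counts;
    }

    // proportions[i] = counts[i] / (i + 1), the quantity to plot against the position.
    static double[] proportions(int[] counts) {
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = counts[i] / (double) (i + 1);
        }
        return p;
    }
}
```

For the homework, the tokens would come from the downloaded book rather than from a short string, and the plotting and envelope fitting are left to you.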

In [None]:
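import java.net.URL;
import java.util.ArrayList;
import java.util.Scanner;

// Read the book as whitespace-separated tokens
URL url = new URL("https://www.gutenberg.org/files/84/84-0.txt");
Scanner s = new Scanner(url.openStream());
ArrayList<String> book = new ArrayList<>();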
while (s.hasNext()) {
    book.add(s.next());
}
for (int i = 0; i < 10; i++) {
    System.out.println(book.get(i));
}

// Add your code here

# 3. Denominator-layout notation

We used the numerator-layout notation for matrix calculus in class. Now let's examine the denominator-layout notation.

Given $x, y\in\mathbb R$, $\mathbf x\in\mathbb R^n$ and $\mathbf y \in \mathbb R^m$, we have

$$
\frac{\partial y}{\partial \mathbf{x}}=\begin{bmatrix}
\frac{\partial y}{\partial x_1}\\
\frac{\partial y}{\partial x_2}\\
\vdots\\
\frac{\partial y}{\partial x_n}
\end{bmatrix},\quad 
\frac{\partial \mathbf y}{\partial {x}}=\begin{bmatrix}
\frac{\partial y_1}{\partial x}, 
\frac{\partial y_2}{\partial x}, 
\ldots,
\frac{\partial y_m}{\partial x}
\end{bmatrix}
$$

and 

$$
\frac{\partial \mathbf y}{\partial \mathbf{x}}
=\begin{bmatrix}
\frac{\partial \mathbf y}{\partial {x_1}}\\
\frac{\partial \mathbf y}{\partial {x_2}}\\
\vdots\\
\frac{\partial \mathbf y}{\partial {x_n}}\\
\end{bmatrix}
=\begin{bmatrix}
\frac{\partial y_1}{\partial x_1}, 
\frac{\partial y_2}{\partial x_1},
\ldots,
\frac{\partial y_m}{\partial x_1}
\\ 
\frac{\partial y_1}{\partial x_2},
\frac{\partial y_2}{\partial x_2},
\ldots,
\frac{\partial y_m}{\partial x_2}\\ 
\vdots\\
\frac{\partial y_1}{\partial x_n},
\frac{\partial y_2}{\partial x_n},
\ldots,
\frac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$
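
As a quick sanity check of these definitions, consider two linear maps (the symbols $\mathbf a \in \mathbb R^n$ and $\mathbf A \in \mathbb R^{m\times n}$ are introduced here only for illustration). For $y = \mathbf a^\top \mathbf x$ we have $\partial y/\partial x_i = a_i$, and for $\mathbf y = \mathbf A \mathbf x$ we have $y_j = \sum_k A_{jk} x_k$, so $\partial y_j/\partial x_i = A_{ji}$. Placing these entries according to the layout above gives

$$
\frac{\partial\, \mathbf a^\top \mathbf x}{\partial \mathbf x} = \mathbf a,
\qquad
\frac{\partial\, \mathbf A \mathbf x}{\partial \mathbf x} = \mathbf A^\top \in \mathbb R^{n\times m}
$$

Row $i$ corresponds to $x_i$ and column $j$ to $y_j$, which is the transpose of the numerator-layout result.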

Questions: 

1. Assume $\mathbf  y = f(\mathbf u)$ and $\mathbf u = g(\mathbf x)$. Write down the chain rule for $\frac {\partial\mathbf  y}{\partial\mathbf x}$.
2. Given $\mathbf X \in \mathbb R^{m\times n},\ \mathbf w \in \mathbb R^n, \ \mathbf y \in \mathbb R^m$, assume $z = \| \mathbf X \mathbf w - \mathbf y\|^2$, compute $\frac{\partial z}{\partial\mathbf w}$.

# 4. Numerical Precision

Given scalars `x` and `y`, implement the following `logExp()` function such that it returns 
$$-\log\left(\frac{e^x}{e^x+e^y}\right).$$

In [None]:
import java.util.function.BinaryOperator; 

// Here we wrap the function in a class
// so that we can pass its reference to a function
static class Function {
    static NDArray logExp(NDArray x, NDArray y) {
        // Add your solution here
        NDManager manager = NDManager.newBaseManager();
        return manager.zeros(new Shape(1));
    }
}

Test your code with normal inputs:

In [None]:
var x = manager.create(new float[]{2});
var y = manager.create(new float[]{3});
var z = Function.logExp(x, y);

Now implement a function to compute $\partial z/\partial x$ and $\partial z/\partial y$ with a `GradientCollector`.

In [None]:
void grad(BinaryOperator<NDArray> forwardFunction, 
          NDArray x, NDArray y) {
    // Add your code here
    // Note: This will throw an error 
    // if you try to run this in its present form
    // since the gradient for each NDArray 
    // has not yet been calculated.
    System.out.printf("Gradient of x = ");
    System.out.println(x.getGradient());
    System.out.printf("Gradient of y = ");
    System.out.println(y.getGradient());
}

Test your code. It should print the results nicely.

In [None]:
grad(Function::logExp, x, y);

But now let's try some "hard" inputs:

In [None]:
x = manager.create(new float[]{50});
y = manager.create(new float[]{100});
grad(Function::logExp, x, y);

Does your code return correct results? If not, try to understand why. (Hint: evaluate `exp(100)`.) Now develop a new function `stableLogExp()` that is mathematically identical to `logExp()` but returns a numerically stable result.
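
To see what the hint points at, here is a minimal sketch in plain Java that evaluates the formula literally in 32-bit floats. It assumes that Java `float` arithmetic behaves like a float32 `NDArray`, and the class `FloatOverflowSketch` is a hypothetical name. It only demonstrates the failure and is not a stable implementation.

```java
// Hypothetical helper class; Java floats stand in for float32 NDArray values.
class FloatOverflowSketch {
    // Evaluate -log(e^x / (e^x + e^y)) term by term, rounding each exponential to float.
    static float naive(float x, float y) {
        float ex = (float) Math.exp(x);   // becomes Infinity once e^x exceeds Float.MAX_VALUE
        float ey = (float) Math.exp(y);
        float ratio = ex / (ex + ey);     // finite / Infinity = 0, Infinity / Infinity = NaN
        return (float) (-Math.log(ratio));
    }
}
```

Since $e^{100}$ exceeds `Float.MAX_VALUE` (about $3.4 \times 10^{38}$), `naive(50f, 100f)` evaluates to `Infinity` and `naive(100f, 50f)` to `NaN`, although the exact values are close to $50$ and $0$ respectively.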

In [None]:
static class Function {
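    // Must return an NDArray so that it can be passed as a BinaryOperator<NDArray>
    static NDArray stableLogExp(NDArray x, NDArray y) {
        // Add your code here
        NDManager manager = NDManager.newBaseManager();
        return manager.zeros(new Shape(1));
    }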
}

grad(Function::stableLogExp, x, y);