# Assignment 3: Neural networks in natural language processing

### Due Date: Oct 30 (both sections)

### Grade (100 pts, 10%)

#### Your Name: Linpei Zhang

#### Your EID: Lz497

*Note: This assignment covers material from the recording, notes, demo, and suggested readings from Lecture-08*

---

## Questions

### 1. Dropout (50 pts)

Dropout is a regularization technique that randomly sets units in each activation layer, $a \in \mathbb{R}^{D}$, to zero and then multiplies the resultant vector elementwise by a constant $\gamma$ according to:

$$a_{dropout} \leftarrow  \gamma H \odot a$$

where $\odot$ represents the element-wise product operator and $H \in \{0, 1\}^D$ is a mask with entries drawn from 

$$\begin{cases} p(0) &= p_{dropout} \\ p(1) &= 1 - p_{dropout} \end{cases}$$

Select a scaling factor ${\gamma}$ that ensures the expected value over the activation layer remains invariant to the above operation, $E\big[ a_{dropout} \big] = E\big[ a \big]$, and provide rationale for your selection.

*Hint: You want to show that*

$$
\sum_{i=1}^D a_i = \gamma \, E\Big[\sum_{i=1}^D H_i \, a_i\Big]
$$
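Here is a short derivation that supports $\gamma = 1/(1 - p_{dropout})$. It assumes each $H_i$ is drawn independently of the activations, so that $E[H_i a_i] = E[H_i]\,E[a_i]$ with $E[H_i] = 1 - p_{dropout}$:

$$
E\big[a_{dropout,i}\big] = \gamma\, E[H_i]\, E[a_i] = \gamma \big(1 - p_{dropout}\big)\, E[a_i]
\quad\Longrightarrow\quad
E\big[a_{dropout}\big] = E\big[a\big] \iff \gamma = \frac{1}{1 - p_{dropout}} \quad (\text{for } E[a] \neq 0).
$$

Summing over $i$ gives the same condition for $\sum_{i} a_{dropout,i}$. Each kept unit is scaled up by $\gamma$, which compensates for the fraction $p_{dropout}$ of units that are zeroed.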

In [12]:
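import numpy as np

def dropout(a_i, drop_prob):
    # dropout on a single activation a_i, rescaled by gamma = 1 / (1 - drop_prob)
    keep_prob = 1 - drop_prob
    gamma = 1 / keep_prob
    P = [drop_prob, keep_prob]  # P(H = 0), P(H = 1)
    # pass the probabilities via the keyword `p`; the third positional
    # argument of np.random.choice is `replace`, not `p`
    mask = np.random.choice([0, 1], size=1, p=P)
    return gamma * a_i * mask

drop_prob = 0.4
right = []
gamma = 1 / (1 - drop_prob)
a = np.random.randint(1, 1000, 200)
for i in a:
    right.append(dropout(i, drop_prob)[0])
print(sum(a))      # sum of the original activations
print(sum(right))  # sum after dropout; gamma is already applied inside dropout()
print("gamma:1/(1-drop_prob)", gamma)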
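For comparison, here is a vectorized sketch of the same idea. It assumes NumPy's Generator API (`np.random.default_rng`). The name `inverted_dropout`, the seed, the layer size `D = 200` and `n_trials` are illustrative choices and are not part of the assignment. Averaging many independent draws estimates $E[a_{dropout}]$ unit by unit, and with $\gamma = 1/(1-p_{dropout})$ that estimate should approach $a$ itself.

```python
import numpy as np


def inverted_dropout(a, p_dropout, rng):
    """Zero each entry of a with probability p_dropout, then scale by gamma = 1 / (1 - p_dropout)."""
    gamma = 1.0 / (1.0 - p_dropout)
    # H_i = 1 (keep) with probability 1 - p_dropout, H_i = 0 (drop) otherwise
    H = (rng.random(a.shape) >= p_dropout).astype(a.dtype)
    return gamma * H * a


rng = np.random.default_rng(0)            # arbitrary seed, only for reproducibility
a = rng.uniform(1.0, 1000.0, size=200)    # one activation layer with D = 200
p_dropout = 0.4

# Monte Carlo estimate of E[a_dropout]: average many independent dropout draws
n_trials = 20_000
total = np.zeros_like(a)
for _ in range(n_trials):
    total += inverted_dropout(a, p_dropout, rng)
mean_dropout = total / n_trials

print(a.sum(), mean_dropout.sum())  # should agree up to sampling noise
```

Drawing the whole mask at once avoids the per-element Python loop. Keeping the `rng` argument explicit makes the experiment reproducible.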


### 2. Convolutions (50 pts)

Consider a sequence of $T$ token embeddings, $Z \in \mathbb{R}^{T \times D}$, for which $D=3$:

In [3]:
import numpy as np

Z = np.array([
    [1.3,   0.4, -0.2],
    [-3.1,  1.1,  2.1],
    [0.9,   2.8, -1.5],
    [1.3,   2.4,  0.1],
    [1.0,   1.0,  0.5],
    [3.0,  -1.4, -0.2],
    [-0.7,  1.8,  1.3]
])


and a set of convolutional filters, $W=\{ w^{(1)}, w^{(2)} \}$, and corresponding filter widths $S=\{ s^{(1)}, s^{(2)}  \}$:

In [302]:
w1 = np.array([
    [1, 1, 1],
    [1, 1, 1]
])

w2 = np.array([
    [2, 2, 2],
    [2, 2, 2],
    [2, 2, 2]
])

W = [w1, w2]

S = [2, 3]


In Lecture 08 we discussed a set of operations that maps $Z \in \mathbb{R}^{T \times D}$ onto $Z' \in \mathbb{R}^{N_F D}$ (in this problem $N_F = 2$). This involved three steps:

1. **Convolution**: The convolutional operation produces $N_F$ feature maps, $B^{(n)} \in \mathbb{R}^{(T - s^{(n)} + 1) \times D}$, where $n \in \{1, \dots, N_F\}$, according to:

$$
\forall_{t \in \{ 1, \dots, T - s^{(n)} + 1 \} } \; B^{(n)}_{t,j} = \sum_{t'=1}^{s^{(n)}} w^{(n)}_{t',j} \; Z_{t + t' - 1,\, j}
$$

2. **Max pooling**: The max pooling operation computes the max over the sequence dimension of each feature map, producing $B_{maxpool}^{(n)} \in \mathbb{R}^D$ according to:

$$
B_{maxpool, j}^{(n)} = \underset{1 \leq t' \leq T - s^{(n)} + 1 }{\max} B^{(n)}_{t', j}
$$

3. **Concatenation**: The resulting $N_F$ feature vectors are then concatenated into a single vector $Z'$ according to:

$$
Z' = \big[ B_{maxpool}^{(1)}, \dots, B_{maxpool}^{(n)}, \dots,  B_{maxpool}^{(N_F)}  \big] \in \mathbb{R}^{D \cdot N_F}
$$
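Below is a compact NumPy sketch of the three steps. It assumes the convolution sums a sliding window of $s^{(n)}$ consecutive rows of $Z$, with filter row $t'$ multiplying row $t+t'-1$, and a stride of 1. The function name `conv_maxpool_concat` is hypothetical.

```python
import numpy as np


def conv_maxpool_concat(Z, W):
    """Map Z (T x D) to Z' (N_F * D): convolution, max pooling over time, concatenation.

    Each filter w in W has shape (s, D), so its height is the filter width s^(n).
    """
    T = Z.shape[0]
    pooled = []
    for w in W:
        s = w.shape[0]
        # 1. convolution: B[t, j] = sum_{t'} w[t', j] * Z[t + t', j] (0-based), shape (T - s + 1, D)
        B = np.array([(w * Z[t:t + s]).sum(axis=0) for t in range(T - s + 1)])
        # 2. max pooling: maximum of each column over the sequence dimension, shape (D,)
        pooled.append(B.max(axis=0))
    # 3. concatenation of the N_F pooled vectors, shape (N_F * D,)
    return np.concatenate(pooled)
```

With the `Z`, `w1` and `w2` defined above, `conv_maxpool_concat(Z, [w1, w2])` should return the same six values that the hand-written cells below print.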

In the cell below, perform these three operations to produce $Z' \in \mathbb{R}^6$ and print it.

*Hint: The max pooling operation computes the maximum over each column in $B^{(n)}$*
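Here is a worked example of the hint under the same sliding-window reading (filter row $t'$ multiplies row $t+t'-1$ of $Z$). Every entry of $w^{(1)}$ is 1 and $s^{(1)} = 2$. So the first column of $B^{(1)}$ is the sum of consecutive pairs in the first column of $Z$, and max pooling keeps the largest of them:

$$
B^{(1)}_{t,1} = Z_{t,1} + Z_{t+1,1}, \qquad
B^{(1)}_{:,1} = \big[-1.8,\; -2.2,\; 2.2,\; 2.3,\; 4.0,\; 2.3\big]^\top, \qquad
B^{(1)}_{maxpool,1} = \max_{1 \le t' \le 6} B^{(1)}_{t',1} = 4.0
$$

This matches the first entry of the printed $Z'$.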

In [389]:
"""
### unused alternative: reads the formula with Z[t, j] (no shift by t'), kept for reference
def conv(w,s):
    B=[]
    for t in range(7-s+1):
        for j in range(3):
            for t_ in range(1):
                 B.append(w[t_,j]*Z[t,j]*s)
    return B
B1= np.reshape(conv(w1,S[0]),[6,3]) 
B2= np.reshape(conv(w2,S[1]),[5,3]) 

def pooling(B):
    B_pooling=[]
    for j in range(3):
        B_pooling.append(np.max(B[:,j]))
    return B_pooling

B1_pooling=pooling(B1)
B2_pooling=pooling(B2)

B1_pooling.extend(B2_pooling)

"""


In [390]:

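### sliding-window convolution: each filter's height is its width s (stride 1)

def conv_single_step(a_slice, W):
    # elementwise product of one window with the filter, summed over the window rows:
    # one value per embedding dimension j
    s = np.multiply(a_slice, W)
    output = s.sum(axis=0)
    return list(output)


def conv_forward(A_prev, w, S):
    # S is not used: the window height f1 is read from w.shape and equals s^(n)
    single_element = []
    (M, N) = A_prev.shape
    s1 = 1  # vertical stride
    s2 = 1  # horizontal stride
    (f1, f2) = w.shape
    n_H = int((M - f1) / s1) + 1  # T - s + 1 window positions along the sequence
    n_L = int((N - f2) / s2) + 1  # 1, because each filter spans all D columns
    for h in range(n_H):
        for l in range(n_L):
            vert_start = h * s1
            vert_end = h * s1 + f1
            horiz_start = l * s2
            horiz_end = l * s2 + f2
            A_slice_prev = A_prev[vert_start:vert_end, horiz_start:horiz_end]
            single_element.append(conv_single_step(A_slice_prev, w))
    return single_element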

B1= conv_forward(Z,w1,S)
B2= conv_forward(Z,w2,S)

"""
### when pooling before convolution:

def max_pooling(A_prev,w,S):
    single_element=[]
    (M, N) = A_prev.shape
    s1=S[0] # vertical stride
    s2=S[1] # horizontal stride
    (f1,f2)= w.shape
    n_H = int((M-f1)/s1) + 1 
    n_L =int((N-f2)/s2) + 1                            
    for h in range(n_H):                           
        for l in range(n_L):                                         
            vert_start = h*s1 
            vert_end = h*s1 + f1
            horiz_start = l*s2 
            horiz_end = l*s2 + f2
            A_slice_prev = A_prev[vert_start:vert_end,horiz_start:horiz_end]
            single_element.append(np.max(A_slice_prev))
    return single_element

B1_pooling= max_pooling(Z,w1,S)
B2_pooling= max_pooling(Z,w2,S)

"""

### when pooling after convolution:

def maxpooling(A_prev):
    # max over the sequence dimension (rows) for each of the D = 3 columns
    results = []
    A_prev = np.reshape(A_prev, (len(A_prev), 3))
    for l in range(3):
        results.append(np.max(A_prev[:, l]))
    return results

B1_pooling = maxpooling(B1)
B2_pooling = maxpooling(B2)


def Concatenation(B1_pooling, B2_pooling):
    # list concatenation: [B1 maxpool, B2 maxpool] gives Z' with N_F * D = 6 entries
    Z_prime = B1_pooling + B2_pooling
    return Z_prime

Concatenation(B1_pooling, B2_pooling)

[4.0, 5.199999999999999, 1.9000000000000001, 10.6, 12.6, 3.2]
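The following is an independent cross-check. It assumes only that each feature-map column is a 1-D cross-correlation of a column of $Z$ with the matching column of the filter. NumPy's `np.correlate` in `"valid"` mode computes exactly that window sum, so pooling its output should reproduce the list above up to floating-point rounding.

```python
import numpy as np

Z = np.array([
    [1.3,   0.4, -0.2],
    [-3.1,  1.1,  2.1],
    [0.9,   2.8, -1.5],
    [1.3,   2.4,  0.1],
    [1.0,   1.0,  0.5],
    [3.0,  -1.4, -0.2],
    [-0.7,  1.8,  1.3],
])
W = [np.ones((2, 3)), 2 * np.ones((3, 3))]  # same values as w1 and w2

# np.correlate(x, v, mode="valid")[t] = sum_k x[t + k] * v[k]: a sliding window with no
# kernel flip, i.e. the per-column window sum of the convolution step
Z_prime = np.concatenate([
    np.array([np.correlate(Z[:, j], w[:, j], mode="valid") for j in range(Z.shape[1])]).max(axis=1)
    for w in W
])
print(Z_prime)
```

Both filters are constant, so `np.convolve` (which flips the kernel) would give the same numbers here. For non-symmetric filters, only the correlation form matches the window sum.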