$\newcommand{\vect}[2]{[#1, #2]^T}$
## Q1.8

Some setup code:

In [20]:
import os
import numpy as np
from numpy.linalg import norm
from scipy.io import loadmat

def get_distance(x, y):
    return np.sqrt(np.sum((x - y) ** 2))
    
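def get_angle(x, y):
    cos_angle = np.dot(x, y) / (norm(x) * norm(y))
    # Clip to [-1, 1]: rounding error can push the cosine of (nearly)
    # parallel vectors slightly past 1, which would make arccos return NaN.
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))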

# This will help us get answers for both parts of the problem.
# V is the transposed data matrix, with shape (10, 1651).
def print_best_pairs(V):
    min_dist, min_angle = None, None
    best_dist_pair = None
    best_angle_pair = None
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            dist = get_distance(V[i], V[j])
            angle = np.abs(get_angle(V[i], V[j]))
            # Test for None first: comparing a number with None raises
            # a TypeError in Python 3.
            if min_dist is None or dist < min_dist:
                best_dist_pair = 1 + i, 1 + j
                min_dist = dist
            if min_angle is None or angle < min_angle:
                best_angle_pair = 1 + i, 1 + j
                min_angle = angle

    print('Lowest distance pair is: (v%d, v%d)' % (best_dist_pair))
    print('Lowest angle pair is: (v%d, v%d)' % (best_angle_pair))

data_path = os.path.join('PS01_dataSet', 'wordVecV.mat')
data = loadmat(data_path)
V = data['V'].T
num_docs = len(V)



### a) ###

In [4]:
print_best_pairs(V)

Lowest distance pair is: (v7, v8)
Lowest angle pair is: (v9, v10)


The two metrics pick different pairs. This is most likely because the vectors aren't normalized: Euclidean distance depends on the lengths of the vectors as well as their directions, while the angle depends only on direction, so the two can rank pairs differently. This was shown in Q1.7.
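A minimal sketch of how the two metrics can disagree, using three made-up 2-D vectors (they are not taken from the data set): `b` is close to `a` in distance but rotated away from it, while `c` is far from `a` but parallel to it.

```python
import numpy as np

def distance(x, y):
    return np.linalg.norm(x - y)

def angle(x, y):
    cos_angle = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

# Hypothetical toy vectors, chosen only for illustration.
a = np.array([1.0, 0.0])
b = np.array([1.0, 0.5])   # near a, but pointing in a different direction
c = np.array([10.0, 0.0])  # far from a, but parallel to it

pairs = {'(a, b)': (a, b), '(a, c)': (a, c), '(b, c)': (b, c)}
closest_by_distance = min(pairs, key=lambda k: distance(*pairs[k]))
closest_by_angle = min(pairs, key=lambda k: angle(*pairs[k]))

print('Lowest distance pair:', closest_by_distance)
print('Lowest angle pair:', closest_by_angle)
```

Here the distance metric selects `(a, b)` and the angle metric selects `(a, c)`, which is the same kind of disagreement seen in the output above.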

### b) ###

In [3]:
normalizer = np.sum(V, axis=1, keepdims=True)
V_l1_normed = V / normalizer
print_best_pairs(V_l1_normed)

Lowest distance pair is: (v9, v10)
Lowest angle pair is: (v9, v10)


The pair with the smallest angle is the same as in part a). This is expected, as all we've done is scale each vector by a positive constant (i.e., they still point in the same directions). What has changed is that the distance metric now agrees with the angle metric on the nearest neighbors.

One possible reason for using this normalization is to reduce the distance between documents with very similar word distributions but different lengths. A contrived example would be two documents A and B, where B is just A repeated a few times. Their raw count vectors differ only by a scale factor, so after normalization they coincide and the distance between them is exactly 0.
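The contrived example can be sketched numerically. The count vector below is hypothetical, and `l1_normalize` is a helper introduced only for this illustration; it applies the same sum-based normalization as the cell above to a single vector.

```python
import numpy as np

# Hypothetical word-count vector for document A; document B is A repeated 3 times.
doc_a = np.array([2.0, 0.0, 1.0, 5.0])
doc_b = 3 * doc_a

def l1_normalize(v):
    # Counts are non-negative, so the plain sum equals the l1 norm.
    return v / np.sum(v)

raw_distance = np.linalg.norm(doc_a - doc_b)
normed_distance = np.linalg.norm(l1_normalize(doc_a) - l1_normalize(doc_b))

print('Distance before normalization:', raw_distance)
print('Distance after normalization:', normed_distance)
```

Before normalization the two documents are far apart purely because of their lengths; afterwards the distance collapses to zero.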

### c) ###

In [6]:
fdoc = np.sum(V > 0, axis=0, keepdims=True) 
tfidf_log_term = np.sqrt(np.log(num_docs / fdoc))
V_tfidf = V_l1_normed * tfidf_log_term
print_best_pairs(V_tfidf)

Lowest distance pair is: (v9, v10)
Lowest angle pair is: (v8, v10)


### d) ###

The "inverse document frequency" adjustment lowers the $f_{term}$ values for words that occur in most or _all_ documents, while giving relatively more weight to words that occur in only a few documents. Geometrically, this means creating more separation along the axes of words that are rare across documents, and having the resulting vectors "point" more along these axes.

For example, the only two documents that contain the word "optimization" will separate themselves more from the other documents by pointing more along the "optimization" axis.

This scaling might be useful as it helps lower the importance of words that don't help identify a document uniquely or help our distance metric, since they occur everywhere regardless of the document.
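As a small illustration, the sketch below applies the same weighting used in the part c) code, $\sqrt{\log(D / f_{doc})}$, to a made-up 3-document, 3-word count matrix. The counts are assumptions chosen so that word 0 appears in every document and word 2 in only one.

```python
import numpy as np

# Hypothetical counts: rows are documents, columns are words.
counts = np.array([[4.0, 1.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [5.0, 2.0, 3.0]])
num_docs = counts.shape[0]

# Term frequencies: each row is normalized to sum to 1.
term_freq = counts / counts.sum(axis=1, keepdims=True)

# Number of documents containing each word, then the idf weight per word.
doc_freq = np.sum(counts > 0, axis=0, keepdims=True)
idf_weight = np.sqrt(np.log(num_docs / doc_freq))

weighted = term_freq * idf_weight
print('idf weights:', idf_weight)
print(weighted)
```

The word that occurs in every document gets weight $\sqrt{\log 1} = 0$ and drops out entirely, while the word found in a single document gets the largest weight.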

### e) ###

In [22]:
import os
from collections import Counter

# Boilerplate as given by the problem handout.
file_path = os.path.join('PS01_dataSet', 'wordVecArticles.txt')
articles = [ line.rstrip ('\n ') for line in open (file_path) ]

# Keep a running index for each unique word encountered.
cur_idx = 0
t_dict = {}
all_word_counts = []
for article in articles:
    wordcounts = Counter(article.split())
    for word in wordcounts:
        if word not in t_dict:
            t_dict[word] = cur_idx
            cur_idx += 1
    all_word_counts.append(wordcounts)

# Add the wordcounts to the matrix as defined by their index.
my_V = np.zeros((len(articles), cur_idx))
for i, wordcounts in enumerate(all_word_counts):
    for word, count in wordcounts.items():
        t_idx = t_dict[word]
        my_V[i][t_idx] = count

# Part a.
print('Part a: ')
print_best_pairs(my_V)

# Part b.
print('')
print('Part b: ')
normalizer = np.sum(my_V, axis=1, keepdims=True)
my_V_l1_normed = my_V / normalizer
print_best_pairs(my_V_l1_normed)

# Part c.
print('')
print('Part c: ')
fdoc = np.sum(my_V > 0, axis=0, keepdims=True) 
tfidf_log_term = np.sqrt(np.log(num_docs / fdoc))
my_V_tfidf = my_V_l1_normed * tfidf_log_term
print_best_pairs(my_V_tfidf)

Part a: 
Lowest distance pair is: (v7, v8)
Lowest angle pair is: (v9, v10)

Part b: 
Lowest distance pair is: (v9, v10)
Lowest angle pair is: (v9, v10)

Part c: 
Lowest distance pair is: (v9, v10)
Lowest angle pair is: (v8, v10)


## Q1.9

### a) ###
1. $ \nabla f_1 = \vect{2}{3} $
2. $ \nabla f_2 = \vect{2x - y}{2y - x} $
3. $ \nabla f_3 = \vect{\cos(y - 5) - (y - 5)\cos(x - 5)}{(5 - x)\sin(y - 5) - \sin(x - 5)} $
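These gradients can be sanity-checked against central finite differences. The sketch below restates the three functions from the problem so that it is self-contained; `numerical_grad`, the step size `h`, and the test point are assumptions made for this check only.

```python
import numpy as np

def f1(x, y):
    return 2 * x + 3 * y + 1

def f2(x, y):
    return x ** 2 + y ** 2 - x * y - 5

def f3(x, y):
    return (x - 5) * np.cos(y - 5) - (y - 5) * np.sin(x - 5)

# Analytic gradients from part a).
def grad_f1(x, y):
    return np.array([2.0, 3.0])

def grad_f2(x, y):
    return np.array([2 * x - y, 2 * y - x])

def grad_f3(x, y):
    return np.array([np.cos(y - 5) - (y - 5) * np.cos(x - 5),
                     (5 - x) * np.sin(y - 5) - np.sin(x - 5)])

def numerical_grad(f, x, y, h=1e-6):
    # Central differences in each coordinate direction.
    return np.array([(f(x + h, y) - f(x - h, y)) / (2 * h),
                     (f(x, y + h) - f(x, y - h)) / (2 * h)])

x0, y0 = 0.7, -1.3  # arbitrary test point
max_errors = [
    np.max(np.abs(grad(x0, y0) - numerical_grad(f, x0, y0)))
    for f, grad in [(f1, grad_f1), (f2, grad_f2), (f3, grad_f3)]
]
print(max_errors)
```

Each entry of `max_errors` should be tiny (limited by rounding in the finite-difference quotient) if the analytic gradient is right.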

### b) ###
Contour plots for the three functions:

In [20]:
import numpy as np
import plotly.offline as py
import plotly.graph_objs as go
py.init_notebook_mode(connected=True)

# Common function used for plotting.
def plot_f(f, filename='default-plot', title='default-plot', plot_contour=True):
    n_x, n_y = 50, 50
    x_1d, y_1d = np.linspace(-2, 3.5, n_x), np.linspace(-2, 3.5, n_y)
    x, y = np.meshgrid(x_1d, y_1d)
    layout = go.Layout(
        title=title,
        margin=go.layout.Margin(
            l=10,
            r=10,
            b=25,
            t=50
        )
    )
    if plot_contour:
        assert not isinstance(f, list)
        data = [go.Contour(z=f(x, y), x=x_1d, y=y_1d)]
    else:
        if not isinstance(f, list):
            f = [f]
        data = [go.Surface(z=func(x, y), x=x_1d, y=y_1d) for func in f]
    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig, filename=filename)

# The 3 functions as defined in the problem.
def f1(x, y):
    return 2 * x + 3 * y + 1

def f2(x, y):
    return x ** 2 + y ** 2 - x * y - 5

def f3(x, y):
    return (x - 5) * np.cos(y - 5) - (y - 5) * np.sin(x - 5)

# Plot figures.
plot_f(f1, '1.9-b-f1', title='f1')
plot_f(f2, '1.9-b-f2', title='f2')
plot_f(f3, '1.9-b-f3', title='f3')

### c) ###

We need equations for the tangent plane approximations of the given functions. We can treat the graph $z = f(x, y)$ of each function as the zero level surface of a function of three variables, $g(x, y, z)$; the gradient of $g$ then gives us the normal $a$ to the tangent plane. Let $g(x, y, z) = f(x, y) - z$, then:

$\nabla g = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, -1]^{T}$

The gradient of $g$ evaluated at $(x_0, y_0)$ is the normal vector $a$. It describes the tangent plane shifted to pass through the origin (which makes it a subspace), with equation:

$$
a = \nabla g(x_0, y_0) \\
a^{T}v' = 0
$$

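Let $w_0 = [x_0, y_0, f_i(x_0, y_0)]^{T}$ be the point of tangency on the graph of $f_i$.
Shifting the plane back so that it passes through $w_0$, our tangent plane approximation has equation:

$$
\begin{aligned}
a^{T}(v' + w_0) &= a^{T}w_0 \\
a^{T}v &= a^{T}w_0
\end{aligned}
$$

where $v = v' + w_0 = [x, y, z]^{T}$.
We will borrow results from part a) for the gradients, evaluated at $(x_0, y_0) = (1, 0)$.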
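The recipe above can be sketched as a small helper. `tangent_plane_coeffs` is a hypothetical name introduced here, and the point $(1, 0)$ is taken from the $w_0$ vectors used below; $f_2$ and its gradient are restated so the block runs on its own.

```python
import numpy as np

def f2(x, y):
    return x ** 2 + y ** 2 - x * y - 5

def grad_f2(x, y):
    return np.array([2 * x - y, 2 * y - x])

def tangent_plane_coeffs(f, grad, x0, y0):
    """Return (cx, cy, c0) such that the tangent plane is z = cx*x + cy*y + c0."""
    fx, fy = grad(x0, y0)
    a = np.array([fx, fy, -1.0])        # normal vector a = grad g at (x0, y0)
    w0 = np.array([x0, y0, f(x0, y0)])  # point of tangency on the graph
    # a^T v = a^T w0  <=>  fx*x + fy*y - z = a^T w0  <=>  z = fx*x + fy*y - a^T w0
    return fx, fy, -np.dot(a, w0)

cx, cy, c0 = tangent_plane_coeffs(f2, grad_f2, 1.0, 0.0)
print('z = %g*x + %g*y + %g' % (cx, cy, c0))
```

A quick consistency check for any result is that the plane passes through the point of tangency, i.e. `cx * x0 + cy * y0 + c0` equals `f(x0, y0)`.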

1. $a = [2, 3, -1]^{T}$. $w_{0} = [1, 0, 3]^T$:

    $$
    \begin{aligned}
    a^{T}v &= a^{T}w_{0}  \\
    2x + 3y - z &= -1 \\
    z &= 1 + 2x + 3y
    \end{aligned}
    $$

    This makes sense as we are making a tangent plane approximation to a plane, i.e., they should be identical.

In [21]:
def f1_tangent(x, y):
    return 2 * x + 3 * y + 1
    
plot_f([f1, f1_tangent], '1.9-c-f1', title='f1-tangent', plot_contour=False)

2. $a = [2, -1, -1]^{T}$. $w_{0} = [1, 0, -4]^T$:

    $$
    \begin{aligned}
    a^{T}v &= a^{T}w_{0}  \\
    2x - y - z &= 2 + 4 \\
    z &= 2x - y - 6
    \end{aligned}
    $$


In [22]:
def f2_tangent(x, y):
    return 2 * x - y - 6
    
plot_f([f2, f2_tangent], '1.9-c-f2', title='f2-tangent', plot_contour=False)

3. $a = [\cos(5) + 5\cos(4), \sin(4) - 4\sin(5), -1]^{T}$. $w_0 = [1, 0, -4\cos(5) - 5\sin(4)]^T$:

    $$
    \begin{aligned}
    a^{T}v &= a^{T}w_{0}  \\
    (\cos(5) + 5\cos(4))x + (\sin(4) - 4\sin(5))y - z &= \cos(5) + 5\cos(4) + 4\cos(5) + 5\sin(4) \\
    (\cos(5) + 5\cos(4))x + (\sin(4) - 4\sin(5))y - z &= 5\cos(5) + 5\cos(4) + 5\sin(4) \\
    z &= (\cos(5) + 5\cos(4))x + (\sin(4) - 4\sin(5))y - (5\cos(5) + 5\cos(4) + 5\sin(4))
    \end{aligned}
    $$

In [27]:
def f3_tangent(x, y):
    x_term = (np.cos(5) + 5*np.cos(4)) * x
    y_term = (np.sin(4) - 4 * np.sin(5)) * y 
    constant = -(5 * np.cos(5) + 5 * np.cos(4) + 5 * np.sin(4))
    return x_term + y_term + constant
    
plot_f([f3, f3_tangent], '1.9-c-f3', title='f3-tangent', plot_contour=False)