# NMDS Gradients

In class, we performed NMDS by initializing points for each object (i.e., animal) randomly and then iteratively moving points in a direction that will reduce stress. In this assignment, we will implement the calculation of these directions along with the rest of the NMDS algorithm.

You will need to refer to the lecture notebooks from class as well as what you learned in discussion section to complete this assignment.

```python
!pip install -q otter-grader

import otter
grader = otter.Notebook("hw6.ipynb")

import numpy as np
import pandas as pd
```

The human similarity data is loaded below. We want to find a set of points whose pairwise distances relate to these similarity ratings in the way that Shepard's Law predicts. That set of points is an inferred psychological representation.

```python
sim_vals = [
 [1., 0.600554459474065, 0.7536383164437648, 0.5312856091329679],
 [0.600554459474065, 1., 0.49306869139523984, 0.7288934141100247],
 [0.7536383164437648, 0.49306869139523984, 1., 0.4088417197978041],
 [0.5312856091329679, 0.7288934141100247, 0.4088417197978041, 1.]
]
labels = ['dog', 'cat', 'wolf', 'rabbit']
df_sim = pd.DataFrame(sim_vals, columns=labels, index=labels)
df_sim
```

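|        | dog      | cat      | wolf     | rabbit   |
|--------|----------|----------|----------|----------|
| dog    | 1.0      | 0.600554 | 0.753638 | 0.531286 |
| cat    | 0.600554 | 1.0      | 0.493069 | 0.728893 |
| wolf   | 0.753638 | 0.493069 | 1.0      | 0.408842 |
| rabbit | 0.531286 | 0.728893 | 0.408842 | 1.0      |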
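Shepard's Law says that similarity decays exponentially with psychological distance, $s_{ij} = e^{-d_{ij}}$, so target distances can be recovered as $d_{ij} = -\ln s_{ij}$. The sketch below shows that conversion on the similarity matrix above. It assumes every similarity lies in $(0, 1]$ with ones on the diagonal, and `similarity_to_distance` is a hypothetical helper name, not part of the assignment.

```python
import numpy as np

def similarity_to_distance(sim):
    """Invert Shepard's law s = exp(-d), returning target distances d = -ln(s).

    Hypothetical helper; assumes every similarity lies in (0, 1].
    """
    sim = np.asarray(sim, dtype=float)
    if np.any(sim <= 0) or np.any(sim > 1):
        raise ValueError("similarities must lie in (0, 1]")
    return -np.log(sim)

# Similarity matrix from the exercise (rows/columns: dog, cat, wolf, rabbit)
sim = np.array([
    [1.0, 0.600554459474065, 0.7536383164437648, 0.5312856091329679],
    [0.600554459474065, 1.0, 0.49306869139523984, 0.7288934141100247],
    [0.7536383164437648, 0.49306869139523984, 1.0, 0.4088417197978041],
    [0.5312856091329679, 0.7288934141100247, 0.4088417197978041, 1.0],
])
d = similarity_to_distance(sim)

# Higher similarity maps to a smaller target distance
print(f"dog-wolf target distance: {d[0, 2]:.3f}")
print(f"wolf-rabbit target distance: {d[2, 3]:.3f}")
```

The most similar pair (dog and wolf, 0.754) gets the smallest target distance, about 0.283, and the least similar pair (wolf and rabbit, 0.409) gets the largest, about 0.894.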


First we will randomly initialize points for each animal:

```python
# do not change
np.random.seed(10)

# do not change
def create_initial_random_points():
    points = np.random.rand(4, 2)
    points_df = pd.DataFrame(
        points, 
        columns=['dim1', 'dim2'], 
        index=['dog', 'cat', 'wolf', 'rabbit']
    )
    return points_df

# do not change
df_guesses = create_initial_random_points()
df_guesses
```

|        | dim1     | dim2     |
|--------|----------|----------|
| dog    | 0.771321 | 0.020752 |
| cat    | 0.633648 | 0.748804 |
| wolf   | 0.498507 | 0.224797 |
| rabbit | 0.198063 | 0.760531 |


As discussed in section, NMDS iteratively adjusts each point $x_i$ with respect to each other point $x_j$ (where $i \neq j$) using the formula:

$$x_i \leftarrow x_i + \text{step\_size} \cdot (\hat{d}_{ij} - d_{ij}) \, \frac{x_j - x_i}{\hat{d}_{ij}},$$

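where $d_{ij}$ is the target distance between objects $i$ and $j$ in psychological space (derived from their similarity rating), $\hat{d}_{ij} = \lVert x_j - x_i \rVert$ is the current distance between the points we are adjusting, and $(\hat{d}_{ij} - d_{ij}) \, (x_j - x_i)/\hat{d}_{ij}$ is called the **gradient**.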
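To make the update concrete, here is a minimal sketch of one step of $x_i$ with respect to a single other point $x_j$, written directly from the formula above. The function name `update_point` is hypothetical, and the coordinates in the usage example are made up for illustration.

```python
import numpy as np

def update_point(x_i, x_j, d_ij, step_size):
    """Return x_i after one step with respect to x_j (hypothetical helper).

    Computes x_i + step_size * (d_hat_ij - d_ij) * (x_j - x_i) / d_hat_ij,
    where d_hat_ij is the current distance between x_i and x_j.
    """
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    d_hat_ij = np.linalg.norm(x_j - x_i)
    gradient = (d_hat_ij - d_ij) * (x_j - x_i) / d_hat_ij
    return x_i + step_size * gradient

# The points are 5 apart but the target distance is 2, so x_i moves toward x_j.
x_j = np.array([3.0, 4.0])
new_x_i = update_point([0.0, 0.0], x_j, d_ij=2.0, step_size=0.4)
print(new_x_i)                         # [0.72 0.96]
print(np.linalg.norm(x_j - new_x_i))   # about 3.8
```

For a single pair, each step shrinks the gap $\hat{d}_{ij} - d_{ij}$ by a factor of $(1 - \text{step\_size})$: here from $5 - 2 = 3$ to $3.8 - 2 = 1.8$. In full NMDS the pulls from all other points are combined, so the gaps no longer shrink this cleanly.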

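To review briefly, $(x_j - x_i)/\hat{d}_{ij}$ is a unit vector that points from $x_i$ to $x_j$, and $(\hat{d}_{ij} - d_{ij})$ is the signed error term from the stress. It determines (1) whether we step toward $x_j$, along $(x_j - x_i)/\hat{d}_{ij}$, because the points are too far apart ($\hat{d}_{ij} > d_{ij}$), or away from it, along $-(x_j - x_i)/\hat{d}_{ij}$, because they are too close ($\hat{d}_{ij} < d_{ij}$); and (2) the size of the step relative to the steps taken for the other points.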
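Both roles of $(\hat{d}_{ij} - d_{ij})$ can be checked numerically. This sketch uses made-up points and a hypothetical helper `gradient`; it confirms that the unit vector has length 1, that the sign of the error term decides whether $x_i$ steps toward or away from $x_j$, and that a larger error gives a larger step.

```python
import numpy as np

def gradient(x_i, x_j, d_ij):
    """Return ((d_hat_ij - d_ij) * unit, unit), unit pointing from x_i to x_j (hypothetical)."""
    diff = np.asarray(x_j, dtype=float) - np.asarray(x_i, dtype=float)
    d_hat_ij = np.linalg.norm(diff)
    unit = diff / d_hat_ij
    return (d_hat_ij - d_ij) * unit, unit

x_i, x_j = np.array([0.0, 0.0]), np.array([1.0, 1.0])   # d_hat_ij = sqrt(2), about 1.414
g_far, unit = gradient(x_i, x_j, d_ij=0.5)    # points too far apart: d_hat > d
g_near, _ = gradient(x_i, x_j, d_ij=3.0)      # points too close: d_hat < d

print(np.linalg.norm(unit))        # 1.0 up to rounding
print(np.dot(g_far, unit) > 0)     # True: step toward x_j
print(np.dot(g_near, unit) < 0)    # True: step away from x_j
# Step length equals |d_hat - d|: about 0.914 versus about 1.586
print(np.linalg.norm(g_far), np.linalg.norm(g_near))
```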

**Exercise 1:**

Perform NMDS using the following criteria:

- Store the gradient of every point with respect to each of the other points, on every iteration, in a NumPy array called `directions` with shape `(n_iterations, n_animals, n_animals - 1, 2)`.
- Set $\text{step\_size}$ to $0.4$.
- Run for $100$ iterations.
- Store the stress for each iteration in an array called `stress_vals`.
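Because each point has only `n_animals - 1` partners, the third axis of `directions` cannot be indexed by `j` directly. The sketch below shows one way to map between a slot `k` on that axis and the partner index `j`. It assumes partners are stored in increasing order of `j` with `i` skipped, as happens in a loop that `continue`s when `i == j`; the helpers `partner_index` and `slot_index` are hypothetical.

```python
import numpy as np

def partner_index(i, k):
    """Map slot k (0 .. n-2) in directions[t, i] to the partner index j (hypothetical)."""
    return k if k < i else k + 1

def slot_index(i, j):
    """Inverse of partner_index: the slot k holding the gradient of point i w.r.t. point j."""
    if i == j:
        raise ValueError("a point has no gradient with respect to itself")
    return j if j < i else j - 1

n_iterations, n_animals = 100, 4
directions = np.zeros((n_iterations, n_animals, n_animals - 1, 2))
print(directions.shape)                                     # (100, 4, 3, 2)
print([partner_index(2, k) for k in range(n_animals - 1)])  # [0, 1, 3]: wolf's partners
# The total step for point i on iteration t sums over its partner slots:
#   step_size * directions[t, i].sum(axis=0)
```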

```python
# Your code goes here
step_size = 0.4
n_iterations = 100
n_animals = 4
stress_vals = []
# directions[t, i, k]: gradient of point i w.r.t. its k-th other point on iteration t
directions = np.zeros((n_iterations, n_animals, n_animals - 1, 2))

# Shepard's law: similarity = exp(-distance), so target distance = -log(similarity)
df_d = -np.log(df_sim)

for it in range(n_iterations):
    gradients = np.zeros((n_animals, n_animals - 1, 2))
    # Work on a writable copy of the current coordinates
    points = df_guesses.to_numpy(copy=True)

    # Compute every gradient from the current positions before moving any point
    for i in range(n_animals):
        k = 0  # slot among the n_animals - 1 other points
        for j in range(n_animals):
            if i == j:
                continue

            xi, xj = points[i], points[j]
            d_hat_ij = np.linalg.norm(xi - xj)  # current distance between the points
            d_ij = df_d.iloc[i, j]              # target psychological distance

            unit_vector = (xj - xi) / d_hat_ij
            gradients[i, k, :] = (d_hat_ij - d_ij) * unit_vector
            k += 1

    # Move all points simultaneously by their summed gradients
    for i in range(n_animals):
        points[i] += step_size * gradients[i].sum(axis=0)

    df_guesses.iloc[:, :] = points

    # Stress: square root of the summed squared distance errors over pairs i < j
    stress = 0.0
    for i in range(n_animals):
        for j in range(i + 1, n_animals):
            d_hat_ij = np.linalg.norm(points[i] - points[j])
            d_ij = df_d.iloc[i, j]
            stress += (d_ij - d_hat_ij) ** 2

    stress_vals.append(np.sqrt(stress))
    directions[it] = gradients

# DO NOT CHANGE
for stress_val in stress_vals:
    print(stress_val)
```

```
0.1615034213424213
0.09293248150287595
0.07289205767606863
0.06531480818210283
0.06200313834106787
0.060306441287781544
0.05925134305801186
0.05849464217426252
0.0578910313908312
0.05738045560924361
0.056931070082903275
0.05652574534898005
0.05615281878693984
0.055804305945440176
0.055473929200291185
0.05515675615451018
0.05484860880833651
0.0545459073271852
0.05424542726194481
0.0539441968567224
0.05363936808156965
0.05332813782373726
0.053007661223566385
0.052674983005149834
0.05232696505959684
0.05196021906741544
0.0515710356809378
0.05115531365272295
0.050708486248981235
0.05022544767529442
0.049700480933655145
0.049127192158068975
0.04849845788043023
0.04780639562089573
0.04704237170267962
0.046197065050504214
0.04526061024007379
0.044222847145177074
0.043073706107402625
0.041803754735055
0.04040492212410471
0.038871395699320015
0.037200653005401214
0.035394546654322734
0.033460311056684865
0.03141131664770502
0.029267378426758212
0.02705444882860243
0.02480360072102024
0.02254932
```

```python
grader.check("q1")
```
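For comparison, the same simultaneous update can be written without Python loops using NumPy broadcasting. This is a minimal sketch, not the required solution. `pairwise_distances`, `nmds_step`, and `stress` are hypothetical names; the targets `D` are assumed to be a symmetric array with zeros on the diagonal; and no two points are assumed to coincide, so every off-diagonal $\hat{d}_{ij}$ is positive. It is meant to compute the same quantities as the loop solution: the same update, the same stress, and the same `directions` layout.

```python
import numpy as np

def pairwise_distances(X):
    """Return (distances, diff), where diff[i, j] = x_j - x_i and distances[i, j] = |diff[i, j]|."""
    diff = X[None, :, :] - X[:, None, :]
    return np.linalg.norm(diff, axis=-1), diff

def nmds_step(X, D, step_size):
    """One simultaneous NMDS update of every point (hypothetical helper).

    grads[i, j] = (d_hat_ij - d_ij) * (x_j - x_i) / d_hat_ij for i != j, zero on the diagonal.
    """
    D_hat, diff = pairwise_distances(X)
    off_diag = ~np.eye(X.shape[0], dtype=bool)
    scale = np.zeros_like(D_hat)
    scale[off_diag] = (D_hat[off_diag] - D[off_diag]) / D_hat[off_diag]
    grads = scale[:, :, None] * diff
    return X + step_size * grads.sum(axis=1), grads

def stress(X, D):
    """Square root of the summed squared errors (d_ij - d_hat_ij)^2 over pairs i < j."""
    D_hat, _ = pairwise_distances(X)
    iu = np.triu_indices(X.shape[0], k=1)
    return np.sqrt(np.sum((D[iu] - D_hat[iu]) ** 2))

sim = np.array([
    [1.0, 0.600554459474065, 0.7536383164437648, 0.5312856091329679],
    [0.600554459474065, 1.0, 0.49306869139523984, 0.7288934141100247],
    [0.7536383164437648, 0.49306869139523984, 1.0, 0.4088417197978041],
    [0.5312856091329679, 0.7288934141100247, 0.4088417197978041, 1.0],
])
D = -np.log(sim)            # Shepard's law targets
np.random.seed(10)
X = np.random.rand(4, 2)    # same draw as create_initial_random_points()
n = X.shape[0]
off_diag = ~np.eye(n, dtype=bool)

stress_vals, directions = [], []
for _ in range(100):
    X, grads = nmds_step(X, D, step_size=0.4)
    # Drop the zero diagonal so each point keeps its n - 1 partner gradients in order of j
    directions.append(grads[off_diag].reshape(n, n - 1, 2))
    stress_vals.append(stress(X, D))
directions = np.array(directions)   # shape (100, 4, 3, 2)
```

Broadcasting builds every $x_j - x_i$ at once in an `(n, n, 2)` array. That costs $O(n^2)$ memory, which is negligible for four animals, and it removes the need to track the skip-`i` slot index by hand.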