# Inferring Psychological Features

Before starting:

- Introduce yourself to someone you don't know

- Join a **group** of 2+ students

- Feel free to move your desks together and **discuss** / **help** each other

In [None]:
# Run this cell first
import numpy as np
import pandas as pd
from sklearn.manifold import MDS
import matplotlib.pyplot as plt

from tools import *

### Review ###

Last time we learned that we can think of similarity as a sum of importance weights for features that objects share.

For example, the similarity between two animals increases when (1) they can both fly (shared feature) and (2) the ability to fly is psychologically important (high importance weight).

For reference, recall that estimated similarity

$\hat{s}_{ij} = \sum_{k=1}^{m} w_k f_{ik} f_{jk}$, where:
- $f_{ik}$ is the $k^{\text{th}}$ feature for object $i$,
- $f_{jk}$ is the $k^{\text{th}}$ feature for object $j$, and
- $w_k$ is a non-negative weight corresponding to the $k^{\text{th}}$ feature.

While we can observe human similarity ratings for object pairs $s_{ij}$, we can't observe features $f_{ik}$ or importance weights $w_k$.
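
As a quick reminder of how the formula works in practice, here is a minimal sketch. The animals, features, and weights are made up for illustration and are not data from this lab. In matrix form, the full set of estimated similarities is $F \, \mathrm{diag}(w) \, F^\top$, where $F$ is the objects-by-features matrix.

```python
import numpy as np

# Hypothetical binary features for three animals (rows); illustrative only.
# Columns: can_fly, has_fur
F = np.array([
    [1, 0],   # sparrow
    [1, 1],   # bat
    [0, 1],   # dog
])

# Hypothetical non-negative importance weights, one per feature
w = np.array([0.6, 0.3])

# s_hat[i, j] = sum_k w[k] * F[i, k] * F[j, k]
s_hat = (F * w) @ F.T

print(s_hat)
```

The sparrow and the bat share only `can_fly`, so their estimated similarity equals that feature's weight. The sparrow and the dog share nothing, so their estimated similarity is zero.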

### Inferring Features ###

In the previous lab, we learned that, given a set of features, we can infer importance weights for those features that best predict similarity. That's part of the battle, but we also need to infer the features themselves.

Below we load a data set describing the similarity between pairs of four numbers (3, 4, 6, and 8) -- e.g., how similar is the number 3 to the number 6?

In [None]:
df_sim = load_sim_data()
df_sim

**Exercise 1:** 

(NOTE: Before answering the question below, compare your rationale with your partner's or group's.)

Based on the above, which two numbers do people think are most similar?

In [None]:
# answer1 = "3 and 4"
# answer1 = "3 and 6"
# answer1 = "3 and 8"
# answer1 = "4 and 6"
# answer1 = "4 and 8"
# answer1 = "6 and 8"

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if int(answer1[-1]) + int(answer1[0]) == int(36/3):
    print('Test passed')
else:
    print('Test failed')

The above similarity data should give us hints about **what features the mind uses to represent numbers**.

In particular, we need to answer two questions:
1. **How many** features does the mind use to represent numbers?
2. **What** are those features?

Let's make an initial guess.

Consider the output of the cell below. Just like the features for the animals data we looked at last time, the rows of the dataframe correspond to the numbers being compared (3, 4, 6, 8) and the columns correspond to a set of features that describe those numbers.

In [None]:
df_first_guess = load_first_guess()
df_first_guess

The above feature representation has four very simple features. For example, if the number is a three, it has the feature "is_3", and there's a 1 in the "is_3" column. The same goes for the other columns.

**Exercise 2:** Without writing any code or doing any calculations, and regardless of how important each feature might be, what is the similarity between the numbers 4 and 6 given this feature representation?

Discuss with your partner.

In [None]:
answer2 = ???

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if answer2 == check_ex2():
    print('Test passed')
else:
    print('Test failed')

We can calculate the full set of similarities that these features predict by calling `compute_similarity(some_feature_df, some_weights)`, the function we wrote last time, which is already loaded.

**Exercise 3:** How many weights will be needed to compute similarity?

In [None]:
answer3 = ???

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if answer3 == check_ex3():
    print('Test passed')
else:
    print('Test failed')

**Exercise 4:** Using the dummy weights defined in the next cell and the features in `df_first_guess`, compute similarity in the cell after that. Store the result in `df_sim_first_guess`.

(We'll worry about finding the best weights later after we find some good features.)

In [None]:
dummy_weights = np.ones(df_first_guess.shape[1]) * 0.1
dummy_weights

In [None]:
# Your code here



# do not change
df_sim_first_guess

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if df_sim_first_guess.loc[4, 3] == check_ex2():
    print('Test passed')
else:
    print('Test failed')

The current feature representation implies that none of the numbers are similar because none of them have any features in common, which makes intuitive sense.

Let's compute error (comparing to human similarity) as a baseline and then see if we can improve things. We can do this using the already-loaded function that we created last time: 
```python
    error_given_weights(some_weights, some_feature_df, human_sim_df)
```

**Exercise 5:** Use `dummy_weights` again and the features in `df_first_guess` to compute error in reconstructing human similarities. Store the result in `baseline_error`.

In [None]:
# Your code here



# don't change
try:
    print(baseline_error)
except NameError:
    print('baseline_error not defined yet')

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if baseline_error == 0.71828:
    print('Test passed')
else:
    print('Test failed')

Let's see if we can do any better.

**Exercise 6:** Edit the 8 binary values below to produce two features, one representing "smaller numbers" and one representing "bigger numbers". Change only the 0s and 1s; leave the rest of the code as it is.

In [None]:
features = np.array(
    # CHANGE only these numbers (flip 0s and 1s)
    [
        [1, 0],
        [0, 1],
        [0, 0],
        [0, 0],
    ]
)

# DO NOT CHANGE ANYTHING BELOW THIS LINE
print("YOUR GUESSED FEATURES:")
df_student_guess = pd.DataFrame(
    features,
    index=[3, 4, 6, 8], 
    columns=["feature1", "feature2"]
)
print(df_student_guess, "\n")

student_error = error_given_weights(
    np.ones(df_student_guess.shape[1]) * 0.1, 
    df_student_guess, 
    df_sim
)
print("YOUR ERROR:", float(student_error))
if student_error < baseline_error:
    print("-- Your guess ({}) is better than the baseline ({})".format(
        student_error, baseline_error))
else:
    print("-- TRY AGAIN - Your guess ({}) is worse than the baseline ({})".format(
        student_error, baseline_error))

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if student_error == 0.61208:
    print('Test passed')
else:
    print('Test failed')

Your intuitions about what the correct features might be are relevant, and almost certainly better than a random guess. However, strictly speaking, it's next to impossible in most cases to figure out the correct features by hand.

In order to find the feature representation that best explains human similarity, we need to evaluate all possible feature representations of a given size.

For each cell in the feature matrix (e.g., feature1 for the number 3), we have exactly 2 possibilities: either the object has that feature (1) or it doesn't (0). If we have just one object and one feature, there are $2^1 = 2$ possibilities (0 or 1). For one object with two features, we'd have $2^2 = 4$ possible feature combinations.

When we extend this to multiple objects, the possibilities multiply. With $n$ objects and $m$ features, we need to fill $n \times m$ cells, each with a binary choice.

**Exercise 7:** Create a function called `n_possible_representations` that takes two integer arguments, the number of objects and the number of features, and returns the number of possible feature representations for a feature matrix of that size, i.e., the number of distinct ways to fill all `n_objects * n_features` cells with 0s and 1s.

In [None]:
# Your code here



In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if n_possible_representations(1, 2) == 4:
    print('Test passed')
else:
    print('Test failed')

We are currently looking for features that characterize a set of 4 numbers (3, 4, 6, and 8).

**Exercise 8:** Compute the number of possible representations for our numbers as n_features varies from 1 to 10. Store the results in a numpy array called `n_reps`.

In [None]:
# Your code here



# do not change
try:
    for i, n in enumerate(n_reps):
        print("For {} features, there are {} possible representations".format(i+1, n))
except:
    print('Something is wrong.')

As we can see, even for modest numbers of features, there are more possible representations than we can evaluate.

With only 4 objects and 10 features, there are 1,099,511,627,776 possible representations to evaluate. With more than 4 objects, this trend gets considerably worse.

Shepard proposed some clever methods for selecting among these possibilities, but for now, let's just restrict the number of features we will consider to 3.

To infer the set of three features that best explains human similarity, we first need to enumerate all possible representations of that size. To do this, we provide the preloaded function `generate_all_binary_matrices(n_objects, n_features)`, which returns all possible matrices for the given input parameters.
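
To make the idea of enumeration concrete, here is a minimal sketch of how such a function might work on a tiny 2 x 2 case. The name `enumerate_binary_matrices` is a hypothetical stand-in, and the preloaded course function may be implemented differently or return matrices in a different order.

```python
import itertools
import numpy as np

def enumerate_binary_matrices(n_objects, n_features):
    """Hypothetical stand-in: list every 0/1 matrix of the given shape."""
    n_cells = n_objects * n_features
    return [
        np.array(cells).reshape(n_objects, n_features)
        # itertools.product yields every way of filling the cells with 0 or 1
        for cells in itertools.product([0, 1], repeat=n_cells)
    ]

tiny = enumerate_binary_matrices(2, 2)
print(len(tiny))   # one matrix per way of filling the 4 cells
print(tiny[5])     # inspect one candidate
```

Building the full list in memory is only practical for small matrices, which is one reason we restrict ourselves to 3 features here.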

**Exercise 9:** Compute all possible 3-feature representations that could characterize our number similarity data. Store the result in `possible_representations`.

In [None]:
# Your code here



In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if len(possible_representations) == 4096:
    print('Test passed')
else:
    print('Test failed')

Run the cell below to see how many candidate representations we have to evaluate.

In [None]:
len(possible_representations)

Run the cell below to see one of these candidate representations.

In [None]:
possible_representations[2000]

Now we have everything we need to identify the best 3-feature psychological representation that explains our number similarity data.

For each candidate representation, we need to (1) compute the best weights $w_k$ given that representation, and (2) compute the error given both the representation and the best weights.
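
For intuition about step (1), here is a sketch of one way to fit non-negative weights for a single candidate representation, using non-negative least squares from SciPy on toy data. This is an assumption for illustration only: the helper `fit_weights` is hypothetical, and the preloaded functions from the previous lab may fit weights and define error differently.

```python
import numpy as np
from scipy.optimize import nnls

def fit_weights(F, S):
    """Hypothetical helper: fit non-negative weights so that
    sum_k w_k * f_ik * f_jk approximates S[i, j] over all pairs i < j.
    Returns (weights, sum of squared errors)."""
    n = F.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)   # each unordered pair once
    A = F[i_idx] * F[j_idx]                  # one row per pair, one column per feature
    b = S[i_idx, j_idx]
    w, residual_norm = nnls(A.astype(float), b.astype(float))
    return w, residual_norm ** 2

# Toy check: build similarities from known features and weights, then recover the weights
F_true = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])
w_true = np.array([0.5, 0.2])
S_toy = (F_true * w_true) @ F_true.T

w_hat, sse = fit_weights(F_true, S_toy)
print(w_hat, sse)
```

Because the toy similarities were generated from `F_true` itself, the fitted weights match `w_true` and the error is essentially zero. A wrong candidate representation would generally leave a larger error.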

**Exercise 10:** Complete the below code to find the `best_features`.

In [None]:
# do not change
best_error = np.inf
print("Initial best error:", best_error)

for ??? in ???:

    # ???
    # ???

    # do not change
    if error < best_error:
        best_features = current_guess.copy()
        best_error = error
        print('Found better features with error:', error)

# do not change
best_features

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if best_error == 1.2926804362317114e-16:
    print('Test passed')
else:
    print('Test failed')

Notice the pattern of feature membership. For example, feature1 and feature2 overlap in that they both characterize the digit 4, but do not share any other digits. Such features are often referred to as **overlapping clusters** (i.e., partially overlapping groups of objects) and the process of finding such clusters is called **additive clustering**. One kind of overlap is to have no overlap (i.e., objects belong to only one cluster), and thus feature representations can take on either an overlapping or non-overlapping structure.

**Exercise 11:** What is the best interpretation of the best features (feature1, feature2, and feature3) we found?

In [None]:
# answer4 = "even numbers, multiples of 3, large numbers"
# answer4 = "even numbers, small numbers, large numbers"
# answer4 = "powers of 2, small-to-medium numbers, medium-to-large numbers"

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if "em ,"[::-1] in answer4:
    print('Test passed')
else:
    print('Test failed')

Shepard's full set of similarity data concerned 10 numbers (0-9), and he was able to infer up to 10 features.

**Exercise 12:** Compute the number of possible 10x10 feature representations and store the result in `shep_possible`.

Shepard used methods to cut down this number that we will explore in a later assignment.

In [None]:
# Your code here



# do not change
try:
    print("Number of possible 10x10 representations:", shep_possible)
except:
    print('Something is wrong.')

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if shep_possible == check_ex12():
    print('Test passed')
else:
    print('Test failed')

Using his more efficient methods, Shepard inferred the following **psychological features of numbers**, listed from left to right (as columns) in order of importance. The exact weights are also printed.

In [None]:
shep_features, shep_weights = load_number_features(with_weights=True, subset=False)
print("Weights: ", shep_weights)
shep_features

**Exercise 13:** Enter four integers into the `feature_subset` Python list below to select the features that are better described as abstract / formal mathematical properties than as magnitude-based properties.

In [None]:
feature_subset = [???]

shep_features.iloc[:, feature_subset]

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if feature_subset == check_ex13():
    print('Test passed')
else:
    print('Test failed')

**Exercise 14:** Which feature of numbers does the mind appear to regard as the most important?

In [None]:
# answer5 = "Smallish Numbers"
# answer5 = "Powers of Two"
# answer5 = "Additive & Multiplicative Identities"

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if "o s"[::-1] in answer5:
    print('Test passed')
else:
    print('Test failed')

Below we load the full empirical similarity matrix for Shepard's numbers.

In [None]:
df_sim_full = load_numbers_full()
df_sim_full

**Exercise 15:** Which pair of numbers is the most similar? You don't need to write any code, but you can if you want.

**Hint:** It's not 0 and 1, even though they share the most features. **Discuss with your partner** why this is the case.

In [None]:
smaller_number = ???
larger_number = ???

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if (smaller_number, larger_number) == check_ex15():
    print('Test passed')
else:
    print('Test failed')

We can quickly perform NMDS on this similarity data in just a few lines of code using the `scikit-learn` Python package. `scikit-learn` requires "dissimilarities", which are just 1 - similarities.

**Exercise 16:** Complete the code below to run NMDS and store the resulting points for each of Shepard's numbers in `points`. 
- To get a 2D representation, set the `n_components` argument to `2`. 
- Also set the `metric` argument to `False` to make sure we use NMDS and not classic MDS (i.e., where the relationship between similarity and distance must be linear).
- Call `nmds.fit_transform` at the end using `dissimilarities` as the argument.

In [None]:
# scikit-learn requires "dissimilarities" which are 1 - similarities

similarities = df_sim_full.values
dissimilarities = 1 - ???

# set up the NMDS algorithm
nmds = MDS(
    n_components=?,
    metric=?,
    dissimilarity='precomputed', 
    random_state=1,
)

# run it and find points for each number
points = nmds.fit_transform(???)

In [None]:
# TEST YOUR SOLUTION

# DON'T CHANGE THIS CELL
if points.shape == (10, 2) and points[0, 0] == 0.3188197867587467:
    print('Test passed')
else:
    print('Test failed')

Run the cell below to visualize the NMDS solution.

Note that the dimensions are hard to interpret.

In [None]:
plot_spatial_representation(points)

Now let's visualize the clusters in this same space. We will simply draw an ellipse around each cluster of points that share a discrete feature (from the set Shepard inferred above).

In [None]:
plot_spatial_representation(points, shep_features)

Note that 3, 6, and 9 are circled. Also, 2, 4, and 8 are circled. Other clusters are a bit harder to see given all of the overlap, but all 10 are there.

Importantly, this cluster-based representation captures aspects of human psychological representations that are not as obvious in a continuous space. For example, in the 2D space, 6 and 9 are very far apart. However, in the discrete feature representation, they are grouped by the elongated purple ellipse.

We also aren't just limited to the above. There are many other possible representational structures that we might find in the mind beyond spaces and arbitrarily overlapping features.

### Hierarchical Representations ###

For example, many psychological representations have a **hierarchical** structure, which is an organization of objects into hierarchies or trees. 

Finding such an organization that best explains similarity is called **hierarchical clustering**, and we can think of the resulting clusters as features with special properties / constraints.

1. **Features have a nested structure**: High-level features contain low-level features. For example, the high-level feature "animals" might contain two low-level features "mammals" and "non-mammals".
2. **Features form a proper hierarchy**: If two objects share feature X (e.g., "mammals"), they must share all higher-level features that contain feature X (e.g., "animals").
3. **Constraints on feature / cluster overlap**: Unlike additive clustering, hierarchical features can't overlap arbitrarily. They can only overlap hierarchically in the way described above.

The **hierarchical clustering algorithm** starts by treating each observation as a separate cluster. Then, it repeatedly executes the following two steps: 
1. identify the two clusters that are closest together, and 
2. merge the two most similar clusters.

This iterative process continues until all the clusters are merged together. We will learn more about this algorithm in a future exercise.
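
The merge loop described above can be sketched with SciPy's `linkage` function on a toy dissimilarity matrix. The values below are made up, and the choice of `method="average"` is an assumption; the course function used later in this notebook may use a different linkage rule.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical dissimilarities among four objects (symmetric, zero diagonal)
D = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# linkage expects the condensed (upper-triangle) form of the matrix
Z_toy = linkage(squareform(D), method="average")

# Each row records one merge: the two cluster ids, their distance,
# and the number of objects in the newly formed cluster
print(Z_toy)
```

With four objects there are three merges. Objects 0 and 1 are closest, so they are merged first, and the final row joins everything into a single cluster of size 4.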

Run the cell below to see a hierarchical tree inferred for Shepard's number data. This kind of visualization is called a **dendrogram**, and we will learn more about it later.

In [None]:
Z = find_and_plot_hierarchical_features(df_sim_full)

Note that at the highest level, all numbers are split into two clusters: (0 and 1) versus all other numbers.

Within the latter (much bigger) cluster, numbers are split between two clusters: (2, 4, 8) and all other numbers.

The pattern continues: each cluster is further split into smaller subclusters until individual numbers remain.

Run the cell below to see the corresponding feature matrix for this hierarchically organized feature representation.

In [None]:
hierarchical_features = hierarchical_to_feature_matrix(Z, df_sim_full.index)
hierarchical_features

We can see the hierarchical structure in the features above. For example, `hier_feature8` shows all 1s for numbers 2 - 9. Those 8 cluster members are then divided between lower-level features `hier_feature7` and `hier_feature4`.
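
To see where such a matrix can come from, here is a sketch that turns a SciPy linkage matrix into binary features, one column per merge. The helper `linkage_to_features` and the toy dissimilarities are hypothetical; the preloaded `hierarchical_to_feature_matrix` may order, name, or filter its columns differently.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def linkage_to_features(Z, n_objects):
    """Hypothetical helper: one binary column per merge recorded in Z."""
    members = {i: {i} for i in range(n_objects)}   # each object starts as its own cluster
    F = np.zeros((n_objects, len(Z)), dtype=int)
    for step, (a, b, _dist, _size) in enumerate(Z):
        merged = members[int(a)] | members[int(b)]
        members[n_objects + step] = merged         # SciPy's id for the new cluster
        F[list(merged), step] = 1
    return F

# Hypothetical dissimilarities among four objects
D = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])
Z_toy = linkage(squareform(D), method="average")

F_hier = linkage_to_features(Z_toy, n_objects=4)
print(F_hier)
```

In this sketch the last column corresponds to the final merge, so it contains every object, while earlier columns are the smaller clusters nested inside it.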

A hierarchical feature matrix is still a feature matrix, and thus additive clustering can find such matrices. However, since hierarchical feature matrices have constraints on their features, there are fewer of them than there are arbitrary overlapping cluster representations of the same matrix size. Thus, assuming a hierarchy is one way to cut down the number of feature representations to evaluate, but in the case of Shepard's numbers, it's not the *correct* assumption.

For example, consider the best features that Shepard found. Note that:
- Numbers 3 and 5 share the "Odd Numbers" feature
- Numbers 3 and 6 share the "Middle Numbers" feature
- Numbers 5 and 6 share the "Moderately Large Numbers" feature

This creates a triangle pattern where each pair shares a different feature. In a hierarchical structure, if two pairs share features, then either all three objects must share a feature, or the third pair can't share any features.
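
The hierarchy constraint can be checked mechanically: in a tree, any two clusters are either nested or share no members. The sketch below applies that check to the triangle described above, restricted to the numbers 3, 5, and 6. The helper `is_hierarchical` is hypothetical, and the second matrix is a made-up nested example for contrast.

```python
import numpy as np

def is_hierarchical(F):
    """True if every pair of feature columns is either nested or disjoint."""
    clusters = [set(np.flatnonzero(F[:, k])) for k in range(F.shape[1])]
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            A, B = clusters[a], clusters[b]
            # Overlapping without one containing the other breaks the tree structure
            if (A & B) and not (A <= B or B <= A):
                return False
    return True

# Rows: numbers 3, 5, 6. Columns: odd, middle, moderately large
triangle = np.array([
    [1, 1, 0],   # 3
    [1, 0, 1],   # 5
    [0, 1, 1],   # 6
])

# Made-up contrast case: the second feature sits entirely inside the first
nested = np.array([
    [1, 1],
    [1, 1],
    [1, 0],
])

print(is_hierarchical(triangle))  # False
print(is_hierarchical(nested))    # True
```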

Our task of uncovering psychological representations thus requires evaluating **which potential representational structures provide the best fit to human behavior** (e.g., similarity data).