---

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from representations import mds, plot_dendrogram

Michael Lee has an excellent page full of datasets that include human similarity judgments: [http://faculty.sites.uci.edu/mdlee/similarity-data/](http://faculty.sites.uci.edu/mdlee/similarity-data/).

To access this data, download the zip file by calling the function `download_similarity_datasets`. Note that you will need an internet connection for this! If you want to see what this function is doing, you can view its source code by appending a double question mark (`??`) to its name:

In [3]:
from util import download_similarity_datasets
download_similarity_datasets??

In [4]:
dataset_location = download_similarity_datasets()

We can take a look at what files we now have:

In [5]:
import os
print(os.listdir(dataset_location))

['addtreegrow.m', 'faces11.mat', 'sport_romney.mat', 'toys_romney.mat', 'displayrepn_c.m', 'mds2labs.m', 'colour.mat', 'bodies_viken.mat', 'displaytree.m', 'dotpatterns.mat', 'texturebrodatz_heaps.mat', 'bic.m', 'displayrepn_d.m', 'faces5.mat', 'adclus.m', 'fish_romney.mat', 'distinctclus.m', 'animalnames11.mat', 'texturemit_heaps.mat', 'adclusgrow.m', 'countriessim.mat', 'commonclus.m', 'fruit2_romney.mat', 'vehicles2_romney.mat', 'nonsense_romney.mat', 'furniture2_romney.mat', 'phonemes.mat', 'classicalmds.m', 'fruit_romney.mat', 'fruits.mat', 'countriesdis.mat', 'similarity_dump.txt', 'abstractnumbers.mat', 'weapons_romney.mat', 'faces_steyvers.mat', 'druguse.mat', 'animalpictures11.mat', 'mds2c.m', 'flowerpots.mat', 'country_robinsonhefner.mat', 'diprox.m', 'clothing2_romney.mat', 'textures.mat', 'congress.mat', 'lines_cohen.mat', 'furniture_romney.mat', 'letters.mat', 'auditory.mat', 'tools_romney.mat', 'morsenumbers.mat', 'adclus2.m', 'birds_romney.mat', 'animalpictures5.mat', 'm

You'll notice that there are `.m` and `.mat` files. The `.m` files are MATLAB scripts, so we'll just ignore those. The `.mat` files are MATLAB data files (described on the website above), and we can load them into Python using the SciPy function `scipy.io.loadmat`. We've written a helper function called `load_dataset` to do this for you; as before, you can view its source code using the double question mark:
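To get a feel for what `scipy.io.loadmat` returns, here is a minimal round-trip sketch on a toy file. It does not reproduce `load_dataset`, and the variable name `s` is a hypothetical placeholder that need not match the keys stored in the real datasets.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Toy stand-in for one of the downloaded files. The variable name 's' is
# hypothetical and need not match the keys used in the real datasets.
toy = np.array([[1.0, 0.4], [0.4, 1.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'toy.mat')
    savemat(path, {'s': toy})   # write a small .mat file
    contents = loadmat(path)    # read it back as a dict

# loadmat returns a dict; keys wrapped in double underscores are metadata
# (header, version, globals), so we filter them out.
variables = {k: v for k, v in contents.items() if not k.startswith('__')}
loaded = variables['s']

print(sorted(variables))
print(loaded.shape)
```

Printing the non-metadata keys like this is a quick way to find out which variables a given `.mat` file actually contains.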

In [6]:
from util import load_dataset
load_dataset??

As an example, here is a dataset of similarity judgments of spoken letters and digits:

In [7]:
filename = os.path.join(dataset_location, 'auditory.mat')
data = load_dataset(filename)
data

{'similarities': array([[1.        , 0.21186101, 0.        , ..., 0.        , 0.57995833,
         0.        ],
        [0.21186101, 1.        , 0.56632333, ..., 0.        , 0.33153499,
         0.        ],
        [0.        , 0.56632333, 1.        , ..., 0.23119166, 0.03050499,
         0.        ],
        ...,
        [0.        , 0.        , 0.23119166, ..., 1.        , 0.        ,
         0.03050499],
        [0.57995833, 0.33153499, 0.03050499, ..., 0.        , 1.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.03050499, 0.        ,
         1.        ]]),
 'names': ['A',
  'B',
  'C',
  'D',
  'E',
  'F',
  'G',
  'H',
  'I',
  'J',
  'K',
  'L',
  'M',
  'N',
  'P',
  'Q',
  'R',
  'S',
  'T',
  'U',
  'V',
  'W',
  'X',
  'Y',
  'Z',
  '0',
  '1',
  '2',
  '3',
  '4',
  '5',
  '6',
  '7',
  '8',
  '9']}
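Before plotting a matrix like this, it can help to run a few sanity checks. The sketch below uses a small toy matrix with placeholder values (not taken from any dataset) and assumes the convention that dissimilarity is `1 - similarity`, which is one common choice rather than something the datasets prescribe.

```python
import numpy as np

# Toy 3x3 similarity matrix with placeholder values (not from any dataset).
similarities = np.array([
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.5],
    [0.0, 0.5, 1.0],
])
names = ['A', 'B', 'C']

# The matrix should have one row and one column per named item.
assert similarities.shape == (len(names), len(names))

# Check the properties visible in the printed output above.
is_symmetric = np.allclose(similarities, similarities.T)
has_unit_diagonal = np.allclose(np.diag(similarities), 1.0)

# Assumed convention: turn similarities into dissimilarities.
dissimilarities = 1.0 - similarities

# Find the most similar pair of distinct items by masking the diagonal.
off_diagonal = similarities.copy()
np.fill_diagonal(off_diagonal, -np.inf)
i, j = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)

print(is_symmetric, has_unit_diagonal)
print(names[i], names[j])
```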

---
## Part A (5 points)

In Part A of this challenge, you will select a new dataset from those downloaded above and analyze it.

<div class="alert alert-success">Select a dataset. Go to the website above, find the relevant paper, and read its abstract. Write two sentences below describing the stimuli and what kind of judgment was being made.</div>

YOUR ANSWER HERE

<div class="alert alert-success">Create 2 or 3 plots revealing patterns or structure present in the data. You may want to create multiple subplots to do this. Hint: you can use the two provided functions described below.</div>

We have provided you with a function, `mds`, which performs multidimensional scaling (MDS).
We have also provided a `plot_dendrogram` function that performs hierarchical clustering and then creates a dendrogram plot for you. Look at the documentation for both functions to figure out how to call them.
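For intuition about what these two techniques compute, here is a rough stand-in sketch. It does not use the provided `mds` or `plot_dendrogram` (their signatures are not shown here); instead it assumes classical (Torgerson) MDS written in NumPy and SciPy's average-linkage clustering, on toy distances between four hypothetical points. The helper name `classical_mds` is made up for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def classical_mds(dissimilarities, n_components=2):
    """Classical (Torgerson) MDS; a stand-in, not the provided `mds`."""
    squared = np.asarray(dissimilarities, dtype=float) ** 2
    n = squared.shape[0]
    # Double-center the squared dissimilarities to get a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ squared @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy data: Euclidean distances between four hypothetical 2-D points.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dissimilarities = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

# MDS: one 2-D coordinate per item, suitable for a scatter plot.
coords = classical_mds(dissimilarities)

# Hierarchical clustering on the condensed (upper-triangle) distances.
condensed = squareform(dissimilarities, checks=False)
tree = linkage(condensed, method='average')

# no_plot=True returns the dendrogram layout without drawing it;
# drop that argument to draw with matplotlib.
layout = dendrogram(tree, labels=['a', 'b', 'c', 'd'], no_plot=True)
print(coords.shape, layout['ivl'])
```

The MDS coordinates place similar items close together in a plane, while the dendrogram shows which items merge into clusters first, so the two plots give complementary views of the same matrix.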

In [None]:
def plot_data():
    """Plots data from one of the provided data sets."""
    # YOUR CODE HERE
    raise NotImplementedError()

Here's a cell to test your plotting:

In [None]:
plot_data()

<div class="alert alert-success">Write a paragraph of at least 100 words describing what you observe from the plots.</div>

YOUR ANSWER HERE

---
## Part B (5 points)

In Part B of this challenge, you will collect a new set of similarity data and explore it.

<div class="alert alert-success">Select a domain that you think has an interesting similarity space (e.g., ponies from My Little Pony, nose shapes of California politicians). Be creative! Your set should have between 5 and 10 items. You should collect pairwise similarity ratings from at least 5 people (you and at least 4 others). Describe the domain that you chose, and give a bulleted list of the items in your dataset.</div>

YOUR ANSWER HERE

<div class="alert alert-success">In the following cell, construct a NumPy array that includes the raw similarity scores from each participant. Then, return the average similarity score across participants.</div>
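As a minimal sketch of the averaging step, one option is to stack each participant's ratings into a single array of shape (participants, items, items) and take the mean over the first axis. The numbers below are hypothetical placeholders for two participants and three items, on an assumed 0 to 1 scale; your own data will have more of both.

```python
import numpy as np

# Hypothetical ratings: 2 participants x 3 items x 3 items.
ratings = np.array([
    [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.4],
     [0.2, 0.4, 1.0]],
    [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.2],
     [0.0, 0.2, 1.0]],
])

# Average over the participant axis to get one items x items matrix.
mean_similarity = ratings.mean(axis=0)
print(mean_similarity)
```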

In [None]:
def similarities():
    """Returns a matrix of similarity scores, averaged across participants."""
    # YOUR CODE HERE
    raise NotImplementedError()

<div class="alert alert-success">Create 2 or 3 plots revealing patterns or structure present in this data.</div>

In [None]:
def plot_collected_data(similarities):
    """Plots data from the data set you collected."""
    # YOUR CODE HERE
    raise NotImplementedError()

Here is a cell to test out your code:

In [None]:
plot_collected_data(similarities())

<div class="alert alert-success">Write a paragraph of at least 100 words describing what you observe from the plots.</div>

YOUR ANSWER HERE

---