# Naturalistic Analyses

This tutorial covers the features in quail that can be used to analyze and plot data from naturalistic free recall experiments.

Three keyword arguments in the `analyze` method are particularly useful for naturalistic analyses:
1. `match` - the matching approach used to compute recall matrices
2. `distance` - the distance function used to compare presented and recalled items
3. `features` - which features to consider when computing distance

First, let's load some example data:

In [1]:
import quail
%matplotlib inline
egg = quail.load_example_data(dataset='naturalistic')

The example data used in this tutorial is based on an open dataset from Chen et al., 2017, in which 17 participants viewed and then verbally recounted an episode of the BBC series _Sherlock_. We fit a topic model to hand-annotated text descriptions of the episode and used the model to transform the video annotations and the recall transcriptions for each subject. We then used a Hidden Markov Model (HMM) to segment the video and recall models into an optimal number of "events".

Here, the egg's `pres` field consists of 34 stimulus events (the number of segments determined by our HMM) for each subject. Each stimulus event is represented by a dictionary containing the temporal position of the video segment (`'item'`) and the array of topic vectors comprising that event (`'features'`).

In [2]:
# The presentation position of each stimulus event...
egg.get_pres_items().head()

Subject,List,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33
0,0,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33
1,0,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33
2,0,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33
3,0,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33
4,0,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33


In [3]:
# ...and their corresponding topic vectors
egg.get_pres_features().head()

Subject,List,0,1,2,3,4,5,6,7,8,9,...,24,25,26,27,28,29,30,31,32,33
0,0,"{'features': [1.3095040225638681e-05, 1.309504...","{'features': [1.1307998068474283e-05, 1.130799...","{'features': [9.693039650300355e-06, 9.6930396...","{'features': [9.913957255604304e-06, 9.9139572...","{'features': [1.0626906175571998e-05, 1.062690...","{'features': [1.0663216195539594e-05, 1.066321...","{'features': [1.0532922580253435e-05, 1.053292...","{'features': [1.022682519393662e-05, 1.0226825...","{'features': [9.815446687603108e-06, 9.8154466...","{'features': [9.707331800972813e-06, 9.7073318...",...,"{'features': [1.1240052636101565e-05, 1.124005...","{'features': [1.2320787734091317e-05, 1.232078...","{'features': [1.224700269751644e-05, 1.2247002...","{'features': [1.1916331700340415e-05, 1.191633...","{'features': [1.1204077693770254e-05, 1.120407...","{'features': [1.1367721441462295e-05, 1.136772...","{'features': [1.0987763886778575e-05, 1.098776...","{'features': [1.0659164036611576e-05, 1.065916...","{'features': [1.8863247709037007e-05, 1.886324...","{'features': [0.0001067916639972405, 0.0001067..."
1,0,"{'features': [1.3095040225638681e-05, 1.309504...","{'features': [1.1307998068474283e-05, 1.130799...","{'features': [9.693039650300355e-06, 9.6930396...","{'features': [9.913957255604304e-06, 9.9139572...","{'features': [1.0626906175571998e-05, 1.062690...","{'features': [1.0663216195539594e-05, 1.066321...","{'features': [1.0532922580253435e-05, 1.053292...","{'features': [1.022682519393662e-05, 1.0226825...","{'features': [9.815446687603108e-06, 9.8154466...","{'features': [9.707331800972813e-06, 9.7073318...",...,"{'features': [1.1240052636101565e-05, 1.124005...","{'features': [1.2320787734091317e-05, 1.232078...","{'features': [1.224700269751644e-05, 1.2247002...","{'features': [1.1916331700340415e-05, 1.191633...","{'features': [1.1204077693770254e-05, 1.120407...","{'features': [1.1367721441462295e-05, 1.136772...","{'features': [1.0987763886778575e-05, 1.098776...","{'features': [1.0659164036611576e-05, 1.065916...","{'features': [1.8863247709037007e-05, 1.886324...","{'features': [0.0001067916639972405, 0.0001067..."
2,0,"{'features': [1.3095040225638681e-05, 1.309504...","{'features': [1.1307998068474283e-05, 1.130799...","{'features': [9.693039650300355e-06, 9.6930396...","{'features': [9.913957255604304e-06, 9.9139572...","{'features': [1.0626906175571998e-05, 1.062690...","{'features': [1.0663216195539594e-05, 1.066321...","{'features': [1.0532922580253435e-05, 1.053292...","{'features': [1.022682519393662e-05, 1.0226825...","{'features': [9.815446687603108e-06, 9.8154466...","{'features': [9.707331800972813e-06, 9.7073318...",...,"{'features': [1.1240052636101565e-05, 1.124005...","{'features': [1.2320787734091317e-05, 1.232078...","{'features': [1.224700269751644e-05, 1.2247002...","{'features': [1.1916331700340415e-05, 1.191633...","{'features': [1.1204077693770254e-05, 1.120407...","{'features': [1.1367721441462295e-05, 1.136772...","{'features': [1.0987763886778575e-05, 1.098776...","{'features': [1.0659164036611576e-05, 1.065916...","{'features': [1.8863247709037007e-05, 1.886324...","{'features': [0.0001067916639972405, 0.0001067..."
3,0,"{'features': [1.3095040225638681e-05, 1.309504...","{'features': [1.1307998068474283e-05, 1.130799...","{'features': [9.693039650300355e-06, 9.6930396...","{'features': [9.913957255604304e-06, 9.9139572...","{'features': [1.0626906175571998e-05, 1.062690...","{'features': [1.0663216195539594e-05, 1.066321...","{'features': [1.0532922580253435e-05, 1.053292...","{'features': [1.022682519393662e-05, 1.0226825...","{'features': [9.815446687603108e-06, 9.8154466...","{'features': [9.707331800972813e-06, 9.7073318...",...,"{'features': [1.1240052636101565e-05, 1.124005...","{'features': [1.2320787734091317e-05, 1.232078...","{'features': [1.224700269751644e-05, 1.2247002...","{'features': [1.1916331700340415e-05, 1.191633...","{'features': [1.1204077693770254e-05, 1.120407...","{'features': [1.1367721441462295e-05, 1.136772...","{'features': [1.0987763886778575e-05, 1.098776...","{'features': [1.0659164036611576e-05, 1.065916...","{'features': [1.8863247709037007e-05, 1.886324...","{'features': [0.0001067916639972405, 0.0001067..."
4,0,"{'features': [1.3095040225638681e-05, 1.309504...","{'features': [1.1307998068474283e-05, 1.130799...","{'features': [9.693039650300355e-06, 9.6930396...","{'features': [9.913957255604304e-06, 9.9139572...","{'features': [1.0626906175571998e-05, 1.062690...","{'features': [1.0663216195539594e-05, 1.066321...","{'features': [1.0532922580253435e-05, 1.053292...","{'features': [1.022682519393662e-05, 1.0226825...","{'features': [9.815446687603108e-06, 9.8154466...","{'features': [9.707331800972813e-06, 9.7073318...",...,"{'features': [1.1240052636101565e-05, 1.124005...","{'features': [1.2320787734091317e-05, 1.232078...","{'features': [1.224700269751644e-05, 1.2247002...","{'features': [1.1916331700340415e-05, 1.191633...","{'features': [1.1204077693770254e-05, 1.120407...","{'features': [1.1367721441462295e-05, 1.136772...","{'features': [1.0987763886778575e-05, 1.098776...","{'features': [1.0659164036611576e-05, 1.065916...","{'features': [1.8863247709037007e-05, 1.886324...","{'features': [0.0001067916639972405, 0.0001067..."


The `rec` field contains the recall events generated by the HMM for each subject, similarly represented by the event's temporal position in the recall sequence (`'item'`) and the topic vectors it comprises (`'features'`).

In [4]:
# The temporal position of each recall event...
egg.get_rec_items().head()

Subject,List,0,1,2,3,4,5,6,7,8,9,...,17,18,19,20,21,22,23,24,25,26
0,0,0,1,2,3,4,5,6,7,,,...,,,,,,,,,,
1,0,0,1,2,3,4,5,6,7,8.0,9.0,...,,,,,,,,,,
2,0,0,1,2,3,4,5,6,7,8.0,9.0,...,,,,,,,,,,
3,0,0,1,2,3,4,5,6,7,8.0,,...,,,,,,,,,,
4,0,0,1,2,3,4,5,6,7,8.0,9.0,...,,,,,,,,,,


In [5]:
# ...and their corresponding topic vectors
egg.get_rec_features().head()

Subject,List,0,1,2,3,4,5,6,7,8,9,...,17,18,19,20,21,22,23,24,25,26
0,0,"{'features': [0.00019430992385381985, 0.000194...","{'features': [0.00017027929367231326, 0.000170...","{'features': [0.00017369078885231097, 0.000173...","{'features': [0.00019384831708918682, 0.000193...","{'features': [0.00021172689406944874, 0.000211...","{'features': [0.0001967417938833908, 0.0001967...","{'features': [0.00018739098716436903, 0.000187...","{'features': [0.0007029745321813258, 0.0007029...",{},{},...,{},{},{},{},{},{},{},{},{},{}
1,0,"{'features': [0.0001893882261801681, 0.0001893...","{'features': [0.000147870701282558, 0.00014787...","{'features': [0.00019023353033654396, 0.000190...","{'features': [0.00018166938444643692, 0.000181...","{'features': [0.0001625481095703973, 0.0001625...","{'features': [0.00017324045166665215, 0.000173...","{'features': [0.00020984959154624308, 0.000209...","{'features': [0.00021871502948704968, 0.000218...","{'features': [0.00023880652209349367, 0.000238...","{'features': [0.00020811361546785266, 0.000208...",...,{},{},{},{},{},{},{},{},{},{}
2,0,"{'features': [0.00018590937074552605, 0.000185...","{'features': [0.0002086223458447991, 0.0002086...","{'features': [0.00014376082951536893, 0.000143...","{'features': [0.00017953275349176222, 0.000179...","{'features': [0.0001266373016964342, 0.0001266...","{'features': [0.00012300799059941801, 0.000123...","{'features': [0.00012981062271781097, 0.000129...","{'features': [0.00013602953297943621, 0.000136...","{'features': [0.00014111281486418062, 0.000141...","{'features': [0.00012592147339385905, 0.000125...",...,{},{},{},{},{},{},{},{},{},{}
3,0,"{'features': [0.00018228403991448364, 0.000182...","{'features': [0.00018629081950116504, 0.000186...","{'features': [0.0001622345358076629, 0.0001622...","{'features': [0.00023810735183527187, 0.000238...","{'features': [0.0001552297878347715, 0.0001552...","{'features': [0.00016179442394432278, 0.000161...","{'features': [0.00016889623308560304, 0.000168...","{'features': [0.00019056297121315633, 0.000190...","{'features': [0.0005565476190479086, 0.0005565...",{},...,{},{},{},{},{},{},{},{},{},{}
4,0,"{'features': [0.00025446487845416085, 0.000254...","{'features': [0.00027871701312132265, 0.000278...","{'features': [0.00023918073796144946, 0.000239...","{'features': [0.00017315740792590048, 0.000173...","{'features': [0.0001356024754279286, 0.0001356...","{'features': [9.160956058842264e-05, 9.1609560...","{'features': [9.360224876174755e-05, 9.3602248...","{'features': [0.00013752072250850062, 0.000137...","{'features': [0.0001320162835250347, 0.0001320...","{'features': [0.00014302384767721986, 0.000143...",...,{},{},{},{},{},{},{},{},{},{}


# The `match` keyword argument

The `match` kwarg in `egg.analyze` sets the approach for matching a recall event to its corresponding stimulus event.
There are three options: `'exact'`, `'best'`, and `'smooth'`. 

## `exact`

If `match='exact'`, the recall item must be identical to the stimulus to constitute a recall. This is the traditional approach for free recall experiments (either a subject accurately recalled the stimulus item, or did not) but it is not particularly useful with naturalistic data.
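
To make the idea concrete, here is a minimal sketch of exact matching in plain NumPy. It is not quail's implementation: the function name `exact_recall_matrix` and the encoding (1-based presentation positions, with 0 for a recalled item that was never presented) are assumptions made for illustration.

```python
import numpy as np

def exact_recall_matrix(presented, recalled):
    """For each recalled item, return the 1-based presentation position of
    the identical presented item, or 0 if nothing matches exactly.
    (Hypothetical helper; the encoding is an illustrative assumption.)"""
    positions = {item: i + 1 for i, item in enumerate(presented)}
    return np.array([positions.get(item, 0) for item in recalled])

presented = ['cat', 'dog', 'bird', 'fish']
recalled = ['bird', 'cat', 'horse']   # 'horse' was never presented
print(exact_recall_matrix(presented, recalled))
```

With free-form verbal recall, a recalled event is essentially never identical to a stimulus event, which is why an all-or-nothing rule like this is a poor fit for naturalistic data.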

## similarity matrices: `best` and `smooth`

For `match='best'` and `match='smooth'`, quail computes a similarity matrix, which we can access in the `recmat` module:

In [6]:
import numpy as np
import seaborn as sns
from quail.analysis.recmat import recall_matrix

recall_matrix(egg, match='best', distance='correlation', features='features')
similarity_matrix = recall_matrix.simmtx



Here is the similarity matrix for a single subject, with stimulus events along the x-axis and recall events on the y-axis. Color corresponds to the level of similarity between stimulus and recall events.

In [7]:
a_single_subject = similarity_matrix[12]
sns.heatmap(a_single_subject, robust=True)


And here is the across-subject average matrix:

In [None]:
# Average the similarity matrices across subjects, ignoring NaNs
similarity_across_subs = np.nanmean(similarity_matrix,axis=0)
sns.heatmap(similarity_across_subs, robust=True)
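
Subjects produce different numbers of recall events, so averaging across subjects only works if the matrices share a shape and the missing entries are skipped. The sketch below shows one way to do that with NaN padding and `np.nanmean`. The helper `pad_rows` and the toy matrices are hypothetical, and whether quail pads its similarity matrices in exactly this way is an assumption.

```python
import numpy as np

def pad_rows(mat, n_rows):
    """Pad a (recall x stimulus) matrix with NaN rows up to n_rows.
    (Hypothetical helper for illustration.)"""
    padded = np.full((n_rows, mat.shape[1]), np.nan)
    padded[:mat.shape[0]] = mat
    return padded

subj_a = np.array([[0.9, 0.1],
                   [0.2, 0.8]])   # two recall events
subj_b = np.array([[0.7, 0.3]])   # one recall event

# Stack to shape (subjects, recall events, stimulus events)
stacked = np.stack([pad_rows(m, 2) for m in (subj_a, subj_b)])

# nanmean skips the padded entries instead of propagating NaN
average = np.nanmean(stacked, axis=0)
print(average)
```

The second row of `average` comes from `subj_a` alone, because `subj_b` has no second recall event.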

## `best`

If `match='best'`, each recall event is assigned to the single stimulus event whose feature vector is most similar to it, as measured by the metric given in the `distance` kwarg (more on that later).

In [None]:
spc = egg.analyze(analysis='spc', match='best', distance='correlation', features='features')

# Here, each stimulus event is assigned a binary value for each recall event – it either was matched or it was not
spc.data.head()

In [None]:
spc.plot()
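
As a rough sketch of what best matching involves (under stated assumptions, not quail's actual code), we can correlate every recall event with every stimulus event and keep the stimulus with the highest correlation. The function name `best_match` and the synthetic topic vectors are hypothetical.

```python
import numpy as np

def best_match(pres_features, rec_features):
    """Assign each recall event to its most correlated stimulus event.
    (Hypothetical helper; inputs are events x features arrays.)"""
    n_pres = pres_features.shape[0]
    # np.corrcoef stacks the rows of both arrays; the off-diagonal block
    # holds the stimulus-by-recall correlations.
    corr = np.corrcoef(pres_features, rec_features)[:n_pres, n_pres:]
    # For each recall event (column), pick the best stimulus event (row).
    return corr.argmax(axis=0)

rng = np.random.default_rng(0)
pres = rng.random((5, 20))   # 5 stimulus events, 20 topic dimensions
# Recall events: slightly noisy copies of stimulus events 2, 0, and 4
rec = pres[[2, 0, 4]] + 0.01 * rng.standard_normal((3, 20))

print(best_match(pres, rec))
```

Each recall event maps to exactly one stimulus event, which is what makes the resulting values binary.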

## `smooth`

If `match='smooth'`, quail computes a weighted average across all stimulus events for each recall event, where the weights are derived from the similarity between the stimulus and recall events.

In [None]:
spc = egg.analyze(analysis='spc', match='smooth', distance='correlation', features='features')

# Here, each stimulus event was assigned a value for every recall event based on the similarity of the two
spc.data.head()

In [None]:
spc.plot()
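
One plausible reading of this weighting scheme is sketched below. It is an assumption for illustration, not quail's implementation: each recall event's similarities are normalized into weights that sum to 1, and the weights are then averaged over recall events to give a graded score per stimulus event. The function name and the toy similarity values are hypothetical.

```python
import numpy as np

def smooth_recall_scores(similarity):
    """similarity: (n_recall, n_stimulus) array of non-negative similarities.
    Returns one graded score per stimulus event. (Hypothetical helper.)"""
    # Turn each recall event's similarities into weights that sum to 1
    weights = similarity / similarity.sum(axis=1, keepdims=True)
    # Average the weights over recall events for each stimulus event
    return weights.mean(axis=0)

similarity = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.6, 0.2]])
print(smooth_recall_scores(similarity))
```

Unlike `best`, every stimulus event receives some credit from every recall event, so the resulting curve is continuous rather than binary.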

# The `distance` keyword argument

The `distance` kwarg specifies the distance function quail will use to compute similarity between stimulus and recall events. The examples above used correlation distance as the basis for similarity (`distance='correlation'`). Alternatively, `distance='euclidean'` (default) can be used to compute similarity based on Euclidean distance.

In [None]:
spc = egg.analyze(analysis='spc', match='smooth', distance='euclidean', features='features')
spc.plot()
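
The choice of metric can change which stimulus event a recall event matches. The toy example below calls `scipy.spatial.distance.cdist` directly rather than going through quail; the vectors are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two stimulus events and one recall event (rows are feature vectors)
pres = np.array([[1.0, 2.0, 3.0],
                 [30.0, 20.0, 10.0]])
rec = np.array([[10.0, 20.0, 30.0]])

corr_dist = cdist(rec, pres, metric='correlation')
eucl_dist = cdist(rec, pres, metric='euclidean')

# Index of the closest stimulus event under each metric
print(corr_dist.argmin(axis=1), eucl_dist.argmin(axis=1))
```

Correlation distance ignores overall scale, so the recall vector matches the first stimulus event, which has the same shape. Euclidean distance is sensitive to magnitude, so it picks the second stimulus event, even though that one is anti-correlated with the recall vector.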

## more options

Quail supports any distance metric supported by `scipy.spatial.distance.cdist`. A complete list is available [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html).

As discussed in the Egg tutorial, custom distance functions can be added to `Egg` objects, both during and after instantiating. Once added, custom distance functions are stored in `egg.dist_funcs`. 
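
For reference, `cdist` itself accepts a callable that takes two 1-D vectors and returns a scalar distance. The sketch below defines such a function and checks it against SciPy's built-in `'cityblock'` metric. That quail expects custom distance functions with this exact signature is an assumption here; the Egg tutorial is the place to confirm how they are registered.

```python
import numpy as np
from scipy.spatial.distance import cdist

def manhattan(u, v):
    """Hypothetical custom metric: sum of absolute differences."""
    return np.abs(u - v).sum()

pres = np.array([[0.0, 1.0],
                 [2.0, 2.0]])
rec = np.array([[1.0, 1.0]])

custom = cdist(rec, pres, metric=manhattan)     # callable metric
builtin = cdist(rec, pres, metric='cityblock')  # same metric, built in
print(custom, builtin)
```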

# The `features` keyword argument

The `features` kwarg tells quail which features to consider when computing distance. This can be a single feature passed as a string, multiple features passed as a list, or all available features (`features=None`; default). 

In the above examples, we passed the `features` argument the name of the array of topic vectors for each stimulus and recall event (`'features'`).
What if we hadn't?

In [None]:
recall_matrix(egg, match='smooth', distance='euclidean')
similarity_matrix = recall_matrix.simmtx
similarity_across_subs = np.nanmean(similarity_matrix,axis=0)
sns.heatmap(similarity_across_subs, robust=True)

In [None]:
spc = egg.analyze(analysis='spc', match='smooth', distance='euclidean')
spc.plot()

This definitely doesn't look right. Let's look at the contents of our feature dictionaries:

In [None]:
egg.feature_names

There's another feature, `'temporal'`, that contains the presentation position for the stimulus data and the recall position for the recall data. Since these are just indices of the events from 0 to n, they're perfectly correlated between stimulus and recall, and they will confound our analyses unless we restrict `features` to the topic vectors!
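
To see why an index feature can swamp everything else, consider the toy example below. It assumes, purely for illustration, that the features are concatenated into one vector before computing Euclidean distance; how quail actually combines multiple features is not shown in this tutorial. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_events, n_topics = 6, 10

# Topic weights are tiny, like the values in the example egg
topics = rng.random((n_events, n_topics)) * 1e-4
# Temporal feature: event indices 0..n-1 as a column
positions = np.arange(n_events, dtype=float)[:, None]

# Assumption: features are concatenated into a single vector per event
with_temporal = np.hstack([topics, positions])

def pairwise_euclidean(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

dist = pairwise_euclidean(with_temporal)
index_gap = np.abs(positions - positions.T)

# The distances are almost exactly the gaps between event indices
print(np.allclose(dist, index_gap, atol=1e-3))
```

Under this assumption the topic vectors contribute almost nothing to the distance, so the analysis ends up measuring position in the sequence rather than content.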