# Basic analyzing and plotting

This tutorial will go over the basics of analyzing `eggs`, the primary data structure used in `quail`. To learn about how an egg is set up, see the egg tutorial.

An egg is made up of (at minimum) the words presented to a subject and the words recalled by the subject.  With these two components, we can perform a number of analyses:

1. **Recall Accuracy** - the proportion of words presented that were later recalled
2. **Serial Position Curve** - recall accuracy as a function of the encoding position of the word
3. **Probability of First Recall** - the probability that a word will be recalled first as a function of its encoding position
4. **Lag-CRP** - given the recall of word n, the probability of recalling words at neighboring positions (n+/-1, 2, 3 etc).

If we have a set of features for the stimuli, we can also compute a **Memory Fingerprint**, which is an estimate of how a subject clusters their recall responses with respect to features of a stimulus (see the fingerprint tutorial for more on this).
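To make the first definition above concrete before we touch any real data, here is a minimal pure-Python sketch of recall accuracy for a single list. The words and the `recall_accuracy` helper are made up for illustration; this is not how `quail` implements the analysis internally.

```python
# Toy data for one study/test list (hypothetical words, not from the example dataset)
presented = ["cat", "bat", "hat", "goat"]
recalled = ["bat", "cat", "goat"]


def recall_accuracy(presented, recalled):
    """Proportion of presented words that appear anywhere in the recall sequence."""
    recalled_set = set(recalled)
    hits = sum(1 for word in presented if word in recalled_set)
    return hits / len(presented)


print(recall_accuracy(presented, recalled))  # 0.75
```

Three of the four presented words were recalled, so the accuracy for this list is 0.75. The order of recall does not matter for this measure.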

Let's get to analyzing some `eggs`.  First, we'll load in some example data: 

In [None]:
import quail
egg = quail.load_example_data()

This dataset comprises 19 subjects, each of whom performed 16 study/test blocks of 16 words each.  Here are some of the presented words:

In [None]:
egg.pres.head()

and some of the recalled words:

In [None]:
egg.rec.head()

We can start with the simplest analysis - recall accuracy - which is just the proportion of presented words that were later recalled.  To compute accuracy, simply call the `analyze` function with the `analysis` keyword argument set to `accuracy`:

## Recall Accuracy

In [None]:
accuracy = quail.analyze(egg, analysis='accuracy')
accuracy.head()

The result is a multi-index Pandas DataFrame where the first-level index is the subject identifier and the second-level index is the list number.  Note that, by default, each list is analyzed separately.  However, you can easily return the average over lists using the `listgroup` keyword argument:

In [None]:
accuracy_avg = quail.analyze(egg, analysis='accuracy', listgroup=['average']*16)
accuracy_avg.head()

Now, the result is a single value for each subject representing the average accuracy across the 16 lists. The `listgroup` kwarg can also be used to do some fancier groupings, like splitting the data into the first and second half of the experiment:

In [None]:
accuracy_split = quail.analyze(egg, analysis='accuracy', listgroup=['First Half']*8+['Second Half']*8)
accuracy_split.head()

These analysis results can be passed directly into the plot function like so:

In [None]:
ax = quail.plot(accuracy_split)
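Conceptually, `listgroup` assigns one label to each list, and lists that share a label are averaged together. The sketch below mimics that idea in plain Python for a single subject. The `average_by_group` helper and the accuracy values are hypothetical, and this is an illustration of the grouping logic rather than `quail`'s actual implementation.

```python
from collections import defaultdict


def average_by_group(per_list_values, listgroup):
    """Average per-list values over lists that share the same group label."""
    if len(per_list_values) != len(listgroup):
        raise ValueError("listgroup needs exactly one label per list")
    grouped = defaultdict(list)
    for value, label in zip(per_list_values, listgroup):
        grouped[label].append(value)
    return {label: sum(values) / len(values) for label, values in grouped.items()}


# Hypothetical per-list accuracies for one subject across 8 lists
per_list_accuracy = [0.5, 0.75, 0.25, 0.5, 1.0, 0.75, 0.5, 0.75]

# One label per list: the first four lists form one group, the last four another
listgroup = ["First Half"] * 4 + ["Second Half"] * 4

print(average_by_group(per_list_accuracy, listgroup))
# {'First Half': 0.5, 'Second Half': 0.75}
```

Passing the same label for every list (as in `['average']*16` above) collapses everything into a single group, which is why that call returns one value per subject.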

For more details on plotting, see the plot tutorial. Next, let's take a look at the serial position curve analysis.  As stated above, the serial position curve (or spc) computes recall accuracy as a function of the encoding position of the word.  To use it, call the same `analyze` function illustrated above, but set the `analysis` kwarg to `spc`.  Let's also average across lists within subject:

## Serial Position Curve

In [None]:
spc = quail.analyze(egg, analysis='spc', listgroup=['average']*16)
spc.head()

The result is a df where each row is a subject and each column is the encoding position of the word. To plot, simply pass the result of the analysis function to the plot function:

In [None]:
ax = quail.plot(spc)
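If it helps to see the computation spelled out, the following pure-Python sketch builds a serial position curve for one subject from two toy lists. The data and the `serial_position_curve` helper are assumptions made for illustration only; `quail` handles this for you and may differ in its details.

```python
def serial_position_curve(presented_lists, recalled_lists):
    """For each encoding position, the proportion of lists in which
    the word presented at that position was later recalled."""
    list_length = len(presented_lists[0])
    counts = [0] * list_length
    for presented, recalled in zip(presented_lists, recalled_lists):
        recalled_set = set(recalled)
        for position, word in enumerate(presented):
            if word in recalled_set:
                counts[position] += 1
    return [count / len(presented_lists) for count in counts]


# Two hypothetical lists of four words each, with the subject's recalls
presented_lists = [["cat", "bat", "hat", "goat"], ["pen", "cup", "map", "key"]]
recalled_lists = [["goat", "cat"], ["key", "map", "pen"]]

print(serial_position_curve(presented_lists, recalled_lists))
# [1.0, 0.0, 0.5, 1.0]
```

Here the first and last positions were recalled on both lists, the second position on neither, and the third on one of the two.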

## Probability of First Recall

The next analysis we'll take a look at is the probability of first recall, which is the probability that a word will be recalled first as a function of its encoding position. To compute this, call the `analyze` function with the `analysis` kwarg set to `pfr`. Again, we'll average over lists:

In [None]:
pfr = quail.analyze(egg, analysis='pfr', listgroup=['average']*16)
pfr.head()

This df is set up just like the serial position curve.  To plot:

In [None]:
ax = quail.plot(pfr)
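The probability of first recall only looks at the first word a subject produces on each list. A minimal sketch of that idea, using made-up lists and a hypothetical `probability_of_first_recall` helper (not `quail`'s own code), might look like this:

```python
def probability_of_first_recall(presented_lists, recalled_lists):
    """For each encoding position, the proportion of lists on which the
    first recalled word had been presented at that position."""
    list_length = len(presented_lists[0])
    counts = [0] * list_length
    for presented, recalled in zip(presented_lists, recalled_lists):
        # Skip lists with no recalls, or whose first recall was not on the list
        if recalled and recalled[0] in presented:
            counts[presented.index(recalled[0])] += 1
    return [count / len(presented_lists) for count in counts]


# Four hypothetical lists of four words each
presented_lists = [
    ["cat", "bat", "hat", "goat"],
    ["pen", "cup", "map", "key"],
    ["sun", "ice", "fog", "rain"],
    ["oak", "elm", "fir", "ash"],
]
recalled_lists = [["goat", "cat"], ["key", "map", "pen"], ["sun"], ["ash", "oak"]]

print(probability_of_first_recall(presented_lists, recalled_lists))
# [0.25, 0.0, 0.0, 0.75]
```

In this toy example the subject started with the last-presented word on three of the four lists, which is the kind of recency pattern this analysis is designed to reveal.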

## Lag-CRP

The next analysis to consider is the lag-CRP, which, as described above, gives the probability of recalling words at neighboring positions (n+/-1, 2, 3, etc.) given the recall of word n. To use it? You guessed it: call the `analyze` function with the `analysis` kwarg set to `lagcrp`:

In [None]:
lagcrp = quail.analyze(egg, analysis='lagcrp', listgroup=['average']*16)
lagcrp.head()

Unlike the previous two analyses, this one returns a df with roughly twice as many columns as there are words in a list, because lags can be negative (backward transitions) or positive (forward transitions).  To view the results:

In [None]:
ax = quail.plot(lagcrp)
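For readers who want to see where the lag values come from, here is a small pure-Python sketch of a lag-CRP for a single list. It uses one common formulation: for each transition between successive recalls, count the lag that was actually taken and every lag that was still available (words not yet recalled), then divide. The `lag_crp` helper and the words are hypothetical, repeats and intrusions are simply dropped, and `quail`'s own implementation and column layout may differ.

```python
def lag_crp(presented, recalled):
    """Conditional response probability for each lag on a single list.

    Returns a dict mapping lag -> probability, or None where that lag
    was never available."""
    n = len(presented)
    lags = range(-(n - 1), n)
    actual = {lag: 0 for lag in lags}
    possible = {lag: 0 for lag in lags}

    # Convert recalls to encoding positions, dropping intrusions and repeats
    positions = []
    for word in recalled:
        if word in presented:
            position = presented.index(word)
            if position not in positions:
                positions.append(position)

    already_recalled = set()
    for current, following in zip(positions, positions[1:]):
        already_recalled.add(current)
        # Every not-yet-recalled position was a possible next recall
        for candidate in range(n):
            if candidate not in already_recalled:
                possible[candidate - current] += 1
        actual[following - current] += 1

    return {
        lag: (actual[lag] / possible[lag] if possible[lag] else None)
        for lag in lags
    }


presented = ["cat", "bat", "hat", "goat"]
recalled = ["bat", "hat", "cat"]

crp = lag_crp(presented, recalled)
print(crp[1])   # 0.5: a +1 transition was available twice and taken once
print(crp[-2])  # 1.0: a -2 transition was available once and taken once
```

A peak around lags of +1 and -1 in the real data indicates that subjects tend to move between words that were studied close together in time.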

## Memory Fingerprint

Last but not least is the memory fingerprint analysis.  For a detailed treatment of this analysis, see the fingerprint tutorial.

As described in the fingerprint tutorial, the `features` data structure is used to estimate how subjects cluster their recall responses with respect to the features of the encoded stimuli.  Briefly, these estimates are derived by computing the similarity of neighboring recall words along each feature dimension.  For example, if you recall "dog", and then the next word you recall is "cat", your clustering by category score would increase because the two recalled words are in the same category.  Similarly, if after you recall "cat" you recall the word "can", your clustering by starting letter score would increase, since both words share the first letter "c".  This logic can be extended to any number of feature dimensions.
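The "dog", "cat", "can" example can be turned into a deliberately simplified sketch. The code below just measures how often neighboring recalls share a feature value. This is not the estimator `quail` uses for the fingerprint (see the fingerprint tutorial for that); the `shared_feature_rate` helper and the feature table are assumptions made purely to illustrate the intuition.

```python
# Hypothetical feature table for three words
features = {
    "dog": {"category": "animal", "first_letter": "d"},
    "cat": {"category": "animal", "first_letter": "c"},
    "can": {"category": "object", "first_letter": "c"},
}

recalled = ["dog", "cat", "can"]


def shared_feature_rate(recalled, features, dimension):
    """Proportion of neighboring recall pairs that share a value on one
    feature dimension. Returns None if there are no pairs to compare."""
    pairs = list(zip(recalled, recalled[1:]))
    if not pairs:
        return None
    matches = sum(
        features[first][dimension] == features[second][dimension]
        for first, second in pairs
    )
    return matches / len(pairs)


print(shared_feature_rate(recalled, features, "category"))      # 0.5
print(shared_feature_rate(recalled, features, "first_letter"))  # 0.5
```

The "dog" to "cat" transition counts toward category clustering, and the "cat" to "can" transition counts toward first-letter clustering, so each dimension scores one match out of two transitions.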

To run the fingerprint analysis, you'll need to include a `features` field when you create your `egg`.  Our example data has this field included.  For more info on how to create this field, see the egg and fingerprint tutorials.

Here is a glimpse of the features df:

In [None]:
egg.features.head()

Like the other analyses, computing the memory fingerprint can be done using the `analyze` function with the `analysis` kwarg set to `fingerprint`:

In [None]:
fingerprint = quail.analyze(egg, analysis='fingerprint', listgroup=['average']*16)
fingerprint.head()

The result of this analysis is a df where each row is a subject's fingerprint and each column is a feature dimension.  The values represent a subject's tendency to cluster their recall responses along a particular feature dimension.  They are probability values, so greater values indicate more clustering along that feature dimension.  To plot, simply pass the result to the plot function:

In [None]:
ax = quail.plot(fingerprint)

This result suggests that subjects in this example dataset tended to cluster their recall responses by category as well as by the size (bigger or smaller than a shoebox) of the object each word names.