# Assignment 7: D-Prime Plot
## Computational Methods in Psychology (and Neuroscience)
### Psychology 4215/7215 --- Fall 2021


# Objectives

Upon completion of this assignment, students will have:

1. Read in all the recognition memory data
2. Performed some simple data clean-up (code provided)
3. Calculated d-prime for the word recognition task
4. Plotted d-prime as a function of valence and condition

# Assignment

* Write your code in this Jupyter notebook, after making a copy and renaming it so that your userid is in the title (e.g., `A07_DPrime_Plot_mst3k`).


## Details

Below is code that will load in the data from the word recognition memory experiment. As long as you have updated this repository from GitHub and unzipped the `recog_dat.zip` file in the `lessons` directory, the code should work unchanged to load in the data, build a data frame, and perform some minor clean-up of the data.

Your task is to calculate d-prime for the word recognition data and then plot the result as a function of valence (negative, neutral, positive) and condition (mixed and pure).

All the code you need to perform this analysis is in the most recent lesson notebook. You will need to identify the right pieces of code to copy into this notebook and work out how to modify them to examine valence rather than image location.

We have some code below to help you get started reading in the data, so that you can focus on the d-prime calculation and plot.

* ***When you are done, save this notebook as HTML (`File -> Download as -> HTML`) and upload it to the matching assignment on UVACollab.***  

# New library to install

You're going to need a new plotting library, so run this line at your Anaconda Prompt/Terminal:

`conda install -c conda-forge plotnine` 

## General Imports

In [1]:
# import some useful libraries
import numpy as np                # numerical analysis and linear algebra
import pandas as pd               # efficient tables
import matplotlib.pyplot as plt   # plotting
import plotnine as pn 
import scipy.stats.distributions as dists     # probability distributions
from scipy import stats
from glob import glob
import os

from smile.log import log2dl

from ci_within import ci_within

## Custom SLOG loading function

In [2]:
# custom function to load slogs
def load_all_subj_logs(data_dir, log_file):
    # load in a list of all the subj
    subjs = [os.path.splitext(os.path.split(filepath)[1])[0].split('_')[0]
             for filepath in glob(os.path.join(data_dir, '*' + log_file + '.slog'))]
    subjs.sort()
    #print(subjs)
    # loop over subj and their data
    all_dat = []
    for subj in subjs:
        # set the file
        log_path = os.path.join(data_dir, subj+'_'+log_file+'.slog')
        print(log_path)

        # load the data
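        try:
            all_dat.extend(log2dl(log_path, subj=subj))
        except Exception as err:
            # skip unreadable logs, but report them instead of failing silently
            print(f'Skipping {log_path}: {err}')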

    df = pd.DataFrame(all_dat)
    
    return df

## Load in all the data

In [3]:
# load the data from the word recog task
task_dir = os.path.join('..', 'lessons', 'recog_dat')

df_w = load_all_subj_logs(task_dir, 'log_word_test')
df_w.head()

../lessons/recog_dat/s000_log_word_test.slog
../lessons/recog_dat/s001_log_word_test.slog
../lessons/recog_dat/s002_log_word_test.slog
../lessons/recog_dat/s003_log_word_test.slog
../lessons/recog_dat/s004_log_word_test.slog
../lessons/recog_dat/s005_log_word_test.slog
../lessons/recog_dat/s006_log_word_test.slog
../lessons/recog_dat/s007_log_word_test.slog
../lessons/recog_dat/s008_log_word_test.slog
../lessons/recog_dat/s009_log_word_test.slog
../lessons/recog_dat/s010_log_word_test.slog
../lessons/recog_dat/s011_log_word_test.slog
../lessons/recog_dat/s012_log_word_test.slog


| | resp_map_target | resp_map_lure | block_num | trial_num | stim_on_time | stim_on_error | resp | resp_time_time | resp_time_error | rt | ... | valence_sd | arousal_mean | arousal_sd | dominance_mean | dominance_sd | word_frequency | novelty | cond | subj | log_num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | F | J | 0 | 0 | 101.188904 | 0.0 | J | 101.82801 | 0.000163 | 0.639106 | ... | 1.66 | 5.76 | 2.5 | 6.25 | 2.1 | 75 | lure | mixed | s000 | 0 |
| 1 | F | J | 0 | 1 | 102.763942 | 0.0 | F | 103.238753 | 0.000167 | 0.474811 | ... | 1.76 | 4.51 | 2.68 | 2.95 | 2.12 | 25 | target | mixed | s000 | 0 |
| 2 | F | J | 0 | 2 | 104.222439 | 0.0 | J | 104.701179 | 0.000175 | 0.47874 | ... | 2.16 | 6.07 | 2.15 | 4.17 | 2.4 | 12 | lure | mixed | s000 | 0 |
| 3 | F | J | 0 | 3 | 105.26712 | 0.0 | J | 105.840203 | 0.000205 | 0.573082 | ... | 2.01 | 5.5 | 2.4 | 4.0 | 2.6 | 1 | lure | mixed | s000 | 0 |
| 4 | F | J | 0 | 4 | 106.67312 | 0.0 | F | 107.047914 | 0.000165 | 0.374794 | ... | 1.92 | 5.57 | 2.8 | 6.39 | 1.72 | 4 | target | mixed | s000 | 0 |


## Some data clean-up

In [4]:
# add in log_rt columns
df_w['log_rt'] = np.log(df_w['rt'])

# the 'correct' column must be an int (0 or 1)
df_w['correct'] = df_w['correct'].astype(int)

# add in a column for whether they made an 'old' response
df_w['old_resp'] = (df_w['resp_map_target'] == df_w['resp']).astype(int)
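
To see what the `old_resp` column encodes, here is a minimal sketch on a small made-up table (the four trials are hypothetical, not taken from the experiment). It assumes, as in the clean-up cell, that `resp_map_target` holds the key mapped to an "old" response. The `hit` and `false_alarm` columns are illustrative names introduced only for this example.

```python
import pandas as pd

# Hypothetical toy trials (not the real data): 'F' is the key mapped to "old"
toy = pd.DataFrame({
    'resp_map_target': ['F', 'F', 'F', 'F'],
    'resp':            ['F', 'J', 'F', 'J'],
    'novelty':         ['target', 'target', 'lure', 'lure'],
})

# same rule as the clean-up cell: 1 when the pressed key is the "old" key
toy['old_resp'] = (toy['resp_map_target'] == toy['resp']).astype(int)

# an 'old' response counts as a hit on a target and a false alarm on a lure
toy['hit'] = ((toy['novelty'] == 'target') & (toy['old_resp'] == 1)).astype(int)
toy['false_alarm'] = ((toy['novelty'] == 'lure') & (toy['old_resp'] == 1)).astype(int)

print(toy)
```

Because `old_resp` is 0 or 1, summing it over target trials gives a hit count and summing it over lure trials gives a false-alarm count.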

## Calculating sensitivity

- Assuming the signal and noise distributions have equal variance, d' (d-prime) is the measure of sensitivity

$$d' = ((\mu + \alpha) - \mu) / \sigma$$
$$d' = \alpha / \sigma$$

- Thus, $d'$ is the distance between the means of the signal and noise distributions, in units of their common standard deviation
- Note that $d'$ is independent of the response criterion
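
The two bullet points above can be checked numerically. The sketch below assumes normal noise and signal distributions with illustrative parameter values (`mu`, `alpha`, and `sigma` are chosen for the example and are not estimated from the data). It computes the hit rate and false alarm rate implied by several criteria, then recovers $d'$ from them.

```python
import numpy as np
from scipy.stats import norm

# illustrative parameters (assumptions for this example, not from the data)
mu, alpha, sigma = 0.0, 3.0, 2.0
true_dprime = alpha / sigma

dprimes = []
for criterion in [0.0, 1.5, 3.0]:
    # probability that a signal / noise sample exceeds the criterion
    hr = norm.sf(criterion, loc=mu + alpha, scale=sigma)
    far = norm.sf(criterion, loc=mu, scale=sigma)

    # d' is the difference of the z-transformed rates
    dprime = norm.ppf(hr) - norm.ppf(far)
    dprimes.append(dprime)
    print(f'criterion={criterion:.2f}  HR={hr:.3f}  FAR={far:.3f}  dprime={dprime:.3f}')
```

The hit and false alarm rates change with the criterion, but the recovered $d'$ should stay at $\alpha / \sigma$ for every row.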


In [5]:
def calc_dprime(n_hits, n_targets, n_false_alarms, n_lures):
    # calculate corrected hit rate and false alarm rate (to avoid zeros)
    hr_trans = (n_hits+.5)/(n_targets+1)
    far_trans = (n_false_alarms+.5)/(n_lures+1)
    
    # calculate dprime
    Z = dists.norm.ppf
    dprime = Z(hr_trans) - Z(far_trans)
    return dprime
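
As a quick illustration of why the rates are corrected, the sketch below repeats `calc_dprime` (using `scipy.stats.norm`, the same distribution object as `dists.norm`) and calls it with made-up counts. The counts are hypothetical and chosen only to show the edge cases.

```python
import numpy as np
from scipy.stats import norm

def calc_dprime(n_hits, n_targets, n_false_alarms, n_lures):
    # corrected hit rate and false alarm rate (to avoid rates of 0 or 1)
    hr_trans = (n_hits + .5) / (n_targets + 1)
    far_trans = (n_false_alarms + .5) / (n_lures + 1)
    return norm.ppf(hr_trans) - norm.ppf(far_trans)

# hypothetical perfect performance: 20/20 hits and 0/20 false alarms
d_perfect = calc_dprime(20, 20, 0, 20)

# the uncorrected rates (1.0 and 0.0) would give an infinite d-prime
d_raw = norm.ppf(20 / 20) - norm.ppf(0 / 20)

# hypothetical chance performance: equal hit and false alarm counts
d_chance = calc_dprime(10, 20, 10, 20)

print(d_perfect, d_raw, d_chance)
```

With the correction, perfect performance yields a large but finite value, while equal hit and false alarm rates yield a d-prime of zero.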

# Your code goes below here

All code above should work without modification.
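
As a reminder of the general pandas pattern that the cell comments below refer to, here is a minimal sketch on made-up data. The column names (`group`, `resp`, `prop`) and the values are hypothetical, and the quantity computed is a simple proportion rather than d-prime, so it will need adapting to the recognition data.

```python
import pandas as pd

# hypothetical toy data: two subjects, two groups, a 0/1 response column
toy = pd.DataFrame({
    'subj':  ['s1', 's1', 's1', 's1', 's2', 's2', 's2', 's2'],
    'group': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'resp':  [1, 0, 1, 1, 0, 0, 1, 0],
})

# 1) agg with a list of functions yields MultiIndex columns like ('resp', 'sum')
counts = toy.groupby(['subj', 'group'])[['resp']].agg(['sum', 'count'])

# 2) collapse the MultiIndex into flat names such as 'resp_sum'
counts.columns = ['_'.join(col) for col in counts.columns]
counts = counts.reset_index()

# 3) apply with axis=1 passes each row to the function
counts['prop'] = counts.apply(lambda row: row['resp_sum'] / row['resp_count'],
                              axis=1)

print(counts)
```

Grouping by more than one column gives one row per combination of subject and group, which is the shape needed before computing a per-subject summary.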

In [6]:
# use the agg method to get the counts

In [7]:
# collapse the multi-index


In [8]:
# use apply to add the dprime as a new column (axis=1 tells it to go by row)

In [9]:
# use ci_within to calculate the mean and confidence interval of d-prime

In [10]:
# use plotnine to plot dprime as a function of condition, with a fill-color defined by valence
# be sure to label your axes correctly and add the confidence interval with error bars