# Assignment 2: Memory Task List Generation
## Computational Methods in Psychology and Neuroscience
### Psychology 4215/7215 --- Fall 2023

# Objectives

Upon completion of this assignment, the student will have:

1. Read in a stimulus pool from a file.

2. Created unique trial conditions with sequential constraints.

3. Generated randomized lists to use in a recognition experiment.


# Assignment

* Write code in a Jupyter notebook (after making a copy and renaming it to have your userid in the title --- e.g., A02_Memory_ListGen_mst3k).

## Design

Your assignment is to write a script that creates lists of dictionaries that you will later present to participants as part of an experiment.  

The script should be configurable such that you can specify different
numbers of lists and trials, along with other details specific to the
experiment you decide to do.

Each dictionary represents a trial and should contain all the
information necessary to identify the stimulus to be presented,
details about that stimulus, and the condition in which to present it.
This information will be experiment-specific, as outlined below.

You have two options for your experiment.  Please select **one** of
the following experiments, keeping in mind that your next assignment
will be to code the experiment presentation and response collection
for the lists you generate from this assignment.
  
* ***When you are done, make sure you have run every cell, so that we can see it ran without error and produces the correct output. Then please save the notebook as HTML (`File -> Download as -> HTML`) and upload it to the matching assignment on Canvas.***  

## Option 1: Refreshing Valence Study

The main question of this study is whether recognition memory for
words depends on the emotional or affective valence of those words and whether there is an interaction between attention refreshing and valence.

Participants will study lists of positive (+), negative (-), and
neutral (~) words and then, after a short delay, they will be given a
recognition test over all the studied target words plus a matched set
of non-studied lures.  The stimuli are contained in three separate CSV
files:

- [Positive Pool](./pos_pool.csv)
- [Negative Pool](./neg_pool.csv)
- [Neutral Pool](./neu_pool.csv)

You will need to read these files in as lists of dictionaries (hint,
use the ``DictReader`` from the ``csv`` module that was covered in
class.)  
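
As a minimal sketch of that hint, the snippet below reads a small in-memory CSV with `csv.DictReader`. The column names (`word`, `valence`) and the two rows are made up for illustration; the real pool files may use different headers.

```python
import csv
import io

# Stand-in for the contents of a pool file; the headers here are assumed
fake_pool = io.StringIO("word,valence\nsunshine,pos\nholiday,pos\n")

# Each row becomes a dictionary keyed by the header line
pool = list(csv.DictReader(fake_pool))

print(pool[0]["word"], len(pool))
```

For a file on disk, `open('pos_pool.csv', newline='')` inside a `with` block would take the place of the `StringIO` object.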

Use these pools to create lists with trials of valence crossed with three experimental conditions:

1. *Repeated*: Where a word will be immediately repeated as the next word.
2. *Refreshed*: Where you will indicate the participant should "refresh" the previous word by presenting a "+".
3. *Once-presented*: Where a word is only presented once and is not repeated or refreshed.

We suggest that you generate the study items for a list in two stages. In the first stage you shuffle all combinations of the trial types (Valence crossed with Condition). In the second stage you loop over those trial types and append trials to a block depending on the information in each one. For the Repeated and Refreshed trial types you would append two items; for the Once-presented type you would append only one.
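
The two stages could be sketched roughly as follows. This is one possible reading of the suggestion, not the required solution: the function name `make_study_block`, the dictionary keys, and the toy word pools are all assumptions made for the example.

```python
import itertools
import random

VALENCES = ["pos", "neg", "neu"]
CONDITIONS = ["repeated", "refreshed", "once"]

def make_study_block(pools, rng):
    """pools maps valence -> list of unused words (consumed as trials are made)."""
    # Stage 1: every Valence x Condition combination, in random order
    trial_types = list(itertools.product(VALENCES, CONDITIONS))
    rng.shuffle(trial_types)

    # Stage 2: expand each trial type into one or two presented items
    block = []
    for valence, cond in trial_types:
        word = pools[valence].pop()
        block.append({"word": word, "valence": valence, "cond": cond, "presentation": 1})
        if cond == "repeated":
            # the same word again as the very next item
            block.append({"word": word, "valence": valence, "cond": cond, "presentation": 2})
        elif cond == "refreshed":
            # a "+" cue telling the participant to refresh the previous word
            block.append({"word": "+", "valence": valence, "cond": cond,
                          "presentation": 2, "refresh_of": word})
    return block

# Toy pools standing in for the real CSV contents
pools = {v: [f"{v}_word{i}" for i in range(5)] for v in VALENCES}
block = make_study_block(pools, random.Random(0))
print(len(block))
```

With nine trial types, three of which add one item and six of which add two, each block holds 15 presented items.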

You will need to generate a matching test list for each study list
that includes all the studied items, plus a set of lures that match
the valence of the studied words.

Be sure to add in information to each trial dictionary that identifies
the word, its valence, the condition of that trial, and whether it is a
target or a lure.  Feel free to add in more information if you would
like.
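
A matching test list might be put together along these lines. This is a sketch under assumptions: `make_test_list`, the `"type"` key, and the toy words are illustrative names, and each studied word is simply paired with one unused word of the same valence.

```python
import random

def make_test_list(studied, lure_pools, rng):
    """Build a shuffled test list of targets plus valence-matched lures.

    studied: one dict per unique studied word, with "word" and "valence" keys.
    lure_pools: valence -> list of words that were never studied.
    """
    test = []
    for item in studied:
        test.append({"word": item["word"], "valence": item["valence"], "type": "target"})
        # draw a never-studied word of the same valence as the lure
        lure = lure_pools[item["valence"]].pop()
        test.append({"word": lure, "valence": item["valence"], "type": "lure"})
    rng.shuffle(test)  # shuffles in place
    return test

studied = [{"word": "joy", "valence": "pos"}, {"word": "grief", "valence": "neg"}]
lure_pools = {"pos": ["smile", "gift"], "neg": ["pain", "loss"]}
test_list = make_test_list(studied, lure_pools, random.Random(1))
print(len(test_list))
```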

## Option 2: Spacing Scene Study

This study will test whether recognition memory for indoor and outdoor
scenes is modulated by whether the items are once-presented, repeated immediately following the first presentation of the item (i.e., massed repetition), or repeated after a number of other items (i.e., spaced repetition). The participants will then be given a
recognition test over all the studied target images plus a matched set
of non-studied lures.  The lists of available stimuli are:

- [Indoor Pool](./indoor.csv)
- [Outdoor Pool](./outdoor.csv)

You will need to read these files in as lists of dictionaries (hint,
use the ``DictReader`` from the ``csv`` module that was covered in
class.)  For the actual experiment we will give you the images that
are referenced by the file names in these pools, but for the list
generation you do not need the images themselves and should identify
the image you will be presenting using the file name.  

Use these pools to create lists of trials for the experimental conditions consisting of indoor/outdoor vs once-presented/massed/spaced items. Each
list should contain an equal number of each combination of these conditions in *random* order, but handling the spaced items will take some care. 

While the massed items come immediately after the first time the item was presented, the spaced repetitions need to come 3 to 7 items after the first presentation of the matching item (this range should be a configuration variable). We will provide some suggestions for how to attain this structure in class discussions, but generally a two-stage approach of shuffling all possible conditions first and then filling in specific items will work best. *Note: you cannot have a spaced item condition in the last two slots on the list because it would not be possible to have the repetition be spaced.*
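
One way to satisfy the spacing constraint is a simple "fill and retry" scheme, sketched below. It is an illustration under assumptions rather than the approach from class: `build_study_list`, its parameters, and the `(item_id, condition)` tuples are hypothetical. Each item takes the earliest open slot, repeated items reserve a second slot at a legal lag, and the whole list is reshuffled and rebuilt if an item cannot be placed.

```python
import random

def build_study_list(trial_types, min_lag=3, max_lag=7, max_tries=10000, seed=None):
    """Return a list of (item_id, condition) tuples, one per presentation."""
    rng = random.Random(seed)
    n_slots = sum(1 if t == "once" else 2 for t in trial_types)
    for _ in range(max_tries):
        order = list(trial_types)
        rng.shuffle(order)               # stage 1: shuffle the conditions
        slots = [None] * n_slots
        failed = False
        for item_id, cond in enumerate(order):
            first = slots.index(None)    # earliest open slot
            if cond == "once":
                slots[first] = (item_id, cond)
                continue
            if cond == "massed":
                options = [first + 1]
            else:                        # spaced: any lag from min_lag to max_lag
                options = list(range(first + min_lag, first + max_lag + 1))
            options = [i for i in options if i < n_slots and slots[i] is None]
            if not options:
                failed = True            # dead end, start over with a new shuffle
                break
            second = rng.choice(options)
            slots[first] = (item_id, cond)
            slots[second] = (item_id, cond)
        if not failed:
            return slots
    raise RuntimeError("no valid list found; try more attempts or a wider lag range")

study = build_study_list(["once", "massed", "spaced"] * 4, seed=1)
print(len(study))
```

Restarting from scratch is wasteful compared with backtracking, but it keeps the logic short and makes the lag range an ordinary configuration argument.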

You will need to generate a matching test list for each study list
that includes all the studied items, plus a set of lures that match
the image categories from the studied items.

Be sure to add in information to each trial dictionary that identifies
the file name, the category of the image, the condition of the trial,
and whether it is a target or a lure.


# My work

## Outline and test code

Study phase
- Should be randomized indoor outdoor
- Conditions: 
    - How to balance number of items in each condition (item specific), vs number of instances of each condition (item agnostic)
        - These will be generated for each condition
    - Once presented
        - simple just present the item once
        - do this one last cause it can go anywhere
        - e.g. `{"pool":"indoor", "type":"1p", "reps":0, "distances":[None], "placement":[0]}`
    - massed repeated
        - take up rep number of slots 
        - all slots are contiguous
        - should have param for distribution of number of repetitions=default to one level
        - e.g. `{"pool":"indoor", "type":"massed-rep", "reps":3, "distances":[1,1], "placement":[0,1,2]}`
    - spaced repeated
        - take up rep number of slots
        - slots are not continuous
        - should have param for distribution of number of repetitions=default to one level
        - should have param for distance of repetitions=default to one level
        - e.g. `{"pool":"indoor", "type":"spaced-rep", "reps":3, "distances":[3,3], "placement":[0,3,6]}`

Test phase
- Should be randomized between indoor outdoor
- Should have varied time from study to test (i.e., early, middle, and late items)
- 2AFC or old/new?

Check my code
- summary stats
    - number of each condition (in/out)
    - number of condition items (1p, massed rep, spaced rep)
    - number of condition trials (1p, massed rep, spaced rep)
- for each trial check numbers add up (ie image is used 3 times if it said so)
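
These checks could be sketched with `collections.Counter`. The helper name `summarize` is made up, the keys (`pool`, `type`, `reps`, `id`) follow the trial dictionaries outlined above, and `reps` is assumed to count the total number of presentations of an item.

```python
from collections import Counter

def summarize(study_list):
    """Count presentations and unique items per (pool, type), and check reps."""
    # number of presented trials in each pool x type cell
    trials_per_cell = Counter((t["pool"], t["type"]) for t in study_list)
    # how often each item id actually appears
    presentations = Counter(t["id"] for t in study_list)
    # one record per unique item
    by_id = {t["id"]: t for t in study_list}
    items_per_cell = Counter((t["pool"], t["type"]) for t in by_id.values())
    # ids whose number of appearances differs from what the trial claims
    mismatched = [i for i, t in by_id.items() if presentations[i] != t["reps"]]
    return trials_per_cell, items_per_cell, mismatched

toy_list = [
    {"id": 0, "pool": "indoor", "type": "1p", "reps": 1},
    {"id": 1, "pool": "outdoor", "type": "massed-rep", "reps": 2},
    {"id": 1, "pool": "outdoor", "type": "massed-rep", "reps": 2},
]
trials, items, mismatched = summarize(toy_list)
print(trials, items, mismatched)
```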


## Importing and Logging

In [2]:
import numpy as np  # for classic array stuff
import csv          # for reading in files
import random       # for shuffling lists
import pandas as pd # for tables
import logging      # practice doing good stuff
from copy import deepcopy # for fixing terrible bugs

logging_level = logging.WARNING
logging.basicConfig(format='%(levelname)s (%(asctime)s): %(message)s (Line: %(lineno)d [%(filename)s])',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging_level)

## Condition creation

This section creates the counterbalanced trial conditions

In [3]:
# Make the set of possible conditions
# Output should be list of dictionaries like the examples below
# {"pool":"indoor", "type":"1p", "reps":0, "distances":[None], "placement":[0]}
# {"pool":"indoor", "type":"massed-rep", "reps":3, "distances":[1,1], "placement":[0,1,2]}
# {"pool":"indoor", "type":"spaced-rep", "reps":3, "distances":[3,3], "placement":[0,3,6]}

POOLS = ["indoor", "outdoor"] # for the stimulus types
CONDITION_TYPES = ["1p", "massed-rep", "spaced-rep"] # for the type of presentation
REP_TYPES = [2] # for the number of repetitions
DISTANCE_TYPES = [np.arange(3,7),]  # for the lists to randomly pull the distance from

conditions = []
for pool in POOLS:
    for condition_type in CONDITION_TYPES:
        if condition_type=="1p":
            # do stuff
            conditions.append({"pool":pool, "type":condition_type, "reps":1, "distances":None, "placement":np.array([0])})
        else:
            for reps in REP_TYPES:
                placements = np.arange(reps)
                if condition_type=="massed-rep":
                    distances = np.diff(placements)
                    conditions.append({"pool":pool, "type":condition_type, "reps":reps, "distances":distances, "placement":placements})

                elif condition_type=="spaced-rep":
                    # If a list then it picks a random one. 
                    for dist in DISTANCE_TYPES:
                        conditions.append({"pool":pool, "type":condition_type, "reps":reps, "distances":dist.copy(), "placement":placements})

print(f"There are {len(conditions)} conditions")
for c in conditions:
    print(c)

There are 6 conditions
{'pool': 'indoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([0])}
{'pool': 'indoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([0, 1])}
{'pool': 'indoor', 'type': 'spaced-rep', 'reps': 2, 'distances': array([3, 4, 5, 6]), 'placement': array([0, 1])}
{'pool': 'outdoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([0])}
{'pool': 'outdoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([0, 1])}
{'pool': 'outdoor', 'type': 'spaced-rep', 'reps': 2, 'distances': array([3, 4, 5, 6]), 'placement': array([0, 1])}


## Trial set creation

This section takes the conditions and turns them into a list of trials with their specific information (i.e., id, spacing for the spaced reps, and image filename)

### Read in all the images

In [4]:
# Get image filenames and shuffle them
# read all the lines of each pool file into a list of dicts
# (the with blocks make sure the files get closed again)
with open('indoor.csv', 'r', newline='') as f:
    images_indoor = list(csv.DictReader(f))
with open('outdoor.csv', 'r', newline='') as f:
    images_outdoor = list(csv.DictReader(f))

# shuffle the images
random.shuffle(images_indoor)
random.shuffle(images_outdoor)

### Create the bare trial set

No additional information but this is where we set the trial number

In [5]:
# Create set of trials
NUMBER_OF_TRIALS = None
if not NUMBER_OF_TRIALS:
    # Figure out the number of times we can repeat all the conditions evenly with the given images 
    # Even though the images are for only half the conditions they also need to be used for the test lures
    indoor_reps = int(np.floor(len(images_indoor)/(len(conditions))))
    outdoor_reps = int(np.floor(len(images_outdoor)/(len(conditions))))
    condition_reps = np.min((indoor_reps, outdoor_reps))
else:
    if NUMBER_OF_TRIALS % len(conditions) != 0:
        logging.warning("Number of trials will result in imperfect condition balancing")
    condition_reps = int(np.ceil(NUMBER_OF_TRIALS/len(conditions))) # np.ceil does not accept dtype=int

# initialize the trial set
trial_set = []
logging.debug(f"Conditions are repeated {condition_reps} times")
for i in range(condition_reps):
    for condition in conditions:
        trial_set.append(condition.copy())
trial_set = trial_set[:NUMBER_OF_TRIALS] # this trims off the extra if needed
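
To make the balancing arithmetic concrete, here is a small worked example. The pool sizes (310 and 320) are made-up numbers chosen for illustration, not the sizes of the actual pool files.

```python
n_conditions = 6                  # 2 pools x 3 presentation types
n_indoor, n_outdoor = 310, 320    # hypothetical pool sizes

# how many times every condition can be repeated with the smaller pool
condition_reps = min(n_indoor // n_conditions, n_outdoor // n_conditions)
n_trials = condition_reps * n_conditions

# each pool only feeds half of the conditions, so the rest stay free for lures
study_images_per_pool = condition_reps * (n_conditions // 2)
lures_left_indoor = n_indoor - study_images_per_pool

print(condition_reps, n_trials, study_images_per_pool, lures_left_indoor)
# → 51 306 153 157
```

Dividing by all six conditions, rather than the three that draw on a given pool, is what leaves at least as many unused images for lures as were used at study.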

### Go through trial set

Here is where I add all the trial specific information

In [6]:
# add id, spacing information, and image file names
for i, trial in enumerate(trial_set):
    # add id to trial
    trial['id'] = i
    
    # add spacing information
    if trial['type']=='spaced-rep':
        dist_choice = random.choice(trial['distances'])
        logging.debug(dist_choice)
        trial['placement'] = dist_choice * trial['placement'].copy() # yet again the pointer strikes back
        trial['distances'] = np.diff(trial['placement'])

    # add image to each item
    if trial["pool"]=="indoor":
        assert len(images_indoor)>0, "Not enough indoor images"
        trial["image_filename"] = images_indoor[0]["filename"]
        images_indoor.pop(0)
    elif trial["pool"]=="outdoor":
        assert len(images_outdoor)>0, "Not enough outdoor images"
        trial["image_filename"] = images_outdoor[0]["filename"]
        images_outdoor.pop(0)
    else:
        print("invalid pool type")

# trial_set now has file names for each condition
print(f"There are {len(trial_set)} trials")
for trial in trial_set:
    print(trial)

There are 306 trials
{'pool': 'indoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([0]), 'id': 0, 'image_filename': 'in0200.jpg'}
{'pool': 'indoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([0, 1]), 'id': 1, 'image_filename': 'in0109.jpg'}
{'pool': 'indoor', 'type': 'spaced-rep', 'reps': 2, 'distances': array([5]), 'placement': array([0, 5]), 'id': 2, 'image_filename': 'in0033.jpg'}
{'pool': 'outdoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([0]), 'id': 3, 'image_filename': 'out2184.jpg'}
{'pool': 'outdoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([0, 1]), 'id': 4, 'image_filename': 'out1386.jpg'}
{'pool': 'outdoor', 'type': 'spaced-rep', 'reps': 2, 'distances': array([5]), 'placement': array([0, 5]), 'id': 5, 'image_filename': 'out0010.jpg'}
{'pool': 'indoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([0]), 'id': 6, 'image_filename': 'in0086.jpg'}
{'

## Fitting functions

Here I define several functions that will be used to create the final study list

In [7]:
def check_placement(working_list, trial, placement):
    """
    Check if a specific trial can be placed at a given location in a list
    ------
    INPUTS
        working_list: a list with None in all empty/available slots
        trial: a dictionary with a relative array of repetitions
        placement: an index of working_list where the first repetition would go
    OUTPUTS
        a boolean True or False
    """
    # The only reason this is a separate function is so that I can return out of for loops
    locations = trial['placement']+placement
    
    for location in locations:
        try:
            if working_list[location] is not None:
                # if there is no spot at this location, then this trial doesn't work here
                return False
        except IndexError:
            # if this location doesn't exist then this trial doesn't work
            return False
    # only if all locations fit is it true
    return True

def find_placement(working_list, trial):
    """
    Finds all the indices of 'working_list' where 'trial' could be placed
    ------
    INPUTS
        working_list: a list with None in all empty/available slots
        trial: a dictionary with a relative array of repetitions
    OUTPUTS
        good_inds: a list of indices where this trial can be placed in working list
    """
    proposal_indices = [index for index,value in enumerate(working_list) if value is None]
    good_inds = [] # list to store which of the proposal indices work for this trial
    for proposal in proposal_indices:
        if check_placement(working_list,trial=trial, placement=proposal):
            good_inds.append(proposal)    

    return good_inds

def place_trial_in_list(working_list, trial, proposal):
    """
    Place a trial into the working list at the given location
    ------
    INPUTS
        working_list: a list with None in all empty/available slots, and trials in the other
        trial: a dictionary with a relative array of repetitions
        proposal: an index of working list to place the first repetition of the trial
    OUTPUTS
        working_list: same as input but now with the trial and its possible repetitions added
    """
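    # work on shifted copies so the trial in trial_set keeps its relative
    # placement if this proposal is later abandoned during backtracking
    locations = trial['placement'] + proposal
    for i, location in enumerate(locations):
        working_list[location] = trial.copy()
        working_list[location]['placement'] = locations
        working_list[location]['repetition'] = i
        working_list[location]['location'] = location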

    return working_list
        
def fit_trials_in_list(working_list, trial_set, level=0):
    """
    Takes a working_list and fit all the trials in trial_set into it. 
    This works by calling itself after each proposal until all the trials have been used up.
    ------
    INPUTS
        working_list: a list with None in all empty/available slots, and trials in the other
        trial_set: a list of trials (dictionaries) that need to be placed in the working list
        level: an int describing the level/depth of recursions (used for debugging)
    OUTPUTS
        temp_list: a solution to fitting the trial_set in the working_list
        OR
        None: if there is no way to fit the trial_set in the working_list
    """

    logging.debug(f"starting loops with {len(trial_set)} trials")
    if len(trial_set)<1: return working_list
    proposal_inds = find_placement(working_list, trial_set[0])
    logging.debug(f"starting loops with {len(proposal_inds)} proposals")
    
    if len(proposal_inds)>1:
        random.shuffle(proposal_inds) # don't want to try the trials in order

        temp_list = working_list.copy()
        for proposal in proposal_inds:
            # attempt a fit
            place_trial_in_list(temp_list, trial_set[0], proposal)
            # Check if that fit works for the rest of the trials
            temp_list = fit_trials_in_list(temp_list, trial_set[1:], level=level+1)
            if temp_list: 
                return temp_list # it works so let's use it
            else: 
                temp_list = working_list.copy() # it doesn't work so keep going
        
        # this should only happen if all proposals don't work for future trials
        return None
    
    elif len(proposal_inds)==1:
        temp_list = working_list.copy()
        # attempt a fit
        place_trial_in_list(temp_list, trial_set[0], proposal_inds[0])
        
        # Check if that fit works for the rest of the trials
        # But only if there are future trials
        if len(trial_set)>1:
            temp_list = fit_trials_in_list(temp_list, trial_set[1:], level=level+1)
            if temp_list: 
                # it works so let's use it
                return temp_list 
            else: 
                # no good future fits
                return None 
        # Only one trial left == nearly done!    
        elif len(trial_set)==1:
            logging.debug("Last trial placed!")
            return temp_list

    elif len(proposal_inds)<1:
        # only happens if there are trials left (otherwise it would have returned in the ==1 condition)
        logging.warning(f"No good proposals; recursion level {level}")
        return None
    
def complete_list_gen(trial_set, conditions):
    """ 
    Takes a trial set and turns it into a list of stimuli(+metadata) to present 
    ------
    INPUTS
        trial_set: a list of dictionaries of trials that should be experienced 
        conditions: a list of condition types(str) in order of most constrained to least constrained
    OUTPUT
        final_list: a list of trials in order to be presented
    """ 
    # first create null list
    trial_df = pd.DataFrame(trial_set)
    null_list = [None] * trial_df["reps"].sum()
    logging.debug(f"Null list is {len(null_list)} long")

    trial_df = trial_df.sample(frac=1) # randomize the trials

    # sort list from most constrained to least constrained
    sorted_trials = []
    for condition in conditions:
        df = trial_df[trial_df["type"]==condition]
        sorted_trials += df.to_dict('records')

    logging.debug(f"sorted trials is {len(sorted_trials)} long")
    
    # run the fitting process
    final_list = fit_trials_in_list(null_list, sorted_trials)

    return final_list
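
The backtracking idea behind these functions can be seen more easily on a toy problem. The sketch below is a stripped-down stand-in, not the code above: `fit` and the `(label, offsets)` patterns are made up for the example, with `offsets` playing the role of the relative `placement` array.

```python
import random

def fit(slots, patterns, rng):
    """Place every (label, offsets) pattern into slots, or return None."""
    if not patterns:
        return slots                      # nothing left to place: solved
    label, offsets = patterns[0]
    # every start index where all of this pattern's slots are still open
    starts = [s for s in range(len(slots))
              if all(s + o < len(slots) and slots[s + o] is None for o in offsets)]
    rng.shuffle(starts)                   # try the candidates in random order
    for start in starts:
        candidate = slots.copy()          # never modify the caller's list
        for o in offsets:
            candidate[start + o] = label
        solution = fit(candidate, patterns[1:], rng)
        if solution is not None:
            return solution               # the rest fit too, so keep this choice
    return None                           # every candidate led to a dead end

patterns = [("spaced", (0, 3)), ("massed", (0, 1)), ("once", (0,))]
solution = fit([None] * 5, patterns, random.Random(0))
print(solution)
```

Placing the most constrained patterns first, as `complete_list_gen` does with its ordered condition list, tends to cut down on how often the search has to back up.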

## Create list

In [8]:
study_list = complete_list_gen(trial_set, ['spaced-rep', 'massed-rep', '1p'])

In [9]:
for trial in study_list:
    print(trial)

{'pool': 'indoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([0]), 'id': 162, 'image_filename': 'in0120.jpg', 'repetition': 0, 'location': 0}
{'pool': 'outdoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([1, 2]), 'id': 166, 'image_filename': 'out0120_new.jpg', 'repetition': 0, 'location': 1}
{'pool': 'outdoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([1, 2]), 'id': 166, 'image_filename': 'out0120_new.jpg', 'repetition': 1, 'location': 2}
{'pool': 'indoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([3, 4]), 'id': 139, 'image_filename': 'in0327.jpg', 'repetition': 0, 'location': 3}
{'pool': 'indoor', 'type': 'massed-rep', 'reps': 2, 'distances': array([1]), 'placement': array([3, 4]), 'id': 139, 'image_filename': 'in0327.jpg', 'repetition': 1, 'location': 4}
{'pool': 'outdoor', 'type': '1p', 'reps': 1, 'distances': None, 'placement': array([5]), 'id': 201, 'imag

## Test list

Take the study list
get all the unique trial ids in order
split into first third, second third, last third
shuffle so that testing first third has all times

create test lures images paired for indoor/outdoor amount
if 2AFC:
    pair each study with test
if old/new:
    create test list by picking from study or test list randomly until they are all done


In [12]:
TEST_LENGTH = 200
OLD_NEW_PROP = 0.5

# Make sure length and prop make sense

if not float(TEST_LENGTH * OLD_NEW_PROP).is_integer():
    logging.warning(f"Exact prop ({OLD_NEW_PROP}) is not possible")
    old_items = np.round(TEST_LENGTH * OLD_NEW_PROP, 0) 
    OLD_NEW_PROP = old_items/TEST_LENGTH
    logging.warning(f"Prop is corrected to:{OLD_NEW_PROP}")

n_old_items = int(TEST_LENGTH * OLD_NEW_PROP)
n_new_items = int(TEST_LENGTH-n_old_items)

assert n_old_items <= len(trial_set), f"Not enough study trials: required={n_old_items}; available={len(trial_set)}"

if not n_old_items/(n_old_items+n_new_items) == OLD_NEW_PROP:
    logging.critical(f"Prop is not as expected; expected={OLD_NEW_PROP}, actual={n_old_items/(n_old_items+n_new_items)}")

# generate list
olds = [True] * n_old_items
news = [False] * n_new_items
old_new_order = olds + news
random.shuffle(old_new_order) # shuffles in place and returns None
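
The same steps can be wrapped in a small helper, sketched here with the hypothetical name `make_old_new_order`. It rounds the number of old items to the nearest whole trial and reports the proportion that was actually achieved; note that `random.shuffle` reorders a list in place and returns `None`, so the list has to be built first and shuffled afterwards.

```python
import random

def make_old_new_order(test_length, old_prop, seed=None):
    """Return a shuffled list of True (old) / False (new) and the achieved proportion."""
    n_old = round(test_length * old_prop)
    order = [True] * n_old + [False] * (test_length - n_old)
    random.Random(seed).shuffle(order)   # in place; the return value is None
    return order, n_old / test_length

order, achieved = make_old_new_order(200, 0.5, seed=0)
print(sum(order), len(order), achieved)
```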

### Get all unique trials

In [13]:
unique_trials = []
unique_ids = []
for trial in study_list:
    if not trial['id'] in unique_ids:
        unique_trials.append(trial)
        unique_ids.append(trial['id'])


### Create shuffled study items

In [15]:
time_blocks = 3 # how coarse to counterbalance delay
block_size = int(len(unique_trials)/time_blocks) 
shufled_balanced_study = []
for i in range(time_blocks-1): # the last block is special
    block_ind = i*block_size
    # Shuffle order (random.shuffle works in place and returns None)
    shufl_block = unique_trials[block_ind:block_ind+block_size].copy()
    random.shuffle(shufl_block)
    # extend keeps the result a flat list of trials
    shufled_balanced_study.extend(shufl_block)
block_ind = (time_blocks-1)*block_size
shufl_block = unique_trials[block_ind:].copy() # don't want an index error
random.shuffle(shufl_block)
shufled_balanced_study.extend(shufl_block)


In [17]:
random.shuffle(unique_trials[block_ind:].copy())

### Shuffle in lures

In [16]:
shufled_balanced_study

In [None]:
test_list = []
for i, old in enumerate(old_new_order):
    if old:
        # take the next studied item and mark it as a target
        old_trial = shufled_balanced_study.pop(0)
        old_trial['old'] = old
        test_list.append(old_trial)
    else:
