# File I/O and Experimental Design
## Computational Methods in Psychology (and Neuroscience)
### Psychology 4500/7559 --- Fall 2020
By: Per B. Sederberg, PhD



# Lesson Objectives

Upon completion of this lesson, students should be able to:

1. Read and write basic text files

2. Read and write CSV files

3. Pickle objects

and should understand:

4. Fundamentals of experimental design

5. The link between science and coding

6. Dependent vs. Independent variables

7. Constraints on list structure

8. How to make a simple list of dictionaries to define trials


## Sorting data in files

- Say we have some numbers in a file:

In [10]:
!more spaced_numbers.txt

88 71 42 18 17 88 41 3 13 7 86 85 8 25 57 55 100 43 14 73 


- Let's read them in, sort them, and write them back out sorted!

## Reading from files

* Since these numbers are all on one line, we just have to read one
  line in:


In [22]:
# you can open a file for reading, writing, or appending
f = open('spaced_numbers.txt', 'r')

# Read one line in
line = f.readline()

# print what we read in
print(line)

# close the file
f.close()

42 25 98 98 58 54 31 73 0 51 32 34 63 48 29 46 86 51 97 32 
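
The same read can also be written with a `with` block, which closes the file for you even if something goes wrong partway through. This is a minimal sketch: it first writes a small stand-in file (the name `demo_numbers.txt` is made up for this example) so it can run on its own.

```python
import os
import tempfile

# Create a small stand-in file so the example is self-contained
# (the file name and its contents are assumptions, not the lesson's data)
path = os.path.join(tempfile.mkdtemp(), 'demo_numbers.txt')
with open(path, 'w') as f:
    f.write('3 1 2 \n')

# The with block closes the file as soon as the indented code finishes
with open(path, 'r') as f:
    line = f.readline()

print(repr(line))   # '3 1 2 \n'
print(f.closed)     # True
```

There is no `f.close()` to forget, which is why the later cells in this lesson use `with` when writing files.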



## Files are objects, too!

* You can see that `f` is a file object with methods:


In [23]:
# print out all non-hidden attributes and methods
print([x for x in dir(f) if x[0]!='_'])

['buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline', 'readlines', 'reconfigure', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'write_through', 'writelines']


## Parsing the numbers

* We need to turn our big string into a list of numbers:


In [24]:
line

'42 25 98 98 58 54 31 73 0 51 32 34 63 48 29 46 86 51 97 32 \n'

* First we can use ``strip`` to pull off the trailing ``newline``.

In [25]:
line.strip()

'42 25 98 98 58 54 31 73 0 51 32 34 63 48 29 46 86 51 97 32'

* We can combine that with split to make a list of numbers

In [30]:
# note how you can apply the strip and split right after one another
# that's because strip returns a string
print(line.strip().split(' '))

['42', '25', '98', '98', '58', '54', '31', '73', '0', '51', '32', '34', '63', '48', '29', '46', '86', '51', '97', '32']
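
One thing to watch for: `split(' ')` splits on every single space, so a doubled space in the file produces an empty string. Calling `split()` with no argument is often the safer choice for whitespace-separated data. A short sketch with a made-up line:

```python
# A made-up line with a double space in the middle and a trailing space
line = '42 25  98 \n'

# Splitting on a single space keeps an empty string where spaces repeat
by_space = line.strip().split(' ')
print(by_space)        # ['42', '25', '', '98']

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace, so strip() is not even needed
by_whitespace = line.split()
print(by_whitespace)   # ['42', '25', '98']
```

The empty string matters because `int('')` raises a `ValueError` when you convert to numbers.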


## Convert to numbers

* Now we have a list of strings, but we want numbers.

* We could loop over each item in that list with a for loop, creating a new list.

In [33]:
ints = []
for s in line.strip().split(' '):
    ints.append(int(s))

* Or, we can use a list comprehension to convert it in one line :)

In [34]:
ints = [int(s) for s in line.strip().split(' ')]
print(ints)

[42, 25, 98, 98, 58, 54, 31, 73, 0, 51, 32, 34, 63, 48, 29, 46, 86, 51, 97, 32]


## Sorting things out

* Now that we have a list, sorting is easy :)


In [35]:
ints.sort()
print(ints)

[0, 25, 29, 31, 32, 32, 34, 42, 46, 48, 51, 51, 54, 58, 63, 73, 86, 97, 98, 98]


In [36]:
# you can reverse it, too!
ints.sort(reverse=True)
print(ints)

[98, 98, 97, 86, 73, 63, 58, 54, 51, 51, 48, 46, 42, 34, 32, 32, 31, 29, 25, 0]
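
Note that `sort` changes the list in place. If you want to keep the original order around, the built-in `sorted` returns a new list instead. A small sketch (the list `nums` is just an example):

```python
nums = [42, 25, 98, 0]

# sorted() returns a new list and leaves the original untouched
ordered = sorted(nums)
print(ordered)   # [0, 25, 42, 98]
print(nums)      # [42, 25, 98, 0]

# list.sort() sorts in place and returns None,
# so never write nums = nums.sort()
result = nums.sort()
print(result)    # None
print(nums)      # [0, 25, 42, 98]
```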


## Write it back out

* Now we have our sorted list, let's save it back to file


In [37]:
with open('spaced_numbers_sorted.txt', 'w') as f:
    for i in ints:
        f.write('%d ' % i)
    f.write('\n')

In [38]:
!more spaced_numbers_sorted.txt

98 98 97 86 73 63 58 54 51 51 48 46 42 34 32 32 31 29 25 0 


## Random Numbers

How did I generate those random numbers in the first place?

In [21]:
# import the random module
import random

# open a file for writing
with open('spaced_numbers.txt', 'w') as f:
    # loop some number of times
    for i in range(20):
        # write out a random integer, followed by a space
        f.write('%d ' % random.randint(0, 100))
    f.write('\n')
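
Every run of the cell above writes different numbers. If you need the same "random" numbers each time (say, to reproduce a list later), you can seed the generator first. A minimal sketch; the seed value 42 is arbitrary:

```python
import random

random.seed(42)   # 42 is an arbitrary seed chosen for this example
first = [random.randint(0, 100) for _ in range(5)]

random.seed(42)   # re-seeding restarts the same sequence
second = [random.randint(0, 100) for _ in range(5)]

# both lists contain the same numbers in the same order
print(first == second)   # True
```

Also note that `random.randint(0, 100)` includes both endpoints, so 0 and 100 can both come up.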


## Random Numbers

* You have loads of random operations at your fingertips:


In [39]:
print([m for m in dir(random) if m[0] != '_'])

['BPF', 'LOG4', 'NV_MAGICCONST', 'RECIP_BPF', 'Random', 'SG_MAGICCONST', 'SystemRandom', 'TWOPI', 'betavariate', 'choice', 'choices', 'expovariate', 'gammavariate', 'gauss', 'getrandbits', 'getstate', 'lognormvariate', 'normalvariate', 'paretovariate', 'randint', 'random', 'randrange', 'sample', 'seed', 'setstate', 'shuffle', 'triangular', 'uniform', 'vonmisesvariate', 'weibullvariate']


* `random.shuffle` is particularly useful in our work to randomize a list:

In [40]:
random.shuffle(ints)
print(ints)

[32, 48, 98, 25, 32, 31, 34, 51, 86, 98, 51, 73, 58, 29, 42, 0, 97, 63, 54, 46]
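
Like `sort`, `random.shuffle` works in place and returns `None`. If you would rather keep the original list intact, `random.sample` with `k` set to the list length gives you a shuffled copy. A short sketch with example lists:

```python
import random

items = [1, 2, 3, 4, 5]

# random.shuffle reorders the list in place and returns None
returned = random.shuffle(items)
print(returned)   # None

# random.sample draws without replacement; asking for every element
# gives a shuffled copy and leaves the original list alone
original = [1, 2, 3, 4, 5]
shuffled_copy = random.sample(original, k=len(original))
print(original)   # [1, 2, 3, 4, 5]
```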


## What about CSV files?

* Most often our data are in formatted files, such as comma-separated values (CSV) files, not just lists of numbers:

In [42]:
!more exp_res.csv

Subject,Performance
0,0.19839211032002024
1,0.41428489112125344
2,0.027715898314496612
3,0.05627103270567213
4,0.27079871696692148
5,0.93739232241039394
6,0.49069767020105493
7,0.24287893232441449
8,0.97942327679701313
9,0.3229346781148571


## Using the csv module

* We could parse the file with strip and split like before

* or we can use the builtin ``csv`` module to read and write them:


In [44]:
import csv

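# create a dictionary reader and read all the lines into a list of dicts
# (the with block closes the file once we are done reading)
with open('exp_res.csv', 'r', newline='') as f:
    dr = csv.DictReader(f)
    d = [row for row in dr]

# note it creates OrderedDict instances in Python 3.6 and 3.7
# (Python 3.8+ returns plain dicts instead)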
d

[OrderedDict([('Subject', '0'), ('Performance', '0.19839211032002024')]),
 OrderedDict([('Subject', '1'), ('Performance', '0.41428489112125344')]),
 OrderedDict([('Subject', '2'), ('Performance', '0.027715898314496612')]),
 OrderedDict([('Subject', '3'), ('Performance', '0.05627103270567213')]),
 OrderedDict([('Subject', '4'), ('Performance', '0.27079871696692148')]),
 OrderedDict([('Subject', '5'), ('Performance', '0.93739232241039394')]),
 OrderedDict([('Subject', '6'), ('Performance', '0.49069767020105493')]),
 OrderedDict([('Subject', '7'), ('Performance', '0.24287893232441449')]),
 OrderedDict([('Subject', '8'), ('Performance', '0.97942327679701313')]),
 OrderedDict([('Subject', '9'), ('Performance', '0.3229346781148571')])]
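
Notice that every value above is still a string, and that we have only read a CSV so far. The sketch below converts the values to numbers and writes them back out with `csv.DictWriter`. To keep it runnable on its own it creates a tiny stand-in file; the file names and the three rows of values are made up for illustration.

```python
import csv
import os
import tempfile

# A tiny stand-in for exp_res.csv, written to a temporary folder
# (file names and values here are assumptions for the example)
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'exp_res_demo.csv')
with open(in_path, 'w', newline='') as f:
    f.write('Subject,Performance\n0,0.25\n1,0.5\n2,0.75\n')

# Read the rows, converting the strings to numbers as we go
with open(in_path, 'r', newline='') as f:
    rows = [{'Subject': int(r['Subject']),
             'Performance': float(r['Performance'])}
            for r in csv.DictReader(f)]

# Write the rows back out; fieldnames sets the column order
out_path = os.path.join(tmp_dir, 'exp_res_copy.csv')
with open(out_path, 'w', newline='') as f:
    dw = csv.DictWriter(f, fieldnames=['Subject', 'Performance'])
    dw.writeheader()
    dw.writerows(rows)

print(rows[1])   # {'Subject': 1, 'Performance': 0.5}
```

Opening CSV files with `newline=''` is what the `csv` module documentation recommends, so that line endings are handled by the module itself.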

## Pickling!

* Often we want to dump an object to file for future use.

* Pickling allows us to *serialize* Python objects (i.e., turn them into a byte stream that can be saved to file):


In [47]:
import pickle

# dump the list of ordered dicts to a file
# (note the 'b' in the 'wb', which means a
# binary stream instead of an ASCII text stream)
with open('my_dict.pickle', 'wb') as f:
    pickle.dump(d, f)

!cat my_dict.pickle

�]q (ccollections
OrderedDict
q)Rq(X   SubjectqX   0qX   PerformanceqX   0.19839211032002024quh)Rq(hX   1qhX   0.41428489112125344q	uh)Rq
(hX   2qhX   0.027715898314496612quh)Rq(hX   3qhX   0.05627103270567213quh)Rq(hX   4qhX   0.27079871696692148quh)Rq(hX   5qhX   0.93739232241039394quh)Rq(hX   6qhX   0.49069767020105493quh)Rq(hX   7qhX   0.24287893232441449quh)Rq(hX   8qhX   0.97942327679701313quh)Rq(hX   9q hX   0.3229346781148571q!ue.

## Unpickling

* As you can see, the serialization process is not usually human-readable
* Once pickled, it's easy to load it back:


In [48]:
# open the file back up for reading (again in binary mode)
with open('my_dict.pickle', 'rb') as f:
    d2 = pickle.load(f)
d2

[OrderedDict([('Subject', '0'), ('Performance', '0.19839211032002024')]),
 OrderedDict([('Subject', '1'), ('Performance', '0.41428489112125344')]),
 OrderedDict([('Subject', '2'), ('Performance', '0.027715898314496612')]),
 OrderedDict([('Subject', '3'), ('Performance', '0.05627103270567213')]),
 OrderedDict([('Subject', '4'), ('Performance', '0.27079871696692148')]),
 OrderedDict([('Subject', '5'), ('Performance', '0.93739232241039394')]),
 OrderedDict([('Subject', '6'), ('Performance', '0.49069767020105493')]),
 OrderedDict([('Subject', '7'), ('Performance', '0.24287893232441449')]),
 OrderedDict([('Subject', '8'), ('Performance', '0.97942327679701313')]),
 OrderedDict([('Subject', '9'), ('Performance', '0.3229346781148571')])]

## Notes on Pickles

* Delicious, but...

* Note that pickles are *NOT* portable across languages

* If you require interoperability, then you'll want to use a different
  file format

* Raw text is about as portable as it gets, but is not always the
  most efficient

* My favorite data storage format is Hierarchical Data Format version 5 (HDF5), which is widely used (even adopted by MATLAB) and has I/O libraries for almost every programming language.

  * e.g., [h5py](https://www.h5py.org/)
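
For small, simple structures like our list of dicts, one text-based option that most languages can read is JSON, available through the standard `json` module. This is only a sketch of one possible alternative, not a recommendation over HDF5; the file name is made up and the two rows are copied from the output above.

```python
import json
import os
import tempfile

# Two rows copied from the CSV output above
data = [{'Subject': '0', 'Performance': '0.19839211032002024'},
        {'Subject': '1', 'Performance': '0.41428489112125344'}]

# Hypothetical output file in a temporary folder
path = os.path.join(tempfile.mkdtemp(), 'my_data.json')

# JSON is text, so the file is opened in text mode ('w', not 'wb')
with open(path, 'w') as f:
    json.dump(data, f)

# Load it back and check that nothing changed
with open(path, 'r') as f:
    data2 = json.load(f)

print(data2 == data)   # True
```

Unlike a pickle, the resulting file is human-readable, but JSON only handles basic types (dicts, lists, strings, numbers, booleans, and `None`).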

# Experimental Design

## Science is hard

![](https://imgs.xkcd.com/comics/purity.png)

## It all starts with a question

### What are we trying to do, anyway?

![](./figs/brain_quest.png)

## The Scientific method as a computer program

* Science basically involves figuring out how a function works by passing in variables and observing the output.

In [None]:
def human_brain(*args, **kwargs):
    # stuff happens
    output = None  # placeholder for whatever the brain produces

    return output

## Independent vs. dependent variables

- The inputs are the ***independent*** variables
  - e.g., items, conditions, etc...
- The outputs are the ***dependent*** variables
  - e.g., choices, reaction times, etc...
- There are also ***controlled*** variables that you keep the same. 
  - The goal is to prevent them from influencing the effect of the independent variables on the dependent variables.
  - e.g., if you changed items when you changed conditions, you wouldn't know if it was the items or the conditions that affected the output.

## The Hypothesis

- The scientist makes a conjecture about how change in independent variables will give rise to change in dependent variables.

- The hypothesis is an instantiation of your ***model*** of the world, even if it's a poorly specified model.

- It could be that the independent variables have no relation to the dependent variables, in which case we need a new hypothesis.

## Experiments test hypotheses

- The goal is to design an experiment that can reliably ***disprove*** your hypothesis.
- Ideally, your hypothesis is a *generative* model and you can run simulations to help you design a powerful experiment.

## Generative model?

- A ***generative*** model is like a function you've written to mimic the behavior of the function you're trying to understand.
- The alternative is a ***descriptive/discriminative*** model, which tests whether a change in the input to a function gives rise to a significant change in the output.
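
To make the idea of a generative model concrete, here is a deliberately toy sketch. Everything in it is an assumption made up for illustration (the function name `toy_brain`, the condition labels, the baseline, the effect size, and the noise level); it is not a real model of any task. The point is only that a generative model can produce simulated data, which you can then use to check whether a planned experiment could detect the effect.

```python
import random

def toy_brain(condition, rng):
    """Hypothetical generative model: return one simulated reaction time (s)."""
    base_rt = 0.5                                   # assumed baseline RT
    slowing = 0.1 if condition == 'hard' else 0.0   # assumed condition effect
    noise = rng.gauss(0, 0.05)                      # assumed trial-to-trial noise
    return base_rt + slowing + noise

# Use a private, seeded generator so the simulation is repeatable
rng = random.Random(0)
easy = [toy_brain('easy', rng) for _ in range(1000)]
hard = [toy_brain('hard', rng) for _ in range(1000)]

mean_easy = sum(easy) / len(easy)
mean_hard = sum(hard) / len(hard)

# The simulated difference should land close to the assumed 0.1 s effect
print(mean_hard - mean_easy)
```

Rerunning this with fewer simulated trials gives a feel for how noisy the estimated difference becomes, which is the kind of question a simulation can answer before any data are collected.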

(Details in another course, Quantified Cognition, which I typically teach in the Spring.)


## Learning by example: Flanker Task

For which of these is it hardest to indicate the direction the middle arrow is pointing?

# <<<<<<<

# <<<><<<

# ===<===

## The Flanker task

Tests the role of attention and cognitive control in decision-making.

### Hypothesis

The items that flank a target item will affect processing of that item, requiring exertion of cognitive control to overcome the interference.

## How should we test this hypothesis?

- How many trials do we need?
- Should we do a between- or within-subject manipulation?
- What conditions should we include?
- What proportion of each condition should we include?
- Does the order of the items matter?


## List generation vs. Stimulus Presentation

- Most experiments can separate the code that generates the random lists governing what we present to participants from the code that presents the stimuli and collects the responses.

  - The primary exception would be adaptive experiments that depend on the behavior (or neural activity) of the participant to determine subsequent trials.

- We'll focus here on the list generation portion of the experiment.

## Define the trial types

We have the following variables:

- Condition: Incongruent, Congruent, Neutral
- Direction: Left, Right

In [63]:
# conditions
conds = [{'condition': 'congruent',
          'direction': 'left',
          'stimulus': '<<<<<<<'
         },
         {'condition': 'congruent',
          'direction': 'right',
          'stimulus': '>>>>>>>'
         },
         {'condition': 'incongruent',
          'direction': 'left',
          'stimulus': '>>><>>>'
         },
         {'condition': 'incongruent',
          'direction': 'right',
          'stimulus': '<<<><<<'
         },
         {'condition': 'neutral',
          'direction': 'left',
          'stimulus': '===<==='
         },
         {'condition': 'neutral',
          'direction': 'right',
          'stimulus': '===>==='
         },]
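
Typing out every combination by hand works for six trial types, but it gets error-prone as the design grows. One way to avoid that is to build the same list with nested loops. This is a sketch under the assumption that each stimulus is three flankers, the target arrow, then three more flankers; the names `arrows`, `num_flankers`, and `conds_generated` are made up here.

```python
conditions = ['congruent', 'incongruent', 'neutral']
directions = ['left', 'right']
arrows = {'left': '<', 'right': '>'}
num_flankers = 3   # flankers on each side of the target

conds_generated = []
for condition in conditions:
    for direction in directions:
        # the middle arrow always points in the trial's direction
        target = arrows[direction]
        if condition == 'congruent':
            flanker = target
        elif condition == 'incongruent':
            # flankers point the opposite way from the target
            flanker = arrows['right' if direction == 'left' else 'left']
        else:
            flanker = '='
        side = flanker * num_flankers
        conds_generated.append({'condition': condition,
                                'direction': direction,
                                'stimulus': side + target + side})

for c in conds_generated:
    print(c)
```

The result has the same six entries, in the same order, as the hand-written `conds` list.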

## Turning conditions into trials

- As long as we want to keep the conditions balanced, we can just specify the number of repetitions.

In [64]:
num_reps = 4
trials = conds * num_reps
trials

[{'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
 {'condition': 'congruent', 'direction': 'right', 'stimulus': '>>>>>>>'},
 {'condition': 'incongruent', 'direction': 'left', 'stimulus': '>>><>>>'},
 {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
 {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
 {'condition': 'neutral', 'direction': 'right', 'stimulus': '===>==='},
 {'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
 {'condition': 'congruent', 'direction': 'right', 'stimulus': '>>>>>>>'},
 {'condition': 'incongruent', 'direction': 'left', 'stimulus': '>>><>>>'},
 {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
 {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
 {'condition': 'neutral', 'direction': 'right', 'stimulus': '===>==='},
 {'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
 {'condition': 'congruent', 'direction': 'rig
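
It is worth verifying that the list really is balanced rather than trusting it by eye. A quick check with `collections.Counter`, using a shortened stand-in for `conds` so the example runs on its own:

```python
from collections import Counter

# A shortened stand-in for the lesson's conds list
conds = [{'condition': 'congruent', 'direction': 'left'},
         {'condition': 'congruent', 'direction': 'right'},
         {'condition': 'incongruent', 'direction': 'left'},
         {'condition': 'incongruent', 'direction': 'right'}]
num_reps = 4
trials = conds * num_reps

# Count how many times each (condition, direction) pair appears
counts = Counter((t['condition'], t['direction']) for t in trials)
for pair, n in counts.items():
    print(pair, n)

# Every trial type should appear exactly num_reps times
print(all(n == num_reps for n in counts.values()))   # True
```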

## Randomizing the order

- We don't want the participant to know what trials will come next
- We can use the random module to help us here:

In [65]:
random.shuffle(trials)
trials

[{'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
 {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
 {'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
 {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
 {'condition': 'incongruent', 'direction': 'left', 'stimulus': '>>><>>>'},
 {'condition': 'incongruent', 'direction': 'left', 'stimulus': '>>><>>>'},
 {'condition': 'congruent', 'direction': 'right', 'stimulus': '>>>>>>>'},
 {'condition': 'incongruent', 'direction': 'left', 'stimulus': '>>><>>>'},
 {'condition': 'congruent', 'direction': 'right', 'stimulus': '>>>>>>>'},
 {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
 {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
 {'condition': 'congruent', 'direction': 'right', 'stimulus': '>>>>>>>'},
 {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
 {'condition': 'congruent', 'direction': '
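
A plain shuffle can still produce long runs of the same condition (note the back-to-back incongruent trials above). If your design needs a constraint on list structure, one simple approach is to reshuffle until the constraint holds. The sketch below is one way to do it, not the only one; the helper names `max_run` and `shuffle_with_limit` and the limit of 3 are assumptions for the example.

```python
import random

def max_run(trials, key):
    """Length of the longest run of consecutive trials sharing trial[key]."""
    longest = current = 0
    previous = object()   # sentinel that never equals a real value
    for t in trials:
        current = current + 1 if t[key] == previous else 1
        previous = t[key]
        longest = max(longest, current)
    return longest

def shuffle_with_limit(trials, key, limit, max_tries=10000):
    """Return a shuffled copy with no run longer than limit."""
    trials = trials[:]   # shuffle a copy, not the caller's list
    for _ in range(max_tries):
        random.shuffle(trials)
        if max_run(trials, key) <= limit:
            return trials
    raise RuntimeError('no valid order found; try a looser limit')

# 24 stand-in trials: 8 of each condition
trials = [{'condition': c}
          for c in ['congruent', 'incongruent', 'neutral']] * 8
ordered = shuffle_with_limit(trials, key='condition', limit=3)
print(max_run(ordered, 'condition'))
```

Reshuffling is fine when the constraint is easy to satisfy; a very strict constraint may need a smarter construction, which is why the function gives up after `max_tries` attempts.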

## Multiple trial blocks

- We often want to give participants a break during a task.
- One way to do this is to split the trials into blocks

In [68]:
# turn the trial list generation into a function
def gen_trials(conds, num_reps):
    # warning: even though this gives you a new list,
    # each dictionary in the list is the same object, repeated
    # (see `deepcopy` in the `copy` module if you need independent copies)
    trials = conds[:] * num_reps
    random.shuffle(trials)
    
    return trials

# Specify the number of blocks
num_blocks = 3
blocks = [gen_trials(conds, num_reps) for b in range(num_blocks)]
blocks

[[{'condition': 'neutral', 'direction': 'right', 'stimulus': '===>==='},
  {'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
  {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
  {'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
  {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
  {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
  {'condition': 'neutral', 'direction': 'left', 'stimulus': '===<==='},
  {'condition': 'congruent', 'direction': 'right', 'stimulus': '>>>>>>>'},
  {'condition': 'congruent', 'direction': 'left', 'stimulus': '<<<<<<<'},
  {'condition': 'neutral', 'direction': 'right', 'stimulus': '===>==='},
  {'condition': 'incongruent', 'direction': 'left', 'stimulus': '>>><>>>'},
  {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
  {'condition': 'incongruent', 'direction': 'right', 'stimulus': '<<<><<<'},
  {'condition': 'congruent', 'dire
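
The warning in `gen_trials` matters as soon as you add trial-specific information, such as a trial number, to the dictionaries. A small demonstration of the difference, using one-item stand-in lists and a made-up `trial_num` key:

```python
import copy

# List repetition copies references, so both entries are the same dict
conds_a = [{'condition': 'congruent', 'direction': 'left'}]
shared = conds_a * 2
shared[0]['trial_num'] = 0
print(shared[0] is shared[1])     # True
print('trial_num' in shared[1])   # True: the "other" trial changed too

# deepcopy each repetition to get independent dictionaries
conds_b = [{'condition': 'congruent', 'direction': 'left'}]
independent = [copy.deepcopy(c) for _ in range(2) for c in conds_b]
independent[0]['trial_num'] = 0
print('trial_num' in independent[1])   # False
print('trial_num' in conds_b[0])       # False: the template is untouched
```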

## General tips

- Give your future self a gift!
  - Try to include as much information as possible in your trials to facilitate subsequent analyses (e.g., don't just have a stimulus column.)
- Try as much as possible to avoid hard-coded values.
  - Make use of a configuration section in your code to set all the variables that would determine the lists that are generated.
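
Putting these tips together, a list-generation script might look something like the sketch below. The configuration names, their values, the shortened condition list, and the output file name are all assumptions for illustration.

```python
import csv
import os
import random
import tempfile

# ---- configuration: everything that shapes the lists lives here ----
# (all names and values below are illustrative assumptions)
SEED = 1234
NUM_REPS = 2
NUM_BLOCKS = 3
OUT_FILE = os.path.join(tempfile.mkdtemp(), 'flanker_lists.csv')
CONDS = [{'condition': 'congruent', 'direction': 'left',
          'stimulus': '<<<<<<<'},
         {'condition': 'incongruent', 'direction': 'left',
          'stimulus': '>>><>>>'}]

# ---- list generation ----
rng = random.Random(SEED)   # seeded so the same lists can be regenerated
rows = []
for block in range(NUM_BLOCKS):
    # copy each dict so every trial can carry its own extra information
    trials = [dict(c) for _ in range(NUM_REPS) for c in CONDS]
    rng.shuffle(trials)
    for trial_num, trial in enumerate(trials):
        trial['block'] = block
        trial['trial_num'] = trial_num
        rows.append(trial)

# ---- save the lists so they can be inspected or reloaded later ----
fieldnames = ['block', 'trial_num', 'condition', 'direction', 'stimulus']
with open(OUT_FILE, 'w', newline='') as f:
    dw = csv.DictWriter(f, fieldnames=fieldnames)
    dw.writeheader()
    dw.writerows(rows)

print(len(rows))   # 12
```

Changing the design now means editing only the configuration block, and each saved row records its block and trial number for later analyses.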

## First bigger project!

- We're going to be generating lists for an experiment we'll run in class.
- We'll work on this for the rest of class, though the lists are not due until next week.


### See you next week!!!