modeling/todo.qmd

---
title: "Project Scratchpad"
author:
  - name: Matthew J. C. Crump
    orcid: 0000-0002-5612-0090
    email: mcrump@brooklyn.cuny.edu
    affiliations:
      - name: Brooklyn College of CUNY
abstract: "Scratchpad for thinking through the project."
date: 6-27-2023
title-block-style: default
order: 1
---

I've already started on the work in a blog post: <https://crumplab.com/blog/771_GPT_Stroop/>

Need to briefly list what I've done already.

Then, come up with some goals for moving the project forward.

## Blog post summary

- Briefly described the Stroop task
- Develops some motivation for why I care whether or not LLMs can simulate performance in a Stroop-task
  - data-spoofing, such as mturk workers using LLMs to exploit online tasks.
  
- Describe methods
  - use the openai API and R to send instructions for a Stroop task, and then individual trials (in text format)
  - have the model simulate trial-by-trial responses and reaction times, return them in JSON
  - analyse the data and see what happens
  
- The post shows some draft code to simulate a single subject, and to simulate multiple subjects

- Got data from 10 simulated subjects

- Results showed:
  - gpt-3.5-turbo generated data files that showed Stroop effects
  - Simulated RTs were different across simulated subjects
  - Accuracy was 100%
  - RTs looked almost credible. Many individual RTs had 0 ending, and were too round looking.

## Modelling To do

Not an exhaustive list, something to get me started.

- [x] Settle on one script that can be extended across the examples. 
- [x] Run multiple subjects, say batches of 20-30, which should be enough for the kinds of questions I want to ask

Answer the following basic questions

- [x] Does the model produce Stroop effects in RT and accuracy?
- [x] Does the model produce different answers for each run of simulated subjects?
- [x] Does the model produce RTs that look like human subject RTs?
- [x] Does the model produce additional Stroop phenomena without further prompting?
  - [x] Congruency sequence effect?
  - [x] Proportion Congruent effect?
- [x] Can the instruction prompt be used to control how the model simulates performance.
  - [x] simulate 75% correct
  - [] simulate 50% correct
  - [x] simulate some long reaction times
  - [x] simulate a reverse Stroop effect.
  - [] simulate proportion congruent effects
  
- [] simulate some other tasks

- [x] Flanker task
- [x] Simon Task
- [] SRT (Nissen & Bullemer)
- [] Negative Priming
- [] Inhibition of Return
- [] SNARC effect

## Writing to do

- [x] interim overview

- [] consider in more detail what I'm trying to accomplish here and whether this project turns into something more formal than a shared repo on github

- [] Things that make me go hmmm.

  - the modeling work is not reproducible
  - openAI says the new default is that data sent through the API will not be used to train models, but this is an opt-in option
  - I did not opt in, but if someone else did and re-ran this kind of code with feedback to alter the model resposnes, then the results from this kind of work would change
  - Unclear what happens with newer models like GPT 4