# Data Science in Psychology and Neuroscience

## Class info:
* Week #1
* Day: January 22, 2026
* Time: 9:30–10:45 AM
* Location: Logan Hall 125
* <a href="https://forms.microsoft.com/r/26vAcJWrwH">Click here to submit your attendance for Week 1, Question 2!</a>
  
## Instructor info:
* Dr. Jeremy Hogeveen
* jhogeveen@unm.edu
* Logan Hall 281 (Office Hours By Appointment)
  
## Syllabus:
* <a href="https://www.dropbox.com/scl/fi/6fs6fi4kvkwtxn7j7x8ua/PSY450650_DSPN_Spring2026_Syllabus.pdf?rlkey=148e5t4ah8q2n1daclt7mp0h6&dl=1">Download here</a>

## Today's topics:
1. Making a PB&J sandwich
2. Coding Style
3. File & Data Management
4. AI in data science discussion

# Section 1: Think, Pair, & Share: Instructions for a PB&J

<img src="img/pbj/pbandj_imgs.png" width=800>

* Task: Write clear instructions on how to make a delicious PB&J sandwich.

In [1]:
from IPython.display import YouTubeVideo
# Embed the video, starting 45 seconds in (extra keyword arguments become URL parameters)
YouTubeVideo("cDA3_5982h8", start=45, width=720, height=480)

## 1.1 Lesson: Computers are dumb and unthinking; the instructions you feed them are important!

### 1.1a Imperative Instructions
* For example:
  * Identify a clean work surface or counter.
  * Collect 1 cutting board, 1 butter knife, 1 jar peanut butter, 1 jar jam, and 1 bag of bread; place each item on the work surface side by side.
  * Open the bag of bread and grab two slices.
  * Place the two bread slices flat, side by side, on the cutting board.
  * Unscrew the lid of the peanut butter jar.
  * Pick up the butter knife by its handle with your right hand.
  * Insert the blade of the butter knife into the peanut butter jar, pushing it beneath the surface of the peanut butter.
  * Scoop out approximately 2 tbsp of peanut butter with the butter knife.
  * Spread the peanut butter evenly on the flat side of one slice of bread, covering its surface area.
  * Scrape off any excess peanut butter on the rim of the PB jar.
  * Close the peanut butter jar lid.
  * ...
  * Place the slice of bread with peanut butter on top of the slice with jam, ensuring the edges are aligned and the peanut butter covered surface is touching the jam covered surface.

#### Step-by-step, explicit instructions.
  * Pros: Good control & clarity. Efficient for simple tasks. Intuitive.
  * Cons: Repetitive. Error-prone. Limited re-usability. Difficult to scale.
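As a rough illustration (a hypothetical Python sketch, not code from the original exercise), the imperative style is a fixed sequence of commands executed in exactly the order written. Notice that the jam steps are near-copies of the peanut butter steps: swapping jam for honey means hand-editing several lines, which is where the "repetitive" and "error-prone" cons come from.

```python
# Hypothetical imperative PB&J: every action is spelled out, in a fixed order.
steps = []
steps.append("place slices A and B on the cutting board")
steps.append("open peanut butter jar")
steps.append("scoop peanut butter with knife")
steps.append("spread peanut butter on slice A")
steps.append("close peanut butter jar")
steps.append("open jam jar")              # these four lines repeat the
steps.append("scoop jam with knife")      # peanut butter block almost word
steps.append("spread jam on slice B")     # for word, so the two copies can
steps.append("close jam jar")             # easily drift out of sync
steps.append("place slice A on slice B, spread sides touching")

# Execute (here: print) the steps strictly in the order they were written.
for number, step in enumerate(steps, start=1):
    print(f"{number:2d}. {step}")
```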

### 1.1b An Alternative: Declarative Instructions (via <a href="https://bsky.app/profile/matthiasnau.bsky.social/post/3m4asjr2it22h">Matthias Nau</a>)

<img src="img/pbj/declarative_pbj.png" width=800>

<img src="img/pbj/declarative_pbj_graph.png" width=800>

* A list of outcomes to be achieved and the relationships between them.
* The outcomes do not necessarily need to be achieved in a specific order.
* Modular, and can be flexibly recombined in new recipes!
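One way to express the declarative idea in Python is a dependency graph: each outcome lists the outcomes it requires, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) works out a valid order for us. The outcome names below are hypothetical and only loosely follow the figures above; this is a sketch of the concept, not a reproduction of Matthias Nau's graph.

```python
from graphlib import TopologicalSorter

# Hypothetical outcomes, each mapped to the set of outcomes it depends on.
pbj_recipe = {
    "bread_out": set(),
    "pb_slice": {"bread_out"},               # a slice spread with peanut butter
    "jam_slice": {"bread_out"},              # a slice spread with jam
    "sandwich": {"pb_slice", "jam_slice"},
}

# We never state the order; the sorter derives one that respects every dependency.
# pb_slice and jam_slice may come out in either order.
pbj_order = list(TopologicalSorter(pbj_recipe).static_order())
print(pbj_order)

# Modular: recombine the same pieces into a new recipe by swapping one outcome.
honey_recipe = {k: v for k, v in pbj_recipe.items() if k != "jam_slice"}
honey_recipe["honey_slice"] = {"bread_out"}
honey_recipe["sandwich"] = {"pb_slice", "honey_slice"}
honey_order = list(TopologicalSorter(honey_recipe).static_order())
print(honey_order)
```

The design choice here is that the recipe is data, not a script: adding or swapping an outcome means editing one dictionary entry, and the ordering is recomputed automatically.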

## 1.2 Top-Down vs. Bottom-Up Programming

### Design your code "Top-Down"
* Begin by thinking about the problem / goal you want to resolve.
* Break the desired end state down into progressively simpler parts.
* The first step of programming isn't coding, but thinking about what to code and how / why!
### Implement your code "Bottom-Up"
* Once you have a design in mind, begin to solve each step.
* Start at the lowest level of the graph and work up to resolving the main goal.
* This gives you a chance to <b>test</b> at each step, which increases the odds of getting things right the first time!
### What do you mean by "test"?
* Line-by-line: check whether the variables or objects you create look the way you expect them to.
* Section-by-section: institute <b>positive controls</b>, ground truths where you know the answer and can confirm your code produces the correct outcome.
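A minimal Python sketch of these three ideas, using hypothetical function names that follow the `f_*` prefix convention from Section 2. The goal function is written first (top-down). The lowest-level helper is implemented and checked first (bottom-up). `assert` statements serve as positive controls with answers we already know. The accuracy task itself is an illustrative assumption.

```python
# Top-down design: write the main goal first, in terms of smaller steps.
def f_accuracy_by_block(blocks):
    """Main goal: proportion correct in each block of (responses, answers) pairs."""
    return [f_accuracy(responses, answers) for responses, answers in blocks]

# Bottom-up implementation: build the lowest-level step and test it first.
def f_accuracy(responses, answers):
    """Proportion of responses that match the correct answers."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must be the same length")
    n_correct = sum(r == a for r, a in zip(responses, answers))
    return n_correct / len(answers)

# Line-by-line check: does the intermediate result look the way we expect?
print(f_accuracy(["L", "R"], ["L", "L"]))   # 0.5

# Positive controls: inputs where we already know the right answer.
assert f_accuracy(["L", "R", "L", "L"], ["L", "R", "R", "L"]) == 0.75
assert f_accuracy(["L", "L"], ["L", "L"]) == 1.0
assert f_accuracy(["R", "R"], ["L", "L"]) == 0.0

# Only then move up a level and check the main goal.
blocks = [(["L", "R"], ["L", "R"]), (["L", "R"], ["R", "L"])]
print(f_accuracy_by_block(blocks))          # [1.0, 0.0]
```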

### Note: You are not expected to be excellent declarative coders who use top-down design and bottom-up implementation right away. Just keep these principles in mind as you develop your skills!

# Section 2: Coding Style

<img src="img/readable_code.png">

## 2.1 What is "coding style"?
* How YOUR code looks
    * Analogous to formatting preferences in Word docs
    * Differs person-to-person

* Things to consider
    1. Comments
        * Be succinct, but too much commenting is often better than not enough.

In [6]:
# Here is a comment describing the next chunk of code
x = 10.50
print(x)

10.5


In [10]:
"""
Sometimes, we have lots to say.
these quotations allow us to have multi-line comment
"""
x = "testing something"
print(x)

testing something
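For comparison, PEP 8 (Python's style guide) writes block comments with `#` at the start of every line, and reserves triple-quoted strings for docstrings: a string placed as the first statement inside a function, which Python keeps and shows through `help()`. A short sketch (the function name is hypothetical):

```python
# Block comment, PEP 8 style:
# every line starts with "#", and Python skips it entirely.

def f_mean_rt(rts):
    """Return the mean of a list of reaction times (ms).

    A triple-quoted string placed first in a function is its docstring.
    Unlike a comment, Python stores it, so help(f_mean_rt) can show it.
    """
    return sum(rts) / len(rts)

print(f_mean_rt.__doc__.splitlines()[0])   # Return the mean of a list of reaction times (ms).
print(f_mean_rt([300, 400, 500]))          # 400.0
```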


2. Indentation
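    * Tabs or spaces? PEP 8 (Python's style guide) recommends 4 spaces per indentation level; whichever you choose, never mix the two within a block.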

In [11]:
x = 10
if x % 2 == 0:
	print('this is a tab indent')

this is a tab indent


In [12]:
x = 10
if x % 2 == 0:
    print('this is a four space indent')

this is a four space indent
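Python accepts any indentation that is consistent within a block, but it refuses code that mixes tabs and spaces ambiguously. The sketch below uses the built-in `compile()`, which parses source code without running it, to show the difference. The helper name `f_check_indentation` is hypothetical.

```python
def f_check_indentation(source):
    """Try to compile a snippet; return the error type name, or 'ok' if it parses."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:   # TabError and IndentationError are subclasses
        return type(err).__name__
    return "ok"

spaces_only = "if True:\n    x = 1\n    y = 2\n"
mixed = "if True:\n\tx = 1\n        y = 2\n"   # a tab on one line, 8 spaces on the next

print(f_check_indentation(spaces_only))   # ok
print(f_check_indentation(mixed))         # TabError
```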


3. Naming things
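    * snake_case or camelCase? PEP 8 recommends snake_case for Python variable and function names. (kebab-case is fine for file names, but not for Python variables, because `-` means subtraction.)
    * Use a common logic for identifying variables, data frames, plots, models, etc.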

In [13]:
reactionTime = "RT in camelCase"
reactionTime

'RT in camelCase'

In [15]:
reaction_time = "RT in snake_case"
reaction_time

'RT in snake_case'
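A quick way to check which naming styles are legal Python variable names is the built-in `str.isidentifier()` method:

```python
# Which naming styles produce legal Python variable names?
styles = {
    "snake_case": "reaction_time",
    "camelCase": "reactionTime",
    "kebab-case": "reaction-time",   # '-' is the subtraction operator, so this fails
}
validity = {style: name.isidentifier() for style, name in styles.items()}
for style, ok in validity.items():
    print(f"{style:10s} {styles[style]:14s} valid Python name: {ok}")
```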

In [19]:
name = "simple variable names can just state what they are."
print(name)
f_name = "I use 'f_*' prefix to denote functions I have created"
print(f_name)
df_name = "I use 'df_*' prefix to denote data frames"
print(df_name)
sum_name = "I use 'sum_*' prefix to denote descriptive stats / summary tables"
print(sum_name)
p_name = "I use 'p_*' prefix to denote graphics and figure objects"
print(p_name)
m_name = "I use 'm_*' prefix to denote results from inferential models"
print(m_name)

simple variable names can just state what they are.
I use 'f_*' prefix to denote functions I have created
I use 'df_*' prefix to denote data frames
I use 'sum_*' prefix to denote descriptive stats / summary tables
I use 'p_*' prefix to denote graphics and figure objects
I use 'm_*' prefix to denote results from inferential models
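Here is a minimal sketch of the prefix convention applied to a tiny, made-up reaction-time data set. Assumption: in a real analysis `df_trials` would be a pandas DataFrame; a list of dictionaries stands in here so the example runs with only the standard library. Plot (`p_*`) and model (`m_*`) objects would follow the same pattern.

```python
import statistics

# Made-up trial data (df_*): each dict is one trial.
df_trials = [
    {"condition": "congruent", "rt": 400},
    {"condition": "congruent", "rt": 420},
    {"condition": "incongruent", "rt": 480},
    {"condition": "incongruent", "rt": 520},
]

# A function we wrote ourselves (f_*).
def f_mean_rt(trials, condition):
    """Mean RT (ms) across the trials in one condition."""
    return statistics.mean(t["rt"] for t in trials if t["condition"] == condition)

# A descriptive summary table (sum_*): one mean per condition.
sum_rt = {cond: f_mean_rt(df_trials, cond) for cond in ("congruent", "incongruent")}
print(sum_rt)
```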


4. Be consistent!

### Overarching principle for coding style: Code should be written to minimize the time it would take for someone else, including __future you__, to understand it.

# Section 3: File & Data Management

## 3.1. The "Container" Rule (Project Organization).
* One project (or class) = one folder
  * I often call this the "parent directory" or "base directory"
  * If you compressed the folder into a Zip file and I opened it on the other end, could I run the code without asking you for a missing file?
* Start having a "standard" project organization
  * For me: {base dir} --> {code, plots, data, readme} --> {subfolders may vary...}
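A layout like this can be created with Python's standard-library `pathlib`. This is a sketch under stated assumptions: the subfolder names (including `data/raw/` and `data/derivatives/`, borrowed from Section 3.3) are one reasonable choice, and `f_make_project` is a hypothetical helper. The demo builds the skeleton in a throwaway temporary folder.

```python
import tempfile
from pathlib import Path

def f_make_project(base_dir):
    """Create a standard project skeleton inside base_dir and return its Path."""
    base = Path(base_dir)
    for sub in ["code", "plots", "data/raw", "data/derivatives"]:
        (base / sub).mkdir(parents=True, exist_ok=True)   # safe to re-run
    readme = base / "README.md"
    if not readme.exists():                               # never clobber an existing README
        readme.write_text("# Project title\n\nWhat this project does and how to run it.\n")
    return base

# Demo in a temporary folder; point it at a real path for your own project.
with tempfile.TemporaryDirectory() as tmp:
    base = f_make_project(Path(tmp) / "dspn_project")
    created = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))
    print(created)
```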

## 3.2. Code Management & Paths.
* Relative vs. absolute paths: The leading cause of "the code broke" experiences!
  * Bad: <i>data = load('/Users/JeremyHogeveen/Documents/DSPN/data/raw/datafile.csv')</i>
  * Good: <i>data = load('../data/raw/datafile.csv')</i>
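    * If I sent this code to someone else, they wouldn't have a "JeremyHogeveen" username. They may even be running a different operating system with a different folder tree structure. A relative path is resolved from the current working directory (in Jupyter, usually the folder containing the notebook), so it works on any machine with the same project layout. Relative filepaths are our only hope of writing truly <i>shareable</i> code.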
* No spaces in directory or file names!
  * Use <i>snake_case</i>, <i>kebab-case</i>, or <i>camelCase</i>.
  * Bad: <i>DSPN Final Project.ipynb</i>
  * Good: <i>DSPN_Final_Project.ipynb</i>
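A sketch of both rules with `pathlib`. Building the relative path from parts keeps it portable across macOS, Linux, and Windows, and `resolve()` shows where Python will actually look on the current machine. `f_has_space` is a hypothetical helper for catching spaces in folder or file names. (`pandas.read_csv` accepts a `Path` object directly.)

```python
from pathlib import Path

# Relative path, built from parts so it works on any operating system.
rel_path = Path("..") / "data" / "raw" / "datafile.csv"
print(rel_path.is_absolute())        # False
print(Path.cwd())                    # the folder the relative path starts from
print(rel_path.resolve())            # where Python will actually look on this machine

def f_has_space(path):
    """Return True if any folder or file name in the path contains a space."""
    return any(" " in part for part in Path(path).parts)

print(f_has_space("DSPN Final Project.ipynb"))   # True
print(f_has_space("DSPN_Final_Project.ipynb"))   # False
```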

## 3.3. Data Hygiene.
* The raw data is sacred: either make it "read-only" or pretend it is.
  * Any edits that need to be made to the raw data should be done programmatically.
  * E.g., the bad URSI example.
* Code should read data from <i>raw/</i> or <i>in/</i>, and save output files to something like <i>derivatives/</i> or <i>out/</i>.
  * It should never overwrite the raw input data.
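A minimal sketch of this workflow using only the standard library. The raw file is made read-only, a correction is applied in code, and the result is written to `derivatives/`. The ID typo (`s1` should be `s01`), the column names, and the helper `f_fix_ids` are all made-up examples.

```python
import csv
import stat
import tempfile
from pathlib import Path

# Hypothetical correction: participant "s1" was mistyped and should be "s01".
ID_FIXES = {"s1": "s01"}

def f_fix_ids(raw_file, out_file, fixes):
    """Read a raw CSV, correct IDs in code, and write the result to a separate file."""
    with open(raw_file, newline="") as f:
        rows = list(csv.DictReader(f))           # the raw file is only ever read
    for row in rows:
        row["id"] = fixes.get(row["id"], row["id"])
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with open(out_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows

with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "data" / "raw" / "datafile.csv"
    raw.parent.mkdir(parents=True)
    raw.write_text("id,rt\ns1,412\ns02,388\n")
    raw.chmod(stat.S_IREAD)                     # make the raw file read-only

    out = Path(tmp) / "data" / "derivatives" / "datafile_clean.csv"
    f_fix_ids(raw, out, ID_FIXES)

    raw_text = raw.read_text()                  # unchanged: still contains "s1"
    clean_text = out.read_text()
    raw.chmod(stat.S_IREAD | stat.S_IWRITE)     # restore so the temp folder can be deleted

print(clean_text)
```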

## 3.4. Naming Conventions.
* Git version control (next week).
* If averse to git, do <i>something</i> to preserve version history!
  * Dates in filename, version naming (v01, v02, etc.), etc.
  * Avoid "*_FINAL.py"—Final is a state of mind, not a sensible file name.

## 3.5. The "Restart & Run All" Test.
* This applies especially to assignments you send in (or code you upload to online repos): close everything out, restart the kernel, and run the code from top to bottom to verify that it all runs in sequence.
* This often catches errors created when you had a package, variable, etc. loaded in memory but not defined in the code itself.
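The failure this test catches is easy to reproduce. The sketch below (hypothetical helper `f_run_all`) treats a notebook as a list of cell sources and runs them top to bottom in a brand-new namespace, roughly what a freshly restarted kernel does. A variable that only existed in the old session's memory now raises a `NameError`. For a real saved notebook, `jupyter nbconvert --to notebook --execute your_notebook.ipynb` runs it from a terminal in a fresh kernel.

```python
def f_run_all(cells):
    """Run cell sources in order in one fresh namespace; report the first failure."""
    namespace = {}                       # empty, like a freshly restarted kernel
    for number, source in enumerate(cells, start=1):
        try:
            exec(source, namespace)
        except Exception as err:
            return f"cell {number} failed with {type(err).__name__}: {err}"
    return "all cells ran"

# Hypothetical notebook: rt_cutoff was defined in a cell that was later deleted,
# so it only survived in the old session's memory.
broken_notebook = [
    "rts = [150, 300, 500]",
    "slow_trials = [rt for rt in rts if rt > rt_cutoff]",
]
fixed_notebook = ["rt_cutoff = 200"] + broken_notebook

print(f_run_all(broken_notebook))   # reports a NameError in cell 2
print(f_run_all(fixed_notebook))    # all cells ran
```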

## 3.6 File management schematic
<img src="img/file_system_hierarchy.png" width=300>

### Note: You will get this for free for this class once we get GitHub set up and are using <i>git clone</i> (first sync) and <i>git pull</i> (subsequent syncs) to synchronize the class GitHub repository to your computer!

# Section 4: AI in Data Science Discussion

<img src="img/elephant_in_the_room.png" width=800>

* QUESTION: If ChatGPT can write code and analyze my data in 10 seconds, what are we even doing here?
  * ANSWER: We are training scientists, not technicians.

## 4.1 The Explore-Exploit Dilemma as an analogy for AI use in science

<img src="img/explore_exploit.png" width=600>

* <b>Exploitation</b>: Sticking with familiar options where you know you will get an immediate reward (i.e., maximizing immediate expected value, IEV).
* <b>Exploration</b>: Trying new things, learning by making mistakes, and gathering NEW information about your environment and the world (i.e., maximizing the information bonus, BONUS).
* <b>Coding entirely through AI is pure exploitation</b>. It will maximize IEV, but you are learning no new information.
* <b>Learning to code involves exploration</b>. It is frustrating: you will have filepath errors and environment bugs and spend hours searching Stack Overflow. BUT, these errors are what drive you to learn new information. By exploring, you are maximizing your BONUS.

### <b><i>If we want to conduct the most rigorous and robust science possible, we need some competence with data science, not just the illusion of competence that AI can give us.</i></b>

## 4.2 Social contract for this semester:

### <mark> I hereby commit to not use AI to generate code, write text, or solve problems for this course. </mark>

* <b>In the meantime</b>: When you get stuck, ask a human (me, a peer, etc.). And learn the art of Googling the error message!

<img src="img/m_baxter.png" width=400>