# Darik's School to Code Good

![](https://drive.google.com/uc?export=view&id=1qcuelOOzTVT-h9lNSKbQHvV4Pe7p0NDu)

Imports & Initialization

In [1]:
%matplotlib inline
import numpy as np

**Why bother?**
- You don't want to struggle running a script named `decode_final_v2` 6 months from now
- You don't want to worry that modifying a function will break everything
- You don't want your colleagues to hate you when they do a code review
- You want to hand someone a script and not have to explain what it does
- You want to be confident in your code, especially when publishing
- You want to be efficient & productive: *good code works*

Today will be split into two parts: **guiding principles** and **modern tooling**. In guiding principles, we'll learn the difference between good code and bad code. In modern tooling, we'll learn how to use some modern software development tools to help us code good.

We'll cover the concepts briskly, but you could easily spend an entire hour-long tutorial on every one of the modern tools I'll present. *I can always meet 1:1 or schedule something in the future.*

# Part I: Guiding Principles
First, we'll look at how to *think* about code so that it's designed to be *used* and not just to *work*  <br>
1. **Keep it Simple, Stupid** <br>
2. **Easy to Change** <br>
3. **Comments aren't Documentation** <br>
4. **Program Defensively** <br>
5. **My Software Sucks** <br>
6. **Scripts Call Code** <br>

## 1. Keep it Simple, Stupid

![](https://drive.google.com/uc?export=view&id=1m56YB8_w0JHK4gyQgaH5Jk-2PQ8-cILp)

Imagine we want to find which neuron in our calcium imaging dataset has the most activations, and our labmate has kindly offered us some code they've written.

In [4]:
def fmp(x, y, t, distance):
    # Finds neuron with most peaks. x is timestamps, y is data, and t is threshold.
    s = len(x)
    thr = t * np.std(y, axis=0)
    a = []
    for n in range(y.shape[1]):
        i = 0
        one = 0
        pk = thr[n]
        while i < s:
            if y[i, n] > pk:
                one += 1
                i += distance
            else:
                i += 1
        a.append(one)
    return a.index(max(a))

#### *"Code is read much more often than it is written"*

Write your code for humans, not the machine!

##### Master the Art of Naming

**DON'T**:
- Use single letter names outside of coordinate systems
  - `x`, `y`, and `z` are fine for Cartesian coordinates
  - `m` is not a good name for "mouse"
  - `neuron` is better than `n` when iterating
- Use acronyms as shorthand unless they are common vernacular
  - `zs` is not a good name for a z-scored value. Use `z_score`
  - `glm` is fine for generalized linear model
- Use ambiguous booleans
    - Does the boolean `check` mean the data *was* checked or does it mean it *needs* to be checked?

**BEST**:
- Names should indicate purpose.
    - `ImagingMetadata` is a better name than `information`, `experiment`, or `Records`
- Don't lie
    - If you name a function `calc_baseline` it should **only** calculate the baseline
- Pay attention to context and make meaningful distinctions 
    - `neural_activity` is a better name than `neurons`
- Being consistent reduces cognitive load
    - Always use `num_trials` instead of mixing `num_trials`, `n_trials`, and `ntrials`
- Follow language naming conventions to reduce cognitive load
    - **Python** is expected to adhere to PEP 8
    - **Rust** is expected to follow RFC 430
    - **MathWorks** just introduced a style guide in 2025
    - **R**'s style guide is Tidyverse (or Google's derivative)
    - The **C++** community can't agree on literally anything

Recall...

In [None]:
def fmp(x, y, t, distance):
    # Finds neuron with most peaks. x is timestamps, y is data, and t is threshold.
    s = len(x)
    thr = t * np.std(y, axis=0)
    a = []
    for n in range(y.shape[1]):
        i = 0
        one = 0
        pk = thr[n]
        while i < s:
            if y[i, n] > pk:
                one += 1
                i += distance
            else:
                i += 1
        a.append(one)
    return a.index(max(a))

How might our labmate have labeled their function better?

In [None]:
def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    num_samples, num_neurons = fluorescence.shape
    thresholds = num_std_dev * np.std(fluorescence, axis=0)
    events_per_neuron = []
    for neuron_index in range(num_neurons):
        sample_index = 0
        num_events = 0
        peak_threshold = thresholds[neuron_index]
        while sample_index < num_samples:
            if fluorescence[sample_index, neuron_index] > peak_threshold:
                num_events += 1
                sample_index += min_separation
            else:
                sample_index += 1
        events_per_neuron.append(num_events)
    return events_per_neuron.index(max(events_per_neuron))

**LOOKING FORWARD** <br>
- In the IDE section, we'll learn how to use "autocomplete" & "refactor" tools to avoid typing long names. Avoid the urge to prefer expediency over precision! <br>

##### The Art of Readability
- Always prioritize readability <br>
- Always prioritize readability <br>
- Premature optimization is the root of all evil (so always prioritize readability). <br>
- Once your code is working, IF and ONLY IF you have a demonstrated performance problem, consider optimizing. <br>
- Always profile your code to determine WHAT to optimize. Only then will you optimize! <br>
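
As a minimal sketch of the profiling step, Python's built-in `cProfile` and `pstats` modules can report where the time actually goes. The functions `slow_zscore` and `analysis` are hypothetical stand-ins invented for this illustration, not part of any real pipeline.

```python
import cProfile
import io
import pstats

import numpy as np


def slow_zscore(trace):
    # Hypothetical and deliberately slow: recomputes mean and std for every sample
    return [(value - np.mean(trace)) / np.std(trace) for value in trace]


def analysis(num_samples=2000):
    # Hypothetical analysis step: z-score a random trace
    rng = np.random.default_rng(0)
    trace = rng.normal(size=num_samples)
    return slow_zscore(trace)


# Profile only the call we care about
profiler = cProfile.Profile()
profiler.enable()
z_scores = analysis()
profiler.disable()

# Sort by cumulative time and show the ten most expensive calls
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The report should point at `slow_zscore` and the repeated `np.mean`/`np.std` calls inside it, which tells us WHAT to optimize before we touch anything else.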

##### The Art of Humility
- It is better to be clear than clever.

Kernighan's Law:
1. Debugging is twice as hard as writing code in the first place.
2. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Imagine we have a friend who's a theorist and he sends us this code to solve the classic "FizzBuzz" problem: <br>
Print each number n from 1 to N, EXCEPT: <br>
- print "fizz" if it's a multiple of 3 <br>
- print "buzz" if it's a multiple of 5 <br>
- print "fizzbuzz" if it's a multiple of both <br>

In [9]:
(lambda f: f(f, 1))(lambda f, n: None if n > 10 else print((not n % 3 and "fizz" or "") + (not n % 5 and "buzz" or "") or n) or f(f, n+1))

1
2
fizz
4
buzz
fizz
7
8
fizz
buzz


As we can see, wow he's very smart! He's also very stupid for playing code golf. He could've just written...

In [10]:
for n in range(1, 11):
    output = ""
    if n % 3 == 0:
        output += "fizz"
    if n % 5 == 0:
        output += "buzz"
    print(output or n)

1
2
fizz
4
buzz
fizz
7
8
fizz
buzz


Cyclomatic complexity ~ **number of independent paths** through your code.

High complexity:
- Many `if`/`elif` branches
- Deeply nested loops / conditionals
- Harder to test and reason about

It's better to have more functions than more paths:
It's harder to test all six paths of one function than it is to test six different functions.


Rule of thumb: keep functions simple enough that you can hold the logic in your head.
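
To make "number of independent paths" concrete, here is a rough sketch that estimates cyclomatic complexity by counting decision points with the standard-library `ast` module. The function `estimate_complexity` and the set of node types it counts are assumptions made for illustration; dedicated tools apply more careful rules.

```python
import ast
import textwrap

# Node types treated as decision points in this simplified estimate
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def estimate_complexity(source):
    """Rough cyclomatic complexity: one plus the number of decision points."""
    tree = ast.parse(textwrap.dedent(source))
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths
            complexity += len(node.values) - 1
    return complexity


fizzbuzz_source = """
for n in range(1, 11):
    output = ""
    if n % 3 == 0:
        output += "fizz"
    if n % 5 == 0:
        output += "buzz"
    print(output or n)
"""

print(estimate_complexity(fizzbuzz_source))
```

For the readable FizzBuzz loop this prints 5: the baseline of one, plus one each for the `for`, the two `if` statements, and the `or`.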

## 2. Easy to Change
It's unlikely that your first iteration of your code will be the final version. <br>
By making your code *modular*, you make it 
- easier to change
- easier to reuse
- easier to test
- and easier to read.

Imagine we are TAing a course at UCSD and we write a function to automatically grade exams:

In [None]:
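def process_grades(file):
    # Assumes each line of the file is "name,score"
    names, scores = [], []
    with open(file) as f:
        for line in f.read().splitlines():
            name, score = line.split(",")
            names.append(name)
            scores.append(int(score))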

    def letter(score: int) -> str:
        if score >= 90: return "A"
        if score >= 80: return "B"
        if score >= 70: return "C"
        if score >= 60: return "D"
        return "F"

    grades = [(name, score, letter(score)) for name, score in zip(names, scores)]

    return grades

In [None]:
todays_grades = process_grades(todays_file)

Mid-semester, UCSD mandates that all courses be graded on a 93% A scale instead of 90%. <br>
While it may be trivial to edit the "letter" portion of this function, in scientific practice such changes can get extremely tedious and complicated. <br>
A more experienced TA might have written their code like this:

In [None]:
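def load_scores(file):
    # Assumes each line of the file is "name,score"
    names, scores = [], []
    with open(file) as f:
        for line in f.read().splitlines():
            name, score = line.split(",")
            names.append(name)
            scores.append(int(score))
    return names, scores

def grade(names, scores, scale):
    # scale maps letter -> minimum score, listed from highest to lowest
    grades = []
    for name, score in zip(names, scores):
        for letter, threshold in scale.items():
            if score >= threshold:
                grades.append((name, score, letter))
                break
        else:
            grades.append((name, score, "F"))  # below every threshold in the scale
    return grades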

In [None]:
(names, scores) = load_scores(todays_file)
grading_scale = {
    "A": 93,
    "B": 87,
    "C": 80,
    "D": 73,
    "E": 67
}
todays_grades = grade(names, scores, grading_scale)

## 3. Comments aren't Documentation

Comments are for clarification of non-obvious steps and other implementation information to a **developer**.
- I did it this way because XYZ.
- What I am doing here is this.
- TODO: We should redesign this.
- BUG: Users report this function causes random crashes.
- OPTIMIZE: I am a bottleneck.

Documentation explains to end users how to **use** the function:
- **What** the function does
- **Arguments** (types, meaning, units if relevant)
- **Returns** (what and in what format)
- Edge cases, assumptions, side effects
- Examples
- Mathematical formulations

**BAD**

In [None]:
def calculate_insurance_premium(age, risk_factor):
    """
    Calculates the annual insurance premium based on age and risk factor.
    """
    # make sure valid
    if risk_factor < 1.0 or risk_factor > 5.0:
        raise ValueError(f"risk_factor must be between 1.0-5.0; risk_factor = {risk_factor}")
    if age < 0:
        raise ValueError(f"age must be at least 0; age = {age}")
    # calculate base premium
    base_premium = 100 * risk_factor
    # apply additional surcharge if less than 25
    if age < 25:
        base_premium *= 1.5
    return base_premium

**GOOD**

In [None]:
def calculate_insurance_premium(age, risk_factor):
    """
    Calculates the annual insurance premium based on age and risk factor.
    
    Premiums for individuals under 25 are subject to an additional surcharge
    due to a higher historical claim rate (specific regulatory requirement).
    
    :param age: The client's age
    :param risk_factor: Assessed risk factor (1.0 to 5.0)
    :raises ValueError: if age is less than 0
    :raises ValueError: if risk_factor is outside 1.0 <= risk_factor <= 5.0
    :return: The final annual insurance premium
    """
    
    if risk_factor < 1.0 or risk_factor > 5.0:
        raise ValueError(f"risk_factor must be between 1.0-5.0; risk_factor = {risk_factor}")
    if age < 0:
        raise ValueError(f"age must be at least 0; age = {age}")
        
    base_premium = 100 * risk_factor
    if age < 25:
        base_premium *= 1.5
    return base_premium
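
Since documentation should include examples, one option (a sketch, not the only way) is to write them as doctests so they can be executed and checked. The `>>>` examples below are additions made for this illustration; the function body is the one from above.

```python
import doctest


def calculate_insurance_premium(age, risk_factor):
    """
    Calculates the annual insurance premium based on age and risk factor.

    Example:

    >>> calculate_insurance_premium(age=40, risk_factor=2.0)
    200.0
    >>> calculate_insurance_premium(age=20, risk_factor=2.0)
    300.0
    """
    if risk_factor < 1.0 or risk_factor > 5.0:
        raise ValueError(f"risk_factor must be between 1.0-5.0; risk_factor = {risk_factor}")
    if age < 0:
        raise ValueError(f"age must be at least 0; age = {age}")

    base_premium = 100 * risk_factor
    if age < 25:
        base_premium *= 1.5
    return base_premium


# Runs the examples in the docstring; prints nothing if they all pass
doctest.run_docstring_examples(
    calculate_insurance_premium,
    {"calculate_insurance_premium": calculate_insurance_premium},
    name="calculate_insurance_premium",
)
```

If the function's behavior ever drifts from what the docstring promises, the doctest run reports the mismatch.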

## 4. Program Defensively

Your code may be called with incorrectly formatted data, non-existent paths, and other nonsense. <br>
Good code anticipates how users might make mistakes and protects them from themselves. <br>

- Strive to catch errors early and gracefully. A function that fails immediately makes you sigh; a function that fails after 37 hours is traumatizing.
- Identify and be proactive about "silent" mistakes. Alert users if their requests are incoherent.

**Errors are announced, not corrected.** <br> 
Do not forget to protect your code from the users...<br>
Do not chase the wind to make your code idiotproof by correcting input; <br>
the universe will ALWAYS design a better idiot <br>
...and excessive boilerplate will only make the relevant software harder to maintain.
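
A minimal sketch of "announced, not corrected", assuming the samples-by-neurons fluorescence array from Part I. The helper `validate_fluorescence` is hypothetical; the point is that it checks inputs up front and raises an informative error rather than guessing what the user meant.

```python
import numpy as np


def validate_fluorescence(fluorescence, min_separation):
    """Raise an informative error early instead of silently fixing bad input."""
    if not isinstance(fluorescence, np.ndarray):
        raise TypeError(
            f"fluorescence must be a numpy array; got {type(fluorescence).__name__}"
        )
    if fluorescence.ndim != 2:
        raise ValueError(
            f"fluorescence must be 2D (samples x neurons); got {fluorescence.ndim}D"
        )
    if np.isnan(fluorescence).any():
        # Announce the problem; do not guess how the user wants NaNs handled
        raise ValueError("fluorescence contains NaN values")
    if not isinstance(min_separation, (int, np.integer)) or min_separation < 1:
        raise ValueError(
            f"min_separation must be a positive integer; got {min_separation}"
        )


# Valid input passes silently
validate_fluorescence(np.zeros((100, 5)), min_separation=10)

# Invalid input fails immediately with a message that says what went wrong
try:
    validate_fluorescence(np.zeros(100), min_separation=10)
except ValueError as error:
    print(error)
```

Calling a check like this at the top of a long-running analysis means a malformed input fails in the first second instead of hour 37.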

## 5. My Software Sucks

Good programmers assume their code has bugs until proven otherwise. <br>
- A common development strategy in software is to first write a test that your finished code should pass.
- Then write the code to pass the test.

In scientific analysis, this is not always straightforward. In practice:
- we can use simulated or real datasets with known properties (e.g., a decoder should be capable of decoding a simulated dataset).
- we can write "implementation" tests that simply ensure functions *run* without crashing/errors
- we can write tests about bugs we identify and ensure they are "stamped out"
- we can write tests to protect against changes breaking the code

The more modular your code, the easier it is to test and the more useful tests become... especially in scientific programming! It's quite easy to test whether all the individual steps of a motion correction pipeline operate as intended, but it's much more difficult to make sense of a motion correction pipeline failing to correct a video.
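
As a sketch of the strategies listed above, the tests below run `find_neuron_most_peaks` from Part I on simulated data with a known answer. The helper `simulate_fluorescence` and both test functions are hypothetical, written for this illustration; a test runner such as pytest could collect the `test_` functions, but here they are simply called directly.

```python
import numpy as np


def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    # Copied from Part I so that this example is self-contained
    num_samples, num_neurons = fluorescence.shape
    thresholds = num_std_dev * np.std(fluorescence, axis=0)
    events_per_neuron = []
    for neuron_index in range(num_neurons):
        sample_index = 0
        num_events = 0
        peak_threshold = thresholds[neuron_index]
        while sample_index < num_samples:
            if fluorescence[sample_index, neuron_index] > peak_threshold:
                num_events += 1
                sample_index += min_separation
            else:
                sample_index += 1
        events_per_neuron.append(num_events)
    return events_per_neuron.index(max(events_per_neuron))


def simulate_fluorescence(events_per_neuron, num_samples=1000):
    # Hypothetical helper: flat baseline with isolated single-sample events
    fluorescence = np.zeros((num_samples, len(events_per_neuron)))
    for neuron_index, num_events in enumerate(events_per_neuron):
        event_times = 100 * np.arange(1, num_events + 1)
        fluorescence[event_times, neuron_index] = 10.0
    return fluorescence


def test_known_answer():
    # Simulated dataset with known properties: neuron 1 has the most events
    fluorescence = simulate_fluorescence(events_per_neuron=[2, 5, 1])
    assert find_neuron_most_peaks(fluorescence, num_std_dev=3, min_separation=10) == 1


def test_wide_event_counted_once():
    # Guards against one long event being counted as several events
    fluorescence = np.zeros((1000, 2))
    fluorescence[100:105, 0] = 10.0  # one event spanning five samples
    fluorescence[[100, 300], 1] = 10.0  # two separate events
    assert find_neuron_most_peaks(fluorescence, num_std_dev=3, min_separation=10) == 1


test_known_answer()
test_wide_event_counted_once()
print("all tests passed")
```

Because the function is small and takes plain arrays, the simulated inputs are only a few lines long, which is the payoff of modularity described above.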

## 6. Scripts Call Code

This one is pretty straightforward. Scripts should be calling functions, not defining functions. <br>
When scripts contain code, that code is hard to reuse, hard to debug (run the whole script? copy-paste?) and hard to verify. <br>

For example, Jupyter notebooks do not guarantee that the cells are executed in order. Therefore, it is impossible to determine whether the results shown in a Jupyter notebook were produced by the code currently written in it. Even if the cells were executed in order, it's possible for concurrently running notebooks to inadvertently interfere with each other through shared state such as files on disk (e.g., you have two different notebooks open).

Instead, use a Jupyter notebook to *showcase* analysis. Import required modules and functions, and write the notebook like a guide to the data:
- The data is loaded
- This function is called
- I plot the result
- I print the stats
- I save to file.

# Part II: Modern Tooling

1. **Integrated Development Environments**
2. **Virtual Environments & Dependency Management**
3. **Linting**
4. **Formatting**
5. **Testing & Code-Coverage**
6. **Validation**
7. **Navigable Documentation**

## 1. Integrated Development Environments

![](https://intellipaat.com/blog/wp-content/uploads/2025/12/What-is-Pycharm.webp)

## 2. Virtual Environments & Dependency Management

![](https://substackcdn.com/image/fetch/$s_!g3Si!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90d967c-3153-4233-ae1d-eb75bdfdc5f1_3005x1573.png)

![](https://substackcdn.com/image/fetch/$s_!FB0V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3176fef-ba14-4d0b-b3a8-de5b70a34945_3639x1512.png)