# Darik's School to Code Good

![](https://github.com/darikoneil/dariks_school_to_code_good/blob/main/assests/LOGO_SMALL.png)

***
***

## **Why bother?**
***
- You don't want to struggle running a script named `decode_final_v2` 6 months from now
- You don't want to worry that modifying a function will break everything
- You don't want your colleagues to hate you when they do a code review
- You want to hand someone a script and not have to explain what it does
- You want to be confident in your code, especially when publishing
- You want to be efficient & productive: *good code works*

## Outline
***

Today's session is split into two parts: **guiding principles** and **modern tooling**.
- In guiding principles, we'll learn the difference between good code and bad code.
- In modern tooling, we'll learn how to use some modern software development tools to help us "code good".

### We don't actually have to follow the outline
We'll move through the concepts briskly, but you could easily spend an entire hour-long tutorial on every one of the modern tools I'll present. *I can always meet 1:1 or schedule something in the future.*

### I am the train kid from Polar Express
![](https://pbs.twimg.com/media/Eo4n1E6XUAI_YnN.jpg)

# Part I: Guiding Principles
***
***
First, we'll look at how to *think* about code so that it's designed to be *used* and not just to *work.*  <br>
Getting code that works is a **prerequisite** for any software, not an endpoint in itself. <br>
We'll highlight these six guiding principles to ensure that we always *code good*:


### Darik's Creed for the Scientist who Codes Good
<div class="alert alert-block alert-success">
1. <b>Keep it Simple, Stupid</b><br>&emsp;Prioritize readability<br>
2. <b>Be Easy to Change</b><br>&emsp;Always build with legos instead of blobs<br>
3. <b>Comments aren't Documentation</b><br>&emsp;Comments guide developers, Documentation guides users<br>
4. <b>Program Defensively</b><br>&emsp;Protect users from themselves<br>
5. <b>My Software Sucks</b><br>&emsp;It doesn't work until proven otherwise<br>
6. <b>Scripts Call Code</b><br>&emsp;Separate execution from implementation<br>
</div>

## 1. Keep it Simple, Stupid
***

![](https://github.com/darikoneil/dariks_school_to_code_good/blob/main/assests/KISS_410_410.jpg)

#### *"Code is read much more often than it is written"*
Always write your code for humans, not for machines!

<div class="alert alert-block alert-warning">
<b>EXAMPLE</b>
</div>

Imagine we want to find which neuron in our calcium imaging dataset has the most activations, and our labmate has kindly offered us some code they've written.

```python
import numpy as np


def fmp(x, y, t):
    # Finds neuron with most peaks. x is timestamps, y is data, and t is threshold.
    s = len(x)
    thr = t * np.std(y, axis=0)
    a = []
    for n in range(y.shape[1]):
        i = 0
        one = 0
        pk = thr[n]
        while i < s:
            if y[i, n] > pk:
                one += 1
                i += t
            else:
                i += 1
        a.append(one)
    return a.index(max(a))
```

How might our labmate have named their function and variables better?

```python
import numpy as np


def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    num_samples, num_neurons = fluorescence.shape
    thresholds = num_std_dev * np.std(fluorescence, axis=0)
    events_per_neuron = []
    for neuron_index in range(num_neurons):
        sample_index = 0
        num_events = 0
        peak_threshold = thresholds[neuron_index]
        while sample_index < num_samples:
            if fluorescence[sample_index, neuron_index] > peak_threshold:
                num_events += 1
                sample_index += min_separation
            else:
                sample_index += 1
        events_per_neuron.append(num_events)
    return events_per_neuron.index(max(events_per_neuron))
```

### Master the Art of Naming

#### **DON'T**:
- Use single letter names outside of coordinate systems
  - `x`, `y`, and `z` are fine for cartesian coordinates
  - `m` is not a good name for "mouse"
  - `neuron` is better than `n` when iterating
- Use acronyms as shorthand unless they are common vernacular
  - `zs` is not a good name for a z-scored value. Use `z_score`
  - `glm` is fine for generalized linear model
- Use ambiguous booleans
    - Does the boolean `check` mean the data *was* checked or does it mean it *needs* to be checked?

#### **BEST**:
- Names should indicate purpose.
    - `ImagingMetadata` is a better name than `information`, `experiment`, or `Records`
- Don't lie
    - If you name a function `calc_baseline` it should **only** calculate the baseline
- Pay attention to context and make meaningful distinctions 
    - `neural_activity` is a better name than `neurons`
- Being consistent reduces cognitive load
    - Pick one name (e.g., `num_trials`) and use it everywhere, instead of mixing `num_trials`, `n_trials`, and `ntrials`
- Follow language naming conventions to reduce cognitive load
    - **Python** is expected to adhere to PEP 8
    - **Rust** is expected to follow RFC 430
    - **MathWorks** just introduced a style guide in 2025
    - **R**'s style guide is the tidyverse style guide (or Google's derivative)
    - The **C++** community can't agree on literally anything
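A short sketch of these rules in Python, following PEP 8 (`snake_case` for functions and variables, `CapWords` for classes, `UPPER_SNAKE_CASE` for constants). All names here are hypothetical examples; note how the booleans read as yes/no questions, which avoids the `check` ambiguity.

```python
from dataclasses import dataclass

# Constants: UPPER_SNAKE_CASE
MAX_NUM_TRIALS = 200


# Classes: CapWords, named for their purpose
@dataclass
class ImagingMetadata:
    frame_rate_hz: float  # units in the name remove guesswork
    num_trials: int
    # Unambiguous booleans: "was it done?" vs "does it still need doing?"
    is_motion_corrected: bool = False
    needs_manual_review: bool = True


# Functions: snake_case verbs that do exactly what the name says (and nothing more)
def calc_baseline(fluorescence: list[float]) -> float:
    """Return the mean fluorescence, used here as a deliberately simple baseline."""
    return sum(fluorescence) / len(fluorescence)
```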

**LOOKING FORWARD** <br>
- In the IDE section, we'll learn how to use "autocomplete" & "refactor" tools to avoid typing long names. Avoid the urge to prefer expediency over precision! <br>

### The Art of Readability
- Always prioritize readability <br>
- Always prioritize readability <br>
- Premature optimization is the root of all evil (so always prioritize readability). <br>
- Once your code is working, IF and ONLY IF you have a demonstrated performance problem, consider optimizing. <br>
- Always profile your code to determine WHAT to optimize. Only then should you optimize! <br>
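A minimal profiling sketch using the standard library's `cProfile` and `pstats`. The functions `slow_sum_of_squares` and `analysis` are hypothetical stand-ins for your own code; the idea is to look at which functions dominate the cumulative time before touching anything.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(values):
    # Deliberately naive: builds an intermediate list before summing
    squares = []
    for value in values:
        squares.append(value * value)
    return sum(squares)


def analysis():
    return [slow_sum_of_squares(range(10_000)) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
results = analysis()
profiler.disable()

# Report the 5 most expensive functions, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```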

### The Art of Humility
- It is better to be clear than clever.

#### Kernighan's Law:
1. Debugging is twice as hard as writing code in the first place.
2. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

<div class="alert alert-block alert-warning">
<b>EXAMPLE</b>
</div>

Imagine we have a friend who's a theorist and he sends us this code to solve the classic "FizzBuzz" problem: <br><br>
You print the number n from 1 to N, EXCEPT: <br>
- you print "fizz" if it's a multiple of 3 <br>
- "buzz" if it's a multiple of 5 <br>
- "fizzbuzz" if it's a multiple of both <br>

Your friend implements...

```python
(lambda f: f(f, 1))(lambda f, n: None if n > 10 else print((not n % 3 and "fizz" or "") + (not n % 5 and "buzz" or "") or n) or f(f, n+1))
```

```
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
```


As we can see, wow, he's very smart! He's also very stupid for playing code golf. He could've just written...

```python
for n in range(1, 11):
    output = ""
    if n % 3 == 0:
        output += "fizz"
    if n % 5 == 0:
        output += "buzz"
    print(output or n)
```

```
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
```


### The Art of Simplicity
Cyclomatic complexity ~ **number of independent paths** through your code.

High complexity:
- Many `if`/`elif` branches
- Deeply nested loops / conditionals
- Harder to test and reason about

It's better to have more functions than more paths:
It's harder to test all six paths through one function than it is to test six different functions.


Rule of thumb: keep functions simple enough that you can hold the logic in your head.
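A sketch of trading paths for functions: instead of one `normalize` function with an `if`/`elif` branch per method, each method is a small single-path function, and `normalize` only looks it up in a dictionary. The normalization methods are hypothetical examples chosen for brevity.

```python
import statistics


# Each method is its own small, single-path function, so each is trivial to test
def normalize_zscore(trace: list[float]) -> list[float]:
    mean = statistics.fmean(trace)
    std = statistics.pstdev(trace)
    return [(value - mean) / std for value in trace]


def normalize_minmax(trace: list[float]) -> list[float]:
    low, high = min(trace), max(trace)
    return [(value - low) / (high - low) for value in trace]


def normalize_baseline(trace: list[float]) -> list[float]:
    # Change relative to the first sample (a deliberately simple baseline)
    baseline = trace[0]
    return [(value - baseline) / baseline for value in trace]


NORMALIZERS = {
    "zscore": normalize_zscore,
    "minmax": normalize_minmax,
    "baseline": normalize_baseline,
}


def normalize(trace: list[float], method: str) -> list[float]:
    # One lookup replaces an if/elif chain; unknown methods fail loudly
    if method not in NORMALIZERS:
        raise ValueError(f"method must be one of {sorted(NORMALIZERS)}; method = {method!r}")
    return NORMALIZERS[method](trace)
```

Adding a method now means adding one small function and one dictionary entry, without growing the number of paths through `normalize`.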

## 2. Easy to Change
***
It's unlikely that the first iteration of your code will be the final version. <br>
By making your code *modular*, you make it 
- easier to change
- easier to reuse
- easier to test
- and easier to read.

<div class="alert alert-block alert-warning">
<b>EXAMPLE</b>
</div>

Imagine we are TAing a course at UCSD and we write a function to automatically grade exams:

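```python
def process_grades(file):
    # Loading, parsing, and grading are all tangled together in one function.
    # (Assumes a hypothetical file format: one "name,score" pair per line.)
    with open(file) as f:
        rows = [line.split(",") for line in f.read().splitlines()]
    names = [row[0] for row in rows]
    scores = [float(row[1]) for row in rows]

    def letter(score: float) -> str:
        if score >= 90: return "A"
        if score >= 80: return "B"
        if score >= 70: return "C"
        if score >= 60: return "D"
        return "F"

    grades = [(name, score, letter(score)) for name, score in zip(names, scores)]

    return grades
```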

usage:
```
todays_grades = process_grades(todays_file)
```

Mid-semester, UCSD mandates that all courses be graded on a 93% A scale instead of 90%. <br>
While it may be trivial to edit the "letter" portion of this function, in scientific code the equivalent change is usually buried inside a much longer function, where it becomes extremely tedious and error-prone. <br><br>
A more experienced TA might have written their code like this:

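```python
def load_scores(file):
    # Assumes the same hypothetical format: one "name,score" pair per line
    names, scores = [], []
    with open(file) as f:
        for line in f.read().splitlines():
            name, score = line.split(",")
            names.append(name)
            scores.append(float(score))
    return names, scores


def grade(names, scores, scale):
    # `scale` maps each letter to its minimum score and must be ordered from the
    # highest threshold to the lowest (dicts preserve insertion order)
    grades = []
    for name, score in zip(names, scores):
        for letter, threshold in scale.items():
            if score >= threshold:
                grades.append((name, score, letter))
                break
    return grades
```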

usage:
```
(names, scores) = load_scores(todays_file)
grading_scale = {
    "A": 93,
    "B": 87,
    "C": 80,
    "D": 73,
    "E": 67,
    "F": 0,  # catch-all so every score receives a letter
}
todays_grades = grade(names, scores, grading_scale)
```

## 3. Comments aren't Documentation
***

Comments clarify non-obvious steps and other implementation details for a **developer**:
- I did it this way because XYZ.
- What I am doing here is this.
- TODO: We should redesign this.
- BUG: Users report this function causes random crashes.
- OPTIMIZE: I am a bottleneck.

Documentation explains to end users how to **use** the function:
- **What** the function does
- **Arguments** (types, meaning, units if relevant)
- **Returns** (what and in what format)
- Edge cases, assumptions, side effects
- Examples
- Mathematical formulations

Type hints guide both developers and users.

<div class="alert alert-block alert-warning">
<b>EXAMPLE</b>
</div>

### **BAD**

```python
def calculate_insurance_premium(age, risk_factor):
    """
    Calculates the annual insurance premium based on age and risk factor.
    """
    # make sure valid
    if risk_factor < 1.0 or risk_factor > 5.0:
        raise ValueError(f"risk_factor must be between 1.0-5.0; risk_factor = {risk_factor}")
    if age < 0:
        raise ValueError(f"age must be non-negative; age = {age}")
    # calculate base premium
    base_premium = 100 * risk_factor
    # apply additional surcharge if less than 25
    if age < 25:
        base_premium *= 1.5
    return base_premium
```

### **GOOD**

```python
def calculate_insurance_premium(age: int, risk_factor: float) -> float:
    """
    Calculates the annual insurance premium based on age and risk factor.

    Premiums for individuals under 25 are subject to an additional surcharge
    due to a higher historical claim rate (specific regulatory requirement).

    :param age: The client's age in years
    :param risk_factor: Assessed risk factor (1.0 to 5.0)
    :raises ValueError: If the age is less than 0
    :raises ValueError: If the risk factor does not satisfy 1.0 <= risk_factor <= 5.0
    :return: The final annual insurance premium
    """

    if risk_factor < 1.0 or risk_factor > 5.0:
        raise ValueError(f"risk_factor must be between 1.0-5.0; risk_factor = {risk_factor}")
    if age < 0:
        raise ValueError(f"age must be non-negative; age = {age}")

    base_premium = 100 * risk_factor
    if age < 25:
        base_premium *= 1.5
    return base_premium
```

## 4. Program Defensively
***

Your code may be called with incorrectly formatted data, non-existent paths, and other nonsense. <br>
Good code anticipates how users might make mistakes and protects them from themselves. <br>

- Strive to catch errors gracefully.
- Fail fast & fail early
- Identify and be proactive about "silent" mistakes. Alert users if their requests are incoherent.
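A minimal sketch of these three ideas; `load_trace` and its one-number-per-line file format are hypothetical. Cheap checks run before any real work (fail fast), a request that doesn't match the data raises an error, and a problem that would otherwise be silent (NaNs) triggers a warning.

```python
import math
import warnings
from pathlib import Path


def load_trace(path: Path, expected_num_frames: int) -> list[float]:
    """Load one fluorescence trace (one number per line) from a text file."""
    path = Path(path)  # also accept plain strings
    # Fail fast: check cheap preconditions before doing any real work
    if expected_num_frames <= 0:
        raise ValueError(
            f"expected_num_frames must be positive; expected_num_frames = {expected_num_frames}"
        )
    if not path.exists():
        raise FileNotFoundError(f"No trace file found at {path}")

    trace = [float(line) for line in path.read_text().splitlines() if line.strip()]

    # An incoherent request: the file doesn't match what the caller said to expect
    if len(trace) != expected_num_frames:
        raise ValueError(f"Expected {expected_num_frames} frames, but {path} contains {len(trace)}")

    # A "silent" mistake: NaNs would quietly propagate through every later step
    num_nan = sum(math.isnan(value) for value in trace)
    if num_nan:
        warnings.warn(f"{num_nan} of {len(trace)} samples in {path} are NaN", stacklevel=2)
    return trace
```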

### **Errors are announced, not corrected.**
1. Do not forget to protect your code from the users...<br>
2. Do not chase the wind to make your code idiotproof by correcting input!<br>
**The universe will ALWAYS design a better idiot!**
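A sketch of announcing rather than correcting, using a hypothetical `mean_activity_per_neuron` function. It would be tempting to detect a transposed array and silently call `.T`, but that guess hides whatever upstream bug produced the wrong shape (and is wrong whenever the two dimensions happen to match).

```python
import numpy as np


def mean_activity_per_neuron(fluorescence: np.ndarray, num_neurons: int) -> np.ndarray:
    """Average activity of each neuron; `fluorescence` must be (num_samples, num_neurons)."""
    if fluorescence.ndim != 2:
        raise ValueError(f"fluorescence must be 2-D; got {fluorescence.ndim} dimension(s)")
    if fluorescence.shape[1] != num_neurons:
        # Announce, don't correct: tell the user what looks wrong and let them fix it
        raise ValueError(
            f"Expected {num_neurons} neurons along axis 1, got shape {fluorescence.shape}. "
            "Is the array transposed?"
        )
    return fluorescence.mean(axis=0)
```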

## 5. My Software Sucks
***

### Good programmers assume their code has bugs
A common developmental strategy in software:
1. Write a test that your (not yet written) code should pass.
2. Write the code to pass the test.
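A minimal test-first sketch in the style `pytest` discovers (functions named `test_*`); `z_score` is a hypothetical example. In a real project the test would live in its own file and fail until the function exists.

```python
import math


# Step 1: write the test first; it pins down what "correct" means before any code exists
def test_z_score_has_zero_mean_and_unit_std():
    result = z_score([2.0, 4.0, 6.0])
    mean = sum(result) / len(result)
    std = math.sqrt(sum((value - mean) ** 2 for value in result) / len(result))
    assert math.isclose(mean, 0.0, abs_tol=1e-12)
    assert math.isclose(std, 1.0)


# Step 2: write just enough code to make the test pass
def z_score(values: list[float]) -> list[float]:
    mean = sum(values) / len(values)
    std = math.sqrt(sum((value - mean) ** 2 for value in values) / len(values))
    return [(value - mean) / std for value in values]
```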

### **In scientific analysis, this is not always straightforward.**
In practice:
- we can use simulated or real datasets with known properties (e.g., a decoder should be capable of decoding a simulated dataset).
- we can write "implementation" tests that simply ensure functions *run* without crashing/errors
- we can write tests about bugs we identify and ensure they are "stamped out"
- we can write tests to protect against changes breaking the code
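A sketch of the simulated-data approach applied to `find_neuron_most_peaks` from the naming example (copied here so the block runs on its own). The simulation parameters (noise level, event amplitude, event counts) are arbitrary assumptions chosen so that the correct answer is known in advance; the second test is a simple "implementation" test.

```python
import numpy as np


def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    # Copied from the naming example so this block runs on its own
    num_samples, num_neurons = fluorescence.shape
    thresholds = num_std_dev * np.std(fluorescence, axis=0)
    events_per_neuron = []
    for neuron_index in range(num_neurons):
        sample_index = 0
        num_events = 0
        peak_threshold = thresholds[neuron_index]
        while sample_index < num_samples:
            if fluorescence[sample_index, neuron_index] > peak_threshold:
                num_events += 1
                sample_index += min_separation
            else:
                sample_index += 1
        events_per_neuron.append(num_events)
    return events_per_neuron.index(max(events_per_neuron))


def test_finds_neuron_with_most_simulated_events():
    # Known answer: low-amplitude noise plus large, well-separated events,
    # where neuron i receives 5 * (i + 1) events
    rng = np.random.default_rng(seed=0)
    num_samples, num_neurons = 1_000, 4
    fluorescence = rng.normal(0.0, 0.1, size=(num_samples, num_neurons))
    for neuron_index in range(num_neurons):
        num_events = 5 * (neuron_index + 1)
        event_times = np.linspace(50, 950, num_events).astype(int)
        fluorescence[event_times, neuron_index] += 10.0
    # Neuron 3 received the most events, so it must be the answer
    assert find_neuron_most_peaks(fluorescence, num_std_dev=3, min_separation=10) == 3


def test_flat_data_runs_without_crashing():
    # Silent (all-zero) data should return a valid index rather than raise
    fluorescence = np.zeros((100, 3))
    assert find_neuron_most_peaks(fluorescence, num_std_dev=3, min_separation=10) == 0
```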

### Modularity makes testing easy
The more modular your code, the easier it is to test and the more useful tests become... especially in scientific programming! It's quite easy to test whether each individual step of a motion correction pipeline operates as intended, but it's much more difficult to make sense of a whole motion correction pipeline failing to correct a video.

## 6. Scripts Call Code
***

This one is pretty straightforward. Scripts should be calling functions, not defining them. <br>
When scripts contain the implementation, that code is hard to reuse, hard to debug (run the whole script? copy-paste?), and hard to verify. <br>

For example, Jupyter notebooks do not guarantee that cells are executed in order. Therefore, it can be impossible to tell whether the results shown in a notebook were produced by the code currently written in it. Even if the cells were executed in order, two notebooks running at the same time can inadvertently interfere with each other (e.g., by writing to the same files, or by being attached to the same kernel).

Instead, use a Jupyter notebook to *showcase* analysis. Import the required modules and functions, and use the notebook as a guide to the data:
- The data is loaded
- This function is called
- I plot the result
- I print the stats
- I save to file.
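A minimal sketch of the split, with hypothetical names: the function lives in an importable module (say, `analysis.py`), and the `if __name__ == "__main__":` guard is the only "script" part. A notebook would instead run `from analysis import summarize_trials` and call it.

```python
# analysis.py: a module that *defines* code, so it is importable, testable, and reusable
import statistics


def summarize_trials(trial_scores: list[float]) -> dict[str, float]:
    """Return simple summary statistics for a list of per-trial scores."""
    return {
        "mean": statistics.fmean(trial_scores),
        "stdev": statistics.stdev(trial_scores),
        "num_trials": len(trial_scores),
    }


def main() -> None:
    # The script part only loads data, calls functions, and reports results
    trial_scores = [0.8, 0.6, 0.9, 0.7]  # placeholder data for illustration
    print(summarize_trials(trial_scores))


# Runs only when executed directly (python analysis.py), not when imported
if __name__ == "__main__":
    main()
```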

## Comprehensive Example
***
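This excerpt relies on names that aren't defined in this notebook (`NDArrayLike`, `DeinterlaceParameters`, `_dispatcher`, `index_image_blocks`, `extract_image_block`, and `tqdm`), so read it for its structure rather than running it as-is.
```python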
def deinterlace(
    images: NDArrayLike,
    parameters: DeinterlaceParameters | None = None,
) -> None:
    """
    Deinterlace images collected using resonance-scanning microscopes such that the
    forward and backward-scanned lines are properly aligned. A Fourier-based approach
    is utilized: the Fourier transform of the two sets of lines is computed to calculate
    the cross-power spectral density. Taking the inverse Fourier transform of the
    cross-power spectral density yields a matrix whose peak corresponds to the
    sub-pixel offset between the two sets of lines. This translational offset is then
    discretized and used to shift the backward-scanned lines.

    Unfortunately, the fast Fourier transform methods that underlie the implementation
    of the deinterlacing algorithm have poor space complexity
    (i.e., large memory requirements). This weakness is particularly problematic when
    using GPU parallelization. To mitigate these issues, deinterlacing can be performed
    batch-wise while maintaining numerically identical results (see `block_size`).

    To improve performance, the deinterlacing algorithm can be applied to a pool
    of the images while maintaining efficacy. Specifically, setting the `pool`
    parameter will apply the deinterlacing algorithm to the standard deviation of
    each pixel across a block of images. This approach is better suited to images with
    limited signal-to-noise or sparse activity than simply operating on every n-th
    frame.

    Finally, it is often the case that the auto-alignment algorithms used in microscopy
    software are unstable until a sufficient number of frames have been collected.
    Therefore, the `unstable` parameter can be used to specify the number of frames
    that should be deinterlaced individually before switching to batch-wise processing.

    .. note::
        This function operates in-place.

    .. warning::
        The number of frames included in each Fourier transform must be several times
        smaller than the maximum number of frames that fit within your GPU's VRAM
        (`CuPy <https://cupy.dev>`_) or RAM (`NumPy <https://numpy.org>`_). This
        function will not automatically revert to the NumPy implementation if there is
        not sufficient VRAM. Instead, an out of memory error will be raised.
    """
    parameters = parameters or DeinterlaceParameters()
    parameters.validate_with_images(images)
    calculate_offset, align_images = _dispatcher(parameters)

    pbar = tqdm(total=images.shape[0], desc="Deinterlacing Images", colour="blue")
    for start, stop in index_image_blocks(
        images, parameters.block_size, parameters.unstable
    ):
        # NOTE: We invoke a similar routine for ALL implementations:
        #  (1) We extract a block of the provided images
        #  (2) We calculate the offset/s necessary to correct deinterlacing artifacts
        #  (3) We align the images such that the artifact is minimized or eliminated

        # NOTE: Extraction isn't done inline due to the 'pool' parameter potentially
        #  changing the shape of the images being processed. In some cases this means
        #  the returned block_images will not be views of the original images, but
        #  currently this only occurs when reducing the number of frames to process
        #  through pool. If adding a feature here in the future (e.g., upscaling), one
        #  will need to remember there is no view guarantee here.

        block_images = extract_image_block(images, start, stop, parameters.pool)
        offset = calculate_offset(block_images)
        align_images(images, start, stop, offset)

        pbar.update(stop - start)
    pbar.close()
```

# Part II: Modern Tools
***
***

You can write good code **without** tools, but tools make it much easier to stay clean.

## Suggested Stack
***
- **IDE**: PyCharm (autocomplete, refactoring, jump-to-definition, docs on highlight)
- **Virtual Environments**: uv
- **Formatter**: Ruff
- **Linter**: Ruff
- **Testing & Code Coverage**: pytest + coverage
- **Validation**: Pydantic
- **Documentation**: Sphinx

### A short primer on how Python works

![](https://substackcdn.com/image/fetch/$s_!g3Si!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90d967c-3153-4233-ae1d-eb75bdfdc5f1_3005x1573.png)

Like MATLAB, Python is an *interpreted* language.<br>

![](https://substackcdn.com/image/fetch/$s_!FB0V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3176fef-ba14-4d0b-b3a8-de5b70a34945_3639x1512.png)

When we use MATLAB, our MATLAB install bundles both the IDE we write our code in and the interpreter that runs it. **With Python, these are decoupled!**
To write code efficiently, we need to set up both an IDE and an interpreter ourselves.
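Because the IDE and the interpreter are separate, it's worth checking which interpreter your IDE is actually using. A quick standard-library check (a virtual environment is detected by comparing `sys.prefix` with `sys.base_prefix`):

```python
import sys

# The interpreter actually running this code (e.g., a project's .venv vs. a system Python)
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])

# Inside a virtual environment, sys.prefix points at the environment
# while sys.base_prefix still points at the base installation
in_virtual_env = sys.prefix != sys.base_prefix
print("In a virtual environment:", in_virtual_env)
```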

## 1. Integrated Development Environments
***

![](https://intellipaat.com/blog/wp-content/uploads/2025/12/What-is-Pycharm.webp)

[Download PyCharm](https://www.jetbrains.com/pycharm/download/?section=windows)

I use the `ClassicUI` plugin because I'm an unc

PyCharm is **free** for students & academics

## 2. Virtual Environments & Dependency Management
***

### uv
An extremely fast Python package and project manager, written in Rust.

### Mac / Linux
Enter this in your terminal
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### Windows
Enter this in PowerShell
```
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

## 3. Formatting & Linting
***

### Formatting
In your terminal:
```
ruff format .
```

### Linting
In your terminal (add `--fix` to let Ruff apply its safe automatic fixes):
```
ruff check .
```

## 4. Testing & Code Coverage
***

### Pytest
In your terminal:
```
pytest .
```
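Two `pytest` features keep tests short: `@pytest.mark.parametrize` runs one test over many cases, and `pytest.raises` checks that invalid input is announced. A sketch applied to the insurance-premium example (adapted and copied in so the block is self-contained); saved in a file named `test_*.py`, it would be discovered by `pytest .`.

```python
import pytest


def calculate_insurance_premium(age: int, risk_factor: float) -> float:
    # Adapted from the "Comments aren't Documentation" example (docstring omitted)
    if risk_factor < 1.0 or risk_factor > 5.0:
        raise ValueError(f"risk_factor must be between 1.0-5.0; risk_factor = {risk_factor}")
    if age < 0:
        raise ValueError(f"age must be non-negative; age = {age}")
    base_premium = 100 * risk_factor
    if age < 25:
        base_premium *= 1.5
    return base_premium


# One test function, run once per (age, risk_factor, expected) case
@pytest.mark.parametrize(
    ("age", "risk_factor", "expected"),
    [
        (30, 1.0, 100.0),
        (30, 5.0, 500.0),
        (24, 2.0, 300.0),  # under-25 surcharge applies
        (25, 2.0, 200.0),  # surcharge stops at exactly 25
    ],
)
def test_premium(age, risk_factor, expected):
    assert calculate_insurance_premium(age, risk_factor) == pytest.approx(expected)


@pytest.mark.parametrize(("age", "risk_factor"), [(-1, 2.0), (30, 0.5), (30, 5.1)])
def test_invalid_input_raises(age, risk_factor):
    with pytest.raises(ValueError):
        calculate_insurance_premium(age, risk_factor)
```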

### Coverage (with pytest)
In your terminal:
```
coverage run -m pytest
coverage html
```

## 5. Documentation
***

### To set up
In your terminal:
```
sphinx-quickstart
```

### To generate API source files
In your terminal:
```
sphinx-apidoc -f -o docs/source example
```

### To build html
From the directory containing the `Makefile` created by `sphinx-quickstart`:
```
make html
```
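`sphinx-apidoc` only writes `.rst` stubs; the docstrings themselves are pulled in by the `autodoc` extension, which must be enabled in `conf.py`. A minimal excerpt, assuming the separate `docs/source` layout and a package named `example` at the repository root (both assumptions about your project). The reST fields used in the insurance-premium docstring (`:param`, `:raises`, `:return`) are understood without extra extensions.

```python
# docs/source/conf.py (excerpt)
import os
import sys

# Sphinx runs conf.py from docs/source, so "../.." is the repository root;
# putting it on sys.path lets autodoc import the package and read its docstrings
sys.path.insert(0, os.path.abspath(os.path.join("..", "..")))

project = "example"
extensions = [
    "sphinx.ext.autodoc",   # build documentation from docstrings
    "sphinx.ext.viewcode",  # link documented objects to their highlighted source
]
```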

## 6. Validation
***

[Pydantic](https://docs.pydantic.dev/latest/)
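Pydantic turns hand-written checks like those in "Program Defensively" into declarations on the fields themselves. A minimal sketch assuming Pydantic v2; `PremiumRequest` is a hypothetical model mirroring the insurance-premium example.

```python
from pydantic import BaseModel, Field, ValidationError


class PremiumRequest(BaseModel):
    # Constraints live next to the fields, so validation is declared once
    age: int = Field(ge=0)
    risk_factor: float = Field(ge=1.0, le=5.0)


request = PremiumRequest(age=30, risk_factor="2.5")  # the string "2.5" is coerced to 2.5
print(request.risk_factor)

try:
    PremiumRequest(age=-3, risk_factor=7.0)
except ValidationError as err:
    # Every problem is reported at once, along with the offending field names
    print(err.error_count(), "validation errors")
    print(err)
```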