In [None]:
#Image(url= "https://github.com/darikoneil/dariks_school_to_code_good/blob/dev/logo.png")
#from IPython.display import Image
#from IPython.core.display import HTML 
#Image(r"C:\Users\dao25\PycharmProjects\dariks_school_to_code_good\logo.png")

# Darik's School to Code Good

Imports & Initialization

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
from pathlib import Path
from scratch import simulate_calcium_data, plot_neuron

**Why bother?**
- You don't want to struggle running a script named `decode_final_v2` 6 months from now
- You don't want to worry that modifying a function will break everything
- You don't want your colleagues to hate you when they do a code review
- You want to hand someone a script and not have to explain what it does
- You want to be confident in your code, especially when publishing
- You want to be efficient & productive: *good code works*

Today will be split into two parts: **core concepts** and **modern tooling**. In core concepts, we'll learn the difference between good code and bad code. In modern tooling, we'll learn how to use modern development tools. We can pace this as slow or as fast as everyone finds helpful, and even skip around. The concepts are pretty brisk, but you could easily do an entire hour-long tutorial on literally every one of the modern tools I'll present. *I can always meet 1:1 or schedule something in the future.* On the flip side, I also don't want you to feel like you're just listening to me rant at the pulpit for the opportunity to eat pizza, so feel free to let me know if something isn't helpful, is boring, or whatever.

# Part I: Core Concepts
First, we'll look at how to *think* about code so that it's designed to be *used* and not just to *work*  <br>
1. **The Art of Naming** - How to write *self-documenting* code <br>
2. **Modularity** - Building with bricks, not blobs <br>
3. **KISS** - Keep It Simple, Stupid! <br>
4. **Noiseless Documentation** - Documentation $\neq$ Comments <br>
5. **Error Handling & Validation** - Errors are *announced*, not corrected <br>
6. **The Principle of Least Astonishment** - Be obvious, consistent, and *predictable* <br>

## Example Dataset

To illustrate these core concepts, we'll reference code used to analyze a simulated calcium imaging dataset. The dataset is stored as a "tall" matrix, where each row is a unique sample. The first column is a vector of timestamps, while the remaining columns represent the fluorescence of identified neurons.

In [None]:
example_data = np.load(Path(R"C:\Users\dao25\PycharmProjects\dariks_school_to_code_good\simulated_calcium_data.npy"))
num_samples = example_data.shape[0]
num_neurons = example_data.shape[1]-1
frame_rate = np.round(1/example_data[0, 0], decimals=2)
duration_seconds = example_data[-1, 0]
print(f"Our dataset contains {num_neurons} neurons recorded over {duration_seconds} seconds ({frame_rate} Hz, {num_samples} samples)")
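
If you don't have the file above, here is a minimal sketch of the same "tall" layout built from random numbers. The frame rate, sample count, neuron count, and `toy_` names are assumptions chosen for illustration, not properties of the real dataset.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration
toy_frame_rate = 10.0  # Hz
toy_num_samples = 50
toy_num_neurons = 3

rng = np.random.default_rng(seed=0)

# Evenly spaced timestamps in seconds
timestamps = np.arange(1, toy_num_samples + 1) / toy_frame_rate

# Random values standing in for fluorescence: one column per neuron
fluorescence = rng.normal(size=(toy_num_samples, toy_num_neurons))

# "Tall" layout: one row per sample, timestamps first, then the neurons
toy_data = np.column_stack((timestamps, fluorescence))

print(toy_data.shape)  # (50, 4)
```

Slicing then mirrors the real dataset: `toy_data[:, 0]` is the timestamp vector and `toy_data[:, 1:]` is the fluorescence matrix.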

## 1. The Art of Naming

Imagine we want to find the neuron with the most activations, and our labmate has kindly offered us some code they've written.

In [None]:
def fmp(x, y, t, distance):
    # Finds neuron with most peaks. x is timestamps, y is data, and t is threshold.
    s = len(x)
    thr = t * np.std(y, axis=0)
    a = []
    for n in range(y.shape[1]):
        i = 0
        one = 0
        pk = thr[n]
        while i < s:
            if y[i, n] > pk:
                one += 1
                i += distance
            else:
                i += 1
        a.append(one)
    return a.index(max(a))

In [None]:
answer = fmp(example_data[:, 0], example_data[:, 1:], 2, 2)

Does it work? Yes. Is it nauseating? Yes. Sure, you can probably sit down and make sense of this function in a few minutes...
- But what happens when you have 10,000 lines of code?
- What happens when your function does something more complex?
- And, what happens when you use `fmp` in a script? Are you going to remember what that means?

#### The most important guiding principle in writing code: 
##### *"Code is read much more often than it is written"*

**DON'T**:
- Use single letter names outside of coordinate systems
  - `x`, `y`, and `z` are fine for Cartesian coordinates
  - `m` is not a good name for "mouse"
  - `neuron` is better than `n` when iterating
- Use acronyms as shorthand unless they are common vernacular
  - `zs` is not a good name for a z-scored value. Use `z_score`
  - `glm` is fine for generalized linear model
- Use ambiguous booleans
    - Does the boolean `check` mean the data *was* checked or does it mean it *needs* to be checked?
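
As a quick illustration of the points above, here is a small sketch contrasting an ambiguous boolean with names that state what `True` means. The `is_validated`, `needs_validation`, and `mouse_ids` names are hypothetical and only meant to show the pattern.

```python
# Ambiguous: does True mean the data was checked, or that it still needs checking?
check = True

# Clearer: prefixes such as is_, has_, or needs_ state exactly what True means
is_validated = True
needs_validation = not is_validated

# Descriptive loop variables make the loop read like a sentence
mouse_ids = ["mouse_01", "mouse_02"]  # hypothetical identifiers
for mouse_id in mouse_ids:
    print(f"Processing {mouse_id}")
```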

**BEST**:
- Names should indicate purpose.
    - `ImagingMetadata` is a better name than `information`, `experiment`, or `Records`
- Don't lie
    - If you name a function `calc_baseline` it should **only** calculate the baseline
- Pay attention to context and make meaningful distinctions 
    - `neural_activity` is a better name than `neurons`
- Being consistent reduces cognitive load
    - Pick one form, such as `num_trials`, and use it everywhere instead of mixing `num_trials`, `n_trials`, and `ntrials`
- Follow language naming conventions to reduce cognitive load
    - **Python** is expected to adhere to PEP 8
    - **Rust** is expected to follow RFC 430
    - **MATLAB** just got a style guide from MathWorks in 2025
    - **R**'s style guide is Tidyverse (or Google's derivative)
    - The **C++** community can't agree on literally anything
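
To make "don't lie" concrete, here is a minimal sketch in which `calc_baseline` only calculates a baseline, and normalization lives in its own honestly named function. The percentile-based baseline and the `calc_delta_f_over_f` helper are assumptions for illustration, not a recommended analysis.

```python
import numpy as np


def calc_baseline(fluorescence, percentile=10):
    """Return one baseline value per neuron (hypothetical definition: a low percentile)."""
    return np.percentile(fluorescence, percentile, axis=0)


def calc_delta_f_over_f(fluorescence, baseline):
    """Return (F - F0) / F0 using a baseline that was computed elsewhere."""
    return (fluorescence - baseline) / baseline


# Toy data: 4 samples (rows) by 2 neurons (columns)
toy_fluorescence = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [3.0, 6.0]])

baseline = calc_baseline(toy_fluorescence)
delta_f_over_f = calc_delta_f_over_f(toy_fluorescence, baseline)
```

If `calc_baseline` also normalized the data or silently dropped neurons, its name would be lying about what it does.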

Recall...

In [None]:
def fmp(x, y, t, distance):
    # Finds neuron with most peaks. x is timestamps, y is data, and t is threshold.
    s = len(x)
    thr = t * np.std(y, axis=0)
    a = []
    for n in range(y.shape[1]):
        i = 0
        one = 0
        pk = thr[n]
        while i < s:
            if y[i, n] > pk:
                one += 1
                i += distance
            else:
                i += 1
        a.append(one)
    return a.index(max(a))

How might our labmate have labeled their function better?

In [None]:
def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    num_samples, num_neurons = fluorescence.shape
    thresholds = num_std_dev * np.std(fluorescence, axis=0)
    events_per_neuron = []
    for neuron_index in range(num_neurons):
        sample_index = 0
        num_events = 0
        peak_threshold = thresholds[neuron_index]
        while sample_index < num_samples:
            if fluorescence[sample_index, neuron_index] > peak_threshold:
                num_events += 1
                sample_index += min_separation
            else:
                sample_index += 1
        events_per_neuron.append(num_events)
    return events_per_neuron.index(max(events_per_neuron))

**LOOKING FORWARD** <br>
- In the IDE section, we'll learn how to use "autocomplete" & "refactor" tools to avoid typing long names. Avoid the urge to prefer expediency over precision! <br>
- In the formatting section, we'll learn how to use a "formatter" to enforce consistent styling and formatting.

## 2. Modularity

In [None]:
def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    # Finds neuron with most peaks.

    # get number of samples and number of neurons
    num_samples, num_neurons = fluorescence.shape
    
    # this is the number of thresholds
    thresholds = num_std_dev * np.std(fluorescence, axis=0)
    
    # stores number of events per neuron
    events_per_neuron = []

    # for each neuron
    for neuron_index in range(num_neurons):
        # which sample
        sample_index = 0
        num_events = 0
        # this neurons threshold
        peak_threshold = thresholds[neuron_index]
        # calculate peak
        while sample_index < num_samples:
            if fluorescence[sample_index, neuron_index] > peak_threshold:
                num_events += 1
                sample_index += min_separation
            else:
                sample_index += 1
        events_per_neuron.append(num_events)
    return events_per_neuron.index(max(events_per_neuron))
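
One way the function above might be broken into bricks is sketched below. The helper names `calc_thresholds` and `count_events` are hypothetical, and the toy data is invented for the example. The logic is meant to match the original loop, with each step given its own small, testable function.

```python
import numpy as np


def calc_thresholds(fluorescence, num_std_dev):
    """Return one detection threshold per neuron."""
    return num_std_dev * np.std(fluorescence, axis=0)


def count_events(trace, threshold, min_separation):
    """Count threshold crossings in one trace, skipping ahead after each one."""
    num_events = 0
    sample_index = 0
    while sample_index < len(trace):
        if trace[sample_index] > threshold:
            num_events += 1
            sample_index += min_separation
        else:
            sample_index += 1
    return num_events


def find_neuron_most_peaks(fluorescence, num_std_dev, min_separation):
    """Return the index of the neuron with the most detected events."""
    thresholds = calc_thresholds(fluorescence, num_std_dev)
    events_per_neuron = [
        count_events(trace, threshold, min_separation)
        for trace, threshold in zip(fluorescence.T, thresholds)
    ]
    # argmax returns the first maximum, like list.index(max(...))
    return int(np.argmax(events_per_neuron))


# Toy data (hypothetical): 20 samples by 3 neurons with a few large transients
toy_fluorescence = np.zeros((20, 3))
toy_fluorescence[[2, 10], 0] = 5.0         # neuron 0: two events
toy_fluorescence[[1, 6, 12, 18], 1] = 5.0  # neuron 1: four events
toy_fluorescence[5, 2] = 5.0               # neuron 2: one event

busiest_neuron = find_neuron_most_peaks(toy_fluorescence, num_std_dev=2, min_separation=2)
print(busiest_neuron)  # 1
```

Each brick can now be reused or tested on its own: `count_events` works on any single trace, and swapping the threshold rule only touches `calc_thresholds`.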

# Part II: Modern Tooling
1. **Your IDE & U** - Integrated Development Environments
2. **Virtual Environments & Dependency Management**
3. **Testing & Code Coverage**
4. **Linting**
5. **Formatting**
6. **Validation**
7. **Navigable Documentation**