# Refactoring

While writing this course, I often wondered whether to teach concepts from the perspective of a fresh start&mdash;assuming you have a blank slate to write new software&mdash;or from the perspective of inheriting (or being originally guilty of creating) a messy codebase that you're now reluctantly sifting through, trying to make sense of it and improve what you can in the limited time you have ... before some other unfortunate soul takes over.

For this Lecture, I'll mostly assume the latter. You have a codebase at hand that you're already using for scientific work. Most commonly, there simply isn't the time or the resources to invest in a full clean-up with a dedicated specialist. It's down to you&mdash;and you know it. You're now searching for a strategy, a guiding light, to lead you through the process. Rest assured: this Lecture is here to provide exactly that, and we will draw on everything you've learnt in the previous Lectures to make it a practical reality. And while future Lectures&mdash;especially those on logging and testing&mdash;will reinforce and extend these ideas, you shouldn't have to wait any longer before we finally address the elephant in the room of legacy code. **Now it’s time to step up, face the code, and tame the chaos.**

Finally, I apologise in advance for not including many code snippets in this Lecture. But I certainly don't want to post bad examples of code just to demonstrate how to fix them&mdash;lest they accidentally be copied, reused, or even taken as tacit approval of poor practices. The goal here is to guide you towards better habits, not to seed your projects with anti-patterns disguised as teaching material.

## Anti-patterns

Before we dive deeper into the process of refactoring, it’s worth pausing to understand how messy legacy code often ends up the way it is. Rarely is a codebase "bad" because its authors didn’t know any better; far more often, it’s the natural consequence of **time pressure, shifting project goals, incremental research hacks, or design decisions that made sense once but aged poorly.** In other words, many of the problems you face today originate from well-intentioned shortcuts that later solidified into anti-patterns.

**An anti-pattern is a common but counterproductive solution to a recurring problem.** It’s something that looks like it solves an issue&mdash;often because it’s familiar, easy, or seems intuitive&mdash;but in reality it creates more problems than it solves.

A few key traits of an anti-pattern:

- It’s a pattern you see often, which is why people keep repeating it.
- It feels like a solution, especially to newcomers or under time pressure.
- It leads to negative consequences: technical debt, bugs, complexity, or maintainability issues.
- There’s a better, well-understood alternative, but that alternative may require more upfront thought or structure.

Classic examples across software include overusing global state, copy-pasting blocks of code instead of refactoring, or adding layers of abstraction to "fix" unclear logic.

In short, **a pattern becomes an anti-pattern when its apparent convenience hides long-term harm**, and recognising such patterns is the first step toward untangling the legacy code you’ve inherited.
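
To make the "better alternative" trait concrete without printing an anti-pattern, here is a minimal sketch of only the healthier counterpart to global state: every input a function needs arrives as an explicit argument. The names `RunSettings` and `total_simulated_time` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical settings that might otherwise live in module-level globals."""
    time_step: float  # seconds
    n_steps: int


def total_simulated_time(settings: RunSettings) -> float:
    """Depends only on its argument, so it can be called and tested in isolation."""
    return settings.time_step * settings.n_steps


settings = RunSettings(time_step=0.5, n_steps=100)
print(total_simulated_time(settings))  # 50.0
```

Because the settings object is frozen, nothing elsewhere in the program can quietly change it between two calls, which is a large part of what makes global state hard to reason about.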

## Technical Debt

A central concept here&mdash;often misunderstood or completely ignored in academic environments&mdash;is technical debt. Unlike financial debt, which appears explicitly on a balance sheet, technical debt silently accumulates in the background of a research project. Every quick fix, undocumented assumption, or "temporary" workaround that becomes permanent incurs debt that must eventually be "paid back" through time spent understanding, repairing, or rewriting the affected components.

In academia, this debt is especially pernicious because no one explicitly budgets for it. Grant proposals don’t include a line item called "refactoring", PIs rarely allocate time for structural improvements, and papers certainly don’t give credit for reducing code complexity. Yet the debt still exacts its toll&mdash;just not in a way people track.

One of the most overlooked costs is onboarding time. Each new PhD student or postdoc inherits not only a scientific question, but the accumulated quirks, workarounds, and architectural compromises of whoever came before them. If the code is messy, poorly structured, or undocumented, their first months&mdash;sometimes years&mdash;are spent deciphering it rather than building on it. This is technical debt being paid in the form of human time and stalled scientific progress.

You can think of onboarding lag as interest on that debt: the messier the system, the higher the interest rate.

In some groups, onboarding becomes an informal rite of passage ("everyone struggles at the beginning"), but this normalisation shouldn’t distract from the underlying cause: structural neglect accumulates, and someone always pays eventually. Refactoring isn’t busywork; it is one of the few deliberate actions that reduces long-term debt and improves the scientific productivity of everyone who touches the codebase in the future.

## How to Refactor

**Refactoring legacy code is, at its core, the process of incrementally transforming a working but messy codebase into one that is clearer, safer, and easier to maintain&mdash;without changing what it does.** When anti-patterns have accumulated over years of quick fixes or shifting research goals, refactoring becomes the means of carefully untangling them.

A practical refactoring strategy usually involves the following steps:

1. **Start with a Safety Net**

    Before changing anything, ensure you have:

    - Tests, even minimal ones, that verify current behaviour.
    - Version control, so you can track every change.
    - Small, reversible steps, rather than grand reorganisations.

    You cannot confidently refactor without knowing whether you’ve broken something.

2. **Identify Hotspots**

    Not every part of the code needs attention. Focus on:

    - The modules you touch most often.
    - Places where bugs frequently occur.
    - Functions or classes that are hard to understand or modify.
    - Areas where anti-patterns are most entrenched (e.g., copy-paste blocks, sprawling functions, global state).

    Refactor where it matters, not everywhere.

3. **Make the Smallest Possible Improvements First**

    Legacy code is delicate. Instead of rewriting it wholesale, apply incremental improvements:

    - Break large functions into smaller ones.
    - Replace duplicated logic with shared helpers.
    - Remove unused parameters or dead code.
    - Encapsulate global state behind clean interfaces.

    Small steps compound into large improvements without risking destabilisation.

4. **Replace Anti-patterns with Better Patterns**

    Once you recognise an anti-pattern, deliberately rewrite it using a healthier alternative. Examples:

    - Replace copy-paste repetition with DRY functions.
    - Replace scattered configuration with dependency injection or structured config objects.
    - Replace deeply nested conditionals with clearer control flow or dispatch mechanisms.
    - Replace ad-hoc data containers with well-defined classes or dataclasses.

    Each substitution improves clarity and reduces future errors.

5. **Improve Naming and Structure**

    Much legacy confusion comes not from logic, but from poor communication:

    - Rename variables and functions to reflect what they actually do now, not what they once did.
    - Group related code into modules or classes.
    - Untangle mixed responsibilities so each function does one thing.

    Often the easiest wins come from renaming and reorganising, not rewriting.

6. **Add Tests as You Go**

    Refactoring is the ideal moment to:

    - Add unit tests for newly clarified functions,
    - Write regression tests for previously buggy behaviours,
    - Capture edge cases that you finally understand.

    Each new test expands your safety net for the next round of improvements.

7. **Iterate Continuously**

    Refactoring is not a single pass&mdash;it’s a habit:

    - Every time you touch code, leave it slightly better than you found it.
    - Treat refactoring as part of normal development, not a separate "cleanup phase".
    - Avoid grand rewrite fantasies unless absolutely necessary.

    Small, continual refactoring prevents anti-patterns from re-accumulating.
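
As a minimal sketch of the safety net from step 1, a so-called characterisation test records what the code currently produces and flags any later deviation. Everything named here (`legacy_analysis`, `check_against_reference`, the JSON reference file) is a hypothetical stand-in rather than part of any particular framework; in practice you would probably wrap the same idea in a test runner such as pytest.

```python
import json
import tempfile
from pathlib import Path


def legacy_analysis(samples):
    """Stand-in for an existing routine whose behaviour we want to pin down."""
    return {"n": len(samples), "total": sum(samples)}


def check_against_reference(func, inputs, reference_file: Path) -> bool:
    """Record the output on the first run; compare against it on later runs."""
    result = func(inputs)
    if not reference_file.exists():
        reference_file.write_text(json.dumps(result, sort_keys=True))
        return True  # nothing to compare yet: the reference was just created
    return json.loads(reference_file.read_text()) == result


# A temporary directory keeps this demo self-contained; a real reference
# file would be committed to version control alongside the code.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "legacy_analysis.json"
    first = check_against_reference(legacy_analysis, [1, 2, 3], ref)   # records
    second = check_against_reference(legacy_analysis, [1, 2, 3], ref)  # compares
    print(first, second)  # True True
```

Exact equality works for integer results like these. For floating-point output, compare within a tolerance instead (for example with `math.isclose`), otherwise harmless rounding differences will look like regressions.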

**The Goal**

Refactoring legacy code is not about achieving perfection. It’s about making the codebase gradually more coherent, removing the anti-patterns that cause pain, and ensuring that the next person&mdash;possibly future you&mdash;can navigate and extend it without fear.

By understanding anti-patterns and applying a disciplined refactoring process, you turn a fragile legacy system into a codebase that actively supports your scientific work rather than hindering it.
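
One substitution from step 4 can be sketched briefly: a lookup table of functions in place of a long chain of conditionals. This is an illustration under assumed names (`REDUCTIONS`, `reduce_signal`), not a prescription, and it shows only the target form rather than the code it would replace.

```python
import math
from typing import Callable, Dict, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _rms(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical registry: supporting a new method means adding one entry here,
# not another branch inside the calling function.
REDUCTIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": _mean,
    "max": max,
    "rms": _rms,
}


def reduce_signal(values: Sequence[float], method: str) -> float:
    """Look up the requested reduction by name and apply it."""
    if method not in REDUCTIONS:
        raise ValueError(
            f"Unknown method {method!r}; choose from {sorted(REDUCTIONS)}"
        )
    return REDUCTIONS[method](values)


print(reduce_signal([1.0, 2.0, 3.0], "mean"))  # 2.0
print(reduce_signal([1.0, 2.0, 3.0], "max"))   # 3.0
```

Each reduction is now a small, separately testable function, and an unsupported method name fails loudly instead of falling through to a default branch.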

## Refactoring vs. Code Creep

In industry, developers are explicitly paid to improve software quality, maintain systems, and invest in long-term maintainability. In academia, the situation is starkly different: **your time is limited, your incentives are tied to scientific output, and no one pays you "just to refactor."** Every hour spent restructuring code is an hour not spent analysing results, writing papers, or moving the science forward.

Because of this, it becomes absolutely essential to **draw a clear boundary** between genuine refactoring and the far more dangerous path of code creep.

Academic research code is often written under intense time pressure:

- short project timelines
- grant deadlines
- student turnover
- rapid prototyping for new ideas

This environment encourages quick fixes and experimental hacks. When you decide to refactor, you must therefore strike a careful balance:

- Refactor too cautiously and the code remains a mess.
- Refactor too ambitiously and you break everything or sink weeks into restructuring with no paper to show for it.

The goal is not perfection; the goal is **improving clarity and stability without derailing the research timeline.**

### A Practical Boundary for Researchers

Here’s the academic reality check:

You are refactoring if:

- The behaviour of the code does not change.
- You can state exactly what structural improvement you’re making.
- The work has a clear stopping point.
- The task reduces the future burden on you or your group.

You are doing code creep if:

- You start "just cleaning up" and suddenly find yourself redesigning algorithms.
- You add new functionality while reorganising existing code.
- The scope expands faster than your available research time.
- You cannot articulate when the task will be "done."

In other words:

- Refactoring is controlled, deliberate, and time-boxed.
- Code creep is open-ended, exploratory, and dangerous.
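
The first criterion, unchanged behaviour, is something you can check rather than assume. A rough sketch, assuming you keep the old implementation around until the new one is trusted: run both on the same inputs and compare. The functions `centre_old`, `centre_new`, and `behaves_identically` are invented for this illustration.

```python
import math
import random


def centre_old(values):
    """Stand-in for the existing implementation (kept until the check passes)."""
    total = 0.0
    for v in values:
        total += v
    mean = total / len(values)
    return [v - mean for v in values]


def _mean(values):
    return sum(values) / len(values)


def centre_new(values):
    """Refactored version: same intended behaviour, with the mean factored out."""
    mean = _mean(values)
    return [v - mean for v in values]


def behaves_identically(old, new, cases, tol=1e-12):
    """Return True if both functions agree (within tol) on every test case."""
    for case in cases:
        a, b = old(case), new(case)
        if len(a) != len(b):
            return False
        # A tolerance is used because a different summation order can
        # change the last few bits of a floating-point result.
        if not all(math.isclose(x, y, rel_tol=tol, abs_tol=tol)
                   for x, y in zip(a, b)):
            return False
    return True


rng = random.Random(0)  # fixed seed so the check is reproducible
cases = [[rng.uniform(-10, 10) for _ in range(n)] for n in (1, 5, 50)]
print(behaves_identically(centre_old, centre_new, cases))  # True
```

If this check fails, or if you find yourself wanting it to fail because the new version is "more correct", you have left refactoring and started changing behaviour. That is a separate task and deserves its own justification.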

### Why Time-Boxing Is Essential

In academia, you must protect both your research momentum and your mental energy. A simple rule: set a fixed amount of time for refactoring (e.g., 1–2 hours or a single afternoon), and stop when the time is up. If the task can’t be finished within the time slot, either:

- shrink the scope, or
- break the work into smaller, separately justified refactoring steps.

This avoids the trap of "infinite cleanup," where your good intentions consume days or weeks that were never budgeted for in the first place.

### The Academic Payoff

When refactoring is bounded and intentional, it:

- lowers onboarding time for future students and collaborators
- reduces your own cognitive load when returning to the code later
- cuts down on debugging time
- increases reproducibility and confidence in results
- makes future extensions far less painful

You may not be paid to refactor, but you absolutely pay the price *for not* refactoring.