# Writing new code

## Scenarios

Calling this lecture "Writing new code" is actually a bit of a misnomer. In scientific research, rarely do we truly start from scratch. Even when you have a blank slate, you inevitably build on existing code&mdash;importing libraries, frameworks, or packages that someone else has written, tested, and maintained. This is not only normal, it’s essential: without leveraging existing software, every project would reinvent the wheel, creating bloated, hard-to-maintain codebases. By trusting these underlying tools, you can **focus on the specific scientific problem or workflow that you care about**.

For this lecture, let’s assume you have the opportunity to start something relatively fresh&mdash;whether as part of a new research project, a learning exercise, or a small prototype. That said, the principles discussed in this section are not limited to genuinely new code. **Many of the ideas apply just as well when extending, restructuring, or gradually improving existing or legacy codebases**. In practice, applying these strategies effectively usually requires that you are either the developer of the code or very familiar with its structure and behaviour, as high-level organisation and strategic decisions depend on understanding the system’s existing design.

When “writing new code” in a research context, you are likely to encounter one or more of the following scenarios:

- **Pipeline / workflow development**: You want to build a workflow that orchestrates the functionality of existing libraries. This is arguably the most common reason academics start coding: combining and automating tools to achieve a research goal. The main challenge is structuring code clearly, ensuring reproducibility, and avoiding fragile scripts.

- **Interface / glue code**: You are working at the intersection of two or more independent codebases, essentially creating the glue that allows them to interact seamlessly. This often arises in multi-physics simulations, data analysis pipelines, or when integrating legacy tools with modern frameworks. The focus is on robustness: explicitly defining the inputs, outputs, and assumptions between components, and handling errors thoroughly.

- **High-performance numerical or scientific code**: Sometimes (albeit increasingly rarely), existing libraries are insufficient, and you need to implement new algorithms or simulations. Examples include solving PDEs with custom solvers, molecular dynamics or Monte Carlo simulations, or developing specialised numerical methods for your domain. Here, the priorities are efficiency, numerical accuracy, maintainability, and sometimes parallel performance.

- **Tool / library development for reuse**: You may be writing a utility, library, or domain-specific toolkit intended for reuse by others&mdash;whether your future self, colleagues, or the wider community. Examples include new solvers, data processing pipelines, or general-purpose scientific tools. The focus shifts to modularity, well-defined ways of interacting with the code, testing, and documentation.

- **Exploratory or prototyping code**: Often, you write code simply to explore ideas, test methods, or evaluate potential approaches. This code is usually messy initially but may later evolve into a pipeline, library, or production-quality implementation. Here, the priority is fast iteration and clarity for yourself, with refactoring later if it proves useful.

In all these scenarios, the goal is not to reinvent existing functionality, but to structure your code clearly, efficiently, and reliably&mdash;so that it integrates well with existing tools, can be understood by others, and remains maintainable as your research progresses. Understanding these different contexts helps you choose the right trade-offs between speed, clarity, and long-term maintainability when writing new code in scientific projects.

## From science to code

In this section, I will explore the thought process of mapping a research problem to code, a skill that is distinct from programming itself. **Translating your research problem into something that can be codified is a discipline in its own right**, and as a domain expert you are often better equipped for this translation than for writing the code itself. You may still need to seek input or guidance from professional software developers later on, but the initial translation from science to code is where your expertise shines.

I deliberately began this course with the "Object-oriented programming" lecture, immediately followed by "Interfaces". These concepts, combined with the more familiar functional programming, give you the foundational "Lego bricks" for constructing code. With these building blocks in place, we can now begin mapping a research or science question onto code components. Importantly, this step does not initially require writing any code; it is about conceptualising how you will structure and implement your solution.

1. **Tangible objects**: For example, in planetary science it is common to work with properties of a planet, such as its radius and mass. A planet is a tangible physical object that can be directly mapped to a *Planet* class in your code. Starting with tangible objects makes the structure of your code intuitive and easier to reason about. This approach prioritises thinking from the **end-user perspective**: the API, which fellow physicists interact with, revolves around creating and manipulating physically intuitive objects.

2. **Abstract objects**: In addition to tangible objects, you can define abstractions that are convenient to codify but do not correspond to physical entities. For instance, if you are simulating a chemical reaction network, you might create a *Reaction* class which encapsulates all the data and behaviours needed to model the system computationally. These abstractions make the code modular and reusable, even if they do not represent something “real” in the physical world.

3. **Functions that operate on objects**: It is common to pass objects (or object-like structures, if you are working with a language that does not directly support OOP) as function arguments (see the “Functional programming” lecture). When designing your code, consider which operations involve which objects, and whether it makes sense to implement the operation as a class method or as a standalone function.

4. **Interaction or dependency diagram**: Once you have identified your key objects (both tangible and abstract) and the functions that operate on them, it is helpful to diagram how these components interact. **A well-designed structure minimises unnecessary dependencies and “cross-talk” between components**, making the code more robust and less brittle. The goal is to create a clear map of interactions that helps you reason about the system and guides future refactoring or expansion.
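As a rough illustration of items 1 to 3, here is a minimal Python sketch. The `Planet` and `Reaction` names follow the examples above, but their attributes, the mass-action rate law, and the standalone `escape_velocity` function are illustrative assumptions rather than a recommended design. `surface_gravity` is written as a method because it depends only on the planet itself; `escape_velocity` is written as a free function to show the alternative, and either choice would be defensible here.

```python
import math
from dataclasses import dataclass

G = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2


@dataclass(frozen=True)
class Planet:
    """Tangible object: maps directly onto a physical planet."""
    name: str
    radius: float  # metres
    mass: float  # kilograms

    def surface_gravity(self) -> float:
        """Behaviour intrinsic to the planet, so it lives on the class (m s^-2)."""
        return G * self.mass / self.radius**2


@dataclass
class Reaction:
    """Abstract object: a computational convenience, not a physical thing.

    Hypothetical design: reactants and products map species names to
    stoichiometric coefficients; rate_constant is in arbitrary units.
    """
    reactants: dict[str, int]
    products: dict[str, int]
    rate_constant: float

    def rate(self, concentrations: dict[str, float]) -> float:
        """Mass-action rate law (an assumption made for this sketch)."""
        r = self.rate_constant
        for species, coeff in self.reactants.items():
            r *= concentrations[species] ** coeff
        return r


def escape_velocity(planet: Planet) -> float:
    """Standalone function operating on an object (m s^-1)."""
    return math.sqrt(2 * G * planet.mass / planet.radius)
```

With `earth = Planet("Earth", radius=6.371e6, mass=5.972e24)`, a user works only with physically meaningful objects: `earth.surface_gravity()` gives roughly 9.8 m/s² and `escape_velocity(earth)` roughly 11.2 km/s.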

At this point, ask yourself: *Will this code outline let me accomplish what I set out to do?* If the answer is yes (as it often will be), you can begin constructing a minimum working example as described in Phase 1 of the "Guiding principles" section. If not, iterate on the overall structure of the code, and on the reasoning that links it to the physical problem, until you are reasonably confident that it can achieve the desired aims.

## A common design tension

When writing new code, software developers can fall into the trap that the "You Aren't Gonna Need It" (YAGNI) principle warns against: over-engineering solutions in anticipation of future requirements that may never materialise. Academic researchers, by contrast, often err in the opposite direction, giving **too little thought to what will be needed in the near term**, for example for the next publication or even during the review stage of the current one. This creates a real tension between two extremes:

- A quick, bespoke solution that satisfies only the immediate requirements.
- An over-investment in generalisation, trying to predict the future evolution of the code.

In practice, this tension often surfaces as one of two self-defeating beliefs:

1. “I shouldn’t refactor because I need to get results.”
1. “I shouldn’t add features because I’m still cleaning the code.”

Both slow progress and increase technical debt. The phased approach below is designed to help you move deliberately between making the code work, making it easy to change, and making it stable, so that you can develop features safely while keeping the code maintainable. The aim is to write code that **meets current needs clearly and efficiently** while remaining flexible enough to accommodate the likely next steps in your research, bearing in mind that those next steps may involve new team members who were not part of the earlier stages of development.

## Guiding principles

With this tension in mind, the following guiding principles help you write code that is neither prematurely over-engineered nor short-sightedly fragile. Before writing substantial code, invest time in understanding *exactly what you want to accomplish*. This step is not always easy, but it is precisely where humans retain a key advantage over AI: we can reason about research context, nuance, and trade-offs in ways that current AI tools often cannot. Clarifying the goal, constraints, and desired behaviour prevents wasted effort and unnecessary complexity.

### Phase 1

Several principles are worth considering from the very beginning, because they are easy to apply early and pay off quickly as the project grows:

1. **Minimum working example (MWE)**: There is real value in building a minimal working example that demonstrates what you want to achieve. First, convince yourself that the overall structure of the task is feasible&mdash;and scientifically useful&mdash;before investing significant time refining individual components. This aligns naturally with the *KISS* principle: *Keep it simple, stupid!*

1. **Anchor your MWE**: Once you have an MWE, anchor it with one or more tests. This is important not only for your own continued development, but also for future users of the code, who need confidence that its behaviour can be **demonstrably validated**. Even simple tests provide a crucial safety net.

1. **Start refactoring early**: With the MWE in place, begin refactoring. Apply the techniques from the "Refactoring" lecture, paying particular attention to the *DRY* principle (*Do Not Repeat Yourself*). Early refactoring prevents shortcuts from solidifying into long-term technical debt.
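The three steps might look like this in a small Python sketch, assuming a hypothetical task of computing surface gravity for a couple of bodies (the function names, the `BODIES` dictionary, and the test tolerance are all illustrative). The MWE is deliberately repetitive; the test anchors it to a known value, Earth's surface gravity of roughly 9.8 m/s²; and the refactored version keeps the formula and the data in one place each, while a second test confirms that its behaviour is unchanged.

```python
import math

G = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2


# Step 1, the MWE: quick and repetitive, but enough to show the idea works.
def mwe_gravities() -> dict[str, float]:
    earth = G * 5.972e24 / 6.371e6**2
    mars = G * 6.417e23 / 3.390e6**2
    return {"Earth": earth, "Mars": mars}


# Step 2, anchor the MWE: Earth's surface gravity should be close to 9.81 m/s^2.
def check_earth_gravity(results: dict[str, float]) -> None:
    assert abs(results["Earth"] - 9.81) < 0.05, results["Earth"]


def test_mwe() -> None:
    check_earth_gravity(mwe_gravities())


# Step 3, refactor (DRY): the formula and the data each live in one place.
BODIES = {"Earth": (5.972e24, 6.371e6), "Mars": (6.417e23, 3.390e6)}  # (kg, m)


def surface_gravity(mass: float, radius: float) -> float:
    return G * mass / radius**2


def compute_gravities(
    bodies: dict[str, tuple[float, float]] = BODIES,
) -> dict[str, float]:
    return {name: surface_gravity(m, r) for name, (m, r) in bodies.items()}


def test_refactor_preserves_behaviour() -> None:
    new = compute_gravities()
    check_earth_gravity(new)
    # Refactoring must not change behaviour: results agree with the MWE.
    for name, g in mwe_gravities().items():
        assert math.isclose(g, new[name])
```

The test functions follow pytest's `test_` naming convention, so `pytest` would collect them, but they are plain functions and can also be called directly. Adding a new body now means adding one entry to `BODIES` rather than copying the formula again.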

The objective of Phase 1 is a clean, minimal codebase that performs a useful task, whose behaviour can be verified through tests or examples, and which is structured clearly enough to support future development.

### Phase 2

Phase 1 puts you in a strong position to move forward in several possible directions, often in parallel, with different team members taking on different strands. Depending on your research goals, you might:

1. **Use the MWE directly for scientific work**: In some cases, the minimal working example is already sufficient for a concrete research application and can be used as-is.

1. **Extend the code to support new features or objectives**: As research questions evolve, you will often need to add functionality or adapt the code. The key principle here is to make hard changes easy before making them. If a modification feels difficult, that is usually a signal that the surrounding code is not structured well for change.

    Instead of forcing new behaviour into an awkward design, first perform a *pure refactor*: improve structure, clarify responsibilities, and simplify interfaces without changing behaviour. Once the code is clean and flexible, the change you originally wanted to make will often become straightforward.

    This two-step process&mdash;refactor first, change second&mdash;reduces the risk of introducing bugs, limits technical debt, and keeps your codebase healthy as it evolves over time.
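A small Python sketch of "refactor first, change second", using a hypothetical orbital-period function that was originally written only for Earth. The pure refactor exposes the central body as a parameter whose default reproduces the old behaviour exactly (the exact equality check is intentional: only the structure should have changed); only then is the new feature, support for Mars, added. The function names and the `(mass, radius)` tuple representation are assumptions made for brevity.

```python
import math

G = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2
EARTH = (5.972e24, 6.371e6)  # (mass in kg, radius in m)
MARS = (6.417e23, 3.390e6)


# Before: the central body is hard-coded, so "support Mars" would mean
# copying the function or threading special cases through it.
def orbital_period_v0(altitude):
    r = 6.371e6 + altitude
    return 2 * math.pi * math.sqrt(r**3 / (G * 5.972e24))


# Step 1, pure refactor: expose the body as a parameter whose default
# reproduces the old behaviour. Existing callers are unaffected.
def orbital_period(altitude, body=EARTH):
    mass, radius = body
    r = radius + altitude
    return 2 * math.pi * math.sqrt(r**3 / (G * mass))


# Same inputs, same arithmetic: the results should be bit-for-bit identical.
assert orbital_period(400e3) == orbital_period_v0(400e3)

# Step 2, the actual change: supporting Mars is now a one-line call.
mars_period = orbital_period(400e3, body=MARS)
```

For a 400 km orbit this gives roughly 92 minutes around Earth and just under two hours around Mars. The feature itself required no edits to `orbital_period`, because the refactor had already made the change easy.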

The objective of Phase 2 is a codebase that can evolve alongside the research: one in which new scientific features are actively developed, and where refactoring is used deliberately to make those changes safe and straightforward rather than risky and ad hoc.

### Phase 3

Once the code has matured beyond exploratory use and is actively supporting research, the focus shifts from change to stability and sustainability. At this stage, the goal is to ensure that the code can be reliably used, extended, or handed over to others without excessive effort.

Typical activities in this phase include:

1. **Strengthen tests and validation**: Expand test coverage to include edge cases, failure modes, and representative scientific use cases. Tests now serve not only as a safety net for development, but as executable documentation of intended behaviour.

1. **Clarify interfaces and assumptions**: Make inputs, outputs, and expectations explicit&mdash;through function signatures, docstrings, or lightweight documentation. This is especially important when new team members or external users are involved.

1. **Improve reproducibility and usability**: This may include pinning dependencies, adding example scripts or notebooks, improving error messages, or standardising configuration. Small improvements here can dramatically reduce onboarding time.

1. **Prepare for handover or publication**: Whether the code will be shared with collaborators, released alongside a paper, or passed on to a new student, Phase 3 ensures that knowledge is encoded in the codebase itself rather than residing only in the original author’s head.
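As one possible illustration of the first two activities, the Python sketch below makes a function's units, assumptions, and failure behaviour explicit, and pairs it with tests covering a representative case and several invalid inputs. The docstring follows the NumPy docstring convention, which is one common choice among several; the specific validation rules are assumptions for this example.

```python
import math

G = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2


def surface_gravity(mass: float, radius: float) -> float:
    """Return the surface gravitational acceleration of a spherical body.

    Parameters
    ----------
    mass : float
        Total mass in kilograms. Must be positive and finite.
    radius : float
        Mean radius in metres. Must be positive and finite.

    Returns
    -------
    float
        Acceleration in m s^-2, assuming a spherically symmetric,
        non-rotating body (rotation and oblateness are ignored).

    Raises
    ------
    ValueError
        If either argument is non-positive or not finite.
    """
    for name, value in (("mass", mass), ("radius", radius)):
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be positive and finite, got {value!r}")
    return G * mass / radius**2


# Tests double as executable documentation of intended behaviour.
def test_representative_case() -> None:
    assert math.isclose(surface_gravity(5.972e24, 6.371e6), 9.82, rel_tol=1e-3)


def test_rejects_nonphysical_input() -> None:
    for bad in (0.0, -1.0, float("nan"), float("inf")):
        try:
            surface_gravity(5.972e24, bad)
        except ValueError:
            pass  # expected: invalid input is rejected with a clear message
        else:
            raise AssertionError(f"accepted invalid radius {bad!r}")
```

A new team member can learn the function's contract from the signature, the docstring, and the tests alone, without reading the implementation or asking the original author.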

The objective of Phase 3 is a codebase that is reliable, understandable, and resilient to personnel changes&mdash;a common reality in academic research.

### At a glance

The three phases serve distinct purposes:

- **Phase 1 (make it work)**: establish correctness, testing, and basic structure.
- **Phase 2 (make it easy to change)**: actively develop new features and adapt the code, using refactoring to make changes safe and straightforward.
- **Phase 3 (make it stable and shareable)**: prepare the code for reuse, handover, and publication.

## API and library layer

We have already discussed the concepts of modularisation and encapsulation. Broadly speaking, these principles manifest in two main components of a typical codebase:

> *Definition*: An **Application Programming Interface (API)** is the entry point to your application. It is the part users interact with to create objects, call functions, and receive or process output.

> *Definition*: The **library layer** is the back-end of your application, containing the foundational code that makes everything run. Users typically do *not* interact with this layer directly&mdash;it represents the "hidden" details of how your application works.

Hence, as soon as you begin early refactoring in Phase 1, you can start (re)organising your code into these two parts: the **API**, which exposes functionality to the user, and the **library**, which remains hidden and supports the core logic. This kind of organisation goes a long way towards helping any professional software developer you involve to quickly understand your code and start working their magic.