# 2. Good coding practices

```{admonition} Additional resources
:class: warning
A collection of links that appear in this lecture:
- [book on git](https://git-scm.com/book/en/v2/Getting-Started-The-Command-Line)
- [GitHub documentation](https://docs.github.com/en)
```

## 2.1 Version control (with GitHub)

Version control is the practice of tracking and managing changes to (source) code in software engineering.
Git is a version control system that automates this process, and GitHub is one of the hosting platforms built around Git. Git allows you to roll back to previous versions of your code, make experimental changes on new branches, merge fixes, and work collaboratively on the same codebase.

The only place where you can run all Git commands is the command line interface, provided by the `git` program. All commands start with `git`, and you can list them with short descriptions using the `man` (for "manual") command in your terminal:

```bash
man git
```

This [book on git](https://git-scm.com/book/en/v2/Getting-Started-The-Command-Line) serves as a complete reference for the system and is an excellent source of information with many examples.

```{admonition} Exercise
Create an account on GitHub. 
It is strongly recommended that you use the command line interface, so if you're using Windows, please make sure you have access to a Unix-like terminal, e.g. via the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about), [Cygwin](https://www.cygwin.com), the Windows Terminal, or Git Bash.
Then share your username with the instructor so you can be added to our GitHub classroom.
```

### 2.1.1 Repositories, cloning and forking

On GitHub, your code is organized in structures called _repositories_, which you can think of as separate projects. Within a repository, your code may be organized in a directory structure. It is stored "in the cloud", meaning on some server somewhere else.

You can have several copies of the code that lives on GitHub, e.g. on your personal laptop, on your workstation at the university, etc. We'll call these copies _local_ copies. To get a local copy, you _clone_ a repository: either the original repository or your own _fork_ of it.

**Forking** creates a copy of the repository as it is at that moment and associates the copy with your GitHub account. You're then free to develop it however you like: it is your own copy, and nobody else can write to it unless you give them access. You would typically fork a repository if you
- have no write access to it (e.g. it's a large project like SciPy) but want to contribute, or
- want complete control over your copy of the repository, without other developers being able to change it.

When **cloning** a repository, no copy is created in your account, only on your local machine. The remote (the repository you cloned) remains visible to and editable by everyone who had access to it. You'd do this for
- compiling software from source,
- working on projects owned by yourself,
- working collaboratively on a project with multiple active developers.

```{admonition} Exercise
If you haven't contributed to a repository on GitHub (a "remote") yet, you'll need to set up SSH (the Secure Shell protocol) to communicate with your remote repository. Follow the instructions [here](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) to generate a new private-public key pair and the instructions [here](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) to add the public key to your GitHub account. You'll need to do this on every new machine from which you want to access your account. GitHub stopped accepting account passwords for Git operations in 2021, so key-based (or token-based) authentication is required.
```
To clone a remote from the command line, run

```bash
# General form (replace the placeholders in angle brackets):
git clone git@github.com:<username>/<repository name>
# Concrete example:
git clone git@github.com:cu-comptools/gh-tutorial
```

Git's command line interface has no "fork" command. To create a fork, navigate to the repository's page on GitHub, click the Fork button, and then clone your fork via

```bash
git clone git@github.com:<YOUR username>/<repository name>
```

You can also create a new repository from scratch on your local machine via

```bash
mkdir <repository name>
cd <repository name> 
git init
```

These three commands create a directory with your chosen repository name, change into that directory, and then initialize a Git repository there. Alternatively, the GitHub web interface walks you through creating a repository, which you can then clone to your local machine.


### 2.1.2 Making progress: add, commit, push

Once a local copy is created, Git tracks changes to the files within a repository via _commits_. Each commit records an incremental change to the source code.

All files within your repository start off being untracked by Git. You can add them to the version control system like this:

```bash
git add <files> 
```

and then commit your changes:

```bash
git commit -m "<Useful commit message>"
```

Note that if you leave out the `-m` option, Git will automatically open your default text editor for you to write a message.

```{admonition} Warning
:class: warning
Avoid storing binary files on GitHub: even small changes to them typically result in large commits, and you may run out of your storage/bandwidth quota.
```

It is good practice to write an **informative** commit message so that future you can understand what changes were made in a commit. Commits exist only in your local copy until you upload them to the remote with `git push`.

### 2.1.3 Branching and merging

### 2.1.4 Pull requests

## 2.2 Documentation

Documentation is essential for people (including yourself) to be able to use your code and reproduce your results.
There are roughly three levels of it:

1. In-line comments
2. Examples
3. API-level documentation (API stands for application programming interface)

In-line comments may be enough for you to understand code that you wrote a few months back, but they are insufficient in general. A much better practice is to write a _docstring_, a short description for each function, which has the following general structure (regardless of programming language, though shown in Python):

- A short description of the function,
- Input parameters with types and a brief description (e.g. any restrictions, default values, physical/mathematical meaning),
- Output values, with types and brief descriptions as above

```python
def square(a):
    """
    Computes the square of a number via a*a.

    Parameters
    ----------
    a : float or complex
        Number to square. Note that for complex numbers, it doesn't conjugate.

    Returns
    -------
    b : float or complex
        a*a.
    """
    b = a*a
    return b
```

If you write docstrings in the above format (which is called the NumPy format), or in a different standardized format such as the Google format, they can be converted to API documentation automatically by tools such as Sphinx. For example, [this C++/Python software package](https://oscode.readthedocs.io/en/latest/) has API documentation that was auto-generated from docstrings. Docstrings may contain math formulae or more complicated examples; see e.g. the SciPy documentation.
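Automatic conversion is possible because Python stores a docstring on the function object itself, so tools can read it back at runtime. The following is a minimal sketch of that idea using only the standard library's `inspect` module; it reuses the `square` function from above and is not meant to show how any particular documentation generator works internally.

```python
import inspect


def square(a):
    """
    Computes the square of a number via a*a.

    Parameters
    ----------
    a : float or complex
        Number to square. Note that for complex numbers, it doesn't conjugate.

    Returns
    -------
    b : float or complex
        a*a.
    """
    return a * a


# inspect.getdoc returns the docstring with its common indentation
# and the leading/trailing blank lines removed.
doc = inspect.getdoc(square)

# The first line of the cleaned docstring is the short description.
summary = doc.splitlines()[0]
print(summary)
```

In an interactive session, `help(square)` displays the same text, which is why a well-written docstring is useful even before any API documentation is built.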

## 2.3 Testing

Debugging code becomes far quicker when it is written in a modular way (i.e. is broken down into functions that each do one specific task), and the modules have separate tests. This allows you to narrow down _where_ the bug is coming from. If you have version control, you can also track _when_ a bug was introduced. Roughly speaking, there are two types of tests:

- unit tests (test the functionality of a small function that only does one thing),
- integration tests (these test whether functions work together as expected),

both of which are necessary to ensure that your code is working correctly.
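The difference between the two kinds of test can be illustrated with a small sketch. Here `square` is the function from the documentation section, while `hypotenuse` and the two test functions are hypothetical names introduced only for this example. The tests are plain functions containing `assert` statements; a test runner such as pytest would collect functions named `test_*` by default, but the sketch also runs on its own.

```python
import math


def square(a):
    """Computes the square of a number via a*a."""
    return a * a


def hypotenuse(x, y):
    """Hypothetical helper: length of the hypotenuse, built on square()."""
    return math.sqrt(square(x) + square(y))


def test_square():
    # Unit test: checks one small function in isolation.
    assert square(3) == 9
    assert square(-2.0) == 4.0
    assert square(1j) == -1  # no complex conjugation is applied


def test_hypotenuse():
    # Integration test: checks that square() and math.sqrt work together.
    assert math.isclose(hypotenuse(3, 4), 5.0)


if __name__ == "__main__":
    test_square()
    test_hypotenuse()
    print("all tests passed")
```

If `test_hypotenuse` fails while `test_square` passes, the bug is most likely in how the pieces are combined rather than in `square` itself, which is the narrowing-down described above.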

```{admonition} Exercises
:class: danger
For exercises on writing docstrings and unit tests, please see the [Git and GitHub tutorial repository](https://github.com/cu-comptools/git-tutorial-2025).
```