# Workshop 4: cartopy and best practices

# Part II: Best Practices

Here I dump my entire accumulated wisdom upon you, not so much hoping that you know it all by the end, but that you know of the concepts and know what to search for. I realize that many lessons will be learned the hard way.

## 1. Technical tips

### jupyter lab tips

| shortcut | effect |
|--- |--- |
| `Option/Alt` + drag | multi-line selection |
| `Ctrl/Cmd` + `X`/`C`/`V` | cut/copy/paste lines|
| `Ctrl/Cmd` + `/`| comment out line |

### Linux and SSH
The science faculty has its own cluster: `gemini.science.uu.nl`. There you can perform heavier, longer computations.
You can log in to the cluster by typing
```
ssh (your_solis_id)@gemini.science.uu.nl
```
you are then prompted for your password.

Note that an `ssh` connection is not very stable: it can drop with a `broken pipe` error. All active commands are interrupted if your connection is severed. One way around this is to use the `screen` functionality, which keeps your session running on the cluster even if the connection is lost.

Many advanced text editors let you work remotely. For example, Visual Studio Code has a `Remote-SSH` extension that lets you work on the remote machine as if it were local.

Some useful commands when navigating the terminal and submitting jobs on gemini:

| command              | effect |
| ---                  | --- |
| `pwd`                | print working directory: where are you in the file system? |
| `cd`                 | change directory; without argument: go to home directory; `cd ..` goes one level up |
| `ls`                 | list content of folder |
| `cp`/`mv`            | copy/move file |
| `rm`                 | remove file (CAUTION: this is permanent) |
| `grep (word) (file)` | search for `word` in `file` |
| `touch (file)`       | create empty `file` |
| `top`                | monitor processes and system resources |
| `Up(arrow)`          | previous command |
| `Ctrl + C`           | cancel |
| `Ctrl + R`           | search command history |
| `qsub (your_job.sh)` | submits `your_job.sh` bash script to queue* |
| `qstat`              | check on your jobs in the batch queue |
| `qdel (job-id)`      | deletes job with id `job-id` in queue |

\* There are 48 job slots (12 nodes with 4 cores each) with 4 GB of memory per core.

### Project organization
Once your projects contain more than a few results (e.g. for SOAC and your thesis), it is worthwhile organizing. A common structure has proven useful for most cases:
```
| project_name
  README            (markdown or simple text file describing your project and its structure)
  | data            (when working with big data this may be external to the project folder)
    | raw_data      (never touch these files)  
    | processed     (derived files)
  | doc             (includes `requirements.txt` with python environment description)
  | src             (all your [well-documented] code: .py, .ipynb, ... files)
  | results         (all figures, maybe in subfolders)
```
A well-organized project helps your current and future self as well as anyone else looking at the results. It is thus an important step towards reproducibility.
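The layout above can be created in one go. The following is a minimal sketch using only the Python standard library; the function name `create_project`, the project name `my_thesis` and the README text are illustrative choices, not part of any existing tool.

```python
from pathlib import Path
import tempfile


def create_project(root):
    """Create the folder layout sketched above below `root` (hypothetical helper)."""
    root = Path(root)
    subfolders = ['data/raw_data', 'data/processed', 'doc', 'src', 'results']
    for sub in subfolders:
        # parents=True also creates `root` and `data`; exist_ok=True makes reruns safe
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / 'README'
    if not readme.exists():  # never overwrite an existing description
        readme.write_text(f'# {root.name}\n\nDescribe the project and its structure here.\n')
    return root


# demonstration in a temporary directory, so nothing is written to your real folders
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp) / 'my_thesis')
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob('*'))
    print(created)
```

For a real project you would call `create_project` once with the path where the project should live.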

### colormaps
Be conscious about the colors and colormaps you use! Colors can hide or emphasize data, which can be used to improve your presentation. Read this [short blog post](https://jakevdp.github.io/blog/2014/10/16/how-bad-is-your-colormap/) and learn why the `jet` colormap is bad.

A key consideration for the accessibility of your results is color blindness, which affects a considerable number of people. See [ColorBrewer](https://colorbrewer2.org) for colors that work well together.

The `cmocean` package adds some [beautiful, well-designed colormaps for oceanography](https://matplotlib.org/cmocean/) to the standard matplotlib colormaps.

## 2. Fundamental programming guidelines

### Understanding Python error messages

Understanding Python errors can be daunting at first, especially if they are very long. But don't despair: after some practice you become better at interpreting them and will find them helpful in pinning down the problem. The general idea is that Python shows you the steps from the line where you called the offending code all the way down to the line in the file that raised the error. Often, the most important part of the error message is located at the end.
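To practise reading such a message, here is a small made-up example (the function names `mean` and `analyse` are purely illustrative). The standard-library `traceback` module is used only to capture the message as text so that its parts can be inspected.

```python
import traceback


def mean(values):
    """Average of a list; fails for an empty list."""
    return sum(values) / len(values)


def analyse(data):
    """Calls mean(), which is where the error is actually raised."""
    return mean(data)


try:
    analyse([])              # the call that triggers the problem
except ZeroDivisionError:
    lines = traceback.format_exc().strip().splitlines()
    print('\n'.join(lines))  # the full traceback, as Python would show it
    last_line = lines[-1]    # the error type and message sit at the very end
```

Read the output from the bottom up: the last line names the error, and the lines above it trace the chain of calls from `analyse([])` down to the division inside `mean` that failed.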

### DRY: Don't repeat yourself
It is almost always a sign of bad programming if you have to repeat a line several times. It clutters the code and makes the code harder to maintain.
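As an illustration, compare a copy-pasted calculation with a version in which the calculation is written only once. The station names and temperature values are invented for this sketch.

```python
# made-up temperatures in degrees Celsius for three hypothetical stations
temperatures = {
    'station_a': [3.1, 3.3, 6.2],
    'station_b': [3.4, 3.6, 6.5],
    'station_c': [4.0, 3.9, 6.0],
}

# repetitive version: one copy-pasted line per station
mean_a = sum(temperatures['station_a']) / len(temperatures['station_a'])
mean_b = sum(temperatures['station_b']) / len(temperatures['station_b'])
mean_c = sum(temperatures['station_c']) / len(temperatures['station_c'])


# DRY version: the calculation is written once and applied to every station
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


station_means = {name: mean(values) for name, values in temperatures.items()}
print(station_means)
```

If the calculation ever changes (say, to a median), the DRY version needs one edit instead of three, and adding a fourth station requires no new code at all.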

### simplify code
Instead of writing one huge function, __break your functions down into logical component functions__. This will save you many headaches when hunting for bugs.
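A small sketch of what this can look like, assuming a made-up task of reading `name, value` text lines and converting temperatures; all function names are hypothetical.

```python
def parse_line(line):
    """Split one 'name, value' text line into a name and a float."""
    name, value = line.split(',')
    return name.strip(), float(value)


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return temp_c + 273.15


def process(lines):
    """Combine the small steps: parse every line, then convert the value."""
    result = {}
    for line in lines:
        name, temp_c = parse_line(line)
        result[name] = celsius_to_kelvin(temp_c)
    return result


# each piece can be checked on its own before the pieces are combined
print(parse_line('station_a, 10.0'))
print(celsius_to_kelvin(0.0))
print(process(['station_a, 10.0', 'station_b, -5.0']))
```

When the final result looks wrong, you can test `parse_line` and `celsius_to_kelvin` separately and quickly narrow down which step is at fault.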

### coding style
Python is very forgiving towards your code writing style. Just because it runs without errors does not mean it is well written, though.
How to write good, readable python code is laid out in the __[PEP8 Style Guide for Python Code](https://pep8.org/)__. Read it and try to adhere to it.
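To give a flavor of what the style guide is about, here are two versions of the same (made-up) function. Both run and give the same result; only the second follows a few of the PEP8 recommendations.

```python
import math

# runs fine, but is hard to read: cryptic names, no spaces, two statements on one line
def f(R,H):V=math.pi*R**2*H;return V


# same calculation with descriptive lowercase names, spaces around operators,
# one statement per line, and a docstring
def cylinder_volume(radius, height):
    """Return the volume of a cylinder."""
    return math.pi * radius**2 * height


print(f(1, 2) == cylinder_volume(1, 2))
```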

### reuse code
Once you have iterated to stable code (and you want to share it across jupyter notebooks), you should put it in a separate `.py` file. You can __import functions from `.py` files__ simply as `from file import function`.
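The following sketch shows the mechanism end to end. So that it is self-contained, it first writes a tiny module to a temporary folder; the file name `my_tools.py` and the function in it are hypothetical. In a real project the `.py` file would simply sit in your project, for example in `src`.

```python
import sys
import tempfile
from pathlib import Path

# for this demonstration only: create a small module file in a temporary folder
module_dir = tempfile.mkdtemp()
Path(module_dir, 'my_tools.py').write_text(
    'def celsius_to_kelvin(temp_c):\n'
    '    """Convert degrees Celsius to Kelvin."""\n'
    '    return temp_c + 273.15\n'
)

# Python looks for modules in the folders listed in sys.path;
# the folder of your script or notebook is normally already on that list
sys.path.insert(0, module_dir)

from my_tools import celsius_to_kelvin

print(celsius_to_kelvin(0.0))
```

Note that a module is only read once per session: after editing the `.py` file you need to restart the kernel (or reload the module with `importlib.reload`) before the changes take effect.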

### Defensive programming

Defensive programming is a programming philosophy that tries to guard against errors and minimize the time spent on solving bugs. The fundamental idea is that of __unit testing__: you break the code into small steps (functions) and then test whether they give the expected (known) results for simple test cases. For each function, one would write a test with known input and output. Running these tests can be automated; doing so on every change to the code is part of what is known as "continuous integration (CI)" (integrated in GitHub, for example).
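As a minimal sketch of what a unit test can look like, here is a small circumference function tested with Python's built-in `unittest` module. The test cases are chosen for illustration; other testing tools exist and work along the same lines.

```python
import math
import unittest


def calc_circumference(radius):
    """Circumference of a circle with the given radius."""
    return 2 * math.pi * radius


class TestCircumference(unittest.TestCase):
    """Each test compares the function against a result we know by hand."""

    def test_unit_circle(self):
        self.assertAlmostEqual(calc_circumference(1), 2 * math.pi)

    def test_zero_radius(self):
        self.assertEqual(calc_circumference(0), 0)

    def test_doubling_radius_doubles_circumference(self):
        self.assertAlmostEqual(calc_circumference(2), 2 * calc_circumference(1))


# run the tests from within a script or notebook
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCircumference)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

If you later change `calc_circumference`, rerunning the tests tells you immediately whether the known cases still come out right.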

This approach works well for traditional software development with fixed goals, but it is not always suited to scientific programming as the goals shift with new knowledge.

However, the principle of defensive programming is still very valuable. A __simple and easy-to-implement version of this defensive philosophy__ is to use the `assert` statement often (this is not exactly unit testing). It checks whether a statement is true and, if not, raises an error with a message of your choice.

```python
import numpy as np

def calc_circumference(radius):
    """ simple example function to calculate circle circumference """
    return 2*np.pi*radius

def calc_circumference2(radius):
    """ simple example function to calculate circle circumference """
    assert type(radius) in [float, int], 'radius must be a number'
    return 2*np.pi*radius
```

```python
# this works as expected
calc_circumference(1)
```

```
6.283185307179586
```

```python
# this does not work and Python tells us why with its own error message
calc_circumference('hello')
```

```
TypeError: can't multiply sequence by non-int of type 'float'
```

```python
# this does not work and our message tells us why
calc_circumference2('hello')
```

```
AssertionError: radius must be a number
```
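The type check above only accepts exactly `float` or `int`. A possible variation, sketched here under the hypothetical name `calc_circumference3`, uses `isinstance` with the standard-library `numbers.Real` class and additionally checks that the value makes physical sense.

```python
import math
from numbers import Real


def calc_circumference3(radius):
    """Circle circumference with checks on both type and value."""
    # isinstance also accepts subclasses and other real-number types
    assert isinstance(radius, Real), 'radius must be a real number'
    assert radius >= 0, 'radius must not be negative'
    return 2 * math.pi * radius


print(calc_circumference3(1.5))

for bad_input in ['hello', -1]:
    try:
        calc_circumference3(bad_input)
    except AssertionError as err:
        print(f'{bad_input!r} rejected: {err}')
```

Keep in mind that `assert` statements are skipped when Python is run with the `-O` option, so they are a development aid rather than a replacement for proper input validation.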

### Back-up

__Always back up your code and data!__ There is nothing more frustrating than having to rewrite code after you dropped your laptop or something crashed. Cloud services like _[SURFdrive](https://surfdrive.surf.nl)_ or Dropbox/OneDrive make this very easy. The advantage here is that it is __automated__ and does not rely on you remembering to back up.

### Version control
Do you know this?

![](figures/phd101212s.png)

There is a better way: version control.

Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.

![](https://osulp.github.io/git-beginner/fig/play-changes.svg)

Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document - these changes can be organized into separate “branches”, or groupings of work that can be shared.

![](https://osulp.github.io/git-beginner/fig/versions.svg)

Unless there are conflicts, you can even incorporate two sets of changes into the same base document; the changes are then “merged”.

![](https://osulp.github.io/git-beginner/fig/merge.svg)

__Key points:__
- Version control is like an unlimited ‘undo’.
- Version control also allows many people to work in parallel.
- Version control works well for human-readable files (e.g. .py, .txt, .tex), but not for binary files (e.g. .docx, .png, ...), because it compares files line by line.
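The line-by-line comparison mentioned in the last point can be imitated with Python's standard-library `difflib` module. This only illustrates the idea of describing a change as a set of removed and added lines; it is not how `git` works internally, and the two file versions are made up.

```python
import difflib

# two made-up versions of the same small script
version_1 = [
    'import numpy as np\n',
    'radius = 1\n',
    'print(2*np.pi*radius)\n',
]
version_2 = [
    'import numpy as np\n',
    'radius = 2\n',
    'print(2*np.pi*radius)\n',
]

# unified_diff marks removed lines with '-' and added lines with '+'
diff = list(difflib.unified_diff(version_1, version_2,
                                 fromfile='script.py (old)',
                                 tofile='script.py (new)'))
print(''.join(diff))
```

Only the changed line shows up as a removal plus an addition. For a binary file there are no meaningful lines to compare, which is why such a description of changes is of little use there.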

`git` is one implementation of a distributed version control system. GitHub is a company that lets you host repositories (version-controlled folders) online. Everyone can create a free GitHub account, and as a student you can create a free Pro account.

The use of `git` and GitHub requires its own tutorial, as there is a small up-front learning cost before you benefit from it. Much information in this section was taken from the [Software Carpentry tutorial](https://osulp.github.io/git-beginner/), which I recommend.

## 3. Open science, open access, reproducibility

### Open Science
From the [Open Science Wikipedia article](https://en.wikipedia.org/wiki/Open_science): 
> Open science is the movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of an inquiring society, amateur or professional.

The fundamental idea is that others should be able to see all your steps for arriving at certain conclusions. In our context this specifically refers to making the (documented!) code available.

At Utrecht University, there is a society dedicated to Open Science: [Open Science Community Utrecht](https://openscience-utrecht.com/). 

### Open access
The traditional publishing business model has been to charge readers for access to articles. This is less than ideal, as the public pays for the research and the results are behind a paywall. This hinders knowledge transfer and there is a growing movement to open access (i.e. make it free) to scientific knowledge.

Publishing open access usually costs money, as the publishers cannot earn money by selling the articles/journals. __The Dutch Universities have agreements with all major publishers to cover open access fees.__ Use this!

### Licenses
If you want to reuse any online content you must check for the license. If there is no license, you are legally not allowed to use it. This is why it's important that you include a license with your code if you want others to reuse it.

### reproducible code
When you publish your results (as a thesis or paper) you must __ensure that your code can reproduce all the results__. In jupyter notebooks you should check whether the notebook runs completely from start to finish __without error__. The code must be documented. Ultimately, the clean version of __your code (and if possible raw data) should be uploaded to a permanent repository__ such as [UU's own Yoda system](https://www.uu.nl/en/research/yoda), [Zenodo](https://zenodo.org/), or [figshare](https://figshare.com/). It can then receive a __digital object identifier (DOI)__, which should be cited in your paper and __can be cited by others__.


### virtual environments
Virtual environments are custom python environments, with specific packages installed. So far you have likely used the `root` environment of your conda installation. This is fine for your course work. With conda you can easily create new environments in the GUI or the command line as such:
```
conda create -n my_new_env python
```
where `my_new_env` is the name of the environment. In the command line you would activate this environment with `conda activate my_new_env`. You will then see that name in parentheses in front of your prompt.

In general, it is advised to create a new environment for every major project (like a thesis or a particular paper). This ensures that you know which packages + versions you used to do your calculations. You can then __export a list of all the packages used__ at the end of your project and save it with the rest of the code. Only this __ensures reproducibility__.
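From the command line, conda offers `conda env export` for this purpose. As a complementary sketch, the package list can also be recorded from within Python itself using the standard-library `importlib.metadata` module (Python 3.8 or newer). The function name `describe_environment` is a hypothetical choice.

```python
import sys
from importlib import metadata


def describe_environment():
    """Return text lines with the Python version and all installed packages."""
    lines = [f'# python {sys.version.split()[0]}']
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get('Name')
        if name:  # skip entries with broken metadata
            packages.add(f'{name}=={dist.version}')
    # sort case-insensitively so the list is easy to scan and to compare
    return lines + sorted(packages, key=str.lower)


environment = describe_environment()
print('\n'.join(environment))
```

Writing these lines to a text file that is stored next to your code gives you a record of the exact versions in use at the time the results were produced.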

You can view a list of your environments directly in the Anaconda GUI or by typing `conda info --envs`.