# Schedule data imports

This guide explains how to set up automated, scheduled imports of climate data into DHIS2. We demonstrate how to do this using the [Import ERA5-Land Daily](https://climate-tools.dhis2.org/workflows/import-era5/import-era5-daily/) workflow, showing how to move from interactive notebook exploration to production-ready scheduled imports. The same approach can be used to set up scheduled runs of any other workflow or script.

To run the notebook from the command line, we are going to use [papermill](https://papermill.readthedocs.io/), a tool for parameterizing and executing notebooks.

## Prerequisites

Before starting, ensure you have:

- Completed the [CDS API Authentication](../getting-data/climate-data-store/api-authentication.md) setup
- A DHIS2 instance with a configured data element for daily ERA5-Land temperature (see [Prepare Metadata](./prepare-metadata.ipynb))
- An installation of [Docker Desktop](https://www.docker.com/)
    - Also make sure Docker Desktop is running while you follow this guide.
- A basic familiarity with [the workflow for importing ERA5-Land data](../../workflows/import-era5/import-era5-daily.ipynb) which we are going to be automating. 

## 1) Gather the needed files

For this tutorial, we are going to be using the provided [example](./example/) folder. Most of the files are already provided in the folder, and will be explained in more detail later. 

```bash
    workflows/
    └── scheduling/
        └── example
            ├── Dockerfile
            ├── docker-compose.yaml
            ├── cronfile
            ├── params.yaml
            ├── requirements.txt  (copied from root folder later)
            └── import-era5-daily.ipynb  (copied from workflows folder later)
```

Since we are also going to be needing the environment dependencies and ERA5-Land import notebook defined elsewhere in the toolkit, let's copy them over to our `example` folder:

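```python
import os
import shutil

# Copy the ERA5-Land import notebook and the toolkit's requirements file into the example folder
shutil.copy('../../import-era5/import-era5-daily.ipynb', './example/import-era5-daily.ipynb')
shutil.copy('../../../../requirements.txt', './example/requirements.txt')

# List the folder contents to confirm that all files are in place
os.listdir('./example')
```

```
['cronfile',
 'docker-compose.yaml',
 'Dockerfile',
 'import-era5-daily.ipynb',
 'params.yaml',
 'requirements.txt']
```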

## 2) Make notebook configurable

The [Import ERA5-Land Daily](https://climate-tools.dhis2.org/workflows/import-era5/import-era5-daily/) notebook hardcodes all input parameters, including sensitive settings like DHIS2 instance, username, and password. For automation, credentials and settings should be externalized rather than hardcoded in scripts.

### Tag the parameters cell

Since we are using [papermill](https://papermill.readthedocs.io/) to run the notebook, we first need to tell papermill where the parameters are defined. [As described here](https://papermill.readthedocs.io/en/latest/usage-parameterize.html#designate-parameters-for-a-cell), this is done by adding a `parameters` tag to the notebook cell containing the parameters. 
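The tag is usually added through the Jupyter interface, but since a notebook file is plain JSON it can also be added with a small script. The following is a minimal sketch using only the standard library. The helper names `tag_parameters_cell` and `tag_notebook_file` are hypothetical, and the `marker` argument is assumed to be a variable name that appears only in the parameters cell.

```python
import json
from pathlib import Path


def tag_parameters_cell(notebook: dict, marker: str) -> bool:
    """Add the 'parameters' tag to the first code cell whose source contains `marker`.

    Returns True if a matching cell was found, False otherwise.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Notebook files store cell source either as one string or as a list of lines
        text = source if isinstance(source, str) else "".join(source)
        if marker in text:
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "parameters" not in tags:
                tags.append("parameters")
            return True
    return False


def tag_notebook_file(path: str, marker: str) -> bool:
    """Load a notebook file, tag its parameters cell, and save it if a cell matched."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    found = tag_parameters_cell(notebook, marker)
    if found:
        nb_path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return found
```

For the example folder this could be called as `tag_notebook_file("example/import-era5-daily.ipynb", marker)`, where `marker` is one of the variable names defined in the notebook's parameters cell.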

### Create the parameters yaml file

Papermill can [read parameters from a yaml file](https://papermill.readthedocs.io/en/latest/usage-execute.html#using-a-parameters-file). This yaml file is used to define parameters that will override those in the notebook. Variable names should match those used in the notebook (from the cell tagged in the previous step). 

For this tutorial we have already created this [params.yaml](./example/params.yaml) file. We set the parameters so that the notebook imports temperature data instead of the default precipitation, and only for a single month for demonstration purposes. 

You can modify the parameters used by changing the contents of [params.yaml](./example/params.yaml).

This approach allows:

- **Security** - credentials stay out of version control
- **Flexibility** - different settings for different import schedules
- **Docker compatibility** - containers can load the parameters using the `--parameters_file` flag
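Because the names in the parameters file need to match the variables in the tagged cell, it can help to check for typos before scheduling a run. The sketch below uses the standard library `ast` module to list the variables assigned in a cell and report any parameter names that do not match. The function names and the example variable names in the usage note are hypothetical, and the parameters are assumed to already be loaded into a Python dictionary.

```python
import ast


def assigned_names(cell_source: str) -> set[str]:
    """Return the top-level variable names assigned in a code cell."""
    names = set()
    for node in ast.parse(cell_source).body:
        if isinstance(node, ast.Assign):
            # Plain assignments such as `x = 1` or `x = y = 1`
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            # Annotated assignments such as `x: int = 1`
            names.add(node.target.id)
    return names


def unknown_parameters(params: dict, cell_source: str) -> list[str]:
    """Return the parameter names that are not assigned in the cell source."""
    return sorted(set(params) - assigned_names(cell_source))
```

For example, if the parameters cell assigns `start_date` and the parameters dictionary contains the misspelled key `star_date`, then `unknown_parameters` returns `["star_date"]`.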


## 3) Test the configured import notebook

To test that the notebook can be run using the custom configuration, run this in your terminal:

```bash
    papermill example/import-era5-daily.ipynb ../../data/local/import-era5-daily-temperature-output.ipynb -f example/params.yaml --kernel climate-tools
```

Papermill will then copy the original notebook to the provided output path (`../../data/local/import-era5-daily-temperature-output.ipynb`), inject the parameters from `params.yaml`, and run the notebook. After completion, you can open the output notebook to see the results of the run and any errors that may have occurred.

Note that you won't see any output until the notebook has completed.
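If you prefer to start the test run from a Python script, the same command can be assembled and executed with `subprocess`. This is a minimal sketch: the helper names are hypothetical, and it assumes the `papermill` command is available on your `PATH`. The optional `--log-output` flag asks papermill to write cell output to its logger while the notebook runs, which can help if you want to follow progress.

```python
import subprocess


def build_papermill_command(notebook, output, params_file, kernel=None, log_output=False):
    """Build the papermill command line as a list of arguments."""
    cmd = ["papermill", notebook, output, "-f", params_file]
    if kernel:
        cmd += ["--kernel", kernel]
    if log_output:
        # Write notebook output to the logger during execution
        cmd.append("--log-output")
    return cmd


def run_import(notebook, output, params_file, kernel=None, log_output=False):
    """Run papermill and return its exit code (0 means the notebook completed)."""
    cmd = build_papermill_command(notebook, output, params_file, kernel, log_output)
    return subprocess.run(cmd, check=False).returncode
```

With the paths used in this section, the call would be `run_import("example/import-era5-daily.ipynb", "../../data/local/import-era5-daily-temperature-output.ipynb", "example/params.yaml", kernel="climate-tools")`.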

## 4) Schedule with Docker and cron

For production use, we run imports automatically on a schedule using Docker and cron.

For this step, move into the `example` folder by running this in your terminal:

```bash
    cd example
```

### Define the Docker image

A Docker image defines the virtual operating system that the scheduler runs in, including tools such as `cron` for running schedules, together with the packages and environment needed to run the import code. We include an example [Dockerfile](./example/Dockerfile) that has what we will use for this tutorial. Its contents look like this:

```bash
    UPDATE LATER...
```

Note that to avoid hardcoding the scripts and input parameters into the Docker image, we only copy over the `requirements.txt` file needed to build the environment. The other files will be mounted into the container in a later step.

The image will be built in a later step. 

### Define the cronfile

To define the scheduled imports, we create a [cronfile](./example/cronfile), looking something like this:

```bash
    UPDATE LATER...
```

What the above `cronfile` does:

1. Creates a crontab entry for running `import-era5-daily.ipynb` using the `params.yaml` parameters.
2. Forwards output to Docker logs for monitoring.
3. Runs continuously, executing the import on schedule.

The cron job runs inside the Docker container and therefore uses the container file paths, as defined in the next step.

You can also add multiple schedules to the same `cronfile`, so that one schedule runs the notebook with the temperature parameters file, and another schedule with a precipitation parameters file, and so on. Just remember to use filenames that differentiate the parameter and output files for each of the import schedules. 

Cron expression examples:

| Expression | Description |
|------------|-------------|
| `0 6 * * *` | Daily at 6:00 AM |
| `0 1 * * *` | Daily at 1:00 AM |
| `0 0 * * 0` | Weekly on Sunday at midnight |
| `0 0 1 * *` | Monthly on the 1st |

Use [crontab.guru](https://crontab.guru/) to build expressions.
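Each expression in the table above consists of five space-separated fields: minute, hour, day of month, month, and day of week. As a small illustration, the sketch below splits an expression into these named fields. The function name `split_cron_expression` is hypothetical, and the sketch only checks the number of fields, not whether each value is valid.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_expression(expression: str) -> dict[str, str]:
    """Split a five-field cron expression into a dictionary of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"Expected {len(CRON_FIELDS)} fields but got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))
```

For example, `split_cron_expression("0 6 * * *")` gives minute `0` and hour `6`, with `*` (meaning "every") for the remaining fields, which corresponds to a daily run at 6:00 AM.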

**Important gotcha on Windows**: If you edit the `cronfile` on Windows, make sure to save it with LF line endings rather than the Windows default CRLF, otherwise cron may fail to read the file.
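If your editor does not offer a setting for line endings, the file can also be converted with a few lines of Python. This is a minimal sketch with hypothetical function names. It works on raw bytes so that nothing other than the line endings is changed.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with LF."""
    return data.replace(b"\r\n", b"\n")


def normalize_line_endings(path: str) -> bool:
    """Convert a file to LF line endings in place. Returns True if the file changed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    converted = to_lf(original)
    if converted != original:
        file_path.write_bytes(converted)
        return True
    return False
```

Running `normalize_line_endings("cronfile")` from inside the `example` folder converts the file only if it contains CRLF line endings.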

### Create a docker-compose file

Create a docker compose file that takes the Docker image defined above and runs the `crontab` command on the [cronfile](./example/cronfile). We include an example [docker-compose.yaml](./example/docker-compose.yaml) file that can be used for this tutorial. It should look like this:

```bash
    UPDATE LATER...
```

Things to note:

- The line `.:/app` is important: it means that the `/app` folder in the Docker container is automatically synced with the contents of the current folder (`.`). This way, changes to the notebook, the parameters, or the schedules in the cronfile do not require you to `docker build` the Docker image again.
- The line `~/.cdsapirc:/root/.cdsapirc:ro` makes your local CDS API key accessible in the Docker container `root` user folder. If you later change the Docker user then you have to update this to the correct user folder. This line is only needed for this particular workflow, because we are accessing data from the Climate Data Store (CDS). Other workflows may require other forms of authentication. 

### Build and run the scheduler with docker compose

The following command builds the image (the first build takes the longest, since later builds can reuse cached layers) and starts the cron scheduler in the background:

```bash
    docker compose up --detach --build
```

To check that the docker container started successfully and is running:

```bash
    docker ps
```

To listen in on the docker and cron logs:

```bash
    docker logs -f climate-scheduler
```

Now the ERA5-Land imports should repeat at regular intervals as specified in the `cronfile`, for as long as the docker container `climate-scheduler` is running. 
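For unattended setups it can be useful to check from a script that the scheduler container is still running. The sketch below is one possible approach, assuming the `docker` command is available on your `PATH`. The function names are hypothetical. It lists the names of running containers with `docker ps --format "{{.Names}}"` and looks for the container name.

```python
import subprocess


def parse_container_names(ps_output: str) -> list[str]:
    """Turn the output of `docker ps --format "{{.Names}}"` into a list of names."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def is_container_running(name: str) -> bool:
    """Return True if a running container with exactly this name exists."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return name in parse_container_names(result.stdout)
```

Calling `is_container_running("climate-scheduler")` then returns `True` while the scheduler is up.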

### Making changes to the notebook, parameters, or schedules

If you change any of the files, restart the Docker container so that `cron` restarts and the changes take effect:

```bash
    docker compose down
    docker compose up --detach --build
```