# M2.4 - Processing Long Climate Data Records Concurrently

*Part of:* [**Computational Climate Science**](https://github.com/OpenClimateScience/M2-Computational-Climate-Science)

**Contents:**

- [Resource limitations in computing](#Resource-limitations-in-computing)
  - [CPU-bound problems](#CPU-bound-problems)
- [Concurrent processing for large climate datasets](#Concurrent-processing-for-large-climate-datasets)
- [Computing PET using Hargreaves equation](#Computing-PET-using-Hargreaves-equation)
  - [Computing top-of-atmosphere (TOA) radiation](#Computing-top-of-atmosphere-(TOA)-radiation)
  - [Well-documented functions](#Well-documented-functions)

## Overview

In the previous lesson, we discussed how a simple bucket model can be used to quantify the difference between water supply (precipitation) and water loss (potential evapotranspiration or PET). The ratio of these two quantities is also useful as an index of how much of the water loss is replenished by precipitation:
$$
\text{Percentage replenished} \approx 100\times \frac{\text{Precipitation}}{\text{PET}}
$$

**The method for calculating PET that we will use is [the Hargreaves method](https://www.fao.org/4/X0490E/x0490e07.htm#minimum%20data%20requirements) (Allen et al. 2000), because it only requires temperature data.** We'll use temperature data from MERRA-2 to calculate PET. Then, we'll use precipitation data from CHIRPS, again, to derive our hydrologic drought index.

**While there are many sources of PET data, we're going to calculate PET on our own so that we can get more experience working with large climate datasets.** Along the way, we'll learn how large climate datasets can be processed **concurrently,** which can help to address two common problems:

1. The entire dataset is too large to load into memory all at once;
2. Data processing can be time-consuming, either because the dataset is so large or because the computations are complex.

---

## Resource limitations in computing

Generally, the bigger the dataset, the more computational resources are required to analyze it. But exactly what resources are needed depends on both the data and the kind of analysis we want to perform.

**In computing problems, there are three major kinds of resource limitations or *bottlenecks,* i.e., limiting factors to running a computer algorithm:**

1. **Read and write speed from a file system**
2. **Computer memory**
3. **Central processing unit (CPU) clock speed (e.g., 3 GHz)**
   
A bottleneck of **Type 1** occurs when we have either very large datasets or slow file-system read-write speeds. The speed of reading and writing from a file system (or hard disk) depends on the medium; solid-state drives are generally faster than spinning-disk hard drives. If the drive is a network-attached storage (NAS) device instead of the hard drive on your computer, then the speed of the network connection also contributes to Type 1 bottlenecks. **Problems that are limited by a Type 1 bottleneck are called I/O-Bound (Input/Output-Bound).**

A bottleneck of **Type 2** can occur if the dataset is very large and we try to store it all in memory at once, or if our analysis generates too much data in memory. Of course, memory is finite, so data either fits in memory or it doesn't. When it doesn't, the computer's operating system can offload some of the data stored in memory onto the hard disk. This is called *swapping* and it is extremely slow. Hence, if you are running out of computer memory, your program may not stop due to a lack of memory, but it will slow down severely as data are juggled between memory and hard disk. **Problems that are limited by a Type 2 bottleneck are called Memory-Bound.**

A bottleneck of **Type 3** has a lot less to do with the data and more to do with the algorithm we're running. If we're reading in a huge dataset and just doing a simple unit conversion (for example, multiplying the data by 1000 and then saving it back to disk), then CPU clock speed probably isn't an issue: computers can multiply numbers very fast. But if the algorithm requires many complex calculations for every data value, then how long it takes depends on how fast the CPU is. **Problems that are limited by a Type 3 bottleneck are called CPU-Bound.**
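
To judge whether a Type 2 bottleneck is likely, it can help to estimate how much memory an array needs before loading it. The sketch below assumes a data cube with 122 daily time steps on a 361 × 576 grid, stored as 32-bit floats. The data type is an assumption, so check your own files before relying on the result.

```python
import numpy as np

# Assumed dimensions: (time, lat, lon); the float32 data type is an assumption
shape = (122, 361, 576)
dtype = np.dtype('float32')

# Total number of values, and the bytes needed to hold them all in memory
n_values = int(np.prod(shape))
n_bytes = n_values * dtype.itemsize

print(f'{n_values:,} values, about {n_bytes / 1024**2:.1f} MiB per variable')
```

Multiplying this estimate by the number of variables, and by the number of intermediate arrays an analysis creates, gives a rough upper bound on the memory required.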

### CPU-bound problems

Historically, Type 3 bottlenecks have received the most attention. Improvements in the manufacturing process for CPUs have led to faster and faster chips. Gordon Moore was one of the first to notice the rate of this upward trend, and **Moore's Law** has been an article of faith in the industry for a long time: the tendency for the number of transistors on a chip, and with it computing performance, to double about every 2 years (Moore 1965).

[But there are recent signs that this rate of doubling may be slowing down.](https://www.tomshardware.com/tech-industry/semiconductors/intels-ceo-says-moores-law-is-slowing-to-a-three-year-cadence-but-its-not-dead-yet) There are several reasons for this that are beyond the scope of this lesson (Bohr 2007). A major reason is the problem of heat dissipation. Trying to maintain the same rate of growth in transistors has required making transistors smaller. But the smaller they get, the hotter they get when electricity flows through them. Modern chip design is primarily concerned with trying to keep things from melting!

However, if we combine multiple low-power CPUs together, we can actually get better performance than from a single, high-power CPU. Consider the figure below. With a single CPU, it is only possible to process data in a Sequential or Concurrent scheme. Sequential processing means that each task must be finished before the next task can begin.

![](./assets/M2_concurrency.jpg)

*Image by [Kevin Wahome](https://kwahome.medium.com/concurrency-is-not-parallelism-a5451d1cde8d)*

In a **Concurrent scheme,** computers can seamlessly switch between tasks so fast that it appears as if multiple tasks, or **threads,** are being worked on simultaneously. **Concurrency** or **multi-threading** is how single CPUs have allowed us to do multiple tasks for the first few decades of the personal computer. **To get faster computers and mobile phones today, we are now using multiple, low-power CPUs to work on independent tasks simultaneously.** This is the **Parallel scheme.** 

**Today, we'll see how multiple CPUs can be used to break a problem down into smaller parts that can be executed simultaneously.** Some of the tools we're working with make it so easy to use a Concurrent or Parallel processing scheme that it can be hard to tell the difference between the two. So, in this lesson, we'll use the terms "Concurrent processing" or "Concurrency" to refer to both the Concurrent and Parallel processing schemes.
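
As a minimal sketch of the difference between the Sequential and Concurrent schemes, the example below uses the `concurrent.futures` module from Python's standard library. The `summarize_chunk()` function and the temperature values are hypothetical, chosen only for illustration; this is not the tooling used later in the lesson.

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_chunk(chunk):
    # Hypothetical task: the mean of one independent piece of the data
    return sum(chunk) / len(chunk)

# Four independent "chunks" of made-up temperature values
chunks = [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0], [22.0, 24.0]]

# Sequential scheme: each task finishes before the next one begins
sequential = [summarize_chunk(c) for c in chunks]

# Concurrent scheme: tasks are handed to a pool of worker threads;
# map() returns results in the same order as the inputs
with ThreadPoolExecutor(max_workers = 4) as pool:
    threaded = list(pool.map(summarize_chunk, chunks))

print(sequential == threaded)
```

Both schemes give the same answer because each chunk can be summarized without knowing anything about the others. That independence is what makes a problem suitable for concurrent processing.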

---

## Concurrent processing for large climate datasets

As we've seen previously, we can use `earthaccess` to download [MERRA-2](https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/) data from NASA EarthData Search. We'll be using the daily, aggregated data we used before, with the `short_name` `"M2SDNXSLV"`.

In [None]:
import earthaccess
import xarray as xr
from matplotlib import pyplot

auth = earthaccess.login()

# January 1 through May 1, 2024: 122 daily files
results = earthaccess.search_data(
    short_name = 'M2SDNXSLV',
    temporal = ("2024-01-01", "2024-05-01"))

#### &#x1F3AF; Best Practice

**Remember: We want to make sure we don't accidentally change our raw data, so these data should be downloaded to a folder reserved for raw data.**

In [None]:
# Could take about 1 minute on a broadband connection
earthaccess.download(results, 'data_raw/MERRA2')

Once again, we'll use `xr.open_mfdataset()` to open our collection of files as a single `xarray.Dataset`.

In [None]:
ds = xr.open_mfdataset('./data_raw/MERRA2/*2024*.nc4')
ds

The MERRA-2 data variables we are interested in are:

- `T2MMAX`, the maximum daily temperature (degrees C)
- `T2MMEAN`, the mean daily temperature (degrees C)
- `T2MMIN`, the minimum daily temperature (degrees C)

Note that we have 122 days of data, so the resulting data cube has a time axis of 122 daily time steps. `xarray` has automatically broken our dataset into equal-sized **chunks** that could be processed independently.

&#x1F449; In `xarray`, a **chunk** (also called a **block**) is a piece of our dataset: a defined subset along one or more axes.

In [None]:
ds['T2MMEAN']

**The size and shape of the chunks are important if we are going to use concurrency.** Consider, for example, if we wanted to calculate long-term trends. With the chunks we currently have, we could not calculate trends because each chunk contains only one time step.

We could try using [the `chunks` argument of `open_mfdataset()`](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html) to specify that chunks should have 122 elements along the `time` axis...

In [None]:
# The "chunks" argument tells xarray what size the chunks should be on one or more axes
ds = xr.open_mfdataset('./data_raw/MERRA2/*2024*.nc4', chunks = {'time': 122})
ds['T2MMEAN'].data

However, it's clear that didn't work; each chunk still only has one time step.

#### &#x1F6A9; <span style="color:red">Pay Attention</span>

**This is because the `chunks` argument is evaluated separately for each file.** `xr.open_mfdataset()` opens multiple files and combines them into a single dataset but, in this case, because each file represents a different time step, it can't create chunks that span multiple files.

Alternatively, we can tell `xarray` how big each chunk should be along the `lat` and `lon` axes, because this doesn't require spanning multiple files. Below, we specify chunk sizes that result in just 4 chunks for every file.

In [None]:
ds = xr.open_mfdataset('./data_raw/MERRA2/*2024*.nc4', chunks = {'lat': 182, 'lon': 288})
ds['T2MMEAN'].data
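
The count of 4 chunks per file follows from simple ceiling division. The sketch below assumes the grid has 361 latitudes and 576 longitudes, which you can confirm from the dataset summary.

```python
import math

# Assumed grid size: 361 latitudes, 576 longitudes
n_lat, n_lon = 361, 576
chunk_lat, chunk_lon = 182, 288

# Number of chunks needed to cover each axis
chunks_along_lat = math.ceil(n_lat / chunk_lat)
chunks_along_lon = math.ceil(n_lon / chunk_lon)
print(chunks_along_lat * chunks_along_lon)

# The last chunk along an axis holds whatever is left over
last_lat = n_lat - (chunks_along_lat - 1) * chunk_lat
print(last_lat)
```

Note that chunks need not be exactly equal in size: when the axis length is not a multiple of the chunk size, the final chunk is smaller.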

**If we really needed each chunk to contain the entire `time` axis (122 time steps), we would need to re-chunk the data *after* reading in all the files.** We can do this using [the `chunk()` method of an `xarray.Dataset` or `xarray.DataArray`.](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.chunk.html)

In [None]:
# TODO Re-chunking the data *after* loading is generally inefficient, but might be necessary; 
#    give example of "what if" we were interested in calculating trends

ds = xr.open_mfdataset('./data_raw/MERRA2/*2024*.nc4')
ds = ds.chunk({'time': 122})
ds['T2MMEAN'].data

#### &#x1F3AF; Best Practice

In general, it's best to use the `chunks` argument because re-chunking the data is inefficient. However, in cases where you need chunks to span multiple files, you will have to re-chunk the data using the `chunk()` method.
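
To see why a trend calculation needs the whole `time` axis in one chunk, consider fitting a straight line through every pixel's time series. The sketch below uses plain NumPy on a small, synthetic data cube with a known slope. The cube and its dimensions are made up for illustration.

```python
import numpy as np

# Synthetic (time, lat, lon) cube: every pixel increases by 0.5 per time step
n_time, n_lat, n_lon = 122, 4, 5
steps = np.arange(n_time)
cube = 10.0 + 0.5 * steps[:, np.newaxis, np.newaxis] * np.ones((n_time, n_lat, n_lon))

# np.polyfit() fits every column of a 2D array at once, so we flatten
# the two spatial axes into a single "pixel" axis
flat = cube.reshape(n_time, -1)
slope, intercept = np.polyfit(steps, flat, deg = 1)

# Restore the spatial shape: one trend value per pixel
trend = slope.reshape(n_lat, n_lon)
print(trend.shape)
```

The fit for each pixel needs all 122 values along the `time` axis. A chunk holding a single time step has only one value per pixel, which is not enough to estimate a slope.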

In this case, we don't actually need chunks with 122 time steps. We are fine with whatever chunking `xarray` does by default. If we set `chunks = 'auto'`, then `xarray` will choose the chunk sizes for us; here, each file is small enough to fit in a single chunk, hence there is one chunk per file.

In [None]:
ds = xr.open_mfdataset('./data_raw/MERRA2/*2024*.nc4', chunks = 'auto')
ds['T2MMEAN'].data

---

## Computing PET using Hargreaves equation

In order to calculate the Precipitation-to-PET ratio, we'll first need to use the Hargreaves equation to calculate PET:
$$
\text{PET} = 0.0023 \times R_A \times \sqrt{T_{max} - T_{min}} \times (T + 17.8)
$$

Above, $R_A$ is the top-of-atmosphere (TOA) solar radiation and $T$, $T_{max}$, and $T_{min}$ are the mean, maximum, and minimum temperatures, respectively.

#### &#x1F3AF; Best Practice

The Hargreaves equation is just complex enough that we need to develop multiple data-processing steps to get to our goal, which is the Precipitation-to-PET ratio for a defined region. This effort will require that we pay attention to several potential pitfalls of computational data science:

- Ensuring that processing steps are done in the correct order, so that data structures and/or Python variables are correctly initialized.
- Ensuring that measurement units are correct and compatible between different data processing steps.
- Documenting each processing step so that we can identify potential errors and so that a third party can verify or reproduce our analysis.

A technique from computer science called **decomposition** can help us to plan our analysis. **Decomposition** involves breaking a problem down into a series of independent, manageable steps. We might decompose our problem into these steps:

1. Load the required temperature data inputs.
2. Calculate top-of-atmosphere (TOA) solar radiation.
3. Calculate potential evapotranspiration (PET) using the Hargreaves equation.
4. Compute the Precipitation-to-PET ratio.

**These ordered steps should help us to organize our workflow in a way that someone else can easily understand.** We've already loaded the required temperature data (Step 1), so let's move on to calculating TOA radiation.

### Computing top-of-atmosphere (TOA) radiation

Here is a function for calculating TOA radiation, [based on FAO guidance.](https://www.fao.org/4/X0490E/x0490e07.htm#radiation)

In [None]:
import numpy as np

def toa_radiation(latitude, doy):
    '''
    Top-of-atmosphere (TOA) radiation for a given latitude (L) and day of year
    (DOY) can be calculated as:

    R = ((24 * 60) / pi) * G * d * (w * sin(L) * sin(D) + cos(L) * cos(D) * sin(w))

    Where G is the solar constant, 0.0820 [MJ m-2 min-1]; d is the (inverse) 
    relative earth-sun distance; w is the sunset hour angle; and D is the solar
    declination angle.
    
    For more information, consult the FAO documentation:

        https://www.fao.org/4/X0490E/x0490e07.htm#radiation
    
    Parameters
    ----------
    latitude : float or numpy.ndarray
        The latitude on earth, in degrees
    doy : int or numpy.ndarray of int
        The day of the year (DOY), an integer on [1,366]
    
    Returns
    -------
    Number or numpy.ndarray
        Top-of-atmosphere (TOA) radiation, in [MJ m-2 day-1]
    '''
    # np.asarray() lets the same checks work for a single integer or an array
    _doy = np.asarray(doy)
    assert np.issubdtype(_doy.dtype, np.integer), 'The "doy" argument must be an integer'
    assert np.all((_doy >= 1) & (_doy <= 366)), 'The "doy" argument must be between 1 and 366, inclusive'
    solar_constant = 0.0820 # [MJ m-2 min-1]; (24 * 60) below converts to per day
    pi = 3.14159
    pi = 3.14159
    
    # Convert latitude from degrees to radians
    latitude_radians = np.deg2rad(latitude)
    # Inverse Earth-Sun distance (relative), as a function of day-of-year (DOY)
    earth_sun_dist = 1 + 0.0033 * np.cos(doy * ((2 * pi) / 365))
    # Solar declination, as a function of DOY
    declination = 0.409 * np.sin(doy * ((2 * pi) / 365) - 1.39)
    
    # Sunset hour angle; we use np.where() below to guard against
    #   warnings where arccos() would return invalid values, which
    #   happens when the argument is outside [-1, 1]
    _hour_angle = -np.tan(latitude_radians) * np.tan(declination)
    _hour_angle = np.where(np.abs(_hour_angle) > 1, np.nan, _hour_angle)
    sunset_hour_angle = np.arccos(_hour_angle)

    # Incident radiation, depends only on the relative earth-sun distance
    inc_radiation = ((24 * 60) / pi) * solar_constant * earth_sun_dist
    return inc_radiation * (sunset_hour_angle * np.sin(latitude_radians) * np.sin(declination) +
            np.cos(latitude_radians) * np.cos(declination) * np.sin(sunset_hour_angle))

### Well-documented functions

**There are several things to note about this function.**

There is a **function-level docstring** that provides rich information about the purpose and use of the function. In addition to the important "Parameters" and "Return" value sections, we have provided a simple, human-readable form of the equation we're using to calculate TOA radiation. We also provided a link to the FAO document where this equation came from. These are all very important things to include so that someone else can figure out how we're calculating TOA radiation. These things also help us to later verify that we're performing calculations correctly.

In the **Parameters** section, we made sure to define the measurement units required for each input parameter. This is *extremely* important. In the above example, we would get a different, and incorrect, answer if `latitude` was given in radians instead of degrees. We also indicated the Python **data type,** e.g., `float`. This is also important to include because, when a computation involves the wrong data type, it is often difficult to figure out that the error is due to an incorrect data type.

**Variable names** are chosen carefully. We use `latitude` instead of a name like `x`, which is too short and could signify multiple things. We also defined a variable `latitude_radians` to distinguish when we are using latitude in radians, as opposed to degrees. While `latitude` could have been written as `latitude_degrees`, we decided to trade some clarity for a shorter name in this case, although clarity is usually most important. Ultimately, there are some subjective choices to be made, but you should consider choosing variable names that communicate the meaning *and* the measurement units of the quantity they represent. If that is hard to do, **inline comments** can help to keep track of units, as we did with the inline comment next to `solar_constant`.

**Constants** are defined at the top of our function: `pi` and `solar_constant`. While many people might recognize a number like 3.14 as the number pi, defining it as a variable, `pi`, in our function makes this clearer and allows us to control the precision of this number in one place. In general, constants should be defined only once!

**Comments** are used frequently. In particular, where there are complex calculation steps to obtain the `sunset_hour_angle`, we have a long comment above the code to explain what it does. If we need to use intermediate variables in our calculation, we can use less informative variable names, like `_hour_angle`. In Python, variable names that begin with the underscore, `_`, signal to readers, by convention, that the variable is for internal or temporary use and can otherwise be ignored.

For long calculations, like the `return` value of our function, it can be helpful to break them up into smaller, more meaningful quantities, paying attention to the order of operations. This is why we defined the `inc_radiation` variable. When a calculation can't be broken down into meaningful parts, it can improve readability to break the equation across multiple lines, as we did by creating a line break after a `+` operation.

Finally, note that we included **assertions,** using the `assert` keyword, to help ensure that users call this function correctly. Consider what happens when the wrong data type, or an out-of-range value, is provided for the `doy` argument:

In [None]:
toa_radiation(36.1, doy = 14.0)

In [None]:
toa_radiation(36.1, doy = 500)
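
A failed assertion raises an `AssertionError` that carries the message we wrote. The sketch below uses a hypothetical helper, `check_doy()`, written in the spirit of the checks in `toa_radiation()`, to show the messages a user would see.

```python
def check_doy(doy):
    # Hypothetical helper: accepts only an integer day of year on [1, 366]
    assert isinstance(doy, int), 'The "doy" argument must be an integer'
    assert 1 <= doy <= 366, 'The "doy" argument must be between 1 and 366, inclusive'
    return doy

# Try one valid value, one wrong data type, and one out-of-range value
messages = []
for value in (14, 14.0, 500):
    try:
        check_doy(value)
        messages.append('ok')
    except AssertionError as error:
        messages.append(str(error))

print(messages)
```

One design consideration: `assert` statements are skipped when Python is run with the `-O` flag, so code that must always validate its inputs might raise a `ValueError` or `TypeError` explicitly instead.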

#### &#x1F3C1; Challenge: Writing a well-documented function

Now that we've reviewed what makes a well-documented function, **write the function for the next step of our analysis.** The equation below can be used to calculate PET. Write a well-documented Python function called `potential_et()` that returns PET in units of millimeters per day (mm day$^{-1}$).

$$
\text{PET} = 0.0023 \times R_A \times \sqrt{T_{max} - T_{min}} \times (T + 17.8)
$$

The inputs to the `potential_et()` function are:

- $R_A$ is the top-of-atmosphere solar radiation, in mm H$_2$O equivalent per day
- $T_{max}$ is the daily maximum temperature, in degrees C
- $T_{min}$ is the daily minimum temperature, in degrees C
- $T$ is the daily average temperature, in degrees C

Expand the cell below to see one solution to this problem.

In [None]:
def potential_et(toa_radiation, temp_max, temp_min, temp_mean):
    '''
    Calculates potential evapotranspiration, according to the Hargreaves
    equation:

    PET = 0.0023 * R * sqrt(Tmax - Tmin) * (Tmean + 17.8)

    Where R is the top-of-atmosphere (TOA) radiation (mm day-1); Tmax and 
    Tmin are the maximum and minimum daily air temperatures (degrees C),
    respectively; and Tmean is daily mean air temperature (degrees C).

    Parameters
    ----------
    toa_radiation : Number
        The top-of-atmosphere (TOA) radiation (mm day-1)
    temp_max : Number
        Maximum daily air temperature (degrees C)
    temp_min : Number
        Minimum daily air temperature (degrees C)
    temp_mean : Number
        Average daily air temperature (degrees C)

    Returns
    -------
    Number
        The potential evapotranspiration (PET) in [mm day-1]
    '''
    return 0.0023 * toa_radiation * np.sqrt(temp_max - temp_min) * (temp_mean + 17.8)

In [None]:
toa_radiation(32, 200)

In [None]:
lats = np.array([22, 32, 42])

toa_radiation(lats, 200)

In [None]:
from matplotlib import pyplot

# Days 1 through 365; note that np.arange() excludes the stop value
doy = np.arange(1, 366)

rad = toa_radiation(32, doy)
pyplot.plot(doy, rad, 'k-')

In [None]:
# TODO Vectorization

toa_radiation(lats, doy)
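
A short array of latitudes and a long array of days have incompatible shapes, so NumPy can't combine them element by element. One way around this is **broadcasting:** if we turn the latitudes into a column, NumPy pairs every latitude with every day. The sketch below is self-contained and reproduces only the first steps of the TOA radiation calculation, using `np.pi` for brevity.

```python
import numpy as np

lats = np.array([22, 32, 42])       # shape (3,)
doy = np.arange(1, 366)             # shape (365,)

# Solar declination depends only on day of year: shape (365,)
declination = 0.409 * np.sin(doy * (2 * np.pi / 365) - 1.39)

# Adding a trailing axis makes latitude a column of shape (3, 1), which
# broadcasts against the (365,) row to give one value per (latitude, day)
lat_radians = np.deg2rad(lats)[:, np.newaxis]
cos_hour_angle = -np.tan(lat_radians) * np.tan(declination)

print(cos_hour_angle.shape)
```

The result has one row per latitude and one column per day, which is the layout we would want for a latitude-by-time table of radiation values.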

### Deriving variables from `xarray` coordinates

In [None]:
ds.coords

In [None]:
ds.lat.shape

In [None]:
# TODO Vectorization
# TODO Getting an array of latitude values to match our temperature arrays

lats = ds['lat'].values
lats = lats.reshape((361, 1)).repeat(ds.lon.size, axis = 1)
lats.shape
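
As an aside, `reshape()` followed by `repeat()` makes a full copy of the latitude values for every column. The sketch below compares that approach with `np.broadcast_to()`, which returns a read-only view instead of copying. The stand-in latitude coordinates are an assumption, chosen to match a 361 × 576 grid.

```python
import numpy as np

# Stand-in coordinates on an assumed grid: 361 latitudes, 576 longitudes
lat = np.linspace(-90, 90, 361)
n_lon = 576

# Approach used above: reshape to a column, then copy it across columns
repeated = lat.reshape((lat.size, 1)).repeat(n_lon, axis = 1)

# Alternative: a read-only, broadcast view that doesn't copy any data
view = np.broadcast_to(lat[:, np.newaxis], (lat.size, n_lon))

print(repeated.shape, np.array_equal(repeated, view))
```

The copy is the simpler choice when the array will be stored in a dataset; the view can save memory when the grid is only needed temporarily.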

In [None]:
# TODO Have to specify the dimensions of a new variable

ds['lat_grid'] = (('lat', 'lon'), lats)
ds

In [None]:
# TODO HOWEVER, it will be much easier to do some computation later
#   if our "lat_grid" has the same dimensions as all the other Variables

# Using the axis sizes from the dataset avoids hard-coding the grid shape
lats2 = lats.reshape((ds.lat.size, ds.lon.size, 1)).repeat(ds.time.size, axis = 2)
lats2.shape

In [None]:
ds['lat_grid'] = (('lat', 'lon', 'time'), lats2)
ds

In [None]:
# TODO https://docs.xarray.dev/en/stable/user-guide/time-series.html#datetime-components

doy = ds['time.dayofyear'].values
doy
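
The day of year is simply a date's position within its calendar year. As a quick check on values like the ones above, the sketch below derives the day of year for 122 consecutive days using only Python's standard library. The start date is an assumption for illustration.

```python
from datetime import date, timedelta

# Assumed start date: January 1, 2024 (a leap year)
start = date(2024, 1, 1)

# 122 consecutive days, beginning with the start date
days = [start + timedelta(days = i) for i in range(122)]
doy_values = [d.timetuple().tm_yday for d in days]

print(days[-1], doy_values[-1])
```

Because 2024 is a leap year, February has 29 days, which shifts the day of year for every later date by one compared to a common year.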

### Calculating top-of-atmosphere radiation

In [None]:
test = ds.sel(time = '2024-05-01')

rad = toa_radiation(test['lat_grid'].values, test['time.dayofyear'].values)

In [None]:
test['toa_radiation'] = rad

In [None]:
# TODO Note that we should specify the dimensions of a dataset we add

test['toa_radiation'] = (('lat', 'lon', 'time'), rad)
test['toa_radiation'].plot()

In [None]:
def my_function(dataset):
    return dataset.T2MMIN + dataset.T2MMAX

xr.map_blocks(my_function, ds)

In [None]:
# TODO Lazy evaluation (should be a review from Part 1)
# TODO Remind learners that "blocks" and "chunks" are inter-changeable

result = xr.map_blocks(my_function, ds).compute()
result

In [None]:
# TODO Explain difference between the function below and my_function();
#   it's difficult for xarray to figure out what the result looks like

def toa_radiation_wrapper(dataset):
    return toa_radiation(dataset['lat_grid'], dataset['time.dayofyear'])

result = xr.map_blocks(toa_radiation_wrapper, ds)

In [None]:
ds['time.dayofyear'].shape

In [None]:
template = ds['T2MMEAN']
template.name = 'toa_radiation'
template

In [None]:
result = xr.map_blocks(toa_radiation_wrapper, ds, template = template)
result

$R_A$ should be multiplied by 0.408 to convert it from [MJ m-2 day-1] to [mm day-1].

In [None]:
toa_result = result.compute()

# Converting TOA Radiation from [MJ m-2 day-1] to [mm H2O day-1]
ds['toa_radiation'] = toa_result * 0.408
ds
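
The factor 0.408 comes from the energy needed to evaporate water. The sketch below works through the arithmetic, using the latent heat of vaporization given in the FAO guidance (2.45 MJ per kg). The example radiation value is made up.

```python
# Latent heat of vaporization of water (FAO value), in [MJ kg-1]
latent_heat = 2.45

# 1 kg of water spread over 1 m2 is a layer 1 mm deep, so dividing an
# energy flux [MJ m-2 day-1] by [MJ kg-1] gives [kg m-2 day-1] = [mm day-1]
mj_to_mm = 1 / latent_heat
print(round(mj_to_mm, 3))

# Example: a made-up TOA radiation value of 30 MJ m-2 day-1
print(round(30 * mj_to_mm, 2))
```

Writing the conversion this way documents where the number comes from, which makes it easier for someone else to verify our units.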

#### &#x1F3AF; Best Practice

**Make sure to include some field-level metadata, in case we end up sharing this dataset with others.**

In [None]:
ds['toa_radiation'].attrs

In [None]:
ds['toa_radiation'].attrs['units'] = 'mm H2O day-1'
ds['toa_radiation']

--- 

## Profiling computational resources

In [None]:
# TODO Review computational resources and bottlenecks
# TODO Review array and chunk memory sizes

ds['T2MMEAN']

In [None]:
first_day = ds.sel(time = '2024-01-01')

In [None]:
# TODO Note there is exactly one chunk; i.e., the subsequent computation will not use more than one process

first_day['T2MMEAN']

In [None]:
def potential_et(dataset):
    '''
    Calculates potential evapotranspiration, according to the Hargreaves
    equation:

    PET = 0.0023 * R * sqrt(Tmax - Tmin) * (Tmean + 17.8)

    Where R is the top-of-atmosphere (TOA) radiation (mm day-1); Tmax and 
    Tmin are the maximum and minimum daily air temperatures (degrees C),
    respectively; and Tmean is daily mean air temperature (degrees C).

    Single input argument should be an xarray.Dataset with the following
    data variables:

        T2MMIN: Minimum daily air temperature (degrees C)
        T2MMAX: Maximum daily air temperature (degrees C)
        T2MMEAN: Average daily air temperature (degrees C)
        toa_radiation: The top-of-atmosphere (TOA) radiation (mm day-1)

    Parameters
    ----------
    dataset: xarray.Dataset

    Returns
    -------
    xarray.DataArray
        The potential evapotranspiration (PET) in [mm day-1]
    '''
    return 0.0023 * dataset['toa_radiation'] * np.sqrt(dataset['T2MMAX'] - dataset['T2MMIN']) * (dataset['T2MMEAN'] + 17.8)

In [None]:
%%timeit

potential_et(first_day)

In [None]:
%%timeit

# TODO Note that we shouldn't try to assign any variables inside a timeit block
potential_et(first_day).compute()

In [None]:
# TODO About 700 ms for a single day

20e-3 * ds.time.size

In [None]:
result = potential_et(first_day).compute()
result

In [None]:
# TODO Note that this is really only valid for land surfaces

result.name = 'Potential ET (mm day-1)'
result.plot()

In [None]:
%%timeit

potential_et(ds)

In [None]:
%%timeit

# TODO Discuss how multi-process overhead can cause some concurrent operations to have a longer wall time than expected
xr.map_blocks(potential_et, ds)

Read more about the `timeit` module here:

- https://docs.python.org/3/library/timeit.html
- https://sjvrijn.github.io/2019/09/28/how-to-timeit.html
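
Outside of a notebook, where the `%%timeit` magic isn't available, the `timeit` module can be called directly. The sketch below times a hypothetical workload, `unit_conversion()`, which stands in for whatever function you want to profile.

```python
import timeit

def unit_conversion(n = 10_000):
    # Hypothetical workload: multiply every value by 1000
    return [value * 1000 for value in range(n)]

# Run the function 100 times per trial, for 5 trials
trials = timeit.repeat(unit_conversion, number = 100, repeat = 5)

# Each trial reports the total time for 100 calls; the fastest trial
# is usually the least affected by other activity on the computer
best = min(trials) / 100
print(f'{best * 1e6:.1f} microseconds per call')
```

Timings vary between computers and between runs, so it is the relative difference between two approaches, measured on the same machine, that is most informative.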

---

In [None]:
pet = potential_et(ds)
pet

In [None]:
# Tiaret lies east of the prime meridian, so its longitude is positive
pet_tiaret = pet.sel(lon = 1.32, lat = 35.37, method = 'nearest')
pet_tiaret

In [None]:
pet_tiaret.plot()

In [None]:
chirps = xr.open_mfdataset('data_raw/CHIRPS/CHIRPS-v2_Africa_monthly_2014-2024.nc')
chirps_tiaret = chirps['precip'].sel(x = slice(0.8, 1.8), y = slice(36.1, 35.1))
chirps_tiaret

In [None]:
# TODO Increasing the frequency of our monthly dataset to daily using nearest-neighbor interpolation

chirps_tiaret_resampled = chirps_tiaret.isel(time = slice(120, 125)).resample(time = 'D').nearest()
chirps_tiaret_resampled

In [None]:
# TODO Note that we're using a rough approximation of the number of days in a month

chirps_tiaret_daily = chirps_tiaret_resampled.mean(['x', 'y']) / 30
chirps_tiaret_daily
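
Dividing by 30 is a rough approximation of the number of days in a month. If more precision were needed, the actual number of days could be looked up with Python's standard `calendar` module, as in the sketch below. The monthly total is a made-up value.

```python
import calendar

# Days in each month from January through May 2024 (a leap year)
days_in_month = {
    month: calendar.monthrange(2024, month)[1] for month in range(1, 6)
}
print(days_in_month)

# Converting a made-up February total of 60 mm to a daily rate,
# first with the rough approximation and then with the actual count
february_total = 60.0
print(february_total / 30, february_total / days_in_month[2])
```

The difference is small for most months but is largest for February, so the approximation matters most when comparing months against each other.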

In [None]:
ratio = chirps_tiaret_daily.values / pet_tiaret.values

pyplot.figure(figsize = (12, 4))
pyplot.plot(pet['time'].values, ratio, 'k-')

On its own, the graph above doesn't tell us how severe the drought in Tiaret is. Although precipitation in the region has replenished less than 5% of its lost water over the past few months, this could be part of the normal seasonal cycle. Actually, we know that January through April is a relatively wet period for Tiaret, but the question remains: **Can we compare this year to past years?**

---

### More resources

- The National Center for Atmospheric Research (NCAR) has an excellent article on ["Using `dask` to scale up your data analysis."](https://ncar.github.io/Xarray-Dask-ESDS-2024/notebooks/02-dask-intro.html)
- Sander van Rijn's [tutorial on using the `timeit` module.](https://sjvrijn.github.io/2019/09/28/how-to-timeit.html)

### References

Bohr, Mark. 2007. "A 30-year retrospective on Dennard's MOSFET scaling paper." [https://www.eng.auburn.edu/~agrawvd/COURSE/READING/LOWP/Boh07.pdf](https://www.eng.auburn.edu/~agrawvd/COURSE/READING/LOWP/Boh07.pdf)

Moore, Gordon E. 1965. "Cramming more components onto integrated circuits." *Electronics Magazine.*