# 01 - xarray tutorial

This notebook introduces the `xarray` package, which you will need in the other notebooks.

## Constraints

+ 🚨 Only cells with the comment `# NOTE: Fill me!` should be filled in
+ 🚨 The notebook must be saved and committed **with** its outputs for the submission


+ ⚠️ The solution only requires packages listed in the `requirements/requirements.txt`

## Note

+ The `assert` statements in the notebook are there to guide you.
However, passing `assert` statements do not guarantee that your code is correct.


In [1]:
!pip install -r requirements/requirements.txt

Collecting matplotlib>=3.6.0
  Downloading matplotlib-3.7.0-cp39-cp39-win_amd64.whl (7.6 MB)
     ---------------------------------------- 7.6/7.6 MB 7.8 MB/s eta 0:00:00
Collecting netCDF4>=1.6.0
  Downloading netCDF4-1.6.2-cp39-cp39-win_amd64.whl (5.2 MB)
     ---------------------------------------- 5.2/5.2 MB 7.4 MB/s eta 0:00:00
Collecting numpy>=1.23.0
  Downloading numpy-1.24.2-cp39-cp39-win_amd64.whl (14.9 MB)
     ---------------------------------------- 14.9/14.9 MB 8.0 MB/s eta 0:00:00
Collecting pandas>=1.5.0
  Downloading pandas-1.5.3-cp39-cp39-win_amd64.whl (10.9 MB)
     ---------------------------------------- 10.9/10.9 MB 8.0 MB/s eta 0:00:00
Collecting pathlib2==2.3.7.post1
  Downloading pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Collecting scipy==1.9.3
  Downloading scipy-1.9.3-cp39-cp39-win_amd64.whl (40.2 MB)
     ---------------------------------------- 40.2/40.2 MB 6.9 MB/s eta 0:00:00
Collecting xarray>=2022.6.0
  Downloading xarray-2023.2.0-py3-none-any.

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\David\\anaconda3\\Lib\\site-packages\\~umpy\\core\\_multiarray_tests.cp39-win_amd64.pyd'
Consider using the `--user` option or check the permissions.



In [2]:
import xarray as xr
import pandas as pd
from pathlib import Path
from datetime import datetime

%reload_ext autoreload
%autoreload 2

# 1. Parameters

In [None]:
DATA_PATH = Path("data")
RASTER_PATH = DATA_PATH / "rasters"
CSV_PATH = DATA_PATH / "csv"

# 2. Data

## 2.1 Download data

In [None]:
raster_path = RASTER_PATH / "raster_test.nc"

## 2.2 Load data

`xarray` doc: https://tutorial.xarray.dev/intro.html

`xarray.DataArray` is xarray’s implementation of a labeled, multi-dimensional array. It has several key properties:

- values: a numpy.ndarray holding the array’s values

- dims: dimension names for each axis (e.g., ('x', 'y', 'z'))

- coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)

- attrs: dict to hold arbitrary metadata (attributes)
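To make these four properties concrete, here is a minimal sketch that builds a small `DataArray` by hand. The name `demo_da`, the values, the coordinates and the `units` attribute are all made up for illustration; they are not taken from the rasters used in this notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic values: 3 days x 2 latitudes x 2 longitudes (made-up numbers).
values = np.arange(12, dtype=float).reshape(3, 2, 2)

demo_da = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.date_range("2010-01-01", periods=3, freq="D"),
        "latitude": [-43.0, -42.0],
        "longitude": [147.0, 148.0],
    },
    attrs={"units": "degC"},  # arbitrary metadata, assumed for the example
    name="max_temp",
)

print(demo_da.dims)          # dimension names, one per axis
print(demo_da.values.shape)  # shape of the underlying numpy array
print(list(demo_da.coords))  # names of the coordinate arrays
print(demo_da.attrs)         # metadata dict
```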

`xarray.Dataset` is a dict-like collection of `DataArray` objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single `DataArray` can be performed on a dataset. Datasets have data variables, dimensions, coordinates, and attributes.
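As a rough illustration of the dict-like behaviour, the sketch below assembles a `Dataset` from two synthetic variables that share the same coordinates. The name `demo_ds` and all values are hypothetical and only mimic the layout of the raster loaded later.

```python
import numpy as np
import pandas as pd
import xarray as xr

dims = ("time", "latitude", "longitude")
shape = (3, 2, 2)

demo_ds = xr.Dataset(
    data_vars={
        # Each entry is (dimension names, values); constant made-up values.
        "max_temp": (dims, np.full(shape, 20.0)),
        "avg_temp": (dims, np.full(shape, 12.0)),
    },
    coords={
        "time": pd.date_range("2010-01-01", periods=3, freq="D"),
        "latitude": [-43.0, -42.0],
        "longitude": [147.0, 148.0],
    },
    attrs={"source": "synthetic example"},
)

# Indexing by name returns the corresponding DataArray.
print(list(demo_ds.data_vars))
print(type(demo_ds["max_temp"]).__name__)

# An operation along a dimension is applied to every data variable.
time_mean = demo_ds.mean(dim="time")
print(time_mean["avg_temp"].dims)
```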

⚠️ **In the following and in all notebooks, the term raster refers to an `xarray.DataArray` or `xarray.Dataset`** ⚠️

In [None]:
# drop_vars replaces the deprecated Dataset.drop for removing variables
raster = xr.load_dataset(raster_path).drop_vars("spatial_ref")
raster

Here the raster is composed of 3 coordinates (`longitude`, `latitude` and `time`) and has 2 variables:
- `max_temp`: maximum temperature over one day
- `avg_temp`: average temperature over one day

# 3. Familiarize with xarray

You can have a look at the xarray doc for indexing and selecting: https://docs.xarray.dev/en/stable/user-guide/indexing.html

#### Let's now look at the variables: for example max_temp

It is an `xarray.DataArray`:

In [None]:
raster["max_temp"]

To select the grid cell at integer position [0, 0], index the array with a dict that maps each dimension name to a position:

In [None]:
raster["max_temp"][dict(longitude=0, latitude=0)]
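Indexing with a dict of integer positions is equivalent to calling `.isel`, which some readers find more explicit. The sketch below checks this on a small synthetic array; `demo_da` and its values are assumptions made for the example, not the notebook's raster.

```python
import numpy as np
import xarray as xr

demo_da = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 2, 2),
    dims=("time", "latitude", "longitude"),
    coords={"latitude": [-43.0, -42.0], "longitude": [147.0, 148.0]},
)

# Positional (integer) selection, written two ways.
by_dict = demo_da[dict(longitude=0, latitude=0)]
by_isel = demo_da.isel(longitude=0, latitude=0)

# Only the time dimension remains after fixing latitude and longitude.
print(by_dict.dims)
print(by_dict.identical(by_isel))
```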

We can then plot the time series over the year 2010:

In [None]:
raster["max_temp"][dict(longitude=0, latitude=0)].plot()

To get the value for July 1st, 2010, select by label along `time` with `.sel`:

In [None]:
raster["max_temp"][dict(longitude=0, latitude=0)].sel(dict(time=datetime(2010, 7, 1))).values
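Label-based selection along `time` also accepts date strings and slices. The following sketch uses a synthetic daily series for 2010 (the name `series` and its values are made up) to show a single-day lookup and a one-month slice.

```python
import numpy as np
import pandas as pd
import xarray as xr
from datetime import datetime

# Synthetic daily series: the value is simply the day index within 2010.
series = xr.DataArray(
    np.arange(365, dtype=float),
    dims=("time",),
    coords={"time": pd.date_range("2010-01-01", periods=365, freq="D")},
)

# Select one day, either with a datetime object or with a date string.
by_datetime = series.sel(time=datetime(2010, 7, 1)).values
by_string = series.sel(time="2010-07-01").values

# Label slices include both endpoints, so this covers all of July.
july = series.sel(time=slice("2010-07-01", "2010-07-31"))
print(by_datetime, by_string, july.sizes["time"])
```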

#### Similarly, let's say we have the coordinates (longitude and latitude) of a point and we want to know the maximum and average temperature on the 5th of December 2010:

The coordinates here are expressed in a geographic coordinate reference system (CRS), i.e. as longitude and latitude.

For more info on CRS, please have a look at: https://docs.qgis.org/3.22/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

In [None]:
point_latitude = -42.5776
point_longitude = 147.3224
point_date = datetime(2010, 12, 5)

In [None]:
raster.sel(
    dict(longitude=point_longitude, latitude=point_latitude, time=point_date),
    method="nearest",
)

To access only the average temperature, index the result with the variable name:

In [None]:
point_avg_temp = raster.sel(
    dict(longitude=point_longitude, latitude=point_latitude, time=point_date),
    method="nearest",
)["avg_temp"].values
point_avg_temp
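The `method="nearest"` argument matters because a point rarely falls exactly on a grid coordinate. The sketch below illustrates this on a tiny synthetic grid; `grid`, its coordinates and its values are assumptions for the example and do not reflect the resolution of the actual raster.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    np.array([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]),
    dims=("latitude", "longitude"),
    coords={"latitude": [-43.0, -42.0], "longitude": [147.0, 147.5, 148.0]},
)

# Nearest-neighbour lookup snaps the point to the closest grid coordinates.
nearest = grid.sel(longitude=147.3224, latitude=-42.5776, method="nearest")
print(float(nearest.longitude), float(nearest.latitude), float(nearest))

# Without method="nearest", labels must match exactly, so this lookup fails.
try:
    grid.sel(longitude=147.3224, latitude=-42.5776)
    exact_match_found = True
except KeyError:
    exact_match_found = False
print(exact_match_found)
```

The selected cell keeps its own coordinates, so comparing `nearest.longitude` and `nearest.latitude` with the requested point is a quick way to check how far the lookup had to snap.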

# 4. Task

Your task here is to find the features associated with an ignition point. More precisely, you need to determine the elevation, population density, maximum temperature and average temperature at the location of the ignition point on the day it occurred.

## 4.1 Load data

### 4.1.1 Rasters

In [None]:
topo_path = RASTER_PATH / "topo.nc"
weather_path = RASTER_PATH / "weather.nc"

In [None]:
topo_xr = xr.load_dataset(topo_path)
weather_xr = xr.load_dataset(weather_path)

### 4.1.2 Ignition point

In [None]:
ignition_point_path = CSV_PATH / "ignition_points.csv"

In [None]:
df_ignition = pd.read_csv(ignition_point_path, index_col=0)
df_ignition

## 4.2 Map features

🚨 You need to complement the dataframe with the raster features. 🚨

You need to use the 2 rasters and extract the information corresponding to the ignition point. You have to determine the elevation, population density, maximum and average temperature at the location of the ignition point at the time of ignition.

The expected result is shown below:

In [None]:
# NOTE: Fill me!

df_ignition = "fill with proper dataframe"

In [None]:
expected_dataframe = pd.DataFrame(
    {
        "Date": {0: "2002-11-11"},
        "latitude": {0: -42.6},
        "longitude": {0: 147.5},
        "elevation": {0: 388},
        "pop_dens": {0: 2.5},
        "max_temp": {0: 19.3},
        "avg_temp": {0: 4.9},
    }
)

In [None]:
pd.testing.assert_frame_equal(df_ignition, expected_dataframe, rtol=1e-2)
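For reference, here is a small sketch of how `pd.testing.assert_frame_equal` behaves with `rtol=1e-2`, using made-up frames. The helper `frames_match` is hypothetical and only wraps the assertion to return a boolean. With default settings the check also compares dtypes, so an integer column and a float column holding the same numbers do not match.

```python
import pandas as pd

expected = pd.DataFrame({"max_temp": [19.3], "elevation": [388]})
close = pd.DataFrame({"max_temp": [19.35], "elevation": [388]})
far = pd.DataFrame({"max_temp": [21.0], "elevation": [388]})
wrong_dtype = pd.DataFrame({"max_temp": [19.3], "elevation": [388.0]})


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True if the frames are equal within a 1% relative tolerance."""
    try:
        pd.testing.assert_frame_equal(left, right, rtol=1e-2)
        return True
    except AssertionError:
        return False


print(frames_match(close, expected))        # within 1% of the expected value
print(frames_match(far, expected))          # more than 1% away
print(frames_match(wrong_dtype, expected))  # float64 vs int64 column
```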

---

# END OF SCRIPT