# Introduction 🌱

This Jupyter notebook presents an example method used to model a presence raster, with a focus on the *Angiosperms* layer. This is just one approach, and while this notebook demonstrates my specific choice of the *Angiosperms* layer, other users may choose different layers based on their own objectives. 

In this *Steps* notebook, you will find a detailed walkthrough of each stage of the project. Each step is explained in its corresponding notebook, offering a clear, organized process for modeling that can be adapted to various layers.

## Step 0: Choose Your Layer 🌍

For this example, I’ve chosen to model the *Angiosperms* layer. However, the process demonstrated here is flexible and can be adjusted for any layer you choose. To help you decide, you can explore and visualize the available layers using the following [Web Application for Layer Visualization](https://mistra-c2b2-symphony-layers-interactive-explorer-interface.streamlit.app/).

The web application provides:
- 📑 A **summary** of each layer. 
- 🛠️ **Recommendations for data improvement**, which serve as a guide to help you begin modeling the raster and provide useful starting points for your work. 

Select the layer that best fits your project needs and objectives.

## Step 1: Download Your Datasets 📥

You can use the **summary** and the **recommendations for data improvement** of the layer to draw up a list of the relevant parameters for your layer. You can also search for published studies on your subject.

The [Symphony Metadata Report (March 2019)](https://github.com/Mistra-C2B2/Symphony-Layers-Interactive-Explorer/blob/main/Symphony%20Metadata%20March%202019.pdf) recommends using additional environmental criteria such as *water depth*, *substrate salinity*, and *exposure* to model the probability of presence of Angiosperms. In addition, a study of the [distribution and co-occurrence patterns of charophytes and angiosperms in the northern Baltic Sea](https://www.nature.com/articles/s41598-023-47176-8) lists relevant parameters for building a model of angiosperm plants.

I decided to model the probability of presence of the *Zosteraceae* (a family of seagrasses) between 2015 and 2023. I have downloaded datasets of *Zosteraceae* locations in the Swedish marine area from [GBIF](https://www.gbif.org) and relevant datasets from the [Copernicus Marine Data Store](https://data.marine.copernicus.eu/products). 

Here is the list of datasets I have downloaded: 

- 🌐 [Baltic Sea Physics Reanalysis](https://data.marine.copernicus.eu/product/BALTICSEA_MULTIYEAR_PHY_003_011/description)
- 🧪 [Baltic Sea Biogeochemistry Reanalysis](https://data.marine.copernicus.eu/product/BALTICSEA_MULTIYEAR_BGC_003_012/description)
- 🌊 [Baltic Sea Wave Hindcast](https://data.marine.copernicus.eu/product/BALTICSEA_MULTIYEAR_WAV_003_015/description)
- 🌱 [Artportalen](https://www.gbif.org/dataset/38b4c89f-584c-41bb-bd8f-cd1def33e92f)
- 🗺️ [EMODnet Bathymetry](https://emodnet.ec.europa.eu/en/bathymetry)

### Filtering and Downloading the Artportalen Dataset

To filter the Artportalen dataset, in the **DOWNLOAD** section, you can use the simple filters and filter by:
1. 🌱 **Scientific name**: Plantae > Liliopsida > Alismatales > Zosteraceae
2. 📅 **Year**: 2015 - 2023
3. 🗺️ **Location**: I clicked on "Including coordinates" and drew a rough geometry of the Baltic Sea, including Skagerrak.
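
The same three filters can also be re-applied locally after the download, for example to double-check the file. The block below is a minimal sketch, not the filtering GBIF performs: it assumes a tab-separated "simple" download with the Darwin Core columns `family`, `year`, `decimalLatitude`, and `decimalLongitude`, and it replaces the hand-drawn geometry with a rough bounding box whose coordinates are my own assumption. The sample rows are made up for illustration.

```python
import csv
import io

# Made-up rows shaped like a GBIF "simple" download: tab-separated text with
# Darwin Core column names. These are illustrative values, not real records.
SAMPLE_TSV = (
    "family\tyear\tdecimalLatitude\tdecimalLongitude\n"
    "Zosteraceae\t2016\t57.70\t11.60\n"       # kept
    "Zosteraceae\t2012\t58.10\t11.30\n"       # dropped: before 2015
    "Potamogetonaceae\t2018\t59.30\t18.90\n"  # dropped: other family
    "Zosteraceae\t2021\t\t\n"                 # dropped: no coordinates
    "Zosteraceae\t2020\t48.00\t-4.50\n"       # dropped: outside the box
)

# Assumed rough bounding box (degrees) around the Baltic Sea and Skagerrak.
BBOX = {"min_lon": 7.0, "max_lon": 31.0, "min_lat": 53.0, "max_lat": 66.5}


def filter_occurrences(handle, family="Zosteraceae", first_year=2015,
                       last_year=2023, bbox=BBOX):
    """Return records matching the family, year range, and bounding box."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["family"] != family or not row["year"]:
            continue
        if not first_year <= int(row["year"]) <= last_year:
            continue
        if not row["decimalLatitude"] or not row["decimalLongitude"]:
            continue
        lat = float(row["decimalLatitude"])
        lon = float(row["decimalLongitude"])
        inside = (bbox["min_lat"] <= lat <= bbox["max_lat"]
                  and bbox["min_lon"] <= lon <= bbox["max_lon"])
        if inside:
            kept.append({"year": int(row["year"]), "lat": lat, "lon": lon})
    return kept


records = filter_occurrences(io.StringIO(SAMPLE_TSV))
print(records)
```

To run this on a real download, pass an open file handle instead of the `io.StringIO` sample.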

### Downloading Copernicus Marine Data 

There are two ways you can download data from the Copernicus Marine Data Store:

#### 1. 💻 Manual Download 

For the **Baltic Sea Physics Reanalysis** and **Baltic Sea Biogeochemistry Reanalysis**:
1. Click on “Data access” and choose the desired time resolution (I selected Yearly).
2. Download the datasets using the “From” or “Browse” buttons. 

#### 2. 📡 Using the Python API 

For the **Baltic Sea Wave Hindcast** dataset, I used the Python API to download it programmatically, which is an efficient method for automating the process and handling large datasets. Since this dataset is available only in hourly resolution, I had to calculate the yearly mean to match the annual resolution of the other datasets.
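
A programmatic download could look roughly like the sketch below. It assumes the `copernicusmarine` toolbox package and its `subset` function are what is meant by the Python API; the dataset ID is a placeholder to copy from the product page, and the variable name `VHM0` (significant wave height), the bounding box, and the output filename are my own illustrative assumptions. The actual download call is left commented out because it needs a Copernicus Marine account and network access.

```python
def build_subset_request(dataset_id, variables, bbox, start, end, output_filename):
    """Collect the keyword arguments passed to copernicusmarine.subset()."""
    return {
        "dataset_id": dataset_id,
        "variables": list(variables),
        "minimum_longitude": bbox["min_lon"],
        "maximum_longitude": bbox["max_lon"],
        "minimum_latitude": bbox["min_lat"],
        "maximum_latitude": bbox["max_lat"],
        "start_datetime": start,
        "end_datetime": end,
        "output_filename": output_filename,
    }


def download(request):
    """Run the download; requires `pip install copernicusmarine` and a login."""
    import copernicusmarine  # imported lazily so the sketch runs without it
    return copernicusmarine.subset(**request)


request = build_subset_request(
    dataset_id="YOUR_DATASET_ID",  # placeholder: copy it from the product page
    variables=["VHM0"],            # assumed variable: significant wave height
    bbox={"min_lon": 7.0, "max_lon": 31.0, "min_lat": 53.0, "max_lat": 66.5},
    start="2015-01-01T00:00:00",
    end="2023-12-31T23:00:00",
    output_filename="baltic_waves_2015_2023.nc",  # hypothetical filename
)
print(request)
# download(request)  # uncomment once your credentials are configured
```

Keeping the request in a plain dictionary makes it easy to loop over years or variables and to log exactly what was downloaded.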

### Downloading EMODnet Data

- [EMODnet Digital Bathymetry (DTM 2024) - Tile C6](https://emodnet.ec.europa.eu/geonetwork/srv/eng/catalog.search#/metadata/dbf6b74e-e46d-47f9-b3d9-b208844c4588) - Living Lab North
- [EMODnet Digital Bathymetry (DTM 2024) - Tile D6](https://emodnet.ec.europa.eu/geonetwork/srv/eng/catalog.search#/metadata/c9e19787-dd7e-45ab-b756-3c044d1f3b86) - Living Lab East
- [EMODnet Digital Bathymetry (DTM 2024) - Tile D5](https://emodnet.ec.europa.eu/geonetwork/srv/eng/catalog.search#/metadata/2f10b5c3-07fe-456c-a21e-b97620634de7) - Living Lab West



## Step 2: Pre-processing 🛠️

This step is the most important one because it prepares and formats the data for the model. Any mistake can influence your results, and it is very difficult to find the mistake by analyzing the final result. I applied the following transformations to my datasets:

#### 1. 🗺️ **Creating a spatial mask and spatially filtering the inputs**

I initially thought that I had to filter the rasters I downloaded before resampling them, but this turned out to be unnecessary.

#### 2. 📅 **Calculating the annual mean - Temporal Resampling**
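
Hourly data has to be averaged per calendar year before it can sit next to yearly datasets. The block below is a minimal sketch of that reduction with NumPy on a synthetic array of shape `(time, y, x)`; the values are made up, and the function name `yearly_mean` is hypothetical rather than taken from the project notebooks.

```python
import numpy as np

# Synthetic hourly time axis covering 2015 and 2016 (made-up data).
times = np.arange("2015-01-01T00", "2017-01-01T00", dtype="datetime64[h]")
calendar_year = times.astype("datetime64[Y]").astype(int) + 1970

# Fake field on a 2 x 3 grid: 1.0 everywhere in 2015, 2.0 everywhere in 2016.
hourly = np.where(calendar_year == 2015, 1.0, 2.0)[:, None, None] * np.ones((1, 2, 3))


def yearly_mean(times, values):
    """Average `values` (time, y, x) over each calendar year found in `times`."""
    years = times.astype("datetime64[Y]").astype(int) + 1970
    unique_years = np.unique(years)
    means = np.stack([values[years == y].mean(axis=0) for y in unique_years])
    return unique_years, means


unique_years, means = yearly_mean(times, hourly)
print(unique_years, means.shape)
```

If the data is loaded with xarray, `ds.resample(time="YS").mean()` is the usual way to get the same result without handling the time axis by hand.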

#### 3. 🌐 **Creating and Applying a 250 x 250 m Grid - Spatial Resampling**
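
One simple way to bring rasters of different resolutions onto a common 250 m grid is nearest-neighbour resampling. The block below is a minimal sketch under two assumptions of mine: the rasters are in a projected coordinate system with units in metres, and nearest-neighbour lookup is acceptable. The helper names `grid_centres` and `resample_nearest` are hypothetical, and the source raster is a toy 2 x 2 example with 1 km cells.

```python
import numpy as np


def grid_centres(xmin, ymin, xmax, ymax, cell=250.0):
    """Cell-centre coordinates of a regular grid, rows ordered north to south."""
    xs = np.arange(xmin + cell / 2, xmax, cell)
    ys = np.arange(ymax - cell / 2, ymin, -cell)
    return xs, ys


def resample_nearest(src, src_xs, src_ys, dst_xs, dst_ys):
    """Give every target cell the value of the nearest source cell."""
    cols = np.abs(dst_xs[:, None] - src_xs[None, :]).argmin(axis=1)
    rows = np.abs(dst_ys[:, None] - src_ys[None, :]).argmin(axis=1)
    return src[np.ix_(rows, cols)]


# Toy source raster: 2 x 2 cells of 1000 m covering a 2 km x 2 km square.
src = np.array([[1.0, 2.0],
                [3.0, 4.0]])
src_xs, src_ys = grid_centres(0, 0, 2000, 2000, cell=1000.0)

# Target grid: 250 m cells over the same extent (8 x 8 cells).
dst_xs, dst_ys = grid_centres(0, 0, 2000, 2000, cell=250.0)
out = resample_nearest(src, src_xs, src_ys, dst_xs, dst_ys)
print(out.shape)
```

The pairwise distance arrays grow with the product of source and target sizes, so for full-size rasters a dedicated library such as rasterio or GDAL is a better fit than this sketch.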



## Step 3: Filter and format the inputs ➡️

#### 1. 🔢 **Correlation Matrix** 
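
A correlation matrix helps spot predictors that carry nearly the same information, so that only one of each strongly correlated pair is kept. The block below is a minimal sketch on synthetic data: the variable names, the greedy keep-the-first rule, and the 0.8 threshold are assumptions of mine, not values taken from this project.

```python
import numpy as np

# Synthetic predictors sampled at 500 grid cells (made-up values).
rng = np.random.default_rng(0)
n_cells = 500
depth = rng.normal(size=n_cells)
salinity = rng.normal(size=n_cells)
light = -depth + 0.1 * rng.normal(size=n_cells)  # almost a copy of depth

names = ["depth", "salinity", "light"]
X = np.column_stack([depth, salinity, light])


def drop_correlated(X, names, threshold=0.8):
    """Keep each variable unless it correlates too strongly with a kept one."""
    corr = np.corrcoef(X, rowvar=False)  # Pearson correlation between columns
    kept = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return corr, [names[j] for j in kept]


corr, kept_names = drop_correlated(X, names)
print(np.round(corr, 2))
print(kept_names)
```

The order of `names` decides which member of a correlated pair survives, so it is worth listing the variables you trust most first.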

#### 2. 📄 **Format of the inputs**

## Step 4: Choose your model

I chose a MaxEnt model because it is designed for presence-only data and is commonly used to model species presence.
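
To illustrate the idea behind MaxEnt, the block below is a minimal sketch, not the MaxEnt software itself: it fits a distribution over grid cells of the form `p(cell) ∝ exp(w · features)` so that the expected feature values under `p` match the average feature values at the presence cells. It uses plain gradient ascent with linear features only and no regularization, whereas full MaxEnt implementations add further feature types and regularization. All data, names, and hyperparameters here are made up for illustration.

```python
import numpy as np


def gibbs(z, w):
    """Distribution over cells with p(cell) proportional to exp(z @ w)."""
    logits = z @ w
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()


def fit_maxent(features, presence_idx, lr=0.1, steps=5000):
    """Fit weights so expected features under p match the presence-cell means."""
    # Standardize so a single learning rate suits every feature.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    target = z[presence_idx].mean(axis=0)
    w = np.zeros(z.shape[1])
    for _ in range(steps):
        # Gradient of the mean log-likelihood of the presence cells.
        w += lr * (target - gibbs(z, w) @ z)
    return w, gibbs(z, w)


# Synthetic study area: 200 cells with depth (m) and salinity (made-up values).
rng = np.random.default_rng(0)
depth = np.linspace(0.0, 50.0, 200)
salinity = rng.uniform(5.0, 8.0, size=200)
features = np.column_stack([depth, salinity])
presence_idx = np.arange(40)  # pretend all records fall in the 40 shallowest cells

weights, prob = fit_maxent(features, presence_idx)
print(weights)
print(prob[:40].sum(), prob[-40:].sum())
```

In this toy setup the fitted weight for depth comes out negative, so the model places most of its probability on shallow cells, which matches how the fake presence records were generated.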
