# RAMP: Predicting percentage of bleached corals from the GCBD

*Emeline Bruyère, Anaëlle Cossard, Alexis Michalowski-Skarbek, Rosanne Phebe, Rémi Poulard & Marine Tognia-tonou (M2 AMI2B).*  

<div>
<table style="width:100%; background-color:transparent;">
  <tr style="background-color:transparent;">
    <td align="left" style="background-color:transparent; width: 40%;">
        <a href="https://www.hi-paris.fr/">
            <img src="https://github.com/x-datascience-datacamp/datacamp-master/raw/main/images/logo-hi-paris-retina.png" width="200px"/>
        </a>
    </td>
    <td align="center" style="background-color:transparent; width:60%;">
        <a href="https://dataia.eu">
            <img src="https://github.com/ramp-kits/tephra/raw/main/img/DATAIA-h.png" width="600px"/>
        </a>
    </td>
    <td align="right" style="background-color:transparent; width: 40%;">
        <a href="https://coralecologylab.wixsite.com/coralecologylab">
            <img src="https://static.wixstatic.com/media/bc7c58_d314690735724ddea8c5c8898bc5a4e3~mv2.png/v1/fill/w_201,h_201,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/Beige%20Classic%20Circular%20Fashion%20Fashion%20Logo%20(1)_edited.png" width="200px"/>
        </a>
    </td>
  </tr>
 </table>
</div>



## Table of Contents
* [Introduction](#introduction)
* [The dataset](#dataset)
* [Requirements](#requirements)
* [Data exploration](#exploration)
* [Base model](#base_model)
* [Submitting on RAMP](#submitting)

## Introduction <a class="anchor" id="introduction"></a>
***... DRAFT ...***  
Coral reefs, the world's most diverse marine ecosystems, provide resources and services that benefit millions of people. However, they have recently faced an escalation in thermal-stress events, leading to coral bleaching. Coral bleaching results from the breakdown of the symbiotic relationship between corals and microalgae. It is characterised by the loss of pigments and symbionts, which makes corals appear pale or white. Bleaching can be temporary or fatal for corals, and marine heat waves pose the most significant threat to coral reefs on a global scale.

The Global Coral-Bleaching Database (GCBD) compiles 34,846 coral bleaching records from 14,405 sites in 93 countries, collected from 1980 to 2020. The GCBD records the presence or absence of coral bleaching along with site exposure, distance to land, mean turbidity, cyclone frequency, and a suite of sea-surface temperature metrics at the time of each survey.

The goal of this RAMP is to predict the percentage of bleached corals from environmental data, based on the data gathered in the GCBD.

## The dataset <a class="anchor" id="dataset"></a>
The description of all the columns of the dataset is available from [A global coral-bleaching database, 1980–2020](https://doi.org/10.1038/s41597-022-01121-y).

| ![dataset_structure.png](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41597-022-01121-y/MediaObjects/41597_2022_1121_Fig3_HTML.png?as=webp) |
|:--:|
| <b>Schematic of the Global Coral Bleaching Database (GCBD) showing the relationships among the 20 tables.</b>|

For this challenge, the data were first preprocessed and then split to set aside a private test set, which is used to evaluate the models on our servers. This leaves ***TO BE COMPLETED*** observations in the public train set and ***TO BE COMPLETED*** observations in the public test set.  
Observations are grouped by site (each with a `Site_ID` ***TO BE CHECKED***), and we take care to keep all observations from a given site together, either in the train set or in the test set, both when splitting the data and during cross-validation.
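
As an illustration of what a site-aware split can look like, here is a minimal sketch using scikit-learn's `GroupShuffleSplit` and `GroupKFold`. It is not the actual splitting code used for this challenge: the toy data, the feature name `Temperature_Mean` and the target name `Percent_Bleached` are hypothetical, and `Site_ID` is assumed to be the grouping column.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

# Toy stand-in for the preprocessed data (all values are random).
# Column names other than Site_ID are hypothetical.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Site_ID": np.repeat(np.arange(10), 3),       # 10 sites, 3 surveys each
    "Temperature_Mean": rng.normal(28, 1, 30),    # hypothetical feature
    "Percent_Bleached": rng.uniform(0, 100, 30),  # hypothetical target
})
X = data[["Temperature_Mean"]]
y = data["Percent_Bleached"]
groups = data["Site_ID"]

# Hold out about 20% of the sites (not of the rows) as a test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))
train_sites = set(groups.iloc[train_idx])
test_sites = set(groups.iloc[test_idx])

# Site-aware cross-validation on the training part only.
X_train, y_train = X.iloc[train_idx], y.iloc[train_idx]
groups_train = groups.iloc[train_idx]
cv = GroupKFold(n_splits=4)
folds = []
for fit_idx, val_idx in cv.split(X_train, y_train, groups=groups_train):
    fit_sites = set(groups_train.iloc[fit_idx])
    val_sites = set(groups_train.iloc[val_idx])
    folds.append((fit_sites, val_sites))

print("train sites:", sorted(train_sites))
print("test sites:", sorted(test_sites))
```

Because the split is made on sites rather than on rows, no site contributes observations to both sides of a split, which avoids leaking site-specific information into the evaluation.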

Preprocessing steps before splitting the data:
1.  ...
2.  ...
3.  ...

## Requirements <a class="anchor" id="requirements"></a>
### Libraries

In [1]:
import pandas as pd
import numpy as np
import missingno as msno
import matplotlib.pyplot as plt
import seaborn as sns

# ... TO BE COMPLETED

### Data

In [None]:
# ... TO BE COMPLETED

## Data exploration <a class="anchor" id="exploration"></a>

In [None]:
# ... TO BE COMPLETED

## Base model <a class="anchor" id="base_model"></a>

In [None]:
# ... TO BE COMPLETED

## Submitting to the online challenge: [ramp.studio](https://ramp.studio) <a class="anchor" id="submitting"></a>

Once you have found a good model, you can submit it to [ramp.studio](https://www.ramp.studio) to enter the online challenge. If it is your first time using the RAMP platform, [sign up](https://www.ramp.studio/sign_up), otherwise [log in](https://www.ramp.studio/login). Then sign up to the event ***TO BE UPDATED*** [Coral bleaching](http://www.ramp.studio/events/tephra_datacamp2023). Both signups are controlled by RAMP administrators, so there **can be a delay between asking for signup and being able to submit**.

Once your signup request is accepted, you can go to your [sandbox](https://www.ramp.studio/events/tephra_datacamp2023/sandbox) and copy-paste your code there. You can also create a new folder `my_submission` in the `submissions` folder containing `classifier.py` and upload this file directly. You can check the starting kit ([`classifier.py`](/edit/submissions/starting_kit/classifier.py)) for an example. The submission is trained and tested on our backend in the same way as `ramp-test` does it locally. While your submission is waiting in the queue and being trained, you can find it in the "New submissions (pending training)" table in [my submissions](https://www.ramp.studio/events/tephra_datacamp2023/my_submissions). Once it is trained, your submission shows up on the [public leaderboard](https://www.ramp.studio/events/tephra_datacamp2023/leaderboard).  
If there is an error (despite having tested your submission locally with `ramp-test`), it will show up in the "Failed submissions" table in [my submissions](https://www.ramp.studio/events/tephra_datacamp2023/my_submissions). You can click on the error to see part of the trace.
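
For orientation, here is a hedged sketch of what a `classifier.py` could contain. It is not the starting-kit file. It assumes the event uses the usual RAMP classifier workflow, in which the submission defines a `Classifier` class with `fit` and `predict_proba` methods; the preprocessing steps and the logistic regression model are arbitrary illustrative choices, and a regression setup would need a different interface.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class Classifier(BaseEstimator):
    """Hypothetical submission: impute, scale, then logistic regression."""

    def __init__(self):
        self.pipe = make_pipeline(
            SimpleImputer(strategy="median"),  # fill missing values
            StandardScaler(),                  # put features on one scale
            LogisticRegression(max_iter=500),
        )

    def fit(self, X, y):
        self.pipe.fit(X, y)
        return self

    def predict_proba(self, X):
        # One column of probabilities per class.
        return self.pipe.predict_proba(X)


# Smoke test on random toy data (not the challenge data).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 3))
X_toy[0, 0] = np.nan                      # the imputer handles this
y_toy = (X_toy[:, 1] > 0).astype(int)     # two toy classes
clf = Classifier().fit(X_toy, y_toy)
proba = clf.predict_proba(X_toy)
print(proba.shape)
```

Running a quick smoke test like this before calling `ramp-test` catches interface errors early, such as a missing method or a wrong output shape.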

The data set we use at the backend is usually different from what you find in the starting kit, so the score may be different.

The usual way to work with RAMP is to explore solutions, add feature transformations, select models, etc., _locally_, and check them with `ramp-test`. The script prints mean cross-validation scores.

The official score in this RAMP (the first score column on the [leaderboard](http://www.ramp.studio/events/tephra_datacamp2023/leaderboard) ***TO BE UPDATED***) is the balanced accuracy score (`bal_acc`). When the score is good enough, you can submit your model to RAMP.
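
To make the metric concrete, here is a small sketch that computes balanced accuracy as the mean of the per-class recalls, which is the standard definition (scikit-learn's `balanced_accuracy_score` computes the same quantity by default). The class labels `"none"` and `"severe"` are hypothetical and only serve the example.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class weighs the same."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Imbalanced toy labels: 6 "none" and 2 "severe" (hypothetical classes).
y_true = ["none"] * 6 + ["severe"] * 2
# A model that always predicts the majority class.
y_pred = ["none"] * 8

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, score)
```

Here the plain accuracy is 0.75 because most labels are `"none"`, while the balanced accuracy is 0.5, since the recall is 1 on `"none"` and 0 on `"severe"`. The metric therefore does not reward a model for ignoring the minority class.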

Here is the script proposed as the starting_kit:

In [2]:
# ... TO BE COMPLETED with our example classifier/regressor

You can test your solution locally by running the `ramp-test` command followed by `--submission <my_submission folder>`.
Here is an example with the starting_kit submission:

In [None]:
!ramp-test --submission starting_kit

## More information

See the [online documentation](https://paris-saclay-cds.github.io/ramp-docs/ramp-workflow/stable/using_kits.html) for more details.

## Questions

Questions related to the starting kit should be asked on the [issue tracker](https://github.com/ramp-kits/tephra/issues).