# **Coral Bleaching: *Global Environmental Analysis* 🪸**

*I welcome any feedback as I continue my journey in learning data science. If you find the notebook helpful, feel free to give it a vote – your support means a lot! 😎*
<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 1 | Introduction</span></b> </div>

<img src="https://i.ibb.co/bNj4ZTZ/image5.webp" alt="Notebook Cover Image" style="width:100%; height:auto; border-radius:15px; object-fit:cover; aspect-ratio: 3/1;">

❗️ *Acknowledgment*

*The dataset used in this analysis was obtained from the Biological and Chemical Oceanography Data Management Office (BCO-DMO) [website](https://www.bco-dmo.org/dataset/773466). The data has been utilized for **educational** and **personal research** purposes only. We express our gratitude to the BCO-DMO and the respective contributors for making this valuable resource available.*

### <b>1.1 <span style='color:#6495ED'>|</span> Study Background</b> 


*Why should we care about coral?*

Coral reefs are **critical to the marine ecosystem**, with about **25% of marine life** depending on the habitats they create. Furthermore, over **500 million people** worldwide **rely on reefs for food, tourism, and employment**, with coral reefs providing protection from extreme weather events and contributing nearly **$30 billion** in estimated value (TIME, 2016).

*What is coral bleaching, and why is it a concern?*

Despite their importance, coral reefs are increasingly threatened by a phenomenon known as **coral bleaching**. Coral bleaching occurs when corals, stressed by changes in environmental conditions—most notably increased **sea surface temperatures (SSTs)**—expel the symbiotic algae (*zooxanthellae*) that live within their tissues. These algae are crucial to the coral's survival, as they provide energy through **photosynthesis** and contribute to the coral's vibrant colors. The loss of these algae causes the corals to turn white or "bleach" and significantly increases their susceptibility to **disease** and **mortality** (Hughes et al., 2017; Hoegh-Guldberg et al., 2017).

*What causes coral bleaching?*

The primary driver of coral bleaching is **thermal stress** due to elevated SSTs. Even small increases in SSTs, such as **1°C above the normal maximum temperatures**, can lead to widespread bleaching events, particularly when these elevated temperatures persist for extended periods (Baker, Glynn, & Riegl, 2008). Additional stressors, such as **pollution**, **overfishing**, and **ocean acidification**, can exacerbate the effects of thermal stress, making it more difficult for corals to recover from bleaching events (Hoegh-Guldberg et al., 2017; Loya et al., 2001).

*How has coral bleaching changed over time?*

Over the past few decades, the **frequency** and **severity** of coral bleaching events have increased significantly, largely driven by **global climate change** (Hughes et al., 2017). As ocean temperatures continue to rise, the future of coral reefs is increasingly uncertain, with projections suggesting that most coral reefs could experience **annual severe bleaching** by the mid-21st century (Pandolfi et al., 2011). The potential loss of coral reefs would have profound implications not only for **marine ecosystems** but also for human communities that depend on the goods and services provided by healthy coral reefs (Eakin et al., 2016).

*What can be done to address coral bleaching?*

Understanding the environmental conditions that lead to coral bleaching, such as **sea surface temperature anomalies (SSTA)** and **thermal stress anomalies (TSA)**, is crucial for predicting and mitigating the impacts of future bleaching events (Spalding, Ravilious, & Green, 2001). By studying these factors, researchers can develop **early warning systems**, inform **conservation strategies**, and guide efforts to reduce the **anthropogenic stressors** that further endanger coral reefs (Brown, 1997). Addressing coral bleaching is not only a matter of preserving **biodiversity** but is also critical for maintaining the **resilience** and **sustainability** of marine environments and the human economies that depend on them.

<figure style="text-align: center; margin: 20px 0;">
    <img src="https://i.ibb.co/9q17cbV/image6.webp" alt="image6" border="0" alt="Coral Bleaching" style="width:100%; height:auto; border-radius:15px; object-fit:cover; aspect-ratio: 3/1;">
    <figcaption style="font-size: 14px; color: #555; margin-top: 10px;">
        Figure 1: An example of coral bleaching caused by thermal stress.
    </figcaption>
</figure>

### <b>1.2 <span style='color:#6495ED'>|</span> Objectives</b> 

The objective of this notebook is to develop a robust predictive model for coral bleaching severity using a comprehensive dataset. The workflow is divided into several key stages, each contributing to the final goal of creating a high-performing model through exploratory data analysis, preprocessing, and model training. Here's a brief overview of each section:

1. **Exploratory Data Analysis (EDA)**:  
   This section focuses on cleaning and understanding the dataset. We remove irrelevant columns, handle missing values and duplicates, and standardize the data for consistency. The exploration phase also involves generating insights to inform feature engineering and model training.

2. **Data Preprocessing**:  
   To prepare the data for modeling, we encode categorical variables, engineer new features, apply clustering to geographical data, and split the dataset into training and testing sets. This stage also includes scaling and normalization techniques to ensure that the model handles the data effectively.

3. **Model Training**:  
   In this stage, we evaluate various machine learning models, perform initial training, and validate their performance. We further fine-tune the models using hyperparameter optimization techniques such as Random Search and Bayesian Search to improve their predictive accuracy. The best model is selected based on performance metrics.

4. **Findings Summary and Conclusion**:  
   The final section summarizes the key findings from the analysis, model training, and hyperparameter tuning. The notebook concludes with insights into the performance of the models and recommendations for future improvements.

<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 2 | Environment Setup</span></b> </div>

### <b>2.1 <span style='color:#6495ED'>|</span> Import Libraries</b> 

We will start by importing all necessary libraries required for data manipulation, visualization, machine learning models, hyperparameter tuning, and statistical analysis. The imported libraries serve different purposes, including handling datasets, performing machine learning tasks, and creating visualizations. Additionally, we include utilities to suppress unnecessary warnings and display progress during iterations.

In [1]:
# Import necessary libraries

# Data manipulation and analysis
import pandas as pd  # Library for data manipulation and analysis, especially for tabular data (DataFrames)
import numpy as np  # Library for numerical computing, especially arrays and matrix operations

# Visualization
import matplotlib.pyplot as plt  # Library for creating static, animated, and interactive visualizations
import seaborn as sns  # High-level interface for drawing attractive statistical graphics
import cartopy.crs as ccrs  # Library for map projections and geographic data visualizations
import cartopy.feature as cfeature  # Used to add features (e.g., coastlines, borders) to cartopy maps
import matplotlib.cm as cm  # Colormap utilities from Matplotlib

# Machine learning models and utilities
from sklearn.preprocessing import LabelEncoder, RobustScaler, PowerTransformer  # Data preprocessing utilities
from sklearn.decomposition import PCA  # Principal Component Analysis for dimensionality reduction
from sklearn.model_selection import train_test_split, cross_val_score, cross_validate  # Model evaluation and splitting datasets
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV  # Hyperparameter tuning using search algorithms
from sklearn.cluster import KMeans  # K-Means clustering algorithm
import xgboost as xgb # XGBoost algorithm for gradient boosting
from sklearn.linear_model import Ridge, ElasticNet  # Linear regression models (Ridge and ElasticNet)
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor  # Ensemble models for regression
from sklearn.svm import SVR  # Support Vector Regressor (SVR) for regression tasks
from sklearn.tree import DecisionTreeRegressor  # Decision Tree model for regression tasks
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score  # Evaluation metrics for regression models
from xgboost import XGBRegressor  # XGBoost algorithm for gradient boosting
from lightgbm import LGBMRegressor  # LightGBM algorithm for gradient boosting
from catboost import CatBoostRegressor  # CatBoost algorithm for gradient boosting

# Hyperparameter optimization using Bayesian search
from skopt import BayesSearchCV  # Bayesian search algorithm for hyperparameter tuning
from skopt.space import Integer, Real  # Define search spaces for the Bayesian search algorithm

# Progress bar for loops
from tqdm import tqdm  # Progress bar utility for iterating through loops

# Display DataFrames in Jupyter Notebooks
from IPython.display import display  # Allows displaying objects like DataFrames in Jupyter Notebook

# Imputation library for missing data
import miceforest as mf  # Library for handling missing data imputation using random forests

# Statistical modeling and hypothesis testing
import statsmodels.api as sm  # Library for statistical models and hypothesis testing
import scipy.stats as stats  # Library for statistical functions
from scipy.stats import pearsonr  # Pearson correlation coefficient calculation

# Ignore unnecessary warnings
import warnings
warnings.filterwarnings('ignore')  # Suppresses warnings to keep output cleaner

ModuleNotFoundError: No module named 'miceforest'

### <b>2.2 <span style='color:#6495ED'>|</span> Load the Dataset</b> 

Next, we will load the dataset containing global coral bleaching environmental data from a CSV file into a Pandas DataFrame. The dataset is stored locally, and we specify the file path to load it. After loading the dataset, we display the first five rows to preview its structure and get an initial sense of the data, including column names and a few sample records.

In [None]:
# Load the dataset from a CSV file into a Pandas DataFrame
file_path = '/kaggle/input/global-bleaching-environmental/global_bleaching_environmental.csv'

# Read the CSV file into a Pandas DataFrame
data = pd.read_csv(file_path)

# Display the first five rows of the dataset
data.head()

<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 3 | Exploratory Data Analysis (EDA)</span></b> </div>

### <b>3.1 <span style='color:#6495ED'>|</span> Dataset Initial Inspection</b> 

Before beginning the Exploratory Data Analysis, it's essential to first review the dataset to identify any major issues that could impact the analysis. 

In [None]:
# Inspect the data columns and types
data.info()

In [None]:
# View some rows of the data to understand the data structure
data.head()

Upon inspecting the dataset using `data.info()` and `data.head()`, several key observations were made:

- The dataset is quite extensive, containing **41,361 entries** and **62 variables**. While this suggests a rich source of information, domain knowledge indicates that not all variables are directly relevant to the analysis of coral bleaching.
- Although no missing values were reported by `data.info()`, a closer inspection of the data using `data.head()` revealed entries marked as `nd` (no data). These entries were incorrectly recorded and should be treated as missing values.
- Based on the information provided by `data.info()`, it is evident that all attributes are currently categorized as objects. This is incorrect for numerical variables and will need to be corrected to properly reflect their data types for accurate analysis.

To prepare the dataset for further analysis, a series of data cleaning steps were conducted, ensuring that the dataset is both accurate and relevant for studying coral bleaching patterns.

### <b>3.2 <span style='color:#6495ED'>|</span> Remove Irrelevant Variables</b> 

As mentioned in the previous section, the dataset contains a total of 62 columns, indicating a rich and detailed dataset. However, in the context of this study on **coral bleaching** and **building a predictive model**, including irrelevant features could lead to unnecessary complexity in the analysis and potentially degrade the model's performance. To enhance the focus and efficiency of the analysis, the dataset was simplified by removing columns that are not directly related to coral bleaching. The following table provides detailed reasons for the removal of each attribute.

| **Column Name**                     | **Reason for Removal**                                                                                                                                                    |
|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Site_ID                             | Irrelevant for analysis; only used as a unique identifier for the site.                                                                                                  |
| Sample_ID                           | Irrelevant for analysis; only used as a unique identifier for the sampling event.                                                                                        |
| Data_Source                         | Not needed for current analysis, relates to metadata about how data was collected.                                                                                       |
| Reef_ID                             | Redundant identifier, same purpose as Site_ID.                                                                                                                           |
| Country_Name                        | Geographical information is already sufficiently captured by latitude, longitude, and ocean name variables.                                                              |
| State_Island_Province_Name           | Detailed geographical information not required for the analysis.                                                                                                         |
| City_Town_Name                      | Too granular geographical information for the intended analysis.                                                                                                         |
| Site_Name                           | Redundant with Site_ID; not needed for analysis.                                                                                                                         |
| Date_Day                            | Date is captured sufficiently by the full date column (Date).                                                                                                            |
| Date_Month                          | Date is captured sufficiently by the full date column (Date).                                                                                                            |
| Date_Year                           | Date is captured sufficiently by the full date column (Date).                                                                                                            |
| Substrate_Name                      | Not relevant for current analysis; relates to the type of substrate, which may not impact the focus of the study.                                                        |
| Temperature_Minimum                 | Detailed temperature variation (minimum) not required for the analysis.                                                                                                  |
| Temperature_Kelvin_Standard_Deviation | Standard deviation for temperature in Kelvin is not essential; focusing on Celsius values for analysis.                                                                  |
| SSTA_Standard_Deviation             | Not required; mean, minimum, and maximum values provide sufficient detail on SST anomalies.                                                                              |
| SSTA_Minimum                        | Not critical for analysis as the mean and maximum values of SSTA provide sufficient information.                                                                         |
| SSTA_FrequencyMax                   | Focusing on other SSTA-related variables (e.g., frequency and mean) makes this variable redundant.                                                                       |
| SSTA_DHW_Standard_Deviation         | Not critical for analysis; mean values and maximum provide sufficient details about degree heating weeks.                                                               |
| SSTA_DHWMax                         | Maximum values are not needed, as mean values of SSTA_DHW are sufficient for analysis.                                                                                   |
| SSTA_DHWMean                        | Other variables related to SSTA_DHW provide more relevant information for the analysis.                                                                                  |
| SSTA_Frequency_Standard_Deviation    | Not essential for analysis; mean frequency values are sufficient.                                                                                                       |
| SSTA_FrequencyMean                  | Other variables related to SSTA frequency provide sufficient detail.                                                                                                    |
| TSA_Standard_Deviation              | Standard deviation for TSA is not critical; mean, max, and minimum values are sufficient.                                                                                |
| TSA_Minimum                         | Minimum TSA value is not as relevant as mean and maximum values for analysis.                                                                                            |
| TSA_Frequency_Standard_Deviation    | Too granular for current analysis; focusing on mean and maximum frequency values instead.                                                                                |
| TSA_FrequencyMean                   | Other TSA-related variables provide sufficient information for analysis.                                                                                                |
| TSA_FrequencyMax                    | Maximum TSA frequency is not required as other TSA frequency variables are already included.                                                                             |
| TSA_DHW_Standard_Deviation          | Not essential; other TSA_DHW values (e.g., mean and maximum) are more relevant for the analysis.                                                                         |
| TSA_DHWMax                          | Not required for the analysis as other TSA_DHW-related variables provide sufficient information.                                                                         |
| TSA_DHWMean                         | Sufficient information is provided by other TSA_DHW-related variables.                                                                                                   |
| Site_Comments                       | Not required for the analysis as it's purely additional notes and does not impact core data.                                                                             |
| Bleaching_Comments                  | Not needed for the analysis as it contains extra information not directly related to the numerical analysis.                                                             |
| Sample_Comments                     | Extra information not relevant for the analysis.                                                                                                                        |
| ClimSST                             | Not necessary for the analysis; focus on other SST-related variables provides sufficient information.                                                                   |
| SSTA_Mean                           | Redundant information; other SSTA-related variables provide the necessary insight.                                                                                       |
| TSA_Mean                            | Redundant information; other TSA-related variables provide sufficient information.                                                                                       |
| Percent_Cover                       | Not required for the current analysis focus.                                                                                                                             |
| Temperature_Mean                    | Not essential, focusing on minimum, maximum, and relevant anomalies instead.                                                                                            |


In [None]:
# Drop irrelevant columns from the dataset

# Identify columns to drop
columns_to_drop = [
    'Site_ID', 'Sample_ID', 'Data_Source', 'Reef_ID', 'Country_Name', 'State_Island_Province_Name', 
    'City_Town_Name', 'Site_Name', 'Date_Day', 'Date_Month', 'Date_Year', 'Substrate_Name', 
    'Temperature_Minimum', 'Temperature_Kelvin_Standard_Deviation', 'SSTA_Standard_Deviation', 
    'SSTA_Minimum', 'SSTA_FrequencyMax', 'SSTA_DHW_Standard_Deviation', 'SSTA_DHWMax', 'SSTA_DHWMean', 
    'SSTA_Frequency_Standard_Deviation', 'SSTA_FrequencyMean', 'TSA_Standard_Deviation', 'TSA_Minimum', 
    'TSA_Frequency_Standard_Deviation', 'TSA_FrequencyMean', 'TSA_FrequencyMax', 'TSA_DHW_Standard_Deviation', 
    'TSA_DHWMax', 'TSA_DHWMean', 'Site_Comments', 'Bleaching_Comments', 'Sample_Comments', 'ClimSST', 
    'SSTA_Mean', 'TSA_Mean', 'Percent_Cover', 'Temperature_Mean'
]

# Drop the columns
data = data.drop(columns=columns_to_drop)

In [None]:
# Check the new shape of the dataset
data.info()

The dataset has been refined to include **24 columns**, making it more manageable and focused for analysis.

### <b>3.3 <span style='color:#6495ED'>|</span> Dataset Standardization</b> 

*Data standardization is crucial for converting datasets from multiple sources into a consistent format (Aggarwal & Kumar, 2021)*. The `Data_Source` column indicates that the dataset aggregates information from varied formats, resulting in non-uniformity that complicates analysis. Therefore, initial data processing standardizes the dataset to harmonize the data, reduce discrepancies, and enhance the reliability of subsequent insights.

#### <b>3.3.1 <span style='color:#6495ED'>|</span> Correct 'nd' Entries</b> 

The missing values in the dataset are represented as `nd`, meaning *"no data."* While this notation might be helpful for manual data review, it can lead to inaccurate results during programmatic data analysis if not properly addressed.

In [None]:
# Replace 'nd' with NaN across the entire dataframe
data.replace('nd', np.nan, inplace=True)

#### <b>3.3.2 <span style='color:#6495ED'>|</span> Fix Structural Errors</b> 

The **numerical data types** should be set as `float64` or `int64` to ensure that the `pandas` DataFrame can accurately compute statistics and generate plots in subsequent analysis. The relevant numerical columns will be selected and converted to numeric types as outlined below.

In [None]:
# Numerical columns to convert
columns_to_convert = [
    'Distance_to_Shore', 'Turbidity', 'Depth_m', 
    'Percent_Bleaching', 'SSTA', 'SSTA_Maximum', 'SSTA_Frequency', 
    'SSTA_DHW', 'TSA', 'TSA_Maximum', 'TSA_Frequency', 
    'TSA_DHW', 'Temperature_Maximum', 'Windspeed', 'Temperature_Kelvin'
]

# Convert all the identified columns to numeric, coercing errors to NaN
data[columns_to_convert] = data[columns_to_convert].apply(pd.to_numeric, errors='coerce')

Another important data type to consider is **datetime**. The dataset includes a `Date` column, which should be converted to and verified as a datetime type. This ensures that temporal data, such as the day, month, and year, can be accurately extracted from this column for further analysis.

In [None]:
# Convert the 'Date' column to datetime format: YYYY-MM-DD
data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m-%d')

The temperature columns are defined as **Sea Surface Temperature (SST)** in the dataset dictionary. To ensure **consistency** with other temperature variables such as `SSTA` and `TSA`, these columns will be renamed to `SST`, and `SST_Maximum`.

In [None]:
# Rename columns for better readability
data.rename(columns={'Temperature_Kelvin': 'SST', 'Temperature_Maximum': 'SST_Maximum'}, inplace=True)

Lastly, the null values in the `Bleaching_Level` column should be replaced with "Colony." According to the column's definition, the levels are either **Colony** or **Population**, indicating the scale of bleaching. Replacing the null values with "Colony" is crucial to ensure that any analysis of the `Bleaching_Level` is more accurate and meaningful.

In [None]:
# Replace NaN values with "Colony" in the 'Bleaching_Level' column
data.fillna({'Bleaching_Level': 'Colony'}, inplace=True)

### <b>3.4 <span style='color:#6495ED'>|</span> Missing Values</b> 

Missing values in the dataset can lead to inconsistencies and skewed patterns in the data, potentially impacting the accuracy of any analysis. Therefore, examining and addressing missing values before conducting any analysis is crucial. This step ensures the dataset's integrity, leading to more reliable insights and outcomes.

To start, we will check the total number of missing values across all variables in the dataset.

In [None]:
# Check for missing values
data.isnull().sum()

The `Percent_Bleaching` column has the highest number of missing values, with **6,846 entries lacking data**. This suggests that the dataset contains incomplete samples. Since `Percent_Bleaching` is a key variable in the study, it's crucial to address these missing values. Following the methodology of **ignoring missing values** (Lakshminarayan et al., 1999), we will remove these incomplete entries from the dataset. Imputing the missing values could involve complex computations, and incorrect imputation might introduce bias and distort the patterns in subsequent analysis.

The rows with missing values in the `Percent_Bleaching` column will be dropped initially before proceeding to the next steps in the analysis.

In [None]:
# Drop rows where the target variable 'Percent_Bleaching' is missing
data = data.dropna(subset=['Percent_Bleaching'])

# Count the missing values again after dropping rows with missing target variable
missing_values_after_dropping = data.isnull().sum()

missing_values_after_dropping

While `Percent_Bleaching` has no missing values, other environmental variables do, limiting the analysis of bleaching drivers. To address this, we use Multiple Imputation by Chained Equations (MICE) for the following reasons:

1. **Preserves Data Integrity**: MICE maintains the relationships and variability between variables, providing a more accurate representation than simpler methods.

2. **Handles Complex Interactions**: Coral bleaching is influenced by multiple interrelated factors. MICE effectively imputes missing data by considering these complex relationships.

3. **Improves Analysis**: Imputing missing values allows for a more complete dataset, enhancing the accuracy and reliability of our bleaching analysis.

4. **Focus on Numerical Data**: MICE is designed for numerical data, not categorical variables. Therefore, we drop rows where `Ecoregion_Name` is missing before applying MICE to ensure accurate imputation.

In [None]:
# Drop rows where 'Ecoregion_Name' is missing
data.dropna(subset=['Ecoregion_Name'], inplace=True)

# Reset index after dropping rows
data.reset_index(drop=True, inplace=True)

# Select only the numerical columns
numeric_columns = data.select_dtypes(include=['float64', 'int64']).columns
data_numeric = data[numeric_columns]

# Create a kernel for imputation on the numerical data only
kernel = mf.ImputationKernel(
    data=data_numeric,
    data_subset=5,    # Number of datasets to create
    random_state=42
)

# Run the MICE algorithm with 5 iterations
kernel.mice(iterations=5)

# Impute the missing values and get the first imputed dataset
imputed_numeric_df = kernel.complete_data(0)

# Replace the imputed numerical columns back into the original dataset
data.update(imputed_numeric_df)

# Output the shape of the dataset
data.shape

### <b>3.5 <span style='color:#6495ED'>|</span> Duplicates</b> 

Duplicates in a dataset do not add value or new information during analysis, especially in this context. If two rows are **exactly the same**, it indicates that the *same reef at the same location* has been sampled multiple times. This redundancy does not contribute additional insights and can even distort the analysis by **overrepresenting certain data points**.

In ecological studies, where each observation is meant to provide **unique information** about environmental conditions and coral health, *duplicate entries* can lead to **biased results**, such as inflating the perceived impact of certain environmental factors or misleading trends. Therefore, it's essential to **identify and remove duplicates** to ensure that the analysis is based on truly independent observations, thereby preserving the *integrity* and **accuracy** of the results.

In [None]:
# Check for duplicate rows
data.duplicated().sum()

In [None]:
# Drop duplicate rows
data = data.drop_duplicates(keep='first')

# Reset the index of the dataset
data.reset_index(drop=True, inplace=True)

### <b>3.6 <span style='color:#6495ED'>|</span> Dataset Overview</b> 

Here is the updated dictionary for the cleaned dataset, following several data cleaning steps.

| **Column Name**        | **Description**                                                      | **Data Type**       | **Unit**                   |
|------------------------|----------------------------------------------------------------------|---------------------|----------------------------|
| `Latitude_Degrees`      | Latitude of the observation point                                   | `float64`           | Degrees                    |
| `Longitude_Degrees`     | Longitude of the observation point                                  | `float64`           | Degrees                    |
| `Ocean_Name`            | Name of the ocean where the observation was made                    | `object`            | N/A                        |
| `Realm_Name`            | Marine biogeographic realm where the observation was made           | `object`            | N/A                        |
| `Ecoregion_Name`        | Ecoregion where the observation was made                            | `object`            | N/A                        |
| `Distance_to_Shore`     | Distance from the shore to the observation point                    | `float64`           | Meters (m)            |
| `Exposure`              | Type of exposure to environmental factors                           | `object`            | Categorical (Sheltered, Exposed, etc.) |
| `Turbidity`             | Water turbidity level (higher values indicate murkier water)        | `float64`           | Index (no unit)            |
| `Cyclone_Frequency`     | Frequency of cyclones in the region                                 | `float64`           | Events per year            |
| `Depth_m`               | Depth of the observation point                                      | `float64`           | Meters (m)                 |
| `Bleaching_Level`       | Categorical variable indicating severity of bleaching               | `object`            | N/A                        |
| `Percent_Bleaching`     | Percentage of coral bleaching observed                              | `float64`           | Percentage (%)             |
| `SST`                   | Sea Surface Temperature at the time of observation                  | `float64`           | Kelvin (K)                 |
| `SST_Maximum`           | Maximum Sea Surface Temperature recorded                            | `float64`           | Kelvin (K)                 |
| `Windspeed`             | Wind speed at the observation point                                 | `float64`           | Meters per second (m/s)    |
| `SSTA`                  | Sea Surface Temperature Anomaly (difference from average SST)       | `float64`           | Kelvin (K)                 |
| `SSTA_Maximum`          | Maximum Sea Surface Temperature Anomaly recorded                    | `float64`           | Kelvin (K)                 |
| `SSTA_Frequency`        | Frequency of sea surface temperature anomalies                      | `float64`           | Events                     |
| `SSTA_DHW`              | Degree Heating Weeks due to Sea Surface Temperature Anomalies       | `float64`           | Degree Heating Weeks (DHW) |
| `TSA`                   | Thermal Stress Anomaly (measure of temperature stress)              | `float64`           | Kelvin (K)                 |
| `TSA_Maximum`           | Maximum Thermal Stress Anomaly recorded                             | `float64`           | Kelvin (K)                 |
| `TSA_Frequency`         | Frequency of Thermal Stress Anomalies                               | `float64`           | Events                     |
| `TSA_DHW`               | Degree Heating Weeks due to Thermal Stress Anomalies                | `float64`           | Degree Heating Weeks (DHW) |
| `Date`                  | Date of the observation                                             | `datetime64[ns]`    | Date                       |

We will copy the cleaned data into a new data frame to preserve the original for future reference during analysis.

In [None]:
# Copy the original dataset for analysis to avoid modifying the cleaned dataset
data_for_eda = data.copy()

To gain an overview of the dataset, we will start by examining the geographic distribution of the data.

In [None]:
# Set up the figure and axes for the plots
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Define custom colors
color_latitude = '#1192e8' 
color_longitude = '#a56eff'

# Plotting Latitude Distribution
sns.histplot(data_for_eda['Latitude_Degrees'], bins=50, kde=True, color=color_latitude, edgecolor=color_latitude, ax=axes[0])
axes[0].set_title('Latitude Distribution')
axes[0].set_xlabel('Latitude Degrees')
axes[0].set_ylabel('Frequency')

# Plotting Longitude Distribution
sns.histplot(data_for_eda['Longitude_Degrees'], bins=50, kde=True, color=color_longitude, edgecolor=color_longitude, ax=axes[1])
axes[1].set_title('Longitude Distribution')
axes[1].set_xlabel('Longitude Degrees')
axes[1].set_ylabel('Frequency')

# Adjust layout for better spacing
plt.tight_layout()

# Show the plots
plt.show()

🔎 **Observations:**

- **Latitude Distribution:**
    - The data is concentrated around **-20 to 30 degrees** latitude, indicating that most coral bleaching events occur in the **tropics and subtropics**.
    - Peaks are particularly noticeable around **-20 degrees** (which likely represents southern hemisphere regions) and **10-20 degrees north**, corresponding to tropical reef locations near the equator.

- **Longitude Distribution:**
    - The data is primarily concentrated between **-100 to 150 degrees** longitude.
    - There are significant peaks around **-100 degrees** (which may include regions in the Pacific Ocean and Americas) and **around 100-150 degrees**, which could correspond to locations like the **Indian Ocean** and **Western Pacific** regions.
    - The dataset includes a smaller presence of coral bleaching events in the Atlantic or other regions of the western hemisphere based on the gap near **-50 to 0 degrees** longitude.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Most coral bleaching data appears to come from tropical regions</strong>, both in the northern and southern hemispheres.</li>
        <li><strong>Pacific and Indian Ocean regions seem to dominate the dataset geographically</strong>, as indicated by the longitude distribution.</li>
    </ul>
</div>

These observations suggest that the dataset is focused on tropical coral reefs, likely located in major reef systems such as the **Great Barrier Reef** and the **Pacific coral regions**.

In [None]:
# Set up the figure and axes for the plots
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Define colors for the subplots
color1 = '#1192e8'
color2 = '#a56eff'

# Plot Ocean Name Distribution
sns.countplot(y=data_for_eda['Ocean_Name'], ax=axes[0], 
              order=data_for_eda['Ocean_Name'].value_counts().index, color=color1)
axes[0].set_title('Ocean Name Distribution')
axes[0].set_xlabel('Count')
axes[0].set_ylabel('Ocean Name')

# Add labels on the bars for Ocean Name
axes[0].bar_label(axes[0].containers[0], fmt='%d', label_type='edge', padding=5)

# Plot Realm Name Distribution
sns.countplot(y=data_for_eda['Realm_Name'], ax=axes[1], 
              order=data_for_eda['Realm_Name'].value_counts().index, color=color2)
axes[1].set_title('Realm Name Distribution')
axes[1].set_xlabel('Count')
axes[1].set_ylabel('Realm Name')

# Add labels on the bars for Realm Name
axes[1].bar_label(axes[1].containers[0], fmt='%d', label_type='edge', padding=5)

# Adjust layout for better spacing and display
plt.tight_layout(pad=2)

# Show the plots
plt.show()

🔎 **Observations:**

- **Ocean Name Distribution:**
    - The majority of coral bleaching data comes from the **Atlantic Ocean**, followed closely by the **Pacific Ocean**.
    - The **Indian Ocean** has significantly fewer records compared to the Atlantic and Pacific.
    - The **Red Sea** and **Arabian Gulf** are the least represented regions in the dataset, indicating less data on coral bleaching from these areas.

- **Realm Name Distribution:**
    - The **Tropical Atlantic** dominates the dataset, with the most bleaching records.
    - The **Central Indo-Pacific** and **Western Indo-Pacific** realms also have a substantial amount of data, reflecting the prominence of coral reefs in these tropical areas.
    - **Eastern Indo-Pacific** has fewer records, and other temperate and tropical regions, such as the **Temperate Northern Pacific** and **Temperate Australasia**, have minimal representation.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Tropical regions, particularly in the Atlantic and Indo-Pacific, are the most impacted or documented in the dataset.</strong></li>
        <li><strong>Data gaps are evident for some regions</strong> like the Arabian Gulf, Red Sea, and Temperate zones, suggesting either fewer bleaching events or less reporting from these regions.</li>
    </ul>
</div>

These visualizations reinforce the importance of **tropical regions** as hotspots for coral bleaching, particularly the **Atlantic** and **Indo-Pacific** realms.

In [None]:
# Create a figure with a specific PlateCarree projection
fig = plt.figure(figsize=(20, 10))
ax = plt.axes(projection=ccrs.PlateCarree())

# Set the map to display the whole globe
ax.set_global()

# Add basic geographic features to the map
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Define ocean colors, with a default color for any unlisted ocean
ocean_colors = {
    'Pacific': 'tab:blue',
    'Atlantic': 'tab:green',
    'Indian': 'tab:red',
    'Southern': 'tab:purple',
    'Arctic': 'tab:orange',
    'Arabian Gulf': 'tab:cyan',  
    'Red Sea': 'tab:brown'
}
default_color = 'gray'

# Plot each ocean's coral bleaching sites on the map
for ocean in data_for_eda['Ocean_Name'].unique():
    subset = data_for_eda[data_for_eda['Ocean_Name'] == ocean]
    ax.scatter(subset['Longitude_Degrees'], subset['Latitude_Degrees'],
               color=ocean_colors.get(ocean, default_color), label=ocean, s=20, alpha=0.6,
               transform=ccrs.PlateCarree())

# Add a legend for ocean names
plt.legend(title='Ocean Name', loc='upper right', bbox_to_anchor=(1, 1))

# Set the map title
plt.title('Geographical Distribution of Coral Bleaching Sites', fontsize=14)

# Display the map
plt.show()

🔎 **Observations:**

- **Ocean Region Distribution:**

    - **Atlantic Ocean (Green):**
        - Coral bleaching sites are highly concentrated in the **Caribbean** and along the coasts of **Central and South America**.
        - Fewer bleaching events are recorded along the **west coast of Africa**.

    - **Pacific Ocean (Blue):**
        - Extensive coral bleaching records are present in the **Pacific Islands**, **Southeast Asia**, and the **Great Barrier Reef** off the coast of Australia.
        - Coral bleaching events are also scattered across the **central Pacific**, illustrating the widespread impact of bleaching in this ocean.

    - **Indian Ocean (Red):**
        - Notable coral bleaching occurs along the coasts of **East Africa**, **Madagascar**, and the **Indian subcontinent**.
        - Bleaching is also prevalent in **Southeast Asia** and **Indonesia**.

    - **Red Sea (Orange):**
        - The **Red Sea** displays a concentrated line of bleaching sites along its entire coast, impacting both **Africa** and the **Arabian Peninsula**.

    - **Arabian Gulf (Cyan):**
        - Fewer bleaching sites are observed in the **Arabian Gulf**, with concentrations around the coasts of **Saudi Arabia** and the **United Arab Emirates**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Coral bleaching is heavily concentrated in tropical regions across the Pacific, Indian, and Atlantic Oceans.</strong></li>
        <li>The <strong>Pacific</strong> and <strong>Indian Oceans</strong> exhibit widespread and scattered bleaching events, reflecting the vast coral reef systems in these areas.</li>
        <li>The <strong>Caribbean</strong> and <strong>Red Sea</strong> stand out as smaller regions with high concentrations of bleaching events, indicating significant coral stress.</li>
    </ul>
</div>

These observations highlight the **global scale** of coral bleaching, with major hotspots in tropical and subtropical regions worldwide.

In [None]:
# Count the number of records per year
data_availability = data_for_eda['Date'].dt.year.value_counts().sort_index()

# Define colors: use 'tab:orange' for years with more than 500 records, otherwise 'tab:blue'
colors = ['#a56eff' if value > 500 else '#1192e8' for value in data_availability.values]

# Plotting the bar chart to show data availability across years
plt.figure(figsize=(14, 8))
plt.bar(data_availability.index, data_availability.values, color=colors)

# Set the title and labels
plt.title('Data Availability Across Years', fontsize=16)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Number of Records', fontsize=12)

# Show the plot
plt.tight_layout()
plt.show()

🔎 **Observations:**

- **Pre-1995**: Very few records exist prior to the mid-1990s, suggesting that either coral bleaching was *less common* or that *data collection efforts* were not as extensive before this period.

- **1998-2005**: There is a steady increase in the number of records starting around **1998**, which coincides with some major **coral bleaching events**, including the **1998 global coral bleaching event**. This trend continues with a sharp rise through the early 2000s.

- **2005 Peak**: The year **2005** stands out as having the *highest number of records*, with **over 3,000 entries**. This spike may correspond to an **intense bleaching event** or an improvement in **data collection practices** during that year.

- **Post-2005 Decline**: After **2005**, there is a *noticeable decline* in the number of records, though the data remains relatively consistent between **2006-2015**, with some fluctuation.

- **Recent Years**: After **2015**, the number of records begins to decline again, with *very low data availability* in **2020**. This could be due to a variety of factors, including **less reporting**, **shifts in research focus**, or the impacts of **recent global events**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Coral bleaching is heavily concentrated in tropical regions across the Pacific, Indian, and Atlantic Oceans.</strong></li>
        <li>The <strong>Pacific</strong> and <strong>Indian Oceans</strong> exhibit widespread and scattered bleaching events, reflecting the vast coral reef systems in these areas.</li>
        <li>The <strong>Caribbean</strong> and <strong>Red Sea</strong> stand out as smaller regions with high concentrations of bleaching events, indicating significant coral stress.</li>
    </ul>
</div>


<div style="border-radius:10px;padding: 15px;background-color:#6495ED
;color:white;font-size:100%;text-align:left">
    ⭐️ <strong>Key Takeaways:</strong>
    <ul>
        <li><strong>Tropical Regions are Most Affected:</strong> The dataset contains 23,261 records of coral bleaching percentages, indicating a substantial volume of data for analysis.</li>
        <br>
        <li><strong>Critical Coral Ecosystems Under Threat:</strong> These heavily impacted regions are home to some of the most critical coral ecosystems globally, making them especially vulnerable to environmental stressors such as rising ocean temperatures and climate change.</li>
        <br>
        <li><strong>Data Availability Peaks in 2005:</strong> The dataset shows a sharp increase in coral bleaching records starting from the late 1990s, with 2005 being a pivotal year for documentation, likely due to significant global bleaching events or enhanced monitoring efforts.</li>
        <br>
        <li><strong>Decline in Recent Years:</strong> There has been a notable decrease in data collection in recent years, particularly after 2015, which could reflect gaps in monitoring or challenges in data collection, possibly linked to global events like the COVID-19 pandemic.</li>
        <br>
        <li><strong>Importance of Ongoing Monitoring:</strong> The combination of geographic and temporal data highlights the need for continued monitoring and research to fully understand the global extent of coral bleaching and to address potential gaps in data collection efforts in recent years.</li>
        <br>
        <li><strong>Future Research Directions:</strong> These insights provide a foundation for further investigation into the specific drivers of coral bleaching and the localized impacts on different coral ecosystems worldwide. Effective strategies to mitigate these effects depend on sustained and comprehensive research efforts.</li>
    </ul>
</div>


<hr>

### <b>3.7 <span style='color:#6495ED'>|</span> Coral Bleaching Trends and Severity</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
    ⭐️ <strong>Key Takeaways:</strong>
    <ul>
        <li><strong>Minor Bleaching is Common:</strong> The mean bleaching is 13%, with most data showing low to moderate levels and severe bleaching being less frequent.</li>
        <br>
        <li><strong>Colony-level Data is Valuable:</strong> Colony-level bleaching offers detailed insights into reef health, providing critical data for assessments.</li>
        <br>
        <li><strong>Rare Extreme Bleaching:</strong> Most corals exhibit little to no bleaching, while extreme cases (80-100%) are rare but crucial for understanding coral stress.</li>
        <br>
        <li><strong>Historical Peaks in the 1980s and 2005:</strong> Severe bleaching events peaked in the 1980s and 2005, with a decline in bleaching intensity and frequency since then.</li>
        <br>
        <li><strong>Comprehensive Data Post-1998:</strong> Colony and population-level data from 1998 onwards provides a balanced view of localized and ecosystem-wide bleaching.</li>
    </ul>
</div>

In [None]:
# Statistical summary of bleaching percentage and bleaching level
data_for_eda[['Percent_Bleaching', 'Bleaching_Level']].describe(include='all')

🔎 **Observations:**

- **Percent Bleaching:**
    - **23,261 records** with a **mean bleaching** of **13.02%** and **high variability** (**std. dev. 22.97%**).
    - Bleaching ranges from **0% to 100%** with **25% of data** showing no bleaching.
    - **75% of data** reports bleaching of **13.1% or lower**, indicating severe bleaching is less common.

- **Bleaching Level:**
    - **Two categories** with **Colony** being the most frequent (11,952 records), offering detailed insights.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Minor bleaching is common</strong>, severe bleaching less frequent.</li>
        <li><strong>Colony-level data offers valuable detail</strong> for reef health assessments.</li>
    </ul>
</div>

In [None]:
# Set up the figure with subplots
fig, axes = plt.subplots(1, 2, figsize=(18, 6))

# Plot the histogram for Percent_Bleaching on the first subplot
sns.histplot(data_for_eda['Percent_Bleaching'], bins=50, kde=True, color='#1192e8', edgecolor='#1192e8', ax=axes[0])
axes[0].set_title('Distribution of Percent Bleaching')
axes[0].set_xlabel('Percent Bleaching')
axes[0].set_ylabel('Frequency')

# Custom color palette for the bar chart
custom_palette = ['#1192e8', '#a56eff']  # Blue and Pink

# Plot the bar chart for Bleaching_Level on the second subplot
sns.countplot(data=data_for_eda, x='Bleaching_Level', palette=custom_palette, ax=axes[1])
axes[1].set_title('Count of Bleaching Levels')
axes[1].set_xlabel('Bleaching Level')
axes[1].set_ylabel('Count')

# Adjust layout for better spacing
plt.tight_layout()

# Show the combined plots
plt.show()

🔎 **Observations:**

- **Distribution of Percent Bleaching:**
    - Most records show **0% bleaching** with over **12,000 instances**, indicating healthy corals.
    - **Mild to moderate bleaching (5-20%)** is less frequent, while **higher bleaching levels (20-40%)** are rare but notable.
    - A small cluster of **extreme bleaching (80-100%)** indicates severe coral stress or mortality.

- **Count of Bleaching Levels:**
    - Near-equal split between **Colony-level** and **Population-level** data, with colony-level slightly higher.
    - This balance captures both **localized** and **widespread bleaching**, offering insights into **bleaching hotspots** and resilience patterns.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Most corals show little to no bleaching</strong>, with extreme cases rare but critical.</li>
        <li><strong>Both colony and population-level data provide comprehensive insights</strong> into localized and ecosystem-wide bleaching impacts.</li>
    </ul>
</div>

In [None]:
# Group the data by year and calculate the mean for Percent_Bleaching
percent_bleaching_over_time = data_for_eda.groupby(data_for_eda['Date'].dt.year)['Percent_Bleaching'].mean().reset_index()

# Group the data by year and count the occurrences of each Bleaching_Level
bleaching_level_over_time = data_for_eda.groupby([data_for_eda['Date'].dt.year, 'Bleaching_Level']).size().unstack(fill_value=0).reset_index()

# Define custom palette
custom_palette = ['#1192e8', '#a56eff']  # Blue and Pink

# Plotting the trends over time
fig, axs = plt.subplots(2, 1, figsize=(20, 10))

# First subplot: Percent Bleaching over time (changed to blue)
axs[0].plot(percent_bleaching_over_time['Date'], percent_bleaching_over_time['Percent_Bleaching'], 
            label='Percent Bleaching', color='#1192e8', marker='o')
axs[0].set_xlabel('Year')
axs[0].set_ylabel('Percent Bleaching')
axs[0].set_title('Percent Bleaching Over Time')
axs[0].legend(loc='best')
axs[0].grid(True)

# Second subplot: Bleaching Level over time (stacked bar chart with custom colors)
bleaching_level_over_time.set_index('Date').plot(kind='bar', stacked=True, ax=axs[1], color=custom_palette)
axs[1].set_xlabel('Year')
axs[1].set_ylabel('Count')
axs[1].set_title('Bleaching Level Over Time')
axs[1].legend(title='Bleaching Level', loc='upper left')
axs[1].grid(True)
axs[1].tick_params(axis='x', rotation=45)

# Adjust layout for better spacing
plt.tight_layout(rect=[0, 0, 1, 0.96])

# Show the plots
plt.show()

🔎 **Observations:**

- **Percent Bleaching Over Time (Top Graph):**
    - **1980s-1990s:** Severe bleaching events, with peaks up to **80%** in the early 1980s, followed by a **sharp decline** to below **20%** by the mid-1990s.
    - **Late 1990s-2000s:** Bleaching fluctuated between **20-60%**, with a spike in **1998** (global bleaching event) and another in **2005**, before dropping to **10-30%** in the 2010s.
    - **2020:** Bleaching levels fell to below **10%**.

- **Bleaching Level Over Time (Bottom Graph):**
    - **Pre-1995:** Limited data.
    - **1998-2005:** Increase in both **Colony** and **Population-level** bleaching, peaking in **2005** with the highest number of events.
    - **Post-2005:** Gradual decline in bleaching events through the 2010s, with a sharp drop in **2020**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Early bleaching events were more severe,</strong> especially in the 1980s and 1990s.</li>
        <li><strong>2005 saw a peak</strong> in bleaching events, followed by a gradual decline in intensity and frequency.</li>
        <li><strong>Colony and Population-level bleaching data</strong> post-1998 reflects a comprehensive documentation approach.</li>
    </ul>
</div>

<hr>

### <b>3.8 <span style='color:#6495ED'>|</span> Thermal Stress and Coral Bleaching: SST, SSTA, and TSA</b> 

#### <b>3.8.1 <span style='color:#6495ED'>|</span> Univariate Analysis</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
    ⭐️ <strong>Key Takeaways:</strong>
    <ul>
        <li><strong>Frequent extreme SSTA events</strong> contribute to coral bleaching, with severe anomalies driving stress in specific regions.</li>
        <br>
        <li><strong>Prolonged thermal stress</strong> (high DHW) plays a significant role in sustained coral bleaching and ecosystem damage.</li>
        <br>
        <li><strong>Localized extremes</strong> in SST and SSTA indicate specific regions facing high, sustained heat stress, increasing bleaching risk.</li>
        <br>
        <li><strong>SST Maximum values above 308 K</strong> (~35°C) highlight areas of extreme warming events, correlated with severe coral stress.</li>
        <br>
        <li><strong>High TSA values</strong> signal areas under heightened bleaching risk, with high frequency and DHW further weakening coral resilience.</li>
        <br>
        <li><strong>Outliers in SST, SSTA, and TSA</strong> across various regions indicate the presence of localized thermal stress, driving both acute and chronic bleaching events.</li>
    </ul>
</div>

In [None]:
data_for_eda[['SST', 'SST_Maximum', 'SSTA', 'SSTA_Maximum', 'SSTA_Frequency', 'SSTA_DHW', 'TSA', 'TSA_Maximum', 'TSA_Frequency', 'TSA_DHW']].describe(include='all')

🔎 **Observations:**

- **Sea Surface Temperature (SST) and SST Maximum:**
    - Mean **SST** is **301.43 K** (~**28.28°C**), with a maximum of **310.44 K** (~**37.29°C**), reflecting occasional extreme heat events.
    - **SST Maximum** averages **305.11 K** (~**32°C**), indicating frequent high temperatures in coral areas.

- **Sea Surface Temperature Anomaly (SSTA) and SSTA Maximum:**
    - Mean **SSTA** is **0.29 K**, with a **maximum anomaly** of **19.89 K**, showing significant deviations from normal temperatures in some regions.

- **SSTA Frequency and DHW:**
    - Mean **SSTA Frequency** is **7.61**, indicating frequent anomalies.
    - Mean **SSTA DHW** is **3.30**, with a maximum of **53.6**, reflecting prolonged thermal stress in certain areas.

- **Thermal Stress Anomaly (TSA) and TSA Maximum:**
    - Mean **TSA** is **-0.90 K**, with **TSA Maximum** of **2.76 K**, showing occasional temperatures exceeding bleaching thresholds.

- **TSA Frequency and DHW:**
    - **TSA Frequency** averages **2.12**, with **TSA DHW** peaking at **52.45**, indicating sustained stress accumulation.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Frequent extreme SSTA events</strong> contribute to coral bleaching.</li>
        <li><strong>Prolonged thermal stress</strong> (high DHW) drives sustained bleaching.</li>
        <li><strong>Localized extremes</strong> show certain regions face severe, sustained heat stress.</li>
    </ul>
</div>

##### <b>3.8.1.1 <span style='color:#6495ED'>|</span> Distribution Histograms</b> 

In [None]:
# Define custom palette
custom_palette = ['#1192e8', '#a56eff']  # Blue for SST, Pink for SST_Maximum

# Create a figure with 2 subplots (1x2 layout)
fig, axs = plt.subplots(1, 2, figsize=(24, 8))

# Plot SST on the first subplot with blue color
sns.histplot(data_for_eda['SST'], bins=50, kde=True, color=custom_palette[0], label='SST', alpha=0.5, ax=axs[0], edgecolor=custom_palette[0])
axs[0].set_title('SST Distribution')
axs[0].set_xlabel('Temperature (Kelvin)')
axs[0].set_ylabel('Frequency')

# Plot SST_Maximum on the second subplot with pink color
sns.histplot(data_for_eda['SST_Maximum'], bins=50, kde=True, color=custom_palette[1], label='SST_Maximum', alpha=0.5, ax=axs[1], edgecolor=custom_palette[1])
axs[1].set_title('SST_Maximum Distribution')
axs[1].set_xlabel('Temperature (Kelvin)')
axs[1].set_ylabel('Frequency')

# Adjust the layout to prevent overlapping
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **SST Distribution (Left Plot):**
    - **SST** is mostly between **300-305 K** (~**27-32°C**), peaking at **302-303 K** (~**29-30°C**), indicating tropical/subtropical conditions typical of coral regions.
    - Temperatures below **295 K** and above **305 K** are rare.

- **SST Maximum Distribution (Right Plot):**
    - Slight skew towards higher temperatures, peaking at **305 K** (~**32°C**), with extremes up to **310 K** (~**37°C**).
    - A small bump at **308-310 K** reflects occasional **extreme warming events** linked to coral bleaching.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Normal SST distributions</strong> align with tropical coral habitats.</li>
        <li><strong>Extreme SST Maximum values</strong> (above <strong>308 K</strong>) indicate localized thermal stress.</li>
        <li><strong>Slight skew in SST Maximum</strong> suggests periodic extreme warming, driving severe coral bleaching.</li>
    </ul>
</div>

In [None]:
# Define custom palette
custom_palette = ['#1192e8', '#a56eff', '#fa4d56', '#198038']  # Blue, Pink, Red, Green

# Create a figure with 4 subplots (2x2 layout)
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# Plot SSTA on the first subplot with blue color
sns.histplot(data_for_eda['SSTA'], bins=50, kde=True, color=custom_palette[0], label='SSTA', alpha=0.5, ax=axs[0, 0], edgecolor=custom_palette[0])
axs[0, 0].set_title('SSTA Distribution')
axs[0, 0].set_xlabel('Value')
axs[0, 0].set_ylabel('Frequency')

# Plot SSTA_Maximum on the second subplot with pink color
sns.histplot(data_for_eda['SSTA_Maximum'], bins=50, kde=True, color=custom_palette[1], label='SSTA_Maximum', alpha=0.5, ax=axs[0, 1], edgecolor=custom_palette[1])
axs[0, 1].set_title('SSTA_Maximum Distribution')
axs[0, 1].set_xlabel('Value')
axs[0, 1].set_ylabel('Frequency')

# Plot SSTA_Frequency on the third subplot with red color
sns.histplot(data_for_eda['SSTA_Frequency'], bins=50, kde=True, color=custom_palette[2], label='SSTA_Frequency', alpha=0.5, ax=axs[1, 0], edgecolor=custom_palette[2])
axs[1, 0].set_title('SSTA_Frequency Distribution')
axs[1, 0].set_xlabel('Value')
axs[1, 0].set_ylabel('Frequency')

# Plot SSTA_DHW on the fourth subplot with green color
sns.histplot(data_for_eda['SSTA_DHW'], bins=50, kde=True, color=custom_palette[3], label='SSTA_DHW', alpha=0.5, ax=axs[1, 1], edgecolor=custom_palette[3])
axs[1, 1].set_title('SSTA_DHW Distribution')
axs[1, 1].set_xlabel('Value')
axs[1, 1].set_ylabel('Frequency')

# Adjust the layout to prevent overlapping
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **SSTA Distribution (Top Left):**
    - Centered around **0 K**, most anomalies are mild, between **-1 K and 1 K**, but some reach **5-6 K**, indicating significant warming events.

- **SSTA Maximum Distribution (Top Right):**
    - Skewed right, with most anomalies between **2-5 K**; some exceed **10 K**, highlighting severe warming in specific regions.

- **SSTA Frequency Distribution (Bottom Left):**
    - Heavily skewed, with most regions facing fewer than **10 anomaly events**, but some experience up to **50**, reflecting persistent thermal stress.

- **SSTA DHW Distribution (Bottom Right):**
    - Right-skewed, with most regions below **10 DHW**, but some exceed **50**, indicating extreme and prolonged thermal stress.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Moderate anomalies are common</strong>, but extreme SSTA events drive coral stress.</li>
        <li><strong>Prolonged thermal stress (high DHW)</strong> contributes to severe bleaching in vulnerable regions.</li>
        <li><strong>Localized extreme warming events</strong> are major contributors to coral bleaching.</li>
    </ul>
</div>

In [None]:
# Define custom palette
custom_palette = ['#1192e8', '#a56eff', '#fa4d56', '#198038']  # Blue, Pink, Red, Green

# Create a figure with 4 subplots (2x2 layout)
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# Plot TSA on the first subplot with blue color
sns.histplot(data_for_eda['TSA'], bins=50, kde=True, color=custom_palette[0], label='TSA', alpha=0.5, ax=axs[0, 0], edgecolor=custom_palette[0])
axs[0, 0].set_title('TSA Distribution')
axs[0, 0].set_xlabel('Value')
axs[0, 0].set_ylabel('Frequency')

# Plot TSA_Maximum on the second subplot with pink color
sns.histplot(data_for_eda['TSA_Maximum'], bins=50, kde=True, color=custom_palette[1], label='TSA_Maximum', alpha=0.5, ax=axs[0, 1], edgecolor=custom_palette[1])
axs[0, 1].set_title('TSA_Maximum Distribution')
axs[0, 1].set_xlabel('Value')
axs[0, 1].set_ylabel('Frequency')

# Plot TSA_Frequency on the third subplot with red color
sns.histplot(data_for_eda['TSA_Frequency'], bins=50, kde=True, color=custom_palette[2], label='TSA_Frequency', alpha=0.5, ax=axs[1, 0], edgecolor=custom_palette[2])
axs[1, 0].set_title('TSA_Frequency Distribution')
axs[1, 0].set_xlabel('Value')
axs[1, 0].set_ylabel('Frequency')

# Plot TSA_DHW on the fourth subplot with green color
sns.histplot(data_for_eda['TSA_DHW'], bins=50, kde=True, color=custom_palette[3], label='TSA_DHW', alpha=0.5, ax=axs[1, 1], edgecolor=custom_palette[3])
axs[1, 1].set_title('TSA_DHW Distribution')
axs[1, 1].set_xlabel('Value')
axs[1, 1].set_ylabel('Frequency')

# Adjust the layout to prevent overlapping
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **TSA Distribution:** Centered around **0**, slightly skewed negative, with some areas experiencing significantly lower thermal stress.
- **TSA Maximum Distribution:** Concentrated between **2-4 K**, with fewer cases above **5 K**, indicating areas with higher bleaching risk.
- **TSA Frequency Distribution:** Most regions have **0-5 events**, but persistent stress occurs in some areas.
- **TSA DHW Distribution:** Values mostly under **10 K-weeks**, but some areas experience prolonged stress, up to **50 K-weeks**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Negative TSA values</strong> reduce coral stress; positive values increase bleaching risk.</li>
        <li><strong>TSA Maximum values around 3 K</strong> signal heightened bleaching risk.</li>
        <li><strong>High TSA frequency</strong> weakens coral resilience.</li>
        <li><strong>High TSA DHW</strong> indicates severe, prolonged stress, increasing vulnerability to bleaching.</li>
    </ul>
</div>


##### <b>3.8.1.2 <span style='color:#6495ED'>|</span> Distribution Box Plots</b> 

In [None]:
# Select the relevant columns for the box plot
data_boxplot = data_for_eda[['SST', 'SST_Maximum']]

# Melt the DataFrame to have a long format suitable for seaborn boxplot
data_melted = pd.melt(data_boxplot, var_name='Variable', value_name='Temperature')

# Define custom colors (Blue and Pink)
palette = ['#1192e8', '#a56eff']  # Blue for SST, Pink for SST_Maximum
edge_colors = ['#1192e8', '#a56eff']  # Edge colors to match the face colors

# Plot the combined box plot
plt.figure(figsize=(12, 8))
sns.boxplot(
    x='Variable', y='Temperature', data=data_melted, palette=palette, linewidth=2,
    boxprops=dict(edgecolor='black'),  # Ensure box edges are black for visibility
    whiskerprops=dict(color='black'), capprops=dict(color='black'),  # Ensure whiskers and caps are black
    medianprops=dict(color='black'),  # Ensure median line is black
    flierprops=dict(marker='o', color='black', markersize=5, markerfacecolor='none')  # Make outliers dots with no fill
)

# Iterate through each box to set the custom colors for edge and fill with alpha
for i, artist in enumerate(plt.gca().artists):
    # Set the face color with transparency and correct edge color
    artist.set_edgecolor(edge_colors[i])
    artist.set_facecolor(palette[i])
    artist.set_alpha(0.5)

# Customize the outliers to match the colors with no outline
for i, line in enumerate(plt.gca().lines):
    if i % 6 == 5:  # Outlier points
        line.set_color(edge_colors[i // 6])
        line.set_marker('o')
        line.set_markerfacecolor(edge_colors[i // 6])
        line.set_markeredgewidth(0)  # Remove outline from the outliers

# Set the title and labels
plt.title('Box Plot of SST and SST_Maximum')
plt.xlabel('Variable')
plt.ylabel('Temperature (Kelvin)')

# Show the plot
plt.show()

🔎 **Observations:**

- **SST Box Plot:**
    - Median **SST** is around **302 K (~29°C)** with an IQR from **300 K to 304 K**.
    - Whiskers range from **295 K to 305 K**, with notable outliers below and a few above **305 K**, indicating regions with extreme temperatures.

- **SST Maximum Box Plot:**
    - Median **SST Maximum** is around **305 K (~32°C)**, with a narrow IQR from **304.5 K to 305.5 K**.
    - Fewer outliers, but some peak temperatures exceed **310 K**, reflecting isolated extreme thermal events.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>SST shows more variability</strong> than SST Maximum, with broader temperature fluctuations.</li>
    <li><strong>SST outliers</strong> highlight regions with unusually low or high temperatures, impacting coral resilience.</li>
    <li><strong>High SST Maximum values</strong> (above <strong>310 K</strong>) indicate areas at risk of severe thermal stress and coral bleaching.</li>
</ul>
</div>

In [None]:
# Melt the data for the relevant variables
data_melted = data_for_eda.melt(value_vars=['SSTA', 'SSTA_Maximum', 'SSTA_Frequency', 'SSTA_DHW'], 
                                var_name='Variable', value_name='Value')

# Create a 2x2 subplot for each variable
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# List of variables and the new custom colors (Blue, Pink, Red, Green)
variables = ['SSTA', 'SSTA_Maximum', 'SSTA_Frequency', 'SSTA_DHW']
colors = ['#1192e8', '#a56eff', '#fa4d56', '#198038']

# Loop over variables to create individual box plots with custom styles
for i, (variable, color) in enumerate(zip(variables, colors)):
    sns.boxplot(x='Variable', y='Value', data=data_melted[data_melted['Variable'] == variable], 
                ax=axs[i // 2, i % 2], color=color, linewidth=2,
                boxprops=dict(edgecolor='black'),  # Black edges for the box
                whiskerprops=dict(color='black'), capprops=dict(color='black'),  # Black whiskers and caps
                medianprops=dict(color='black'),  # Black median line
                flierprops=dict(marker='o', color=color, markersize=5, markerfacecolor=color, markeredgewidth=0)  # Colored outliers without border
    )
    
    axs[i // 2, i % 2].set_title(f'{variable} Box Plot')
    axs[i // 2, i % 2].set_xlabel('Variable')
    axs[i // 2, i % 2].set_ylabel('Value')

    # Set the face color with alpha transparency and ensure edges are visible
    for artist in axs[i // 2, i % 2].artists:
        artist.set_edgecolor('black')
        artist.set_facecolor(color)
        artist.set_alpha(0.5)  # Set transparency for the face color

# Adjust layout to give proper spacing between plots
plt.tight_layout()

# Show the plots
plt.show()

🔎 **Observations:**

- **SSTA Box Plot (Top Left):**
    - Median SSTA is near **0**, with an IQR from **-0.5 to 0.8 K**. Outliers show anomalies exceeding **4 K**, indicating extreme deviations.

- **SSTA Maximum Box Plot (Top Right):**
    - Median **SSTA Maximum** is **3.5 K**, with outliers reaching up to **20 K**, showing extreme maximum temperature anomalies in certain regions.

- **SSTA Frequency Box Plot (Bottom Left):**
    - Median **SSTA Frequency** is **7-8** anomalies, with outliers exceeding **50**, indicating chronic thermal stress in some areas.

- **SSTA DHW Box Plot (Bottom Right):**
    - Median **DHW** is near **0**, with some outliers above **50**, showing prolonged heat stress in a few regions.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>SSTA outliers</strong> indicate extreme temperature deviations in some coral regions.</li>
        <li><strong>Maximum SSTA values</strong> highlight regions at risk of severe thermal stress.</li>
        <li><strong>Chronic stress</strong> is evident in regions with high SSTA frequency.</li>
        <li><strong>High DHW outliers</strong> indicate prolonged thermal stress, increasing bleaching risk.</li>
    </ul>
</div>

In [None]:
# Define custom colors (Blue, Pink, Red, Green)
custom_palette = ['#1192e8', '#a56eff', '#fa4d56', '#198038']

# Create a figure with 4 subplots (2x2 layout) for box plots
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# List of variables and the custom colors
variables = ['TSA', 'TSA_Maximum', 'TSA_Frequency', 'TSA_DHW']
colors = custom_palette

# Plot box plots for each variable with custom styles
for i, (variable, color) in enumerate(zip(variables, colors)):
    sns.boxplot(data=data_for_eda, y=variable, ax=axs[i // 2, i % 2], color=color, linewidth=2,
                boxprops=dict(edgecolor='black'),  # Black edges for the box
                whiskerprops=dict(color='black'), capprops=dict(color='black'),  # Black whiskers and caps
                medianprops=dict(color='black'),  # Black median line
                flierprops=dict(marker='o', color=color, markersize=5, markerfacecolor=color, markeredgewidth=0)  # Colored outliers without border
    )
    axs[i // 2, i % 2].set_title(f'{variable} Box Plot')
    axs[i // 2, i % 2].set_ylabel('Value')

    # Set the face color with alpha transparency and ensure edges are visible
    for artist in axs[i // 2, i % 2].artists:
        artist.set_edgecolor('black')
        artist.set_facecolor(color)
        artist.set_alpha(0.5)  # Set transparency for the face color

# Adjust the layout to prevent overlapping
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **TSA Box Plot (Top Left):**
    - Median TSA is around **0**, with an IQR from **-1 to 0.5**, showing mostly moderate thermal stress. Outliers range from **-12.5 K to 5.5 K**, indicating significant variability across regions.

- **TSA Maximum Box Plot (Top Right):**
    - Median **TSA Maximum** is **3 K**, with outliers reaching **12 K**, suggesting some regions experience intense short-term stress.

- **TSA Frequency Box Plot (Bottom Left):**
    - Median **TSA Frequency** is **3-4 events**, but outliers show some regions face up to **30 events**, indicating chronic stress.

- **TSA DHW Box Plot (Bottom Right):**
    - Median **TSA DHW** is close to **0**, but outliers exceed **50**, indicating severe, prolonged thermal stress in certain areas.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Moderate TSA values</strong> are common, but regions with positive TSA face significant stress.</li>
        <li><strong>Extreme TSA Maximum</strong> values signal severe localized stress.</li>
        <li><strong>Chronic thermal stress</strong> is evident in regions with high TSA Frequency.</li>
        <li><strong>High TSA DHW outliers</strong> indicate prolonged, severe stress, increasing bleaching risk.</li>
    </ul>
</div>

<hr>

#### <b>3.8.2 <span style='color:#6495ED'>|</span> Affects on Bleaching Percent</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>SST and Bleaching:</strong> Higher <strong>SST</strong> is strongly linked to coral bleaching, especially beyond <strong>300 K (~27°C)</strong>, where bleaching severity increases rapidly, showing coral sensitivity to persistent heat.</li>
    <li><strong>SSTA and Bleaching:</strong> <strong>SSTA</strong> (temperature anomalies) have a strong correlation with bleaching, with prolonged anomalies (high <strong>DHW</strong>) driving the most severe bleaching events, often reaching up to <strong>100%</strong> in some regions.</li>
    <li><strong>Thermal Stress and Bleaching:</strong> Repeated thermal stress events (high <strong>SSTA Frequency</strong>) and prolonged stress (high <strong>TSA DHW</strong>) result in compounded bleaching, but the impact levels off at extreme values, suggesting coral mortality or adaptation.</li>
    <li><strong>Nonlinear Effects:</strong> The relationship between bleaching and thermal stress metrics is nonlinear, with a threshold effect where bleaching does not continue to increase indefinitely as stress increases, pointing to the complex dynamics of coral stress responses.</li>
</ul>
</div>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot with both linear and polynomial regression lines
plt.figure(figsize=(14, 6))

# Plot for SST vs Percent Bleaching
plt.subplot(1, 2, 1)
# Plot the scatter points first
sns.regplot(x='SST', y='Percent_Bleaching', data=data_for_eda, 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': '#1192e8'},  # Blue scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # No label for scatter

# Plot the linear regression line
sns.regplot(x='SST', y='Percent_Bleaching', data=data_for_eda, 
            scatter=False,  # No scatter points, only the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red solid linear line

# Plot the polynomial regression (order 2)
sns.regplot(x='SST', y='Percent_Bleaching', data=data_for_eda, 
            scatter=False,  # No scatter points, only the line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line

# Adding legend manually
plt.title('SST vs Percent Bleaching')
plt.xlabel('Sea Surface Temperature (SST)')
plt.ylabel('Percent Bleaching')
plt.legend()

# Plot for SST_Maximum vs Percent Bleaching
plt.subplot(1, 2, 2)
# Plot the scatter points first
sns.regplot(x='SST_Maximum', y='Percent_Bleaching', data=data_for_eda, 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': '#a56eff'},  # Pink scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # No label for scatter

# Plot the linear regression line
sns.regplot(x='SST_Maximum', y='Percent_Bleaching', data=data_for_eda, 
            scatter=False,  # No scatter points, only the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red solid linear line

# Plot the polynomial regression (order 2)
sns.regplot(x='SST_Maximum', y='Percent_Bleaching', data=data_for_eda, 
            scatter=False,  # No scatter points, only the line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line

# Adding legend manually
plt.title('SST_Maximum vs Percent Bleaching')
plt.xlabel('Maximum Sea Surface Temperature (SST_Maximum)')
plt.ylabel('Percent Bleaching')
plt.legend()

# Adjust layout to prevent overlapping
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **SST vs. Percent Bleaching (Left Plot with Linear and Polynomial Regression):**
    - Both the **linear** and **polynomial** regressions show a **positive correlation** between **SST** and **Percent Bleaching**.
    - The polynomial regression reveals that bleaching accelerates rapidly at higher SSTs, especially beyond **300 K (~27°C)**, indicating that coral bleaching severity increases once a critical temperature threshold is crossed.
    - Higher SST values correspond with more severe bleaching (above **60%**) in some instances, highlighting coral stress at elevated temperatures.

- **SST Maximum vs. Percent Bleaching (Right Plot with Linear and Polynomial Regression):**
    - The **linear regression** shows a weak correlation between **SST Maximum** and bleaching, while the **polynomial regression** suggests a more complex relationship: bleaching increases up to a point, then stabilizes or declines slightly at very high SST Maximum values.
    - This variability suggests factors like coral recovery, adaptation, or environmental conditions may play a role in limiting bleaching at extreme temperatures.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Persistent thermal stress</strong> (<strong>SST</strong>) is more strongly linked to coral bleaching than short-term temperature spikes (<strong>SST Maximum</strong>), with both linear and polynomial regressions showing this effect.</li>
    <li>There is a <strong>threshold effect</strong> around <strong>300 K (~27°C)</strong>: bleaching becomes more frequent and severe beyond this point, especially as seen in the polynomial trend, highlighting coral sensitivity to prolonged warm conditions.</li>
    <li>The <strong>polynomial regression</strong> reveals a more complex relationship for <strong>SST Maximum</strong>, where bleaching increases initially but stabilizes or decreases at extreme values, possibly due to recovery or adaptation mechanisms.</li>
    <li><strong>Combined Insights:</strong> Both linear and polynomial models emphasize that sustained high SSTs are a major driver of bleaching, with <strong>long-term temperature trends</strong> playing a more crucial role than short-term spikes.</li>
</ul>
</div>

In [None]:
# Define custom color palette for scatter points
custom_palette = ['#1192e8', '#a56eff', '#ff832b', '#198038']  # Blue, Pink, Red, Green

# Creating a 2x2 subplot for scatter plots with regression lines
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# Plot for SSTA vs Percent Bleaching
# Scatter points
sns.regplot(x='SSTA', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 0], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[0]},  # Blue scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='SSTA', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 0], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='SSTA', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 0], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[0, 0].set_title('SSTA vs Percent Bleaching')
axs[0, 0].set_xlabel('SSTA')
axs[0, 0].set_ylabel('Percent Bleaching')
axs[0, 0].legend()

# Plot for SSTA_Maximum vs Percent Bleaching
# Scatter points
sns.regplot(x='SSTA_Maximum', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 1], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[1]},  # Pink scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='SSTA_Maximum', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 1], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='SSTA_Maximum', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 1], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[0, 1].set_title('SSTA_Maximum vs Percent Bleaching')
axs[0, 1].set_xlabel('SSTA Maximum')
axs[0, 1].set_ylabel('Percent Bleaching')
axs[0, 1].legend()

# Plot for SSTA_Frequency vs Percent Bleaching
# Scatter points
sns.regplot(x='SSTA_Frequency', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 0], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[2]},  # Red scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='SSTA_Frequency', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 0], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='SSTA_Frequency', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 0], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[1, 0].set_title('SSTA_Frequency vs Percent Bleaching')
axs[1, 0].set_xlabel('SSTA Frequency')
axs[1, 0].set_ylabel('Percent Bleaching')
axs[1, 0].legend()

# Plot for SSTA_DHW vs Percent Bleaching
# Scatter points
sns.regplot(x='SSTA_DHW', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 1], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[3]},  # Green scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='SSTA_DHW', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 1], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='SSTA_DHW', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 1], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[1, 1].set_title('SSTA_DHW vs Percent Bleaching')
axs[1, 1].set_xlabel('SSTA DHW')
axs[1, 1].set_ylabel('Percent Bleaching')
axs[1, 1].legend()

# Adjust layout to prevent overlapping
plt.tight_layout()

# Show the plots
plt.show()

🔎 **Observations:**

- **SSTA vs. Percent Bleaching (Top Left with Linear and Polynomial Regression):**
    - Both the **linear** and **polynomial** regressions show a **positive correlation** between **SSTA (Sea Surface Temperature Anomaly)** and **Percent Bleaching**.
    - The polynomial regression indicates that bleaching accelerates more rapidly with larger SSTA values, especially when anomalies exceed **2-4 K**, showing significant coral stress with even slight deviations from normal temperatures.
    - Higher SSTA values are associated with severe bleaching (up to **60-80%**), highlighting the impact of persistent temperature anomalies on coral health.

- **SSTA Maximum vs. Percent Bleaching (Top Right with Linear and Polynomial Regression):**
    - The **linear regression** shows a weak correlation between **SSTA Maximum** and bleaching, while the **polynomial regression** reveals a more nuanced relationship. Bleaching increases initially with higher SSTA Maximum values but stabilizes or slightly declines at extreme values.
    - This suggests that **short-term extreme anomalies** may not consistently lead to greater bleaching, possibly due to coral adaptation, recovery periods, or other environmental factors.

- **SSTA Frequency vs. Percent Bleaching (Bottom Left with Linear and Polynomial Regression):**
    - Both regression models show a **positive correlation** between **SSTA Frequency** and bleaching. More frequent temperature anomalies lead to higher bleaching percentages, with regions experiencing **20+ anomalies** showing severe bleaching (up to **60-80%**).
    - The polynomial regression indicates that the relationship may plateau at higher frequencies, suggesting diminishing additional bleaching after a certain threshold of repeated stress events.

- **SSTA DHW vs. Percent Bleaching (Bottom Right with Linear and Polynomial Regression):**
    - **SSTA DHW (Degree Heating Weeks)** has the **strongest correlation** with bleaching among the variables, with both the linear and polynomial regressions showing that as DHW increases, bleaching percentages rise sharply.
    - **DHW values above 10** are closely associated with severe bleaching (up to **100%**), underscoring the importance of prolonged thermal stress as a primary driver of coral bleaching.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Persistent and prolonged thermal anomalies</strong> (as indicated by <strong>SSTA</strong> and <strong>SSTA DHW</strong>) are the strongest drivers of coral bleaching, with more sustained stress leading to higher bleaching percentages.</li>
    <li><strong>Repeated thermal stress</strong> (captured by <strong>SSTA Frequency</strong>) has a compounding effect, causing more frequent and severe bleaching as anomalies occur more often.</li>
    <li>The <strong>polynomial regression</strong> suggests a <strong>saturation effect</strong> for <strong>SSTA Maximum</strong> and <strong>SSTA DHW</strong>, where bleaching increases up to a certain point and then levels off, possibly due to coral mortality or adaptive responses.</li>
    <li><strong>Combined Insights:</strong> The relationship between coral bleaching and temperature anomalies is nonlinear, with <strong>SSTA DHW</strong> showing the strongest correlation. <strong>Long-term and frequent thermal stress</strong> are more critical than short-term extremes in predicting coral bleaching.</li>
</ul>
</div>

In [None]:
# Define custom color palette for scatter points
custom_palette = ['#1192e8', '#a56eff', '#ff832b', '#198038']  # Blue, Pink, Red, Green

# Creating a 2x2 subplot for scatter plots with regression lines
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# TSA vs Percent Bleaching (Blue scatter, Red line)
# Scatter points
sns.regplot(x='TSA', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 0], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[0]},  # Blue scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='TSA', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 0], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='TSA', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 0], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[0, 0].set_title('TSA vs Percent Bleaching')
axs[0, 0].set_xlabel('TSA')
axs[0, 0].set_ylabel('Percent Bleaching')
axs[0, 0].legend()

# TSA_Maximum vs Percent Bleaching (Pink scatter, Red line)
# Scatter points
sns.regplot(x='TSA_Maximum', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 1], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[1]},  # Pink scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='TSA_Maximum', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 1], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='TSA_Maximum', y='Percent_Bleaching', data=data_for_eda, ax=axs[0, 1], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[0, 1].set_title('TSA_Maximum vs Percent Bleaching')
axs[0, 1].set_xlabel('TSA Maximum')
axs[0, 1].set_ylabel('Percent Bleaching')
axs[0, 1].legend()

# TSA_Frequency vs Percent Bleaching (Red scatter, Red line)
# Scatter points
sns.regplot(x='TSA_Frequency', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 0], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[2]},  # Red scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='TSA_Frequency', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 0], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='TSA_Frequency', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 0], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[1, 0].set_title('TSA_Frequency vs Percent Bleaching')
axs[1, 0].set_xlabel('TSA Frequency')
axs[1, 0].set_ylabel('Percent Bleaching')
axs[1, 0].legend()

# TSA_DHW vs Percent Bleaching (Green scatter, Red line)
# Scatter points
sns.regplot(x='TSA_DHW', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 1], 
            scatter_kws={'alpha': 0.3, 's': 10, 'color': custom_palette[3]},  # Green scatter
            line_kws={'color': '#da1e28'}, label='_nolegend_')  # Suppress scatter from the legend
# Linear regression line
sns.regplot(x='TSA_DHW', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 1], 
            scatter=False,  # No scatter points for the line
            line_kws={'color': '#da1e28'}, label='Linear')  # Red linear regression line
# Polynomial regression (order 2)
sns.regplot(x='TSA_DHW', y='Percent_Bleaching', data=data_for_eda, ax=axs[1, 1], 
            scatter=False,  # No scatter points for the polynomial line
            line_kws={'color': '#f1c21b', 'linestyle': '--'}, order=2, label='Polynomial')  # Yellow dashed polynomial line
axs[1, 1].set_title('TSA_DHW vs Percent Bleaching')
axs[1, 1].set_xlabel('TSA DHW')
axs[1, 1].set_ylabel('Percent Bleaching')
axs[1, 1].legend()

# Adjust layout to prevent overlapping
plt.tight_layout()

# Show the plots
plt.show()

🔎 **Observations:**

- **Percent Bleaching vs TSA (Temperature Stress Anomaly) (Top Left):**
    - The scatter plot shows an **increasing trend** in percent bleaching as **TSA** values rise from negative to positive.
    - When TSA values are below zero, bleaching is low, with a gradual rise as TSA approaches zero.
    - As TSA becomes positive, bleaching increases significantly, especially when TSA exceeds **0**, indicating that elevated temperatures contribute to more severe bleaching events.
    - The curve steepens after TSA values reach 0, suggesting that small positive anomalies can lead to large bleaching percentages.

- **Percent Bleaching vs TSA Maximum (Top Right):**
    - The plot shows an **initial increase** in bleaching with **TSA Maximum** but then **flattens and slightly declines** after reaching a threshold (around 6-7 on the TSA Maximum scale).
    - This plateau effect may indicate that corals have experienced a critical level of temperature stress beyond which further stress does not lead to proportionally higher bleaching, potentially due to coral mortality or other limiting factors.

- **Percent Bleaching vs TSA Frequency (Bottom Left):**
    - The relationship between **TSA Frequency** and bleaching is an **inverted-U shape**: bleaching increases with more frequent temperature stress events but then declines at higher frequencies.
    - This suggests that after a certain point, more frequent stress events may not lead to further bleaching, possibly due to coral adaptation or death.

- **Percent Bleaching vs TSA DHW (Degree Heating Weeks) (Bottom Right):**
    - The relationship between **TSA DHW** and bleaching shows an **increase followed by a plateau or decline**. This pattern is similar to TSA Maximum and Frequency, indicating that prolonged heat stress leads to bleaching up to a point, after which additional stress does not cause further bleaching.
    - This may be due to corals already experiencing severe damage or mortality at high DHW values.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>General Insights:</strong>
<ul>
    <li><strong>Nonlinear Relationships:</strong> All four plots show nonlinear trends, indicating that the relationship between TSA metrics and bleaching is complex. In most cases, bleaching increases with stress but eventually plateaus or declines.</li>
    <li><strong>Bleaching Plateaus at Higher Stress:</strong> Several plots exhibit a saturation effect, where higher levels or frequencies of stress do not result in more bleaching, likely due to factors such as coral mortality or adaptation.</li>
    <li><strong>Impact of TSA Metrics:</strong> Small deviations in TSA, frequent stress events, and prolonged heat stress (as seen in TSA DHW) can lead to significant bleaching, though the impact levels off at extreme values, suggesting the need to account for multiple factors when assessing coral stress.</li>
</ul>
</div>

The four scatter plots demonstrate the **complex relationship** between temperature stress metrics and coral bleaching. While increases in temperature stress generally lead to more bleaching, there is a **threshold effect** where bleaching does not continue to increase indefinitely with more frequent or extreme temperature anomalies. These results underscore the importance of considering various aspects of temperature stress, including anomaly size, frequency, and duration, to fully understand their impact on coral health.

#### <b>3.8.3 <span style='color:#6495ED'>|</span> Affects on Bleaching Level</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
    📝 <strong>Key Takeaways:</strong>
    <ul>
        <li><strong>Higher SST Maximums</strong> are strongly associated with both colony and population-level bleaching, particularly when temperatures exceed <strong>305 K (~32°C)</strong>.</li>
        <li>The narrower IQR for population-level bleaching suggests slightly more consistent temperature conditions compared to colony-level, although extreme temperatures still drive widespread bleaching.</li>
        <li><strong>Outliers in SST and SST Maximum</strong> highlight regions exposed to extreme thermal conditions, which likely contribute to more severe or frequent bleaching events.</li>
        <li><strong>SSTA and SSTA Maximum</strong> anomalies show that temperature deviations of more than <strong>3 K</strong> significantly contribute to coral bleaching, with similar effects on both colony and population levels.</li>
        <li>Regions experiencing <strong>frequent thermal anomalies (SSTA Frequency)</strong> and <strong>prolonged heat stress (SSTA DHW)</strong> are key drivers of bleaching, with higher frequencies and durations leading to more severe events.</li>
        <li><strong>TSA and TSA Maximum</strong> further demonstrate that coral bleaching is closely tied to thermal stress anomalies, with positive TSA values indicating high bleaching risk.</li>
        <li><strong>TSA DHW</strong> suggests that prolonged heat exposure is a critical factor, especially for population-level bleaching, where extended stress results in more widespread damage.</li>
    </ul>
</div>

In [None]:
# Melt the data to have SST and SST_Maximum in a single column
data_melted = data_for_eda.melt(id_vars=['Bleaching_Level'], value_vars=['SST', 'SST_Maximum'], 
                                var_name='Temperature_Type', value_name='Temperature')

# Define custom colors (Blue for SST, Pink for SST_Maximum)
palette = ['#1192e8', '#a56eff']  # Blue and Pink
edge_colors = ['#1192e8', '#a56eff']  # Matching edge colors

# Create the combined box plot
plt.figure(figsize=(12, 8))
sns.boxplot(
    x='Bleaching_Level', y='Temperature', hue='Temperature_Type', data=data_melted, palette=palette, linewidth=2,
    boxprops=dict(edgecolor='black'),  # Black edges for boxes
    whiskerprops=dict(color='black'), capprops=dict(color='black'),  # Black whiskers and caps
    medianprops=dict(color='black'),  # Black median line
    flierprops=dict(marker='o', markersize=5, markerfacecolor='none', markeredgewidth=0)  # Outliers styled as dots, no fill
)

# Iterate through each box to set custom edge colors and transparency for the box face
for i, artist in enumerate(plt.gca().artists):
    artist.set_edgecolor(edge_colors[i % 2])  # Set the edge color to match the respective box color
    artist.set_facecolor(palette[i % 2])  # Set the face color
    artist.set_alpha(0.5)  # Set transparency for the box

# Customize the outliers to match the box colors
outlier_lines = [line for line in plt.gca().lines if line.get_marker() == 'o']  # Extract only the outlier points

# Iterate through the outliers and assign the correct color based on the Temperature_Type
num_bleaching_levels = len(data_for_eda['Bleaching_Level'].unique())  # Get the number of bleaching levels
for i, outlier in enumerate(outlier_lines):
    # Determine if the outlier corresponds to SST or SST_Maximum based on its position
    color_index = (i // num_bleaching_levels) % len(edge_colors)  # Alternate based on SST/SST_Maximum
    outlier.set_color(edge_colors[color_index])
    outlier.set_markerfacecolor(edge_colors[color_index])  # Set the outlier color
    outlier.set_markeredgewidth(0)  # Remove the border from the outliers

# Set the title and labels
plt.title('SST and SST_Maximum vs Bleaching Level')
plt.xlabel('Bleaching Level')
plt.ylabel('Temperature (Kelvin)')

# Show the plot
plt.tight_layout()
plt.show()

🔎 **Observations:**

- **SST (Sea Surface Temperature) and SST Maximum for Colony-Level Bleaching:**
    - For Colony-Level bleaching, the SST distribution shows a median temperature of around **300 K (~27°C)**, with an interquartile range (IQR) spanning from **299 to 302 K (~26°C to 29°C)**.
    - The SST Maximum for colony-level bleaching is higher, with a median around **305 K (~32°C)**, and an IQR between **304 and 306 K**. This suggests that corals experiencing colony-level bleaching are exposed to higher peak temperatures, often exceeding **30°C**.
    - Outliers for SST exist below **295 K (~22°C)** and extend above **310 K (~37°C)**, reflecting extreme temperature conditions in certain regions, which likely contribute to severe localized bleaching.

- **SST and SST Maximum for Population-Level Bleaching:**
    - For Population-Level bleaching, the SST distribution is similar to colony-level, with a slightly lower median around **300 K (~27°C)** and an IQR from **298.5 K to 302.5 K**.
    - The SST Maximum for population-level bleaching has a median value of **305 K (~32°C)**, with a slightly narrower IQR between **304 and 305.5 K**. This indicates that population-level bleaching also occurs under similar high-temperature conditions, though with less variability in the maximum temperatures than colony-level bleaching.
    - The range of SST Maximum outliers shows some extreme values above **310 K (~37°C)**, likely representing regions experiencing more severe bleaching events across entire populations.


<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Higher SST Maximums</strong> are closely linked with both colony and population-level bleaching, with bleaching more prevalent when temperatures exceed <strong>305 K (~32°C)</strong>.</li>
        <li>The narrower IQR for population-level bleaching suggests that while extreme temperatures affect colonies and populations similarly, population-level bleaching tends to occur under slightly more consistent and less variable maximum temperature conditions.</li>
        <li>The presence of extreme outliers for SST and SST Maximum in both colony and population-level bleaching indicates that certain regions experience unusually high temperatures, leading to more frequent or severe bleaching events.</li>
    </ul>
</div>

This analysis highlights the critical role of sustained high sea surface temperatures and maximum thermal events in driving both localized (colony-level) and widespread (population-level) coral bleaching. Monitoring and mitigating these temperature extremes are crucial to protecting coral reefs from bleaching and degradation.


In [None]:
# Custom color palette for the plots
custom_palette = ['#1192e8', '#a56eff', '#ff832b', '#198038']  # Blue, Pink, Red, Green

# Creating a figure for separate box plots
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# Box plot for SSTA vs Bleaching_Level (Blue for SSTA)
sns.boxplot(x='Bleaching_Level', y='SSTA', data=data_for_eda, ax=axs[0, 0], palette=[custom_palette[0]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[0], markeredgewidth=0))
axs[0, 0].set_title('SSTA vs Bleaching Level')
axs[0, 0].set_xlabel('Bleaching Level')
axs[0, 0].set_ylabel('SSTA')

# Box plot for SSTA_Maximum vs Bleaching_Level (Pink for SSTA_Maximum)
sns.boxplot(x='Bleaching_Level', y='SSTA_Maximum', data=data_for_eda, ax=axs[0, 1], palette=[custom_palette[1]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[1], markeredgewidth=0))
axs[0, 1].set_title('SSTA_Maximum vs Bleaching Level')
axs[0, 1].set_xlabel('Bleaching Level')
axs[0, 1].set_ylabel('SSTA Maximum')

# Box plot for SSTA_Frequency vs Bleaching_Level (Red for SSTA_Frequency)
sns.boxplot(x='Bleaching_Level', y='SSTA_Frequency', data=data_for_eda, ax=axs[1, 0], palette=[custom_palette[2]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[2], markeredgewidth=0))
axs[1, 0].set_title('SSTA_Frequency vs Bleaching Level')
axs[1, 0].set_xlabel('Bleaching Level')
axs[1, 0].set_ylabel('SSTA Frequency')

# Box plot for SSTA_DHW vs Bleaching_Level (Green for SSTA_DHW)
sns.boxplot(x='Bleaching_Level', y='SSTA_DHW', data=data_for_eda, ax=axs[1, 1], palette=[custom_palette[3]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[3], markeredgewidth=0))
axs[1, 1].set_title('SSTA_DHW vs Bleaching Level')
axs[1, 1].set_xlabel('Bleaching Level')
axs[1, 1].set_ylabel('SSTA DHW')

# Iterate through each box to set the transparency for the box face
for ax in axs.flat:
    for artist in ax.artists:
        color = artist.get_facecolor()
        artist.set_edgecolor('black')  # Black edges
        artist.set_alpha(0.5)  # Set transparency for the box face

# Adjust layout to give proper spacing between plots
plt.tight_layout()

# Show the plots
plt.show()

🔎 **Observations:**

- **SSTA vs. Bleaching Level (Top Left):**
    - The **SSTA (Sea Surface Temperature Anomaly)** box plot shows that both colony-level and population-level bleaching events have similar distributions of temperature anomalies.
    - The median SSTA is slightly above **0** for both colony and population levels, with the interquartile range (IQR) extending from about **-0.5 to 1.0**.
    - Outliers are present, with some areas experiencing extreme SSTA values of over **5 K** or below **-4 K**, indicating significant deviations from the expected sea surface temperatures in both colony and population-level bleaching cases.

- **SSTA Maximum vs. Bleaching Level (Top Right):**
    - The **SSTA Maximum** box plot shows that maximum temperature anomalies are also consistent across both colony and population-level bleaching events.
    - The median SSTA Maximum is around **3.5 K**, with an IQR between **2.5 and 4.5 K**. This suggests that the maximum anomalies are relatively high in both cases, contributing to bleaching.
    - There are numerous outliers, with some maximum anomalies exceeding **10 K**, highlighting regions that experience extreme thermal anomalies which likely contribute to severe bleaching.

- **SSTA Frequency vs. Bleaching Level (Bottom Left):**
    - The **SSTA Frequency** box plot reveals that both colony and population-level bleaching events occur in regions with frequent thermal anomalies.
    - The median frequency is around **8-10 events**, with the IQR extending from **3 to 15 events**, indicating that these coral regions are often exposed to repeated thermal stress.
    - Outliers indicate that some regions experience extremely frequent anomalies, with frequencies exceeding **30-40 events**, suggesting chronic stress in these areas.

- **SSTA DHW (Degree Heating Weeks) vs. Bleaching Level (Bottom Right):**
    - The **SSTA DHW** box plot shows that prolonged thermal stress (measured in degree heating weeks) is an important factor for both colony-level and population-level bleaching.
    - The median DHW is relatively low (around **5**), but the IQR extends to **15**, suggesting that many regions experience moderate to prolonged heating.
    - Outliers in DHW, with values exceeding **30-40 weeks**, highlight regions that experience extreme and prolonged thermal stress, increasing the likelihood of severe bleaching events.


<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li>Both colony and population-level bleaching events are influenced by similar patterns of <strong>SSTA</strong>, with slightly elevated sea surface temperature anomalies playing a significant role in coral bleaching.</li>
        <li><strong>SSTA Maximum</strong> values show that extreme thermal anomalies (above 3 K) are prevalent in areas experiencing both types of bleaching, with some regions exposed to even more extreme maximum values.</li>
        <li><strong>SSTA Frequency</strong> suggests that repeated exposure to thermal anomalies is a key driver of both colony and population-level bleaching, with some areas experiencing an exceptionally high frequency of thermal stress.</li>
        <li><strong>SSTA DHW</strong> highlights that prolonged heat stress is a major factor in coral bleaching, with some regions enduring extreme heating over prolonged periods, which significantly elevates bleaching severity.</li>
    </ul>
</div>


These box plots illustrate the importance of thermal stress—both in terms of frequency and duration—in driving coral bleaching at both the colony and population levels. Monitoring these metrics is essential for predicting and mitigating coral reef damage.


In [None]:
# Custom color palette for the plots
custom_palette = ['#1192e8', '#a56eff', '#ff832b', '#198038']  # Blue, Pink, Red, Green

# Creating a figure for separate box plots
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# TSA vs Bleaching Level (Blue for TSA)
sns.boxplot(x='Bleaching_Level', y='TSA', data=data_for_eda, ax=axs[0, 0], palette=[custom_palette[0]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[0], markeredgewidth=0))
axs[0, 0].set_title('TSA vs Bleaching Level')
axs[0, 0].set_xlabel('Bleaching Level')
axs[0, 0].set_ylabel('TSA')

# TSA_Maximum vs Bleaching Level (Pink for TSA_Maximum)
sns.boxplot(x='Bleaching_Level', y='TSA_Maximum', data=data_for_eda, ax=axs[0, 1], palette=[custom_palette[1]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[1], markeredgewidth=0))
axs[0, 1].set_title('TSA_Maximum vs Bleaching Level')
axs[0, 1].set_xlabel('Bleaching Level')
axs[0, 1].set_ylabel('TSA Maximum')

# TSA_Frequency vs Bleaching Level (Red for TSA_Frequency)
sns.boxplot(x='Bleaching_Level', y='TSA_Frequency', data=data_for_eda, ax=axs[1, 0], palette=[custom_palette[2]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[2], markeredgewidth=0))
axs[1, 0].set_title('TSA_Frequency vs Bleaching Level')
axs[1, 0].set_xlabel('Bleaching Level')
axs[1, 0].set_ylabel('TSA Frequency')

# TSA_DHW vs Bleaching Level (Green for TSA_DHW)
sns.boxplot(x='Bleaching_Level', y='TSA_DHW', data=data_for_eda, ax=axs[1, 1], palette=[custom_palette[3]], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[3], markeredgewidth=0))
axs[1, 1].set_title('TSA_DHW vs Bleaching Level')
axs[1, 1].set_xlabel('Bleaching Level')
axs[1, 1].set_ylabel('TSA DHW')

# Iterate through each box to set the transparency for the box face
for ax in axs.flat:
    for artist in ax.artists:
        color = artist.get_facecolor()
        artist.set_edgecolor('black')  # Black edges
        artist.set_alpha(0.5)  # Set transparency for the box face

# Adjust layout to give proper spacing between plots
plt.tight_layout()

# Show the plots
plt.show()

🔎 **Observations:**

- **TSA vs. Bleaching Level (Top Left):**
    - The **TSA (Thermal Stress Anomaly)** box plot shows a similar distribution for both colony-level and population-level bleaching.  
    - The median TSA is near **0** for both, with the interquartile range (IQR) extending from approximately **-1.5 to 1 K**, indicating that coral bleaching occurs at small positive thermal stress deviations.   
    - Outliers show that some areas experience extreme TSA values, reaching up to **5 K** and as low as **-12.5 K**, showing significant variability in thermal stress across different regions.

- **TSA Maximum vs. Bleaching Level (Top Right):**
    - The **TSA Maximum** box plot indicates a similar distribution for both colony and population-level bleaching, with a median around **3-4 K** and an IQR between **2.5 to 4.5 K**.    
    - Outliers extend to **12 K** or more, suggesting that in some regions, the maximum thermal stress can be extremely high, contributing to severe bleaching events.

- **TSA Frequency vs. Bleaching Level (Bottom Left):**
    - **TSA Frequency** (the number of thermal stress events) shows that both colony-level and population-level bleaching events are associated with moderate thermal stress frequencies, with a median of about **5 events**.
    - The IQR extends to **10 events**, indicating that some regions experience frequent thermal stress.
    - Outliers indicate that some areas experience over **20-30 thermal stress events**, leading to chronic stress and likely severe bleaching.

- **TSA DHW (Degree Heating Weeks) vs. Bleaching Level (Bottom Right):**
    - The **TSA DHW** box plot shows a similar distribution for both colony and population bleaching, with a median value near **0** and an IQR extending to **5-10**.
    - Outliers show that some areas experience up to **50 DHW**, indicating prolonged exposure to high temperatures, which could drive extreme bleaching events.
    - There is a greater concentration of extreme TSA DHW outliers for population-level bleaching, suggesting that prolonged stress is more likely to affect entire coral populations.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>TSA values</strong> are generally close to 0, but both colony and population-level bleaching are driven by positive thermal stress deviations, with some regions experiencing extreme stress.</li>
        <li><strong>TSA Maximum values</strong> exceeding 5 K indicate regions of extreme thermal stress, which are likely critical contributors to severe bleaching events.</li>
        <li><strong>Frequent thermal stress events (TSA Frequency)</strong> are associated with both types of bleaching, with some regions experiencing chronic thermal stress that can lead to severe and long-lasting damage.</li>
        <li><strong>TSA DHW</strong> demonstrates that prolonged heat stress is a key factor in coral bleaching, with some areas experiencing extreme heat stress over extended periods, which affects both colony and population-level bleaching.</li>
    </ul>
</div>


These findings highlight that prolonged and repeated thermal stress plays a significant role in driving both colony and population-level coral bleaching, with some regions experiencing extreme thermal stress that pushes coral ecosystems to critical points of bleaching and degradation.


#### <b>3.8.4 <span style='color:#6495ED'>|</span> Temporal Trends</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
    📝 <strong>Key Takeaways:</strong>
    <ul>
        <li>The rising <strong>SST Trend</strong> over time indicates long-term ocean warming, contributing to coral stress and potential bleaching events.</li>
        <li><strong>SST Maximum peaks</strong> are closely linked to higher bleaching percentages, underscoring the impact of extreme heat events on coral reef health.</li>
        <li>While recent years show lower bleaching percentages, the persistent presence of high <strong>SST Maximum</strong> values indicates that coral reefs remain at risk of future bleaching due to continued thermal stress.</li>
        <li><strong>Frequent and prolonged thermal stress</strong> (as shown by <strong>SSTA Frequency</strong> and <strong>SSTA DHW</strong>) is closely tied to increased bleaching, particularly during periods of repeated or sustained heat anomalies.</li>
        <li>Although <strong>SSTA Maximum</strong> remains consistently high, the correlation between thermal stress duration and bleaching suggests that prolonged heat exposure has a more severe impact than short-term temperature spikes.</li>
        <li><strong>TSA Frequency</strong> and <strong>TSA DHW</strong> show that extended periods of elevated temperatures are key drivers of severe bleaching, with the highest bleaching percentages corresponding to prolonged thermal stress events.</li>
        <li>Recent trends show fewer extreme bleaching events, but the potential for future bleaching remains due to continued exposure to frequent and prolonged heat anomalies.</li>
    </ul>
</div>

In [None]:
# Extract the year from the 'Date' column
data_for_eda['Year'] = data_for_eda['Date'].dt.year

# Group the data by year and calculate the mean for SST, SST_Maximum, and Percent Bleaching
grouped_data = data_for_eda.groupby('Year').agg({
    'SST': 'mean',
    'SST_Maximum': 'mean',
    'Percent_Bleaching': 'mean'
}).reset_index()

# Plotting the temporal trend using dual y-axis
fig, ax1 = plt.subplots(figsize=(14, 7))

# Plot SST and SST_Maximum on the first y-axis
ax1.plot(grouped_data['Year'], grouped_data['SST'], color='#1192e8', label='SST')  # Blue for SST
ax1.plot(grouped_data['Year'], grouped_data['SST_Maximum'], color='#a56eff', label='SST_Maximum')  # Pink for SST_Maximum
ax1.set_xlabel('Year')
ax1.set_ylabel('Average SST (K)', color='#1192e8')  # Blue for SST label and ticks
ax1.tick_params(axis='y', labelcolor='#1192e8')

# Create a second y-axis for Percent Bleaching
ax2 = ax1.twinx()
ax2.plot(grouped_data['Year'], grouped_data['Percent_Bleaching'], color='#da1e28', label='Percent Bleaching')  # Red for Percent Bleaching
ax2.set_ylabel('Average Percent Bleaching (%)', color='#da1e28')  # Red for Percent Bleaching label and ticks
ax2.tick_params(axis='y', labelcolor='#da1e28')

# Add legends
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')

# Set title and show plot
plt.title('Temporal Trend of SST, SST_Maximum vs Percent Bleaching (Grouped by Year)')
fig.tight_layout()
plt.show()

🔎 **Observations:**

- **SST (Sea Surface Temperature) Trends:**
    - The **SST** (blue line) shows variability over time, with fluctuations between **300 K (~27°C)** and **303 K (~30°C)**.
    - In recent years (post-2010), there has been a gradual increase in SST, with average values consistently around **302 K (~29°C)**.
    - This rise in SST over time is an indicator of long-term warming trends in sea surface temperatures, which could contribute to coral stress and bleaching.

- **SST Maximum Trends:**
    - The **SST Maximum** (green line) shows a consistently higher trend compared to SST, ranging from **304 K to 307 K (~31-34°C)**.
    - The SST Maximum values have fluctuated over the years, peaking around **1990** and the early **2000s**, with slight decreases in the last decade.
    - Despite fluctuations, the SST Maximum remains elevated, suggesting frequent occurrences of extreme heat events that could stress coral reefs.

- **Percent Bleaching Trends:**
    - The **Percent Bleaching** (red line) shows significant fluctuations over the years, with peaks in the early **1980s**, mid-**1990s**, and early **2000s**.
    - There is a clear spike in percent bleaching corresponding to periods of higher SST Maximum values, particularly around **1985**, **1995**, and **2005**, suggesting that extreme heat events lead to increased coral bleaching.
    - More recently, the bleaching percentages have been relatively lower (below **20%**), despite consistent high SST Maximum values, possibly indicating regional variations or periods of recovery.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li>The increasing <strong>SST Trend</strong> over time reflects the gradual warming of oceans, contributing to long-term coral stress and potential bleaching.</li>
        <li><strong>SST Maximum peaks</strong> are closely associated with higher bleaching percentages, emphasizing the role of extreme heat events in driving severe coral bleaching episodes.</li>
        <li>Despite periods of lower bleaching in the most recent years, the continued presence of high <strong>SST Maximum</strong> values suggests that coral reefs remain at risk of severe bleaching due to ongoing thermal stress.</li>
    </ul>
</div>

This visualization highlights the critical relationship between extreme SST events and coral bleaching, reinforcing the importance of monitoring both average and maximum sea surface temperatures to predict and manage coral reef health in the context of climate change.

In [None]:
# Grouping by year and calculating the mean for all relevant SSTA columns and Percent_Bleaching
grouped_data = data_for_eda.groupby('Year')[['SSTA', 'SSTA_Maximum', 'SSTA_Frequency', 'SSTA_DHW', 'Percent_Bleaching']].mean()

# Creating a dual y-axis plot
fig, ax1 = plt.subplots(figsize=(14, 8))

# Plot SSTA, SSTA_Maximum, SSTA_Frequency, and SSTA_DHW on the left y-axis
ax1.plot(grouped_data.index, grouped_data['SSTA'], color='#1192e8', label='SSTA')  # Blue for SSTA
ax1.plot(grouped_data.index, grouped_data['SSTA_Maximum'], color='#a56eff', label='SSTA Maximum')  # Pink for SSTA Maximum
ax1.plot(grouped_data.index, grouped_data['SSTA_Frequency'], color='#ff832b', label='SSTA Frequency')  # Red for SSTA Frequency
ax1.plot(grouped_data.index, grouped_data['SSTA_DHW'], color='#198038', label='SSTA DHW')  # Green for SSTA DHW
ax1.set_xlabel('Year')
ax1.set_ylabel('SSTA Variables', color='#1192e8')  # Blue for the y-axis label and ticks for SSTA variables
ax1.tick_params(axis='y', labelcolor='#1192e8')

# Creating the second y-axis
ax2 = ax1.twinx()

# Plot Percent_Bleaching on the right y-axis
ax2.plot(grouped_data.index, grouped_data['Percent_Bleaching'], color='#da1e28', label='Percent Bleaching')  # Red for Percent Bleaching
ax2.set_ylabel('Percent Bleaching (%)', color='#da1e28')  # Red for Percent Bleaching label and ticks
ax2.tick_params(axis='y', labelcolor='#da1e28')

# Adding a title
plt.title('Temporal Trends of SSTA Variables and Percent Bleaching Grouped by Year')

# Adding legends
ax1.legend(loc='upper center')
ax2.legend(loc='upper right')

# Show the plot
plt.tight_layout()
plt.show()


🔎 **Observations:**

- **SSTA (Sea Surface Temperature Anomaly) Trends:**
    - The **SSTA** (blue line) shows relatively low variability over time, mostly remaining between **0 and 2 K**. This indicates that average temperature anomalies are modest across the years, though even small deviations could impact coral bleaching.    
    - There is no clear upward or downward trend in SSTA, indicating that anomalies remain relatively stable over time.

- **SSTA Maximum Trends:**
    - The **SSTA Maximum** (green line) fluctuates around **3-4 K** across the years, with no significant increases or decreases over time.
    - These maximum temperature anomalies remain consistently elevated, suggesting that corals are frequently exposed to high short-term temperature extremes.

- **SSTA Frequency Trends:**
    - The **SSTA Frequency** (orange line) fluctuates significantly, with high peaks in the **1980s** and mid-**2000s**, where frequent thermal anomalies were observed.
    - The frequency of thermal anomalies tends to correspond to higher bleaching events, especially in the **1980s** and **1990s**, with a more stable but still elevated frequency in recent years.

- **SSTA DHW (Degree Heating Weeks) Trends:**
    - The **SSTA DHW** (purple line) follows a similar pattern to SSTA Frequency, with peaks during the **1980s** and **1990s**, and fluctuations in the **2000s**.
    - Higher DHW values indicate prolonged exposure to thermal stress, and during periods of high DHW, we observe increased coral bleaching percentages.

- **Percent Bleaching Trends:**
    - The **Percent Bleaching** (red line) shows significant spikes in the early **1980s**, mid-**1990s**, and early **2000s**, closely following periods of high SSTA Frequency and SSTA DHW.  
    - Although percent bleaching has decreased in recent years, it still corresponds to periods of elevated SSTA Frequency and prolonged heat stress (SSTA DHW).


<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li>Frequent and prolonged thermal stress (as indicated by <strong>SSTA Frequency</strong> and <strong>SSTA DHW</strong>) is closely linked to higher bleaching percentages, with coral reefs experiencing severe bleaching when exposed to frequent thermal anomalies or extended periods of heat stress.</li>
        <li><strong>SSTA Maximum</strong> values remain consistently elevated, suggesting that even short-term extreme temperature spikes may not be the primary driver of bleaching, but prolonged or frequent thermal stress is more impactful.</li>
        <li>Recent trends show lower bleaching percentages, but the continued presence of high <strong>SSTA Frequency</strong> and <strong>DHW outliers</strong> suggests that coral reefs remain vulnerable to future bleaching events due to persistent thermal anomalies.</li>
    </ul>
</div>


These trends emphasize the importance of monitoring both short-term temperature anomalies and the frequency and duration of thermal stress to understand the potential for future coral bleaching events.


In [None]:
# Grouping by year and calculating the mean for all relevant TSA columns and Percent_Bleaching
grouped_data = data_for_eda.groupby('Year')[['TSA', 'TSA_Maximum', 'TSA_Frequency', 'TSA_DHW', 'Percent_Bleaching']].mean()

# Creating a dual y-axis plot
fig, ax1 = plt.subplots(figsize=(14, 8))

# Plot TSA, TSA_Maximum, TSA_Frequency, and TSA_DHW on the left y-axis
ax1.plot(grouped_data.index, grouped_data['TSA'], color='#1192e8', label='TSA')  # Blue for TSA
ax1.plot(grouped_data.index, grouped_data['TSA_Maximum'], color='#a56eff', label='TSA Maximum')  # Pink for TSA Maximum
ax1.plot(grouped_data.index, grouped_data['TSA_Frequency'], color='#ff832b', label='TSA Frequency')  # Red for TSA Frequency
ax1.plot(grouped_data.index, grouped_data['TSA_DHW'], color='#198038', label='TSA DHW')  # Green for TSA DHW
ax1.set_xlabel('Year')
ax1.set_ylabel('TSA Variables', color='#1192e8')  # Blue for TSA label and ticks
ax1.tick_params(axis='y', labelcolor='#1192e8')

# Creating the second y-axis
ax2 = ax1.twinx()

# Plot Percent_Bleaching on the right y-axis
ax2.plot(grouped_data.index, grouped_data['Percent_Bleaching'], color='#da1e28', label='Percent Bleaching')  # Red for Percent Bleaching
ax2.set_ylabel('Percent Bleaching (%)', color='#da1e28')  # Red for Percent Bleaching label and ticks
ax2.tick_params(axis='y', labelcolor='#da1e28')

# Adding a title
plt.title('Temporal Trends of TSA Variables and Percent Bleaching Grouped by Year')

# Adding legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.1, 1))
ax2.legend(loc='upper right')

# Show the plot
plt.tight_layout()
plt.show()

🔎 **Observations:**

- **TSA (Thermal Stress Anomaly) Trends:**
    - The **TSA** (blue line) fluctuates between **-2 K** and **3 K** over time, with occasional periods of negative TSA, indicating cooler conditions relative to normal. However, it shows variability throughout the years, peaking slightly in the mid-**2000s** and mid-**2010s**.
    - In recent years, TSA has stabilized, with values closer to **0**, which may indicate fewer extreme deviations from normal temperatures.

- **TSA Maximum Trends:**
    - The **TSA Maximum** (green line) remains relatively stable over time, fluctuating between **2.5 K** and **3.5 K** across the years. This suggests that short-term thermal stress events remain consistent, with no significant long-term upward trend.
    - While the TSA Maximum shows consistent patterns, it doesn’t reach extreme values, suggesting that prolonged and chronic heat stress might be more critical for coral bleaching than short-term extremes.

- **TSA Frequency Trends:**
    - The **TSA Frequency** (orange line) shows significant fluctuations, particularly in the early **1980s** and **1990s**, where it reached peaks of **6-8 events**. However, it stabilizes after **2000**, indicating fewer frequent heat stress events in recent years.
    - Despite this stabilization, periods of high frequency coincide with increased percent bleaching, suggesting that even moderate thermal stress events can have significant cumulative effects on coral reefs.

- **TSA DHW (Degree Heating Weeks) Trends:**
    - The **TSA DHW** (purple line) shows that periods of prolonged thermal stress were prominent in the **1980s** and mid-**1990s**, peaking in parallel with high percent bleaching events.
    - While TSA DHW has seen reductions post-**2000**, it remains a critical variable in the early years, showing that extended periods of elevated temperatures significantly contributed to bleaching events during those times.

- **Percent Bleaching Trends:**
    - The **Percent Bleaching** (red line) aligns closely with peaks in **TSA DHW** and **TSA Frequency**, especially in the early **1980s**, **1990s**, and mid-**2000s**. These peaks in bleaching are linked to periods of high heat stress, confirming that prolonged and frequent thermal stress events are primary drivers of coral bleaching.  
    - Post-**2010**, bleaching percentages have stabilized at lower levels, even as TSA and TSA Maximum values remain elevated. This suggests that coral bleaching may also depend on other factors beyond temperature anomalies.


<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>TSA Frequency</strong> and <strong>TSA DHW</strong> (prolonged heat stress) are strongly associated with increased coral bleaching percentages, especially in years with high peaks of both variables.</li>
        <li><strong>TSA Maximum</strong> values remain consistent, showing that while short-term thermal stress is a factor, frequent and prolonged heat stress events (chronic stress) play a more critical role in driving coral bleaching.</li>
        <li>Prolonged exposure to elevated temperatures in the form of <strong>TSA DHW</strong> appears to be one of the strongest indicators of severe bleaching, with high peaks in TSA DHW directly correlating with the highest percent bleaching episodes.</li>
        <li>Recent years show lower bleaching percentages, potentially due to reduced frequency and duration of thermal stress events, although corals remain vulnerable to future heat events.</li>
    </ul>
</div>


This analysis demonstrates the importance of understanding both the duration and frequency of thermal stress events to predict and mitigate coral bleaching under future climate scenarios.


### <b>3.9 <span style='color:#6495ED'>|</span> Impact of Other Environmental Factors</b> 

#### <b>3.9.1 <span style='color:#6495ED'>|</span> Univariate Analysis</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>Distance to Shore:</strong> Nearshore reefs face greater human stress (pollution, runoff), while offshore reefs are exposed to oceanic stressors like temperature anomalies and storms.</li>
    <li><strong>Turbidity:</strong> Low turbidity supports coral health, but higher levels reduce light penetration, increasing coral stress and risk of bleaching.</li>
    <li><strong>Depth:</strong> Shallow reefs are more exposed to thermal stress and human activities, while deeper reefs face light limitations but are less affected by surface temperature extremes.</li>
    <li><strong>Cyclone Frequency:</strong> Cyclones can cause direct damage to coral, but also offer relief by cooling surface waters, reducing thermal stress.</li>
    <li><strong>Windspeed:</strong> Moderate winds help cool sea surfaces and mitigate stress, but strong winds during storms can cause physical damage to reefs, increasing bleaching risks.</li>
    <li><strong>Exposure:</strong> Sheltered reefs are protected from strong wave action but face increased vulnerability to heat stress due to limited water circulation.</li>
</ul>
</div>


In [None]:
# Other environmental factors that could be influencing the bleaching events
other_factors = [
    'Distance_to_Shore', 'Turbidity', 'Depth_m', 'Cyclone_Frequency', 'Windspeed', 'Exposure'
]

data_for_eda[other_factors].describe(include='all')

🔎 **Observations:**

- **Distance to Shore:** Coral sites range from **3 meters to 299 kilometers** offshore, with an average of **4 km**. Nearshore reefs face more localized stress (sedimentation, pollution), while offshore reefs deal with open-ocean stressors (temperature anomalies, storms).

- **Turbidity:** Turbidity is generally low, averaging **0.07** but can reach **1.28**. Low turbidity supports coral health by promoting sunlight penetration, while high turbidity reduces light, potentially leading to coral stress or bleaching.

- **Depth:** Coral depths range from **0 to 50.3 meters** (average **7.36 m**). Shallow reefs are more exposed to thermal stress and human activities, while deeper reefs face light limitations but are less affected by surface temperature extremes.

- **Cyclone Frequency:** Coral sites experience an average of **52 cyclones**, with some regions facing over **100**. Cyclones can damage coral but may also reduce thermal stress by cooling surface waters.

- **Windspeed:** Wind speeds average **4.79 m/s** (up to **15 m/s**). Moderate winds help cool the sea surface, but strong winds can cause physical damage, increasing bleaching risks.

- **Exposure:** Most coral sites are **sheltered**, reducing physical damage from waves but increasing the risk of temperature anomalies due to limited water circulation.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Distance to Shore</strong> affects exposure to different stressors, with nearshore and offshore reefs facing distinct challenges.</li>
        <li><strong>Turbidity</strong> impacts sunlight penetration, with higher levels contributing to coral stress and bleaching over time.</li>
        <li><strong>Depth</strong> influences coral vulnerability, with shallow reefs facing thermal stress and deeper reefs limited by light availability.</li>
        <li><strong>Cyclone Frequency</strong> has both harmful and cooling effects, making it a complex factor in coral resilience.</li>
        <li><strong>Windspeed</strong> can mitigate thermal stress at moderate levels but increase physical damage during storms.</li>
        <li><strong>Exposure</strong> influences protection from waves but may increase vulnerability to temperature anomalies in sheltered areas.</li>
    </ul>
</div>

In [None]:
# Set up the figure for the histograms, bar plot, and line plot
plt.figure(figsize=(18, 16))

# Custom colors with the added color #ff832b
colors = ['#1192e8', '#a56eff', '#fa4d56', '#198038', '#ff832b']  # Custom color palette

# Plot histogram for Distance_to_Shore (Blue)
plt.subplot(3, 2, 1)
sns.histplot(data_for_eda['Distance_to_Shore'], bins=50, kde=True, color=colors[0])
plt.title('Histogram of Distance to Shore')
plt.xlabel('Distance to Shore')
plt.ylabel('Frequency')

# Plot histogram for Turbidity (Pink)
plt.subplot(3, 2, 2)
sns.histplot(data_for_eda['Turbidity'], bins=50, kde=True, color=colors[1])
plt.title('Histogram of Turbidity')
plt.xlabel('Turbidity')
plt.ylabel('Frequency')

# Plot histogram for Depth_m (Red)
plt.subplot(3, 2, 3)
sns.histplot(data_for_eda['Depth_m'], bins=50, kde=True, color=colors[2])
plt.title('Histogram of Depth (m)')
plt.xlabel('Depth (m)')
plt.ylabel('Frequency')

# Plot histogram for Cyclone_Frequency (Green)
plt.subplot(3, 2, 4)
sns.histplot(data_for_eda['Cyclone_Frequency'], bins=50, kde=True, color=colors[3])
plt.title('Histogram of Cyclone Frequency')
plt.xlabel('Cyclone Frequency')
plt.ylabel('Frequency')

# Plot histogram for Windspeed (Custom Orange #ff832b)
plt.subplot(3, 2, 5)
sns.histplot(data_for_eda['Windspeed'], bins=50, kde=True, color=colors[4])
plt.title('Histogram of Windspeed')
plt.xlabel('Windspeed')
plt.ylabel('Frequency')

# Plot bar plot for Exposure with 3 distinct colors
plt.subplot(3, 2, 6)
sns.countplot(x='Exposure', data=data_for_eda, palette=[colors[0], colors[1], colors[2]])  # Use 3 colors
plt.title('Bar Plot of Exposure')
plt.xlabel('Exposure')
plt.ylabel('Count')

# Adjust layout for better readability
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **Distance to Shore (Top left):** Most coral sites are located near shore, within **10 km**, with a few outliers up to **300 km** offshore. Nearshore reefs are more exposed to human impacts like runoff, while offshore reefs face oceanic threats.

- **Turbidity (Top right):** Turbidity is generally low, with most values below **0.1**. Higher turbidity, often caused by runoff, can reduce light and stress corals, while low turbidity supports photosynthesis and coral health.

- **Depth (Middle left):** Coral reefs are mostly found in shallow waters (**0-20 meters**). Shallow reefs are more prone to thermal stress and bleaching, while deeper reefs may be less affected by surface temperature changes but face limited light.

- **Cyclone Frequency (Middle right):** Coral sites typically experience **40-60 cyclones**, with some regions facing up to **100**. Cyclones can damage reefs but also cool waters, providing temporary thermal relief.

- **Windspeed (Bottom left):** Wind speeds generally range from **3 to 7 m/s**. Moderate winds reduce thermal stress, while stronger winds during storms can cause physical damage to reefs.

- **Exposure (Bottom right):** Most coral sites are **sheltered** (~12,000 sites), offering protection from strong waves, though limited water circulation increases heat stress risks during warm periods.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Distance to Shore:</strong> Nearshore reefs face human impacts, while offshore reefs are influenced by broader oceanic factors.</li>
        <li><strong>Turbidity:</strong> Low turbidity supports coral health, while higher levels reduce light and can stress reefs.</li>
        <li><strong>Depth:</strong> Shallow reefs are more vulnerable to thermal stress; deeper reefs may handle temperature changes better but face light limitations.</li>
        <li><strong>Cyclone Frequency:</strong> Cyclones can cause damage but may reduce thermal stress, influencing reef resilience.</li>
        <li><strong>Windspeed:</strong> Moderate wind speeds cool waters, while high speeds from storms can physically damage reefs.</li>
        <li><strong>Exposure:</strong> Sheltered reefs are protected from waves but more vulnerable to heat stress due to limited water circulation.</li>
    </ul>
</div>

Coral reefs are shaped by factors like **distance to shore, turbidity, depth, cyclone frequency, windspeed**, and **exposure**. Nearshore reefs face human-driven stress, while offshore and deeper reefs are influenced by oceanic conditions, making these factors key in predicting coral resilience and mitigating bleaching risks.

In [None]:
# Custom color palette for the box plots
custom_palette = ['#1192e8', '#a56eff', '#fa4d56', '#198038', '#ff832b']  # Blue, Pink, Red, Green, Orange

# Set up the figure for the box plots in a 2x3 layout (3 columns and 2 rows)
fig, axs = plt.subplots(2, 3, figsize=(18, 12))

# Box plot for Distance_to_Shore (Blue)
sns.boxplot(y=data_for_eda['Distance_to_Shore'], color=custom_palette[0], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[0], markeredgewidth=0), ax=axs[0, 0])
axs[0, 0].set_title('Box Plot of Distance to Shore')

# Box plot for Turbidity (Pink)
sns.boxplot(y=data_for_eda['Turbidity'], color=custom_palette[1], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[1], markeredgewidth=0), ax=axs[0, 1])
axs[0, 1].set_title('Box Plot of Turbidity')

# Box plot for Depth_m (Red)
sns.boxplot(y=data_for_eda['Depth_m'], color=custom_palette[2], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[2], markeredgewidth=0), ax=axs[0, 2])
axs[0, 2].set_title('Box Plot of Depth (m)')

# Box plot for Cyclone_Frequency (Green)
sns.boxplot(y=data_for_eda['Cyclone_Frequency'], color=custom_palette[3], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[3], markeredgewidth=0), ax=axs[1, 0])
axs[1, 0].set_title('Box Plot of Cyclone Frequency')

# Box plot for Windspeed (Orange)
sns.boxplot(y=data_for_eda['Windspeed'], color=custom_palette[4], linewidth=2,
            boxprops=dict(edgecolor='black'),
            whiskerprops=dict(color='black'), capprops=dict(color='black'),
            medianprops=dict(color='black'),
            flierprops=dict(marker='o', markersize=5, markerfacecolor=custom_palette[4], markeredgewidth=0), ax=axs[1, 1])
axs[1, 1].set_title('Box Plot of Windspeed')

# Hide the empty subplot (bottom right)
fig.delaxes(axs[1, 2])

# Iterate through each box to set the transparency for the box face
for ax in axs.flat:
    if ax in axs.flat:  # Ensure the empty axis is excluded
        for artist in ax.artists:
            artist.set_edgecolor('black')  # Black edges
            artist.set_alpha(0.5)  # Set transparency for the box face

# Adjust layout for better readability
plt.tight_layout()

# Show the plot
plt.show()

🔎 **Observations:**

- **Distance to Shore (Top left):** Most coral sites are nearshore, with a few outliers beyond **50,000 meters**. Nearshore reefs are more exposed to human stressors like pollution, while offshore reefs face different environmental conditions.

- **Turbidity (Top middle):** Turbidity is generally low, supporting coral health, but outliers with higher turbidity suggest areas affected by runoff or sediment, which can reduce light and increase coral stress.

- **Depth (Top right):** Most reefs are found at depths of **5 to 15 meters**, with some reaching up to **50 meters**. Shallow reefs face more thermal stress, while deeper reefs may benefit from cooler temperatures but struggle with light availability.

- **Cyclone Frequency (Bottom left):** Most coral sites experience **40 to 60 cyclones**, with some facing over **80**. Cyclones can damage reefs but may also provide thermal relief by cooling the waters.

- **Windspeed (Bottom right):** Wind speeds range from **3 to 7 m/s**, with outliers up to **14 m/s**. Moderate winds reduce thermal stress, while high winds can cause physical damage during storms.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Distance to Shore:</strong> Nearshore reefs are more exposed to human impacts, while offshore reefs face oceanic stressors.</li>
        <li><strong>Turbidity:</strong> Low turbidity supports coral health, but higher levels reduce light and increase stress.</li>
        <li><strong>Depth:</strong> Shallow reefs are more prone to temperature fluctuations, while deeper reefs face light limitations.</li>
        <li><strong>Cyclone Frequency:</strong> Cyclones can be damaging but also help cool waters, reducing thermal stress.</li>
        <li><strong>Windspeed:</strong> Moderate winds reduce thermal stress, while high winds cause physical damage during storms.</li>
    </ul>
</div>

The box plots show how factors like **distance to shore, turbidity, depth, cyclone frequency**, and **windspeed** vary across coral sites, influencing their vulnerability to bleaching. These factors are critical for understanding coral resilience and guiding conservation efforts.

#### <b>3.9.2 <span style='color:#6495ED'>|</span> Affects on Bleaching Percent</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>Distance to Shore:</strong> Both nearshore and far-offshore reefs experience higher bleaching severity, while mid-range reefs (10-50 km) show lower bleaching, indicating a non-linear relationship with bleaching.</li>
    <li><strong>Depth:</strong> Deeper reefs (20-30 meters) experience the highest bleaching severity, while shallow reefs (<10 meters) tend to have lower bleaching, suggesting a complex relationship between depth and stress factors like light availability and temperature.</li>
    <li><strong>Turbidity:</strong> Moderate to high turbidity (0.5-1.0) correlates with higher bleaching severity, but very high or very low turbidity can reduce bleaching, indicating a non-linear relationship.</li>
    <li><strong>Cyclone Frequency:</strong> High cyclone frequency (80-100 cyclones) increases bleaching, while moderate cyclone activity (48-80 cyclones) may help reduce thermal stress, suggesting a dual role of cyclones in coral bleaching.</li>
    <li><strong>Windspeed:</strong> Moderate wind speeds (5-6 m/s, 10-12 m/s) correlate with higher bleaching, while very low or very high windspeeds show lower bleaching, likely due to cooling or reduced stress effects.</li>
    <li><strong>Exposure:</strong> Reefs that are "Sometimes" exposed face the highest bleaching severity, while consistently sheltered reefs show the lowest, indicating that fluctuating exposure leads to higher stress.</li>
</ul>
</div>

##### <b>3.9.2.1 <span style='color:#6495ED'>|</span> Distance to Shore</b> 

In [None]:
# Adjusting bins and labels based on the provided statistics
bins = [0, 158.91, 658.44, 2376.26, 10000, 20000, 50000, 100000, 200000, 300000]
labels = ['<0.16 km', '0.16-0.66 km', '0.66-2.4 km', '2.4-10 km', '10-20 km', '20-50 km', '50-100 km', '100-200 km', '200-300 km']

data_for_eda['Distance_Bin'] = pd.cut(data_for_eda['Distance_to_Shore'], bins=bins, labels=labels)

# Calculate the mean Percent_Bleaching for each distance bin
bleaching_by_distance = data_for_eda.groupby('Distance_Bin')['Percent_Bleaching'].mean().reset_index()

# Define the colors based on the Percent Bleaching (above 15% = pink, below 15% = blue)
colors = ['#a56eff' if val > 15 else '#1192e8' for val in bleaching_by_distance['Percent_Bleaching']]

# Create a figure for the subplots (1 row, 2 columns)
fig, axs = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Bar plot for relationship between distance to shore and bleaching severity
axs[0].bar(bleaching_by_distance['Distance_Bin'], bleaching_by_distance['Percent_Bleaching'], color=colors)
axs[0].set_xlabel('Distance to Shore (km)')
axs[0].set_ylabel('Average Percent Bleaching')
axs[0].set_title('Relationship Between Distance to Shore and Bleaching Severity')
axs[0].tick_params(axis='x', rotation=45)
axs[0].grid(True)

# Plot 2: Scatter plot with regression lines
sns.scatterplot(x='Distance_to_Shore', y='Percent_Bleaching', data=data_for_eda, color='#1192e8', alpha=0.6, ax=axs[1])
sns.regplot(x='Distance_to_Shore', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#da1e28', line_kws={"alpha": 0.7}, ax=axs[1], label='Linear Fit')
sns.regplot(x='Distance_to_Shore', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#a56eff', line_kws={"alpha": 0.7}, order=2, ax=axs[1], label='Polynomial Fit (2nd Degree)')

# Labels, title, and grid for the second plot
axs[1].set_xlabel('Distance to Shore (m)')
axs[1].set_ylabel('Percent Bleaching')
axs[1].set_title('Distance to Shore vs. Percent Bleaching with Linear and Polynomial Regression Lines')
axs[1].grid(True)

# Add legend to the second plot
axs[1].legend()

# Adjust layout
plt.tight_layout()

# Display the plot with subplots
plt.show()

🔎 **Observations:**

- **Relationship Between Distance to Shore and Bleaching Severity (Left Plot):**
    - The bar plot shows varying levels of **Average Percent Bleaching** across different distance ranges from shore. Coral reefs located between **0.6-2.4 km** from shore exhibit the highest average bleaching percentages, reaching about **15-16%**.
    - Reefs farther offshore, especially those between **200-300 km**, also show higher bleaching percentages, close to **20-22%**.
    - Nearshore reefs (<0.16 km) and those in mid-range distances (**10-50 km**) generally experience lower levels of bleaching severity, averaging around **10-12%**.

- **Distance to Shore vs. Percent Bleaching with Linear and Polynomial Regression (Right Plot):**
    - The scatter plot shows **Percent Bleaching** against **Distance to Shore**, with linear and polynomial regression lines overlayed.
    - The **linear regression** (red line) suggests a weak positive correlation between distance to shore and percent bleaching, indicating that as distance from shore increases, bleaching slightly increases as well.
    - The **polynomial regression** (purple line) indicates a more complex relationship, with bleaching first increasing with distance, then decreasing at greater distances. The shaded area shows higher variability in bleaching outcomes at extreme distances from shore.
    - The majority of the data points cluster near **zero distance**, indicating that most coral sites are located nearshore, with relatively low bleaching levels.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Distance to Shore:</strong> Reefs closer to shore and those located far offshore both experience higher bleaching severity, though the relationship is not strictly linear.</li>
        <li><strong>Mid-range Reefs:</strong> Coral sites located at moderate distances from shore (10-50 km) show lower average bleaching severity, suggesting that they may be less exposed to direct stressors.</li>
        <li><strong>Complex Relationship:</strong> The polynomial regression suggests that bleaching severity increases initially with distance but starts to decrease at larger distances, indicating that factors other than proximity to shore may play a role in bleaching outcomes for offshore reefs.</li>
    </ul>
</div>

The analysis indicates that coral bleaching is influenced by distance to shore, but the relationship is not straightforward. Both nearshore and far offshore reefs face significant stress, while mid-range reefs tend to experience less severe bleaching. Understanding the combination of local and oceanic stressors is crucial in assessing coral reef resilience.


##### <b>3.9.2.2 <span style='color:#6495ED'>|</span> Depth</b> 

In [None]:
# Define bins for Depth (based on provided statistics)
depth_bins = [0, 4, 6.5, 10, 20, 30, 50.3]
depth_labels = ['0-4 m', '4-6.5 m', '6.5-10 m', '10-20 m', '20-30 m', '30-50.3 m']

# Assuming data_for_eda has 'Depth_m' and 'Percent_Bleaching' columns
# Create Depth Bins
data_for_eda['Depth_Bin'] = pd.cut(data_for_eda['Depth_m'], bins=depth_bins, labels=depth_labels)

# Calculate the mean Percent_Bleaching for each depth bin
bleaching_by_depth = data_for_eda.groupby('Depth_Bin')['Percent_Bleaching'].mean().reset_index()

# Define the colors based on the Percent Bleaching (above 15% = pink, below 15% = blue)
colors = ['#a56eff' if val > 15 else '#1192e8' for val in bleaching_by_depth['Percent_Bleaching']]

# Create a figure for the subplots (1 row, 2 columns)
fig, axs = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Bar plot for relationship between depth and bleaching severity
axs[0].bar(bleaching_by_depth['Depth_Bin'], bleaching_by_depth['Percent_Bleaching'], color=colors)
axs[0].set_xlabel('Depth (m)')
axs[0].set_ylabel('Average Percent Bleaching')
axs[0].set_title('Relationship Between Depth and Bleaching Severity')
axs[0].tick_params(axis='x', rotation=45)
axs[0].grid(True)

# Plot 2: Scatter plot with regression lines
sns.scatterplot(x='Depth_m', y='Percent_Bleaching', data=data_for_eda, color='#1192e8', alpha=0.6, ax=axs[1])
sns.regplot(x='Depth_m', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#da1e28', line_kws={"alpha": 0.7}, ax=axs[1], label='Linear Fit')
sns.regplot(x='Depth_m', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#a56eff', line_kws={"alpha": 0.5}, order=2, ax=axs[1], label='Polynomial Fit (2nd Degree)')

# Labels, title, and grid for the second plot
axs[1].set_xlabel('Depth (m)')
axs[1].set_ylabel('Percent Bleaching')
axs[1].set_title('Relationship Between Depth and Bleaching Severity with Regression Lines')
axs[1].grid(True)

# Add legend to the second plot
axs[1].legend()

# Adjust layout
plt.tight_layout()

# Display the plot with subplots
plt.show()

🔎 **Observations:**

- **Relationship Between Depth and Bleaching Severity (Left Plot):**
    - The bar plot shows varying levels of **Average Percent Bleaching** across different depth ranges. Coral reefs found at depths between **20-30 m** exhibit the highest average bleaching percentages, exceeding **30%**.
    - Deeper reefs (from **10 to 50.3 m**) generally show higher bleaching percentages, with averages above **20%**, while shallower reefs (below **10 m**) experience relatively lower bleaching severity, around **10-12%**.

- **Relationship Between Depth and Percent Bleaching with Linear and Polynomial Regression (Right Plot):**
    - The scatter plot shows **Percent Bleaching** against **Depth**, with both linear and polynomial regression lines overlaid.
    - The **linear regression** (red line) indicates a positive correlation, suggesting that as depth increases, the bleaching percentage tends to increase as well, albeit at a moderate rate.
    - The **polynomial regression** (purple line) suggests a more complex relationship, where bleaching increases more steeply at greater depths (above 30 m). The shaded region indicates greater variability in bleaching outcomes at extreme depths.
    - A large number of data points cluster near shallower depths (0-10 m), where bleaching percentages are generally lower.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Depth:</strong> Deeper reefs, particularly those between **20-30 meters**, experience the highest average bleaching severity, indicating that greater depths may not necessarily protect reefs from bleaching.</li>
        <li><strong>Shallow Reefs:</strong> Coral reefs at depths below **10 meters** show lower average bleaching percentages, suggesting they might be less affected by certain stressors, or they may have better access to sunlight for recovery.</li>
        <li><strong>Complex Relationship:</strong> The polynomial regression indicates that bleaching severity may sharply increase at greater depths, highlighting the need to understand how factors like light availability and water temperature interact at deeper levels.</li>
    </ul>
</div>

This analysis suggests that deeper reefs may be more vulnerable to bleaching than previously thought, with depths beyond **20 meters** associated with higher bleaching severity. While shallow reefs are not immune, their average bleaching severity is lower, potentially due to better sunlight penetration. Understanding how environmental factors vary across depth ranges is key to protecting coral ecosystems.

##### <b>3.9.2.3 <span style='color:#6495ED'>|</span> Turbidity</b> 

In [None]:
# Create a figure with 2 subplots (1 row, 2 columns)
fig, axs = plt.subplots(1, 2, figsize=(16, 6))

# Subplot 1: Bar chart for average percent bleaching by Turbidity bins
# Define bins and calculate the mean Percent Bleaching for each bin
turbidity_bins = [0, 0.0395, 0.056, 0.0841, 0.2, 0.5, 1.0, 1.3]
turbidity_labels = ['0-0.04', '0.04-0.06', '0.06-0.08', '0.08-0.2', '0.2-0.5', '0.5-1.0', '1.0-1.3']
data_for_eda['Turbidity_Bin'] = pd.cut(data_for_eda['Turbidity'], bins=turbidity_bins, labels=turbidity_labels)
bleaching_by_turbidity = data_for_eda.groupby('Turbidity_Bin')['Percent_Bleaching'].mean().reset_index()

# Define the colors based on Percent Bleaching (above 15% = pink, below 15% = blue)
colors = ['#a56eff' if val > 15 else '#1192e8' for val in bleaching_by_turbidity['Percent_Bleaching']]

# Bar plot in the first subplot
axs[0].bar(bleaching_by_turbidity['Turbidity_Bin'], bleaching_by_turbidity['Percent_Bleaching'], color=colors)
axs[0].set_xlabel('Turbidity')
axs[0].set_ylabel('Average Percent Bleaching')
axs[0].set_title('Average Percent Bleaching by Turbidity')
axs[0].tick_params(axis='x', rotation=45)
axs[0].grid(True)

# Subplot 2: Scatter plot with linear and polynomial regression for Turbidity vs. Percent Bleaching
sns.scatterplot(x='Turbidity', y='Percent_Bleaching', data=data_for_eda, color='#1192e8', alpha=0.6, ax=axs[1])

# Linear regression line
sns.regplot(x='Turbidity', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#da1e28', label='Linear Fit', ax=axs[1])

# Polynomial regression line (2nd degree)
sns.regplot(x='Turbidity', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#a56eff', order=2, label='Polynomial Fit (2nd Degree)', ax=axs[1])

# Labels and title
axs[1].set_xlabel('Turbidity')
axs[1].set_ylabel('Percent Bleaching')
axs[1].set_title('Turbidity vs. Percent Bleaching with Linear and Polynomial Regression')
axs[1].grid(True)
axs[1].legend()

# Adjust layout and display the combined plot
plt.tight_layout()
plt.show()

🔎 **Observations:**

- **Average Percent Bleaching by Turbidity (Left Plot):**
    - The bar plot shows **Average Percent Bleaching** across different turbidity ranges. Reefs in areas with turbidity between **0.5 and 1.0** have the highest bleaching severity, with average bleaching above **20%**.
    - Sites with low turbidity values (between **0.04 and 0.06**) also show moderate bleaching, with averages around **15%**, while those in very high turbidity areas (**1.0-1.3**) and mid-range turbidity levels (**0.06-0.2**) show lower bleaching percentages, around **10-12%**.

- **Turbidity vs. Percent Bleaching with Linear and Polynomial Regression (Right Plot):**
    - The scatter plot shows **Percent Bleaching** against **Turbidity**, with both linear and polynomial regression lines overlaid.
    - The **linear regression** (red line) shows a slight negative relationship between turbidity and bleaching, suggesting that higher turbidity might be associated with slightly lower bleaching, though the effect is minimal.
    - The **polynomial regression** (purple line) suggests a more complex relationship, where bleaching initially decreases with turbidity but then shows a slight increase at higher turbidity values, particularly beyond **0.8**. The shaded area indicates variability in bleaching outcomes at higher turbidity levels.
    - Most data points are clustered near low turbidity values (below **0.2**), with moderate bleaching percentages.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Turbidity:</strong> Sites with moderate to high turbidity levels (<strong>0.5-1.0</strong>) experience the highest bleaching severity, possibly indicating that such levels limit light penetration or cause other stressors that affect coral health.</li>
        <li><strong>Low Turbidity Sites:</strong> Reefs in areas with very low turbidity (below <strong>0.1</strong>) show moderate bleaching levels, suggesting that while low turbidity supports coral health, other factors may contribute to bleaching in these areas.</li>
        <li><strong>Complex Relationship:</strong> The polynomial regression highlights that bleaching may decrease at certain turbidity levels but could increase again as turbidity becomes excessively high, indicating that the relationship between turbidity and bleaching may not be linear.</li>
    </ul>
</div>

The analysis suggests that while low to moderate turbidity levels support coral health, excessively high turbidity could lead to higher bleaching severity. However, the relationship between turbidity and bleaching is complex, with both linear and nonlinear trends evident in the data.


##### <b>3.9.2.4 <span style='color:#6495ED'>|</span> Cyclone Frequency</b> 

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Define bins for Cyclone Frequency (based on the provided statistics)
cyclone_bins = [18.31, 48.18, 52.33, 57.06, 70, 80, 100, 105.8]
cyclone_labels = ['18-48', '48-52', '52-57', '57-70', '70-80', '80-100', '100-106']

# Assuming data_for_eda has 'Cyclone_Frequency' and 'Percent_Bleaching' columns
# Create Cyclone Frequency Bins
data_for_eda['Cyclone_Bin'] = pd.cut(data_for_eda['Cyclone_Frequency'], bins=cyclone_bins, labels=cyclone_labels)

# Calculate the mean Percent_Bleaching for each cyclone frequency bin
bleaching_by_cyclone = data_for_eda.groupby('Cyclone_Bin')['Percent_Bleaching'].mean().reset_index()

# Define the colors based on the Percent Bleaching (above 15% = pink, below 15% = blue)
colors = ['#a56eff' if val > 15 else '#1192e8' for val in bleaching_by_cyclone['Percent_Bleaching']]

# Create a figure for the subplots (1 row, 2 columns)
fig, axs = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Bar plot for relationship between cyclone frequency and bleaching severity
axs[0].bar(bleaching_by_cyclone['Cyclone_Bin'], bleaching_by_cyclone['Percent_Bleaching'], color=colors)
axs[0].set_xlabel('Cyclone Frequency')
axs[0].set_ylabel('Average Percent Bleaching')
axs[0].set_title('Relationship Between Cyclone Frequency and Bleaching Severity')
axs[0].tick_params(axis='x', rotation=45)
axs[0].grid(True)

# Plot 2: Scatter plot with regression lines for Cyclone Frequency vs. Bleaching
sns.scatterplot(x='Cyclone_Frequency', y='Percent_Bleaching', data=data_for_eda, color='#1192e8', alpha=0.6, ax=axs[1])
sns.regplot(x='Cyclone_Frequency', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#da1e28', line_kws={"alpha": 0.7}, ax=axs[1], label='Linear Fit')
sns.regplot(x='Cyclone_Frequency', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#a56eff', line_kws={"alpha": 0.5}, order=2, ax=axs[1], label='Polynomial Fit (2nd Degree)')

# Labels, title, and grid for the second plot
axs[1].set_xlabel('Cyclone Frequency')
axs[1].set_ylabel('Percent Bleaching')
axs[1].set_title('Cyclone Frequency vs. Percent Bleaching with Regression Lines')
axs[1].grid(True)

# Add legend to the second plot
axs[1].legend()

# Adjust layout
plt.tight_layout()

# Display the plot with subplots
plt.show()

🔎 **Observations:**

- **Relationship Between Cyclone Frequency and Bleaching Severity (Left Plot):**
    - The bar plot shows **Average Percent Bleaching** across different cyclone frequency ranges. Coral reefs with cyclone frequencies between **80-100** experience the highest average bleaching severity, reaching over **20%**.
    - Sites with lower cyclone frequencies (**48-57**) show lower bleaching percentages, around **10%**. Similarly, areas with moderate cyclone frequencies (**52-57** and **70-80**) also show lower bleaching severity.
    - However, sites with either very high (**100-106**) or relatively low cyclone frequencies (**18-48**) show elevated bleaching, suggesting that extreme values of cyclone frequency may influence bleaching outcomes more than moderate cyclone exposure.

- **Cyclone Frequency vs. Percent Bleaching with Regression Lines (Right Plot):**
    - The scatter plot shows **Percent Bleaching** against **Cyclone Frequency**, with both linear and polynomial regression lines overlaid.
    - The **linear regression** (red line) suggests a minimal overall relationship between cyclone frequency and bleaching, indicating that, on average, cyclone frequency does not significantly affect bleaching levels.
    - The **polynomial regression** (purple line) suggests a more complex relationship, where bleaching initially decreases as cyclone frequency increases but then rises again sharply at very high cyclone frequencies (above **80**). This implies that very frequent cyclones might contribute to increased bleaching, possibly due to physical damage and stress compounding with other factors.
    - Most data points are clustered at lower cyclone frequencies, showing moderate bleaching levels.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Cyclone Frequency:</strong> Coral sites with cyclone frequencies between <strong>80-100</strong> experience the highest bleaching severity, suggesting that very frequent cyclones can exacerbate coral stress and increase bleaching.</li>
        <li><strong>Moderate Cyclone Activity:</strong> Reefs with moderate cyclone exposure (<strong>48-80 cyclones</strong>) show lower bleaching percentages, potentially indicating that cyclones in this range may help mitigate thermal stress without causing significant physical damage.</li>
        <li><strong>Complex Relationship:</strong> The polynomial regression indicates that bleaching may decrease with moderate cyclone activity but increases again with very high cyclone frequencies, highlighting the dual impact of cyclones: cooling surface waters but also physically damaging coral structures.</li>
    </ul>
</div>

This analysis suggests that moderate cyclone frequencies might have a protective effect by reducing thermal stress, but very high cyclone frequencies can lead to increased bleaching due to compounding physical and environmental damage.

##### <b>3.9.2.5 <span style='color:#6495ED'>|</span> Windspeed</b> 

In [None]:
# Define bins for Windspeed (based on the provided statistics)
windspeed_bins = [0, 3, 5, 6, 10, 12, 15]
windspeed_labels = ['0-3', '3-5', '5-6', '6-10', '10-12', '12-15']

# Assuming data_for_eda has 'Windspeed' and 'Percent_Bleaching' columns
# Create Windspeed Bins
data_for_eda['Windspeed_Bin'] = pd.cut(data_for_eda['Windspeed'], bins=windspeed_bins, labels=windspeed_labels)

# Calculate the mean Percent_Bleaching for each windspeed bin
bleaching_by_windspeed = data_for_eda.groupby('Windspeed_Bin')['Percent_Bleaching'].mean().reset_index()

# Define the colors based on the Percent Bleaching (above 15% = pink, below 15% = blue)
colors = ['#a56eff' if val > 15 else '#1192e8' for val in bleaching_by_windspeed['Percent_Bleaching']]

# Create a figure for the subplots (1 row, 2 columns)
fig, axs = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Bar plot for relationship between windspeed and bleaching severity
axs[0].bar(bleaching_by_windspeed['Windspeed_Bin'], bleaching_by_windspeed['Percent_Bleaching'], color=colors)
axs[0].set_xlabel('Windspeed (m/s)')
axs[0].set_ylabel('Average Percent Bleaching')
axs[0].set_title('Relationship Between Windspeed and Bleaching Severity')
axs[0].tick_params(axis='x', rotation=45)
axs[0].grid(True)

# Plot 2: Scatter plot with regression lines for Windspeed vs. Bleaching
sns.scatterplot(x='Windspeed', y='Percent_Bleaching', data=data_for_eda, color='#1192e8', alpha=0.6, ax=axs[1])
sns.regplot(x='Windspeed', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#da1e28', line_kws={"alpha": 0.7}, ax=axs[1], label='Linear Fit')
sns.regplot(x='Windspeed', y='Percent_Bleaching', data=data_for_eda, scatter=False, color='#a56eff', line_kws={"alpha": 0.5}, order=2, ax=axs[1], label='Polynomial Fit (2nd Degree)')

# Labels, title, and grid for the second plot
axs[1].set_xlabel('Windspeed (m/s)')
axs[1].set_ylabel('Percent Bleaching')
axs[1].set_title('Windspeed vs. Percent Bleaching with Regression Lines')
axs[1].grid(True)

# Add legend to the second plot
axs[1].legend()

# Adjust layout
plt.tight_layout()

# Display the plot with subplots
plt.show()

🔎 **Observations:**

- **Relationship Between Windspeed and Bleaching Severity (Left Plot):**
    - The bar plot shows **Average Percent Bleaching** across different windspeed ranges. Reefs exposed to windspeed between **5-6 m/s** and **10-12 m/s** show the highest bleaching severity, with averages above **15%**.
    - Reefs exposed to moderate windspeed (**3-5 m/s** and **6-10 m/s**) show bleaching levels between **12-13%**.
    - Lower windspeed ranges (**0-3 m/s**) and very high windspeed (**12-15 m/s**) show relatively lower bleaching percentages, with averages closer to **10%** and below **10%**, respectively.

- **Windspeed vs. Percent Bleaching with Regression Lines (Right Plot):**
    - The scatter plot shows **Percent Bleaching** against **Windspeed**, with both linear and polynomial regression lines overlaid.
    - The **linear regression** (red line) suggests a slight positive correlation, indicating that as windspeed increases, bleaching may slightly increase, though the effect is minimal.
    - The **polynomial regression** (purple line) shows a more complex relationship, suggesting that bleaching increases initially with windspeed but decreases at higher windspeed levels (above **10 m/s**). The shaded region indicates increasing variability in bleaching outcomes at higher windspeeds.
    - Most data points cluster at lower windspeed values, where bleaching levels appear to fluctuate without a clear trend.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Windspeed:</strong> Moderate windspeed ranges (<strong>5-6 m/s</strong> and <strong>10-12 m/s</strong>) are associated with the highest bleaching severity, suggesting that moderate winds might increase stress on reefs, potentially through wave action or mixing of warmer water layers.</li>
        <li><strong>Low and High Windspeed:</strong> Reefs exposed to both very low windspeeds (<strong>0-3 m/s</strong>) and very high windspeeds (<strong>12-15 m/s</strong>) show lower bleaching percentages, suggesting that these conditions might reduce bleaching severity or that reefs in these areas experience different environmental conditions.</li>
        <li><strong>Complex Relationship:</strong> The polynomial regression suggests a nonlinear relationship between windspeed and bleaching, where bleaching may decrease at very high windspeed values, possibly due to the cooling effect of strong winds or other mitigating factors.</li>
    </ul>
</div>

This analysis suggests that moderate winds may exacerbate coral bleaching, while both very low and very high windspeeds appear to have a protective effect on coral reefs. The relationship between windspeed and bleaching is complex, with multiple factors likely contributing to the observed trends.

##### <b>3.9.2.6 <span style='color:#6495ED'>|</span> Exposure</b> 

In [None]:
# Calculate the mean Percent_Bleaching for each exposure category
bleaching_by_exposure = data_for_eda.groupby('Exposure')['Percent_Bleaching'].mean().reset_index()

# Define colors based on categories (you can choose any colors here)
exposure_colors = ['#1192e8', '#a56eff', '#ff7eb6']  # Different colors for the 3 categories

# Plotting the relationship between Exposure and Percent Bleaching
plt.figure(figsize=(8, 6))
plt.bar(bleaching_by_exposure['Exposure'], bleaching_by_exposure['Percent_Bleaching'], color=exposure_colors)
plt.xlabel('Exposure')
plt.ylabel('Average Percent Bleaching')
plt.title('Relationship Between Exposure and Bleaching Severity')
plt.grid(True)

# Show the plot
plt.tight_layout()
plt.show()

🔎 **Observations:**

- **Relationship Between Exposure and Bleaching Severity:**
    - The bar plot displays the **Average Percent Bleaching** for coral reefs across three exposure categories: **Exposed**, **Sheltered**, and **Sometimes**.
    - Coral sites categorized as **Sometimes** exposed experience the highest average bleaching severity, reaching nearly **20%**.
    - **Exposed** reefs follow with an average bleaching severity of about **15%**.
    - Coral reefs classified as **Sheltered** show the lowest bleaching severity, with an average percent bleaching of around **10%**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Sometimes Exposure:</strong> Reefs that are occasionally exposed to environmental conditions exhibit the highest bleaching severity, suggesting that fluctuating exposure may cause significant stress to corals.</li>
        <li><strong>Exposed Reefs:</strong> Coral sites that are consistently exposed to strong wave action and environmental factors also experience high bleaching severity, though slightly less than "Sometimes" exposed reefs.</li>
        <li><strong>Sheltered Reefs:</strong> Coral sites in sheltered areas have the lowest bleaching severity, indicating that protection from strong wave action and currents may reduce environmental stress and bleaching.</li>
    </ul>
</div>

This analysis suggests that exposure to environmental stressors plays a critical role in coral bleaching severity. While sheltered reefs are less affected, reefs with inconsistent or fluctuating exposure ("Sometimes") are at the highest risk of bleaching.

#### <b>3.9.3 <span style='color:#6495ED'>|</span> Temporal Trends</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>Cyclone Frequency:</strong> Cyclone frequency peaked around <strong>2000</strong> and has steadily declined since, indicating fewer extreme weather events impacting reefs in recent years.</li>
    <li><strong>Windspeed Stability:</strong> Windspeed has remained relatively stable, with minor fluctuations, but a slight increase toward <strong>2020</strong> could reflect evolving climate conditions.</li>
    <li><strong>Pattern Link:</strong> The decline in cyclone frequency after <strong>2005</strong> may have contributed to the stability in windspeed, though recent wind increases could indicate shifts in atmospheric dynamics affecting reef ecosystems.</li>
</ul>
</div>

In [None]:
# Extracting year from the Date column for the time series analysis
data_for_eda['Year'] = data_for_eda['Date'].dt.year

# Grouping the data by year and calculating the mean of Cyclone Frequency and Windspeed
yearly_data = data_for_eda.groupby('Year').agg({'Cyclone_Frequency': 'mean', 'Windspeed': 'mean'}).reset_index()

# Define a custom color palette
colors = ['#1192e8', '#a56eff']  # Blue and Pink

# Plotting the time series
plt.figure(figsize=(14, 8))

# Cyclone Frequency over time (Top subplot)
plt.subplot(2, 1, 1)
plt.plot(yearly_data['Year'], yearly_data['Cyclone_Frequency'], marker='o', color=colors[0], linestyle='-', alpha=0.8, label='Cyclone Frequency')
plt.title('Cyclone Frequency Over Time', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Average Cyclone Frequency', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.xticks(rotation=45)  # Rotate x-ticks for better readability
plt.legend(loc='upper left')

# Windspeed over time (Bottom subplot)
plt.subplot(2, 1, 2)
plt.plot(yearly_data['Year'], yearly_data['Windspeed'], marker='o', color=colors[1], linestyle='-', alpha=0.8, label='Windspeed')
plt.title('Windspeed Over Time', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Average Windspeed (m/s)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.xticks(rotation=45)  # Rotate x-ticks for better readability
plt.legend(loc='upper left')

plt.tight_layout()
plt.show()

🔎 **Observations:**

- **Cyclone Frequency Over Time (Top Plot):**
    - Cyclone frequency fluctuates between **1980 and 2020**, with peaks in **1985** and **2000-2005**, reaching an average of **58 cyclones**.
    - A steady decline in cyclone frequency occurs post-2005, dropping to its lowest point around **2020**.

- **Windspeed Over Time (Bottom Plot):**
    - Windspeed shows peaks around **1980, 1990**, and **2000**, fluctuating between **4-6 m/s**.
    - Windspeed remains more stable after **2000**, with a slight increase observed towards **2020**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Cyclone Frequency:</strong> Cyclone frequency peaks around <strong>2000</strong> and declines steadily, suggesting a reduction in extreme weather events affecting reefs.</li>
    <li><strong>Windspeed Stability:</strong> Windspeed remains stable, with slight fluctuations, but recent increases toward <strong>2020</strong> might indicate changing climate dynamics.</li>
    <li><strong>Link Between Patterns:</strong> Reduced cyclone frequency likely contributes to the more stable windspeed trends observed after <strong>2000</strong>. The divergence in trends post-2005 could signal shifts in atmospheric conditions.</li>
</ul>
</div>

These patterns suggest that fewer cyclones may lead to more stable wind conditions, but increasing windspeed post-2020 could point to broader climatic changes affecting coral reef ecosystems.


### <b>3.10 <span style='color:#6495ED'>|</span> Geographical Distribution of Coral Bleaching and Environmental Variables</b> 

In [None]:
# Define a function to plot data on a world map using Cartopy
def plot_cartopy_geographical(data, variable, title, cmap='viridis'):
    plt.figure(figsize=(18, 10))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines(resolution='110m')
    ax.add_feature(cfeature.BORDERS, linestyle=':')
    scatter = ax.scatter(data['Longitude_Degrees'], data['Latitude_Degrees'], c=data[variable], 
                         cmap=cmap, s=10, alpha=0.7, transform=ccrs.PlateCarree())
    plt.colorbar(scatter, ax=ax, orientation='horizontal', pad=0.05, label=variable)
    plt.title(title)
    plt.show()

#### <b>3.10.1 <span style='color:#6495ED'>|</span> Bleaching Data</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>Global Bleaching Hotspots:</strong> Significant bleaching is concentrated in the <strong>Caribbean</strong>, <strong>East Africa</strong>, and <strong>Indo-Pacific regions</strong> (particularly <strong>Southeast Asia</strong> and parts of the <strong>Great Barrier Reef</strong>), indicating severe coral stress in these areas.</li>
    <li><strong>Regional Variability:</strong> Despite high bleaching rates in certain regions, nearby areas exhibit lower bleaching levels, suggesting that localized factors, such as water quality or protection measures, may influence coral resilience.</li>
    <li><strong>Tropical Concentration:</strong> Coral bleaching predominantly affects tropical and subtropical waters, aligning with the increased vulnerability of coral reefs to rising sea temperatures in these regions.</li>
    <li><strong>Population-Level Bleaching Concentration:</strong> Population-level bleaching is concentrated in the <strong>Caribbean</strong>, <strong>Southeast Asia</strong>, and <strong>Eastern Africa</strong>, indicating widespread coral stress across entire reef systems.</li>
    <li><strong>Colony-Level Bleaching Dispersion:</strong> Colony-level bleaching is more dispersed, affecting individual coral colonies in the <strong>Pacific</strong> and <strong>Indian Oceans</strong>, indicating localized stress at smaller scales.</li>
    <li><strong>Overlap of Bleaching Levels:</strong> The overlap of colony and population-level bleaching in regions like the <strong>Caribbean</strong> and <strong>Southeast Asia</strong> highlights the severity and extent of bleaching in these areas, indicating high vulnerability across scales.</li>
</ul>
</div>

In [None]:
plot_cartopy_geographical(data_for_eda, 'Percent_Bleaching', 'Geographical Distribution of Percent Bleaching', cmap='YlOrRd')

🔎 **Observations:**

- The map visualizes the **percent bleaching** across various coral reef locations globally. The color gradient from **yellow** to **red** represents increasing levels of coral bleaching severity, with **yellow** indicating lower bleaching percentages (near 0%) and **dark red** indicating higher bleaching percentages (up to 100%).
- High bleaching percentages (dark red regions) are concentrated in several areas, notably:
    - The **Caribbean** and **Central American coastlines**, where bleaching levels are significant.
    - Parts of the **Indian Ocean**, including the **East African coast** and some **island nations**.
    - The **Indo-Pacific region**, with hotspots in **Southeast Asia** and parts of the **Great Barrier Reef** in Australia.
- Lower bleaching levels (light yellow) are more dispersed and are found in areas such as the **central Pacific islands** and certain parts of the **Atlantic coast of Africa**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Global Hotspots:</strong> High bleaching rates are clustered in key regions, particularly the <strong>Caribbean</strong>, <strong>East Africa</strong>, and parts of the <strong>Indo-Pacific</strong>, highlighting areas where coral reefs are experiencing significant stress.</li>
    <li><strong>Regional Variability:</strong> Even within certain regions, there is variability in bleaching severity, with some nearby areas showing lower bleaching levels, suggesting that localized factors (e.g., water quality, protection measures) may influence outcomes.</li>
    <li><strong>Concentration in Tropical Waters:</strong> The distribution of bleaching is predominantly in tropical and subtropical waters, aligning with the vulnerability of coral reefs to warming sea temperatures in these regions.</li>
</ul>
</div>

This map illustrates how coral bleaching is unevenly distributed across the globe, with certain regions experiencing more severe bleaching events. These patterns may be linked to local environmental conditions, human activities, or broader climate-related factors such as rising sea temperatures.

In [None]:
# Extract necessary columns
latitudes = data_for_eda['Latitude_Degrees']
longitudes = data_for_eda['Longitude_Degrees']
bleaching_level = data_for_eda['Bleaching_Level']

# Define colors or markers for each bleaching level
colors = {'Colony': 'blue', 'Population': 'red'}
markers = {'Colony': 'o', 'Population': 's'}

# Create a Cartopy plot
plt.figure(figsize=(20, 10))
ax = plt.axes(projection=ccrs.PlateCarree())

# Add map features
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Plot each bleaching level with different colors and markers
for level in colors.keys():
    subset = data_for_eda[data_for_eda['Bleaching_Level'] == level]
    ax.scatter(subset['Longitude_Degrees'], subset['Latitude_Degrees'],
               color=colors[level], label=level, s=20, alpha=0.7, marker=markers[level], 
               transform=ccrs.PlateCarree())

# Add a legend
plt.legend(title='Bleaching Level', loc='upper right')

# Set a title
plt.title('Geographical Distribution of Bleaching Level')

# Show the plot
plt.show()

🔎 **Observations:**

- The map displays the **geographical distribution** of coral bleaching levels differentiated between two categories: **Colony-level bleaching** (blue dots) and **Population-level bleaching** (red squares).
- **Population-level bleaching** (red) is highly concentrated along the **Caribbean**, **Eastern Africa**, and the **Indo-Pacific regions** (notably around **Southeast Asia** and **Northern Australia**).
- **Colony-level bleaching** (blue) appears more dispersed, with notable occurrences in the **central Pacific**, parts of the **Indian Ocean**, and several **Southeast Asian** coastal regions.
- The **Caribbean** region, particularly around Central America, experiences a mix of colony and population-level bleaching, indicating widespread coral stress across various scales.
- The **Indo-Pacific** region, particularly near **Southeast Asia** and parts of **Australia**, shows significant population-level bleaching, indicating broader, more severe bleaching events in these areas.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Population-Level Bleaching Hotspots:</strong> Red areas, indicating population-level bleaching, are concentrated in the <strong>Caribbean</strong>, <strong>Southeast Asia</strong>, and parts of <strong>Eastern Africa</strong>, suggesting significant, widespread coral stress in these regions.</li>
    <li><strong>Colony-Level Bleaching:</strong> Blue areas are more dispersed but show bleaching at smaller scales, affecting individual coral colonies in areas like the <strong>Pacific</strong> and <strong>Indian Oceans</strong>.</li>
    <li><strong>Regional Stress Patterns:</strong> The overlap of colony and population-level bleaching in regions like the <strong>Caribbean</strong> and <strong>Southeast Asia</strong> highlights the severity and extent of coral bleaching in these highly impacted areas.</li>
</ul>
</div>

This map illustrates that coral bleaching affects both individual colonies and entire populations, with more severe and widespread bleaching events concentrated in tropical regions such as the **Caribbean**, **Southeast Asia**, and **Eastern Africa**. Understanding the scale of bleaching events can help inform conservation and recovery efforts for coral ecosystems.

#### <b>3.10.2 <span style='color:#6495ED'>|</span> Thermal Stress Data</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>High SST and SST Maximum Hotspots:</strong> The <strong>Red Sea</strong>, <strong>Indian Ocean</strong>, and <strong>Southeast Asia</strong> exhibit high SST and SST Maximum values, indicating regions at high risk for prolonged coral bleaching due to consistently warm waters.</li>
    <li><strong>Thermal Stress Regions:</strong> The <strong>SST Maximum</strong> map shows that parts of the <strong>Indian Ocean</strong> and <strong>East Africa</strong> experience extreme temperature peaks, likely leading to more frequent bleaching events.</li>
    <li><strong>SSTA Hotspots:</strong> The <strong>Caribbean</strong>, <strong>Red Sea</strong>, and <strong>Indian Ocean</strong> show elevated SSTA values, making these regions more susceptible to higher-than-average sea temperatures and potential coral bleaching.</li>
    <li><strong>Frequent Temperature Anomalies:</strong> The <strong>Caribbean</strong> and <strong>Southeast Asia</strong> experience frequent temperature anomalies, increasing the likelihood of coral bleaching events over time.</li>
    <li><strong>Prolonged Thermal Stress:</strong> High <strong>DHW</strong> values in the <strong>Caribbean</strong> indicate prolonged heat stress, a major factor driving coral bleaching severity.</li>
    <li><strong>TSA Maximum Extremes:</strong> The highest TSA maximum values are found in the <strong>Red Sea</strong> and <strong>Indian Ocean</strong>, signaling regions where temperature deviations are most extreme, intensifying bleaching risks.</li>
    <li><strong>Frequent TSA Events:</strong> Regions like the <strong>Caribbean</strong> and <strong>Southeast Asia</strong> experience frequent TSA events, making these areas vulnerable to recurring coral bleaching.</li>
    <li><strong>Prolonged Heat Stress from TSA DHW:</strong> Prolonged heat exposure in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong> is a critical driver of coral bleaching, as evidenced by high <strong>TSA DHW</strong> levels in these regions.</li>
</ul>
</div>

##### <b>3.10.2.1 <span style='color:#6495ED'>|</span> Sea Surface Temperature (SST)</b> 

In [None]:
plot_cartopy_geographical(data_for_eda, 'SST', 'Geographical Distribution of SST', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'SST_Maximum', 'Geographical Distribution of SST Maximum', cmap='coolwarm')

🔎 **Observations:**

1. **Geographical Distribution of SST (Sea Surface Temperature):**
    - The first map shows **SST** values globally, with a gradient from **blue** (lower temperatures) to **red** (higher temperatures).
    - Regions with the highest SSTs (approaching **310K**, or approximately **37°C**) are concentrated in the **Red Sea**, parts of the **Indian Ocean**, and **Southeast Asia**.
    - Cooler areas (around **290K**, or **17°C**) are seen across parts of the **central Pacific** and **Caribbean**.

2. **Geographical Distribution of SST Maximum:**
    - The second map shows **SST Maximum**, representing the highest recorded sea surface temperatures.
    - Areas with extreme SST Maximum values (above **310K**) are most prominent in the **Red Sea** and **Indian Ocean**, indicating regions where corals are likely subjected to severe thermal stress.
    - Lower SST Maximum values (between **302K** and **305K**) are scattered across the **central Pacific**, **Caribbean**, and parts of **East Africa**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>High SST and SST Maximum Hotspots:</strong> Regions like the <strong>Red Sea</strong> and <strong>Southeast Asia</strong> exhibit both high average SST and SST Maximum, suggesting these areas are at high risk for prolonged coral bleaching due to consistently warm waters.</li>
    <li><strong>Thermal Stress Regions:</strong> The <strong>SST Maximum</strong> map indicates that parts of the <strong>Indian Ocean</strong> and <strong>East Africa</strong> experience extreme sea surface temperature peaks, likely increasing the frequency of coral bleaching events in these regions.</li>
    <li><strong>Cooler Regions:</strong> Areas with lower SST and SST Maximum, such as the <strong>central Pacific</strong>, may have more resilient coral reefs due to less exposure to high temperature extremes, though localized stress may still occur.</li>
</ul>
</div>

##### <b>3.10.2.2 <span style='color:#6495ED'>|</span> Sea Surface Temperature Anomalies (SSTA)</b> 

In [None]:
plot_cartopy_geographical(data_for_eda, 'SSTA', 'Geographical Distribution of SSTA', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'SSTA_Maximum', 'Geographical Distribution of SSTA Maximum', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'SSTA_Frequency', 'Geographical Distribution of SSTA Frequency', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'SSTA_DHW', 'Geographical Distribution of SSTA DHW', cmap='coolwarm')

🔎 **Observations:**

1. **Geographical Distribution of SSTA (Sea Surface Temperature Anomalies):**
    - The map shows **SSTA** values ranging from **-4K** to **+4K**, indicating areas where sea surface temperatures deviate from the average.
    - Positive anomalies (in **red**) are visible near the **Red Sea**, **Caribbean**, and parts of the **Indian Ocean**, suggesting areas experiencing elevated sea surface temperatures.
    - Negative anomalies (in **blue**) are seen in regions such as the **Pacific** and some **Indian Ocean** areas, reflecting below-average sea temperatures.

2. **Geographical Distribution of SSTA Maximum:**
    - This map represents the maximum recorded sea surface temperature anomalies.
    - Higher maximum anomalies (approaching **18K**) are concentrated in parts of the **Red Sea**, **Indian Ocean**, and the **Caribbean**.
    - The lower end of the spectrum (around **2K-5K**) is scattered across the **Pacific** and some regions in **Southeast Asia** and **Africa**.

3. **Geographical Distribution of SSTA Frequency:**
    - The SSTA Frequency map displays the frequency of sea surface temperature anomalies across different regions.
    - The **Caribbean** and **Southeast Asia** exhibit the highest frequency of temperature anomalies, with **frequencies reaching 50** or more.
    - Other regions, like parts of the **Pacific** and **Indian Ocean**, show lower frequencies, typically under **10**.

4. **Geographical Distribution of SSTA DHW (Degree Heating Weeks):**
    - The DHW map visualizes areas where prolonged temperature stress has occurred, with **DHW** levels ranging from **0 to 50+**.
    - High DHW regions are concentrated near the **Caribbean**, indicating prolonged periods of thermal stress.
    - Lower DHW values are more dispersed across the **Pacific**, with certain areas showing little to no prolonged thermal stress.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>SSTA Hotspots:</strong> The <strong>Caribbean</strong>, <strong>Red Sea</strong>, and <strong>Indian Ocean</strong> show the highest positive SSTA values, indicating regions where coral reefs are likely subjected to higher-than-average sea temperatures, potentially increasing coral bleaching risks.</li>
    <li><strong>High SSTA Maximum Values:</strong> Areas with extreme maximum anomalies (over <strong>15K</strong>) are concentrated in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, suggesting more intense thermal stress in these regions.</li>
    <li><strong>Frequent Temperature Anomalies:</strong> The <strong>Caribbean</strong> and <strong>Southeast Asia</strong> experience the most frequent sea surface temperature anomalies, increasing the likelihood of coral bleaching events over time.</li>
    <li><strong>Prolonged Thermal Stress:</strong> The <strong>Caribbean</strong> shows some of the highest DHW levels, indicating prolonged heat stress, which is a significant driver of coral bleaching severity.</li>
</ul>
</div>

These maps highlight the regions most affected by sea surface temperature anomalies, prolonged heating events, and high-frequency anomalies. Areas such as the **Caribbean**, **Red Sea**, and parts of **Southeast Asia** are particularly vulnerable to thermal stress, putting coral reefs at high risk of bleaching.

##### <b>3.10.2.3 <span style='color:#6495ED'>|</span> Thermal Stress Anomalies (TSA)</b> 

In [None]:
plot_cartopy_geographical(data_for_eda, 'TSA', 'Geographical Distribution of TSA', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'TSA_Maximum', 'Geographical Distribution of TSA Maximum', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'TSA_Frequency', 'Geographical Distribution of TSA Frequency', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'TSA_DHW', 'Geographical Distribution of TSA DHW', cmap='coolwarm')

🔎 **Observations:**

1. **Geographical Distribution of TSA (Temperature Stress Anomalies):**
    - The map shows **TSA** values ranging from **-10K** to **+4K**, highlighting areas experiencing temperature deviations from the norm.
    - Positive anomalies (in **red**) are concentrated in the **Caribbean**, **Red Sea**, and **parts of Southeast Asia**, indicating regions under significant temperature stress.
    - Negative anomalies (in **blue**) are present in the **Pacific** and **Indian Oceans**, where temperatures are below the expected levels.

2. **Geographical Distribution of TSA Maximum:**
    - This map shows the maximum recorded **TSA** values, with higher temperature deviations (approaching **12K**) occurring in the **Red Sea**, **Indian Ocean**, and **Caribbean**.
    - Lower **TSA Maximum** values (between **2K** and **5K**) are more widespread across the **Pacific**, **Southeast Asia**, and parts of **East Africa**.

3. **Geographical Distribution of TSA Frequency:**
    - The **TSA Frequency** map shows how frequently temperature anomalies occur.
    - The **Caribbean** and **Southeast Asia** experience the highest frequency of temperature anomalies, with **frequencies over 20**. These areas are likely to experience persistent temperature stress.
    - Regions in the **Pacific** and parts of the **Indian Ocean** show fewer anomalies, typically under **10**.

4. **Geographical Distribution of TSA DHW (Degree Heating Weeks):**
    - The **TSA DHW** map shows areas with prolonged exposure to heat stress.
    - Higher **DHW** values (above **30**) are concentrated in the **Caribbean**, **Red Sea**, and **parts of Southeast Asia**, suggesting that coral reefs in these regions experience sustained thermal stress over time.
    - Areas with lower **DHW** (below **10**) are scattered across the **Pacific**, indicating less prolonged thermal stress.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Positive TSA Hotspots:</strong> The <strong>Caribbean</strong>, <strong>Red Sea</strong>, and <strong>Southeast Asia</strong> exhibit high positive TSA values, indicating significant temperature stress and potential bleaching risks in these areas.</li>
    <li><strong>TSA Maximum Extremes:</strong> The highest TSA maximum values are found in the <strong>Red Sea</strong> and <strong>Indian Ocean</strong>, where temperature deviations are most extreme.</li>
    <li><strong>Frequent TSA Events:</strong> Areas like the <strong>Caribbean</strong> and <strong>Southeast Asia</strong> experience frequent temperature anomalies, making them more vulnerable to recurring coral bleaching events.</li>
    <li><strong>Prolonged Heat Stress:</strong> The <strong>Caribbean</strong> shows high <strong>TSA DHW</strong> values, indicating prolonged periods of elevated temperatures, which is a significant driver of coral bleaching severity.</li>
</ul>
</div>

#### <b>3.10.3 <span style='color:#6495ED'>|</span> Other Environmental Factors</b> 

<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>Turbidity Hotspots:</strong> The <strong>Caribbean</strong> and <strong>Southeast Asia</strong> show higher turbidity levels, which can limit sunlight penetration and increase stress on coral ecosystems.</li>
    <li><strong>Cyclone Prone Areas:</strong> The <strong>Caribbean</strong> and parts of the <strong>Indian Ocean</strong> experience high cyclone frequencies, posing significant risks of physical damage to coral reefs.</li>
    <li><strong>High Windspeed Zones:</strong> <strong>Southeast Asia</strong> experiences elevated wind speeds, which may provide cooling but also increase the potential for physical damage to corals during storms.</li>
    <li><strong>Proximity to Shore:</strong> Most coral reefs are located near shorelines in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, making them more vulnerable to human impacts such as pollution and coastal runoff.</li>
    <li><strong>Shallow Coral Sites:</strong> Shallow reefs, more common in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, are more susceptible to thermal stress, human interference, and other environmental stressors.</li>
</ul>
</div>

In [None]:
plot_cartopy_geographical(data_for_eda, 'Turbidity', 'Geographical Distribution of Turbidity', cmap='viridis')

plot_cartopy_geographical(data_for_eda, 'Windspeed', 'Geographical Distribution of Windspeed', cmap='viridis')

plot_cartopy_geographical(data_for_eda, 'Cyclone_Frequency', 'Geographical Distribution of Cyclone Frequency', cmap='coolwarm')

plot_cartopy_geographical(data_for_eda, 'Distance_to_Shore', 'Geographical Distribution of Distance to Shore', cmap='viridis')

plot_cartopy_geographical(data_for_eda, 'Depth_m', 'Geographical Distribution of Depth (m)', cmap='viridis')

🔎 **Observations:**

1. **Geographical Distribution of Turbidity:**
    - Turbidity values range from **0.0 to 1.2**, with higher turbidity areas (closer to **1.2**) found in the **Caribbean**, **Southeast Asia**, and **Red Sea**.
    - Low turbidity (near **0.0**) is common across **Pacific** coral sites, indicating clearer water conditions.

2. **Geographical Distribution of Cyclone Frequency:**
    - Cyclone frequency varies widely across the map, with areas like the **Caribbean**, parts of **East Africa**, and the **Indian Ocean** experiencing over **80 cyclones**.
    - The **Pacific** and **Southeast Asia** display relatively lower cyclone frequency, mostly below **50** cyclones.

3. **Geographical Distribution of Windspeed:**
    - Windspeed values range from **0 to 14 m/s**, with higher windspeed regions located in **Southeast Asia** and parts of the **Caribbean**.
    - The **Pacific** shows moderate windspeeds, often between **3 to 7 m/s**.

4. **Geographical Distribution of Distance to Shore:**
    - Distance to shore ranges from **0 to 250,000 meters**, with most coral sites situated within **50,000 meters** (~50 km) of the shore in the **Caribbean**, **Southeast Asia**, and **Red Sea**.
    - Some outliers in the **Pacific** and **Indian Ocean** regions are much farther from the shore, indicating more remote coral reefs.

5. **Geographical Distribution of Depth:**
    - Depths of coral sites range from **0 to 50 meters**, with most sites concentrated in the **0 to 20 meters** range.
    - The **Caribbean**, **Southeast Asia**, and **Red Sea** have a greater number of shallow coral sites, while deeper reefs (up to **50 meters**) are scattered in **Southeast Asia** and the **Pacific**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
📝 <strong>Key Insights:</strong>
<ul>
    <li><strong>Turbidity Hotspots:</strong> High turbidity levels are observed in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, which could limit sunlight penetration and stress coral ecosystems.</li>
    <li><strong>Cyclone Prone Areas:</strong> The <strong>Caribbean</strong> and parts of the <strong>Indian Ocean</strong> experience the highest cyclone frequencies, posing risks of physical damage to coral reefs.</li>
    <li><strong>High Windspeed Zones:</strong> Regions such as <strong>Southeast Asia</strong> experience elevated wind speeds, which may provide cooling effects but also increase the potential for physical damage to coral structures.</li>
    <li><strong>Proximity to Shore:</strong> Most coral sites are close to shorelines, particularly in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, exposing them to human activities such as pollution and runoff.</li>
    <li><strong>Shallow Coral Sites:</strong> Shallow reefs are more common in regions like the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, making them more vulnerable to thermal stress and human interference.</li>
</ul>
</div>

These maps highlight the environmental conditions across different regions, showing that factors like turbidity, cyclone frequency, windspeed, and proximity to shore significantly influence coral reef vulnerability and resilience. Regions such as the **Caribbean**, **Southeast Asia**, and **Red Sea** appear to be particularly vulnerable due to their exposure to both natural and anthropogenic stressors.


### <b>3.11 <span style='color:#6495ED'>|</span> Relationships Between Environmental Factors and Bleaching</b> 

#### <b>3.11.1 <span style='color:#6495ED'>|</span>  Correlation Analysis</b> 


<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>SST and TSA:</strong> Both SST and TSA show weak positive correlations with bleaching, indicating their relevance but not primary roles in predicting bleaching severity.</li>
    <li><strong>SSTA Frequency and DHW:</strong> Persistent thermal stress, as indicated by <strong>SSTA Frequency</strong> and <strong>DHW</strong>, shows stronger correlations with bleaching, highlighting the significance of repeated and prolonged temperature anomalies.</li>
    <li><strong>TSA and SSTA Maximums:</strong> While these metrics are highly correlated with each other, they exhibit weak relationships with bleaching, suggesting that short-term extreme temperatures are not as predictive of bleaching severity as persistent stress.</li>
    <li><strong>SSTA Influence:</strong> The moderate correlation between <strong>SSTA</strong> and bleaching underscores the role of temperature anomalies in driving coral stress, with frequency and duration metrics playing a key role.</li>
</ul>
</div>
<br>
<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Additional Insights on Environmental Factors:</strong>
<ul>
    <li><strong>Environmental Factors:</strong> Variables like distance to shore, turbidity, cyclone frequency, depth, and windspeed all show very weak correlations with bleaching severity, indicating they may not be primary drivers of coral bleaching.</li>
    <li><strong>Depth:</strong> Among these factors, depth shows the strongest (though still weak) positive correlation with bleaching, suggesting that deeper reefs may face slightly higher bleaching risks.</li>
    <li><strong>Negligible Influence of Exposure:</strong> Coral exposure to waves and currents has no significant impact on bleaching severity based on the observed data.</li>
</ul>
</div>

In [None]:
# Calculate the correlation coefficients between SST, SSTA, and Percent Bleaching
correlation_matrix = data_for_eda[['SST', 'SST_Maximum', 'SSTA', 'SSTA_Maximum', 'SSTA_Frequency', 'SSTA_DHW', 'TSA', 'TSA_Maximum', 'TSA_Frequency', 'TSA_DHW', 'Percent_Bleaching']].corr()

# Display the correlation matrix
correlation_matrix

# Plotting the correlation matrix
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1, cbar_kws={'label': 'Correlation Coefficient'})
plt.title('Correlation Matrix: SST, SSTA, and Bleaching Severity')
plt.show()

🔎 **Observations:**

- **SST (Sea Surface Temperature):**
    - SST shows a **positive correlation** of **0.12** with Percent Bleaching, indicating a weak relationship between higher SST values and increased bleaching.
    - SST is highly correlated with TSA (**0.86**), indicating that temperature anomalies are closely tied to SST values.

- **SST Maximum:**
    - SST Maximum has a very weak negative correlation of **-0.01** with Percent Bleaching, suggesting that the maximum recorded SST is not strongly predictive of bleaching severity.
    - However, SST Maximum has a strong positive correlation with TSA Maximum (**0.64**).

- **SSTA (Sea Surface Temperature Anomaly):**
    - SSTA has a moderate positive correlation (**0.13**) with Percent Bleaching, suggesting that larger anomalies are more likely to increase bleaching.
    - SSTA is also correlated with TSA (**0.57**) and SSTA DHW (**0.44**), indicating that sustained temperature anomalies can contribute to bleaching events.

- **SSTA Maximum:**
    - SSTA Maximum has a weak correlation with Percent Bleaching (**0.10**), but it shows a strong relationship with TSA Maximum (**0.80**) and TSA DHW (**0.43**).

- **SSTA Frequency:**
    - SSTA Frequency is positively correlated with Percent Bleaching (**0.23**), highlighting that areas with frequent temperature anomalies tend to experience higher levels of coral bleaching.
    - It is strongly correlated with SSTA DHW (**0.58**) and TSA Frequency (**0.60**), emphasizing the cumulative effects of frequent anomalies.

- **TSA (Temperature Stress Anomaly):**
    - TSA is weakly correlated with Percent Bleaching (**0.12**) and has strong correlations with SST (**0.86**) and SSTA (**0.57**).

- **TSA Maximum:**
    - TSA Maximum has a weak correlation with Percent Bleaching (**0.02**), but is closely tied to SSTA Maximum (**0.80**).

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>SST and TSA:</strong> Both SST and TSA show weak but positive correlations with bleaching, highlighting their relevance, though not primary factors, in explaining bleaching severity.</li>
        <li><strong>SSTA Frequency and DHW:</strong> The frequency and intensity of temperature anomalies (SSTA Frequency and DHW) have stronger associations with bleaching, indicating the importance of persistent thermal stress.</li>
        <li><strong>TSA Maximum and SSTA Maximum:</strong> These variables are highly correlated with each other, though they exhibit weaker relationships with bleaching severity, suggesting that short-term extreme temperatures may not be as predictive of bleaching.</li>
        <li><strong>SSTA Influence:</strong> The moderate correlation between SSTA and bleaching highlights the critical role that temperature anomalies play in driving coral stress, while maximum and frequency metrics also contribute to bleaching patterns.</li>
    </ul>
</div> 

This matrix shows that **persistent thermal stress** (as indicated by SSTA Frequency and DHW) is more strongly associated with coral bleaching than extreme or short-term temperature spikes, underlining the significance of sustained temperature anomalies in predicting coral health.

In [None]:
# Step 1: Encode the categorical 'Exposure' variable
label_encoder = LabelEncoder()
data_for_eda['Exposure_Encoded'] = label_encoder.fit_transform(data_for_eda['Exposure'])

# Step 2: Recalculate the correlation matrix with 'Exposure_Encoded'
correlation_matrix = data_for_eda[['Distance_to_Shore', 'Turbidity', 'Cyclone_Frequency', 'Depth_m', 'Windspeed', 'Percent_Bleaching', 'Exposure_Encoded']].corr()

# Step 3: Plot the updated correlation matrix
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1, cbar_kws={'label': 'Correlation Coefficient'})
plt.title('Correlation Matrix: Other Environmental Factors and Bleaching Severity')
plt.show()

🔎 **Observations:**

- **Distance to Shore:**
    - Distance to Shore has a very weak positive correlation with Percent Bleaching (**0.03**), suggesting minimal influence of proximity to shore on bleaching severity.

- **Turbidity:**
    - Turbidity shows a very weak negative correlation with Percent Bleaching (**-0.04**), indicating that higher turbidity slightly decreases bleaching but is not a strong predictor.

- **Cyclone Frequency:**
    - Cyclone Frequency has a negligible correlation with Percent Bleaching (**-0.02**), suggesting that cyclone events do not have a strong direct relationship with bleaching severity.

- **Depth:**
    - Depth has a slightly stronger positive correlation with Percent Bleaching (**0.16**), indicating that deeper reefs may be more vulnerable to bleaching, though the effect is weak.

- **Windspeed:**
    - Windspeed also has a weak positive correlation with Percent Bleaching (**0.06**), suggesting that higher winds may slightly contribute to bleaching severity, though this effect is minor.

- **Exposure:**
    - Exposure shows no significant correlation with Percent Bleaching (**-0.00**), indicating that exposure level (whether sheltered or exposed) has little to no effect on bleaching severity.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Environmental Factors:</strong> Distance to shore, turbidity, cyclone frequency, depth, and windspeed all show very weak correlations with bleaching severity, indicating that these factors may not be primary drivers of coral bleaching.</li>
        <li><strong>Depth:</strong> Among the variables, depth shows the strongest (though still weak) correlation with bleaching, suggesting that deeper reefs may face slightly higher bleaching risk.</li>
        <li><strong>Negligible Influence of Exposure:</strong> Coral exposure to waves and currents has no significant impact on bleaching severity based on this correlation matrix.</li>
    </ul>
</div>

This matrix highlights that **other environmental factors** such as depth, distance to shore, and windspeed have very weak relationships with bleaching severity, suggesting that while they may play a role, they are not as influential as factors like temperature anomalies.

#### <b>3.11.2 <span style='color:#6495ED'>|</span> Interaction Between Environmental Factors</b> 


<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Key Takeaways:</strong>
<ul>
    <li><strong>Temperature and Bleaching:</strong> Elevated SST, SSTA, and TSA are closely tied to increased bleaching severity, particularly above <strong>300 K (~27°C)</strong> SST, suggesting that corals face more stress at higher temperatures.</li>
    <li><strong>Threshold Effect:</strong> Bleaching severity becomes more pronounced beyond <strong>300 K (~27°C)</strong> SST, highlighting a temperature threshold where coral health deteriorates rapidly.</li>
    <li><strong>Thermal Stress:</strong> TSA plays a critical role in coral bleaching, with high bleaching percentages seen at <strong>TSA values near zero or above</strong>, indicating the impact of prolonged or frequent temperature anomalies.</li>
    <li><strong>Interconnected Stress Factors:</strong> Strong correlations between SST, SSTA, and TSA suggest that thermal stressors are interrelated and collectively increase coral bleaching risks.</li>
</ul>
</div>
<br>
<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Environmental Insights:</strong>
<ul>
    <li><strong>Distance to Shore and Depth:</strong> A weak positive correlation shows minimal impact of distance on coral depth, while shallower, nearshore reefs show higher bleaching vulnerability due to exposure to environmental stressors.</li>
    <li><strong>Cyclone Frequency and Windspeed:</strong> A moderate correlation between cyclone frequency and windspeed suggests these factors slightly influence each other, but they are not major predictors of bleaching severity.</li>
    <li><strong>Turbidity and Depth:</strong> Higher turbidity in shallow waters correlates with greater bleaching, indicating the importance of water clarity and light availability for coral health.</li>
    <li><strong>Thermal Stress Despite Cyclones:</strong> Even in regions with cyclone activity, areas with high SST and SSTA still experience significant bleaching, showing that cooling from cyclones is insufficient to counteract severe thermal stress.</li>
</ul>
</div>
<br>
<div style="border-radius:10px;padding: 15px;background-color:#6495ED;color:white;font-size:100%;text-align:left">
📝 <strong>Turbidity and Coral Bleaching:</strong>
<ul>
    <li><strong>SST and Turbidity:</strong> High SST values are a major factor in bleaching, with low turbidity exacerbating coral stress due to increased light penetration. Moderate to high turbidity offers slight protection but cannot fully prevent bleaching at extreme SSTs.</li>
    <li><strong>Proximity to Shore:</strong> Coastal reefs closer to shore exhibit higher turbidity, likely due to runoff and human activity, while reefs farther offshore experience clearer conditions and reduced turbidity.</li>
    <li><strong>Exposure Impact:</strong> Nearshore reefs are generally sheltered from strong wave action, while exposure increases with distance, making moderately distant reefs more vulnerable to physical stress and environmental changes.</li>
</ul>
</div>

In [None]:
# Create a figure with three subplots (1 row, 3 columns)
fig, axs = plt.subplots(1, 3, figsize=(18, 6))

# First plot: SST vs. SSTA, color intensity representing Percent Bleaching
scatter1 = axs[0].scatter(x=data_for_eda['SST'], y=data_for_eda['SSTA'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[0].set_xlabel('Sea Surface Temperature (SST)')
axs[0].set_ylabel('Sea Surface Temperature Anomaly (SSTA)')
axs[0].set_title('SST vs. SSTA and Bleaching Severity')
axs[0].grid(True)
fig.colorbar(scatter1, ax=axs[0], label='Percent Bleaching')

# Second plot: SST vs. TSA, color intensity representing Percent Bleaching
scatter2 = axs[1].scatter(x=data_for_eda['SST'], y=data_for_eda['TSA'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[1].set_xlabel('Sea Surface Temperature (SST)')
axs[1].set_ylabel('Thermal Stress Anomaly (TSA)')
axs[1].set_title('SST vs. TSA and Bleaching Severity')
axs[1].grid(True)
fig.colorbar(scatter2, ax=axs[1], label='Percent Bleaching')

# Third plot: TSA vs. SSTA, color intensity representing Percent Bleaching
scatter3 = axs[2].scatter(x=data_for_eda['TSA'], y=data_for_eda['SSTA'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[2].set_xlabel('Thermal Stress Anomaly (TSA)')
axs[2].set_ylabel('Sea Surface Temperature Anomaly (SSTA)')
axs[2].set_title('TSA vs. SSTA and Bleaching Severity')
axs[2].grid(True)
fig.colorbar(scatter3, ax=axs[2], label='Percent Bleaching')

# Adjust layout for better spacing
plt.tight_layout()

# Show the combined plot
plt.show()

🔎 **Observations:**

- **SST vs. SSTA and Bleaching Severity (Left Plot):**
    - There is a clear **positive correlation** between **SST** (Sea Surface Temperature) and **SSTA** (Sea Surface Temperature Anomaly). As SST increases, SSTA also rises.
    - Percent Bleaching, indicated by color intensity, increases with higher SST and SSTA values, with most severe bleaching occurring at **SST values above 300 K (~27°C)** and **SSTA values above 2**.

- **SST vs. TSA and Bleaching Severity (Middle Plot):**
    - A strong **negative relationship** is seen between **SST** and **TSA** (Thermal Stress Anomaly), where higher SST corresponds with a lower TSA.
    - Bleaching severity increases with rising SST and higher TSA values. This suggests that higher thermal stress anomalies are linked to more significant bleaching events, with peak bleaching occurring at **SST above 300 K** and **TSA values around 0 or higher**.

- **TSA vs. SSTA and Bleaching Severity (Right Plot):**
    - **TSA** and **SSTA** show a **positive correlation** as well, with TSA increasing alongside SSTA.
    - Percent Bleaching intensifies with both higher TSA and SSTA values, highlighting that regions experiencing elevated thermal stress and temperature anomalies are more prone to severe coral bleaching.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Temperature and Bleaching:</strong> Elevated sea surface temperatures (SST) and thermal anomalies (SSTA and TSA) are strongly linked to increased coral bleaching severity.</li>
        <li><strong>Threshold Effect:</strong> Bleaching severity becomes more pronounced beyond <strong>300 K (~27°C)</strong> SST, indicating a temperature threshold beyond which corals experience more stress.</li>
        <li><strong>Thermal Stress:</strong> TSA plays a significant role in bleaching severity, with high bleaching percentages seen at <strong>TSA values near zero or above</strong>, indicating prolonged or frequent thermal anomalies.</li>
        <li><strong>Interrelation of Factors:</strong> The strong correlations between SST, SSTA, and TSA reflect that thermal stressors are closely related and collectively contribute to coral bleaching risks.</li>
    </ul>
</div>

In [None]:
# Scatter plot function for subplots with different colors and regression lines, including Turbidity vs Cyclone
def plot_combined_scatter_with_regression(data):
    fig, axs = plt.subplots(2, 2, figsize=(18, 10))  # 2x2 subplots layout

    # Colors for each plot
    colors = ['#1192e8', '#a56eff', '#198038', '#ff7f0e']

    # 1. Distance_to_Shore and Depth_m
    if 'Distance_to_Shore' in data.columns and 'Depth_m' in data.columns:
        sns.regplot(x='Distance_to_Shore', y='Depth_m', data=data, scatter_kws={'color': colors[0], 'alpha': 0.6}, line_kws={'color': 'red'}, ax=axs[0, 0])
        axs[0, 0].set_title('Distance to Shore vs. Depth')
        axs[0, 0].set_xlabel('Distance to Shore (m)')
        axs[0, 0].set_ylabel('Depth (m)')
        axs[0, 0].grid(True)
        
        # Calculate and display correlation coefficient
        corr_coef1, p_value1 = pearsonr(data['Distance_to_Shore'], data['Depth_m'])
        print(f"Distance to Shore vs. Depth:\nCorrelation Coefficient: {corr_coef1:.2f}, p-value: {p_value1:.4f}\n")
    else:
        axs[0, 0].set_visible(False)
        print("Data for 'Distance_to_Shore' and 'Depth_m' is not available.")
    
    # 2. Cyclone_Frequency and Windspeed
    if 'Cyclone_Frequency' in data.columns and 'Windspeed' in data.columns:
        sns.regplot(x='Cyclone_Frequency', y='Windspeed', data=data, scatter_kws={'color': colors[1], 'alpha': 0.6}, line_kws={'color': 'red'}, ax=axs[0, 1])
        axs[0, 1].set_title('Cyclone Frequency vs. Windspeed')
        axs[0, 1].set_xlabel('Cyclone Frequency')
        axs[0, 1].set_ylabel('Windspeed (m/s)')
        axs[0, 1].grid(True)
        
        # Calculate and display correlation coefficient
        corr_coef2, p_value2 = pearsonr(data['Cyclone_Frequency'], data['Windspeed'])
        print(f"Cyclone Frequency vs. Windspeed:\nCorrelation Coefficient: {corr_coef2:.2f}, p-value: {p_value2:.4f}\n")
    else:
        axs[0, 1].set_visible(False)
        print("Data for 'Cyclone_Frequency' and 'Windspeed' is not available.")
    
    # 3. Turbidity and Depth_m
    if 'Turbidity' in data.columns and 'Depth_m' in data.columns:
        sns.regplot(x='Turbidity', y='Depth_m', data=data, scatter_kws={'color': colors[2], 'alpha': 0.6}, line_kws={'color': 'red'}, ax=axs[1, 0])
        axs[1, 0].set_title('Turbidity vs. Depth')
        axs[1, 0].set_xlabel('Turbidity')
        axs[1, 0].set_ylabel('Depth (m)')
        axs[1, 0].grid(True)
        
        # Calculate and display correlation coefficient
        corr_coef3, p_value3 = pearsonr(data['Turbidity'], data['Depth_m'])
        print(f"Turbidity vs. Depth:\nCorrelation Coefficient: {corr_coef3:.2f}, p-value: {p_value3:.4f}\n")
    else:
        axs[1, 0].set_visible(False)
        print("Data for 'Turbidity' and 'Depth_m' is not available.")
    
    # 4. Turbidity and Cyclone_Frequency
    if 'Turbidity' in data.columns and 'Cyclone_Frequency' in data.columns:
        sns.regplot(x='Turbidity', y='Cyclone_Frequency', data=data, scatter_kws={'color': colors[3], 'alpha': 0.6}, line_kws={'color': 'red'}, ax=axs[1, 1])
        axs[1, 1].set_title('Turbidity vs. Cyclone Frequency')
        axs[1, 1].set_xlabel('Turbidity')
        axs[1, 1].set_ylabel('Cyclone Frequency')
        axs[1, 1].grid(True)
        
        # Calculate and display correlation coefficient
        corr_coef4, p_value4 = pearsonr(data['Turbidity'], data['Cyclone_Frequency'])
        print(f"Turbidity vs. Cyclone Frequency:\nCorrelation Coefficient: {corr_coef4:.2f}, p-value: {p_value4:.4f}\n")
    else:
        axs[1, 1].set_visible(False)
        print("Data for 'Turbidity' and 'Cyclone_Frequency' is not available.")

    plt.tight_layout()
    plt.show()

# Call the function with the dataset
plot_combined_scatter_with_regression(data_for_eda)

🔎 **Observations:**

- **Distance to Shore vs. Depth (Top Left):**
    - The scatter plot shows a **weak positive correlation** between **Distance to Shore** and **Depth** (Correlation Coefficient: 0.03). Although the p-value indicates statistical significance (p < 0.001), the relationship between these two variables is minimal, suggesting that coral reefs at varying distances from shore do not vary significantly in depth.

- **Cyclone Frequency vs. Windspeed (Top Right):**
    - There is a **moderate positive correlation** between **Cyclone Frequency** and **Windspeed** (Correlation Coefficient: 0.10, p < 0.001). Higher cyclone frequencies correspond with slightly higher wind speeds, but the overall effect remains modest.

- **Turbidity vs. Depth (Bottom Left):**
    - A **negative correlation** is observed between **Turbidity** and **Depth** (Correlation Coefficient: -0.18, p < 0.001). This suggests that areas with higher turbidity are typically shallower, likely due to the proximity to shore and higher sedimentation in shallow waters.

- **Turbidity vs. Cyclone Frequency (Bottom Right):**
    - The scatter plot shows a **weak negative correlation** between **Turbidity** and **Cyclone Frequency** (Correlation Coefficient: -0.03, p < 0.001). Though the p-value indicates significance, the correlation is very weak, indicating that cyclone frequency has little to no direct relationship with turbidity levels.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Distance to Shore:</strong> Weak correlation with depth suggests that coral reefs' proximity to shore has little impact on their depth levels.</li>
        <li><strong>Cyclone Frequency and Windspeed:</strong> A positive but modest correlation shows that higher cyclone activity corresponds to slightly higher wind speeds.</li>
        <li><strong>Turbidity and Depth:</strong> Shallow coral reefs tend to have higher turbidity, likely due to increased sedimentation from nearby shorelines.</li>
        <li><strong>Turbidity and Cyclone Frequency:</strong> The weak negative correlation implies that cyclone frequency does not significantly influence turbidity levels.</li>
    </ul>
</div>

In [None]:
# Create a figure with four subplots (2 rows, 2 columns)
fig, axs = plt.subplots(2, 2, figsize=(18, 12))

# First plot: Distance to Shore vs. Depth, color intensity representing Percent Bleaching
scatter1 = axs[0, 0].scatter(x=data_for_eda['Distance_to_Shore'], y=data_for_eda['Depth_m'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[0, 0].set_xlabel('Distance to Shore (m)')
axs[0, 0].set_ylabel('Depth (m)')
axs[0, 0].set_title('Distance to Shore vs. Depth and Bleaching Severity')
axs[0, 0].grid(True)
fig.colorbar(scatter1, ax=axs[0, 0], label='Percent Bleaching')

# Second plot: Cyclone Frequency vs. Windspeed, color intensity representing Percent Bleaching
scatter2 = axs[0, 1].scatter(x=data_for_eda['Cyclone_Frequency'], y=data_for_eda['Windspeed'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[0, 1].set_xlabel('Cyclone Frequency')
axs[0, 1].set_ylabel('Windspeed (m/s)')
axs[0, 1].set_title('Cyclone Frequency vs. Windspeed and Bleaching Severity')
axs[0, 1].grid(True)
fig.colorbar(scatter2, ax=axs[0, 1], label='Percent Bleaching')

# Third plot: Turbidity vs. Depth, color intensity representing Percent Bleaching
scatter3 = axs[1, 0].scatter(x=data_for_eda['Turbidity'], y=data_for_eda['Depth_m'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[1, 0].set_xlabel('Turbidity')
axs[1, 0].set_ylabel('Depth (m)')
axs[1, 0].set_title('Turbidity vs. Depth and Bleaching Severity')
axs[1, 0].grid(True)
fig.colorbar(scatter3, ax=axs[1, 0], label='Percent Bleaching')

# Fourth plot: Turbidity vs. Cyclone Frequency, color intensity representing Percent Bleaching
scatter4 = axs[1, 1].scatter(x=data_for_eda['Turbidity'], y=data_for_eda['Cyclone_Frequency'], c=data_for_eda['Percent_Bleaching'], cmap='viridis', alpha=0.7)
axs[1, 1].set_xlabel('Turbidity')
axs[1, 1].set_ylabel('Cyclone Frequency')
axs[1, 1].set_title('Turbidity vs. Cyclone Frequency and Bleaching Severity')
axs[1, 1].grid(True)
fig.colorbar(scatter4, ax=axs[1, 1], label='Percent Bleaching')

# Adjust layout for better spacing
plt.tight_layout()

# Show the combined plot
plt.show()

🔎 **Observations:**

- **Distance to Shore vs. Depth and Bleaching Severity (Top Left):**
    - The scatter plot shows a **weak positive correlation** between **Distance to Shore** and **Depth**. The concentration of bleaching severity is higher at **shallower depths** near the shore, where coral reefs are more vulnerable to environmental stressors like turbidity and temperature fluctuations.

- **Cyclone Frequency vs. Windspeed and Bleaching Severity (Top Right):**
    - There is a **moderate positive correlation** between **Cyclone Frequency** and **Windspeed**. However, bleaching severity appears scattered, indicating that windspeed and cyclone frequency alone may not be strong predictors of bleaching without additional stressors.

- **Turbidity vs. Depth and Bleaching Severity (Bottom Left):**
    - A **negative correlation** is observed between **Turbidity** and **Depth**, meaning shallower areas experience higher turbidity. Bleaching severity is more frequent in **shallow, high-turbidity regions**, where light penetration is reduced, causing additional stress on coral health.

- **Turbidity vs. Cyclone Frequency and Bleaching Severity (Bottom Right):**
    - The plot reveals a **weak negative correlation** between **Turbidity** and **Cyclone Frequency**, suggesting little impact of cyclone frequency on turbidity. Bleaching is observed across all turbidity levels, with a concentration at **moderate turbidity**.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Distance to Shore:</strong> Bleaching severity is concentrated in shallow waters close to shore, likely due to higher exposure to environmental stressors like turbidity and human activity.</li>
        <li><strong>Cyclone Frequency and Windspeed:</strong> A modest relationship with bleaching severity suggests that these factors alone are insufficient to predict bleaching without considering thermal stress and other environmental pressures.</li>
        <li><strong>Turbidity and Depth:</strong> Shallow reefs with higher turbidity are more prone to bleaching, indicating the importance of water clarity and light availability for coral health.</li>
        <li><strong>Turbidity and Cyclone Frequency:</strong> Cyclone frequency does not significantly influence turbidity, but bleaching tends to concentrate around moderate turbidity levels, likely due to reduced light penetration.</li>
    </ul>
</div>

In [None]:
# Segment the data based on Cyclone_Frequency (assuming a value > 0 indicates the presence of a cyclone)
data_for_eda['Cyclone_Present'] = data_for_eda['Cyclone_Frequency'] > 0

# Group the data by Cyclone_Present and calculate the mean Percent Bleaching for different levels of SST and SSTA
grouped_data = data_for_eda.groupby(['Cyclone_Present', 'SST', 'SSTA'])['Percent_Bleaching'].mean().reset_index()

# Plotting the impact of cyclones on bleaching severity in the context of SST and SSTA
plt.figure(figsize=(18, 8))
sns.scatterplot(x='SST', y='SSTA', size='Percent_Bleaching', sizes=(10, 200), hue='Percent_Bleaching', data=grouped_data[grouped_data['Cyclone_Present'] == True], alpha=0.7)
plt.xlabel('SST')
plt.ylabel('SSTA')
plt.title('Bleaching Severity with Cyclones Present')
plt.grid(True)

plt.tight_layout()
plt.show()

🔎 **Observations:**

- **SST vs. SSTA with Cyclones Present:**
    - The scatter plot shows a **positive correlation** between **Sea Surface Temperature (SST)** and **Sea Surface Temperature Anomaly (SSTA)**, particularly when SST is above 295 K.
    - **Bleaching severity**, indicated by larger and darker points, is most prevalent at higher SST and SSTA values.
    - This suggests that areas experiencing both **elevated SST and SSTA** are more susceptible to coral bleaching, even in the presence of cyclones.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>SST and SSTA:</strong> Coral bleaching becomes more severe when both SST and SSTA are elevated, showing that thermal stress is a significant factor in coral vulnerability.</li>
        <li><strong>Cyclone Presence:</strong> Despite the potential for cyclones to cool waters through mixing, regions with high SST and SSTA still experience high levels of bleaching severity.</li>
    </ul>
</div>

In [None]:
# Adjusting turbidity bins based on the actual range of 0 to 1.2845
turbidity_bins = [0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.3]
turbidity_labels = ['Very Low', 'Low', 'Moderate', 'High', 'Very High', 'Extremely High', 'Max']
data_for_eda['Turbidity_Bin'] = pd.cut(data_for_eda['Turbidity'], bins=turbidity_bins, labels=turbidity_labels)

# Plot the relationship between SST, turbidity, and bleaching severity
plt.figure(figsize=(18, 8))
sns.scatterplot(x='SST', y='Percent_Bleaching', hue='Turbidity_Bin', size='Turbidity', sizes=(20, 200), data=data_for_eda, alpha=0.7)
plt.xlabel('Sea Surface Temperature (SST)')
plt.ylabel('Percent Bleaching')
plt.title('Interaction Between SST, Turbidity, and Bleaching Severity (Adjusted Turbidity Bins)')
plt.grid(True)
plt.show()

🔎 **Observations:**

- **Sea Surface Temperature (SST) vs. Percent Bleaching:**
    - The scatter plot shows a **strong relationship** between rising SST and **bleaching severity**, especially when SST exceeds 300 K.
    - Coral bleaching increases with higher SST, with **bleaching severity** exceeding 60% at these elevated temperatures.

- **Turbidity Effects:**
    - The plot includes **turbidity levels**, represented by different colors and bubble sizes.
    - Most bleaching events occur in areas with **low turbidity** (denoted by blue and small bubbles), suggesting that high **light penetration** at low turbidity exacerbates bleaching, especially at higher SSTs.
    - **Moderate to high turbidity** (green and orange markers) seems to have a slight protective effect, but bleaching still occurs at elevated SSTs.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>SST:</strong> High sea surface temperatures are a critical factor in coral bleaching severity, particularly above 300 K.</li>
        <li><strong>Turbidity:</strong> Low turbidity correlates with higher bleaching severity, while moderate to high turbidity shows some protective effects but cannot fully prevent bleaching at extreme SST levels.</li>
    </ul>
</div>

In [None]:
# Alternative way to show turbidity across different distances from shore using a line plot with error bars
plt.figure(figsize=(12, 6))
turbidity_by_distance = data_for_eda.groupby('Distance_Bin')['Turbidity'].agg(['mean', 'std']).reset_index()

# Use blue from the custom palette
plt.errorbar(turbidity_by_distance['Distance_Bin'], turbidity_by_distance['mean'], 
             yerr=turbidity_by_distance['std'], fmt='-o', color='#1192e8', ecolor='#1192e8', capsize=5)
plt.xlabel('Distance to Shore (km)', fontsize=12)
plt.ylabel('Mean Turbidity', fontsize=12)
plt.title('Mean Turbidity Across Different Distances from Shore with Standard Deviation', fontsize=14)
plt.xticks(rotation=45)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# Plot the distribution of exposure levels across different distance bins
plt.figure(figsize=(12, 6))

# Use custom colors for the exposure levels
sns.countplot(x='Distance_Bin', hue='Exposure', data=data_for_eda, palette=['#1192e8', '#a56eff', '#fa4d56'])
plt.xlabel('Distance to Shore (km)', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.title('Exposure Levels Across Different Distances from Shore', fontsize=14)
plt.xticks(rotation=45)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

🔎 **Observations:**

- **Mean Turbidity Across Different Distances from Shore:**
    - The trend shows **decreasing turbidity** with increasing distance from shore. Coastal areas closer than **0.16 km** exhibit the **highest turbidity**, accompanied by significant variability.
    - As we move further offshore, turbidity becomes more stable, with **lower standard deviation** beyond **50 km**.
    - **Implication:** Nearshore reefs are more prone to turbidity caused by **runoff** and **human activities**, while offshore reefs experience clearer water conditions.

- **Exposure Levels Across Different Distances:**
    - **Sheltered reefs** are predominantly located nearshore, while **Exposed reefs** are more common at moderate distances (between **0.66 km to 10 km**).
    - The **mixed exposure** levels across different distances suggest that proximity to the shore influences how coral reefs interact with external stressors like waves and human activities.
    - **Implication:** The varying exposure levels impact the vulnerability of reefs to **physical damage** and **thermal stress**, particularly in areas closer to the coast.

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    📝 <strong>Key Insights:</strong>
    <ul>
        <li><strong>Turbidity:</strong> Higher turbidity nearshore may be linked to human activities and runoff, while reefs farther offshore experience clearer waters.</li>
        <li><strong>Exposure:</strong> Nearshore reefs are typically sheltered, but exposure increases with moderate distances, affecting their resilience to environmental stressors.</li>
    </ul>
</div>

### <b>3.12 <span style='color:#6495ED'>|</span> Synthesis of Findings</b> 

After conducting the Exploratory Data Analysis, the results will be summarized into the following three subsections: **Findings related to Data Understanding**, **Insights for Feature Engineering**, and the **Next Steps for Modeling**.

#### <b>3.12.1 <span style='color:#6495ED'>|</span> Findings related to Data Understanding</b> 


<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    <p>🌡️🪸 <strong>Temperature and Coral Bleaching:</strong></p>
    <ul>
        <li><strong>Sea Surface Temperature (SST) and Bleaching:</strong> There is a <strong>clear correlation</strong> between rising SST and coral bleaching. Bleaching severity significantly increases when SST exceeds <strong>300 K (~27°C)</strong>, indicating a critical temperature threshold beyond which coral reefs experience significant stress. SST and SST anomalies (SSTA) are key drivers of coral bleaching, particularly in regions like the <strong>Caribbean</strong>, <strong>Red Sea</strong>, and <strong>Southeast Asia</strong>.</li>
        <li><strong>Sustained Thermal Stress:</strong> Prolonged exposure to high temperatures, measured through <strong>Degree Heating Weeks (DHW)</strong>, strongly correlates with bleaching. Reefs in the <strong>Caribbean</strong> and <strong>Indo-Pacific</strong> regions, which experience frequent and prolonged thermal anomalies, are particularly vulnerable.</li>
        <li><strong>Temperature Anomalies (TSA):</strong> While short-term spikes in temperature contribute to bleaching, <strong>sustained temperature anomalies (SSTA and TSA)</strong> are more impactful. Areas experiencing frequent and prolonged thermal stress, such as parts of the <strong>Red Sea</strong> and <strong>Southeast Asia</strong>, face severe bleaching.</li>
    </ul>
    <p>🏖️🌊 <strong>Environmental and Physical Stressors:</strong></p>
    <ul>
        <li><strong>Proximity to Shore:</strong> Reefs located <strong>close to shore</strong> face higher stress from human activities like pollution and runoff. Nearshore reefs in regions such as the <strong>Caribbean</strong> and <strong>Southeast Asia</strong> tend to have higher turbidity and are more exposed to bleaching, though deeper offshore reefs can also experience bleaching when exposed to thermal stress.</li>
        <li><strong>Turbidity:</strong> Higher turbidity levels, especially nearshore, reduce light penetration, limiting photosynthesis and increasing coral stress. However, moderate turbidity can sometimes provide slight protective effects by reducing direct solar exposure.</li>
        <li><strong>Cyclones and Windspeed:</strong> Cyclones and moderate windspeed provide a <strong>dual effect</strong>—cooling the waters and potentially reducing thermal stress while also increasing physical damage to reefs. Very high cyclone frequencies (>80) and moderate winds (5-12 m/s) exacerbate bleaching risks, especially in regions like the <strong>Caribbean</strong> and <strong>Indian Ocean</strong>.</li>
    </ul>
    <p>🌊🌅 <strong>Depth and Exposure:</strong></p>
    <ul>
        <li><strong>Depth and Bleaching:</strong> <strong>Deeper reefs</strong> (20-30 meters) are more prone to severe bleaching than shallow reefs, possibly due to reduced light availability and limited recovery capacity. Shallower reefs (0-10 meters), though exposed to direct sunlight and temperature fluctuations, tend to have lower bleaching severity.</li>
        <li><strong>Exposure to Environmental Factors:</strong> Reefs that are <strong>sometimes exposed</strong> to environmental factors experience the highest bleaching severity, suggesting that fluctuating exposure levels can induce stress. <strong>Sheltered reefs</strong> tend to have lower bleaching severity, likely due to protection from strong currents and waves.</li>
    </ul>
    <p>🌍🪸 <strong>Geographical and Regional Patterns:</strong></p>
    <ul>
        <li><strong>Bleaching Hotspots:</strong> The most severely affected regions are concentrated in the <strong>Caribbean</strong>, <strong>Red Sea</strong>, <strong>Indian Ocean</strong>, and <strong>Southeast Asia</strong>. These areas face frequent temperature anomalies, prolonged heat stress, and a combination of human-induced and natural stressors.</li>
        <li><strong>Resilience in Cooler Regions:</strong> Coral reefs in <strong>cooler regions</strong> like the <strong>central Pacific</strong> and some parts of the <strong>Atlantic</strong> show lower bleaching severity, suggesting higher resilience due to less frequent exposure to extreme thermal events.</li>
        <li><strong>Vulnerability of Shallow Coral Sites:</strong> Shallow reefs, especially in the <strong>Caribbean</strong> and <strong>Southeast Asia</strong>, are more vulnerable to <strong>thermal stress</strong> and <strong>human interference</strong>, making them critical zones for targeted conservation efforts.</li>
    </ul>
    <p>🌪️💨 <strong>Complex Interplay of Stress Factors:</strong></p>
    <ul>
        <li><strong>Temperature as the Dominant Factor:</strong> While environmental factors like <strong>depth</strong>, <strong>turbidity</strong>, and <strong>cyclones</strong> contribute to bleaching, <strong>temperature-related stress</strong> (SST, SSTA, TSA) remains the primary driver. The <strong>threshold effect</strong> of SST (300 K) and the <strong>accumulation of stress</strong> over time (DHW) are critical in predicting bleaching severity.</li>
        <li><strong>Moderating Impact of Cyclones and Winds:</strong> Moderate cyclone activity and windspeed may have a <strong>protective cooling effect</strong> by lowering SST, but high frequencies of cyclones and strong winds cause physical damage, leading to increased bleaching risks in regions with consistent storm exposure.</li>
    </ul>
    <p>🛟🌊 <strong>Conservation Implications:</strong></p>
    <ul>
        <li><strong>High-Risk Regions:</strong> Areas like the <strong>Caribbean</strong>, <strong>Red Sea</strong>, and <strong>Southeast Asia</strong> should be prioritized for conservation due to the high frequency of bleaching events and their exposure to both <strong>natural</strong> and <strong>anthropogenic stressors</strong>.</li>
        <li><strong>Protective Measures for Shallow and Sheltered Reefs:</strong> These reefs show lower bleaching severity but remain vulnerable to <strong>temperature anomalies</strong> and <strong>runoff</strong>. Protective measures like <strong>reducing local pollution</strong> and <strong>improving water quality</strong> can mitigate stress.</li>
        <li><strong>Long-Term Monitoring:</strong> Regions with high <strong>thermal anomaly frequencies</strong> and <strong>prolonged DHW exposure</strong> should be closely monitored. This will help predict future bleaching events and implement adaptive strategies to build <strong>resilience</strong> in vulnerable coral ecosystems.</li>
    </ul>
</div>

#### <b>3.12.2 <span style='color:#6495ED'>|</span> Insights for Feature Engineering</b> 

##### <b>3.12.2.1 <span style='color:#6495ED'>|</span> Key Feature Selection for Predicting Coral Bleaching</b> 

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    <p>🌡️ <strong>Sea Surface Temperature (SST):</strong></p>
    <ul>
        <li><strong>SST</strong> is a critical feature, as it shows a consistent, positive correlation with coral bleaching severity. Reefs experiencing SST above <strong>300 K (~27°C)</strong> are at significant risk for bleaching.</li>
        <li><strong>SST Maximum</strong> adds value by capturing extreme heat events, though it has a weaker relationship with bleaching severity compared to average SST.</li>
    </ul>
    <p>🌡️ 📈 <strong>Sea Surface Temperature Anomalies (SSTA):</strong></p>
    <ul>
        <li><strong>SSTA</strong> and <strong>SSTA Maximum</strong> are key predictors of bleaching. Anomalies above <strong>2-4 K</strong> are strongly associated with higher bleaching percentages.</li>
        <li><strong>SSTA Frequency</strong> and <strong>SSTA DHW</strong> (Degree Heating Weeks) capture chronic thermal stress, making them crucial for identifying areas experiencing prolonged heat stress.</li>
    </ul>
    <p>🔥 <strong>Temperature Stress Anomalies (TSA):</strong></p>
    <ul>
        <li><strong>TSA</strong> and <strong>TSA Maximum</strong> correlate strongly with bleaching severity, particularly when TSA reaches values above <strong>0K</strong>, capturing short-term temperature deviations that often precede bleaching.</li>
        <li><strong>TSA DHW</strong> and <strong>TSA Frequency</strong> are engineered features that highlight prolonged and frequent heat stress, showing strong correlation with bleaching outcomes.</li>
    </ul>
    <p>🌊 <strong>Depth:</strong></p>
    <ul>
        <li>Depth emerges as moderately important, especially for predicting bleaching severity in deeper reefs. Reefs at <strong>depths above 20 meters</strong> tend to experience more severe bleaching, likely due to reduced sunlight and slower recovery.</li>
    </ul>
    <p>🌪️💨 <strong>Cyclone Frequency and Windspeed:</strong></p>
    <ul>
        <li>Cyclone frequency and windspeed have mixed roles in coral bleaching. While cyclones can cool waters, reducing bleaching, frequent storms or strong winds increase physical damage. These features show weak individual correlations but may improve predictions when combined with other environmental factors.</li>
    </ul>
    <p>🌫️ <strong>Turbidity:</strong></p>
    <ul>
        <li>Turbidity shows a weak yet relevant correlation with bleaching. Reefs in moderately turbid waters experience the highest bleaching severity, suggesting that feature engineering around turbidity and depth could improve predictions by accounting for light limitations in shallow areas.</li>
    </ul>
    <p>🏖️ <strong>Proximity to Shore:</strong></p>
    <ul>
        <li>Distance to shore provides insights into human-driven stressors like pollution and runoff. Though its correlation with bleaching is weak, this feature can be engineered with turbidity and depth to better capture nearshore human impacts.</li>
    </ul>
</div>

##### <b>3.12.2.2 <span style='color:#6495ED'>|</span> Feature Engineering Insights</b> 

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    <p>🔥 <strong>Degree Heating Weeks (DHW):</strong></p>
    <ul>
        <li><strong>SSTA DHW</strong> and <strong>TSA DHW</strong> are powerful predictors, measuring <strong>prolonged exposure to thermal stress</strong>. These features capture cumulative heat stress, distinguishing between temporary anomalies and chronic stress, making them essential for predicting severe bleaching.</li>
    </ul>
    <p>📉 <strong>Anomaly Frequency (SSTA Frequency, TSA Frequency):</strong></p>
    <ul>
        <li>These features quantify the <strong>recurrence of thermal anomalies</strong>, which significantly contributes to coral stress. Frequent anomalies correlate with higher bleaching vulnerability, making them critical for long-term predictions.</li>
    </ul>
    <p>🔄 <strong>Interaction Features:</strong></p>
    <ul>
        <li>Interaction terms between <strong>SST, SSTA</strong>, and <strong>turbidity</strong> can capture the compounded effects of thermal stress and reduced light penetration in turbid waters, enhancing model predictions.</li>
        <li>Combining <strong>cyclone frequency</strong> and <strong>windspeed</strong> with SST or SSTA in interaction terms can model the balance between cooling effects and physical damage from storms.</li>
    </ul>
    <p>📊 <strong>Nonlinear Relationships:</strong></p>
    <ul>
        <li>Many features such as <strong>SST, SSTA, depth, cyclone frequency,</strong> and <strong>windspeed</strong> show <strong>nonlinear relationships</strong> with bleaching. Using polynomial features (e.g., <strong>SST²</strong>, <strong>Depth²</strong>) can capture these effects and improve model accuracy.</li>
        <li>Polynomial transformations of <strong>distance to shore</strong> and <strong>depth</strong> help represent the complex interactions between proximity to human activity and environmental stressors.</li>
    </ul>
    <p>🏖️ <strong>Proximity to Shore and Environmental Interactions:</strong></p>
    <ul>
        <li>Transforming <strong>distance to shore</strong> into interaction terms with <strong>turbidity, depth,</strong> and <strong>windspeed</strong> enhances the capture of <strong>localized stressors</strong> experienced by nearshore reefs. These interactions account for the effects of human activity, sedimentation, and physical damage.</li>
    </ul>
</div>

#### <b>3.12.3 <span style='color:#6495ED'>|</span> Recommendations for Model Development</b> 

<div style="border-radius:10px;border:#6495ED solid;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    <p>🔑 <strong>Key Feature Set:</strong></p>
    <ul>
        <li>Utilize essential features like <strong>SST, SSTA, TSA, depth, turbidity, cyclone frequency, windspeed</strong>, and their engineered counterparts (e.g., <strong>DHW, frequency</strong>) to model coral bleaching risk effectively.</li>
    </ul>
    <p>🔄 <strong>Interaction Terms:</strong></p>
    <ul>
        <li>Incorporate interaction terms such as <strong>SST x Depth</strong>, <strong>SSTA x Cyclone Frequency</strong>, and <strong>Turbidity x Depth</strong> to capture the combined effects of multiple stressors on coral reefs.</li>
    </ul>
    <p>📊 <strong>Polynomial Transformations:</strong></p>
    <ul>
        <li>Apply <strong>polynomial transformations</strong> to capture the nonlinear relationships between environmental variables and coral bleaching severity, particularly for <strong>SST, SSTA</strong>, and <strong>depth</strong>.</li>
    </ul>
    <p>📈 <strong>Time-Series Features:</strong></p>
    <ul>
        <li>If temporal data is available, use time-series analysis of <strong>SSTA Frequency, TSA Frequency</strong>, and <strong>DHW</strong> to model how cumulative heat stress over time impacts coral bleaching outcomes, improving prediction accuracy.</li>
    </ul>
</div>

<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 4 | Data Preprocessing</span></b> </div>

After conducting an extensive Exploratory Data Analysis (EDA) to understand the dataset's structure and gain insights into the factors contributing to coral bleaching, we now move forward with data preprocessing and feature engineering. This stage involves preparing the data for model building by transforming raw data into a format that can be effectively used by machine learning algorithms. The preprocessing steps include handling missing values, encoding categorical variables, scaling numerical features, and creating new interaction terms and clusters based on geographical and environmental factors.

This section outlines the preprocessing workflow, highlighting the steps taken to clean and enhance the dataset for modeling, ensuring that the data is structured, normalized, and contains relevant features for improved predictive performance.

### <b>4.1 <span style='color:#6495ED'>|</span> Copy the Data and Extract Date Information</b> 

In this step, we first make a copy of the original dataset to avoid modifying it. Then, we extract the year, month, and day from the `Date` column and drop the `Date` column afterward. This helps us treat the date as a feature without keeping the raw date format.

In [None]:
# Copy the data to a new variable for model building to avoid modifying the original data
data_for_model = data.copy()

# Extract year, month, and day from the Date
data_for_model['Year'] = pd.to_datetime(data_for_model['Date']).dt.year
data_for_model['Month'] = pd.to_datetime(data_for_model['Date']).dt.month
data_for_model['Day'] = pd.to_datetime(data_for_model['Date']).dt.day

# Drop the Date column
data_for_model.drop('Date', axis=1, inplace=True)

# Display the first few rows to verify changes
data_for_model.head()

### <b>4.2 <span style='color:#6495ED'>|</span> Feature Engineering: Interaction Terms</b> 

Here, we create interaction terms between some of the key environmental features like `SST` (Sea Surface Temperature) and `TSA` (Temperature Stress Anomaly). These interaction terms may help capture combined effects of temperature-related factors on coral bleaching.

In [None]:
# Add interaction terms between environmental stressors
# Interaction of Sea Surface Temperature (SST) and Temperature Stress Anomaly (TSA)
data_for_model['SST_TSA_Interaction'] = data_for_model['SST'] * data_for_model['TSA']

# Interaction between SSTA (Sea Surface Temperature Anomaly) and Depth
data_for_model['SSTA_Depth_Interaction'] = data_for_model['SSTA'] * data_for_model['Depth_m']

# Display the first few rows to verify changes
data_for_model.head()

### <b>4.3 <span style='color:#6495ED'>|</span> Encode Categorical Variables</b> 

In this step, we encode the categorical columns using `LabelEncoder`. This transforms the non-numeric features like `Ocean_Name` and `Bleaching_Level` into numerical labels, which are necessary for machine learning models.

In [None]:
# Encoding categorical columns
label_encoders = {}
categorical_columns = ['Ocean_Name', 'Realm_Name', 'Ecoregion_Name', 'Exposure', 'Bleaching_Level']

for col in categorical_columns:
    le = LabelEncoder()
    data_for_model[col] = le.fit_transform(data_for_model[col])
    label_encoders[col] = le  # Save the encoder for potential inverse transformation

# Display the first few rows to verify changes
data_for_model.head()

### <b>4.4 <span style='color:#6495ED'>|</span> Feature Engineering: Environmental Clustering</b> 

We cluster geographical and environmental features (like `Depth_m` and `Distance_to_Shore`) using K-Means clustering. This creates new feature groups based on the similarity of coral environments, which can help the model identify clusters with similar characteristics.

In [None]:
# Clustering of environmental features: Combining Depth and Distance to Shore
geo_env_features = data_for_model[['Depth_m', 'Distance_to_Shore']]
kmeans_env = KMeans(n_clusters=4, random_state=42)
data_for_model['Geo_Env_Cluster'] = kmeans_env.fit_predict(geo_env_features)

# Display the first few rows to verify changes
data_for_model.head()

### <b>4.5 <span style='color:#6495ED'>|</span> Scale and Normalize Numerical Features</b> 

In this step, we first normalize the numerical columns using the PowerTransformer to handle skewness in the data. Then, we apply the `RobustScaler` to handle outliers effectively by scaling the data based on interquartile range (IQR).

In [None]:
# Select only the numerical columns for scaling
numerical_columns = data_for_model.select_dtypes(include=['float64', 'int64']).columns

# Apply Power Transformation to normalize the data distribution
pt = PowerTransformer(method='yeo-johnson')
data_for_model[numerical_columns] = pt.fit_transform(data_for_model[numerical_columns])

# Apply RobustScaler to handle outliers
scaler = RobustScaler()
data_for_model[numerical_columns] = scaler.fit_transform(data_for_model[numerical_columns])

# Display the first few rows to verify changes
data_for_model.head()

### <b>4.6 <span style='color:#6495ED'>|</span> Geographical Clustering Based on Latitude and Longitude</b> 

We apply K-Means clustering to the geographical features (`Latitude_Degrees` and `Longitude_Degrees`) to create `Geo_Cluster`. This can help group coral sites based on their geographic locations.

In [None]:
# Apply K-Means clustering on geographical features (Latitude and Longitude)
geo_features = data_for_model[['Latitude_Degrees', 'Longitude_Degrees']]
kmeans = KMeans(n_clusters=5, random_state=42)
data_for_model['Geo_Cluster'] = kmeans.fit_predict(geo_features)

# Display the first few rows of the dataset after applying these steps
data_for_model.head()

### <b>4.7 <span style='color:#6495ED'>|</span> Feature Engineering: Rolling Averages for Time Series Effects</b> 

We introduce rolling averages for features like `SST`, `SST Maximum`, and `Cyclone Frequency` to capture the effect of past values on coral bleaching. This is especially useful for capturing temporal dependencies in the data.

In [None]:
# Create rolling averages for SST, SST Maximum, and Cyclone Frequency (rolling average over 3 months)
data_for_model['SST_RollingAvg'] = data_for_model['SST'].rolling(window=3).mean().fillna(data_for_model['SST'])
data_for_model['SST_Max_RollingAvg'] = data_for_model['SST_Maximum'].rolling(window=3).mean().fillna(data_for_model['SST_Maximum'])
data_for_model['CycloneFreq_RollingAvg'] = data_for_model['Cyclone_Frequency'].rolling(window=3).mean().fillna(data_for_model['Cyclone_Frequency'])

# Display the first few rows to verify changes
data_for_model.head()

### <b>4.8 <span style='color:#6495ED'>|</span> Define Features (X) and Target Variable (y), and Split the Data</b> 

Finally, we define the features (X) and target variable (y), and split the dataset into training and testing sets to prepare it for model building.

In [None]:
# Define the features (X) and the target variable (y)
X = data_for_model.drop(columns=['Percent_Bleaching'])  # Drop the target variable from the features
y = data_for_model['Percent_Bleaching']  # Set the target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shape of the training and testing sets to verify the split
X_train.shape, X_test.shape, y_train.shape, y_test.shape

<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 5 | Model Training and Tuning</span></b> </div>

### <b>5.1 <span style='color:#6495ED'>|</span> Model Selection</b> 

After completing the data preprocessing, the next critical step is selecting appropriate regression models to predict our target variable, **Percent Bleaching**. We have chosen a diverse set of regression models that represent both linear and non-linear relationships, simple and complex architectures, and models that incorporate regularization and ensemble techniques to reduce overfitting and improve predictive performance.

***Why These Models?***

We have selected a range of models to evaluate, each offering unique strengths for different types of data and prediction tasks:

1. **Ridge Regression**: A linear regression model with L2 regularization. It helps prevent overfitting by shrinking the coefficients, particularly useful when dealing with multicollinearity or high-dimensional data.
   
2. **Elastic Net Regression**: A hybrid of Ridge and Lasso regression, combining L1 and L2 regularization to balance between penalizing large coefficients and feature selection, making it effective in high-dimensional datasets.

3. **Random Forest Regression**: An ensemble method that builds multiple decision trees and averages their predictions. This model reduces overfitting while capturing non-linear relationships and interactions between variables. It’s robust to noise and effective on both small and large datasets.

4. **Gradient Boosting Regression**: An advanced ensemble method that builds trees sequentially, with each tree correcting the errors of the previous one. This powerful approach is especially good for capturing complex patterns in data and is often used in predictive modeling competitions.

5. **Support Vector Regression (SVR)**: A non-linear model that uses kernel tricks to handle complex relationships between features and the target variable. SVR is ideal for smaller datasets where precise decision boundaries are critical.

6. **XGBoost Regression**: A highly efficient and scalable implementation of gradient boosting. Known for its performance and speed, XGBoost is widely used in machine learning competitions and handles large datasets with complex features effectively.

7. **Decision Tree Regression**: A simple, interpretable model that splits data based on feature values. It is suitable for modeling non-linear relationships and interactions between variables but can overfit without proper tuning, hence included with limited depth.

8. **LightGBM Regressor**: A gradient boosting framework that uses a leaf-wise tree growth algorithm, which makes it faster and more memory-efficient than traditional gradient boosting. It’s well-suited for large datasets and provides excellent performance with less training time.

9. **CatBoost Regressor**: A gradient boosting model specifically designed for categorical features, providing automatic handling of categorical variables without explicit encoding. It's fast, robust, and highly effective for structured datasets.

10. **AdaBoost Regressor**: Another ensemble technique that combines multiple weak learners to create a strong model. It assigns higher weights to incorrectly predicted observations in subsequent rounds, helping to improve overall prediction accuracy.

By evaluating this diverse set of models, we aim to identify the one that delivers the best balance between accuracy, interpretability, and computational efficiency for predicting **Percent Bleaching**. We expect different models to perform better depending on the complexity of relationships within the dataset. This approach allows us to compare linear, regularized, non-linear, and ensemble models to ensure we capture both simple and complex patterns in the data.


In [None]:
# Define a set of regression models with relevant hyperparameters
models = {
    # Ridge Regression: L2 regularization to prevent overfitting, with alpha controlling the penalty strength
    'Ridge Regression': Ridge(alpha=1.0),
    
    # ElasticNet Regression: Combination of L1 (Lasso) and L2 (Ridge) regularization with specified alpha and ratio
    'Elastic Net Regression': ElasticNet(alpha=0.01, l1_ratio=0.5),
    
    # Random Forest Regression: Ensemble of decision trees, with controlled depth and regularization to prevent overfitting
    'Random Forest Regression': RandomForestRegressor(
        n_estimators=100,  # Number of trees in the forest
        max_depth=10,  # Maximum depth of the trees
        min_samples_split=10,  # Minimum samples required to split an internal node
        min_samples_leaf=4,  # Minimum samples required at a leaf node
        random_state=42  # Ensure reproducibility
    ),
    
    # Gradient Boosting Regression: Sequential ensemble technique, optimized for reducing prediction error
    'Gradient Boosting Regression': GradientBoostingRegressor(
        n_estimators=100,  # Number of boosting stages
        learning_rate=0.01,  # Step size shrinkage used to prevent overfitting
        max_depth=3,  # Maximum depth of individual regression estimators
        subsample=0.8,  # Fraction of samples used for fitting the individual base learners
        random_state=42  # Ensure reproducibility
    ),
    
    # Support Vector Regression (SVR): Regression technique based on support vector machines, effective in high-dimensional spaces
    'Support Vector Regression (SVR)': SVR(C=1.0, epsilon=0.1),  # C controls trade-off, epsilon defines the margin
    
    # XGBoost Regression: Powerful boosting algorithm, optimized for speed and performance
    'XGBoost Regression': xgb.XGBRegressor(
        objective='reg:squarederror',  # Objective function for regression tasks
        n_estimators=500,  # Number of boosting rounds
        learning_rate=0.01,  # Step size shrinkage
        max_depth=6,  # Maximum tree depth
        random_state=42  # Ensure reproducibility
    ),
    
    # Decision Tree Regression: Simple decision tree, controlled with depth and minimum samples to prevent overfitting
    'Decision Tree Regression': DecisionTreeRegressor(
        max_depth=6,  # Limit the depth to control model complexity
        min_samples_split=10,  # Minimum number of samples required to split
        min_samples_leaf=4,  # Minimum number of samples at a leaf node
        random_state=42  # Ensure reproducibility
    ),
    
    # LightGBM Regressor: Efficient boosting framework, with hyperparameters tuned for performance
    'LightGBM Regressor': LGBMRegressor(
        n_estimators=500,  # Number of boosting rounds
        learning_rate=0.01,  # Learning rate to prevent overfitting
        num_leaves=31,  # Maximum number of leaves per tree
        max_depth=6,  # Maximum depth of trees
        reg_alpha=0.01,  # L1 regularization
        reg_lambda=0.01,  # L2 regularization
        random_state=42,  # Ensure reproducibility
        verbosity=-1  # Suppress LightGBM warnings
    ),
    
    # CatBoost Regressor: Gradient boosting on decision trees with efficient handling of categorical features
    'CatBoost Regressor': CatBoostRegressor(
        verbose=0,  # Suppress training output
        depth=6,  # Maximum depth of trees
        learning_rate=0.01,  # Learning rate for step size shrinkage
        random_state=42  # Ensure reproducibility
    ),
    
    # AdaBoost Regressor: Adaptive boosting that adjusts weights of misclassified instances
    'AdaBoost Regressor': AdaBoostRegressor(
        n_estimators=100,  # Number of boosting stages
        learning_rate=0.01,  # Learning rate for weight updates
        random_state=42  # Ensure reproducibility
    )
}

### <b>5.2 <span style='color:#6495ED'>|</span> Initial Model Training and Validation</b> 

After selecting a set of models for predicting the target variable, the next step is to evaluate their performance using cross-validation. Cross-validation is a robust method for assessing how well models generalize to unseen data. In this section, we use 5-fold cross-validation to evaluate each model across multiple scoring metrics. Here’s an overview of the process:

1. **Performance Metrics**:  
   To evaluate the models, we use a range of metrics:
   - **MSE (Mean Squared Error)**: Measures the average squared difference between predicted and actual values. Lower values indicate better performance.
   - **RMSE (Root Mean Squared Error)**: The square root of MSE, which provides an interpretable metric in the original unit of the target variable.
   - **MAE (Mean Absolute Error)**: Measures the average absolute difference between predicted and actual values, providing another view of prediction accuracy.
   - **R² Score**: Measures the proportion of variance in the target variable that the model explains. Higher values are better, with a value of 1 indicating perfect predictions.

2. **Cross-Validation Process**:  
   The function `cross_validate_model` performs cross-validation for each model, calculating the mean and standard deviation of the scoring metrics. Negative errors (MSE, MAE, MAPE) are negated to convert them back to positive values for interpretation.

3. **Model Evaluation**:  
   The `tqdm` library is used to visually track progress as the cross-validation loop evaluates each model. After completing the evaluations, the results are stored in a DataFrame for easy comparison.

4. **Result Analysis**:  
   The models are ranked based on their **R² Score**, which helps identify the best-performing model. A summary of performance metrics, including MSE, RMSE, MAE, and MAPE, is provided for each model. Finally, the best model is printed along with its associated metrics.

In this process, we identify the model that best fits the training data, while avoiding overfitting by leveraging cross-validation and evaluating multiple performance metrics.

In [None]:
# Initialize a list to store the results
cv_results = []

# Define scoring metrics for cross-validation
scoring = {
    'MSE': 'neg_mean_squared_error',
    'MAE': 'neg_mean_absolute_error',
    'R2': 'r2'
}

# Function to perform cross-validation and collect model performance
def cross_validate_model(name, model, X_train, y_train, cv=5):
    # Perform cross-validation
    scores = cross_validate(model, X_train, y_train, cv=cv, scoring=scoring, n_jobs=-1, return_train_score=False)
    
    # Calculate mean scores (negate the negative errors to make them positive)
    mse = -np.mean(scores['test_MSE'])
    rmse = np.sqrt(mse)
    mae = -np.mean(scores['test_MAE'])
    r2 = np.mean(scores['test_R2'])
    
    # Append the results as a dictionary
    cv_results.append({
        'Model': name,
        'MSE': mse,
        'RMSE': rmse,
        'MAE': mae,
        'R² Score': r2
    })

# Train and evaluate each model with cross-validation
for name, model in tqdm(models.items(), desc="Model Evaluation"):
    cross_validate_model(name, model, X_train, y_train, cv=5)  # Perform 5-fold cross-validation

# Convert the results list to a DataFrame
cv_results_df = pd.DataFrame(cv_results)

# Sort the DataFrame by the R² Score in descending order (higher is better)
cv_results_df.sort_values(by='R² Score', ascending=False, inplace=True)

# Display the cross-validation results as a formatted table using pandas
print("\nCross-Validation Model Performance Summary:\n")
cv_results_df.set_index('Model', inplace=True)
print(cv_results_df.to_markdown())

# Print the best model based on R² Score
best_cv_model = cv_results_df.iloc[0]
print(f"\n\033[1mBest Model: {best_cv_model.name}\033[0m")
print(f"  - R² Score: {best_cv_model['R² Score']:.4f}")
print(f"  - MSE: {best_cv_model['MSE']:.4f}")
print(f"  - RMSE: {best_cv_model['RMSE']:.4f}")
print(f"  - MAE: {best_cv_model['MAE']:.4f}")

The **Random Forest Regression** model has demonstrated the best performance, achieving the highest **R² Score** of **0.7444**, which indicates that 74.44% of the variance in the target variable is explained by the model. Additionally, it has the lowest **MSE** (0.0646) and **MAE** (0.1872), making it the most accurate model in this set.

The **XGBoost Regression** and **LightGBM Regressor** models also performed well, with **R² Scores** of **0.7387** and **0.7270**, respectively. These models are strong contenders, though slightly less accurate than Random Forest.

On the other hand, models like **Support Vector Regression (SVR)** and **Gradient Boosting Regression** underperformed, with **R² Scores** of **0.0355** and **0.4683**, indicating they are less effective in predicting the target variable for this dataset.

Overall, **Random Forest Regression** stands out as the best model for this problem, followed closely by XGBoost and LightGBM, while simpler linear models like **Ridge** and **Elastic Net** show moderate performance. Moving forward, these top-performing models will be considered for further tuning and optimization.

### <b>5.3 <span style='color:#6495ED'>|</span> Model Hyperparameters Tuning</b> 

To enhance the performance of the **Random Forest Regression** model, we will implement two advanced hyperparameter tuning techniques: **Random Search** and **Bayesian Optimization**. These methods allow us to efficiently explore the hyperparameter space to find the optimal configuration that maximizes model performance.

- **Random Search**: Randomly samples hyperparameter values within a defined range, providing a faster alternative to grid search while still exploring a wide range of possibilities.
- **Bayesian Optimization**: Uses a probabilistic model to predict the performance of hyperparameter combinations and focuses on the most promising areas, making it more efficient than random or grid search.

This section will present the implementation and results of these two tuning approaches to optimize the Random Forest model's performance further.

#### <b>5.3.1 <span style='color:#6495ED'>|</span> Random Search for Hyperparameter Tuning</b> 

We begin with **Random Search** for hyperparameter tuning because it is a simple and efficient method to explore a wide range of hyperparameters without the computational cost of exhaustive search methods. Random Search samples random combinations of hyperparameters, making it faster than grid search, especially when the parameter space is large. It provides a good starting point to quickly narrow down promising ranges for further, more refined tuning methods like **Bayesian Optimization**, which will be used later.

**Process:**
1. **n_estimators**: We first search for the optimal number of trees (`n_estimators`). More trees generally improve model performance, but with diminishing returns. Setting a range from 100 to 1500 trees ensures we explore both smaller, faster models and larger, more accurate ones. Random Search helps identify how many trees are sufficient before performance plateaus.

2. **max_depth** and **min_samples_split**: Once the number of trees is chosen, we fine-tune the depth of each tree (`max_depth`) and the minimum number of samples required to split a node (`min_samples_split`). A deeper tree might overfit the data, so we explore different depths, including no limit (`None`). Varying the `min_samples_split` between 2, 5, and 10 helps prevent overly specific splits, ensuring the model generalizes better.

3. **min_samples_leaf** and **max_features**: After determining the best tree structure, we adjust the minimum number of samples at each leaf (`min_samples_leaf`). This helps avoid overfitting small subsets of the data. We also adjust `max_features` to control the number of features considered at each split, allowing us to explore both more complex models (`auto`) and more constrained ones (`sqrt`).

4. **bootstrap**: Lastly, we explore whether using bootstrapped samples improves model performance. Bootstrap sampling is a feature of random forests that allows each tree to train on a slightly different dataset, improving robustness. We search between `True` and `False` to test its effectiveness in this dataset.

***Why these Hyperparameter Ranges?***
- **n_estimators**: Set between 100 and 1500, providing a balance between computational efficiency and model accuracy. While fewer trees may lead to underfitting, more than 1500 often shows diminishing returns.
- **max_depth**: Ranges from 10, 20, 30, to `None` (unlimited). Limiting depth prevents overfitting, while `None` allows the model to learn more detailed relationships if necessary.
- **min_samples_split**: Set between 2, 5, and 10. Higher values ensure nodes don’t split unless there’s sufficient data, preventing overfitting on small samples.
- **min_samples_leaf**: Ranges from 1 to 4, limiting the minimum number of samples at each leaf. This prevents deep trees from being overly fine-tuned to small subsets.
- **max_features**: Set to `auto` (use all features) or `sqrt` (square root of features), ensuring a balance between computational complexity and model diversity at each split.
- **bootstrap**: Explores both `True` (with replacement) and `False` (without replacement), determining whether sampling with replacement improves model generalization.

This combination of hyperparameters is well-suited for Random Forests, allowing us to explore tree-based model flexibility while controlling for overfitting and computational costs.

In [None]:
# Define the Random Forest Regressor
rf = RandomForestRegressor(random_state=42)

# Combined hyperparameter distribution for RandomizedSearchCV
param_dist_combined = {
    'n_estimators': [100, 200, 500, 1000, 1500],           # Number of trees
    'max_depth': [10, 20, 30, None],                       # Maximum depth of trees
    'min_samples_split': [2, 5, 10],                       # Minimum samples required to split
    'min_samples_leaf': [1, 2, 4],                         # Minimum samples at leaf node
    'max_features': ['auto', 'sqrt'],                      # Number of features to consider at each split
    'bootstrap': [True, False]                             # Bootstrap or not
}

# Initialize RandomizedSearchCV to search all hyperparameters together
rf_random_combined = RandomizedSearchCV(estimator=rf,
                                        param_distributions=param_dist_combined,
                                        n_iter=50,  # Number of iterations (trials) for random search
                                        cv=5,       # 5-fold cross-validation
                                        verbose=0,  # No verbosity for cleaner output
                                        random_state=42,
                                        n_jobs=-1)   # Utilize all available cores for parallel computation

# Fit the random search model
rf_random_combined.fit(X_train, y_train)

# Retrieve the best hyperparameters
best_params_combined = rf_random_combined.best_params_
print("Best Hyperparameters from Random Search:")
for param, value in best_params_combined.items():
    print(f"  - {param}: {value}")

# Train the final Random Forest model using the best hyperparameters
best_rf_model_combined = RandomForestRegressor(**best_params_combined, random_state=42)
best_rf_model_combined.fit(X_train, y_train)

# Make predictions on the test set
y_pred_combined = best_rf_model_combined.predict(X_test)

# Evaluate the final model
mse_combined = mean_squared_error(y_test, y_pred_combined)
rmse_combined = np.sqrt(mse_combined)
mae_combined = mean_absolute_error(y_test, y_pred_combined)
r2_combined = r2_score(y_test, y_pred_combined)

# Display performance metrics
print(f"Final model performance on test set:")
print(f"  - MSE: {mse_combined:.4f}")
print(f"  - RMSE: {rmse_combined:.4f}")
print(f"  - MAE: {mae_combined:.4f}")
print(f"  - R² Score: {r2_combined:.4f}")

The **Random Search** method successfully identified an optimal set of hyperparameters for the **Random Forest Regression** model. The best hyperparameters, as listed, include **500 trees**, a **max depth of 30**, and **bootstrap sampling turned off**. Notably, the model also selected the square root of features (`sqrt`) for splitting, indicating that using a subset of features improved the model's generalization.

The model's performance on the test set significantly improved:
- **MSE** (Mean Squared Error) dropped to **0.0513**, a notable reduction from the initial models.
- **RMSE** (Root Mean Squared Error) improved to **0.2265**, showing better accuracy in the model's predictions.
- **MAE** (Mean Absolute Error) decreased to **0.1571**, reflecting more accurate predictions.
- The **R² Score** increased to **0.7966**, indicating that nearly **80%** of the variability in the target variable is explained by this model.

Overall, the performance improvements show that Random Search effectively tuned the hyperparameters, providing a better-performing model compared to the baseline. However, there may still be room for further improvement by refining the search for optimal hyperparameters.

While Random Search has provided a strong starting point, **Bayesian Optimization** offers a more efficient and refined approach to hyperparameter tuning. By using a probabilistic model to guide the search, Bayesian Optimization focuses on exploring promising areas of the hyperparameter space more effectively. In the next section, we will implement Bayesian Optimization to further fine-tune the hyperparameters and aim for even better model performance.


#### <b>5.3.2 <span style='color:#6495ED'>|</span> Bayesian Search for Hyperparameter Tuning</b> 

After using Random Search to explore the hyperparameter space and gain an initial understanding of optimal ranges, we apply Bayesian Optimization to further refine the hyperparameters and achieve better performance.

**Why Bayesian Optimization?**
Bayesian Optimization is a more sophisticated search method compared to Random Search. Instead of sampling hyperparameters randomly, it builds a probabilistic model of the objective function (in this case, the model performance) and uses this model to guide the search towards regions of the hyperparameter space that are more likely to yield improvements. This makes it more efficient and likely to find better hyperparameters in fewer iterations, especially when dealing with complex models like Random Forest.

**Process:**
- We define a search space for key hyperparameters such as `n_estimators`, `max_depth`, `min_samples_split`, `min_samples_leaf`, `max_features`, and `bootstrap`.
- BayesSearchCV is used to optimize the model’s performance over 32 iterations, utilizing a 5-fold cross-validation to evaluate different hyperparameter combinations.
- The search automatically updates the search strategy based on previously evaluated results, making it more efficient than random or grid search methods.
- After fitting, the best hyperparameters are identified and used to evaluate the model on the test set.

**Key Benefits of Bayesian Optimization:**
- More efficient than Random Search as it focuses on the most promising regions of the hyperparameter space.
- Ideal for high-dimensional hyperparameter tuning, allowing the exploration of complex parameter interactions.
- Provides a more refined final model that often yields better performance metrics.

In [None]:
# Define the search space for hyperparameters using Bayesian optimization
search_space = {
    'n_estimators': Integer(100, 1500),  # Number of trees
    'max_depth': Integer(5, 30),  # Maximum depth of trees
    'min_samples_split': Integer(2, 10),  # Minimum number of samples required to split
    'min_samples_leaf': Integer(1, 5),  # Minimum samples at leaf node
    'max_features': ['sqrt', 'log2', None],  # Corrected: Number of features to consider for the best split
    'bootstrap': [True, False]  # Whether bootstrap samples are used
}


# Bayesian optimization with cross-validation
opt = BayesSearchCV(
    estimator=rf,
    search_spaces=search_space,
    n_iter=32,  # Number of iterations to try
    cv=5,  # 5-fold cross-validation
    n_jobs=-1,  # Parallel processing
    random_state=42,  # Reproducibility
    verbose=0  # Show progress
)

# Fit the model with Bayesian optimization
opt.fit(X_train, y_train)

# Best parameters found by Bayesian optimization
print("Best parameters found by Bayesian optimization:")
print(opt.best_params_)

# Evaluate the best model on the test set
best_rf_model = opt.best_estimator_
y_pred = best_rf_model.predict(X_test)

# Evaluate performance on the test set
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Test set performance of the best model after Bayesian optimization:")
print(f"  - MSE: {mse:.4f}")
print(f"  - RMSE: {rmse:.4f}")
print(f"  - MAE: {mae:.4f}")
print(f"  - R^2 Score: {r2:.4f}")

After applying both **Random Search** and **Bayesian Search** for hyperparameter tuning, we can observe the differences in model performance. Below is a comparison of the key metrics:

<div style="border-radius:10px;padding: 15px;background-color:#ffffff00;font-size:100%;text-align:left">
    <table style="border-collapse: collapse; width: 100%;">
        <thead>
            <tr style="background-color: #6495ED;">
                <th style="border: 1px solid white; padding: 10px; text-align: left; color: white;">Metric</th>
                <th style="border: 1px solid white; padding: 10px; text-align: left; color: white;">Initial Model</th>
                <th style="border: 1px solid white; padding: 10px; text-align: left; color: white;">Random Search</th>
                <th style="border: 1px solid white; padding: 10px; text-align: left; color: white;">Bayesian Search</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td style="border: 1px solid #6495ED; padding: 10px;">MSE</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.0646</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.0513</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.0503</td>
            </tr>
            <tr>
                <td style="border: 1px solid #6495ED; padding: 10px;">RMSE</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.2541</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.2265</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.2242</td>
            </tr>
            <tr>
                <td style="border: 1px solid #6495ED; padding: 10px;">MAE</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.1872</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.1571</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.1483</td>
            </tr>
            <tr>
                <td style="border: 1px solid #6495ED; padding: 10px;">R² Score</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.7444</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.7966</td>
                <td style="border: 1px solid #6495ED; padding: 10px;">0.8007</td>
            </tr>
        </tbody>
    </table>
</div>

**Comparisons**:
1. **MSE (Mean Squared Error)**:
   - **Initial Model**: The base model starts with an MSE of **0.0646**.
   - **Random Search**: Random Search tuning significantly reduces the MSE to **0.0513**, indicating that the tuned model makes fewer errors on average.
   - **Bayesian Search**: Further refinement with Bayesian Search brings the MSE down to **0.0503**, showcasing a slight improvement over Random Search. This suggests Bayesian Optimization finds more optimal hyperparameters, though the gain is incremental.

2. **RMSE (Root Mean Squared Error)**:
   - **Initial Model**: The initial model has an RMSE of **0.2541**.
   - **Random Search**: With Random Search, the RMSE drops to **0.2265**, showing an improvement in the model’s accuracy by lowering the prediction error.
   - **Bayesian Search**: Bayesian Search continues this trend, reducing the RMSE further to **0.2242**, highlighting marginal gains in error reduction.

3. **MAE (Mean Absolute Error)**:
   - **Initial Model**: The MAE starts at **0.1872** in the initial model.
   - **Random Search**: After Random Search, the MAE decreases to **0.1571**, meaning the average prediction error has become notably smaller.
   - **Bayesian Search**: The best improvement is seen with Bayesian Search, achieving an MAE of **0.1483**, reflecting a more significant reduction in average absolute errors, which indicates better model accuracy and reliability.

4. **R² Score**:
   - **Initial Model**: The R² Score is **0.7444**, indicating that the initial model explains about 74.44% of the variance in the target variable.
   - **Random Search**: With hyperparameter tuning via Random Search, the R² Score increases to **0.7966**, showing the model can now explain almost 80% of the variance.
   - **Bayesian Search**: The best result comes from Bayesian Search, with an R² Score of **0.8007**, indicating that the model now explains a little over 80% of the variance. This represents a substantial improvement in the model’s overall explanatory power.

**Analysis**:
- Both Random Search and Bayesian Search improved model performance, but Bayesian Search, as expected, yielded better results overall, albeit with smaller incremental gains compared to Random Search.
- **Bayesian Search** refines the model further after Random Search has identified promising hyperparameter ranges, allowing for more fine-tuning and optimization.
- The largest improvement is seen in **MAE**, where Bayesian Search reduces the average prediction error more effectively than Random Search.
- **MSE** and **RMSE** show consistent improvements with both search methods, reflecting better model accuracy and a reduction in prediction variance.
- **R² Score** shows that both Random Search and Bayesian Search significantly improve the model’s ability to explain the variance in coral bleaching outcomes, with Bayesian Search crossing the 80% threshold.

**Conclusion**:
While Random Search offers a quick and substantial improvement over the initial model, **Bayesian Search** fine-tunes the hyperparameters to maximize performance, resulting in a more accurate and reliable model. Thus, Bayesian Search should be preferred for final model selection, as it yields the best overall performance across all key metrics.


### <b>5.4 <span style='color:#6495ED'>|</span> Final Model Selection</b> 

The model training process has been successfully completed with significant improvements in performance after hyperparameter tuning using both **Random Search** and **Bayesian Optimization**.

- **Random Search** helped identify a set of optimal hyperparameters for the **Random Forest Regressor**, improving the model's accuracy and reducing error rates.
- **Bayesian Optimization** further refined these hyperparameters, resulting in marginally better performance, particularly in reducing the Mean Absolute Error (MAE) and increasing the R² Score.

**Key Observations:**
- **Initial Model Performance** was relatively strong but showed potential for improvement, particularly in reducing error metrics like **MSE** and **RMSE**.
- **Random Search Tuning** led to a notable improvement in performance, with lower error rates and a higher R² Score.
- **Bayesian Optimization** fine-tuned the hyperparameters further, achieving the best overall performance across all key metrics.

**Final Model Selection:**

The final model selected is the Random Forest Regressor with the following optimal hyperparameters:
- `bootstrap`: False
- `max_depth`: 24
- `max_features`: `log2`
- `min_samples_leaf`: 1
- `min_samples_split`: 2
- `n_estimators`: 1500

The following sections will focus on selecting the best model based on these results and evaluating it using the test set for final performance assessment.

***Performance Comparison Before and After Tuning***

To better visualize the improvements achieved through hyperparameter tuning, the following plots compare the performance of the **Random Forest Regressor** before tuning, after **Random Search**, and after **Bayesian Search** across key metrics such as **MSE**, **RMSE**, **MAE**, and **R² Score**.

In [None]:
# Performance data
models = ['Initial Model', 'Random Search', 'Bayesian Search']
mse_values = [0.0646, 0.0513, 0.0503]
rmse_values = [0.2541, 0.2265, 0.2242]
mae_values = [0.1872, 0.1571, 0.1483]
r2_values = [0.7444, 0.7966, 0.8007]

# Plot MSE, RMSE, MAE, R² in one row
fig, ax = plt.subplots(1, 4, figsize=(20, 5))

# MSE Plot
ax[0].bar(models, mse_values, color=['#6495ED', '#a56eff', '#fa4d56'])
ax[0].set_title('MSE Comparison')
ax[0].set_ylabel('MSE')

# RMSE Plot
ax[1].bar(models, rmse_values, color=['#6495ED', '#a56eff', '#fa4d56'])
ax[1].set_title('RMSE Comparison')
ax[1].set_ylabel('RMSE')

# MAE Plot
ax[2].bar(models, mae_values, color=['#6495ED', '#a56eff', '#fa4d56'])
ax[2].set_title('MAE Comparison')
ax[2].set_ylabel('MAE')

# R² Score Plot
ax[3].bar(models, r2_values, color=['#6495ED', '#a56eff', '#fa4d56'])
ax[3].set_title('R² Score Comparison')
ax[3].set_ylabel('R² Score')

# Display plots
plt.tight_layout()
plt.show()

The plots above clearly demonstrate the improvements achieved through hyperparameter tuning:

- **MSE** and **RMSE** both show a significant reduction after Random Search and a further improvement after Bayesian Search, reflecting better accuracy and lower prediction variance.
- **MAE** shows the most significant improvement after tuning, with Bayesian Search resulting in the lowest mean absolute error, indicating the highest prediction accuracy.
- **R² Score** increased substantially after Random Search and further improved after Bayesian Optimization, with the final model explaining over **80%** of the variance in coral bleaching predictions.

These results confirm that **Bayesian Optimization** provided the best hyperparameter set for the **Random Forest Regressor**, resulting in the most accurate model. Based on these outcomes, this tuned model will be selected as the final model for further evaluation on the test set.

***Actual vs Predicted Values Before and After Tuning***

The scatter plots below visualize the performance of the model by comparing the actual values of coral bleaching severity with the predicted values. These plots illustrate how well the model fits the data before and after hyperparameter tuning. Ideally, a perfect model would have all points lying on the 45-degree diagonal line, where the predicted values equal the actual values.

- **Before Tuning:** Scatter plot showing the predictions from the initial model.
- **After Bayesian Search:** Scatter plot showing the predictions from the best-tuned model.

In [None]:
# Scatter plot of actual vs predicted values before and after tuning

# Predictions from the initial model (Random Forest before tuning)
initial_rf_model = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
initial_rf_model.fit(X_train, y_train)
initial_pred = initial_rf_model.predict(X_test)

# Predictions from the best model after Bayesian Search
best_rf_model = opt.best_estimator_  # Best model from Bayesian Search
best_rf_model.fit(X_train, y_train)
best_pred = best_rf_model.predict(X_test)

# Plot the actual vs predicted values for both models
fig, ax = plt.subplots(1, 2, figsize=(12, 6))

# Scatter plot for the initial model
ax[0].scatter(y_test, initial_pred, color='#1192e8', edgecolor=None, alpha=0.3)
ax[0].plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], 'r--')  # Diagonal line
ax[0].set_title('Initial Model: Actual vs Predicted')
ax[0].set_xlabel('Actual Values')
ax[0].set_ylabel('Predicted Values')
ax[0].grid(True)

# Scatter plot for the best model after Bayesian Search
ax[1].scatter(y_test, best_pred, color='#a56eff', edgecolor=None, alpha=0.3)
ax[1].plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], 'r--')  # Diagonal line
ax[1].set_title('After Bayesian Search: Actual vs Predicted')
ax[1].set_xlabel('Actual Values')
ax[1].set_ylabel('Predicted Values')
ax[1].grid(True)

plt.tight_layout()
plt.show()

The scatter plots demonstrate the improvement in model performance after tuning:

- **Before Tuning (Initial Model)**: The scatter plot shows more dispersion around the diagonal line, indicating that the initial model's predictions had larger deviations from the actual values, particularly for higher values of bleaching severity.
  
- **After Bayesian Search (Tuned Model)**: After hyperparameter tuning with Bayesian Optimization, the scatter plot shows that the predictions are more tightly clustered around the diagonal line, reflecting improved accuracy and reduced prediction errors.

This visual comparison confirms the significant improvements in the model's ability to predict coral bleaching severity after tuning, particularly in reducing overfitting and improving prediction accuracy for extreme values.

<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 6 | Findings Summary and Conclusion</span></b> </div>

### <b>6.1 <span style='color:#6495ED'>|</span> Summary of Data Exploration</b> 

The initial **Exploratory Data Analysis (EDA)** provided key insights into the structure and distribution of the dataset:
- Irrelevant columns were removed, and missing values were addressed to ensure data integrity.
- Key environmental variables such as **Sea Surface Temperature (SST)**, **Temperature Anomalies (SSTA, TSA)**, and coral bleaching percentages showed significant correlations.
- **Cyclone frequency**, **windspeed**, and **proximity to shore** emerged as moderately influential stressors, while **depth** and **turbidity** demonstrated non-linear relationships with bleaching severity.
- **SSTA DHW (Degree Heating Weeks)** was identified as a key predictor of coral bleaching, capturing the impact of prolonged thermal stress on coral health.

These insights guided the **feature engineering** process, focusing on environmental, geographical, and physical stressor features.

### <b>6.2 <span style='color:#6495ED'>|</span> Summary of Data Preprocessing</b> 

Key preprocessing steps were applied to optimize the dataset for model building:
- **Feature Engineering**: New features like **Degree Heating Weeks (DHW)**, **SSTA/TSA Frequency**, and interaction terms between environmental variables enhanced predictive power.
- **Encoding**: Categorical variables such as **Ocean Name** and **Bleaching Level** were numerically encoded.
- **Scaling and Normalization**: Numerical features were standardized using **Power Transformer** and **RobustScaler** to ensure consistency.
- **Clustering**: **K-Means** clustering was applied to geographic features to account for regional differences in bleaching patterns.

This preprocessing ensured the dataset was well-structured for model training.

### <b>6.3 <span style='color:#6495ED'>|</span> Summary of Model Training and Tuning</b> 

Multiple machine learning models were trained and cross-validated:
- The **Random Forest Regressor** outperformed other models with an initial **R² Score** of 0.7444.
- **Random Search** hyperparameter tuning significantly improved performance, raising the **R² Score** to 0.7966.
- **Bayesian Optimization** fine-tuned the model, achieving the best **R² Score** of 0.8007.

The final model was the **Random Forest Regressor** with the following hyperparameters:
```python
OrderedDict({
    'bootstrap': False,
    'max_depth': 24,
    'max_features': 'log2',
    'min_samples_leaf': 1,
    'min_samples_split': 2,
    'n_estimators': 1500
})
```

### <b>6.4 <span style='color:#6495ED'>|</span> Disadvantages and Future Work</b> 

While the model and approach have yielded promising results, there are several limitations and opportunities for future work:
- **Limited Temporal Data**: The absence of detailed time-series data prevents the model from capturing temporal trends and changes over time. Future work should incorporate temporal data to model the evolving impact of environmental stressors.
- **Geographical Generalization**: While geographic features were clustered, the model may struggle with generalizing across distinct regions. Further work could explore regional models or incorporate more granular geographic data to improve regional accuracy.
- **Handling of Complex Interactions**: The interactions between environmental variables like cyclone frequency, windspeed, and SST were modeled, but more sophisticated techniques (e.g., neural networks) could better capture non-linear relationships between variables.
- **Inclusion of Additional Environmental Variables**: Future models could benefit from additional predictors like water quality, nutrient levels, and fishing activities, which may also influence coral health but were not included in this dataset.
- **Imbalanced Data**: Some regions have more extensive coral bleaching data than others, which may affect the generalizability of the model. Future work could explore techniques such as **resampling** or **ensemble methods** to address data imbalance across different regions and environmental conditions.
- **Model Interpretability**: While **Random Forest** provides strong performance, its **black-box nature** limits interpretability. Future efforts could integrate **explainable AI (XAI)** techniques, such as **SHAP** or **LIME**, to provide better insights into how specific features drive predictions.
- **Climate Change Projections**: Incorporating **climate change projection data** can enhance the model’s utility in predicting long-term coral reef health and help stakeholders implement proactive conservation strategies.

By addressing these limitations, the model can evolve into a more robust and versatile tool for predicting and mitigating coral bleaching under various environmental stressors.

### <b>6.5 <span style='color:#6495ED'>|</span> Conclusion</b> 

The model successfully identified key environmental drivers of coral bleaching, highlighting the role of **SST**, **thermal anomalies**, and **prolonged heat stress**. Through hyperparameter tuning, the **Random Forest Regressor** emerged as the most effective model, with significant predictive power. Future work will focus on temporal modeling, regional refinement, and incorporating additional stressors to improve model accuracy and support conservation efforts.


<img src="https://i.ibb.co/LxRGZkz/image7.webp" alt="Notebook Cover Image" style="width:100%; height:auto; border-radius:15px; object-fit:cover; aspect-ratio: 3/1;">

<hr>

## <div style="padding: 20px;color:white;margin:10;font-size:90%;text-align:left;display:fill;border-radius:10px;background-color: rgba(0, 0, 0, 0.2);overflow: hidden; background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); background-size: cover; background-position: center; background-blend-mode: darken;"><b><span style='color:white'> 7 | References</span></b> </div>

Baker, A. C., Glynn, P. W., & Riegl, B. (2008). Climate change and coral reef bleaching: An ecological assessment of long-term impacts, recovery trends, and future outlook. *Estuarine, Coastal and Shelf Science*, 80(4), 435-471. https://doi.org/10.1016/j.ecss.2008.09.003

Brown, B. E. (1997). Coral bleaching: Causes and consequences. *Coral Reefs*, 16(1), S129-S138. https://doi.org/10.1007/s003380050249

Eakin, C. M., Liu, G., Gomez, A. M., De La Cour, J. L., Heron, S. F., Skirving, W. J., ... & Wang, L. (2016). Global coral bleaching 2014–2017: Status and an appeal for observations. *Reef Encounter*, 31(1), 20-26.

Hoegh-Guldberg, O., Poloczanska, E. S., Skirving, W., & Dove, S. (2017). Coral reef ecosystems under climate change and ocean acidification. *Frontiers in Marine Science*, 4, 158. https://doi.org/10.3389/fmars.2017.00158

Hughes, T. P., Kerry, J. T., Álvarez-Noriega, M., Álvarez-Romero, J. G., Anderson, K. D., Baird, A. H., ... & MacNeil, M. A. (2017). Global warming and recurrent mass bleaching of corals. *Nature*, 543(7645), 373-377. https://doi.org/10.1038/nature21707

Loya, Y., Sakai, K., Yamazato, K., Nakano, Y., Sambali, H., & van Woesik, R. (2001). Coral bleaching: The winners and the losers. *Ecology Letters*, 4(2), 122-131. https://doi.org/10.1046/j.1461-0248.2001.00203.x

Pandolfi, J. M., Connolly, S. R., Marshall, D. J., & Cohen, A. L. (2011). Projecting coral reef futures under global warming and ocean acidification. *Science*, 333(6041), 418-422. https://doi.org/10.1126/science.1204794

Spalding, M., Ravilious, C., & Green, E. P. (2001). *World Atlas of Coral Reefs*. University of California Press.

Aggarwal, S., & Kumar, N. (2021). Transportation system applications. In S. Aggarwal, N. Kumar, & P. Raj (Eds.), *Advances in computers* (Vol. 121, pp. 431-454). Elsevier. https://doi.org/10.1016/bs.adcom.2020.08.022

TIME. (2016, September 17). What is coral bleaching? | TIME [Video]. YouTube. https://www.youtube.com/watch?v=fA6mpexcyN4

Lakshminarayan, K., Harp, S., & Samad, T. (1999). Imputation of missing data in industrial databases. *Applied Intelligence*, 11(3), 259–275.