# The Time Machine: identifying drought events

Hello there! 👋 We're excited to take you on an insightful journey through our interactive notebook designed to explore drought events across the globe over time. Our main focus will be on understanding how droughts have varied both geographically and temporally.

To reach this goal we need to recognize and analyze drought occurrences worldwide from the year 1940 up to the present. We will do this by examining the Standardized Precipitation Evapotranspiration Index (SPEI) values, which help us understand moisture deficit better.

📌 Since each of you has different needs and interests, we've made the notebook as interactive as possible. We'll guide you through the cells, but you'll have the freedom to choose what to focus on: the geographical area, the type of index aggregation, and the time period.

### What is SPEI?

The SPEI is a widely used index for determining drought conditions. It considers both precipitation and evapotranspiration (the sum of evaporation and plant transpiration from the Earth's surface to the atmosphere) to give a standardized measure of moisture adequacy across different regions and times. You can find more information on the [dedicated page of our handbook](https://ecmwfcode4earth.github.io/tales-of-drought/chapters/02-drought-focus/indices.html).
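
As a rough illustration of what "standardized" means here, the sketch below computes a water balance (precipitation minus potential evapotranspiration) and turns it into z-scores. This is a deliberate simplification and not the actual SPEI algorithm, which fits a probability distribution (typically a log-logistic one) to the accumulated water balance before standardizing. The function name and the sample values are made up for the example.

```python
from statistics import mean, stdev

def standardized_water_balance(precip_mm, pet_mm):
    """Return z-scores of the water balance (precipitation minus PET).

    Simplified stand-in for SPEI: the real index fits a probability
    distribution to the water balance instead of using a plain z-score.
    """
    balance = [p - e for p, e in zip(precip_mm, pet_mm)]
    mu = mean(balance)
    sigma = stdev(balance)
    return [(b - mu) / sigma for b in balance]

# Made-up values for the same calendar month in six different years (mm)
precip = [80.0, 65.0, 30.0, 90.0, 55.0, 20.0]
pet = [50.0, 55.0, 70.0, 45.0, 60.0, 75.0]

scores = standardized_water_balance(precip, pet)
# Position of the year with the largest moisture deficit
driest_year_position = scores.index(min(scores))
```

Negative scores correspond to drier-than-usual conditions for that month, positive scores to wetter-than-usual ones.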

### Data Source
The data we will use comes from ERA5, the global climate reanalysis produced by ECMWF. Specifically, we are working with NetCDF (`.nc`) files, a file format for storing multi-dimensional scientific data so that it can be accessed and processed efficiently (for more detail, see the [dedicated page](https://ecmwfcode4earth.github.io/tales-of-drought/chapters/01-climate-toolkit-for-beginners/netCDF-file.html)).

Let's dive in and start our exploration to better understand the patterns and impacts of droughts around the world! 🌍

## What we will do

In this notebook, we will explore drought events from various points of view:
1. By selecting a geographic area and a month of the year, we will observe the evolution of drought conditions from 1940 to the present using a slider of maps.
2. We will study the same evolution using a scatterplot made with median and mean values for that area.
3. We will delve deeper into the details with a boxplot and a standard deviation bar chart for the same month and area.
4. We will change the time dimension by looking at the evolution of the median values for a certain year across the twelve months.
5. Finally, we will see the evolution over a range of years of our choice via a stripe chart, split by years and months.
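
The statistics behind the scatterplot, boxplot and standard deviation chart can be sketched with Python's standard library. The sample values are made up, and the notebook itself computes these quantities with `data_preprocess.compute_stats`, whose internals are not shown here and may differ.

```python
from statistics import mean, median, quantiles, stdev

# Made-up SPEI values for all grid cells of an area in one month
spei_values = [-1.8, -1.2, -0.9, -0.4, 0.1, 0.3, 0.7, 1.5]

summary = {
    "mean": mean(spei_values),
    "median": median(spei_values),
    "std": stdev(spei_values),
    # The three quartiles are what a boxplot draws as the box and its middle line
    "quartiles": quantiles(spei_values, n=4),
}

for name, value in summary.items():
    print(name, value)
```

The median is less sensitive than the mean to a few extreme grid cells, which is one reason to look at both.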

## Setting Up the Environment
Before we dive into the data analysis, we need to ensure our notebook has all the necessary tools and libraries. This step involves installing various Python packages that will help us manipulate data, create visualizations, and interact with our notebook more effectively.

### What do you need to install?

The installation of these packages and extensions may vary depending on the computing environment you are using. Depending on whether you use Jupyter Notebook, JupyterLab, Google Colab, or another tool like Binder, some steps or commands might be different or unnecessary. Always tailor these installation steps to the specific requirements of your chosen platform to ensure smooth operation. For more details, visit [our page](https://ecmwfcode4earth.github.io/tales-of-drought/chapters/04-set-up-env/index.html).


Here is the list of packages our notebook needs to work:

**Numerical and Data Handling Libraries:**
- `numpy`: for numerical computations.
- `pandas`: for data manipulation and analysis.
- `xarray`: for working with multi-dimensional arrays of data.
- `netCDF4`: for handling and accessing data stored in .nc files (like our ERA5 data).
- `dask`: enhances speed and scalability in data processing, useful for large datasets.

**Visualization Libraries:**

- `matplotlib` and `plotly`: for creating static and interactive graphs respectively.
- `folium`: for making interactive maps.
- `kaleido`: for exporting Plotly figures to static images (like PNG).


**Enhancing Interactivity:**

- `ipywidgets`: allows us to create interactive elements in the notebook (like sliders, dropdown menus and buttons).
- `widgetsnbextension` and `@jupyter-widgets/jupyterlab-manager`: These are necessary for enabling and managing IPython widgets in JupyterLab.


**Installing and Enabling Extensions**
We use `!pip install` commands to download and install these packages from the Python Package Index (PyPI). The `!` at the beginning of each command tells Jupyter Notebook to run it as a shell command.
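
Before installing anything, it can be useful to see which packages are already importable in the current kernel. The sketch below uses only the standard library; the helper name `find_missing` is our own, and the module list mirrors the packages named above.

```python
from importlib.util import find_spec

def find_missing(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [name for name in module_names if find_spec(name) is None]

required = ["numpy", "pandas", "xarray", "netCDF4", "dask",
            "matplotlib", "plotly", "folium", "kaleido", "ipywidgets"]
missing = find_missing(required)
print("Missing packages:", missing or "none")
```

Any package reported as missing still needs to be installed; packages that are already present can be skipped.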

In [None]:
!pip install numpy 
!pip install xarray 
!pip install netCDF4 
!pip install "dask[complete]"
!pip install folium
!pip install matplotlib 
!pip install plotly
!pip install -U kaleido
!pip install ipywidgets

After installing the packages, we use:

- The `!jupyter nbextension enable --py widgetsnbextension` command, to enable IPython widget extensions in the notebook.
- If you are using a JupyterLab environment to host and run your notebook, the `!jupyter labextension install @jupyter-widgets/jupyterlab-manager` command, which installs a lab extension necessary for managing widgets in JupyterLab.

By executing these commands, we are setting up a robust environment tailored for analyzing and visualizing our data. Now our toolkit is ready so we can proceed without any hitches!

In [None]:
!jupyter nbextension enable --py widgetsnbextension
!jupyter labextension install @jupyter-widgets/jupyterlab-manager  # only for JupyterLab environment

Now we need to import necessary Python libraries and modules that we'll use throughout our analysis:

- `ipywidgets` and `IPython.display`: to create interactive elements (like dropdowns) and display outputs within the notebook.
- `functools.partial`: used to create partial functions: we can fix a certain number of arguments of a function and generate a new function.
- `datetime`: for handling dates and times.
- `warnings`: This module is used to control the display of warnings. 

`warnings.filterwarnings("ignore", category=RuntimeWarning)` tells Python to suppress runtime warnings, which are not critical to our analysis. This keeps the notebook output cleaner and focused on essential messages.
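
Here is a small standard-library sketch of two of these tools: `partial` pre-fills some arguments of a function, and `warnings.filterwarnings` silences one category of warnings. The function `describe_area` and its values are made up for illustration and are not part of the notebook.

```python
import warnings
from functools import partial

def describe_area(name, level, timescale):
    return f"{name} ({level}), SPEI timescale {timescale}"

# Fix two arguments now; supply the remaining one later
describe_country = partial(describe_area, level="ADM0", timescale="12 months")
label = describe_country("Italy")

# Record which warnings get through when RuntimeWarning is ignored
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=RuntimeWarning)
    warnings.warn("harmless numerical hiccup", RuntimeWarning)
    warnings.warn("something else", UserWarning)

visible = [str(w.message) for w in caught]
```

Only the `UserWarning` is recorded, because the filter drops every `RuntimeWarning` before it is shown.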

You may notice that while we explicitly install some packages using `pip`, others are imported directly without a corresponding installation command. This is because they either belong to the standard library, which comes bundled with Python (such as `datetime`), or are already present in Jupyter environments (such as `IPython`).

We also need to import four custom modules from the `utils` folder: `widgets_handler`, `coordinates_retriver`, `data_preprocess` and `charts`. These modules contain custom functions tailored to handle widgets, retrieve coordinates, preprocess data, and create charts.

In [None]:
from ipywidgets import Layout, Dropdown, widgets
from IPython.display import display, clear_output, IFrame
from functools import partial
import datetime
import numpy as np
import utils.widgets_handler as widgets_handler
import utils.coordinates_retriver as coordinates_retriver
import utils.data_preprocess as data_preprocess
import utils.charts as charts
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)

This cell sets up the initial state and interface for user interactions concerning drought data selection based on geographical and temporal parameters:
- The variables `country_list`, `months` and `timescales` are initialized by loading data from JSON files using functions from the `widgets_handler` module. They contain, respectively, a list of worldwide countries and their first- and second-level administrative subareas, the 12 months, and the specific periods over which the SPEI values are calculated (e.g., 1 month, 3 months, etc.).
- The `subset_area`, `bounding_box` and `active_btn` variables are initialized to hold the state of user selections and actions.
- The `selected` dictionary is designed to hold the current selections of various parameters like country, administrative subareas, timescale, month, and year.
- `placeholders` provides placeholder text for each dropdown or interactive widget when no selection is made.
- The `widgets_handler.save_selection(placeholders)` call saves the initialized placeholder values into a `selection.json` file to keep track of the user selections.
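
Persisting selections in a JSON file can be sketched as follows. The helpers `save_selection_sketch` and `read_selection_sketch` are hypothetical stand-ins; the actual `widgets_handler` functions may be implemented differently.

```python
import json
import tempfile
from pathlib import Path

def save_selection_sketch(selection, path):
    """Write the selection dictionary to a JSON file."""
    Path(path).write_text(json.dumps(selection, indent=2), encoding="utf-8")

def read_selection_sketch(path):
    """Read a selection dictionary back; return {} if the file is missing."""
    file = Path(path)
    if not file.exists():
        return {}
    return json.loads(file.read_text(encoding="utf-8"))

# Work in a temporary directory so the notebook's own selection.json is untouched
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "selection.json"
    save_selection_sketch({"country": "no country selected..."}, target)
    restored = read_selection_sketch(target)
    absent = read_selection_sketch(Path(tmp) / "missing.json")
```

Returning an empty dictionary for a missing file lets the caller fall back to placeholders with `dict.get`.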

In [None]:
country_list = widgets_handler.read_json_to_sorted_dict('countries.json')
months = widgets_handler.read_json_to_dict('months.json')
timescales = widgets_handler.read_json_to_dict('timescales.json')
subset_area = None
bounding_box = (None, None, None, None)
active_btn = None

selected = {
    "country": None,
    "adm1_subarea": None,
    "adm2_subarea": None,
    "timescale": None,
    "month": None,
    "year": None,
    "year_range": None
}

placeholders = {
    "country": "no country selected...",
    "adm1_subarea": "no adm1 subarea selected...",
    "adm2_subarea": "no adm2 subarea selected...",
    "timescale": "no timescale selected...",
    "month": "no month selected...",
    "year": "no year selected..."
}
widgets_handler.save_selection(placeholders)

The next cell sets up and configures the user interface to ensure interactivity.  
We have dropdown widgets to select the country (or its subareas), the period (month, year, or a range of years), and the SPEI index timescale.  
The options in these dropdowns are dynamically populated from previously loaded JSON files or generated lists (such as the list of years from 1940 to the current year).  
A `selectors` dictionary organizes all the selector widgets for efficient access and management in the code.  
Separate 'Get data' buttons are configured for different types of data retrieval based on the selections made via the dropdown menus.  
An `output_area` widget is included to display results or messages dynamically based on the user’s selections and interactions with the buttons.  
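
The generated list of years and the initial position of the range slider can be sketched on their own. The helper `build_year_options` is hypothetical, and the fixed `current_year` argument is only there to make the example reproducible.

```python
import datetime

def build_year_options(start_year=1940, current_year=None):
    """Return the years from start_year to current_year as strings."""
    if current_year is None:
        current_year = datetime.datetime.now().year
    return [str(year) for year in range(start_year, current_year + 1)]

years = build_year_options(current_year=2024)

# Start with both slider handles on the most recent year
last = len(years) - 1
initial_index = (last, last)
```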

In [None]:
# Custom style and layout for descriptions and dropdowns
style = {'description_width': '150px'}
dropdown_layout = Layout(width='400px', display='flex', justify_content='flex-end')
range_layout = Layout(width='400px')
btn_layout = Layout(width='400px')


# Dropdown for countries
country_names = [country['name'] for country in country_list]
country_selector = widgets.Dropdown(
    options=[placeholders['country']] + country_names,
    description='Select a country:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for subareas, initially empty
adm1_subarea_selector = widgets.Dropdown(
    options=[placeholders['adm1_subarea']],
    description='a subarea of first level:',
    style=style,
    layout=dropdown_layout
)

adm2_subarea_selector = widgets.Dropdown(
    options=[placeholders['adm2_subarea']],
    description='or of second level:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for timescales
timescale_selector = widgets.Dropdown(
    options=[placeholders['timescale']] + list(timescales.keys()),
    description='Select a timescale:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for months
month_selector = widgets.Dropdown(
    options=[placeholders['month']] + list(months.keys()),
    description='Select a month:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for years
current_year = datetime.datetime.now().year
years_options = [str(year) for year in range(1940, current_year + 1)]

year_selector = widgets.Dropdown(
    options=[placeholders['year']] + years_options,
    description='Select a year:',
    disabled=False,
    style=style,
    layout=dropdown_layout
)

# SelectionRangeSlider for years
year_range_selector = widgets.SelectionRangeSlider(
    options=years_options,
    index=(len(years_options) - 1, len(years_options) - 1),  # Start and end at the last
    description='Select the year range:',
    disabled=False,
    style=style,
    layout=range_layout
)

selectors = {
    "country" : country_selector,
    "adm1_subarea": adm1_subarea_selector,
    "adm2_subarea": adm2_subarea_selector,
    "timescale": timescale_selector,
    "month": month_selector,
    "year": year_selector,
    "year_range": year_range_selector    
}


month_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click me',
    icon='filter', # (FontAwesome names without the `fa-` prefix)
    layout=btn_layout
)
month_widgets_btn.custom_name='month_widgets_btn'


year_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click me',
    icon='filter', # (FontAwesome names without the `fa-` prefix)
    layout=btn_layout
)
year_widgets_btn.custom_name='year_widgets_btn'

year_range_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click me',
    icon='filter', # (FontAwesome names without the `fa-` prefix)
    layout=btn_layout
)
year_range_widgets_btn.custom_name='year_range_widgets_btn'

# Output area for display updates
output_area = widgets.Output()

The functions in the next cell handle user input, process data based on those inputs, and update the notebook interface accordingly. Here’s a summary of the key components:

- `setup_observers` function: sets up event listeners (observers) for UI widgets, specifically for the country selector dropdown. When the value of the country selector changes, it triggers a function that updates the related subarea dropdowns based on the selected country. A custom attribute on the function ensures the observers are set only once, preventing duplicate bindings.
- `update_and_get_data` function: handles data retrieval and UI updates in response to button clicks. It identifies which button was pressed and updates the month/year selections accordingly, then validates the selections. If they are valid, it clears the output, retrieves the geographic boundaries, fetches the data for that area, and displays a summary of the fetched data together with a map centered on the selected region.
- `on_button_clicked` function: acts as a trigger for button clicks, calling `update_and_get_data` with the appropriate button identifier.

The observers (event listeners) are set up via `setup_observers()` at the end of the cell to ensure all widgets are ready to handle user input as soon as the notebook is run.
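
The once-only guard and the use of `partial` to pass extra arguments to a callback can be sketched without any widgets. `FakeDropdown` is a hypothetical stand-in for an ipywidgets dropdown: it mimics only the `observe` call and passes a simplified change dictionary.

```python
from functools import partial

class FakeDropdown:
    """Minimal stand-in for a widget: stores callbacks and notifies them."""
    def __init__(self):
        self.callbacks = []

    def observe(self, callback, names=None):
        self.callbacks.append(callback)

    def set_value(self, new_value):
        for callback in self.callbacks:
            callback({"name": "value", "new": new_value})

calls = []

def on_country_change(change, log):
    # `log` is the extra argument bound through partial
    log.append(change["new"])

dropdown = FakeDropdown()

def setup_observers_sketch():
    # A function attribute records that the observer is already attached
    if not hasattr(setup_observers_sketch, "observers_set"):
        dropdown.observe(partial(on_country_change, log=calls), "value")
        setup_observers_sketch.observers_set = True

setup_observers_sketch()
setup_observers_sketch()  # second call does nothing
dropdown.set_value("Kenya")
```

Without the guard, every rerun of the cell would create a new `partial` object and attach it as an additional callback, so a single change would be handled several times.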

In [None]:
def setup_observers():
    """
    Sets up observers for UI widgets to handle interactions and updates dynamically in a graphical user interface.
    This function ensures that observers are only set once using a function attribute to track whether observers have
    already been established, enhancing efficiency and preventing multiple bindings to the same event.

    Observer is attached to widgets for country selection. This observer triggers specific functions when the 'value' property 
    of the widgets changes, facilitating responsive updates to the user interface
    based on user interactions.

    Notes:
    - This function uses a custom attribute `observers_set` on itself to ensure observers are set only once.
    """
    if not hasattr(setup_observers, 'observers_set'):
        # When 'value' changes, update_subareas is called to update the subarea dropdown menus.
        # partial binds the additional parameters the callback needs.
        country_selector.observe(
            partial(
                widgets_handler.update_subareas,
                country_list=country_list,
                placeholders=placeholders,
                adm1_subarea_selector=adm1_subarea_selector,
                adm2_subarea_selector=adm2_subarea_selector,
            ),
            'value',
        )
        # Set a flag to indicate observers are set
        setup_observers.observers_set = True


            

def update_and_get_data(btn_name):
    """
    Update and retrieve data based on user interactions and selections.

    This function handles user interactions, validates selections, calculates geographic bounding boxes,
    fetches the corresponding data subset, and updates the output area with relevant information and a map display.

    Parameters:
    btn_name (str): The name of the button that triggered the interaction.

    Global Variables:
    selected (dict): Dictionary containing current selections for various parameters.
    placeholders (dict): Dictionary of placeholder values.
    output_area (OutputArea): The output area widget to display messages and results.
    subset_data (xarray.DataArray): Subset of data fetched based on the bounding box.
    index (str): Index for the subset data, constructed from timescale value.
    bounding_box (tuple): Bounding box coordinates (min_lon, min_lat, max_lon, max_lat) for the selected area.
    active_btn (str): The name of the currently active button.

    Steps:
    1. Set the active button name.
    2. Update the month and year selections based on the button interaction.
    3. Validate the current selections.
    4. If selections are valid:
       a. Clear the output area.
       b. Retrieve the geographic boundaries for the selected area.
       c. Calculate the bounding box for the selected area.
       d. Fetch the data subset based on the bounding box.
       e. Determine the administrative level, selected area name, timescale, and time period.
       f. Print information about the uploaded subset data.
       g. Display the map with the bounding box and appropriate zoom level.


    Notes:
    - The function assumes the existence of utility functions within the 'utils' package (widgets_handler,
      coordinates_retriver, data_preprocess) for handling interactions, validations, data fetching, and map display.
    - The global variables should be properly initialized before calling this function.
    """
    global selected, placeholders, output_area, subset_data, index, bounding_box, active_btn
    map_display = None
    active_btn = btn_name
    widgets_handler.month_year_interaction(btn_name, month_selector, year_selector, selected, placeholders)
    if widgets_handler.validate_selections(btn_name, selected, selectors, placeholders, output_area):
        with output_area:
            output_area.clear_output(wait=True)
            coordinates = coordinates_retriver.get_boundaries(selected, country_list, placeholders)
            # print(coordinates)
            bounding_box = coordinates_retriver.calculate_bounding_box(coordinates)
            # print(bounding_box)            
                        
            # sample_coordinates = coordinates[:3] # Showing first 3 coordinates for brevity            
            # print('Original Coordinates Sample: ', sample_coordinates)  
            # print('Bounding Box: ', bounding_box)
                        
            # Fetching data using the bounding box
            subset_data = data_preprocess.get_xarray_data(btn_name, bounding_box, selectors, placeholders, months, timescales)
            index = f"SPEI{timescales[selectors['timescale'].value]}"
            adm_level, selected_area = widgets_handler.get_adm_level_and_area_name(selected, placeholders)
            timescale = selected['timescale']
            time_period = widgets_handler.get_period_of_time(btn_name, selected, placeholders)
                
            print(f"SPEI subset data uploaded for {selected_area}, administrative level {adm_level}, timescale {timescale}, period {time_period}")
            # Zoom in closer for subareas than for whole countries
            zoom_start = 4
            if adm_level in ('ADM1', 'ADM2'):
                zoom_start = 8
            map_display = coordinates_retriver.display_map(bounding_box, zoom_start)
            map_iframe = coordinates_retriver.display_map_in_iframe(map_display)
            display(map_iframe)

            
# Set up widget interaction
def on_button_clicked(btn):
    update_and_get_data(btn.custom_name)


# Setup observers
setup_observers()

The cell below reloads and sets up the user interface widgets based on previously saved selections, so that the state is maintained across sessions or after a notebook refresh. It begins by loading the previously saved selections from the `selection.json` file.  
Then it restores the widget states: the values for the country, administrative subareas, timescale, and month widgets are restored from the saved selections. If no saved value exists for a particular widget, it defaults to the placeholder value.  
The `on_click` event of the 'Get data' button (`month_widgets_btn`) is configured to call the `on_button_clicked` function, which starts the data fetching and processing based on the current widget selections.  
Finally, all the widgets are displayed along with the output area: the dropdown selectors for country, subareas, and timescale, the month selector, and the button that starts data retrieval. The `output_area` is where messages, errors, or results (like maps or data summaries) are shown after you interact with the widgets.
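
Restoring a saved value with a fallback can be sketched with plain dictionaries. The helper `restore_value` is hypothetical; unlike the notebook cell, it also checks that the saved value is still among the available options, since a dropdown only accepts values from its option list.

```python
def restore_value(saved, key, options, placeholder):
    """Return the saved value for key if it is a valid option, else the placeholder."""
    value = saved.get(key, placeholder)
    return value if value in options else placeholder

# Made-up saved state: a valid country, an invalid month, no timescale at all
saved_selection = {"country": "Spain", "month": "Thirteenth"}
country_options = ["no country selected...", "Italy", "Spain"]
month_options = ["no month selected...", "January", "February"]
timescale_options = ["no timescale selected...", "1 month"]

country = restore_value(saved_selection, "country", country_options, "no country selected...")
month = restore_value(saved_selection, "month", month_options, "no month selected...")
timescale = restore_value(saved_selection, "timescale", timescale_options, "no timescale selected...")
```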


Regarding the choice of the area, please take into account that the larger the area, the more computational power and time it will take to retrieve the data. So, if your device is not powerful, choose smaller areas, such as second-level subareas.

If you click the 'Get data' button before choosing the necessary options from the dropdown menu, a message will be displayed under the widgets' block explaining what you missed. 
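
The kind of check behind that message can be sketched as follows. `missing_selections` is a hypothetical helper; the real `widgets_handler.validate_selections` takes different arguments and may apply other rules.

```python
def missing_selections(selected, placeholders, required):
    """Return the required keys that are unset or still hold their placeholder."""
    return [
        key for key in required
        if selected.get(key) in (None, placeholders.get(key))
    ]

# Made-up state: only the country has been chosen
demo_placeholders = {
    "country": "no country selected...",
    "timescale": "no timescale selected...",
    "month": "no month selected...",
}
demo_selected = {"country": "Italy", "timescale": "no timescale selected...", "month": None}

missing = missing_selections(demo_selected, demo_placeholders, ["country", "timescale", "month"])
if missing:
    message = "Please select: " + ", ".join(missing)
else:
    message = "All selections made."
print(message)
```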

If all the selections are made, you will see three outputs:
1. A message confirming that the coordinates for the selected area were retrieved ('Coordinates retrieved for...').
2. A message confirming that the data retrieval was successful ('SPEI subset data uploaded for...').
3. A map of the selected area, which can help you check that the area is the one you are interested in.
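
Deriving a bounding box from boundary coordinates can be sketched as below. The helper is hypothetical and assumes a flat list of `(longitude, latitude)` pairs; the actual `coordinates_retriver.calculate_bounding_box` may accept nested polygon structures. The output order follows the `(min_lon, min_lat, max_lon, max_lat)` convention stated in the docstring of `update_and_get_data`.

```python
def bounding_box_sketch(coordinates):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs."""
    if not coordinates:
        raise ValueError("at least one coordinate pair is required")
    lons = [lon for lon, _ in coordinates]
    lats = [lat for _, lat in coordinates]
    return (min(lons), min(lats), max(lons), max(lats))

# Made-up polygon vertices
polygon = [(12.2, 41.6), (12.9, 41.7), (12.7, 42.1), (12.3, 42.0)]
box = bounding_box_sketch(polygon)
```

The box is the smallest rectangle aligned with the latitude and longitude axes that contains every vertex, so it can include cells outside an irregularly shaped area.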

In [None]:
# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
timescale_selector.value = previous_selection.get('timescale', placeholders['timescale'])
month_selector.value = previous_selection.get('month', placeholders['month'])
month_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, timescale_selector, month_selector, month_widgets_btn, output_area)

The data you selected is now available as `subset_data[index]`, where `index` is the name of the SPEI index you have chosen.

Using the `data_preprocess.display_data_details` function, you can examine your data to check the following:
- If the values chosen from the dropdown menu are correct.
- The number of time, latitude, and longitude values present.
- A sample of the first SPEI values.

In [None]:
data_preprocess.display_data_details(active_btn, selected, subset_data[index])

In [None]:
print(index)

In [None]:
processed_subset, change_summary = data_preprocess.process_datarray(subset_data[index])
print(processed_subset, '\n')
print('Change summary:')
for key, val in change_summary.items():
    print(key, val)

In [None]:
# Convert datetime objects to strings and extract the year for the slider labels
maps_year_labels = {i: str(processed_subset.time.values[i].astype('datetime64[Y]')) for i in range(len(processed_subset.time))}

maps_year_slider = widgets.SelectionSlider(
    options=[(maps_year_labels[i], i) for i in range(len(maps_year_labels))],
    value=0,
    description='Year:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True
)

# Use a lambda to pass both ds (processed_subset) and time_index to the function
maps_year_slider_plot = widgets.interactive(lambda time_index: charts.plot_spei_geographical_distribution(processed_subset, time_index), time_index=maps_year_slider)
display(maps_year_slider_plot)

In [None]:
stat_values = data_preprocess.compute_stats(processed_subset)

In [None]:
charts.create_scatterplot(stat_values, timescales, selected, placeholders)

In [None]:
charts.create_boxplot(stat_values, timescales, selected, placeholders)

In [None]:
charts.create_std_dev_bar_chart(stat_values, timescales, selected, placeholders)

In [None]:
# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
timescale_selector.value = previous_selection.get('timescale', placeholders['timescale'])
year_selector.value = previous_selection.get('year', placeholders['year'])
year_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, timescale_selector, year_selector, year_widgets_btn, output_area)

In [None]:
data_preprocess.display_data_details(active_btn, selected, subset_data[index])

In [None]:
processed_subset, change_summary = data_preprocess.process_datarray(subset_data[index])
print(processed_subset, '\n')
print('Change summary:')
for key, val in change_summary.items():
    print(key, val)

In [None]:
stat_values = data_preprocess.compute_stats(processed_subset, full_stats=False)

In [None]:
charts.create_linechart(stat_values, timescales, selected, placeholders)

In [None]:
# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
timescale_selector.value = previous_selection.get('timescale', placeholders['timescale'])
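# Restore the year range only if one was saved; the slider value must be a tuple, not None
saved_year_range = previous_selection.get('year_range')
if saved_year_range:
    year_range_selector.value = tuple(saved_year_range)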
year_range_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, timescale_selector, year_range_selector, year_range_widgets_btn, output_area)

In [None]:
data_preprocess.display_data_details(active_btn, selected, subset_data[index])

In [None]:
processed_subset, change_summary = data_preprocess.process_datarray(subset_data[index])
print(processed_subset, '\n')
print('Change summary:')
for key, val in change_summary.items():
    print(key, val)

In [None]:
stat_values = data_preprocess.compute_stats(processed_subset, full_stats=False)

In [None]:
charts.create_stripechart(stat_values, timescales, selected, placeholders)

In [None]:
charts.create_stripechart(stat_values, timescales, selected, placeholders, 'year')

The list of countries, subareas, and their boundaries is obtained from the [geoBoundaries Global Database of Political Administrative Boundaries](https://www.geoboundaries.org/).