## Cal-CRAI metric: SPEI
This notebook generates the text metadata files for the drought exposure metric `% change in probability that a water year is classified as having Moderate, Severe, or Extreme drought conditions via SPEI`, using Cal-Adapt: Analytics Engine (AE) data. Because the AE data comprise more than 200 GB, metrics were calculated on a cluster in a high-performance computing environment (i.e., a pcluster). Please see the processing script `climate_ae_spei.py` for the full methodology.
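
The drought categories named in this metric are conventionally assigned from SPEI thresholds carried over from the SPI: moderate below -1.0, severe below -1.5 and extreme below -2.0. The sketch below assumes those conventional cutoffs. It treats each bound as strict so that "moderate or worse" matches the `SPEI < -1` dry-month criterion used in the processing steps. `classify_spei` is a hypothetical helper and is not part of the Cal-CRAI scripts or xclim.

```python
def classify_spei(value: float) -> str:
    """Assign a drought category to one SPEI value.

    Assumed conventional thresholds; each bound is strict so that
    "moderate or worse" coincides with SPEI < -1.
    """
    if value < -2.0:
        return "extreme"
    if value < -1.5:
        return "severe"
    if value < -1.0:
        return "moderate"
    return "no drought"


for v in (-0.4, -1.2, -1.8, -2.5):
    print(v, classify_spei(v))
```

Exact boundary handling (e.g. whether -1.5 itself is moderate or severe) varies between sources, so treat the comparison operators as a choice rather than a standard.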

**SPEI** will be added as an available data metric to climakitae as a part of this development. 

**References**: 
1. S. M. Vicente-Serrano, S. Beguería, and J. I. López-Moreno, “A Multiscalar Drought Index Sensitive to Global Warming: The Standardized Precipitation Evapotranspiration Index,” Journal of Climate, vol. 23, no. 7, pp. 1696–1718, Apr. 2010, doi: 10.1175/2009JCLI2909.1.
2. G. H. Hargreaves and Z. A. Samani, “Reference Crop Evapotranspiration from Temperature,” Applied Engineering in Agriculture, vol. 1, no. 2, pp. 96–99, 1985, doi: 10.13031/2013.26773.
3. xclim documentation, `xclim.indices.potential_evapotranspiration`: https://xclim.readthedocs.io/en/stable/indices.html#xclim.indices.potential_evapotranspiration
4. xclim documentation, `xclim.indices.standardized_precipitation_evapotranspiration_index`: https://xclim.readthedocs.io/en/stable/indices.html#xclim.indices.standardized_precipitation_evapotranspiration_index

Variables:
1. Daily Water Budget, which is the difference between:
    - Daily precipitation and
    - Daily potential evapotranspiration, derived from some combination of the following variables, depending on the method:
        - Daily Min Temperature
        - Daily Max Temperature
        - Daily Mean Temperature
        - Relative Humidity
        - Surface Downwelling Shortwave Radiation
        - Surface Upwelling Shortwave Radiation
        - Surface Downwelling Longwave Radiation
        - Surface Upwelling Longwave Radiation
        - 10m Wind Speed

    *We use the Hargreaves and Samani (1985) method, so only daily minimum and maximum temperatures are needed.*
2. Calibration Daily Water Budget
    - Can be computed from the Daily Water Budget over a given "calibration" time period
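
The two variables above can be sketched in NumPy. Hargreaves and Samani (1985) estimate potential evapotranspiration from daily minimum and maximum temperature plus extraterrestrial radiation, which depends only on latitude and day of year. The radiation and PET formulas here follow the FAO-56 formulation.

The calibration step uses a swapped-in, simplified technique: a non-parametric standardization that maps the empirical CDF to standard-normal scores. SPEI implementations such as xclim's use a parametric distribution fit instead, and they also accumulate and resample the water budget first; this sketch does neither. All function names are hypothetical, and this is not the `climate_ae_spei.py` implementation.

```python
import numpy as np
from statistics import NormalDist

GSC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56)


def extraterrestrial_radiation_mm(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (FAO-56), expressed as mm/day of evaporation."""
    phi = np.radians(lat_deg)
    doy = np.asarray(doy, dtype=float)
    dr = 1 + 0.033 * np.cos(2 * np.pi * doy / 365)        # inverse relative Earth-Sun distance
    delta = 0.409 * np.sin(2 * np.pi * doy / 365 - 1.39)  # solar declination (rad)
    # Sunset hour angle; clipping keeps arccos defined at polar latitudes
    ws = np.arccos(np.clip(-np.tan(phi) * np.tan(delta), -1.0, 1.0))
    ra_mj = (24 * 60 / np.pi) * GSC * dr * (
        ws * np.sin(phi) * np.sin(delta) + np.cos(phi) * np.cos(delta) * np.sin(ws)
    )
    return 0.408 * ra_mj  # MJ m-2 day-1 -> mm/day equivalent


def hargreaves_pet(tmin_c, tmax_c, lat_deg, doy):
    """Hargreaves and Samani (1985) potential evapotranspiration in mm/day (temperatures in deg C)."""
    tmin_c = np.asarray(tmin_c, dtype=float)
    tmax_c = np.asarray(tmax_c, dtype=float)
    tmean = (tmin_c + tmax_c) / 2
    trange = np.maximum(tmax_c - tmin_c, 0.0)  # guard against tmin > tmax
    ra = extraterrestrial_radiation_mm(lat_deg, doy)
    return 0.0023 * ra * (tmean + 17.8) * np.sqrt(trange)


def daily_water_budget(pr_mm, tmin_c, tmax_c, lat_deg, doy):
    """Variable 1: daily precipitation minus daily potential evapotranspiration (mm/day)."""
    return np.asarray(pr_mm, dtype=float) - hargreaves_pet(tmin_c, tmax_c, lat_deg, doy)


def standardize_to_calibration(values, calib_values):
    """Map water-budget values to standard-normal scores using the empirical
    distribution of a calibration-period water budget (variable 2).

    Simplified non-parametric stand-in for the parametric fit used by SPEI.
    """
    calib_sorted = np.sort(np.asarray(calib_values, dtype=float))
    n = calib_sorted.size
    # Number of calibration values <= each input value
    ranks = np.searchsorted(calib_sorted, np.asarray(values, dtype=float), side="right")
    prob = (ranks + 0.5) / (n + 1)  # plotting position kept strictly inside (0, 1)
    return np.vectorize(NormalDist().inv_cdf)(prob)


# Example: a midsummer day at 38 degrees N with Tmin = 15 C, Tmax = 30 C and 10 mm of rain
pet = hargreaves_pet(15.0, 30.0, lat_deg=38.0, doy=180)
budget = daily_water_budget(10.0, 15.0, 30.0, lat_deg=38.0, doy=180)
print(float(pet), float(budget))
```

A strongly negative score from the standardization step means the water budget is drier than almost all of the calibration period. That is the condition the SPEI < -1 criterion in the processing steps looks for.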
    
### Step 1: Generate metadata

In [1]:
import pandas as pd
import os
import sys

sys.path.append(os.path.expanduser('../../'))
from scripts.utils.file_helpers import upload_csv_aws, pull_csv_from_directory
from scripts.utils.write_metadata import append_metadata

In [2]:
bucket_name = 'ca-climate-index'
aws_dir = '3_fair_data/index_data/climate_drought_spei_metric.csv'
folder = 'csv_folder'

pull_csv_from_directory(bucket_name, aws_dir, folder, search_zipped=False)

Saved DataFrame as 'csv_folder\climate_drought_spei_metric.csv'


In [3]:
df_in = pd.read_csv(r'csv_folder/climate_drought_spei_metric.csv') # path written by pull_csv_from_directory in the previous cell
df_in # check

| Unnamed: 0 | GEOID | change_in_drought_years | change_in_drought_years_min | change_in_drought_years_max | change_in_drought_years_min_max_standardized |
|---|---|---|---|---|---|
| 0 | 6001401700 | 4.25 | 1 | 14.25 | 0.245283 |
| 1 | 6001401800 | 4.25 | 1 | 14.25 | 0.245283 |
| 2 | 6001402200 | 4.25 | 1 | 14.25 | 0.245283 |
| 3 | 6001402500 | 4.25 | 1 | 14.25 | 0.245283 |
| 4 | 6001402600 | 4.25 | 1 | 14.25 | 0.245283 |
| ... | ... | ... | ... | ... | ... |
| 9124 | 6111008900 | 9.00 | 1 | 14.25 | 0.603774 |
| 9125 | 6111009100 | 9.00 | 1 | 14.25 | 0.603774 |
| 9126 | 6111009200 | 9.00 | 1 | 14.25 | 0.603774 |
| 9127 | 6111009600 | 8.50 | 1 | 14.25 | 0.566038 |
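
The `change_in_drought_years_min_max_standardized` column in the preview is consistent with plain min-max scaling, (x - min) / (max - min), using the `_min` and `_max` columns. For example, (4.25 - 1) / (14.25 - 1) ≈ 0.245283. The sketch reproduces three preview rows. `min_max_scale` is a hypothetical stand-in and not the project's `min_max_standardize` function, whose implementation is not shown in this notebook.

```python
import pandas as pd


def min_max_scale(values, vmin, vmax):
    """Hypothetical min-max scaling: maps vmin -> 0 and vmax -> 1."""
    return (values - vmin) / (vmax - vmin)


# Three rows copied from the preview of df_in
rows = pd.DataFrame({
    "change_in_drought_years": [4.25, 9.00, 8.50],
    "change_in_drought_years_min": [1, 1, 1],
    "change_in_drought_years_max": [14.25, 14.25, 14.25],
})
rows["standardized"] = min_max_scale(
    rows["change_in_drought_years"],
    rows["change_in_drought_years_min"],
    rows["change_in_drought_years_max"],
)
print(rows["standardized"].round(6).tolist())  # [0.245283, 0.603774, 0.566038]
```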


In [4]:
# Move the metric column to the end of the DataFrame
column_to_move = 'change_in_drought_years'
columns = [col for col in df_in.columns if col != column_to_move]  # Keep all other columns
columns.append(column_to_move)  # Add the column to move to the end

# Reassign the DataFrame with the new column order
df_in = df_in[columns]

In [5]:
df_in.to_csv('climate_drought_spei_metric.csv', index=False)
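
`index=False` keeps pandas from writing the DataFrame index as an extra unnamed column. The `Unnamed: 0` column in the preview is likely an index that was written this way in an earlier step and then read back by `pd.read_csv`. Here is a minimal illustration using a one-row frame built from the first preview row:

```python
import io

import pandas as pd

df = pd.DataFrame({"GEOID": [6001401700], "change_in_drought_years": [4.25]})

# Writing with the default index=True adds a leading, unnamed index column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# Writing with index=False round-trips only the data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'GEOID', 'change_in_drought_years']
print(list(without_index.columns))  # ['GEOID', 'change_in_drought_years']
```

`index=False` does not remove an `Unnamed: 0` column that is already present in `df_in`. Passing `index_col=0` to `pd.read_csv`, or dropping the column explicitly, would.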

In [6]:
@append_metadata
def drought_spei_process(df, export=False, export_filename=None, varname=''):
    '''
    Reduces the size of the initial daily raw data in order to streamline compute time.
    Transforms the raw data into the following baseline metrics:
    * change in probability that a water year is classified as having Moderate, Severe,
    or Extreme drought conditions via Standardized Precipitation Evapotranspiration Index (SPEI)
    
    Methods
    -------
    Metric is calculated with the xclim.indices.standardized_precipitation_evapotranspiration_index.
    
    Parameters
    ----------
    df: string
        Path to the local CSV file containing the metric data.
    export: bool
        False = will not upload the resulting CSV containing the Cal-CRAI drought metric to AWS
        True = will upload the resulting CSV containing the Cal-CRAI drought metric to AWS
    export_filename: string
        name of csv file to be uploaded to AWS
    varname: string
        Final metric name, for metadata generation
        
    Script
    ------
    Metric calculation: climate_ae_spei.py via pcluster run
    Metadata generation: climate_ae_spei_metadata.ipynb
    
    Note
    ----
    Because the climate projections data is on the order of 2.4 TB in size, intermediary
    processed files are not produced for each stage of the metric calculation. All processing
    occurs in a single complete run of the metric calculation script listed above.
    '''
        
    # historical baseline
    print("Data transformation: historical baseline data retrieved for 1981-2010 for max & min air temperature and precipitation.")
    print("Data transformation: dynamically-downscaled climate data subsetted for a-priori bias-corrected models.")
    print("Data transformation: drop all singleton dimensions (scenario).")

    # calculate chronic with 2°C WL
    print('Data transformation: raw projections data retrieved for the 2.0°C warming level by manually subsetting based on the GWL of each parent GCM and calculating a 30-year average.')
    print("Data transformation: dynamically-downscaled climate data subsetted for a-priori bias-corrected models.")
    print("Data transformation: drop all singleton dimensions (scenario).")
    
    # calculate delta signal
    print("Data transformation: water budget calculated as input for SPEI.")
    print("Data transformation: SPEI calculated, and the number of water years with 6+ months of SPEI < -1 (6+ dry months) counted.")
    print("Data transformation: delta signal calculated by taking the difference between chronic (2.0°C) and historical baseline.")

    # reprojection to census tracts
    print("Data transformation: data transformed from xarray dataset into pandas dataframe.")
    print("Data transformation: data reprojected from Lambert Conformal Conic CRS to CRS 3857.")
    print("Data transformation: data infilled for coastal census tracts using the average of the nearest valid census tract via sjoin_nearest.")  # TODO: confirm infilling method

        
    # min-max standardization
    print("Data transformation: data min-max standardized with min_max_standardize function.")
    
    if export:
        bucket_name = 'ca-climate-index'
        directory = '3_fair_data/index_data'
        upload_csv_aws([df], bucket_name, directory)
        print(f'{df} uploaded to AWS.')
    else:
        print(f'{df} not uploaded to AWS (export=False).')

    # Remove the local copy of the CSV once processing is complete
    if os.path.exists(df):
        os.remove(df)
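
The SPEI counting and delta-signal steps printed in `drought_spei_process` can be sketched with pandas. Several assumptions apply:

- Water years run October through September and are labeled by the calendar year in which they end.
- SPEI is already available as a monthly series.
- The toy SPEI values are synthetic and chosen only to exercise the logic.
- `count_drought_water_years` is a hypothetical helper, not the implementation in `climate_ae_spei.py`.

```python
import numpy as np
import pandas as pd


def count_drought_water_years(spei_monthly: pd.Series, threshold: float = -1.0,
                              min_dry_months: int = 6) -> int:
    """Count water years with at least `min_dry_months` months of SPEI below `threshold`."""
    idx = spei_monthly.index
    # Assumed convention: October-December belong to the next water year (Oct 2000 -> WY 2001)
    water_year = np.where(idx.month >= 10, idx.year + 1, idx.year)
    dry_months = (spei_monthly < threshold).groupby(water_year).sum()
    return int((dry_months >= min_dry_months).sum())


# Two synthetic water years (WY 2001 and WY 2002) of monthly SPEI values
months = pd.date_range("2000-10-01", periods=24, freq="MS")

historical = np.zeros(24)
historical[:3] = -1.5          # WY 2001: 3 dry months -> not a drought year

chronic = np.zeros(24)
chronic[:7] = -1.5             # WY 2001: 7 dry months -> drought year
chronic[12:18] = -1.2          # WY 2002: 6 dry months -> drought year

n_hist = count_drought_water_years(pd.Series(historical, index=months))
n_chronic = count_drought_water_years(pd.Series(chronic, index=months))

# Delta signal: chronic (2.0°C warming level) minus historical baseline
delta = n_chronic - n_hist
print(n_hist, n_chronic, delta)  # 0 2 2
```

In the real workflow, each count would come from a 30-year window per grid cell rather than from two toy years.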

In [8]:
varname = 'climate_caladapt_drought_probability'
filename = 'climate_drought_spei_metric.csv'
drought_spei_process(filename, export=True, export_filename=None, varname=varname)