<div style="background-color:#FCE205; padding:10px; border-radius:5px; color:black; font-weight:bold;">
    <h2>Collecting US Drought Data</h2>
</div>


The US drought data was collected from the National Integrated Drought Information System (NIDIS) through https://www.drought.gov/historical-information?dataset=0&selectedDateUSDM=20250408. 

The U.S. Drought Monitor (USDM) is a map that is updated each Thursday to show the location and intensity of drought across the USA. It uses a five-category system: Abnormally Dry (D0), which is a precursor to drought rather than drought itself, followed by Moderate (D1), Severe (D2), Extreme (D3), and Exceptional (D4) Drought. The categories reflect experts’ assessments of conditions related to dryness and drought, including observations of how much water is available in streams, lakes, and soils compared to usual for the same time of year.

The bee dataset is quarterly, whereas the drought index is published weekly, so the weekly drought values have to be aggregated to quarters before the two can be combined.

In [1]:
# import libraries
import pandas as pd
import os

In [2]:
# path to the directory that holds the raw imported data
ITM_DIR = os.path.join(os.getcwd(), '../data/import')

In [3]:
# read in the data
drought = pd.read_csv(os.path.join(ITM_DIR, 'Drought index.csv'))

## Drought index

- D0 - Total percent land area affected by **Abnormally Dry** conditions per week.
- D1 - Total percent land area affected by **Moderate Drought** conditions per week.
- D2 - Total percent land area affected by **Severe Drought** conditions per week.
- D3 - Total percent land area affected by **Extreme Drought** conditions per week.
- D4 - Total percent land area affected by **Exceptional Drought** conditions per week.

In [5]:
# read in the data
drought = pd.read_csv(os.path.join(ITM_DIR, 'Drought index.csv'))

# the data appears to be missing a lot of Pennsylvania data,
# so it was downloaded from the USGS website separately
# and added to the drought data

penn = pd.read_csv(os.path.join(ITM_DIR, 'USDM-Pennsylvania.csv'))

# rename drought columns from penn to match drought data
penn_rename = {'D0': 'D0 (total percent land area)',
               'D1': 'D1 (total percent land area)',
               'D2': 'D2 (total percent land area)',
               'D3': 'D3 (total percent land area)',
               'D4': 'D4 (total percent land area)'}
penn.rename(columns=penn_rename, inplace=True)

# vertically concatenate the two dataframes;
# ignore_index=True gives the result a clean, non-repeating row index
drought = pd.concat([drought, penn], axis=0, ignore_index=True)
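
Because the main file is described as missing a lot of Pennsylvania data rather than all of it, stacking the two files could in principle leave some state-weeks in twice, which would bias the quarterly mean. The following is a minimal sketch of how that could be checked, using toy frames. The key columns `StateAbbreviation` and `ValidStart` are assumed to exist in both files; the real Pennsylvania download may name them differently.

```python
import pandas as pd

# Toy stand-ins for the two files; values and column layout are assumptions.
main = pd.DataFrame({
    "StateAbbreviation": ["PA", "OH"],
    "ValidStart": ["2024-01-02", "2024-01-02"],
    "D0 (total percent land area)": [10.0, 5.0],
})
penn = pd.DataFrame({
    "StateAbbreviation": ["PA", "PA"],
    "ValidStart": ["2024-01-02", "2024-01-09"],
    "D0 (total percent land area)": [10.0, 12.0],
})

combined = pd.concat([main, penn], axis=0, ignore_index=True)

# Count state-weeks that appear more than once after stacking.
key = ["StateAbbreviation", "ValidStart"]
n_dupes = int(combined.duplicated(subset=key).sum())

# Keep only the first occurrence of each state-week.
deduped = combined.drop_duplicates(subset=key, keep="first").reset_index(drop=True)

print(n_dupes, len(deduped))
```

If the count is zero on the real data, no deduplication is needed and the concatenation can be used as is.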

In [8]:
# Create a mapping of state abbreviations to full state names
state_abbreviation_to_name = {
    "AL": "Alabama", "AK": "Alaska", "AZ": "Arizona", "AR": "Arkansas", "CA": "California",
    "CO": "Colorado", "CT": "Connecticut", "DE": "Delaware", "FL": "Florida", "GA": "Georgia",
    "HI": "Hawaii", "ID": "Idaho", "IL": "Illinois", "IN": "Indiana", "IA": "Iowa",
    "KS": "Kansas", "KY": "Kentucky", "LA": "Louisiana", "ME": "Maine", "MD": "Maryland",
    "MA": "Massachusetts", "MI": "Michigan", "MN": "Minnesota", "MS": "Mississippi", "MO": "Missouri",
    "MT": "Montana", "NE": "Nebraska", "NV": "Nevada", "NH": "New Hampshire", "NJ": "New Jersey",
    "NM": "New Mexico", "NY": "New York", "NC": "North Carolina", "ND": "North Dakota", "OH": "Ohio",
    "OK": "Oklahoma", "OR": "Oregon", "PA": "Pennsylvania", "RI": "Rhode Island", "SC": "South Carolina",
    "SD": "South Dakota", "TN": "Tennessee", "TX": "Texas", "UT": "Utah", "VT": "Vermont",
    "VA": "Virginia", "WA": "Washington", "WV": "West Virginia", "WI": "Wisconsin", "WY": "Wyoming"
}

# map the state abbreviations to full names
drought['StateName'] = drought['StateAbbreviation'].map(state_abbreviation_to_name)

# Drop the 'StateAbbreviation' column
drought.drop(columns=['StateAbbreviation'], inplace=True)
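
`Series.map` returns `NaN` for any abbreviation that is not a key in the dictionary, and `groupby` drops rows with a `NaN` key by default, so unmatched rows would silently vanish from the quarterly summary. A small sketch of a check that could be run before dropping the abbreviation column is shown below. The toy frame and the shortened mapping are hypothetical; whether the real data contains abbreviations such as `DC`, or rows with no abbreviation at all, is not known here.

```python
import pandas as pd

# Shortened, hypothetical mapping for illustration only.
abbr_to_name = {"PA": "Pennsylvania", "OH": "Ohio"}

# Toy data: one abbreviation outside the mapping and one missing value.
toy = pd.DataFrame({"StateAbbreviation": ["PA", "OH", "DC", None, "PA"]})
toy["StateName"] = toy["StateAbbreviation"].map(abbr_to_name)

# Rows that ended up without a state name.
missing_name = toy["StateName"].isna()
n_missing = int(missing_name.sum())

# Abbreviations that were present but not found in the mapping.
unmapped = sorted(toy.loc[missing_name, "StateAbbreviation"].dropna().unique())

print(n_missing, unmapped)
```

A non-zero `n_missing` with an empty `unmapped` list would point to rows that never had an abbreviation, for example rows from a file with a different state column.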

<div style="background-color:#FCE205; padding:10px; border-radius:5px; color:black; font-weight:bold;">
    <h3>Aggregating Drought Data</h3>
</div>

For each state and quarter, the mean and the maximum percent land area affected are calculated for every drought category.

#### `_mean`: Mean % area per quarter
Shows how widespread a drought level was on average during the quarter.

#### `_max`: Max % area per quarter
Captures the peak extent of a drought level during the quarter.
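
To make the difference between the two summaries concrete, here is a toy sketch with made-up weekly values for a single state. The short column name `D1` and the numbers are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Made-up weekly readings for one hypothetical state.
weekly = pd.DataFrame({
    "StateName": ["Ohio"] * 4,
    "ValidStart": pd.to_datetime(
        ["2024-01-02", "2024-02-06", "2024-03-05", "2024-04-02"]
    ),
    "D1": [0.0, 0.0, 60.0, 20.0],
})
weekly["year"] = weekly["ValidStart"].dt.year
weekly["quarter"] = weekly["ValidStart"].dt.quarter

# One row per state-quarter with both summaries side by side.
summary = (
    weekly.groupby(["StateName", "year", "quarter"])["D1"]
    .agg(D1_mean="mean", D1_max="max")
    .reset_index()
)
print(summary)
```

In the first quarter the mean is 20 while the max is 60: a short but widespread drought episode barely moves the mean yet is fully visible in the max, which is why both are kept.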

In [9]:
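# Parse the week start date and derive year and quarter from it.
# Each week is assigned to the quarter in which it starts, so a week that
# straddles a quarter boundary counts toward the earlier quarter.
drought['ValidStart'] = pd.to_datetime(drought['ValidStart'])
drought['year'] = drought['ValidStart'].dt.year
drought['quarter'] = drought['ValidStart'].dt.quarter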

# Drought severity columns
drought_cols = [f'D{i} (total percent land area)' for i in range(5)]

# 1. Mean
df_mean = drought.groupby(['StateName', 'year', 'quarter'])[drought_cols].mean()
df_mean.columns = [col + '_mean' for col in df_mean.columns]

# 2. Max
df_max = drought.groupby(['StateName', 'year', 'quarter'])[drought_cols].max()
df_max.columns = [col + '_max' for col in df_max.columns]

# Merge summaries
drought_quarterly = pd.concat([df_mean, df_max], axis=1).reset_index()
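
Since some states were noted as having gaps, it may also help to count how many weekly observations went into each state-quarter, so that a mean or max based on only a few weeks can be spotted. This is a sketch on toy data; the threshold of 3 is a hypothetical choice for the example (a complete quarter has roughly 13 weekly maps).

```python
import pandas as pd

# Toy data: three weekly rows for one state, a single row for another.
toy = pd.DataFrame({
    "StateName": ["Ohio"] * 3 + ["Pennsylvania"],
    "year": [2024] * 4,
    "quarter": [1] * 4,
})

# Number of weekly rows behind each state-quarter.
n_weeks = (
    toy.groupby(["StateName", "year", "quarter"])
    .size()
    .reset_index(name="n_weeks")
)

# Hypothetical threshold for flagging thinly covered quarters.
sparse = n_weeks[n_weeks["n_weeks"] < 3]
print(sparse)
```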

In [13]:
# change column names to be more readable
drought_quarterly.columns = ['state', 'year', 'quarter'] + [col.replace(' (total percent land area)', '') for col in drought_quarterly.columns[3:]]

In [15]:
# Save the quarterly summary to a CSV file
OUT_DIR = os.path.join(os.getcwd(), '../data/intermediate')

# create the output directory if it does not exist yet
os.makedirs(OUT_DIR, exist_ok=True)

drought_quarterly.to_csv(os.path.join(OUT_DIR, 'drought_quarterly.csv'), index=False)