# Pre-processing: Flood Events

**Tasks**
* Extract all major flood_dates for every country in the provided food crisis dataset.
* Filter food crisis data based on the months before/after flooding events identified.
* Run time: < 1 minute

**WARNING**: There are problems with the Afghanistan data in the original food crisis dataset (i.e. the admin2 codes
are actually admin1). This code does not include error handling for that FEWS Country.


In [1]:
import pandas as pd
import os
import timeit

os.getcwd()
start = timeit.default_timer()


**1. Read Data**
* In CSV format. Shapefiles formats are not recommended due to computational costs.

In [2]:
# Read Food Security data
FS = pd.read_csv('Data/FS_data.csv')


In [3]:
# Read Flood Event data
FE = pd.read_csv('Data/global_flooding_events.csv')


**2. Clean Flooding Event Data**
* Needs to be within time frame that is compatible with both the food crisis data (2010-2019) and SARS data
(2015-Present)
* Data will be cleaned to only include countries from provided food crises dataset (in this case - Burkina Faso,
Chad, Mali, Niger)

In [4]:
# CLEAN FE DATA
# Remove FE dates before 2010
FE.drop(FE.index[FE['year'] == 2000], inplace=True)
FE.drop(FE.index[FE['year'] == 2001], inplace=True)
FE.drop(FE.index[FE['year'] == 2002], inplace=True)
FE.drop(FE.index[FE['year'] == 2003], inplace=True)
FE.drop(FE.index[FE['year'] == 2004], inplace=True)
FE.drop(FE.index[FE['year'] == 2005], inplace=True)
FE.drop(FE.index[FE['year'] == 2006], inplace=True)
FE.drop(FE.index[FE['year'] == 2007], inplace=True)
FE.drop(FE.index[FE['year'] == 2008], inplace=True)
FE.drop(FE.index[FE['year'] == 2009], inplace=True)
FE.drop(FE.index[FE['year'] == 2010], inplace=True)
FE.drop(FE.index[FE['year'] == 2011], inplace=True)
FE.drop(FE.index[FE['year'] == 2012], inplace=True)
FE.drop(FE.index[FE['year'] == 2013], inplace=True)
FE.drop(FE.index[FE['year'] == 2014], inplace=True)

# Remove duplicates
FE = FE.drop_duplicates(subset=['country', 'year', 'month'])

**3. Create Timestamps**
* For both food crises data and flooding event data

In [5]:
# Create Timestamps (MONTHLY)
# FS
FS['date'] = FS['year'].astype(str) + '-' + FS['month'].astype(str)
FS['datetime'] = pd.to_datetime(FS['date'])
FS = FS.drop(['date'], axis=1)
#FE
FE['date'] = FE['year'].astype(str) + '-' + FE['month'].astype(str) + '-' +FE['day'].astype(str)
FE['datetime'] = pd.to_datetime(FE['date'])
FE = FE.drop(['date'], axis=1)
FE['datetime_round'] = pd.to_datetime(FE['datetime']).dt.to_period('M').dt.to_timestamp()
FE = FE.reset_index()
FE = FE.drop(['index'], axis=1)


**4. Filter Flooding Event Data**
* To only include flooding events for relevant countries

In [6]:
# Extract Flood events for relevant countries (from FS dataset)
country_list = set(FS['country'].tolist()) # Extract list of FEWS NET countries
FE2 = pd.DataFrame(columns=list(FE.columns))
for i in FE.index:
    for country in country_list:
        if FE['country'][i] == country:
            FE2 = FE2.append(FE.loc[i])
FE2 = FE2.sort_values(by=['country', 'year', 'month', 'day'])
FE2 = FE2.drop(['area', 'exposed'], axis=1)
country_list


{'Burkina Faso', 'Chad', 'Mali', 'Niger'}

**5. Filter Food Crisis Data**
 * To only include data from before/after each identified flooding event

In [7]:
# Filter FS data to only include data before/after/during each event
FS2 = pd.DataFrame(columns=list(FS.columns))
FS2['state'] = ''
FS2['flood_date'] = ''
for event in FE2.index:
    for i in FS.index:
        if FS['country'][i] == FE2['country'][event]:
            FS_date = FS['datetime'][i]
            FE_date = FE2['datetime_round'][event]
            before = FE_date - pd.DateOffset(months=1)
            after = FE_date + pd.DateOffset(months=1)
            if FS_date == before:
                FS2 = FS2.append(FS.loc[i])
                FS2.loc[i,['state']] = 'Before'
                FS2.loc[i, ['flood_date']] = FE2['datetime'][event]
            elif FS_date == after:
                FS2 = FS2.append(FS.loc[i])
                FS2.loc[i,['state']] = 'After'
                FS2.loc[i, ['flood_date']] = FE2['datetime'][event]
            else:
                continue


**6. Clean Up New Data**


In [8]:
# Clean up Food Crisis Data Before/After
FS2 = FS2.reset_index()
FS2 = FS2.drop(['index'], axis=1)
FS2['datetime'] = FS2['datetime'].astype(str)
FS2['flood_date'] = FS2['flood_date'].astype(str)
FS2 = FS2[['country', 'admin_name', 'state', 'datetime', 'flood_date', 'fews_ipc', 'ndvi_mean', 'rain_mean',
           'et_mean', 'acled_count', 'acled_fatalities', 'p_staple_food', 'area', 'cropland_pct', 'pop',
           'ruggedness_mean', 'pasture_pct', 'spacelag', 'timelag1', 'timelag2', 'geometry']]

# Clean up Flood Event Data - For GEE
FE3 = FS2[['country', 'flood_date', 'datetime', 'state']]
FE3 = FE3.drop_duplicates()
FE3 = FE3.reset_index()
FE3 = FE3.drop(['index'], axis=1)


stop = timeit.default_timer()
print('Running Time: ', stop - start, 'seconds')

Running Time:  34.6736503 seconds


**7. Save Data**

In [9]:
# Save data
FS2.to_csv('Data/FS_before_after_data.csv', index = False)
FE3.to_csv('Data/FE_before_after_dates.csv', index = False)

