## Data Simplification

To simplify the dataset, this notebook condenses the 6-item drought vector at each grid pixel and point in time into a single value by taking a weighted sum of its components. The first component receives a weight of 0, and the remaining five (D0 through D4) receive weights that increase from 1 to 2. The result holds one value per date, latitude, and longitude: 1009 × 105 × 237 = 25,108,965 values, stored here as a flat array that can be reshaped to (1009, 105, 237).

In [1]:
import pandas as pd
import numpy as np
import datetime as dt

In [2]:
drought_array = pd.read_csv("./data/drought_reshaped_array.csv")

In [3]:
drought_array.drop(columns="Unnamed: 0", inplace=True)

In [4]:
drt = pd.read_csv("./data/drought_data_combined.csv")

drt.set_index("Date", inplace=True)

drt.index = pd.to_datetime(drt.index)

In [5]:
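# Latitudes from 50.0 down to 24.0 in 0.25-degree steps (105 values, north to south)
latitudes = [0.25 * i for i in range(96, 201)]
latitudes.reverse()

# Longitudes from -125.0 to -66.0 in 0.25-degree steps (237 values, west to east)
longitudes = [0.25 * i for i in range(-500, -263)]

# Unique observation dates in chronological order
dates = list(drt.index.unique().sort_values())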
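
The grid bounds implied by the cell above can be cross-checked with NumPy. This is a minimal sketch that assumes the 0.25-degree spacing used in the list comprehensions; `lat_check` and `lon_check` are illustrative names that do not appear elsewhere in the notebook.

```python
import numpy as np

# Rebuild the same axes as the list comprehensions in the cell above.
latitudes = [0.25 * i for i in range(96, 201)]
latitudes.reverse()
longitudes = [0.25 * i for i in range(-500, -263)]

# Hypothetical cross-check arrays (names are illustrative only).
lat_check = np.linspace(50.0, 24.0, 105)     # north to south
lon_check = np.linspace(-125.0, -66.0, 237)  # west to east

print(len(latitudes), latitudes[0], latitudes[-1])
print(len(longitudes), longitudes[0], longitudes[-1])
print(np.allclose(latitudes, lat_check), np.allclose(longitudes, lon_check))
```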

In [6]:
drought_4D = drought_array.values.reshape(len(dates), len(latitudes), len(longitudes), 6)

In [7]:
drought_4D.shape

(1009, 105, 237, 6)

In [8]:
# Rescale by 1/100; plain division also works if the array has an integer dtype
drought_4D = drought_4D / 100.0

In [9]:
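def simplify_array(drought_vector):
    """Collapse a 6-item drought vector into one value via a weighted sum."""
    # Weight 0 for the first component, then increasing weights for the other five
    multiplier = np.array([0, 1, 1.25, 1.5, 1.75, 2])
    return np.dot(multiplier, drought_vector)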
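
To make the weighted sum concrete, here is a small worked example. The input vector is made up for illustration and is not taken from the dataset; the function is repeated so the snippet runs on its own.

```python
import numpy as np

def simplify_array(drought_vector):
    # Same weights as in the notebook cell above.
    multiplier = np.array([0, 1, 1.25, 1.5, 1.75, 2])
    return sum(multiplier * drought_vector)

# Hypothetical pixel, already divided by 100 as in the notebook.
example_vector = np.array([0.2, 0.8, 0.5, 0.3, 0.1, 0.0])

# 0*0.2 + 1*0.8 + 1.25*0.5 + 1.5*0.3 + 1.75*0.1 + 2*0.0
example_value = simplify_array(example_vector)
print(example_value)  # approximately 2.05

# Largest possible value when every component equals 1.0.
max_value = simplify_array(np.ones(6))
print(max_value)
```

If every component lies between 0 and 1 after the division by 100, the simplified value lies between 0 and 7.5, the sum of the weights.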

In [10]:
# Build the flat result one pixel at a time, in date -> latitude -> longitude order
drought_simplified = np.array([])
for i in range(len(dates)):
    date_data = np.array([])
    for j in range(len(latitudes)):
        lat_row = np.array([])
        for k in range(len(longitudes)):
            data_point = simplify_array(drought_4D[i, j, k])
            lat_row = np.append(lat_row, data_point)
        date_data = np.append(date_data, lat_row)
    drought_simplified = np.append(drought_simplified, date_data)

    # Report progress every 100 weeks
    if (i + 1) % 100 == 0:
        print(f"{i + 1} out of {len(dates)} weeks processed")

100 out of 1009 weeks processed
200 out of 1009 weeks processed
300 out of 1009 weeks processed
400 out of 1009 weeks processed
500 out of 1009 weeks processed
600 out of 1009 weeks processed
700 out of 1009 weeks processed
800 out of 1009 weeks processed
900 out of 1009 weeks processed
1000 out of 1009 weeks processed
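
The triple loop applies the same weighted sum to every pixel, so the same numbers can be obtained with a single array operation. The following is a minimal sketch on a small random array, not the approach this notebook uses; the shape `(4, 3, 5, 6)` and the names `demo_4D`, `demo_3D`, `demo_flat`, and `loop_flat` are stand-ins for illustration.

```python
import numpy as np

multiplier = np.array([0, 1, 1.25, 1.5, 1.75, 2])

# Stand-in for drought_4D: (dates, latitudes, longitudes, categories).
rng = np.random.default_rng(0)
demo_4D = rng.random((4, 3, 5, 6))

# Matrix-vector product over the last axis collapses the 6 categories.
demo_3D = demo_4D @ multiplier  # shape (4, 3, 5)

# Flatten in C order to match the loop's date -> latitude -> longitude order.
demo_flat = demo_3D.ravel()

# Reference: the element-by-element computation done by the loop.
loop_flat = np.array([
    sum(multiplier * demo_4D[i][j][k])
    for i in range(4) for j in range(3) for k in range(5)
])

print(demo_3D.shape, demo_flat.shape)
print(np.allclose(demo_flat, loop_flat))
```

On the full array this avoids the repeated copying that `np.append` performs each time it grows an array.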


In [11]:
drought_simplified.shape

(25108965,)

In [12]:
pd.DataFrame(drought_simplified).to_csv("./data/drought_simplified_array.csv")
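
Because the array is saved flat, a later notebook has to reshape it to recover the (date, latitude, longitude) layout. The sketch below shows the round trip on a small made-up array written to a temporary file; the shape `(4, 3, 5)` and the file name `demo_simplified_array.csv` are assumptions for illustration, standing in for (1009, 105, 237) and the real output file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical small shape standing in for (1009, 105, 237).
shape = (4, 3, 5)
flat = np.arange(np.prod(shape), dtype=float)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_simplified_array.csv")
    pd.DataFrame(flat).to_csv(path)

    # index_col=0 reads the saved index back as the index, not as a data column.
    restored = pd.read_csv(path, index_col=0).values.reshape(shape)

print(restored.shape)
```

Reading with `index_col=0` also avoids the extra `Unnamed: 0` column that otherwise has to be dropped after loading.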