## ANN-based estimates of mosquito abundance

This code reads Aedes aegypti abundance data, together with mean temperature, precipitation, and relative humidity information in four neighborhoods in Puerto Rico. It then uses the weather data to create input files for an ANN trained on MoLS simulations. Finally, scaled ANN predictions are compared to the surveillance data. 

Import various libraries and read the mosquito data file. The information is stored in a dataframe called `mdata`.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sys, os, joblib, importlib, glob
sys.path.append( os.path.abspath(os.path.join('..')) )
import utils.utils as gen_utils
from datetime import datetime, timedelta

from sklearn.preprocessing import MinMaxScaler



In [None]:
# Read data file
mdata=pd.read_csv('../data/Mosquito_data.csv')
# Find locations
sites=sorted(list(set(mdata.Site)))
print(sites)

## Create complete dataframes for each location

The `mdata` dataframe contains information about 4 locations (listed in `sites`). In addition, because of Hurricane Maria, two weeks of weather and mosquito data are missing. There is also one week, at the end of March 2016, for which weather data are missing.

We therefore need to create 4 input files with weekly data, one for each location, and interpolate the missing information over the two periods in question. Missing values are added through calls to `utils.add_missing_values` for the 2 weeks lost to Hurricane Maria, and to `utils.replace_nan_values` for the week at the end of March 2016.

The resulting dataframes are called `Arboleda_data`, `La_Margarita_data`, `Playa_data`, and `Villodas_data`.
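
As a rough illustration of the gap-filling idea described above, the sketch below inserts rows for missing weeks and linearly interpolates the numeric columns. It is not the implementation of `utils.add_missing_values` or `utils.replace_nan_values`, which is not shown here; the helper name `fill_weekly_gaps`, the toy column names `Temp` and `Females`, and the choice of linear interpolation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def fill_weekly_gaps(df, date_col="Datetime"):
    """Hypothetical helper: add rows for missing weeks, then interpolate linearly."""
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    df = df.set_index(date_col).sort_index()
    # Build the complete weekly calendar between the first and last observation
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="7D")
    df = df.reindex(full_index)
    # Interpolate numeric columns only; new rows and existing NaNs are both filled
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = df[numeric].interpolate(method="linear")
    return df.rename_axis(date_col).reset_index()

# Toy weekly data (made-up values, for illustration only)
weeks = pd.date_range("2017-09-04", periods=6, freq="7D")
toy = pd.DataFrame({
    "Datetime": weeks,
    "Temp": [28.0, 28.5, 29.0, 27.0, 27.5, 28.0],
    "Females": [10.0, 12.0, 14.0, 20.0, 18.0, 16.0],
})
toy = toy.drop(index=[2, 3])   # two weeks missing entirely
toy.loc[4, "Temp"] = np.nan    # one week where only the weather value is missing

filled = fill_weekly_gaps(toy)
print(filled)
```

Rows that are absent and rows that contain `NaN` are handled by the same interpolation call once the weekly calendar has been completed, which is why the sketch reindexes first.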

In [None]:
importlib.reload(gen_utils)
def prepare_site(site, n):
    """Build the complete weekly dataframe for one site.
    n is passed unchanged as the second argument of gen_utils.replace_nan_values."""
    # Copy the slice so that the assignments below do not act on a view of mdata
    Pdata = mdata[mdata["Site"] == site].copy()
    Pdata = Pdata.reset_index(drop=True)
    Pdata["Datetime"] = pd.to_datetime(Pdata["Datetime"])
    Pdata = gen_utils.add_missing_values(Pdata)
    return gen_utils.replace_nan_values(Pdata, n)

Arboleda_data = prepare_site(sites[0], 7)
La_Margarita_data = prepare_site(sites[1], 10)
Playa_data = prepare_site(sites[2], 10)
Villodas_data = prepare_site(sites[3], 10)

The structure of the resulting dataframes is shown below, using `Villodas_data` as an example.

In [None]:
Villodas_data.head()

We now save the dataframes into separate files for future use.

In [None]:
Arboleda_data.to_pickle('../data/Arboleda.pd')
La_Margarita_data.to_pickle('../data/La_Margarita.pd')
Playa_data.to_pickle('../data/Playa.pd')
Villodas_data.to_pickle('../data/Villodas.pd')

## Create ANN input files

The ANN input files are dataframes whose columns are the location, year, month, day, daily average temperature, daily precipitation (in cm), daily relative humidity, and daily female abundance. They are obtained with `utils.create_AedesAI_input_dataframe`.
* Daily temperature, relative humidity, and abundance values are estimated from normal distributions that use the weekly means and standard deviations provided in `location_data`.
* Negative abundance values are set to 0.
* Daily precipitation values are estimated by distributing the weekly amount of rain over the entire week, either randomly (relatively low values of `th`) or uniformly (for high values of `th`, e.g. `th=1` or even `th=0.5`).
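
The three rules above can be sketched as follows. This is a minimal illustration, not the code of `utils.create_AedesAI_input_dataframe`: the function names are hypothetical, the random split of rainfall uses Dirichlet weights as one possible choice, and the boolean `uniform` flag stands in for the effect of `th`, whose exact role is not shown here.

```python
import numpy as np

def weekly_to_daily_normal(mean, std, rng, clip_at_zero=False, days=7):
    """Draw one value per day from a normal distribution with the weekly mean and std."""
    values = rng.normal(loc=mean, scale=std, size=days)
    if clip_at_zero:
        # Abundance cannot be negative, so negative draws are set to 0
        values = np.clip(values, 0.0, None)
    return values

def spread_weekly_rain(total, rng, uniform, days=7):
    """Distribute a weekly rainfall total over the days of the week."""
    if uniform:
        return np.full(days, total / days)
    # Random split (assumption): nonnegative weights that sum to 1
    weights = rng.dirichlet(np.ones(days))
    return total * weights

rng = np.random.default_rng(0)
daily_temp = weekly_to_daily_normal(27.5, 1.2, rng)
daily_abundance = weekly_to_daily_normal(1.0, 2.0, rng, clip_at_zero=True)
rain_uniform = spread_weekly_rain(3.5, rng, uniform=True)
rain_random = spread_weekly_rain(3.5, rng, uniform=False)
```

Both rainfall variants preserve the weekly total; they differ only in how it is spread across days.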

For illustration, we show a plot of the daily female abundance in Arboleda traps (`Ref` column of the `daily_Arboleda_data` dataframe).

In [None]:
importlib.reload(gen_utils)

th=0.3
loc='Arboleda'
data_file='../data/'+loc+'.pd'; data=pd.read_pickle(data_file)
daily_Arboleda_data, cols=gen_utils.create_AedesAI_input_dataframe(data,th)
daily_Arboleda_data = daily_Arboleda_data.set_axis(cols, axis=1)
daily_Arboleda_data.to_pickle('../data/Arboleda_daily.pd')
# Extract date from the year, month, and day columns of the dataframe
start=datetime(int(daily_Arboleda_data.iloc[0,1]),int(daily_Arboleda_data.iloc[0,2]),int(daily_Arboleda_data.iloc[0,3]))
# Add one day with timedelta (rather than day+1) so the end date stays valid at month boundaries
end=datetime(int(daily_Arboleda_data.iloc[-1,1]),int(daily_Arboleda_data.iloc[-1,2]),int(daily_Arboleda_data.iloc[-1,3]))+timedelta(days=1)
dts=np.arange(start,end,timedelta(days=1)).astype(datetime)
# Plot daily female abundance
plt.figure(figsize=(12, 3))
plt.plot(dts,daily_Arboleda_data.Ref)
plt.title('Arboleda'); plt.xlabel("Days"); plt.ylabel("Abundance")
plt.show()

Create daily dataframes for the other locations.

In [None]:
data=pd.read_pickle('../data/La_Margarita.pd'); daily_La_Margarita_data, cols=gen_utils.create_AedesAI_input_dataframe(data,th)
daily_La_Margarita_data = daily_La_Margarita_data.set_axis(cols, axis=1)
daily_La_Margarita_data.to_pickle('../data/La_Margarita_daily.pd')

data=pd.read_pickle('../data/Playa.pd'); daily_Playa_data, cols=gen_utils.create_AedesAI_input_dataframe(data,th)
daily_Playa_data = daily_Playa_data.set_axis(cols, axis=1)
daily_Playa_data.to_pickle('../data/Playa_daily.pd')

data=pd.read_pickle('../data/Villodas.pd'); daily_Villodas_data,cols=gen_utils.create_AedesAI_input_dataframe(data,th)
daily_Villodas_data = daily_Villodas_data.set_axis(cols, axis=1)
daily_Villodas_data.to_pickle('../data/Villodas_daily.pd')

## Run ANN predictions

We now use the daily weather time series created for each location as input to the trained ANN. The resulting predictions are then scaled and compared to the corresponding trap data.

In the setup calls below,
* The file `../utils/predictions.py` is slightly modified from the `Aedes-AI` package
* The file `../utils/gru_avg_temp_scaler.save` was created when the network was trained. It contains the minima and maxima of the training data, which are used to normalize the input weather information
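
To make the normalization step concrete, the sketch below applies min-max scaling with stored training minima and maxima, then cuts the scaled series into fixed-length windows. It is only an illustration: the minima and maxima are made-up numbers, the helper names are hypothetical, and reading `data_shape = [90,3]` as 90-day windows of three weather variables is an assumption. The actual preprocessing is done inside `predictions.gen_preds`.

```python
import numpy as np

def minmax_transform(x, data_min, data_max):
    """Map each column to [0, 1] using minima and maxima taken from the training data."""
    return (x - data_min) / (data_max - data_min)

def sliding_windows(x, window):
    """Stack every run of `window` consecutive rows into one array."""
    n = x.shape[0] - window + 1
    return np.stack([x[i:i + window] for i in range(n)])

# Made-up training ranges for temperature, precipitation, relative humidity
train_min = np.array([10.0, 0.0, 30.0])
train_max = np.array([35.0, 20.0, 100.0])

# 120 days of synthetic weather, for illustration only
rng = np.random.default_rng(1)
weather = np.column_stack([
    rng.uniform(20, 30, 120),
    rng.uniform(0, 5, 120),
    rng.uniform(60, 90, 120),
])

scaled = minmax_transform(weather, train_min, train_max)
windows = sliding_windows(scaled, window=90)
print(windows.shape)
```

Reusing the training minima and maxima, rather than refitting on the new data, keeps the inputs on the scale the network saw during training; this matches the `fit_scaler=False` argument used in the prediction call.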

In [None]:
import utils.predictions as predictions
from sklearn.preprocessing import MinMaxScaler

data_shape = [90,3]
scaler_fil = '../utils/gru_avg_temp_scaler.save'
scaler = joblib.load(scaler_fil)

In [None]:
import tensorflow as tf
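model_fil = '../utils/gru_avg_temp.h5'
if os.path.exists(model_fil):
  # compile=False loads the network for inference only, so the custom training
  # metric r2_keras (which is not defined in this notebook) is not required
  model=tf.keras.models.load_model(model_fil, compile=False)
  print('model loaded')
else:
  # Stop here: the prediction loop below cannot run without a model
  raise FileNotFoundError('Model file not found: '+model_fil)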

# The daily dataframes were saved in ../data/ above, so look for them there
daily_datas = glob.glob('../data/*_daily.pd')
for data_fil in daily_datas:
  data = pd.read_pickle(data_fil)
  loc = str(data.iloc[0,0])
  outfile='./'+loc+'_gru_avg_temp_predictions.csv'
  results=predictions.gen_preds(model, data, data_shape, scaler, fit_scaler=False, smooth=False)
  pd.DataFrame(results, columns=['Location','Year','Month','Day','Ref','Neural Network']).to_csv(outfile,index=False)
  print('Predictions created for '+loc)