
Predicting Dengue outbreaks in Brazil with manifold learning on climate data


"Predicting dengue outbreaks in Brazil with manifold learning on climate data"

Caio Souza, Pedro Maia, Lucas M. Stolerman, Vitor Rolla and Luiz Velho


In this work, we improve upon a recent approach of coarsely predicting outbreaks in Brazilian urban centers based solely on their yearly climate data. Our methodological advancements encompass a judicious choice of data pre-processing steps and usage of modern computational techniques from signal-processing and manifold learning.


The /data/ folder contains the climate and dengue outbreak data for the cities of Aracaju, Belo Horizonte, Manaus, Recife, Rio de Janeiro, Salvador and São Luís. Each sub-folder contains the files dengue.csv (dengue cases and incidence per year), precip.csv (daily precipitation measurements), temp_avg.csv (daily average temperature measurements), and years.csv (the corresponding year for each line in the previous files).
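As an illustration of the layout described above, the following is a minimal sketch of how one city's files could be loaded with pandas. The function name `load_city`, the `header` parameter, and the assumption that the CSV files have no header row are hypothetical and not taken from the repository; the row-count check relies only on the statement that years.csv holds one year per line of the other files.

```python
from pathlib import Path

import pandas as pd

# File stems listed in the README for each city sub-folder.
FILES = ("dengue", "precip", "temp_avg", "years")


def load_city(data_dir, city, header=None):
    """Read the four CSV files of one city sub-folder into a dict of DataFrames.

    `header=None` assumes the files have no header row (an assumption, not
    confirmed by the README); pass `header=0` if they do.
    """
    city_dir = Path(data_dir) / city
    tables = {
        name: pd.read_csv(city_dir / f"{name}.csv", header=header)
        for name in FILES
    }

    # years.csv gives the year for each line of the other files,
    # so every table should have the same number of rows.
    n_years = len(tables["years"])
    for name in ("dengue", "precip", "temp_avg"):
        if len(tables[name]) != n_years:
            raise ValueError(
                f"{name}.csv has {len(tables[name])} rows, expected {n_years}"
            )
    return tables
```

A call such as `load_city("data", "<city sub-folder>")` would then return the four tables keyed by file stem; the exact sub-folder names should be checked against the repository.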

The /original_data/ folder contains the original *.mat files used in the previous work of Stolerman et al.

/code/ is responsible for running the grid search over the hyper-parameters and selecting the best prediction date and model. The final result is written to /results/result.csv, while intermediate results for the grid and selection, including plots for each model, are in /results/intermediate/.

/code/ is responsible for generating the main figures with the classifier regions for the previously found best hyper-parameters and date (/results/figures/[city]/).

/code/ is responsible for computing the statistical tests for the models and writing the results to /results/stats.csv.

Important notes

Our datasets are small, about 14-16 years for each city. Given that, we use noisy data for validation (tuning the hyperparameters), so the results may vary slightly from run to run. For the grid search step, a complete list of the sorted grid for each city can be found at /results/intermediate/selection/[city]/.

The same limitation may affect the Student's t-test and the McNemar test against the random-guess dummy classifier. Because it generates only a small number of samples, its variability may be higher than on large datasets, where the samples should be split 50/50 between the positive and negative classes. For this reason, we also include the same tests against the mode (most frequent class) classifier, which should be constant for a given model.
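The difference between the two baselines can be sketched as follows, using scikit-learn's `DummyClassifier` and the `mcnemar` function from statsmodels. The labels and model predictions are made-up toy values, and the helper `mcnemar_vs_baseline` is a hypothetical name; this is not the repository's own test code, only an illustration of why the random-guess baseline gives seed-dependent results on about 15 samples while the mode baseline does not.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from statsmodels.stats.contingency_tables import mcnemar


def mcnemar_vs_baseline(y_true, y_model, y_base):
    """Exact McNemar test comparing a model with a baseline on the same samples."""
    y_true, y_model, y_base = map(np.asarray, (y_true, y_model, y_base))
    model_ok = y_model == y_true
    base_ok = y_base == y_true
    # 2x2 table: rows = model correct/incorrect, columns = baseline correct/incorrect.
    table = [
        [np.sum(model_ok & base_ok), np.sum(model_ok & ~base_ok)],
        [np.sum(~model_ok & base_ok), np.sum(~model_ok & ~base_ok)],
    ]
    return mcnemar(table, exact=True)


# Toy data (hypothetical): 15 yearly labels and one model's predictions.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
y_model = np.array([1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1])
X = np.zeros((len(y_true), 1))  # dummy classifiers ignore the features

# Mode baseline: always predicts the most frequent class, so it is deterministic.
y_mode = DummyClassifier(strategy="most_frequent").fit(X, y_true).predict(X)
p_mode = mcnemar_vs_baseline(y_true, y_model, y_mode).pvalue
print(f"mode baseline: p = {p_mode:.3f}")

# Random-guess baseline: predictions, and hence the p-value, depend on the seed.
for seed in range(3):
    clf = DummyClassifier(strategy="uniform", random_state=seed).fit(X, y_true)
    p_rand = mcnemar_vs_baseline(y_true, y_model, clf.predict(X)).pvalue
    print(f"random baseline (seed {seed}): p = {p_rand:.3f}")
```

With so few samples the random-guess p-values can differ noticeably between seeds, which is the variability described above; the mode baseline returns the same value on every run.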

Environment and Dependencies

These scripts depend on the Python libraries scikit-learn, pandas, matplotlib and statsmodels, and were tested on both Windows and Linux systems. The grid search may take approximately 1 hour to run, while the other scripts should take just a few seconds. The times were measured on a standard notebook with an i7 7700HQ and 16GB RAM.

