# Exploratory Data Analysis

In this notebook, we'll explore whether there is a relationship between the Zika virus data and the precipitation data across Argentina.

## Import libraries

We'll begin by importing the necessary libraries: `pandas` to work with CSV files and data frames, `matplotlib` to draw plots, and `geopandas` to work with geographical data. We'll also import `descartes`, which older versions of `geopandas` need to plot polygons, and the `Point` class from `shapely.geometry`, which represents a pair of coordinates as a point geometry.

```python
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

import descartes
import geopandas as gpd
from shapely.geometry import Point
```
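The `Point` class is typically what lets `pandas` coordinates and `geopandas` work together: each latitude/longitude pair becomes a geometry that `geopandas` can plot. Below is a minimal sketch, assuming a data frame with `latitude` and `longitude` columns like the one `places.csv` is expected to provide. The sample rows are made up, and the `EPSG:4326` coordinate reference system (plain latitude/longitude degrees) is an assumption about how the coordinates are stored.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical rows standing in for places.csv (values are made up)
df = pd.DataFrame({
    "location": ["Province A", "Province B"],
    "latitude": [-34.6, -31.4],
    "longitude": [-58.4, -64.2],
})

# Shapely points are (x, y), so longitude must come first
geometry = [Point(lon, lat) for lon, lat in zip(df["longitude"], df["latitude"])]

# Assumption: coordinates are WGS 84 latitude/longitude degrees
geo_df = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")
print(geo_df)
```

With a `GeoDataFrame` in hand, `geo_df.plot()` draws the points using `matplotlib`.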

## Load datasets

We'll load the coordinates of several provinces of Argentina from `places.csv`, then import the Zika and precipitation files for April 2016.

```python
# Location information (province coordinates)
places = pd.read_csv("zika_data/places.csv")

# Zika data for April 2016
zika_data = pd.read_csv("zika_data/2016-04.csv")
# Precipitation data for April 2016
precipitation_data = pd.read_csv("data/precipitation_4_2016.csv")
```
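Because the functions below join the monthly data to `places` on the `location` column with an inner join, any location whose label does not match is dropped silently. A minimal sketch of how one might check coverage first, using `pd.merge(..., indicator=True)`; the two small frames and their `Country-Province_X` labels are hypothetical stand-ins for `zika_data` and `places`.

```python
import pandas as pd

# Hypothetical stand-ins for zika_data and places (values are made up)
zika = pd.DataFrame({
    "location": ["Country-Province_A", "Country-Province_B", "Country-Province_C"],
    "value": [3, 0, 5],
})
places = pd.DataFrame({
    "location": ["Country-Province_A", "Country-Province_B"],
    "latitude": [-34.6, -31.4],
    "longitude": [-58.4, -64.2],
})

# An outer join with indicator=True tags each row as
# "both", "left_only" or "right_only" in a "_merge" column
check = pd.merge(zika, places, how="outer", on="location", indicator=True)
print(check["_merge"].value_counts())

# Zika locations without coordinates would be lost in an inner join
unmatched = check.loc[check["_merge"] == "left_only", "location"].tolist()
print(unmatched)  # ['Country-Province_C']
```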

## Universal functions

Here we define a set of functions that can be reused for any month and that cover both the Zika and the precipitation datasets.
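Reusing these functions for another month mostly comes down to loading the right files. The helper below, `month_paths`, is hypothetical: it assumes every month follows the same naming pattern as the April 2016 files loaded above (`zika_data/YYYY-MM.csv` and `data/precipitation_M_YYYY.csv`), a pattern inferred from those two paths only.

```python
from typing import Tuple


def month_paths(year: int, month: int) -> Tuple[str, str]:
    """Hypothetical helper: build the Zika and precipitation file paths
    for one month, assuming the April 2016 naming pattern holds."""
    # Zika files use a zero-padded month, e.g. "2016-04"
    zika_path = f"zika_data/{year}-{month:02d}.csv"
    # Precipitation files use an unpadded month, e.g. "4_2016"
    precipitation_path = f"data/precipitation_{month}_{year}.csv"
    return zika_path, precipitation_path


zika_path, precipitation_path = month_paths(2016, 4)
print(zika_path)           # zika_data/2016-04.csv
print(precipitation_path)  # data/precipitation_4_2016.csv
```

Each month could then be read with `pd.read_csv` and passed through the same functions.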

### Function: get_zika_coordinates

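This function returns a data frame with one row per province of Argentina, containing the province's coordinates and its total number of Zika cases for the given month. Because the Zika data and `places` are combined with an inner join, only provinces that appear in both are kept.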

```python
def get_zika_coordinates(data, places):
    """
    Combine the Zika data with the coordinates of its respective
    province and return the aggregated data for each province.
    """
    # Attach latitude/longitude to each Zika record; the inner join
    # keeps only locations present in both data frames
    data_updated = pd.merge(data, places, how="inner", on="location")

    # Sum the case counts so there is one row per province
    data_updated = (
        data_updated[["location", "value", "latitude", "longitude"]]
        .groupby(["location", "latitude", "longitude"])
        .sum()
        .reset_index()
    )

    # Keep the second "-"-separated part of each label and
    # replace underscores with spaces to get a readable name
    data_updated["location"] = data_updated["location"].apply(
        lambda x: x.split("-")[1].replace("_", " ")
    )

    return data_updated
```
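To see what the function returns, here is a sketch of the same merge, aggregation and renaming steps applied to a tiny made-up data frame. It uses `groupby(..., as_index=False)` and pandas' vectorized `.str` methods in place of `reset_index()` and `apply` with a `lambda`; the `Country-Province_X` labels and all numbers are hypothetical.

```python
import pandas as pd

# Made-up records: two rows for one province, one for another
data = pd.DataFrame({
    "location": ["Country-Province_A", "Country-Province_A", "Country-Province_B"],
    "value": [2, 3, 4],
})
places = pd.DataFrame({
    "location": ["Country-Province_A", "Country-Province_B"],
    "latitude": [-34.6, -31.4],
    "longitude": [-58.4, -64.2],
})

# Same inner join and per-province sum as get_zika_coordinates
result = (
    pd.merge(data, places, how="inner", on="location")
    .groupby(["location", "latitude", "longitude"], as_index=False)["value"]
    .sum()
)

# Vectorized equivalent of x.split("-")[1].replace("_", " ")
result["location"] = (
    result["location"].str.split("-").str[1].str.replace("_", " ", regex=False)
)
print(result)
```

One behavioral difference: a label without a `-` yields a missing value with the `.str` version, whereas the `lambda` raises an `IndexError`.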