# Clean precipitation data

In this notebook, I'll use a spatial join to extract the data points that lie inside Argentina and save them.

## Import libraries

I'll use the `shapely` and `geopandas` libraries to work with the spatial data.

In [6]:
import numpy as np
import pandas as pd

import descartes
import geopandas as gpd
from geopandas.tools import sjoin
from shapely.geometry import Point, Polygon, shape

## Load data

I'll load the precipitation data, which was previously converted from HDF5 to CSV, along with the shapefile for Argentina's national boundary.

In [4]:
dataset = pd.read_csv("data/combined_data.csv")
argentina = gpd.read_file("shapefiles/country/ARG_adm0.shp")

## Retrieve correct points

Next, I'll select only the data points that lie within Argentina.

In [5]:
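# Build one shapely Point per row; Point takes (x, y), i.e. (longitude, latitude).
geometry = [Point(xy) for xy in zip(dataset["longitude"], dataset["latitude"])]
# EPSG:4326 is WGS 84 latitude/longitude; the {'init': ...} dict form is deprecated.
points = gpd.GeoDataFrame(dataset, crs="EPSG:4326", geometry=geometry)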

In [None]:
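# Inner join keeps only the points that intersect the Argentina polygon.
# `predicate` replaces the deprecated `op` argument (geopandas 0.10 and later).
final_df = sjoin(points, argentina, how='inner', predicate='intersects')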
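
To make the behaviour of the inner spatial join easier to see, here is a minimal sketch on toy data. Everything in it is made up for illustration: the three observations, the 2x2 degree square standing in for the country polygon, and the names `toy_points`, `toy_region` and `toy_inside`. It assumes geopandas 0.10 or later, where the join condition is passed as `predicate`, and it uses `gpd.points_from_xy` as a shorter alternative to building the `Point` list by hand.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# Toy observations (made-up values): two fall inside the square below, one outside.
toy = pd.DataFrame({
    "longitude": [0.5, 1.5, 3.0],
    "latitude": [0.5, 1.0, 3.0],
    "precipitation": [10.0, 20.0, 30.0],
})
toy_points = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy["longitude"], toy["latitude"]),
    crs="EPSG:4326",
)

# Hypothetical square region standing in for the country boundary.
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
toy_region = gpd.GeoDataFrame({"name": ["square"]}, geometry=[square], crs="EPSG:4326")

# An inner join drops every point that does not intersect the region.
toy_inside = gpd.sjoin(toy_points, toy_region, how="inner", predicate="intersects")
```

Only the first two rows survive the join, and each one picks up the attribute columns of the polygon it matched. Both layers are given the same CRS here; geopandas warns when the two sides of a join disagree.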

In [None]:
final_df[["year", "month", "latitude", "longitude", "precipitation", "geometry"]]
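
The introduction mentions saving the result, but the cells above stop at displaying it. One possible way to write the selection to disk is sketched below. The helper name `save_clean_points` and the output path in the usage note are hypothetical, and dropping the `geometry` column is an assumption on my part (it duplicates latitude and longitude in a flat CSV).

```python
import pandas as pd

# Columns from the final selection, minus `geometry`.
OUTPUT_COLUMNS = ["year", "month", "latitude", "longitude", "precipitation"]

def save_clean_points(df, path):
    """Write the selected columns of `df` to a CSV file at `path`."""
    df[OUTPUT_COLUMNS].to_csv(path, index=False)
```

With the join result in hand, a call such as `save_clean_points(final_df, "data/argentina_precipitation.csv")` would write the cleaned points; the file name is only a placeholder.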