# Downloading Monthly Precipitation Data

In this notebook, we use a **urls** file containing a list of URLs for monthly precipitation data to download that data and then filter it to a given state.

## Import libraries

We import the libraries we will use to read netCDF files, filter the data to a given state, and write the result to a CSV file.

In [None]:
# Run the following if working on Google Colab to install the required libraries

!pip install geopandas netCDF4 shapely rtree pygeos


In [None]:
import os
import json
import requests
import descartes
import numpy as np
import pandas as pd
import geopandas as gpd
from netCDF4 import Dataset
import urllib.request as urllib2
from geopandas.tools import sjoin
from urllib.parse import urlencode
from http.cookiejar import CookieJar
from shapely.geometry import Point, Polygon, shape



## Load credentials and initiate cookies

We read credentials from a file **credentials.json** provided by the user.

In [None]:
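# Open the credentials file and load its contents;
# the with-block closes the file automatically
with open("credentials.json") as credentials_file:
    credentials = json.load(credentials_file)

# NOTE: the key names "username" and "password" are assumed here;
# change them to match the keys in your credentials.json
username = credentials["username"]
password = credentials["password"]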

We then use these credentials to set up cookie handling and an opener that we will use to request data from [GES DISC](https://disc.gsfc.nasa.gov/). The following code is based on the instructions provided by the [EarthData Wiki](https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python).

In [None]:
# Set up a password manager
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, "https://urs.earthdata.nasa.gov", username, password)

# Create a CookieJar for managing cookies to be used within a session
# and avoid requiring re-login
cookie_jar = CookieJar()

# Build and use opener
opener = urllib2.build_opener(
    urllib2.HTTPBasicAuthHandler(password_manager),
    urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)
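
The same setup can be wrapped in a function that returns the opener and its cookie jar instead of installing them globally, which makes it easier to check whether a login cookie was stored after the first request. This is a minimal sketch, not the EarthData Wiki's code: the function name `build_earthdata_opener` and the placeholder credentials are assumptions made for illustration.

```python
import urllib.request
from http.cookiejar import CookieJar

EARTHDATA_LOGIN_URL = "https://urs.earthdata.nasa.gov"


def build_earthdata_opener(username, password):
    """Build (but do not install) an opener with basic auth and cookie support.

    Hypothetical helper: it repeats the steps of the cell above and
    returns the cookie jar as well so it can be inspected later.
    """
    password_manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password(None, EARTHDATA_LOGIN_URL, username, password)

    cookie_jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_manager),
        urllib.request.HTTPCookieProcessor(cookie_jar),
    )
    return opener, cookie_jar


# Placeholder credentials; no network request is made at this point
opener, cookie_jar = build_earthdata_opener("example_user", "example_password")

# The jar stays empty until a server sets a cookie on a real request
print("Cookies stored so far:", len(cookie_jar))
```

Calling `urllib.request.install_opener(opener)` afterwards reproduces the behaviour of the cell above.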

## Import data files

We load the shapefile for the USA and, later on, filter it to the state we are looking at. The download URLs for the monthly GPM data are saved in the file **GPM_USA_monthly_urls.txt**.

In [None]:
shape_file = gpd.read_file("cb_2018_us_state_500k.shp")
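
Before downloading anything, it can help to inspect the URLs file itself. The sketch below reads a file with one URL per line, drops blank lines and PDF links, and derives a local file name from each URL. The helper names `read_data_urls` and `filename_from_url` are hypothetical, and the example URL is made up rather than taken from the real GPM listing.

```python
import posixpath
from urllib.parse import urlparse


def read_data_urls(url_file_path):
    """Return the non-empty, non-PDF URLs listed one per line in a text file."""
    with open(url_file_path) as url_file:
        urls = [line.strip() for line in url_file]
    return [url for url in urls if url and not url.lower().endswith(".pdf")]


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


# Made-up URL used only to illustrate the file name extraction
example_url = "https://example.org/data/2010/example_file.nc4?precipitation"
print(filename_from_url(example_url))
```

Using `strip()` rather than slicing off the last character also handles a final line that has no trailing newline.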

## Create CSV

The function below takes the path to a URLs file, the shapefile and a state code, and writes a CSV file of monthly precipitation for that state.

In [None]:
def create_csv(url_file_path, shape_file, state_code):
    
    # Open URLs file for the given state
    url_file = open(url_file_path, "r")
    
    # Filter shape file for the specific state
    shape_file = shape_file[shape_file["STUSPS"] == state_code].reset_index(drop = True)
    
    # Create a dataframe which will store all data
    resultant_data = pd.DataFrame({"Latitude": [], 
                                   "Longitude": [], 
                                   "Precipitation": [], 
                                   "geometry": []})
    
    
    # For each URL, get the data
    for URL in url_file:

        # Strip the trailing newline, then skip blank lines and PDF links
        URL = URL.strip()
        if URL and not URL.endswith("pdf"):
            
            # Get data from the URL
            request = urllib2.Request(URL)
            response = urllib2.urlopen(request)
            result = response.read()

            # Save the retrieved data to a file; the with-block closes it automatically
            FILENAME = URL.split("https://gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGM.06/")[1].split("/")[1].split("?")[0]
            with open(FILENAME, 'wb') as f:
                f.write(result)

            # Read data
            data = Dataset(FILENAME)
            
            # Get latitude, longitude and precipitation
            lon_values = list(np.repeat(data['lon'][:], data['lat'][:].shape[0]))
            lat_values = list(np.tile(data['lat'][:], data['lon'][:].shape[0]))
            precp_values = data['precipitation'][:][0].flatten()
            temp_df = pd.DataFrame({"Latitude": lat_values, "Longitude": lon_values, "Precipitation": precp_values})

            # Create geodataframe from the points
            geometry = [Point(xy) for xy in zip(temp_df["Longitude"], temp_df["Latitude"])]
            points = gpd.GeoDataFrame(temp_df, crs = "EPSG:4269", geometry = geometry)

            # Select the data that lies within the given state
            # (geopandas 0.10 and later rename the `op` argument to `predicate`)
            final_df = sjoin(points, shape_file, how = 'inner', op = 'intersects')
            
            # Create dataframe for the given file
            final_df = final_df[["Latitude", "Longitude", "Precipitation", "geometry"]].reset_index(drop = True)
            date = data.__dict__["FileHeader"].split("\nStartGranuleDateTime=")[1].split(";")[0].split("T")[0].split("-")
            final_df["Year"] = date[0]
            final_df["Month"] = date[1]

            # Append to the resultant dataframe
            resultant_data = pd.concat([resultant_data, final_df], ignore_index = True)

            # Print to console that a record has been updated
            print("Retrieved data for {}, {}".format(date[1], date[0]))

            # Close the dataset, then remove the temporary file we no longer need
            data.close()
            os.remove(FILENAME)

    # Close the URLs file now that every URL has been processed
    url_file.close()

    # Save the final dataframe to the system
    resultant_data.to_csv(state_code + "_monthly.csv", index = False)
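
The `np.repeat` / `np.tile` pairing in the function only lines up with the flattened precipitation values if the grid is stored longitude-first, that is, with shape `(lon, lat)` once the time axis is dropped, which is what the ordering in the function implies. The sketch below uses small made-up coordinate arrays (not GPM data) to check that ordering and to show an equivalent `np.meshgrid` form.

```python
import numpy as np

# Made-up coordinates: 3 longitudes and 2 latitudes
lon = np.array([10.0, 20.0, 30.0])
lat = np.array([1.0, 2.0])

# Fake grid with shape (lon, lat); each cell encodes its own coordinates
grid = lon[:, None] * 100 + lat[None, :]

# Same construction as in create_csv
lon_values = np.repeat(lon, lat.shape[0])   # each longitude repeated once per latitude
lat_values = np.tile(lat, lon.shape[0])     # the latitude sequence repeated once per longitude
flat_values = grid.flatten()                # row-major order: longitude varies slowest

# Every flattened value matches the coordinates paired with it
assert np.array_equal(flat_values, lon_values * 100 + lat_values)

# Equivalent coordinate arrays built with meshgrid and matrix ("ij") indexing
lon_mesh, lat_mesh = np.meshgrid(lon, lat, indexing="ij")
assert np.array_equal(lon_mesh.ravel(), lon_values)
assert np.array_equal(lat_mesh.ravel(), lat_values)
```

If a dataset stored the variable as `(lat, lon)` instead, the roles of `np.repeat` and `np.tile` would need to be swapped.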

## Retrieve data

The function above can now be called with the two-letter code of the state for which the CSV file should be created.

In [None]:
create_csv("url.txt", shape_file, state_code = "FL")

Retrieved data for 01, 2010
Retrieved data for 02, 2010
Retrieved data for 03, 2010
Retrieved data for 04, 2010
Retrieved data for 05, 2010
Retrieved data for 06, 2010
Retrieved data for 07, 2010
Retrieved data for 08, 2010
Retrieved data for 09, 2010
Retrieved data for 10, 2010
Retrieved data for 11, 2010
Retrieved data for 12, 2010
Retrieved data for 01, 2011
Retrieved data for 02, 2011
Retrieved data for 03, 2011
Retrieved data for 04, 2011
Retrieved data for 05, 2011
Retrieved data for 06, 2011
Retrieved data for 07, 2011
Retrieved data for 08, 2011
Retrieved data for 09, 2011
Retrieved data for 10, 2011
Retrieved data for 11, 2011
Retrieved data for 12, 2011
Retrieved data for 01, 2012
Retrieved data for 02, 2012
Retrieved data for 03, 2012
Retrieved data for 04, 2012
Retrieved data for 05, 2012
Retrieved data for 06, 2012
Retrieved data for 07, 2012
Retrieved data for 08, 2012
Retrieved data for 09, 2012
Retrieved data for 10, 2012
Retrieved data for 11, 2012
Retrieved data for 1

In [None]:
create_csv("url2.txt", shape_file, state_code = "AR")

Retrieved data for 01, 2010
Retrieved data for 02, 2010
Retrieved data for 03, 2010
Retrieved data for 04, 2010
Retrieved data for 05, 2010
Retrieved data for 06, 2010
Retrieved data for 07, 2010
Retrieved data for 08, 2010
Retrieved data for 09, 2010
Retrieved data for 10, 2010
Retrieved data for 11, 2010
Retrieved data for 12, 2010
Retrieved data for 01, 2011
Retrieved data for 02, 2011
Retrieved data for 03, 2011
Retrieved data for 04, 2011
Retrieved data for 05, 2011
Retrieved data for 06, 2011
Retrieved data for 07, 2011
Retrieved data for 08, 2011
Retrieved data for 09, 2011
Retrieved data for 10, 2011
Retrieved data for 11, 2011
Retrieved data for 12, 2011
Retrieved data for 01, 2012
Retrieved data for 02, 2012
Retrieved data for 03, 2012
Retrieved data for 04, 2012
Retrieved data for 05, 2012
Retrieved data for 06, 2012
Retrieved data for 07, 2012
Retrieved data for 08, 2012
Retrieved data for 09, 2012
Retrieved data for 10, 2012
Retrieved data for 11, 2012
Retrieved data for 1

In [None]:
# create_csv("precipitation_data/monthly/OK_monthly_urls.txt", shape_file, state_code = "OK")

Retrieved data for 06, 2000
Retrieved data for 07, 2000
Retrieved data for 08, 2000
Retrieved data for 09, 2000
Retrieved data for 10, 2000
Retrieved data for 11, 2000
Retrieved data for 12, 2000
Retrieved data for 01, 2001
Retrieved data for 02, 2001
Retrieved data for 03, 2001
Retrieved data for 04, 2001
Retrieved data for 05, 2001
Retrieved data for 06, 2001
Retrieved data for 07, 2001
Retrieved data for 08, 2001
Retrieved data for 09, 2001
Retrieved data for 10, 2001
Retrieved data for 11, 2001
Retrieved data for 12, 2001
Retrieved data for 01, 2002
Retrieved data for 02, 2002
Retrieved data for 03, 2002
Retrieved data for 04, 2002
Retrieved data for 05, 2002
Retrieved data for 06, 2002
Retrieved data for 07, 2002
Retrieved data for 08, 2002
Retrieved data for 09, 2002
Retrieved data for 10, 2002
Retrieved data for 11, 2002
Retrieved data for 12, 2002
Retrieved data for 01, 2003
Retrieved data for 02, 2003
Retrieved data for 03, 2003
Retrieved data for 04, 2003
Retrieved data for 0
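
Once a CSV file has been written, it can be loaded back with pandas for a quick sanity check. The sketch below uses a tiny in-memory stand-in with the same columns as the function's output; the values are made up, and reading `Year` and `Month` as strings is a choice made here so that leading zeros such as `01` survive the round trip.

```python
import io

import pandas as pd

# Tiny stand-in for a file such as FL_monthly.csv (values are made up)
sample_csv = io.StringIO(
    "Latitude,Longitude,Precipitation,geometry,Year,Month\n"
    "27.05,-81.05,1.0,POINT (-81.05 27.05),2010,01\n"
    "27.15,-81.05,3.0,POINT (-81.05 27.15),2010,01\n"
    "27.05,-81.05,5.0,POINT (-81.05 27.05),2010,02\n"
)

# Keep Year and Month as strings so "01" is not turned into the integer 1
monthly = pd.read_csv(sample_csv, dtype={"Year": str, "Month": str})

# Mean precipitation over all grid points in the state, per month
monthly_mean = monthly.groupby(["Year", "Month"], as_index=False)["Precipitation"].mean()
print(monthly_mean)
```

For a real file, `sample_csv` would be replaced by the path of the CSV written by `create_csv`.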