
## Get data from EOSDIS the Python way ...

We will use Python to get the data, which makes the whole process easy to repeat and automate later.

A variant of the instructions in this notebook can be found on NASA's site [here](https://disc.gsfc.nasa.gov/registration/registration-for-data-access#python).

**IMPORTANT NOTES:**
* follow the instructions [here](https://disc.gsfc.nasa.gov/registration/authorizing-gesdisc-data-access-in-earthdata_login) to AUTHORIZE _GESDISC_ ACCESS FOR YOUR ACCOUNT, OTHERWISE **THIS WILL NOT WORK**
* make sure you are using the correct username and password for your account

In [2]:
username = '' # your GES DISC username
password = '' # your GES DISC password
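
If you'd rather not type your password into the notebook itself, here is a minimal sketch that reads the credentials from environment variables and falls back to a prompt. The variable names `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` and the helper `get_credentials` are made up for this example; they are not something GES DISC defines.

```python
import os
from getpass import getpass


def get_credentials(env=os.environ):
    """Return (username, password) from `env`, prompting for anything missing.

    The environment variable names are hypothetical; pick whatever you like.
    """
    username = env.get("EARTHDATA_USERNAME") or input("GES DISC username: ")
    password = env.get("EARTHDATA_PASSWORD") or getpass("GES DISC password: ")
    return username, password
```

Calling `username, password = get_credentials()` would then stand in for the two assignments above.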

This piece of GES DISC silliness is required; just cut and paste it verbatim. The how/why of this trickery can be found [here](https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python).  With other services, this step is either unnecessary or looks a bit different.  The solution works well with the GES DISC system, but likely not with others.

In [2]:
from urllib import request
from urllib.request import urlopen
from http import cookiejar

password_manager = request.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, "https://urs.earthdata.nasa.gov", username, password)
cookie_jar = cookiejar.CookieJar()

opener = request.build_opener(
    request.HTTPBasicAuthHandler(password_manager),
    #request.HTTPHandler(debuglevel=1),    # Uncomment these two lines to see
    #request.HTTPSHandler(debuglevel=1),   # details of the requests/responses
    request.HTTPCookieProcessor(cookie_jar))
request.install_opener(opener)

### Making a request for NetCDF data
The template for making a request for data looks like this:

1. set the URL of the data you want
2. open a request with that URL
3. process the response of that request

Here is the code that does that:

In [3]:
url  = 'http://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_L3/TRMM_3B43.7/2015/3B43.20151201.7.HDF.nc'
req  = request.Request(url)
resp = urlopen(req)
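
Requests to a remote server can fail (wrong password, GESDISC access not authorized, typo in the URL), so it can help to wrap the three steps in a function that reports the problem instead of stopping with a traceback. This is only a sketch: the name `fetch_bytes` and the 60 second timeout are my own choices, and it assumes the opener above has already been installed.

```python
from urllib import request
from urllib.error import HTTPError, URLError


def fetch_bytes(url, timeout=60):
    """Open `url` and return the response body as bytes, or None on failure."""
    req = request.Request(url)
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except HTTPError as err:
        # the server answered, but with an error status (401, 404, ...)
        print("HTTP error {} for {}".format(err.code, url))
    except URLError as err:
        # the request never got a usable answer (DNS, connection, missing file)
        print("Could not open {}: {}".format(url, err.reason))
    return None
```

`HTTPError` is caught first because it is a subclass of `URLError`.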

### [aside] Huh?  What is that URL from???

If you go into a download screen and hover over the NetCDF link, you'll see the URL that is above -- `http://disc2.[...]./opendap/TRMM_L3/[...]`.

![](./opera_2017-06-15_13-01-37.png)
This is very important to pay attention to, because we'll need it later.

### Saving the data from the request

Saving the data now works as you would expect in normal Python code ... we'll name the file `sample_file.nc` and write it to the file system.

In [4]:
with open("sample_file.nc","wb") as fo:
    fo.write(resp.read())
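
`resp.read()` pulls the whole response into memory before writing it. That is fine for a file of this size, but for bigger downloads you may prefer to copy the response to disk in chunks. A minimal sketch, where the helper name `download_to` and the 1 MiB chunk size are assumptions of mine:

```python
import shutil
from urllib import request


def download_to(url, local_path, chunk_size=1024 * 1024):
    """Stream the response for `url` into `local_path` and return the path."""
    with request.urlopen(url) as resp, open(local_path, "wb") as fo:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(resp, fo, chunk_size)
    return local_path
```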

### Yay! We have a NetCDF file ...

We have a file that we can use, so let's go see what's in it!

To proceed, you will need to make sure you have the `netCDF4` library.  

**TESTING YOUR NETCDF4**

* in a Jupyter Notebook, type 
```python
from netCDF4 import *
```
* if you get an error, follow the directions below

**INSTALLING NETCDF4**

* open your terminal
* type 
```bash
conda install netcdf4
```
* type Y when prompted (you may be asked a few times)
* test again with the instructions above
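
If you want a check that doesn't blow up when the library is missing, something along these lines should do. The helper name `is_installed` is just for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("netCDF4"):
    import netCDF4
    print("netCDF4 version:", netCDF4.__version__)
else:
    print("netCDF4 is missing; install it as described above")
```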

### Using NetCDF in Python

This notebook doesn't go into the details of netCDF in Python, so if you need to learn how to do complex things with it, please [see the full docs here](https://unidata.github.io/netcdf4-python/).

Instead we'll poke around to make sure the file is as we think it is.

**IMPORT NETCDF AND LOAD A FILE LIKE THIS**

In [5]:
from netCDF4 import Dataset
rootgrp = Dataset("sample_file.nc", "r", format="NETCDF4")

**POKE AROUND THE VARIABLES OF THIS FILE LIKE THIS**

In [6]:
rootgrp.variables

OrderedDict([('precipitation', <class 'netCDF4._netCDF4.Variable'>
              float32 precipitation(nlon, nlat)
                  units: mm/hr
                  coordinates: nlon nlat
                  _FillValue: -9999.9
              unlimited dimensions: 
              current shape = (1440, 400)
              filling off),
             ('relativeError', <class 'netCDF4._netCDF4.Variable'>
              float32 relativeError(nlon, nlat)
                  units: mm/hr
                  coordinates: nlon nlat
                  _FillValue: -9999.9
              unlimited dimensions: 
              current shape = (1440, 400)
              filling off),
             ('gaugeRelativeWeighting', <class 'netCDF4._netCDF4.Variable'>
              int32 gaugeRelativeWeighting(nlon, nlat)
                  units: percent
                  coordinates: nlon nlat
              unlimited dimensions: 
              current shape = (1440, 400)
              filling off),
             ('nlon', <c

In [7]:
# LET'S LOOK AT PRECIPITATION'S SHAPE AND PLAY A BIT ...
rootgrp.variables['precipitation'].shape

(1440, 400)

In [8]:
rootgrp.variables['precipitation'][1439,399]

0.09600807

In [9]:
rootgrp.variables['nlon'].shape

(1440,)

In [10]:
rootgrp.variables['nlon'][:]

array([-179.875, -179.625, -179.375, ...,  179.375,  179.625,  179.875], dtype=float32)

In [11]:
rootgrp.variables['nlat'][:]

array([-49.875, -49.625, -49.375, -49.125, -48.875, -48.625, -48.375,
       -48.125, -47.875, -47.625, -47.375, -47.125, -46.875, -46.625,
       -46.375, -46.125, -45.875, -45.625, -45.375, -45.125, -44.875,
       -44.625, -44.375, -44.125, -43.875, -43.625, -43.375, -43.125,
       -42.875, -42.625, -42.375, -42.125, -41.875, -41.625, -41.375,
       -41.125, -40.875, -40.625, -40.375, -40.125, -39.875, -39.625,
       -39.375, -39.125, -38.875, -38.625, -38.375, -38.125, -37.875,
       -37.625, -37.375, -37.125, -36.875, -36.625, -36.375, -36.125,
       -35.875, -35.625, -35.375, -35.125, -34.875, -34.625, -34.375,
       -34.125, -33.875, -33.625, -33.375, -33.125, -32.875, -32.625,
       -32.375, -32.125, -31.875, -31.625, -31.375, -31.125, -30.875,
       -30.625, -30.375, -30.125, -29.875, -29.625, -29.375, -29.125,
       -28.875, -28.625, -28.375, -28.125, -27.875, -27.625, -27.375,
       -27.125, -26.875, -26.625, -26.375, -26.125, -25.875, -25.625,
       -25.375, -25.

In [12]:
print(rootgrp.variables['nlon'][130])
print(rootgrp.variables['nlat'][10])

-147.375
-47.375


In [13]:
print(rootgrp.variables['precipitation'][130,10])

0.0950806
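
Going the other way (from a longitude/latitude to array indices) is simple arithmetic if the grid is regular. The sketch below assumes what the `nlon` and `nlat` printouts above suggest: cell centres start at -179.875 and -49.875, are spaced 0.25 degrees apart, and the array shape is (1440, 400). The function name `grid_index` is hypothetical.

```python
def grid_index(lon, lat, lon0=-179.875, lat0=-49.875, step=0.25,
               n_lon=1440, n_lat=400):
    """Return (i, j) of the grid cell whose centre is nearest to (lon, lat).

    Defaults are read off the nlon/nlat arrays shown above and assume
    a regular grid; check them against your own file.
    """
    i = int(round((lon - lon0) / step))
    j = int(round((lat - lat0) / step))
    if not (0 <= i < n_lon and 0 <= j < n_lat):
        raise ValueError("({}, {}) is outside the grid".format(lon, lat))
    return i, j


print(grid_index(-147.375, -47.375))  # (130, 10), matching the cells above
```

With the file open, `rootgrp.variables['precipitation'][i, j]` would then give the value for that cell.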


### Back on track: Iterating over a large number of files ...

Let's say we put a bunch of files in our cart:

![](./opera_2017-06-15_12-22-04.png)

We can proceed to get the URLs by checking out and can list all the URLs using the `Download URL List (Data)` option:

![](./opera_2017-06-15_12-24-04.png)

and you will get a list of files in your cart like this:

![](./opera_2017-06-15_12-33-19.png)

Copy and paste these into a file -- we'll call ours `data_sample.txt`.

Now we will process the contents of the file we just saved.

In [14]:
with open("./data_sample.txt") as f_ds:
    urls = [l.strip() for l in f_ds.readlines()]  
    # readlines() includes the \n at the end of the line 
    # so this cleans it up for us in one line
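
The pasted list can pick up blank lines or links that are not data files. Here is a small sketch that keeps only the lines ending in `.HDF`; the function name `load_urls` and its `suffix` parameter are assumptions for this example.

```python
def load_urls(path, suffix=".HDF"):
    """Read one URL per line from `path`, keeping only those ending in `suffix`."""
    with open(path) as f_ds:
        stripped = (line.strip() for line in f_ds)
        # blank lines and other file types fail the endswith() check
        return [url for url in stripped if url.endswith(suffix)]
```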

If you look closely, all the URLs we have (at least for the HDF files) contain a `//` between `data` and the rest of the path.  Unfortunately, GES DISC serves the netCDF version from a different URL, and we also need to append `.nc` to the filename.

I'm not clear on why the URLs are different (other than that they serve different formats and there are signs that [OPeNDAP](https://www.opendap.org) is at play on their side), but the URL we need differs only by a small amount:

instead of 

* http://disc2.gesdisc.eosdis.nasa.gov/data//TRMM_L3/TRMM_3B43.7/1998/3B43.19980801.7.HDF

we'll use

* http://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_L3/TRMM_3B43.7/1998/3B43.19980801.7.HDF.nc


Notice that `data//` is replaced by `opendap/` and `.nc` is appended ... everything else is the same.

One way to do this is to use `split()` on `//`.  Another is to use `replace()`.  We'll use `replace()` first; the `split()` version is shown in the section after it.

In [15]:
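for url in urls:
    # swap the data path for the OPeNDAP path and ask for the netCDF version
    url = url.replace('/data//', '/opendap/').replace('.HDF', '.HDF.nc')
    # name the local file after the rewritten URL so it ends in .nc
    local_filename = url.split('/')[-1]
    
    # make the request to the server
    req  = request.Request(url)
    resp = urlopen(req)
    
    with open(local_filename, "wb") as fo:
        fo.write(resp.read())
    
    print('.', end='')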

.....................................
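
The URL rewrite is easy to get subtly wrong (a stray double slash, a missing `.nc`), so it may be worth pulling it into its own function that you can try on a single URL before looping. A sketch, with `to_opendap_url` being a name I made up:

```python
def to_opendap_url(url):
    """Turn a GES DISC .HDF data URL into the matching OPeNDAP netCDF URL."""
    if not url.endswith('.HDF'):
        raise ValueError("not an HDF data URL: {}".format(url))
    # '/data//' becomes '/opendap/' (first occurrence only), then '.nc' is appended
    return url.replace('/data//', '/opendap/', 1) + '.nc'


print(to_opendap_url(
    'http://disc2.gesdisc.eosdis.nasa.gov/data//TRMM_L3/TRMM_3B43.7/1998/3B43.19980801.7.HDF'))
# http://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_L3/TRMM_3B43.7/1998/3B43.19980801.7.HDF.nc
```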

**USING SPLIT INSTEAD OF REPLACE**

In [16]:
urls[0].split('//')

['http:',
 'disc2.gesdisc.eosdis.nasa.gov/data',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20001201.7A.HDF']

In [17]:
file_urls = [url.split('//')[-1] for url in urls if url.endswith('.HDF')]

In [18]:
file_urls

['TRMM_L3/TRMM_3B43.7/2000/3B43.20001201.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20001101.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20001001.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000901.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000801.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000701.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000601.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000501.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000401.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000301.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000201.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/2000/3B43.20000101.7A.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19991201.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19991101.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19991001.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19990901.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19990801.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19990701.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19990601.7.HDF',
 'TRMM_L3/TRMM_3B43.7/1999/3B43.19990501.7.HDF',
 'TRMM_L

In [21]:
gesdisc_base_dap_url = 'http://disc2.gesdisc.eosdis.nasa.gov/opendap'  # no trailing slash; format() adds one
for url in file_urls:
    gesdisc_url = "{}/{}.nc".format(gesdisc_base_dap_url, url)
    
    # grab the local filename, with the .nc suffix to match the contents
    local_filename = url.split('/')[-1] + '.nc'
    
    # make the request to the server
    req  = request.Request(gesdisc_url)
    resp = urlopen(req)
    
    with open(local_filename, "wb") as fo:
        fo.write(resp.read())
    
    print('.', end='')

....................................
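
With dozens of files, one failed request partway through means starting over. The sketch below skips files that are already on disk and collects failures instead of stopping, so the loop can simply be run again. The name `download_missing` and the `dest_dir` parameter are hypothetical, and it expects the final OPeNDAP URLs (ending in `.nc`) as input.

```python
import os
from urllib import request
from urllib.error import URLError


def download_missing(urls, dest_dir="."):
    """Download each URL into `dest_dir` unless the file already exists.

    Returns a list of (url, error message) pairs for requests that failed.
    """
    failed = []
    for url in urls:
        local_path = os.path.join(dest_dir, url.split('/')[-1])
        if os.path.exists(local_path):
            continue  # already downloaded on an earlier run
        try:
            # urlopen runs first, so no empty file is created if it fails
            with request.urlopen(url) as resp, open(local_path, "wb") as fo:
                fo.write(resp.read())
        except URLError as err:  # also covers HTTPError
            failed.append((url, str(err)))
    return failed
```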

### PyDAP Solution

[PyDAP](http://www.pydap.org/en/latest/#) is another tool you may try to use to get the job done, though I was unable to get it to work as easily as the solution above.

To get PyDAP, first install the `pydap` package with conda:

```bash
conda install pydap
```

You may have to hit Y a few times as the installer runs.