![logo](../../_static/images/NCI_logo.png)


-------

# Data Access using netCDF4 library 


### In this notebook:

* Launch Jupyter Notebook

* Finding data 

* Opening the file

* Extracting data: Remote vs. direct filesystem access 

* Small subsets

* Large subsets


---------



### Launch the Jupyter Notebook application

**Using pre-built VDI modules:**

Load the `python3`, `ipython`, and `netcdf4-python` modules:

```
    $ module load python3
    $ module load ipython/4.2.0-py3.5
    $ module load netcdf4-python/1.2.4-ncdf-4.3.3.1-py3.5
```    
    
Launch the Jupyter Notebook application:
```
    $ jupyter notebook
``` 

<div class="alert alert-info">
<b>NOTE: </b> This will launch the <b>Notebook Dashboard</b> within a new web browser window. 
</div>

**Using virtual environments:**

To use Jupyter alongside customised Python packages in a virtual environment, begin by following the steps in **Python on the VDI: Part II**. 

Once you have a virtual environment set up with your packages (including `Jupyter`), load the required modules and activate the virtual environment:

```
    $ module load python/2.7.11
    $ source <path_to_virtual_environment>/bin/activate
```

Then, as above, launch the Jupyter Notebook application:

```
    $ jupyter notebook
```    
    
<div class="alert alert-warning">
<b>NOTE: </b> If you have already followed <b>Python on the VDI: Part II</b>, you should have installed the netcdf4-python package, which is required in the remainder of this notebook.  
</div>
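
Before continuing, it can be worth confirming that the package is importable from the Python environment the notebook is running in. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of the original notebook. Note that the package is imported as `netCDF4`, even though the module is named `netcdf4-python`.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


# The import name is `netCDF4` (it is imported under this name later in the notebook).
if is_installed("netCDF4"):
    print("netCDF4 is available")
else:
    print("netCDF4 is missing: check the loaded modules or the active virtual environment")
```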

### Find some NetCDF data

In this example, we will use a file from the Geoscience Australia Geophysics National Coverages Collection:

    /g/data/rr2/national_geophysical_compilations/magmap_v6_2015_VRTP/magmap_v6_2015_VRTP.nc
    
and we will compare direct filesystem access with remote access. Timings from the `%%time` cell magic are also shown to help illustrate when it is worth conducting analysis directly on the filesystem.
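
The `%%time` magic is only available inside IPython and Jupyter. For a plain Python script, a comparable measurement could be sketched as follows; the helper `timed` is a hypothetical name introduced for illustration, and the summation is a stand-in workload rather than a real file read.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


# Stand-in workload; in this notebook it would be a file open or a data read.
result, wall, cpu = timed(sum, range(1000000))
print("Wall time: {:.1f} ms, CPU time: {:.1f} ms".format(wall * 1000, cpu * 1000))
```

Wall time includes time spent waiting on the disk or the network, which is why it can be much larger than the CPU time for remote reads.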

**Local path on /g/data**

In [1]:
path = '/g/data/rr2/national_geophysical_compilations/magmap_v6_2015_VRTP/magmap_v6_2015_VRTP.nc'

**OPeNDAP Data URL**

For more information on where to find OPeNDAP URLs, see:
<a href="https://nbviewer.jupyter.org/github/nci/nci-notebooks/blob/master/Data_Access/Using_Thredds/THREDDS_DataAccess.ipynb">THREDDS Data Server: Data Access</a>



In [2]:
url = 'http://dapds00.nci.org.au/thredds/dodsC/rr2/national_geophysical_compilations/magmap_v6_2015_VRTP/magmap_v6_2015_VRTP.nc'

### Open file

In [3]:
from netCDF4 import Dataset

In [4]:
%%time

f1 = Dataset(path)

CPU times: user 8 ms, sys: 27 ms, total: 35 ms
Wall time: 2.06 s


In [5]:
%%time

f2 = Dataset(url)

CPU times: user 33 ms, sys: 41 ms, total: 74 ms
Wall time: 797 ms


### Extracting data: Remote vs. direct filesystem access

<div class="alert alert-info">
One big advantage of working directly on the filesystem is faster data access. For modest subsets the difference is small, but as requests grow, remote access can become much slower or even exceed the memory limits of NCI's THREDDS Data Server. 
</div>

**Small subsets**

File variables

In [7]:
# Use a name that does not shadow the built-in vars()
var_names = f2.variables.keys()
for item in var_names:
    print('Variable: \t', item)
    print('Dimensions: \t', f2[item].dimensions)
    print('Shape:    \t', f2[item].shape, '\n')

Variable: 	 lat
Dimensions: 	 ('lat',)
Shape:    	 (41882,) 

Variable: 	 crs
Dimensions: 	 ()
Shape:    	 () 

Variable: 	 lon
Dimensions: 	 ('lon',)
Shape:    	 (50591,) 

Variable: 	 mag_tmi_rtp_anomaly
Dimensions: 	 ('lat', 'lon')
Shape:    	 (41882, 50591) 



Extract: Remotely

In [8]:
%%time

lat = f2.variables['lat'][:1000]
lon = f2.variables['lon'][:1000]

mag = f2.variables['mag_tmi_rtp_anomaly'][:1000,:1000]

CPU times: user 64 ms, sys: 35 ms, total: 99 ms
Wall time: 923 ms


Extract: Locally

In [9]:
%%time

lat = f1.variables['lat'][:1000]
lon = f1.variables['lon'][:1000]

mag = f1.variables['mag_tmi_rtp_anomaly'][:1000,:1000]

CPU times: user 26 ms, sys: 19 ms, total: 45 ms
Wall time: 84.2 ms
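
Before requesting a larger subset, it can help to estimate how much data the request involves. The sketch below multiplies out the shapes reported above. The helper `request_size_bytes` is hypothetical, and the 4 bytes per value is an assumption (32-bit floats), since the variable's data type is not shown in this notebook; the actual type can be checked through the variable's `dtype` attribute.

```python
from functools import reduce
from operator import mul


def request_size_bytes(shape, bytes_per_value=4):
    """Estimate the uncompressed size in bytes of an array with the given shape."""
    n_values = reduce(mul, shape, 1)
    return n_values * bytes_per_value


small = request_size_bytes((1000, 1000))     # the small subset extracted above
full = request_size_bytes((41882, 50591))    # the whole mag_tmi_rtp_anomaly grid

print("Small subset: {:.1f} MB".format(small / 1e6))
print("Full grid:    {:.1f} GB".format(full / 1e9))
```

Under that assumption the small subset is about 4 MB, while the full grid is roughly 8.5 GB, which gives a sense of why the two cases behave so differently.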


**Large subsets**


Extract: Remotely

In [10]:
%%time

lat = f2.variables['lat'][:]
lon = f2.variables['lon'][:]

mag = f2.variables['mag_tmi_rtp_anomaly'][:,:]

RuntimeError: NetCDF: Access failure

<br></br>
<div class="alert alert-info">
You will notice that the remote example above results in an "Access failure" because the request is too large. To access this amount of data remotely, the request would have to be made in iterative chunks. However, the more performant alternative is to work directly on the filesystem from within the VDI. 
</div>
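
One way to make such a request in iterative chunks is to read a block of rows at a time and process each block before fetching the next. The following is a minimal sketch, not the notebook's own method: the generator `iter_row_blocks` and the block size are hypothetical, and a small NumPy array stands in for the remote variable so the example runs without network access. A block size that suits the THREDDS server would need to be found by experiment.

```python
import numpy as np


def iter_row_blocks(var, rows_per_block):
    """Yield (start, stop, block) for consecutive row blocks of a 2-D variable.

    `var` can be any object with a `shape` and NumPy-style slicing,
    such as a netCDF4 variable or a NumPy array.
    """
    n_rows = var.shape[0]
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        yield start, stop, var[start:stop, :]


# Stand-in data; with the remote dataset this would be
# f2.variables['mag_tmi_rtp_anomaly'] (assumed usage, not tested here).
demo = np.arange(50, dtype="float32").reshape(10, 5)

# Keep a running maximum so that only one block is held in memory at a time.
running_max = -np.inf
n_blocks = 0
for start, stop, block in iter_row_blocks(demo, rows_per_block=4):
    running_max = max(running_max, float(block.max()))
    n_blocks += 1

print("Blocks read:", n_blocks, "Maximum:", running_max)
```

Reducing each block to a summary (here a maximum) avoids ever holding the full grid in memory; collecting every block into one array would bring back the original memory cost on the client side.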



Extract: Locally

In [None]:
%%time

lat = f1.variables['lat'][:]
lon = f1.variables['lon'][:]

mag = f1.variables['mag_tmi_rtp_anomaly'][:,:]