## **Exploring netCDFs** 
adapted from [Katy Abbot](https://github.com/amnh/BridgeUP-STEM-Oceans-Six/blob/master/jupyter-notebooks/netCDF_practice.ipynb)

![image](https://camo.githubusercontent.com/77e36a1f8169f7da010f7c1615fe39ab88f190ea/687474703a2f2f6465736b746f702e6172636769732e636f6d2f656e2f6172636d61702f31302e332f6d616e6167652d646174612f6e65746364662f475549442d44383732413443332d373439452d343135392d413643302d4642364433423437433544382d7765622e676966)
What are netCDF files? The acronym stands for Network Common Data Form. It is a way of formatting data so that scientists can share and read it across different computers, operating systems, and software without running into compatibility problems or struggling to understand someone else's work!

netCDF files hold array-oriented data. Values are stored in arrays, which are like grids, and you access a value by selecting the appropriate row and column. Here's an example of a 2D array:

<img src="https://camo.githubusercontent.com/b525fcfb6792a87d5a15b0b1c52fc39aff739722/68747470733a2f2f7777772e6479636c617373726f6f6d2e636f6d2f696d6167652f746f7069632f632f32642d61727261792f32642d61727261792e6a7067" width="600"/>
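To make the row-and-column idea concrete, here is a minimal sketch that uses an ordinary Python list of lists instead of a netCDF file. The numbers are made up. Note that Python counts rows and columns starting from 0.

```python
# A small 2D array with 3 rows and 4 columns, stored as a list of rows.
# The values are invented for illustration only.
grid = [
    [1, 2, 3, 4],     # row 0
    [5, 6, 7, 8],     # row 1
    [9, 10, 11, 12],  # row 2
]

row, column = 1, 2
value = grid[row][column]  # first pick row 1, then column 2 within that row
print(value)  # 7
```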

In netCDF files, the rows, columns, and any other axes of an array are called dimensions. They usually represent quantities such as latitude, longitude, depth, and time.
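Naming the dimensions tells you which index means what. The sketch below is a toy illustration, not real netCDF data. The temperature values are invented, and `value_at` is a hypothetical helper (not part of netCDF4) that picks a value by dimension name instead of by position.

```python
# Toy temperature data with three dimensions, in this order:
dimensions = ("time", "latitude", "longitude")
temperature = [
    [[20.1, 20.4, 20.9], [18.0, 18.3, 18.7]],  # time step 0: 2 latitudes x 3 longitudes
    [[20.3, 20.6, 21.0], [18.2, 18.4, 18.9]],  # time step 1
]

def value_at(data, dims, **indices):
    """Hypothetical helper: index nested lists using named dimensions.

    Walks down one level per dimension, in the order given by `dims`.
    """
    result = data
    for name in dims:
        result = result[indices[name]]
    return result

print(value_at(temperature, dimensions, time=1, latitude=0, longitude=2))  # 21.0
```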


<img src="https://simulatingcomplexity.files.wordpress.com/2014/11/netcdf-file-structure.png" width="400"/>

Let's try to explore this file format with an actual file. Make sure you have the file **n-atlantic-t0.nc** somewhere you will be able to find it but **not in your GitHub repository (not in ocean-motion)**. This is the data we are going to use for our data processing. 

First, we are going to explore the file in Terminal.

* In Terminal, type **ncdump -h n-atlantic-t0.nc** to see the file's header: its dimensions, variables, and attributes.

* Type **ncdump -v variable_name n-atlantic-t0.nc**, where variable_name is the variable you want to look at, to see the data stored in that variable.

* Try exploring the file by looking at different variables (time, latitude, etc.).

Now we are going to explore using Python. Our first step is to import netCDF4.

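Then we are going to load the dataset using the ``Dataset()`` function, one of the main tools we use for viewing netCDF files. The ``r`` in front of the path makes it a raw string, so Python reads the backslashes in a Windows path literally instead of treating them as escape codes. ``Dataset()`` opens the file in read-only mode by default.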

In [7]:
from netCDF4 import Dataset #import Dataset from the netCDF4 package
my_data = Dataset(r'C:\Users\Me\Desktop\Data for Internships\n-atlantic-t0.nc') #replace with pathname for your computer
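In Python, a string written with an ``r`` prefix is a raw string: every backslash is kept exactly as typed. Without it, sequences such as ``\n`` and ``\t`` inside a Windows path turn into a newline and a tab. A path starting with ``C:\Users`` would not even run, because ``\U`` begins a Unicode escape. The short paths below are made up just to show the difference.

```python
# The same characters written without and with the r (raw string) prefix.
plain = 'C:\new\test.nc'   # "\n" becomes a newline and "\t" becomes a tab
raw = r'C:\new\test.nc'    # every backslash is kept exactly as typed

print(len(plain))   # 12: each escape sequence collapsed into one character
print(len(raw))     # 14: both backslashes are still there
print(repr(plain))  # shows the hidden newline and tab
```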


In [8]:
print(my_data) #What output do you see when you run this command?

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
    description: ARMOR3D REP CMEMS - April 2019 Release
    title: ARMOR3D REP - TSHUVMld Global Ocean Observation-based Product
    Conventions: CF-1.0
    institution: CLS
    domain_name: GLO
    history: 2019-03-22 17:53:09 ARMOR3D REP - TSHUV Global Ocean Obervation-based Product netCDF creation
    _CoordSysBuilder: ucar.nc2.dataset.conv.CF1Convention
    references:  
    comment: 
    source:  
    dimensions(sizes): time(261), depth(7), latitude(158), longitude(255)
    variables(dimensions): int16 depth(depth), int16 zo(time,depth,latitude,longitude), float32 latitude(latitude), int32 time(time), int16 to(time,depth,latitude,longitude), int16 so(time,depth,latitude,longitude), int16 ugo(time,depth,latitude,longitude), float32 longitude(longitude), int16 vgo(time,depth,latitude,longitude)
    groups: 



Note that we've now created an object, called ``my_data``, that we can use to access different aspects of the file. We'll use dot notation (e.g. ``my_data.blahblahblah``) to access different parts of the data.

Let's find out more about this dataset. We'll look at the "metadata," which is basically data about the data. 

Scientists use metadata to explain how the data was acquired or made, how old it is, who to contact with questions, etc. First, we'll look at the dataset's "global attributes," which can be listed by calling ``ncattrs()`` (short for netCDF attributes).

In [9]:
my_data.ncattrs()

['description',
 'title',
 'Conventions',
 'institution',
 'domain_name',
 'history',
 '_CoordSysBuilder',
 'references',
 'comment',
 'source']

To look at one of these, type the name of the dataset object (``my_data``), then a period (.), then the name of the attribute you want to look at.

In [12]:
print(my_data.title)

# two more attributes
print(my_data.institution)
print(my_data.Conventions)

ARMOR3D REP - TSHUVMld Global Ocean Observation-based Product
CLS
CF-1.0
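If you want all the global attributes and their values at once, a dictionary comprehension over ``ncattrs()`` is one option. This is a minimal sketch: `global_attributes` is a hypothetical helper name, while ``ncattrs()`` and ``getncattr()`` are methods of netCDF4's Dataset. ``getncattr()`` is also handy for an attribute name that isn't a valid Python identifier, where dot notation can't be used.

```python
def global_attributes(dataset):
    """Hypothetical helper: return {name: value} for a Dataset's global attributes.

    Dataset.ncattrs() supplies the attribute names and
    Dataset.getncattr(name) looks up each value by name.
    """
    return {name: dataset.getncattr(name) for name in dataset.ncattrs()}
```

With the file opened as above, ``global_attributes(my_data)['institution']`` should give ``'CLS'``. netCDF4 Variable objects have the same ``ncattrs()`` and ``getncattr()`` methods, so the helper should work on a single variable too.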


You can access the dimensions of the dataset with **my_data.dimensions**. Notice that the output is a dictionary (an OrderedDict here). We can see its "keys," or dimension names, with **my_data.dimensions.keys()**

In [13]:
print(my_data.dimensions)
print(my_data.dimensions.keys())

OrderedDict([('time', <class 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 261
), ('depth', <class 'netCDF4._netCDF4.Dimension'>: name = 'depth', size = 7
), ('latitude', <class 'netCDF4._netCDF4.Dimension'>: name = 'latitude', size = 158
), ('longitude', <class 'netCDF4._netCDF4.Dimension'>: name = 'longitude', size = 255
)])
odict_keys(['time', 'depth', 'latitude', 'longitude'])
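The dimension sizes are easy to pull out into a plain dictionary, and multiplying them shows how many numbers a variable holds. This is a sketch that uses the hypothetical helper names `dimension_sizes` and `total_values`. It relies on ``len()`` of a netCDF4 Dimension returning its size.

```python
import math

def dimension_sizes(dimensions):
    """Hypothetical helper: map each dimension name to its length.

    `dimensions` is a mapping such as my_data.dimensions.
    """
    return {name: len(dim) for name, dim in dimensions.items()}

def total_values(dimensions, names):
    """Hypothetical helper: number of grid points spanned by the given dimensions."""
    return math.prod(len(dimensions[name]) for name in names)
```

For this file, ``dimension_sizes(my_data.dimensions)`` should return ``{'time': 261, 'depth': 7, 'latitude': 158, 'longitude': 255}``. Calling ``total_values(my_data.dimensions, ('time', 'depth', 'latitude', 'longitude'))`` gives 261 × 7 × 158 × 255 = 73,609,830 values for each four-dimensional variable.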


If you want to see a specific dimension, add brackets with the dimension name in quotes, e.g. **my_data.dimensions['time']**

In [14]:
print(my_data.dimensions['time'])
print(my_data.dimensions['latitude'])

<class 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 261

<class 'netCDF4._netCDF4.Dimension'>: name = 'latitude', size = 158




We can also access the variables of our dataset with **my_data.variables**:

In [15]:
print(my_data.variables, "\n \n")  #"\n" creates a new empty line so you can separate your output



OrderedDict([('depth', <class 'netCDF4._netCDF4.Variable'>
int16 depth(depth)
    axis: Z
    long_name: depth
    positive: down
    standard_name: depth
    unit_long: meter
    units: m
    _CoordinateAxisType: Height
    _CoordinateZisPositive: down
    valid_min: 0
    valid_max: 100
unlimited dimensions: 
current shape = (7,)
filling on, default _FillValue of -32767 used
), ('zo', <class 'netCDF4._netCDF4.Variable'>
int16 zo(time, depth, latitude, longitude)
    _FillValue: 32767
    long_name: absolute height
    scale_factor: 0.001
    standard_name: geopotential_height
    unit_long: meter
    units: m
    valid_range: [-20000  20000]
unlimited dimensions: 
current shape = (261, 7, 158, 255)
filling on), ('latitude', <class 'netCDF4._netCDF4.Variable'>
float32 latitude(latitude)
    axis: Y
    long_name: latitude
    standard_name: latitude
    step: 0.25
    unit_long: degrees north
    units: degrees_north
    _CoordinateAxisType: Lat
    valid_min: 12.625
    valid_max: 51

This is kind of too much information, right? To just look at the names of the variables, we can use ``.variables.keys()``:

In [16]:
print(my_data.variables.keys())

odict_keys(['depth', 'zo', 'latitude', 'time', 'to', 'so', 'ugo', 'longitude', 'vgo'])
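A middle ground between the full printout and just the names is one line per variable. This is a sketch with `summarize_variables` as a hypothetical helper name. It reads each variable's ``dimensions`` tuple and its ``units`` attribute, and falls back to a placeholder because not every variable is guaranteed to have units.

```python
def summarize_variables(variables):
    """Hypothetical helper: one short line per variable, ncdump-style.

    `variables` is a mapping such as my_data.variables. Each line shows
    the variable name, its dimension names, and its units (if any).
    """
    lines = []
    for name, var in variables.items():
        units = getattr(var, "units", "no units")  # default if the attribute is missing
        lines.append(f"{name}({', '.join(var.dimensions)}): {units}")
    return lines
```

Run it with ``for line in summarize_variables(my_data.variables): print(line)``. For example, the ``ugo`` line should read ``ugo(time, depth, latitude, longitude): m s-1``.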


## looking at one variable + its attributes: 
These variables have a lot more information, right? Let's look at just one variable: ugo, the eastward geostrophic velocity. Inspect it by typing **my_data.variables['ugo']**


In [21]:
my_data.variables['ugo']

<class 'netCDF4._netCDF4.Variable'>
int16 ugo(time, depth, latitude, longitude)
    _FillValue: 32767
    long_name: geostrophic zonal velocity from thermal wind equation
    scale_factor: 0.001
    standard_name: geostrophic_eastward_sea_water_velocity
    unit_long: meter per second
    units: m s-1
    valid_range: [-4000  4000]
unlimited dimensions: 
current shape = (261, 7, 158, 255)
filling on

How many different attributes can you identify? (For ugo: _FillValue, long_name, scale_factor, standard_name, unit_long, units, and valid_range. The last three lines, such as current shape, are extra information netCDF4 prints about the variable, not attributes.) Look at the second line. It gives the name of the variable, and it also lists four names in parentheses after it. What do you think those names signify?
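To check your answer, you can pair each name in the parentheses with the matching entry of the variable's shape. This is a sketch with a hypothetical helper `named_shape`. netCDF4 Variable objects provide ``dimensions`` (a tuple of names) and ``shape`` (a tuple of sizes) in the same order.

```python
def named_shape(variable):
    """Hypothetical helper: pair each dimension name of a variable with its size."""
    return dict(zip(variable.dimensions, variable.shape))
```

``named_shape(my_data.variables['ugo'])`` should return ``{'time': 261, 'depth': 7, 'latitude': 158, 'longitude': 255}``, matching the current shape printed above.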

## looking at a specific attribute:
We can access any one of these attributes directly: after the variable, add a period and the attribute name.

In [23]:
print(my_data.variables['ugo'].long_name)
# I got 'long_name' from the list of attributes above
# other examples are 'unit_long', 'units', 'standard_name', and 'valid_range'

geostrophic zonal velocity from thermal wind equation
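Two of ``ugo``'s attributes explain how its stored numbers become velocities. The data are saved as ``int16`` integers. Multiplying by ``scale_factor`` (0.001) gives meters per second, and ``_FillValue`` (32767) marks missing points. By default, netCDF4 applies this for you when you read the values with ``[:]`` and returns a masked array. The sketch below just spells out the arithmetic by hand. `unpack` is a hypothetical helper and the raw value 1234 is made up.

```python
def unpack(raw, scale_factor, fill_value):
    """Hypothetical helper: turn one stored integer into a physical value.

    Returns None where the stored value equals the fill value (missing data),
    otherwise raw * scale_factor. No add_offset is applied, since ugo's
    attribute list above does not include one.
    """
    if raw == fill_value:
        return None
    return raw * scale_factor

print(unpack(1234, 0.001, 32767))   # about 1.234 (m/s)
print(unpack(32767, 0.001, 32767))  # None: a missing point
```

By the same rule, the ``valid_range`` of [-4000, 4000] corresponds to velocities from -4 to 4 m/s.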


You may be wondering: Where's the actual data?? So far, we've learned about what variables and dimensions are in this dataset, but we haven't actually seen any numbers or values.

Let's look at the latitude values. To do so, you'll call on a variable (e.g. ``my_data.variables['latitude']``), but you'll add ``[:]`` after it to read all of its values into a NumPy array. The same works for longitude or any other variable.

In [24]:
print("latitude: ", my_data.variables['latitude'][:]) #print all the latitude values


latitude:  [12.625 12.875 13.125 13.375 13.625 13.875 14.125 14.375 14.625 14.875
 15.125 15.375 15.625 15.875 16.125 16.375 16.625 16.875 17.125 17.375
 17.625 17.875 18.125 18.375 18.625 18.875 19.125 19.375 19.625 19.875
 20.125 20.375 20.625 20.875 21.125 21.375 21.625 21.875 22.125 22.375
 22.625 22.875 23.125 23.375 23.625 23.875 24.125 24.375 24.625 24.875
 25.125 25.375 25.625 25.875 26.125 26.375 26.625 26.875 27.125 27.375
 27.625 27.875 28.125 28.375 28.625 28.875 29.125 29.375 29.625 29.875
 30.125 30.375 30.625 30.875 31.125 31.375 31.625 31.875 32.125 32.375
 32.625 32.875 33.125 33.375 33.625 33.875 34.125 34.375 34.625 34.875
 35.125 35.375 35.625 35.875 36.125 36.375 36.625 36.875 37.125 37.375
 37.625 37.875 38.125 38.375 38.625 38.875 39.125 39.375 39.625 39.875
 40.125 40.375 40.625 40.875 41.125 41.375 41.625 41.875 42.125 42.375
 42.625 42.875 43.125 43.375 43.625 43.875 44.125 44.375 44.625 44.875
 45.125 45.375 45.625 45.875 46.125 46.375 46.625 46.875 47.125 47
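Once you have the latitude values, a common next step is to find the row closest to a latitude you care about. This is a minimal sketch with `nearest_index` as a hypothetical helper. It works on a plain list or on the array returned by ``my_data.variables['latitude'][:]``.

```python
def nearest_index(values, target):
    """Hypothetical helper: index of the element in `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))
```

For example, ``i = nearest_index(my_data.variables['latitude'][:], 40.1)`` should pick 40.125. Because ``ugo``'s dimensions are (time, depth, latitude, longitude), ``my_data.variables['ugo'][0, 0, i, :]`` would then read ``ugo`` along that latitude for the first time step and first depth level.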

Let's look at some attributes and data of another variable. From the variables in ``my_data.variables.keys()`` pick another variable, print all the attributes, print some specific attributes, and print the data in that variable. 

In [None]:
# pick a variable and print all the attributes here:


In [None]:
# print a few specific attributes here: 


In [None]:
# print the data in that variable here: 


For the variable you chose, try to understand what it is, and what the meanings of the attributes and data are. 

## 👉netCDF file cheat sheet👈
[This tutorial](http://www.ceda.ac.uk/static/media/uploads/ncas-reading-2015/10_read_netcdf_python.pdf) was written in Python 2.7, so the print command is slightly different, but it's a helpful read to understand how these files work.

Additionally:
1. Import the tools to open a dataset: **from netCDF4 import Dataset**
2. Open a dataset: **dataset = Dataset(r'filename.nc')**
3. View the dataset's attributes: **dataset.ncattrs()**
4. Access a specific attribute: **dataset.attribute_name**
5. View the dataset's dimensions: **dataset.dimensions**
6. View a specific dimension: **dataset.dimensions[ 'name of dimension' ]**
7. View the dataset's variables: **dataset.variables**
8. View a specific variable: **dataset.variables[ 'name of variable' ]**
9. See a variable's values: **dataset.variables[ 'name of variable' ][ : ]**