## **Exploring netCDFs** 
Adapted from [Katy Abbot](https://github.com/amnh/BridgeUP-STEM-Oceans-Six/blob/master/jupyter-notebooks/netCDF_practice.ipynb)

![image](https://camo.githubusercontent.com/77e36a1f8169f7da010f7c1615fe39ab88f190ea/687474703a2f2f6465736b746f702e6172636769732e636f6d2f656e2f6172636d61702f31302e332f6d616e6167652d646174612f6e65746364662f475549442d44383732413443332d373439452d343135392d413643302d4642364433423437433544382d7765622e676966)
What are netCDF files? The acronym stands for Network Common Data Form. It is a data format that makes it easy for scientists to share and read data across different computers, operating systems, and software, without running into compatibility problems or struggling to understand someone else's work!

netCDF files are what we call array-oriented datasets. Data is stored in arrays, which are like grids, and a value can be accessed by selecting the appropriate row and column. Here's an example of a 2D array:

<img src="https://camo.githubusercontent.com/b525fcfb6792a87d5a15b0b1c52fc39aff739722/68747470733a2f2f7777772e6479636c617373726f6f6d2e636f6d2f696d6167652f746f7069632f632f32642d61727261792f32642d61727261792e6a7067" width="600"/>
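
To make the idea of rows and columns concrete, here is a minimal sketch that uses plain Python lists as a stand-in for a 2D array. The numbers and the name `grid` are made up for illustration; netCDF data will come back as NumPy arrays, but the row-and-column idea is the same.

```python
# A small 2D array (grid) built from nested lists: 3 rows by 4 columns.
# The numbers are invented for illustration.
grid = [
    [10, 11, 12, 13],  # row 0
    [20, 21, 22, 23],  # row 1
    [30, 31, 32, 33],  # row 2
]

# Python counts from 0, so row 1 is the second row
# and column 2 is the third column.
value = grid[1][2]
print(value)

# The number of rows and columns describes the shape of the grid.
n_rows = len(grid)
n_cols = len(grid[0])
print(n_rows, n_cols)
```

Selecting a row first and then a column is the same pattern you will use later when you pick out values by latitude and longitude.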

With netCDF files, the rows, columns, and other axes of an array are called dimensions, and they typically represent quantities such as latitude, longitude, and time.


<img src="https://simulatingcomplexity.files.wordpress.com/2014/11/netcdf-file-structure.png" width="400"/>

Let's try to explore this file format with an actual file. Make sure you have the file **n-atlantic-t0.nc** somewhere you will be able to find it but **not in your GitHub repository (not in ocean-motion)**. This is the data we are going to use for our data processing. 

First, we are going to explore the file in Terminal.

* In Terminal, type **ncdump -h n-atlantic-t0.nc** to see the header of the file: its dimensions, variables, and attributes, without the data values.

* Type **ncdump -v header-title n-atlantic-t0.nc**, where header-title is the name of the variable you want to look at, to see the data stored in the file under that name.
    
* Try exploring the file by looking up different variables (time, latitude, etc.)

Now we are going to explore the file using Python. Our first step is to import ``Dataset`` from the netCDF4 package.

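Then we are going to load the dataset using the ``Dataset()`` function, one of the main tools we use for viewing netCDF files. The ``r`` in front of the path marks it as a raw string, so Python reads any backslashes in the path literally. ``Dataset()`` opens the file for reading only by default, so you can explore it without changing it.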
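
Here is a short sketch of what the ``r`` prefix on a Python string does. The Windows-style path is hypothetical and chosen only because it contains backslashes; paths on a Mac use forward slashes, so the prefix makes no difference there.

```python
# In a normal string, a backslash starts an escape sequence:
# "\n" becomes a newline and "\t" becomes a tab.
normal = 'C:\new\test.nc'

# In a raw string (note the r before the quote), backslashes stay as they are.
raw = r'C:\new\test.nc'

print(len(normal))  # shorter, because "\n" and "\t" each count as one character
print(len(raw))     # every character in the path is kept
print(raw)
```

Read-only access is a separate setting: ``Dataset()`` uses ``mode='r'`` unless you pass a different mode.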

In [2]:
from netCDF4 import Dataset #import Dataset from the netCDF4 package
my_data = Dataset(r'/Users/brownscholar/Desktop/A2 Internship/n-atlantic-t0.nc') #replace with pathname for your computer


If you see ``FileNotFoundError: [Errno 2] No such file or directory``, the path does not point to the file. Check the folder names and the file name, then run the cell again.
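
A ``FileNotFoundError`` at this step means Python could not find a file at the path you gave. One way to catch this early is to check the path before opening it. This is a minimal sketch: the helper name `check_data_path` and the location `data/n-atlantic-t0.nc` are hypothetical, so replace the path with the real one on your computer.

```python
from pathlib import Path


def check_data_path(path_text):
    """Return the path as a string if a file exists there; otherwise raise a clear error."""
    path = Path(path_text)
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}. Check the folder and file name.")
    return str(path)


# Hypothetical location; replace with the real path on your computer.
candidate = 'data/n-atlantic-t0.nc'
try:
    data_path = check_data_path(candidate)
    print("Found:", data_path)
except FileNotFoundError as err:
    print(err)
```

Once the check passes, you can hand `data_path` to ``Dataset()`` as in the cell above.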

In [None]:
print(my_data) #What output do you see when you run this command?

Note that we've now created an object, called my_data, that we can use to access different aspects of the file. We'll use the dot notation (e.g. ``my_data.blahblahblah``) to access different parts of the data.

Let's find out more about this dataset. We'll look at the "metadata," which is basically data about the data. 

Scientists use this to explain how the data was acquired or made, how old it is, who to contact with questions, etc. First, we'll look at the dataset's "global attributes," which can be accessed by calling ``ncattrs()`` (shorthand for netCDF attributes).

In [None]:
my_data.ncattrs()

To look at one of these, type the name of the dataset object (here ``my_data``), then a period (.) and the name of the attribute you want to look at.

In [None]:
print(my_data.title)

# print two more global attributes here

You can access the dimensions of the dataset by calling **my_data.dimensions**. Notice that the output is a dictionary. We can see the "keys," or dimension names, with **my_data.dimensions.keys()**

In [None]:
print(my_data.dimensions)
print(my_data.dimensions.keys())

If you want to see a specific dimension, add brackets and the dimension name in quotes, e.g. **my_data.dimensions['time']**

In [None]:
print(my_data.dimensions['time'])
print(my_data.dimensions['latitude'])
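
If dictionaries are new to you, this small sketch shows the same key and bracket pattern on an ordinary Python dictionary. The name `dimensions` and the sizes are invented for illustration. In the real dataset, each value is a dimension object rather than a plain number, and ``len()`` of that object gives its size.

```python
# Stand-in for my_data.dimensions: dimension names mapped to sizes.
# These sizes are invented; the real file will report its own.
dimensions = {'time': 1, 'latitude': 180, 'longitude': 360}

# keys() lists the names stored in the dictionary.
print(list(dimensions.keys()))

# Brackets with a name in quotes look up a single entry.
print(dimensions['latitude'])

# Looping over items() gives each name and its value in turn.
for name, size in dimensions.items():
    print(name, size)
```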


We can also access the variables of our dataset by typing ``my_data.variables``.

In [None]:
print(my_data.variables, "\n \n")  #"\n" creates a new empty line so you can separate your output



This is kind of too much information, right? To just look at the names of the variables, we can use ``.variables.keys()``:

In [None]:
print(my_data.variables.keys())

## looking at one variable + its attributes: 
These variables have a lot more information, right? Let's look at just one variable: latitude. Inspect it by typing **my_data.variables['latitude']**


In [None]:
my_data.variables['latitude']

How many different attributes can you identify? (standard_name, long_name, cell_methods, _FillValue, missing_value, original_name, original_units, history, current shape). Look at the second line. It gives the name of the variable, and it also lists three names in parentheses after it. What do you think those names signify?

## looking at a specific attribute:
We can access any one of these attributes by calling it directly. Just add a period at the end of your call to a variable and add in the attribute name.

In [None]:
print(my_data.variables['latitude'].long_name)
# I got 'long_name' from the list of attributes above
# other examples are 'unit_long', 'units', 'valid_min', and 'valid_max'. 

You may be wondering: Where's the actual data? So far, we've learned what variables and dimensions are in this dataset, but we haven't actually seen any numbers or values.

Let's look at the latitude and longitude values. To do so, you'll call on a variable (e.g. ``my_data.variables['longitude']``, as above), but you'll add ``[:]`` after it to tell the computer that you want to see the values as a NumPy array.

In [None]:
print("latitude: ", my_data.variables['latitude'][:], "\n") #print the latitude values, then add a line break to separate them from longitude
print("longitude: ", my_data.variables['longitude'][:]) #print the longitude values


Let's look at some attributes and data of another variable. From the variables in ``my_data.variables.keys()`` pick another variable, print all the attributes, print some specific attributes, and print the data in that variable. 

In [None]:
# pick a variable and print all the attributes here:


In [None]:
# print a few specific attributes here: 


In [None]:
# print the data in that variable here: 


For the variable you chose, try to understand what it is, and what the meanings of the attributes and data are. 

## 👉netCDF file cheat sheet👈
[This tutorial](http://www.ceda.ac.uk/static/media/uploads/ncas-reading-2015/10_read_netcdf_python.pdf) was written in Python 2.7, so the print command is slightly different, but it's a helpful read to understand how these files work.

Additionally:
1. Import the tools to open a dataset: **from netCDF4 import Dataset**
2. Open a dataset: **dataset = Dataset(r'filename.nc')**
3. View the dataset's attributes: **dataset.ncattrs()**
4. Access a specific attribute: **dataset.attribute_name**
5. View the dataset's dimensions: **dataset.dimensions**
6. View a specific dimension: **dataset.dimensions[ 'name of dimension' ]**
7. View the dataset's variables: **dataset.variables**
8. View a specific variable: **dataset.variables[ 'name of variable' ]**
9. See a variable's values: **dataset.variables[ 'name of variable' ][ : ]**