# Structure of a netCDF file

### netCDF file

netCDF (Network Common Data Form) is a file format for storing multidimensional array data. The diagram below shows an example dataset containing temperature and precipitation data.

![](http://xarray.pydata.org/en/stable/_images/dataset-diagram.png)

A netCDF file includes:
* **data variables**: the measured quantities, such as temperature and precipitation
* **coordinates**: the labels along each dimension, such as time, latitude, and longitude
* **attributes**: metadata, such as how the data was collected and which institutions were involved
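
To make the three parts concrete, here is a minimal sketch that builds a tiny dataset in memory instead of reading a file. All values, the grid size, and the `institution` attribute are made up for illustration; only the variable and coordinate names follow the list above.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinate values: 3 time steps on a 2 x 4 latitude/longitude grid
time = np.array(["2020-01-01", "2020-02-01", "2020-03-01"], dtype="datetime64[ns]")
latitude = [10.5, 9.5]
longitude = [100.5, 101.5, 102.5, 103.5]

# Made-up values, only there to give the arrays the right shape
temperature = np.full((3, 2, 4), 15.0)
precipitation = np.zeros((3, 2, 4))

example = xr.Dataset(
    data_vars={
        "temperature": (("time", "latitude", "longitude"), temperature),
        "precipitation": (("time", "latitude", "longitude"), precipitation),
    },
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    attrs={"institution": "Example Institute (hypothetical)"},
)

print(list(example.data_vars))  # names of the data variables
print(list(example.coords))     # names of the coordinates
print(example.attrs)            # dataset-level attributes
```

Each data variable is paired with the names of its dimensions, which is how xarray knows which coordinate labels belong to which axis.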

In [2]:
#Import the xarray library which is used to read and analyze netCDF files
import xarray as xr
#Other useful Python libraries
import os

In [7]:
#Change to your data directory
dataPath = '/Users/brownscholar/Desktop/BridgeUP_Climate_Guardians_GitHub/Labs'
os.chdir(dataPath)
os.getcwd()
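
If the folder path is misspelled, `os.chdir` stops with a `FileNotFoundError`. One possible way to get a friendlier message is to check that the folder exists first. This is only a sketch: `change_directory` is a hypothetical helper written for this example, and the folder name in the call is deliberately one that should not exist.

```python
import os
from pathlib import Path


def change_directory(path):
    """Change to `path` if it is an existing folder; return True on success."""
    folder = Path(path).expanduser()
    if not folder.is_dir():
        # Report the problem instead of raising an error
        print(f"Folder not found: {folder}")
        return False
    os.chdir(folder)
    return True


# Hypothetical folder name that should not exist on your computer
changed = change_directory("no_such_folder_for_this_example")
print(changed)
```

Checking the path character by character (including capital letters and spaces) is usually the quickest way to find the mistake.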


### Import netCDF file

`xr.open_dataset(file_name)`: opens a netCDF file and returns it as an xarray `Dataset`

In [6]:
# Load the sea surface temperature dataset
fileName = 'HadISST_sst.nc'
data = xr.open_dataset(fileName)
data


In [None]:
# A plain-text editor such as Sublime shows the text version of this summary, so switch the display style to match
xr.set_options(display_style="text")
data
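
As a small sketch of how the text display works: `print()` always produces the plain-text summary, and `xr.set_options` can also be used in a `with` block so the setting only applies temporarily. The dataset below is a tiny made-up stand-in, not the real SST file.

```python
import numpy as np
import xarray as xr

# Tiny made-up dataset standing in for the real file
example = xr.Dataset(
    {"sst": (("latitude",), np.array([1.0, 2.0]))},
    coords={"latitude": [0.5, -0.5]},
)

# print() (and str()) always give the plain-text summary
text = str(example)
print(text)

# Inside the with block, notebook displays use the text style;
# the previous setting is restored when the block ends
with xr.set_options(display_style="text"):
    print(example)
```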

### Extract sea surface temperature (SST)

This is a lot of information! We are only interested in one variable, `sst`, so let's extract it.

To do that, we give Python a path to the variable, much like the path to your data folder, but with the parts separated by `.` instead of `/`.

In [None]:
# Extract the sst variable
data.sst
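
The dot path has an equivalent spelling with square brackets and the variable name as a string. The sketch below compares the two on a small made-up dataset (the values and coordinates are assumptions for illustration, not the real SST data).

```python
import numpy as np
import xarray as xr

# Made-up dataset with a single variable called "sst"
example = xr.Dataset(
    {"sst": (("latitude",), np.array([20.0, 21.5]))},
    coords={"latitude": [0.5, -0.5]},
)

by_dot = example.sst     # dot access
by_key = example["sst"]  # bracket access with the name as a string

# Both spellings return the same DataArray
print(by_dot.identical(by_key))
```

Bracket access is the safer choice when a variable name contains spaces or matches the name of a built-in method.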

Your SST data is stored as a `DataArray`, which is xarray's implementation of a labeled, multi-dimensional array.
It has several key properties:

| Attribute | Description |
| --------- | ----------- |
| `data`    | the array's values |
| `dims`    | dimension names for each axis, for example (`x`, `y`, `z`) or (`lat`, `lon`, `time`) |
| `coords`  | coordinate values (labels) along each dimension |
| `attrs`   | relevant attributes/metadata (for example: units, research institution) |
| `name`    | name of the array |
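
To see each of these properties in one place, here is a minimal sketch using a small `DataArray` built by hand. The numbers, coordinate values, and units are made up for illustration.

```python
import numpy as np
import xarray as xr

# Hand-made 2 x 3 array standing in for real SST data
sst = xr.DataArray(
    data=np.array([[20.0, 21.0, 22.0], [18.0, 19.0, 20.0]]),
    dims=("latitude", "longitude"),
    coords={"latitude": [0.5, -0.5], "longitude": [10.5, 11.5, 12.5]},
    attrs={"units": "C"},
    name="sst",
)

print(sst.data.shape)    # shape of the underlying array of values
print(sst.dims)          # dimension names, one per axis
print(list(sst.coords))  # names of the coordinates
print(sst.attrs)         # metadata dictionary
print(sst.name)          # name of the array
```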


In [4]:
# Extract dimensions
data.sst.dims


In [7]:
# Extract the coordinates (the values along each dimension)
data.sst.coords

Coordinates:
  * time       (time) datetime64[ns] 1870-01-16T11:59:59.505615234 ... 2020-0...
  * latitude   (latitude) float32 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
  * longitude  (longitude) float32 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5

In [8]:
#You can also extract a particular coordinate such as latitude
data.sst.latitude

In [11]:
# Extract attributes
data.sst.attrs

{'standard_name': 'sea_surface_temperature',
 'long_name': 'sst',
 'units': 'C',
 'cell_methods': 'time: lat: lon: mean'}
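
The attributes behave like a Python dictionary, so a single entry can be looked up by its key. The sketch below uses a made-up array that carries two of the attribute names shown above; the `source` key is a hypothetical example of an attribute that is not present.

```python
import numpy as np
import xarray as xr

# Made-up array with two of the attributes from the output above
sst = xr.DataArray(
    np.zeros(2),
    dims=("latitude",),
    attrs={"standard_name": "sea_surface_temperature", "long_name": "sst"},
)

# Look up one attribute by its key
print(sst.attrs["long_name"])

# .get() returns a fallback instead of raising an error when the key is missing
print(sst.attrs.get("source", "not recorded"))
```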

In [None]:
# Extract the units of the SST variable


## Practice

In your data-processing script (`process_SST_data.py`):
1. Load the netCDF file `HadISST_sst.nc` using xarray.
1. What values will be useful in our analysis? Save these as separate variables so you can refer to them easily.

Use the space below to test out your code.

In [1]:
fileName = 'HadISST_sst.nc'


### Cheatsheet

`xr.open_dataset(file_name)`: opens a netCDF file as a `Dataset`

`dataset.data_variable`: extracts a data variable such as temperature or precipitation

`dataset.data_variable.dims`: names of the dimensions of the data variable, such as time, latitude, ...

`dataset.data_variable.coords`: values of the dimensions of the data variable

`dataset.data_variable.time`: extracts a particular coordinate

`dataset.data_variable.attrs`: attributes of the data variable, like its name, units, ...