# Import modules

First, let's import the modules that we will work with.

In [18]:
import xarray as xr # For reading in a NetCDF file

# Importing some data

Have you heard of OPeNDAP? Try running the cell below.

OPeNDAP, which stands for "Open-source Project for a Network Data Access Protocol", is a technology that makes it easier to access and share scientific data over the internet. One advantage of using OPeNDAP is that you don't need to download the file first; the data are read over the network when you need them!

In [34]:
# Remote file, read over OPeNDAP
netcdf_file = 'https://opendap1.nodc.no/opendap/physics/point/cruise/nansen_legacy-single_profile/NMDC_Nansen-Legacy_PR_CT_58US_2021708/CTD_station_P1_NLEG01-1_-_Nansen_Legacy_Cruise_-_2021_Joint_Cruise_2-1.nc'
# Local file on the author's machine (chlorophyll a data); this second assignment replaces the URL above
netcdf_file = '/home/lukem/Documents/Multi_institution_projects/Nansen_Legacy/Data/Chlorophyll_A_Anna_Vader/NetCDF_files/AeN_SeaWaterChlorophyllA_20190807_P1.nc'
xrds = xr.open_dataset(netcdf_file)
print(xrds)

<xarray.Dataset>
Dimensions:          (DEPTH: 63, NCHAR: 36)
Coordinates:
  * DEPTH            (DEPTH) float64 5.0 10.0 15.0 20.0 ... 305.0 310.0 315.0
  * NCHAR            (NCHAR) float64 0.0 1.0 2.0 3.0 4.0 ... 32.0 33.0 34.0 35.0
Data variables:
    CHLOROPHYLL_A    (DEPTH) float64 ...
    FILTERED_VOLUME  (DEPTH) float32 ...
    PHAEOPIGMENT     (DEPTH) float64 ...
    EVENT_ID         (DEPTH, NCHAR) object ...
Attributes: (12/38)
    id:                   8ac9acad-b481-50f0-a579-8d1170c6d3c0
    naming_authority:     The University Centre in Svalbard, Norway
    title:                Chlorophyll A and phaeopigments Nanen Legacy cruise...
    summary:              \n            This dataset is a collection of the a...
    keywords:             Oceans > Ocean chemistry > Chlorophyll
    keywords_vocabulary:  GCMD Science Keywords
    ...                   ...
    sampleType:           Chlorophyll a tot
    pi_name:              Anna Vader
    pi_institution:       UNIS
    pi_email:

Now let's take a look at what we have opened.

In [36]:
print(xrds)


Let's break down what we see above. 

A classic NetCDF file like this one can be broken down into three components: dimensions, variables and global attributes. The variables can be further divided into coordinate variables and data variables. Sometimes they are displayed separately, as they are here, but if you open a NetCDF file using different software the variables might be displayed together.

We will now have a closer look at each of these components, starting at the bottom with attributes. 


### Global attributes

Let's look at the global attributes. These are the metadata that describe the file as a whole. The cell below returns them as a Python dictionary.

In [6]:
xrds.attrs

{'qc_manual': 'Recommendations for in-situ data Near Real Time Quality Control https://doi.org/10.13155/36230',
 'contact': 'datahjelp@hi.no',
 'distribution_statement': 'These data are public and free of charge. User assumes all risk for use of data. User must display citation in any publication or product using data. User must contact PI prior to any commercial use of data.',
 'naming_authority': 'no.unis',
 'license': 'https://creativecommons.org/licenses/by/4.0/ https://creativecommons.org/licenses/by/4.0/',
 'data_assembly_center': 'IMR',
 'update_interval': 'void',
 'area': 'Arctic Ocean',
 'author': '',
 'Conventions': 'CF-1.8, ACDD-1.3, OceanSITES Manual 1.4',
 'data_mode': 'M',
 'comment': 'Descending CTD profile',
 'history': 'Created at 2022-08-08T12:43:51Z using the xarray library in Python',
 'netcdf_version': 'netCDF-4 classic model',
 'quality_index': 'A',
 'quality_control_indicator': '0',
 'publisher_name': 'Elisabeth Jones',
 'publisher_email': 'datahjelp@imr.no',
 'w

Since the above is a dictionary, we can access a single attribute by calling the *key* or attribute name we are interested in.

In [11]:
xrds.attrs['Conventions']

'CF-1.8, ACDD-1.3, OceanSITES Manual 1.4'
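Because the attributes behave like an ordinary Python dictionary, the usual dictionary tools apply. The sketch below is only an illustration: `demo` is a hypothetical in-memory dataset standing in for the file opened above, and its attribute values are made up for the example.

```python
import xarray as xr

# Hypothetical stand-in for the dataset opened above (attribute values are invented)
demo = xr.Dataset(attrs={"Conventions": "CF-1.8, ACDD-1.3", "title": "Demo"})

# .get() returns a default instead of raising KeyError when an attribute is absent
licence = demo.attrs.get("license", "not provided")

# Split the Conventions string into the individual convention names
conventions = [name.strip() for name in demo.attrs["Conventions"].split(",")]

print(licence)
print(conventions)
```

Using `.get()` is handy when you loop over many files and cannot be sure that every file includes the attribute you are after.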

*Conventions* is probably the most important global attribute because it tells you (and a machine) how to interpret the rest of the file. *CF-1.8* refers to version 1.8 of the CF conventions, which you can find here:

https://cfconventions.org/
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html

The CF conventions are a set of standards that define how a NetCDF file should be structured. The document linked above is extensive, but the aim is to provide a standardised way to organise many different types of data. You don't need to read it all, but it should be your go-to place if you want to know how to do something.

However, the CF conventions are light on discovery metadata. Discovery metadata are metadata that can be used to find data, for example when and where the data were collected, by whom, and some keywords. So we also use the ACDD convention (the Attribute Convention for Data Discovery).

https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3

In most cases, if you want to find out what a global attribute means, you can visit the ACDD convention page above to find a description of the attribute. There are other conventions that someone might have included that you can also find online, but we recommend that you always follow the CF and ACDD conventions as a minimum when creating a NetCDF file.

The person who created this file should have read the relevant sections of these documents to make sure that the file complies with these conventions. There are also validators that you can run your files through to check that they are compliant with the conventions before you publish them. For example:

https://compliance.ioos.us/index.html

By following conventions, the data creator and user, human or machine, should be able to understand the data in the same way. A NetCDF file by itself is not FAIR (findable, accessible, interoperable and reusable) because you could include any attributes or structure your data however you like. A CF-NetCDF file is FAIR.
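As a rough illustration of what a validator does, the sketch below checks whether a dataset contains the four global attributes that ACDD 1.3 lists as highly recommended (`title`, `summary`, `keywords` and `Conventions`). The function name `missing_global_attributes` and the `demo` dataset are hypothetical and exist only for this example; this is not a substitute for a proper compliance checker.

```python
import xarray as xr

# Global attributes listed as "highly recommended" in ACDD 1.3
HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")

def missing_global_attributes(ds, required=HIGHLY_RECOMMENDED):
    """Return the names in `required` that are absent or empty in ds.attrs."""
    return [name for name in required if not str(ds.attrs.get(name, "")).strip()]

# Hypothetical dataset: 'summary' is empty and 'keywords' is not there at all
demo = xr.Dataset(
    attrs={"title": "Demo profile", "Conventions": "CF-1.8, ACDD-1.3", "summary": ""}
)

missing = missing_global_attributes(demo)
print(missing)
```

With the file from this tutorial you could call `missing_global_attributes(xrds)` in the same way.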



### Dimensions:

To extract only the dimensions, you can do this:


In [20]:
xrds.dims

Frozen({'DEPTH': 63, 'NCHAR': 36})

This NetCDF file has 2 dimensions, called *DEPTH* and *NCHAR*.

The *DEPTH* dimension has 63 points. Dimensions tell you about the shape and size of your variables. In this case, we know that any variable with a dimension of *DEPTH* will have 63 data points, though some could be NaN. A variable might also have 2 dimensions. For example, a variable with the dimensions *DEPTH* and *NCHAR*, such as *EVENT_ID*, will be a 63 x 36 grid of points.

You can extract individual dimensions like this.

In [21]:
xrds.dims['DEPTH']

63
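To see how dimensions determine the shape of variables, here is a minimal sketch that builds a hypothetical in-memory dataset (`demo`) with the same dimension sizes as the file above. The data are placeholders. It uses `sizes`, which is a mapping from dimension name to length.

```python
import numpy as np
import xarray as xr

depth = np.arange(5.0, 320.0, 5.0)  # 5, 10, ..., 315 (63 values)

# Hypothetical dataset with placeholder data, mimicking the dimensions above
demo = xr.Dataset(
    data_vars={
        "CHLOROPHYLL_A": (("DEPTH",), np.full(depth.size, np.nan)),
        "EVENT_ID": (("DEPTH", "NCHAR"), np.zeros((depth.size, 36))),
    },
    coords={"DEPTH": depth, "NCHAR": np.arange(36.0)},
)

print(demo.sizes["DEPTH"])          # length of the DEPTH dimension
print(demo["CHLOROPHYLL_A"].shape)  # one dimension: a 1D array
print(demo["EVENT_ID"].shape)       # two dimensions: a 2D grid
```

The shape of each variable follows directly from the sizes of the dimensions it uses.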

### Coordinates and Data variables:

Variables are where the data or coordinate values are stored. The coordinate variables usually have the same names as their respective dimensions. So to recap, a dimension tells you how many grid points there are, whilst the coordinate variable tells you the values for those grid points. To extract all the coordinate variables at once, we can do this:

In [22]:
xrds.coords

Coordinates:
  * DEPTH    (DEPTH) float64 5.0 10.0 15.0 20.0 25.0 ... 300.0 305.0 310.0 315.0
  * NCHAR    (NCHAR) float64 0.0 1.0 2.0 3.0 4.0 ... 31.0 32.0 33.0 34.0 35.0

The name of the variable is given first.

The dimension (or dimensions) that the variable has is second, in brackets. So a variable with 2 dimensions will contain a 2D grid of values, where the size of the grid can be seen by looking at the dimensions.

Third is the format that the values are stored in. float64 tells you that the values are stored as 64-bit floating-point numbers, i.e. numbers that can have decimal places. There is a list of possible formats here:

The data variables can be extracted like this: 

In [29]:
xrds.data_vars

Data variables:
    CHLOROPHYLL_A    (DEPTH) float64 ...
    FILTERED_VOLUME  (DEPTH) float32 ...
    PHAEOPIGMENT     (DEPTH) float64 ...
    EVENT_ID         (DEPTH, NCHAR) object ...
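The same three pieces of information (name, dimensions and format) can also be collected in code. The sketch below loops over `data_vars`, which behaves like a dictionary. The `demo` dataset and its values are invented for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with invented values
demo = xr.Dataset(
    data_vars={
        "CHLOROPHYLL_A": (("DEPTH",), np.array([0.8, 1.2, 0.5])),
        "FILTERED_VOLUME": (("DEPTH",), np.array([250, 250, 500], dtype="float32")),
    },
    coords={"DEPTH": [5.0, 10.0, 15.0]},
)

# Map each data variable name to its dimensions and storage format
summary = {name: (var.dims, str(var.dtype)) for name, var in demo.data_vars.items()}

for name, (dims, dtype) in summary.items():
    print(name, dims, dtype)
```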

To extract a single variable named *DEPTH*:

In [28]:
xrds['DEPTH']

Above you can see not only the data values but also the *Attributes*, which are the metadata describing that variable. To extract only the data values themselves:

In [30]:
xrds['DEPTH'].values

array([  5.,  10.,  15.,  20.,  25.,  30.,  35.,  40.,  45.,  50.,  55.,
        60.,  65.,  70.,  75.,  80.,  85.,  90.,  95., 100., 105., 110.,
       115., 120., 125., 130., 135., 140., 145., 150., 155., 160., 165.,
       170., 175., 180., 185., 190., 195., 200., 205., 210., 215., 220.,
       225., 230., 235., 240., 245., 250., 255., 260., 265., 270., 275.,
       280., 285., 290., 295., 300., 305., 310., 315.])
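The values of a coordinate variable can also be used to select data, which shows the difference between a dimension (positions along an axis) and a coordinate variable (the values at those positions). This is a minimal sketch with a hypothetical dataset and invented values: `isel` selects by position, `sel` selects by coordinate value.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with invented chlorophyll values
demo = xr.Dataset(
    {"CHLOROPHYLL_A": (("DEPTH",), np.array([0.8, 1.2, 0.5, 0.1]))},
    coords={"DEPTH": [5.0, 10.0, 15.0, 20.0]},
)

# Select by position along the DEPTH dimension (second point, index 1)
by_position = demo["CHLOROPHYLL_A"].isel(DEPTH=1)

# Select by the value of the DEPTH coordinate variable (10 m)
by_value = demo["CHLOROPHYLL_A"].sel(DEPTH=10.0)

# Select a depth range; label-based slices include both end points
shallow = demo.sel(DEPTH=slice(5.0, 15.0))

print(by_position.item(), by_value.item())
print(shallow["DEPTH"].values)
```

Both selections return the same data point here, because index 1 along *DEPTH* corresponds to a depth of 10 m.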

The attributes alone can be extracted like this:

In [32]:
xrds['DEPTH'].attrs

{'units': 'metres',
 'long_name': 'Sample depth below sea level, positive is increasing depth',
 'standard_name': 'depth',
 'coverage_content_type': 'physicalMeasurement',
 'positive': 'down'}

The attributes are retrieved as a dictionary, so it is possible to access a single attribute by calling the *key* or attribute name.

In [37]:
xrds['DEPTH'].attrs['standard_name']

'depth'

The variable name *DEPTH* is not standardised, but the *standard_name* variable attribute is from the CF conventions. You can find it in the list of CF standard names here:
https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

You can read the description and see what the canonical units should be. The data in the file doesn't need to be stored with the same units, but should be stored with units that are physically equivalent. 
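Because variable names are not standardised but *standard_name* is, a more robust way to locate a variable in an unfamiliar file is to search on that attribute. The helper `find_by_standard_name` below is a hypothetical function written for this example, not part of xarray, and `demo` is an invented dataset.

```python
import xarray as xr

def find_by_standard_name(ds, standard_name):
    """Return the names of all variables (coordinates included) with this standard_name."""
    return [
        name
        for name, var in ds.variables.items()
        if var.attrs.get("standard_name") == standard_name
    ]

# Hypothetical dataset: one coordinate with a standard_name, one data variable without
demo = xr.Dataset(
    data_vars={"FILTERED_VOLUME": (("DEPTH",), [250.0, 500.0], {"units": "mL"})},
    coords={"DEPTH": ("DEPTH", [5.0, 10.0], {"standard_name": "depth", "units": "metres"})},
)

matches = find_by_standard_name(demo, "depth")
print(matches)
```

The same call would work whether the file's creator had named the variable *DEPTH*, *depth* or something else entirely.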