
<table>
    <tr>
        <td><img src='https://coastalrisk.live/wp-content/uploads/2018/05/cera_50x50.png' alt='Image' width='50' height='50'></td>
        <td><h1 align="left"><font color='green'>Unveiling the Storm: A Beginner's Guide to Analyzing NetCDF Data with Python for<br>Coastal Emergency Risks Assessment (CERA)</font></h1></td>
    </tr>
</table>

<p align="center">
  <img src="https://cloud.cera.lsu.edu/s/pCmmXMCmR4GMXja/download/1_Intro.png" alt="Image Description">
</p>

# Table of Contents

- [Part 1: Introduction](#part-1-introduction)
  - [1.1 - What is NetCDF?](#11---what-is-netcdf)
  - [1.2 - Software Installation](#12---software-installation)
  - [1.3 - Tutorial example NetCDF file](#13---tutorial-example-netcdf-file)

- [Part 2: Reading a NetCDF File and Understanding its Structure](#part-2-reading-a-netcdf-file-and-understanding-its-structure)
  - [2.1 - Opening a NetCDF file](#21---opening-a-netcdf-file)
  - [2.2 - NetCDF file structure](#22---netcdf-file-structure)

- [Part 3: Getting the NetCDF File Attributes (Metadata)](#part-3-getting-the-netcdf-file-attributes-metadata)
  - [3.1 - Dataset attributes (metadata)](#31---dataset-attributes-metadata)
  - [3.2 - Utilizing the Python dictionary structure for working with NetCDF](#32---utilizing-the-python-dictionary-structure-for-working-with-netcdf)
  - [3.3 - Variable attributes (metadata)](#33---variable-attributes-metadata)
  - [3.4 - Dimensions attributes (metadata)](#34---dimensions-attributes-metadata)

- [Part 4: Accessing the NetCDF Data Values](#part-4-accessing-the-netcdf-data-values)
    - [4.1 - Parsing the data values from the NetCDF variables](#41---parsing-the-data-values-from-the-netcdf-variables)
    - [4.2 - Further work with NumPy](#42---further-work-with-numpy)

>Disclaimer: If you are running this notebook in Google Colab, please use the table of contents in the top left corner.


# Part 1: Introduction
Scientific geospatial datasets, such as climatological or oceanographic model results, can grow very quickly in complexity and data volume. Working with the 
output becomes challenging when the data is simulated over large geographic regions or generated multiple times per day. The NetCDF format addresses this 
by storing datasets in a well-organized, self-describing structure that makes them easier to manage and analyze. This tutorial 
explains the structure of a NetCDF file using the Python library netCDF4.



##  What is the structure of NetCDF and how to work with it?


Analyzing NetCDF Data with Python is a beginner-friendly guide for learning the basics of reading and analyzing NetCDF files using the Python library netCDF4. 
All sections of this four-part tutorial are available as a Jupyter Notebook utilizing the Python programming language and an example NetCDF file from the ADCIRC ocean circulation model. 


## 1.1 - What is NetCDF?


#### About NetCDF
Network Common Data Form (NetCDF) is a data format that is widely used to store multi-dimensional, array-oriented information like geographic, meteorological or oceanographic data. 
NetCDF often handles huge datasets that hold spatial information and data values sampled several times a day or collected over large geographic regions. 
The NetCDF file format structure supports this by providing the data values as named arrays that also can hold metadata to describe the dataset.

#### NetCDF in Python
The Python libraries netCDF4 and NumPy provide all the functionality we need to work with a NetCDF file.


## 1.2 - Software Installation


### Installing Jupyter Notebook

The links below describe the options for installing Jupyter Notebook on your system. Please choose according to your system requirements and personal preference. 

- [Installing the classic Jupyter Notebook interface](https://docs.jupyter.org/en/latest/install/notebook-classic.html)
- [Installing JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)


#### Our preferred way of working is using [Visual Studio Code](https://code.visualstudio.com/download)
The general instructions are [here](https://code.visualstudio.com/docs/datascience/jupyter-notebooks), using the [Jupyter](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) extension provided by Microsoft.


Install the [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python) and create a venv or conda virtual environment so you can keep your global/OS Python install clean of unwanted packages. You can do this from the VS Code Command Palette:
- (Ctrl+Shift+P) for Windows and Linux. 
- (⇧⌘P) for Mac.


Type "Python: Create Environment".


Pick the preferred environment manager (conda or venv).

Open any Jupyter Notebook file with the extension .ipynb in VS Code.

In the top right corner, select the environment created in the previous step as the kernel for this notebook.

### Standalone Installation


To run the tutorial's Python examples from a command line or terminal, the following software should be installed:

- Python (version 3)
- NumPy
- netCDF4


In [1]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install netCDF4

Note: you may need to restart the kernel to use updated packages.
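
To confirm that both packages are visible to the interpreter (or notebook kernel) you are actually using, a quick check along the following lines may help. This is a minimal sketch that relies only on the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the two packages this tutorial relies on.
for name in ("numpy", "netCDF4"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing right after installation, restarting the kernel (as the note above suggests) is usually the first thing to try.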


## 1.3 - Tutorial example NetCDF file


In our exercise, we will use an example NetCDF file from the oceanographic ADCIRC model. The file contains the underlying mesh topology and the values for the maximum water elevation computed at each mesh node. 
The file is available in the same directory as this notebook (Notebook/maxele.63.nc).
>The example data file was retrieved from the public adcirc.org [examples page](https://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-tidal-forcing-example/shinnecock-inlet-ny-with-tidal-forcing-results/).

<p align="center">
  <img src="https://cloud.cera.lsu.edu/s/dWeEefFNWRgwpra/download/2.png" />
</p>

In [3]:
!wget https://cloud.cera.lsu.edu/s/7PfqfzWDj285Afw/download/maxele.63.nc

--2024-04-08 21:36:30--  https://cloud.cera.lsu.edu/s/7PfqfzWDj285Afw/download/maxele.63.nc
Resolving cloud.cera.lsu.edu (cloud.cera.lsu.edu)... 130.39.22.133
Connecting to cloud.cera.lsu.edu (cloud.cera.lsu.edu)|130.39.22.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149379 (146K) [application/octet-stream]
Saving to: ‘maxele.63.nc.1’


2024-04-08 21:36:31 (351 KB/s) - ‘maxele.63.nc.1’ saved [149379/149379]
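
`wget` is not available on every system, and re-running it saves a second copy with a `.1` suffix, as the output above shows. One possible alternative that uses only the Python standard library is sketched below. `download_if_missing` is a hypothetical helper name, not part of any library.

```python
import os
import urllib.request

# URL taken from the wget cell above.
URL = "https://cloud.cera.lsu.edu/s/7PfqfzWDj285Afw/download/maxele.63.nc"


def download_if_missing(url, path):
    """Download url to path unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Calling `download_if_missing(URL, "maxele.63.nc")` in a notebook cell would fetch the file on the first run and do nothing on later runs.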



# Part 2: Reading a NetCDF File and Understanding its Structure

## 2.1 - Opening a NetCDF file 

In [4]:
# importing libraries
import netCDF4
import numpy as np

This line of code is importing the NumPy library, which is a fundamental package for scientific computing in Python.

```python
import numpy as np

```
The `import` keyword is used to include external libraries or modules in your Python program. In this case, `numpy` is the module being imported.

The `as` keyword is used to create an alias for the imported module. Here, `np` is the alias created for `numpy`. This means that when you want to use functions or attributes from the `numpy` module, you can use the prefix `np` instead of `numpy`.

For example, to use the `array` function from the `numpy` module, you can now do so by calling `np.array()` instead of `numpy.array()`.

To read the NetCDF file, we pass its path to `netCDF4.Dataset`.


In [5]:
# reading in the NetCDF file
mynetcdf = netCDF4.Dataset('maxele.63.nc')

This line of code is using the `netCDF4` library in Python to open a netCDF file and assign it to a variable.

```python
mynetcdf = netCDF4.Dataset('maxele.63.nc')
```
* `netCDF4` is a Python interface to the netCDF C library. netCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

* `Dataset` is the class used to open a netCDF file. Its first argument is the path to the file you want to open. In this case, `'maxele.63.nc'` is a relative path without a directory part, so the file is expected in the current working directory, which is where the download cell above saved it. By default, the file is opened in read-only mode.

* The opened netCDF file is assigned to the variable `mynetcdf`. This variable is now a `Dataset` object, and you can use it to access the data and metadata in the netCDF file. For example, you can get a list of the variables in the file with `mynetcdf.variables`, or access a specific variable with `mynetcdf.variables['varname']`.

Now that we have loaded our file in Python, we can see how we can get the desired information out of it. For that, let’s have a look at the general structure of a NetCDF file first.


## 2.2 - NetCDF file structure

### A NetCDF file has three basic components:
- **Attributes (Metadata)** that describe a) the dataset as a whole and b) all contained data variables or dimensions. 
- **Dimensions** are used to define the **shape** of the data variables (arrays) in the netCDF file.
- **Variables** hold the actual **data values.**

>- Variables store model output data like water heights, wind velocities, pressures etc. but also information like latitudes, longitudes, times etc. 
>- Each variable can also have attributes (metadata) that describe the data.
>- The NetCDF format stores variables as arrays that are defined by unique variable names, data types, and array dimensions. 
>- Global attributes describe the entire dataset with metadata like project title, institution, contact information, program version etc.
>- Variable attributes describe properties of the data variables like units, scaling factors, offsets etc.



How do we know what is in our NetCDF file?

# Part 3: Getting the NetCDF File Attributes (Metadata)



## 3.1 - Dataset attributes (metadata)

When we print the entire NetCDF dataset (mynetcdf), we will get an overview of the file, including:
- The global attributes describing the entire dataset
- The dimensions defining the shape of the contained data arrays
- The names of all data variables contained in the NetCDF file


In [6]:
# getting the attributes (metadata) of the file
print(mynetcdf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    _FillValue: -99999.0
    model: ADCIRC
    version: 51.52.29
    grid_type: Triangular
    description: Shinnecock Inlet V20051108               ! UPTO 32 CHARACTER ALPHANUMERIC RUN D
    agrid: Shinacock Inlet Coarse Grid
    title: adcirc.org netcdf examples project
    institution: UNC CH Institute of Marine Sciences
    source: adcirc.org examples page
    history: based on Shinnecock Inlet but with netcdf output
    references: http://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-t
    comments: netcdf4 format was used (fully compatible with hdf5)
    host: adcirc.org
    convention: CF
    Conventions: UGRID-0.9.0
    contact: cfulcher@email.unc.edu
    creation_date: 2015-12-14  4:18:28 -05:00
    modification_date: 2015-12-14  4:18:28 -05:00
    fort.15: ==== Input File Parameters (below) ====
    dt: 6.0
    ihot: 0
    ics: 2
    nolibf: 2
    nolifa: 2


With this information we can explore the variables and their dimensions in more depth. To get a better understanding of how that works, we will first look at how the netCDF4 library represents a NetCDF file in Python.

## 3.2 - Utilizing the Python dictionary structure for working with NetCDF


When we read a NetCDF file with netCDF4, the NetCDF variables and dimensions are exposed as Python **dictionaries.** 
- The keys are the variable (or dimension) names.
- The values are `Variable` (or `Dimension`) objects that hold the attributes and give access to the data. 


All NetCDF variables are stored in this **[key/value]** structure. Slicing a `Variable` object, for example with `[:]`, returns its data as a NumPy array.
- mynetcdf.variables  is a dictionary containing the variables
- mynetcdf.dimensions  is a dictionary containing the dimensions of the variables
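
Dictionary-style access works the same way as on a plain Python `dict`. The sketch below uses an ordinary dictionary as a stand-in: the keys are variable names from this tutorial's file, while the values are invented placeholder strings rather than real variable objects.

```python
# Stand-in for mynetcdf.variables: names mapped to placeholder values.
variables = {
    "x": "longitude variable",
    "y": "latitude variable",
    "zeta_max": "maximum water elevation variable",
}

names = list(variables.keys())     # all keys, in insertion order
has_depth = "depth" in variables   # membership test on the keys
x_entry = variables["x"]           # look up one value by its key

print(names)
print(has_depth)
print(x_entry)

# Iterate over key/value pairs.
for name, value in variables.items():
    print(name, "->", value)
```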

Example of getting the list of variables through the Python dictionary structure.


In **Part 3.1 - Dataset attributes (metadata)**, we have already explored how to get a list of all variable names contained in the NetCDF file by printing the entire dataset.

In [7]:
print(mynetcdf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    _FillValue: -99999.0
    model: ADCIRC
    version: 51.52.29
    grid_type: Triangular
    description: Shinnecock Inlet V20051108               ! UPTO 32 CHARACTER ALPHANUMERIC RUN D
    agrid: Shinacock Inlet Coarse Grid
    title: adcirc.org netcdf examples project
    institution: UNC CH Institute of Marine Sciences
    source: adcirc.org examples page
    history: based on Shinnecock Inlet but with netcdf output
    references: http://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-t
    comments: netcdf4 format was used (fully compatible with hdf5)
    host: adcirc.org
    convention: CF
    Conventions: UGRID-0.9.0
    contact: cfulcher@email.unc.edu
    creation_date: 2015-12-14  4:18:28 -05:00
    modification_date: 2015-12-14  4:18:28 -05:00
    fort.15: ==== Input File Parameters (below) ====
    dt: 6.0
    ihot: 0
    ics: 2
    nolibf: 2
    nolifa: 2


Alternatively, we can access all variable names by calling the `keys()` method of the variables dictionary. 

In [8]:
# getting all variables names as contained in the file by printing the dictionary ’keys’
all_variable_names = mynetcdf.variables.keys()
print(all_variable_names)

dict_keys(['time', 'x', 'y', 'element', 'adcirc_mesh', 'neta', 'nvdll', 'max_nvdll', 'ibtypee', 'nbdv', 'nvel', 'nvell', 'max_nvell', 'ibtype', 'nbvv', 'depth', 'zeta_max', 'time_of_zeta_max'])


The variable `mynetcdf` is the netCDF Dataset object opened in Part 2.1.

* In the line `all_variable_names = mynetcdf.variables.keys()`, `mynetcdf.variables` is a dictionary where each key-value pair corresponds to a variable name and its associated `Variable` object, respectively. The `.keys()` method of a Python dictionary returns a view object containing all the keys. So, `mynetcdf.variables.keys()` returns all the variable names in the netCDF file, which are then stored in `all_variable_names`.

* The next line, `print(all_variable_names)`, uses the Python built-in function `print` to output the variable names to the console. This is a quick way to inspect the variable names in a netCDF file.



## 3.3 - Variable attributes (metadata)

The NetCDF variables hold the actual data values of the dataset. Each variable can also contain associated attributes that describe the variable.

### List of all variable attributes in the file


In [9]:
# getting a list of all variable attributes (metadata) in the file
# note, this does not print any data but just the attributes
for attrs in mynetcdf.variables.values():
   print(attrs)

<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: model time
    standard_name: time
    units: seconds since 2015-12-14 00:00:00 UTC
    base_date: 2015-12-14 00:00:00 UTC
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 x(node)
    long_name: longitude
    standard_name: longitude
    units: degrees_east
    positive: east
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 y(node)
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    positive: north
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
int32 element(nele, nvertex)
    long_name: element
    cf_role: face_node_connectivity
    start_index: 1
    units: nondimensional
unlimited dimen

### Attributes of a specific variable


If we are interested in a specific variable and want to explore the associated attributes, we can do so by calling the variable name.

In [10]:
# accessing the attributes (metadata) of the variable ’x’ (longitudes) and  the variable ’y’ (latitudes)
print(mynetcdf.variables['x'], mynetcdf.variables['y'])

<class 'netCDF4._netCDF4.Variable'>
float64 x(node)
    long_name: longitude
    standard_name: longitude
    units: degrees_east
    positive: east
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used <class 'netCDF4._netCDF4.Variable'>
float64 y(node)
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    positive: north
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used


## 3.4 - Dimensions attributes (metadata)

### Understanding the dimension descriptions in NetCDF

Dimensions are used to define the shape of variables in NetCDF. 
Every NetCDF dimension has both a size and a descriptive name. The size is either a positive integer or ‘unlimited’. Variables that are defined with an unlimited dimension can grow in length over time.
In Part 3.1 - Dataset attributes (metadata), we have printed an overview of the NetCDF dataset attributes that also contains the description of the dimensions.


In [11]:
print(mynetcdf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    _FillValue: -99999.0
    model: ADCIRC
    version: 51.52.29
    grid_type: Triangular
    description: Shinnecock Inlet V20051108               ! UPTO 32 CHARACTER ALPHANUMERIC RUN D
    agrid: Shinacock Inlet Coarse Grid
    title: adcirc.org netcdf examples project
    institution: UNC CH Institute of Marine Sciences
    source: adcirc.org examples page
    history: based on Shinnecock Inlet but with netcdf output
    references: http://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-t
    comments: netcdf4 format was used (fully compatible with hdf5)
    host: adcirc.org
    convention: CF
    Conventions: UGRID-0.9.0
    contact: cfulcher@email.unc.edu
    creation_date: 2015-12-14  4:18:28 -05:00
    modification_date: 2015-12-14  4:18:28 -05:00
    fort.15: ==== Input File Parameters (below) ====
    dt: 6.0
    ihot: 0
    ics: 2
    nolibf: 2
    nolifa: 2


There are two sections that describe the dimensions:
- The **dimensions(sizes)** that define the sizes and assign names
- The **variable(dimensions)** that explain what dimension(s) each variable is using


Examples:
- The variable `x` is defined as `x(node)` with `node(3070)`, meaning that the variable `x` is a one-dimensional array of 3070 values.
- The variable `element` is defined as `element(nele, nvertex)` with `nele(5780)` and `nvertex(3)`, meaning that the variable `element` is a two-dimensional array with a size of “5780 by 3” values.

In addition to the dimensions, each variable definition also states the data type. For example, `int32 element(nele, nvertex)` means that the values of the `element` variable are stored as 32-bit integers.
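
To get a feel for what these definitions mean in NumPy terms, the sketch below creates empty arrays with the same shapes and data types as `x` and `element`. The arrays contain only zeros and the names `x_like` and `element_like` are made up for this illustration.

```python
import numpy as np

# Empty arrays with the same shapes and data types as in the tutorial file.
x_like = np.zeros(3070, dtype=np.float64)            # float64 x(node)
element_like = np.zeros((5780, 3), dtype=np.int32)   # int32 element(nele, nvertex)

# ndim is the number of dimensions, shape the size along each of them.
print(x_like.ndim, x_like.shape, x_like.dtype)
print(element_like.ndim, element_like.shape, element_like.dtype)
```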


### List of all dimensions (sizes) in the file


When we print out a list of all dimensions that are stored in our NetCDF file, we will get the **“dimensions(sizes)”** that we can use to understand the definitions of the **“variables(dimensions)”** as explained in the example above.

In [12]:
# getting all dimensions (name and size) from the values of the dimensions dictionary
for dims in mynetcdf.dimensions.values():
   print(dims)

<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'node', size = 3070
<class 'netCDF4._netCDF4.Dimension'>: name = 'nele', size = 5780
<class 'netCDF4._netCDF4.Dimension'>: name = 'nvertex', size = 3
<class 'netCDF4._netCDF4.Dimension'>: name = 'nope', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'max_nvdll', size = 75
<class 'netCDF4._netCDF4.Dimension'>: name = 'nbou', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'max_nvell', size = 570
<class 'netCDF4._netCDF4.Dimension'>: name = 'mesh', size = 1


This code retrieves and prints all dimensions of a netCDF file. The variable `mynetcdf` is our netCDF Dataset object.

* In the line `for dims in mynetcdf.dimensions.values():`, `mynetcdf.dimensions` is a dictionary where each key-value pair corresponds to a dimension name and its associated `Dimension` object, respectively. The `.values()` method of a Python dictionary returns a view object containing all the values.

* The `for` loop iterates over these `Dimension` objects (each one assigned to the variable `dims` in turn), and the `print(dims)` statement inside the loop prints each dimension's name and size, and whether it is unlimited.

### Size of a specific dimension


Getting the size of a particular dimension can be done similarly to accessing the **Attributes of a specific variable** by calling the dimension name.

In [13]:
# accessing the dimension of the variable ’x’ (longitudes)
print(mynetcdf.dimensions['node'])	

<class 'netCDF4._netCDF4.Dimension'>: name = 'node', size = 3070


Knowing the dimension sizes is great but
>How do we know what dimensions our variables have in the NetCDF file?


### Variable dimensions


There are several options to explore the dimensions of variables in the file:



- As explained in the example **Understanding the dimension descriptions in NetCDF**, we can print out the **Dataset attributes (metadata)** to get all variables(dimensions) definitions.


In [14]:
print(mynetcdf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    _FillValue: -99999.0
    model: ADCIRC
    version: 51.52.29
    grid_type: Triangular
    description: Shinnecock Inlet V20051108               ! UPTO 32 CHARACTER ALPHANUMERIC RUN D
    agrid: Shinacock Inlet Coarse Grid
    title: adcirc.org netcdf examples project
    institution: UNC CH Institute of Marine Sciences
    source: adcirc.org examples page
    history: based on Shinnecock Inlet but with netcdf output
    references: http://adcirc.org/home/documentation/example-problems/shinnecock-inlet-ny-with-t
    comments: netcdf4 format was used (fully compatible with hdf5)
    host: adcirc.org
    convention: CF
    Conventions: UGRID-0.9.0
    contact: cfulcher@email.unc.edu
    creation_date: 2015-12-14  4:18:28 -05:00
    modification_date: 2015-12-14  4:18:28 -05:00
    fort.15: ==== Input File Parameters (below) ====
    dt: 6.0
    ihot: 0
    ics: 2
    nolibf: 2
    nolifa: 2


- When printing the List of all variable attributes in the file or the Attributes of a specific variable, the attribute section also contains the information of the dimension of the particular variable.




In [15]:
for attrs in mynetcdf.variables.values():
   print(attrs)

<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: model time
    standard_name: time
    units: seconds since 2015-12-14 00:00:00 UTC
    base_date: 2015-12-14 00:00:00 UTC
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 x(node)
    long_name: longitude
    standard_name: longitude
    units: degrees_east
    positive: east
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 y(node)
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    positive: north
unlimited dimensions: 
current shape = (3070,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
int32 element(nele, nvertex)
    long_name: element
    cf_role: face_node_connectivity
    start_index: 1
    units: nondimensional
unlimited dimen

In [16]:
print(mynetcdf.variables['time'])

<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: model time
    standard_name: time
    units: seconds since 2015-12-14 00:00:00 UTC
    base_date: 2015-12-14 00:00:00 UTC
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used


- Another option to get the dimension of a variable is the NumPy-style `shape` attribute.

### Getting a variable dimension with the NumPy functionality ‘shape’


#### Getting the dimension size

In [17]:
# getting the variable dimension size with the NumPy functionality ’shape’
print("Array Dimension x = ", (mynetcdf.variables['x'].shape) )
print("Array Dimension element = ", (mynetcdf.variables['element'].shape))

Array Dimension x =  (3070,)
Array Dimension element =  (5780, 3)


This code is used to retrieve and print the shape (dimension sizes) of two variables, 'x' and 'element', from a netCDF file. The variable `mynetcdf` is assumed to be a netCDF Dataset object.

* In the line `print("Array Dimension x = ", (mynetcdf.variables['x'].shape))`, `mynetcdf.variables['x']` accesses the 'x' variable from the `variables` dictionary of the `mynetcdf` Dataset object. The `.shape` attribute is a tuple that gives the size of each dimension of the variable. It works like the `shape` attribute of a NumPy array and is available on the `Variable` object without reading any data values. This line prints the string "Array Dimension x = " followed by the shape of the 'x' variable.
* Similarly, the line `print("Array Dimension element = ", (mynetcdf.variables['element'].shape))` does the same for the 'element' variable.


In [18]:
# Or, even better, print the shape of every variable
for dims in mynetcdf.variables:
   #print (dims)
   print("Array Dimension", dims , " = ", (mynetcdf.variables[dims].shape) )

Array Dimension time  =  (1,)
Array Dimension x  =  (3070,)
Array Dimension y  =  (3070,)
Array Dimension element  =  (5780, 3)
Array Dimension adcirc_mesh  =  (1,)
Array Dimension neta  =  ()
Array Dimension nvdll  =  (1,)
Array Dimension max_nvdll  =  ()
Array Dimension ibtypee  =  (1,)
Array Dimension nbdv  =  (75, 1)
Array Dimension nvel  =  ()
Array Dimension nvell  =  (1,)
Array Dimension max_nvell  =  ()
Array Dimension ibtype  =  (1,)
Array Dimension nbvv  =  (570, 1)
Array Dimension depth  =  (3070,)
Array Dimension zeta_max  =  (3070,)
Array Dimension time_of_zeta_max  =  (3070,)


This code is used to retrieve and print the shape (dimension sizes) of all variables from a netCDF file. The variable `mynetcdf` is assumed to be a netCDF Dataset object.
* In the line `for dims in mynetcdf.variables:`, `mynetcdf.variables` is a dictionary where each key-value pair corresponds to a variable name and its associated Variable object, respectively. The `for` loop iterates over all the variable names (each one assigned to the variable `dims` in each iteration).
* Inside the loop, the line `print("Array Dimension", dims , " = ", (mynetcdf.variables[dims].shape))` prints a string "Array Dimension", followed by the variable name, an equals sign, and then the shape of the variable. Here, `mynetcdf.variables[dims]` accesses the Variable object for the current variable, and the `.shape` attribute gives the shape of the underlying data array. This attribute is a tuple that provides the size of each dimension of the variable's data.

# Part 4: Accessing the NetCDF Data Values

## 4.1 - Parsing the data values from the NetCDF variables

When reading a NetCDF file with the Python netCDF4 library, the data values of a variable are returned as a NumPy array once the variable is indexed or sliced. netCDF4 may return a NumPy masked array, which behaves like a regular array but can mark missing values. 

To access the actual data values of a particular NetCDF variable, we need two pieces of information:
- The **name** of the variable that holds the data
- The **array index or slice** that selects a specific value or a range of values




### What is array indexing?


Array indexing means selecting one or more elements of an array by position. 
- `[:]` returns all values of an array.
- A specific value can be accessed by referring to its index number. NumPy arrays are 0-indexed, meaning that the first element has the index 0.
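
The two rules can be tried out on a small NumPy array first. The values below are the first three longitudes of the tutorial file, rounded to four decimals; the variable names are made up for this sketch.

```python
import numpy as np

# First three longitude values of the tutorial file, rounded.
values = np.array([-72.0577, -72.0522, -72.0470])

everything = values[:]   # all values
first = values[0]        # first element (index 0)
last = values[-1]        # negative indices count from the end

print(everything)
print(first, last)
```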



#### All data values of a variable


For a one-dimensional data array


In [19]:
# getting all data values from a one-dimensional array
x = mynetcdf.variables['x'][:]
print(x)
print("number of values: %s " % len(x)) 

[-72.05767827 -72.05219374 -72.04696872 ... -72.585996   -72.586352
 -72.589697  ]
number of values: 3070 


This cell retrieves all data values of the one-dimensional variable 'x' from the netCDF file. The variable `mynetcdf` is the netCDF Dataset object opened earlier.

* In the line `x = mynetcdf.variables['x'][:]`, `mynetcdf.variables['x']` accesses the 'x' variable from the `variables` dictionary of the `mynetcdf` Dataset object. The `[:]` slice notation is used to retrieve all data values from the 'x' variable. The result is a one-dimensional NumPy array, which is assigned to the variable `x`.

* The line `print(x)` then prints this array to the console. Depending on the size of the array and your Python environment's settings, this may print all values in the array, or it may print only the first few and last few values, with an ellipsis (...) indicating omitted values.

* The line `print("number of values: %s " % len(x))` prints the number of values in the 'x' array. The `len(x)` function returns the size of the array (i.e., the number of elements in it), and the `%s` placeholder in the string is replaced with this size.

For a two-dimensional data array


In [20]:
# getting all data values from a two-dimensional array
elems = mynetcdf.variables['element'][:,:]
print(elems)

[[  77   76    1]
 [  76    2    1]
 [  78    2   76]
 ...
 [3069 3065 3066]
 [3065 3069 3067]
 [3068 3067 3070]]


#### How do we know what syntax to use when accessing the data values?


To parse out the data values correctly, we need to know the dimension of the variable we are interested in. We can get the dimension of a specific variable by looking at either the **Attributes of a specific variable** or **Getting a variable dimension with the NumPy functionality ‘shape’**.


### Specific value of a variable


Getting a specific value from a one-dimensional data array

In [21]:
# getting the first value from the ‘x’ data array
x = mynetcdf.variables['x'][0] # NumPy arrays are 0-indexed
print(x)

-72.0576782709


* In the first line, the variable `x` is assigned a value from the netCDF file. `mynetcdf.variables['x']` accesses the 'x' variable of the dataset stored in `mynetcdf`; in this file, 'x' holds the longitudes of the mesh nodes. The `[0]` indexes into the 'x' data array to get the first value, because NumPy arrays, like Python sequences, use 0-based indexing: the index of the first element is 0, not 1.

* The `print(x)` line then outputs the value of `x` to the console, here the longitude of the first mesh node in degrees east.

### Getting a specific value from a two-dimensional data array

Example: getting the first entry of the second row (an array of 3 numbers)


In [22]:
# getting the first entry of the second value from the ‘element’ data array
# zero-indexed (second row, first column)
elem = mynetcdf.variables['element'][1,0]
print(elem) 

76



* The line `elem = mynetcdf.variables['element'][1,0]` accesses the 'element' variable from the netCDF file stored in `mynetcdf`. Judging from the output above, each row of 'element' holds three integers; in a mesh-based dataset such rows usually list the three node numbers that make up one triangular element, but check the variable's attributes to confirm what it represents in your file.

* The `[1,0]` part of the code indexes into the 'element' data array to get a specific value. NumPy arrays are 0-indexed, so `[1,0]` refers to the second row (index 1) and the first column (index 0). In other words, this line gets the first entry of the second value from the 'element' data array.

* The `print(elem)` line then outputs the value of `elem` to the console. Here it prints 76, which matches the first number of the second row in the full 'element' output shown earlier.
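If 'element' is a mesh connectivity table (an assumption here, based on the integer triples in the output), a row can be used to look up the coordinates of one triangle's three nodes. The sketch below uses made-up stand-in arrays (`x_demo`, `y_demo`, `element_demo` are hypothetical names) and further assumes that node numbers start at 1, so 1 is subtracted before indexing. Check the variable attributes of your own file before relying on either assumption.

```python
import numpy as np

# Hypothetical node coordinates (made-up values), one entry per node.
x_demo = np.array([10.0, 11.0, 12.0, 13.0])
y_demo = np.array([50.0, 50.5, 51.0, 51.5])

# Hypothetical connectivity table: each row lists the three node numbers
# of one triangle. Node numbers are ASSUMED to be 1-based.
element_demo = np.array([[1, 2, 3],
                         [2, 4, 3]])

nodes = element_demo[1, :]   # second row: node numbers of the second triangle
idx = nodes - 1              # convert 1-based node numbers to 0-based indices
tri_x = x_demo[idx]          # x coordinates of the three nodes
tri_y = y_demo[idx]          # y coordinates of the three nodes

print("node numbers:", nodes)
print("x:", tri_x)
print("y:", tri_y)
```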

## 4.2 - Further work with NumPy


When working with NetCDF variables, the data values are returned as NumPy arrays (depending on the library version and the variable's fill value, these may be NumPy masked arrays). NumPy, Python's numerical computing library, is a very powerful tool that opens up many possibilities for further data analysis. We will not go into depth here, but the following two short examples should help you get started.

### Getting a data range from a two-dimensional data array


In [23]:
# from the second row in the ‘element’ array, getting the first two entries 
elems = mynetcdf.variables['element'][1,0:2] #zero-indexed 
print(elems)

[76  2]


* The second line is where the action happens. `mynetcdf.variables['element'][1,0:2]` accesses the 'element' variable from the netCDF file stored in the `mynetcdf` object. The 'element' variable is a two-dimensional array, and the indices `[1,0:2]` select data from it. NumPy arrays are zero-indexed, so the `1` selects the second row. The slice `0:2` then selects the first two entries of that row (indices 0 and 1); the end index 2 is not included.

* The result of this operation is stored in the variable `elems`.

* The third line uses the `print` function to output the value of `elems` to the console. This is useful for debugging or for understanding the data you're working with, as it allows you to see the actual values that were retrieved from the 'element' array.
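A few more slicing patterns follow the same rules. The sketch below uses a stand-in array, `element_demo` (a hypothetical name), built from the first three rows of the 'element' output shown earlier in this part.

```python
import numpy as np

# First three rows of the 'element' output, used as a stand-in array.
element_demo = np.array([[77, 76, 1],
                         [76, 2, 1],
                         [78, 2, 76]])

second_row_first_two = element_demo[1, 0:2]   # same selection as in the cell above
first_column = element_demo[:, 0]             # first entry of every row
first_two_rows = element_demo[0:2, :]         # rows 0 and 1, all columns
last_row = element_demo[-1, :]                # negative index counts from the end

print(second_row_first_two)
print(first_column)
print(first_two_rows.shape)
print(last_row)
```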

### Advanced filtering of data values


In [24]:
z = mynetcdf.variables['zeta_max'][:]
z_slice = np.intersect1d(np.where(np.array(z)<0.6),np.where(np.array(z)>0.5))                              
print(z_slice)
print("number of returned values: %s " % len(z_slice))   

[  20   21   22 ... 2885 2886 2887]
number of returned values: 1355 



* The command `z = mynetcdf.variables['zeta_max'][:]` accesses a variable named 'zeta_max' from the netCDF file and assigns all its values to the variable `z`. The `[:]` is a slicing operation that gets all elements from the 'zeta_max' variable.

* The next line, `z_slice = np.intersect1d(np.where(np.array(z)<0.6),np.where(np.array(z)>0.5))`, uses NumPy functions to create a new array `z_slice`. Called with only a condition, `np.where` returns the indices of the elements for which that condition is true. Here it is used twice: once to find the indices of elements in `z` that are less than 0.6, and once for those greater than 0.5. The `np.intersect1d` function then finds the intersection of these two sets of indices, effectively returning the indices of elements in `z` that lie strictly between 0.5 and 0.6. Note that `z_slice` contains index positions, not the data values themselves.

* The third and fourth lines are simply printing the `z_slice` array and the number of elements in it. The `%s` in the print statement is a placeholder that gets replaced with the value of `len(z_slice)`, which is the number of elements in `z_slice`.
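The same selection can also be written with a single boolean mask, which some readers find easier to follow. This is an alternative sketch, not the method used in the cell above. It runs on a small made-up array, `z_demo` (a hypothetical stand-in for the 'zeta_max' values), and compares both approaches.

```python
import numpy as np

# Made-up values standing in for the 'zeta_max' data.
z_demo = np.array([0.42, 0.55, 0.61, 0.58, 0.50, 0.51])

# Approach from the tutorial cell: intersect the two index sets.
idx_intersect = np.intersect1d(np.where(z_demo < 0.6), np.where(z_demo > 0.5))

# Alternative: combine both conditions element-wise into one boolean mask.
mask = (z_demo > 0.5) & (z_demo < 0.6)
idx_mask = np.flatnonzero(mask)   # indices where the mask is True

print(idx_intersect)   # indices found by the intersection approach
print(idx_mask)        # indices found by the mask approach
print(z_demo[mask])    # the matching values themselves
```

Indexing with the mask (`z_demo[mask]`) returns the values directly, while `np.flatnonzero(mask)` returns their positions, which is what `z_slice` holds in the tutorial cell.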