# NetCDFs

Although text files are useful, sometimes we have <ins>**_multiple datasets_**</ins> for various points in either time or space. When this occurs, we can store these multiple items within <ins>**netCDFs**</ins>.

<img src ="https://2.bp.blogspot.com/-8H2FCuCGhWY/VoPgf3RpoNI/AAAAAAAAH44/EPHtpGb_UBc/s1600/metadata.jpg" width = '400'>

By the end of this section, you will learn:

* [**what a netCDF is and why it is useful**](#What-is-netCDF?)
* [**how to install ```netCDF4```**](#Installing-netCDF4)
* [**how to open a netCDF file**](#Opening-a-netCDF-file)

## What is netCDF?

**NetCDF (network Common Data Form)** stores <ins>**multidimensional data**</ins>, such as <ins>*spatial*</ins>, <ins>*temporal*</ins>, and <ins>*scientific*</ins> information. It is *not* only a file format but also a set of software libraries for reading and writing that format. [**Unidata**](https://www.unidata.ucar.edu/software/netcdf/) has more information.

A netCDF file has **three parts**:


<ins>**```metadata```**</ins> = data that describes the data contained within the file


<ins>**```dimensions```**</ins> = contains a name and size; can be used to represent physical dimensions


<ins>**```variables```**</ins> = each contains a name, data type, and shape; represents an array of values of the same type


## Why do we use netCDFs?

## Installing ```netCDF4```

In your terminal, type the following command:

**```conda install netCDF4```**

or

**```pip install netCDF4```**



*Warning: it may take a while to install.*

After installing **```netCDF4```**, you can import it into Python by typing:

```python
import netCDF4 as nc
```

## Opening a netCDF file

To <ins>**open a netCDF file**</ins>, we can use the **```Dataset()```** function:

```python
import netCDF4 as nc
dsst = nc.Dataset('sst_monthly.nc')
```

We can use the **```print()```** function to <ins>**display a summary of the metadata**</ins> of the file:

```python
print(dsst)
```

```
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): day(50), lat(361), lon(720)
    variables(dimensions): float32 day(day), <class 'str'> day_str(day), float32 lat(lat), float32 lon(lon), float32 sst(day, lat, lon)
    groups: 
```


To break this down:

* **```dimensions(sizes)```**:
    - <ins>**```day```**</ins> with a size of 50
    - <ins>**```lat```**</ins> with a size of 361
    - <ins>**```lon```**</ins> with a size of 720
* **```variables(dimensions)```**:
    - <ins>**```float32 day(day)```**</ins> which has type ```float32``` with the name ```day``` and a dimension of ```day```
    - <ins>**```<class 'str'> day_str(day)```**</ins> which has type ```str``` (a string) with the name ```day_str``` and a dimension of ```day```
    - <ins>**```float32 lat(lat)```**</ins> which has type ```float32``` with the name ```lat``` and a dimension of ```lat```
    - <ins>**```float32 lon(lon)```**</ins> which has type ```float32``` with the name ```lon``` and a dimension of ```lon```
    - <ins>**```float32 sst(day, lat, lon)```**</ins> which has type ```float32``` with the name ```sst``` and dimensions of ```day```, ```lat```, ```lon```

## Viewing metadata of variables

We can use a **```for```** loop to <ins>**print out each variable's metadata**</ins>:

```python
for variable in dsst.variables.values():
    print("\n", variable)
```

```

 <class 'netCDF4._netCDF4.Variable'>
float32 day(day)
unlimited dimensions: 
current shape = (50,)
filling on, default _FillValue of 9.969209968386869e+36 used

 <class 'netCDF4._netCDF4.Variable'>
vlen day_str(day)
vlen data type: <class 'str'>
unlimited dimensions: 
current shape = (50,)

 <class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
unlimited dimensions: 
current shape = (361,)
filling on, default _FillValue of 9.969209968386869e+36 used

 <class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
unlimited dimensions: 
current shape = (720,)
filling on, default _FillValue of 9.969209968386869e+36 used

 <class 'netCDF4._netCDF4.Variable'>
float32 sst(day, lat, lon)
unlimited dimensions: 
current shape = (50, 361, 720)
filling on, default _FillValue of 9.969209968386869e+36 used
```


We can also <ins>**view the data values of a specific variable**</ins> by putting it into an **array**:

```python
import numpy as np

# index the specific variable we want to look at
lat_var = dsst['lat']

# copy the variable's values into a usable NumPy array
sstlat = np.array(lat_var)
print(sstlat)
```

```
[ 90.   89.5  89.   88.5  88.   87.5  87.   86.5  86.   85.5  85.   84.5
  84.   83.5  83.   82.5  82.   81.5  81.   80.5  80.   79.5  79.   78.5
  78.   77.5  77.   76.5  76.   75.5  75.   74.5  74.   73.5  73.   72.5
  72.   71.5  71.   70.5  70.   69.5  69.   68.5  68.   67.5  67.   66.5
  66.   65.5  65.   64.5  64.   63.5  63.   62.5  62.   61.5  61.   60.5
  60.   59.5  59.   58.5  58.   57.5  57.   56.5  56.   55.5  55.   54.5
  54.   53.5  53.   52.5  52.   51.5  51.   50.5  50.   49.5  49.   48.5
  48.   47.5  47.   46.5  46.   45.5  45.   44.5  44.   43.5  43.   42.5
  42.   41.5  41.   40.5  40.   39.5  39.   38.5  38.   37.5  37.   36.5
  36.   35.5  35.   34.5  34.   33.5  33.   32.5  32.   31.5  31.   30.5
  30.   29.5  29.   28.5  28.   27.5  27.   26.5  26.   25.5  25.   24.5
  24.   23.5  23.   22.5  22.   21.5  21.   20.5  20.   19.5  19.   18.5
  18.   17.5  17.   16.5  16.   15.5  15.   14.5  14.   13.5  13.   12.5
  12.   11.5  11.   10.5  10.    9.5   9.    8.5
```

## Summary

* **[netCDFs](#What-is-netCDF?)** are files that hold **_multidimensional data_** and have three parts: <ins>**metadata**</ins>, <ins>**dimensions**</ins>, and <ins>**variables**</ins>


* To **[open a netCDF file](#Opening-a-netCDF-file)**, we can use **```Dataset()```** from the **```netCDF4```** library


* To **[view the metadata of a variable](#Viewing-metadata-of-variables)** in a netCDF, we can use a **```for``` loop**. To <ins>**view the specific data within a variable**</ins>, we can **index the variable and store it in an array**.

## Exercises

1. **Make an array** of the variable **```lon```** from **```sst_monthly.nc```** and **find its shape**.

```python
import netCDF4 as nc
import numpy as np

# Open the netCDF file
dsst = nc.Dataset('sst_monthly.nc')

# Extract the longitude data
lon_array = np.array(dsst['lon'])
print("Longitude Array:", lon_array)
print("Shape of Longitude Array:", lon_array.shape)
```


2. **Make an array** of the variable **```sst```** from **```sst_monthly.nc```** and **find its shape**.

```python
# Extract the sea surface temperature data
sst_array = np.array(dsst['sst'])
print("SST Array:", sst_array)
print("Shape of SST Array:", sst_array.shape)
```

3. **Make an array** of the variable **```day```** from **```sst_monthly.nc```** and **find its shape**.

```python
# Extract the day data
day_array = np.array(dsst['day'])
print("Day Array:", day_array)
print("Shape of Day Array:", day_array.shape)
```