# Explore NetCDF (*.nc) file

```python
# Import packages
from pathlib import Path

import pandas as pd
import xarray as xr
```

## The `xarray` [Dataset object](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html)


```python
# Set the path to the .nc file and open it as an xarray Dataset
nc_file = Path.cwd().parent / "data" / "raw" / "multi_reanal.partition.aoc_15m.197901.nc"
ds = xr.open_dataset(nc_file)
ds  # in a notebook, the last expression renders the Dataset summary
```

### The components of an `xarray` dataset object:
### ➡️Dimensions
Dimensions are the named axes of the dataset, each with a fixed length. They describe how the data are organized, e.g., along time, space, or ensemble members.

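The summary above shows:
* **745 time steps** (`date: 745`)
* **1694 spatial gridpoints** at each time step (`gridpoint: 1694`)
* **12 wave partitions** at each gridpoint (`partition: 12`). Partitioning splits the wave spectrum at a point into separate wave systems (e.g., wind sea and individual swells), so these are spectral components rather than 12 independent measurements.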

> Dimensions are the *axes of a multidimensional spreadsheet*.
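The dimension names and lengths can also be read programmatically instead of from the printed summary. The sketch below does not need the real file: it builds a small hypothetical dataset with the same three dimensions (3 dates, 4 gridpoints, 2 partitions, random values) and reads them through `Dataset.sizes`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset mimicking the partition file layout:
# 3 dates x 4 gridpoints x 2 partitions (the real file has 745 x 1694 x 12).
dates = pd.date_range("1979-01-01 00:00", "1979-01-01 02:00", periods=3)
ds = xr.Dataset(
    data_vars={
        "significant_wave_height": (
            ("date", "gridpoint", "partition"),
            np.random.default_rng(0).random((3, 4, 2)),
        ),
    },
    coords={
        "date": dates,
        "gridpoint": np.arange(4, dtype="float64"),
        "partition": np.arange(2, dtype="int32"),
    },
)

# Mapping of dimension name -> length
print(dict(ds.sizes))  # {'date': 3, 'gridpoint': 4, 'partition': 2}
# Length of a single dimension
print(ds.sizes["date"])
```

On the real file, `ds.sizes["date"]` would give the 745 time steps listed above.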

### ➡️Data Variables
Data variables are the core measured or modeled quantities — the actual “data fields” stored along the dataset’s dimensions.

Each data variable can depend on one or more dimensions.
For instance:
```python
longitude(date, gridpoint)
significant_wave_height(date, gridpoint, partition)
```
* **longitude** is stored along both `date` and `gridpoint`, so in principle its value can differ between time steps (e.g., if the grid were to move). For a fixed model grid the values are typically the same at every date.
* **significant_wave_height** varies over date, gridpoint, and partition, so it’s a 3D variable: wave height for each partition of the spectrum at each time and location.
> Each variable is like a column in a data table, but in multiple dimensions.
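A quick way to see which dimensions each data variable depends on is to loop over `ds.data_vars`. The sketch below uses a hypothetical toy dataset with made-up values (2 dates, 3 gridpoints, 2 partitions); the variable names follow the text, but nothing else comes from the real file. It also shows `isel` (positional selection) dropping one dimension, and `to_dataframe` flattening a variable into the "column in a data table" view.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: 2 dates x 3 gridpoints x 2 partitions, made-up values
ds = xr.Dataset(
    data_vars={
        "longitude": (("date", "gridpoint"), np.tile([200.0, 200.25, 200.5], (2, 1))),
        "significant_wave_height": (
            ("date", "gridpoint", "partition"),
            np.arange(12, dtype="float64").reshape(2, 3, 2),
        ),
    },
    coords={
        "date": pd.to_datetime(["1979-01-01 00:00", "1979-01-01 01:00"]),
        "gridpoint": [0.0, 1.0, 2.0],
        "partition": np.array([0, 1], dtype="int32"),
    },
)

# Which dimensions does each data variable depend on?
for name, var in ds.data_vars.items():
    print(name, var.dims)

# Positional selection of the first partition leaves a 2D (date, gridpoint) array
hs_p0 = ds["significant_wave_height"].isel(partition=0)
print(hs_p0.dims, hs_p0.shape)

# Flatten to a table: one row per (date, gridpoint, partition) combination
df = ds["significant_wave_height"].to_dataframe()
print(df.head())
```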

### ➡️Indexes
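Indexes are attached to the dimension coordinates, i.e., the coordinate variables that share a name with a dimension (`date`, `gridpoint`, `partition`).
They hold the label for each position along an axis (a time, a location, a category) and are what makes label-based selection possible.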

In the dataset:
```text
date → datetime64[ns]
gridpoint → float64 gridpoint
partition → int32 partition
```
* `date` is indexed by actual datetimes (likely the times of each model output).
* `gridpoint` holds float64 labels; these are probably grid-point IDs rather than physical positions, since latitude and longitude are stored separately.
* `partition` labels the 12 wave partitions.
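Because `date` is backed by an index, you can select by timestamp instead of by position. The sketch below assumes a small hypothetical dataset (6 hourly dates, 2 gridpoints, made-up values) and uses `Dataset.sel` with a single timestamp and with a time slice.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: 6 hourly dates x 2 gridpoints, made-up values
dates = pd.date_range("1979-01-01 00:00", "1979-01-01 05:00", periods=6)
ds = xr.Dataset(
    data_vars={
        "significant_wave_height": (("date", "gridpoint"), np.arange(12.0).reshape(6, 2)),
    },
    coords={"date": dates, "gridpoint": [0.0, 1.0]},
)

# One index per dimension coordinate
print(list(ds.indexes))

# .sel looks up positions through the index labels (here, timestamps)
one_time = ds.sel(date=pd.Timestamp("1979-01-01 03:00"))
# Label slices include both endpoints
window = ds.sel(date=slice(pd.Timestamp("1979-01-01 01:00"), pd.Timestamp("1979-01-01 02:00")))

print(one_time["significant_wave_height"].values)
print(window.sizes["date"])
```

Positional selection with `isel` ignores the index entirely, so `ds.isel(date=3)` picks the same time step here only because the dates are in order.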

In this dataset, latitude and longitude are not indexes; they are stored as data variables that depend on `date` and `gridpoint`:

```python
latitude(date, gridpoint)
longitude(date, gridpoint)
```

These serve as geolocation coordinates, helping you map or spatially analyze the data.
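One option, sketched below under the assumption that latitude and longitude arrive as data variables as described, is to promote them to (non-index) coordinates with `Dataset.set_coords` so they travel with every variable you extract. The toy dataset and the 62°N threshold are hypothetical; the example then masks by latitude with `where(..., drop=True)`.

```python
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: lat/lon stored as data variables, made-up values
ds = xr.Dataset(
    data_vars={
        "latitude": (("date", "gridpoint"), [[60.0, 65.0, 70.0]] * 2),
        "longitude": (("date", "gridpoint"), [[200.0, 200.0, 200.0]] * 2),
        "significant_wave_height": (("date", "gridpoint"), [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]),
    },
    coords={
        "date": pd.to_datetime(["1979-01-01 00:00", "1979-01-01 01:00"]),
        "gridpoint": [0.0, 1.0, 2.0],
    },
)

# Promote lat/lon to coordinates so they stay attached to each extracted variable
ds = ds.set_coords(["latitude", "longitude"])
print(list(ds.data_vars))  # only significant_wave_height remains a data variable

# Keep gridpoints north of 62 degrees N; drop=True removes gridpoints that never match
hs_north = ds["significant_wave_height"].where(ds["latitude"] > 62.0, drop=True)
print(hs_north.sizes["gridpoint"], hs_north["latitude"].values)
```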

### ➡️Attributes (Metadata)
Attributes are descriptive metadata — they provide context about the dataset or variables but don’t affect the data structure.
```text
title:          WAVEWATCH III version 5.08
institution:    National Centers for Environmental Prediction
source:         WAVEWATCH III partition file
experiment:     CFSRR Phase 2
history:        part2nc
field_type:     instantaneous
forecast_type:  hindcast
```
They tell you:
* What the data represents (a hindcast from WAVEWATCH III).
* Where it came from (NCEP, CFSRR Phase 2 experiment).
* How it was produced (`part2nc`, presumably a partition-to-NetCDF conversion tool).

Each variable may also have its own attributes (e.g., units, standard names).

> Attributes are like the notes on a spreadsheet explaining what the numbers mean.
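Global attributes live in `ds.attrs` and variable attributes in `ds[name].attrs`; both are plain Python dicts. The sketch below rebuilds a hypothetical toy dataset carrying a few of the global attributes listed above. The `units` value on the variable is an assumed example, not read from the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset; global attrs copied from the listing above,
# the variable-level "units" attribute is an assumption for illustration.
ds = xr.Dataset(
    data_vars={
        "significant_wave_height": (
            ("gridpoint",),
            np.array([1.2, 2.4]),
            {"units": "m"},
        ),
    },
    attrs={
        "title": "WAVEWATCH III version 5.08",
        "institution": "National Centers for Environmental Prediction",
        "forecast_type": "hindcast",
    },
)

# Global (dataset-level) attributes
for key, value in ds.attrs.items():
    print(f"{key}: {value}")

# Variable-level attributes; .get avoids a KeyError if the attribute is missing
print(ds["significant_wave_height"].attrs.get("units", "no units attribute"))
```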