# Working with Scientific Datasets

<img src="images/000_000_epom_logo.png" alt="ePOM" title="" align="center" width="12%">

Based on one of the example topics listed below (and discussions in GGE5011/GGE6302), you will design and implement a practical exercise as a Python Jupyter Notebook. The exercise should be modelled to fit into the ePOM Ocean Data Science Notebook.

**Requirements**
- Follow the format described in Python Basics / 000_Welcome_on_Board
- Include an introduction to the problem or topic with relevant links to other websites, references or resources.
- Ensure that the notebook is structured with a natural progression through the topic
	- Introduction, basic information, advanced information, conclusion
- The material should go above and beyond the material covered in the Ocean Data Science notebooks
- Include sample code and solutions separated by cells.
	- All code should run without errors or warnings.
- Text should be formatted using Markdown with headings, images, and links
- Post a message in Teams within the “GGE6302 Practical Exercise” channel with the topic you will be developing.

**Note:**
If you are building the notebook on Jupyter.omg.unb.ca, please inform ian.church@unb.ca if you require any additional Python libraries.

**Deliverables**
- The Jupyter notebook file (`*.ipynb`)
- The completed notebook exercise with solutions in PDF format

**Example Topics**

Feel free to select a topic not included on this list or modify one of the topics below.
- Read and display oceanographic data from NetCDF files
	- Access datasets via OPeNDAP
	- Use xarray to interface with NetCDF files

---

The purpose of this notebook is to demonstrate a few related standards and technologies for working with oceanographic datasets. This involves:
- Using OPeNDAP to access online datasets directly through code
- The NetCDF file format
- Using xarray: what distinguishes it from NumPy, and how to apply it to our chosen dataset

First, import some modules we will need later.

In [6]:
from pydap.client import open_url
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
%matplotlib inline

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# NetCDF

NetCDF is a file format...

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Key insight

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Supplemental information
`code snippet` **Bold Text**

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

A test sheet with a pencil indicates the beginning of an exercise.

In [None]:

# Alternative endpoints, left commented out for reference:
#url = "http://test.opendap.org:8080/opendap/tutorials/20220531090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
#url = "http://www.jason.oceanobs.com/html/presentation/aviso_uk.html"
#url = "https://psl.noaa.gov/thredds/dodsC/Datasets/noaa.oisst.v2/sst.mnmean.nc"

# ERDDAP tabledap endpoints used in the cells below:
url = "https://www.smartatlantic.ca/erddap/tabledap/SMA_saint_john"
url2 = "https://dap.oceannetworks.ca/erddap/tabledap/allDatasets"

# Open the first dataset; the variable values are not downloaded at this point.
pydap_ds = open_url(url)  # alternatively: open_url(url, protocol='dap4')

Consider replacing `http` in your `url` with either `dap2` or `dap4` to specify the DAP protocol (e.g. `dap2://<data_url>` or `dap4://<data_url>`).  For more 
information, go to https://www.opendap.org/faq-page.
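
The warning above is printed by pydap and asks for an explicit protocol in the URL scheme. A minimal sketch of that substitution, using only the standard library, might look like the following. The helper name `with_dap_scheme` is hypothetical (it is not part of pydap), and the default of `dap2` for this ERDDAP server is an assumption that should be checked against the server's documentation.

```python
from urllib.parse import urlsplit, urlunsplit


def with_dap_scheme(url, protocol="dap2"):
    """Return `url` with an http/https scheme replaced by dap2 or dap4.

    Hypothetical helper for illustration; not part of pydap.
    """
    if protocol not in ("dap2", "dap4"):
        raise ValueError("protocol must be 'dap2' or 'dap4'")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        # The scheme is already explicit (or not a web URL): leave it alone.
        return url
    return urlunsplit(parts._replace(scheme=protocol))


url = "https://www.smartatlantic.ca/erddap/tabledap/SMA_saint_john"
dap_url = with_dap_scheme(url)  # assumes this server speaks DAP2
print(dap_url)
```

The resulting string could then be passed to `open_url` in place of `url`. How pydap maps the `dap2`/`dap4` scheme back onto a transport may differ between versions, so it is worth confirming that the dataset still opens.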


In [15]:
pydap_ds.tree()

.s
└──s
   ├──station_name
   ├──time
   ├──longitude
   ├──latitude
   ├──precise_lon
   ├──precise_lat
   ├──wind_spd_avg
   ├──wind_spd_max
   ├──wind_dir_avg
   ├──air_temp_avg
   ├──air_pressure_avg
   ├──surface_temp_avg
   ├──wave_ht_max
   ├──wave_ht_sig
   ├──wave_dir_avg
   ├──wave_spread_avg
   ├──wave_period_max
   ├──curr_spd_avg
   ├──curr_dir_avg
   ├──curr_spd2_avg
   ├──curr_dir2_avg
   ├──curr_spd3_avg
   ├──curr_dir3_avg
   ├──curr_spd4_avg
   ├──curr_dir4_avg
   ├──curr_spd5_avg
   ├──curr_dir5_avg
   ├──curr_spd6_avg
   ├──curr_dir6_avg
   ├──curr_spd7_avg
   ├──curr_dir7_avg
   ├──curr_spd8_avg
   ├──curr_dir8_avg
   ├──curr_spd9_avg
   ├──curr_dir9_avg
   ├──curr_spd10_avg
   ├──curr_dir10_avg
   ├──curr_spd11_avg
   ├──curr_dir11_avg
   ├──curr_spd12_avg
   ├──curr_dir12_avg
   ├──curr_spd13_avg
   ├──curr_dir13_avg
   ├──curr_spd14_avg
   ├──curr_dir14_avg
   ├──curr_spd15_avg
   ├──curr_dir15_avg
   ├──curr_spd16_avg
   ├──curr_dir16_avg
   ├──curr_spd17_avg
 

In [30]:
pydap_ds['s']['curr_spd_avg'].attributes

{'actual_range': [1, 4633],
 'ioos_category': 'Currents',
 'long_name': 'Curr Spd Avg',
 'standard_name': 'sea_water_speed',
 'units': 'mm s-1'}
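
The attributes above report the current speed in `mm s-1`, with an `actual_range` of 1 to 4633. As a small illustration of using this metadata, the sketch below converts that range to metres per second. The dictionary is copied by hand from the output above so that the example runs without a network connection; `range_in_m_per_s` is a hypothetical helper name.

```python
# Attributes copied by hand from the cell output above (no server access needed).
attrs = {
    "actual_range": [1, 4633],
    "ioos_category": "Currents",
    "long_name": "Curr Spd Avg",
    "standard_name": "sea_water_speed",
    "units": "mm s-1",
}

MM_PER_M = 1000


def range_in_m_per_s(attributes):
    """Convert `actual_range` from mm/s to m/s, checking the stated units first."""
    if attributes["units"] != "mm s-1":
        raise ValueError(f"unexpected units: {attributes['units']!r}")
    return [value / MM_PER_M for value in attributes["actual_range"]]


low, high = range_in_m_per_s(attrs)
print(f"{attrs['long_name']}: {low} to {high} m s-1")
```

Checking the `units` attribute before converting guards against silently rescaling a variable that is stored in different units.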

In [37]:
# Select a single variable from the sequence.
a = pydap_ds['s']['curr_spd_avg']
# Other things worth inspecting: a.shape, a.units, a.actual_range

# Slicing returns another lazy proxy object rather than the values themselves.
arr = a[1:10]
print(arr)

# iterdata() returns a generator over the selected values.
print(arr.iterdata())

# The data type reported by the server.
print(a.dtype)

# Iterating over the data is what retrieves the values.
for i in a.data:
    print(i)

<BaseType with data SequenceProxy('https://www.smartatlantic.ca/erddap/tabledap/SMA_saint_john', <BaseType with data BaseProxy('https://www.smartatlantic.ca/erddap/tabledap/SMA_saint_john', 's.curr_spd_avg', dtype('>i2'), (), ())>, [], (slice(1, 10, 1),))>
<generator object BaseType.__iter__ at 0x00000226A2B1E260>
>i2
0
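
The `>i2` printed above is NumPy's dtype notation: `>` means big-endian byte order, `i` a signed integer, and `2` the size in bytes. The sketch below shows what that byte order means, using the standard-library `struct` module, where `>h` describes the same layout. The two bytes are made up for illustration and chosen so that they decode to 4633, the upper bound of `actual_range` for this variable.

```python
import struct

# Two illustrative bytes (made up, not taken from the server).
raw = bytes([0x12, 0x19])

# '>h' is a big-endian signed 16-bit integer, the same layout as dtype '>i2'.
big_endian = struct.unpack(">h", raw)[0]

# Reading the same bytes in little-endian order gives a different number.
little_endian = struct.unpack("<h", raw)[0]

print(big_endian)     # 4633
print(little_endian)  # 6418
```

NumPy and pydap handle the byte order automatically; the point is only that the dtype string records how the server encodes each value.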


In [7]:
ds = open_url(url2)
ds.tree()
ds['s']['minTime']

Consider replacing `http` in your `url` with either `dap2` or `dap4` to specify the DAP protocol (e.g. `dap2://<data_url>` or `dap4://<data_url>`).  For more 
information, go to https://www.opendap.org/faq-page.


.s
└──s
   ├──datasetID
   ├──accessible
   ├──institution
   ├──dataStructure
   ├──cdm_data_type
   ├──class
   ├──title
   ├──minLongitude
   ├──maxLongitude
   ├──longitudeSpacing
   ├──minLatitude
   ├──maxLatitude
   ├──latitudeSpacing
   ├──minAltitude
   ├──maxAltitude
   ├──minTime
   ├──maxTime
   ├──timeSpacing
   ├──griddap
   ├──subset
   ├──tabledap
   ├──MakeAGraph
   ├──sos
   ├──wcs
   ├──wms
   ├──files
   ├──fgdc
   ├──iso19115
   ├──metadata
   ├──sourceUrl
   ├──infoUrl
   ├──rss
   ├──email
   ├──testOutOfDate
   ├──outOfDate
   └──summary


<BaseType with data SequenceProxy('https://dap.oceannetworks.ca/erddap/tabledap/allDatasets', <BaseType with data BaseProxy('https://dap.oceannetworks.ca/erddap/tabledap/allDatasets', 's.minTime', dtype('>f8'), (), ())>, [], (slice(None, None, None),))>
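
`minTime` is reported as a 64-bit float (`>f8`). ERDDAP conventionally exposes time values as seconds since 1970-01-01T00:00:00Z. Assuming that convention holds for this variable (its `units` attribute should confirm it), a value could be converted to a readable UTC timestamp as sketched below. The sample value is made up and `epoch_seconds_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_seconds_to_utc(seconds):
    """Convert seconds since 1970-01-01T00:00:00Z to a timezone-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up sample value for illustration; not read from the server.
sample_min_time = 1.6e9
print(epoch_seconds_to_utc(sample_min_time).isoformat())
```

Using a timezone-aware `datetime` avoids the value being silently interpreted in the local time zone of the machine running the notebook.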

## <img align="left" width="6%" style="padding-right:10px;" src="images/refs.png"> Useful References and Information


- https://pydap.github.io/pydap/en/5_minute_tutorial.html
- https://github.com/hmedrano/erddap-python