<div align="center">
    <h3>A Python API for the National Data Buoy Center</h3>
</div>


The National Oceanic and Atmospheric Administration's National Data Buoy Center (NDBC) maintains marine monitoring and observation stations around the world[^1]. These stations report atmospheric, oceanographic, and other meteorological data to the NDBC at regular intervals. Measurements are made available over HTTP through the NDBC's data service.

The ndbc-api is a Python library that makes this data more widely accessible.

The ndbc-api is primarily built to parse whitespace-delimited oceanographic and atmospheric data, which the NDBC typically distributes as `utf-8` encoded text files on a station-by-station basis, each covering a fixed time period[^2]. More information on the measurements and methodology is available [on the NDBC website](https://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf)[^3].
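To make the file layout concrete, here is a minimal sketch of parsing whitespace-delimited text using only the standard library. It is not how the ndbc-api parses files internally. The `SAMPLE` text, the abbreviated column set, the helper name `parse_whitespace_delimited`, and the use of `MM` as a missing-value marker are all assumptions made for illustration.

```python
# Illustrative sample imitating the layout of an NDBC standard
# meteorological file. The column subset is abbreviated, and treating
# "MM" as the missing-value marker is an assumption of this sketch.
SAMPLE = """\
#YY  MM DD hh mm WDIR WSPD GST  PRES  ATMP
#yr  mo dy hr mn degT m/s  m/s   hPa  degC
2020 01 01 00 00 188  6.1  6.2 1006.1  9.5
2020 01 01 01 00 273  6.0  6.9 1006.8   MM
"""

TIME_FIELDS = {"YY", "MM", "DD", "hh", "mm"}


def parse_whitespace_delimited(text):
    """Parse '#'-headed, whitespace-delimited text into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The first '#' line names the columns; the second gives their units.
    header = lines[0].lstrip("#").split()
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip the column-name and unit lines
        row = {}
        for name, raw in zip(header, line.split()):
            if name in TIME_FIELDS:
                row[name] = int(raw)
            elif raw == "MM":
                row[name] = None  # keep missing values explicit
            else:
                row[name] = float(raw)
        rows.append(row)
    return rows


records = parse_whitespace_delimited(SAMPLE)
print(records[0]["WSPD"])  # 6.1
```

Splitting on arbitrary whitespace, rather than on fixed column offsets, keeps the sketch tolerant of the uneven padding these files use.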

[^1]: https://www.ndbc.noaa.gov/
[^2]: https://www.ndbc.noaa.gov/obs.shtml
[^3]: https://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf

This sample notebook covers some of the core functionality of the `NdbcApi`, including usage examples for retrieving the list of stations, station metadata, station measurements, and finding the nearest station to a given location.

We assume some familiarity with NDBC data buoys, their purpose, and their data formats, but the examples provide extra context for users new to the NDBC data service.  One important feature to consider is that not all data buoys are alike.  Some buoys provide a full suite of measurements, while others provide only a subset.  The `NdbcApi` provides methods to help users determine which measurements are available for a given station, and to retrieve those measurements.

### Setup

These setup steps are useful when cloning the project from source and running the notebooks locally.  If you are running the notebooks in a cloud environment, you can skip these steps and just run the cells under the "API Overview" heading.

In [1]:
import os
import sys

In [2]:
os.chdir("..")
sys.path.append(os.getcwd())

### API Overview

The API surface is exposed through the `NdbcApi` class.  The `NdbcApi` is a singleton, so the underlying `RequestHandler` and the NDBC station-level `RequestCache`s are shared between instances. Both the singleton metaclass and the `RequestHandler` are implemented to reduce the likelihood of repeat requests to the NDBC's data service, and to conserve NDBC resources. Each station-level cache is bounded by a `cache_limit` and implemented as an LRU cache, which balances respect for user resources against respect for NDBC resources.
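The two ideas in this paragraph, a singleton metaclass and a size-limited LRU cache, can be sketched in a few lines. This is a generic illustration of the pattern, not the ndbc-api source. The names `Singleton`, `LruCache`, and `Handler` are hypothetical stand-ins.

```python
from collections import OrderedDict


class Singleton(type):
    """Metaclass that hands back one shared instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class LruCache:
    """Tiny least-recently-used cache with a fixed entry limit."""

    def __init__(self, limit):
        self.limit = limit
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.limit:
            self._entries.popitem(last=False)  # evict least recently used


class Handler(metaclass=Singleton):
    """Hypothetical stand-in for a shared request handler."""

    def __init__(self, cache_limit=2):
        self.cache = LruCache(cache_limit)


first, second = Handler(), Handler()
first.cache.put("request-a", "response-a")
print(first is second)                # True: both names share one instance
print(second.cache.get("request-a"))  # response-a
```

Because every `Handler()` call returns the same object, a response cached through one reference is visible through any other, which is what lets repeated requests be served locally.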

In [3]:
from ndbc_api import NdbcApi

In [4]:
api = NdbcApi()

#### Obtain a list of NDBC-maintained stations (data buoys)

Now that we have our `api` instance, we can begin to obtain information from the NDBC data service.

We begin by retrieving a list of all data buoys, and some of their high-level metadata.

In [5]:
api.stations()

| | Station | Hull No./Config and Location | Location Lat/Long | Wind Speed | Wind Direction | Sea Level Pressure | Wave Height | Dominant Period | Air Temp | Water Temp | Dew Point | Remark |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41001 | 3D90 (SC) East Hatteras | 34.70N 72.23W | 99 | 99 | 99 | 99 | 99 | 99 | 96 | 99 | |
| 1 | 41002 | 3DV33 (SC) South Hatteras | 31.75N 74.93W | 99 | 99 | 99 | 99 | 99 | 99 | 91 | 99 | |
| 2 | 41004 | 3DV02 (SC) Edisto | 32.50N 79.08W | 99 | 99 | 99 | 99 | 99 | 99 | 96 | 99 | |
| 3 | 41008 | 3D36 (SC) Grays Reef | 31.40N 80.85W | 99 | 99 | 99 | 99 | 99 | 99 | 92 | 99 | Dewpoint data failed 8/02/21. Air temp failed ... |
| 4 | 41009 | 3D65 (SC) Canaveral | 28.50N 80.18W | 99 | 99 | 99 | 99 | 99 | 99 | 91 | 99 | Air temperature and dewpoint data failed 11/10... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35 | SRST2 | (MA) Sabine Pass, Tx | 29.68N 94.03W | 72 | 72 | 72 | No sensor installed. | No sensor installed. | 72 | No sensor installed. | Sensor/system failure. | System transmissions are intermittent 4/9/19. ... |
| 36 | STDM4 | (SU) Stannard Rock, Mi | 47.18N 87.22W | 100 | 100 | Sensor/system failure. | No sensor installed. | No sensor installed. | Sensor/system failure. | No sensor installed. | Sensor/system failure. | Air temperature and dewpoint data failed 05/27... |
| 37 | TPLM2 | (MA) Thomas Point, Md | 38.88N 76.43W | 99 | 99 | 99 | No sensor installed. | No sensor installed. | 99 | Sensor/system failure. | Sensor/system failure. | Water temperature sensor removed 07/14/22. Dew... |
| 38 | VENF1 | (AR) Venice, Fl | 27.07N 82.45W | 100 | 100 | 100 | No sensor installed. | No sensor installed. | 100 | 100 | 100 | |


It is important to note that the list of data buoys above is not exhaustive. The NDBC maintains a list of all active buoys [here](https://www.ndbc.noaa.gov/activestations.xml). The `stations` method returns the active buoys maintained by the NDBC itself.

There are additional buoys which are supported by other programs and feed into the NDBC data service. As such, they should be supported. For example, consider station id `APAM2`, which is maintained by the National Ocean Service, but is available under the `NdbcApi`'s `get_data` method. If you would like to see a buoy or buoy family added to the API, please open an issue on the [GitHub repository](https://www.github.com/cdjellen/ndbc-api).

#### Check the API's supported data formats

The `NdbcApi` seeks to enable easy access to data from most common models of data buoy and oceanographic station.

Data are assumed to follow common patterns based on the measurement type and the station type.  For example, a buoy that measures wind speed and direction will have a `cwind` mode, and a coastal station that measures wind speed and direction will have a `stdmet` mode.  The `NdbcApi` provides a `get_modes` method to list the supported modes, and a `get_data` method to retrieve data for a given station and mode.

Not all stations have support for all modes.

In [6]:
api.get_modes()

['adcp',
 'cwind',
 'ocean',
 'spec',
 'stdmet',
 'supl',
 'swden',
 'swdir',
 'swdir2',
 'swr1',
 'swr2']

#### Find the nearest station to a location

In some cases, we might be interested in obtaining data from a location without a priori knowledge of which data buoys or oceanographic stations are nearby.  We can use the `nearest_station` method to find the station closest to a given location.  This method returns the `station_id` of the nearest station to any given lat-lon location.

It is important to check the metadata for that station to ensure it is close enough to meet your needs.

In [7]:
api.nearest_station(lat="38.88N", lon="76.43W")

'TPLM2'

We can also search for a station from floating-point longitude and latitude values.

In [8]:
api.nearest_station(lat=38.88, lon=76.43)

'TPLM2'
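To check how close a candidate station really is, a great-circle distance is a reasonable first estimate. The sketch below uses the haversine formula on a spherical Earth with three coordinates copied from the station table above. It is not the `nearest_station` implementation. The helper names and the sign convention (south and west negative) are assumptions of this sketch and may differ from how the API interprets bare floats.

```python
import math

# Coordinates copied from the station table shown earlier in this notebook.
STATIONS = {
    "41001": "34.70N 72.23W",
    "TPLM2": "38.88N 76.43W",
    "VENF1": "27.07N 82.45W",
}


def parse_coordinate(value):
    """Convert '38.88N' or '76.43W' to signed decimal degrees."""
    magnitude, hemisphere = float(value[:-1]), value[-1].upper()
    return -magnitude if hemisphere in ("S", "W") else magnitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest(lat, lon, stations=STATIONS):
    """Return (station_id, distance_km) for the closest listed station."""
    def distance(item):
        s_lat, s_lon = (parse_coordinate(part) for part in item[1].split())
        return haversine_km(lat, lon, s_lat, s_lon)

    station_id, location = min(stations.items(), key=distance)
    return station_id, distance((station_id, location))


station_id, km = nearest(parse_coordinate("38.88N"), parse_coordinate("76.43W"))
print(station_id, round(km, 1))  # TPLM2 0.0
```

Returning the distance alongside the identifier makes it easy to reject a "nearest" station that is still too far away to be useful.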

#### Obtain that station's metadata

The NDBC records some features of data buoys and oceanographic stations such as the location, station type, and elevations of various instruments.

Using the `station_id` obtained above, we can query the NDBC data service for some additional details about the nearest station to `lat='38.88N', lon='76.43W'`.  This is both to verify that the nearest station is indeed close to our desired location, and to learn more about how measurements of interest are collected.

In [9]:
api.station(station_id="tplm2")

{'Sea temp depth': '1 m below MLLW',
 'Barometer elevation': '12.2 m above mean sea level',
 'Anemometer height': '18 m above site elevation',
 'Air temp height': '17.4 m above site elevation',
 'Site elevation': '0 m above mean sea level',
 'Location': '38.899 N 76.436 W (38°53\'56" N 76°26\'9" W)',
 'Statation Type': 'Owned and maintained by National Data Buoy Center, C-MAN Station, MARS payload',
 'Name': 'Station TPLM2  - Thomas Point, MD'}

We can also obtain this metadata as a pandas DataFrame.

In [10]:
api.station(station_id="tplm2", as_df=True)

| | 0 |
|---|---|
| Sea temp depth | 1 m below MLLW |
| Barometer elevation | 12.2 m above mean sea level |
| Anemometer height | 18 m above site elevation |
| Air temp height | 17.4 m above site elevation |
| Site elevation | 0 m above mean sea level |
| Location | 38.899 N 76.436 W (38°53'56" N 76°26'9" W) |
| Statation Type | Owned and maintained by National Data Buoy Cen... |
| Name | Station TPLM2 - Thomas Point, MD |


While this information provides some helpful context about the station, it does not tell us what measurements are actually collected at the station.  We learned that it is a C-MAN station, and that it is maintained by the NDBC.  In order to determine what realtime and historical measurements are available for query, we can make two additional API calls.

#### Obtain the realtime measurements available at that station

Each station, due to the variety of buoy designs and environmental factors, offers a potentially different set of available measurements.  In order to determine what realtime measurements are available for a specific station, we can use the `available_realtime` API method.

In [11]:
api.available_realtime(station_id="tplm2")

{'Real time hourly standard meteorological': {'data directory': 'https://www.ndbc.noaa.gov/data/hourly2/',
  'description': 'https://www.ndbc.noaa.gov/faq/measdes.shtml#cwind'},
 'Real time standard meteorological data': {'Real time standard meteorological data': 'https://www.ndbc.noaa.gov/data/realtime2/TPLM2.txt',
  'description': 'https://www.ndbc.noaa.gov/faq/measdes.shtml#stdmet'},
 'Real time continuous winds data': {'Real time continuous winds data': 'https://www.ndbc.noaa.gov/data/realtime2/TPLM2.cwind',
  'description': 'https://www.ndbc.noaa.gov/faq/measdes.shtml#cwind'},
 'Real time derived measurements data': {'Real time derived measurements data': 'https://www.ndbc.noaa.gov/data/derived2/TPLM2.dmv',
  'description': 'https://www.ndbc.noaa.gov/faq/measdes.shtml#deriv'}}

We can also return this data as a pandas DataFrame.

In [12]:
api.available_realtime(station_id="tplm2", as_df=True)

| | data directory | description | Real time standard meteorological data | Real time continuous winds data | Real time derived measurements data |
|---|---|---|---|---|---|
| Real time hourly standard meteorological | https://www.ndbc.noaa.gov/data/hourly2/ | https://www.ndbc.noaa.gov/faq/measdes.shtml#cwind | | | |
| Real time standard meteorological data | | https://www.ndbc.noaa.gov/faq/measdes.shtml#st... | https://www.ndbc.noaa.gov/data/realtime2/TPLM2... | | |
| Real time continuous winds data | | https://www.ndbc.noaa.gov/faq/measdes.shtml#cwind | | https://www.ndbc.noaa.gov/data/realtime2/TPLM2... | |
| Real time derived measurements data | | https://www.ndbc.noaa.gov/faq/measdes.shtml#deriv | | | https://www.ndbc.noaa.gov/data/derived2/TPLM2.dmv |


#### Determine what historical measurements are available at that station

We follow a similar process to determine what historical data is available at our station of interest, this time calling the `available_historical` method.

In [13]:
api.available_historical(station_id="tplm2")

{'Standard meteorological data': {'Jan 2023': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm212023.txt.gz&dir=data/stdmet/Jan/',
  'Feb 2023': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm222023.txt.gz&dir=data/stdmet/Feb/',
  'Mar 2023': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm232023.txt.gz&dir=data/stdmet/Mar/',
  'Apr 2023': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm242023.txt.gz&dir=data/stdmet/Apr/',
  'May 2023': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm252023.txt.gz&dir=data/stdmet/May/',
  'Jun 2023': 'https://www.ndbc.noaa.gov/data/stdmet/Jun/tplm2.txt',
  '1985': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm2h1985.txt.gz&dir=data/historical/stdmet/',
  '1986': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm2h1986.txt.gz&dir=data/historical/stdmet/',
  '1987': 'https://www.ndbc.noaa.gov/download_data.php?filename=tplm2h1987.txt.gz&dir=data/historical/stdmet/',
  '1988': 'https:/

Again, we could also capture this data as a pandas DataFrame.

In [14]:
api.available_historical(station_id="tplm2", as_df=True)

| | Jan 2023 | Feb 2023 | Mar 2023 | Apr 2023 | May 2023 | Jun 2023 | 1985 | 1986 | 1987 | 1988 | ... | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard meteorological data | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/data/stdmet/Jun/tplm... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | ... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... |
| Continuous winds data | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/data/cwind/Jun/tplm2... | | | | | ... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... | https://www.ndbc.noaa.gov/download_data.php?fi... |


#### Obtain measurements over a given time range

Now that we know both `stdmet` (Standard Meteorological) and `cwind` (Continuous Winds) data are available for our station, we can begin to obtain these measurements using our API.

We can select any time range of interest, regardless of the 45-day limit for "realtime" data before it becomes "historical" data.  The API abstracts these concerns away, and will automatically retrieve the data from the appropriate source, unifying it into a single `pd.DataFrame` or `dict` object.
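One way to picture this abstraction is to split a requested range at the 45-day boundary, so each portion can be fetched from its own source. The sketch below is an assumption about how such a split could work, not the API's actual logic. The names `split_range` and `REALTIME_WINDOW` are hypothetical, and `now` is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timedelta

REALTIME_WINDOW = timedelta(days=45)  # the 45-day limit mentioned above


def split_range(start, end, now):
    """Split [start, end] into historical and realtime portions.

    Returns a dict with optional 'historical' and 'realtime' entries,
    each a (start, end) tuple. A portion is omitted when the requested
    range does not overlap it.
    """
    cutoff = now - REALTIME_WINDOW
    parts = {}
    if start < cutoff:
        parts["historical"] = (start, min(end, cutoff))
    if end > cutoff:
        parts["realtime"] = (max(start, cutoff), end)
    return parts


now = datetime(2023, 6, 30)
parts = split_range(datetime(2023, 1, 1), datetime(2023, 6, 15), now)
for source, (lo, hi) in parts.items():
    print(source, lo.date(), hi.date())
```

A range that ends before the cutoff yields only a `historical` entry, so a caller would never need to touch the realtime files for older queries.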

We can begin by querying all `stdmet` data for our station `'tplm2'` from the start of 2020 through the start of 2022.  These dates follow the time conventions of the NDBC data service.

In [15]:
df_stdmet_tplm2 = api.get_data(
    'tplm2',
    'stdmet',
    '2020-01-01',
    '2022-01-01',
    as_df=True
)

By inspecting the returned `pd.DataFrame`, we can see that some of the typical `stdmet` features are unavailable for this station.  If we were to pull this data directly from the NDBC data service, these missing measurements would be marked with `99.0`, `999`, or `999.0` values (depending on the measurement).  However, the `NdbcApi` replaces these values with `NaN` values when parsing `stdmet` data.  This is done to make it easier to work with the data in downstream applications and analyses.

In [16]:
df_stdmet_tplm2.head(3).T

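| timestamp | 2020-01-01 00:00:00 | 2020-01-01 01:00:00 | 2020-01-01 02:00:00 |
|---|---|---|---|
| WDIR | 188.0 | 273.0 | 286.0 |
| WSPD | 6.1 | 6.0 | 4.7 |
| GST | 6.2 | 6.9 | 5.6 |
| WVHT | NaN | NaN | NaN |
| DPD | NaN | NaN | NaN |
| APD | NaN | NaN | NaN |
| MWD | NaN | NaN | NaN |
| PRES | 1006.1 | 1006.8 | 1007.3 |
| ATMP | 9.5 | 9.3 | 8.7 |
| WTMP | 6.6 | 6.6 | 6.5 |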
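The missing-value replacement described above can be sketched with plain dictionaries. This is an illustration of the idea rather than the library's code. The `MISSING_SENTINELS` mapping is an assumed, partial example of per-column sentinel values, and `mask_missing` is a hypothetical helper.

```python
import math

# Assumed sentinel values per column, for illustration only.
MISSING_SENTINELS = {"WDIR": 999, "WSPD": 99.0, "ATMP": 999.0}


def mask_missing(row, sentinels=MISSING_SENTINELS):
    """Return a copy of `row` with sentinel values replaced by NaN."""
    cleaned = {}
    for column, value in row.items():
        sentinel = sentinels.get(column)
        is_missing = sentinel is not None and value == sentinel
        cleaned[column] = math.nan if is_missing else value
    return cleaned


raw = {"WDIR": 999, "WSPD": 6.1, "ATMP": 999.0}
print(mask_missing(raw))  # {'WDIR': nan, 'WSPD': 6.1, 'ATMP': nan}
```

Matching sentinels per column matters because a value such as `99.0` can be a legitimate reading for one measurement (a wind direction in degrees, for instance) while marking missing data for another.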


In [17]:
df_stdmet_tplm2.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 17280 entries, 2020-01-01 00:00:00 to 2022-01-01 00:00:00
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   WDIR    17255 non-null  float64
 1   WSPD    17280 non-null  float64
 2   GST     17280 non-null  float64
 3   WVHT    0 non-null      float64
 4   DPD     0 non-null      float64
 5   APD     0 non-null      float64
 6   MWD     0 non-null      float64
 7   PRES    17278 non-null  float64
 8   ATMP    17278 non-null  float64
 9   WTMP    17280 non-null  float64
 10  DEWP    14715 non-null  float64
 11  VIS     0 non-null      float64
 12  TIDE    0 non-null      float64
dtypes: float64(13)
memory usage: 1.8 MB


In the informational view above, note that we have indexed the data by its `timestamp`, which was computed using the `'YY'`, `'MM'`, `'DD'`, `'hh'`, and `'mm'` fields typical of the NDBC data service.
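Combining those fields into a single timestamp might look like the following sketch. The helper `to_timestamp` is hypothetical. Its handling of two-digit years and of a missing minutes field are assumptions made for illustration, not documented behavior of the API.

```python
from datetime import datetime


def to_timestamp(fields):
    """Combine NDBC-style date and time fields into one datetime."""
    year = int(fields["YY"])
    if year < 100:
        # Assumption: two-digit years belong to the 1900s.
        year += 1900
    return datetime(
        year,
        int(fields["MM"]),
        int(fields["DD"]),
        int(fields["hh"]),
        int(fields.get("mm", 0)),  # tolerate rows without a minutes field
    )


row = {"YY": "2020", "MM": "01", "DD": "01", "hh": "01", "mm": "00"}
print(to_timestamp(row))  # 2020-01-01 01:00:00
```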

We can also capture this data as a basic Python `dict` object, which can be used to create other data structures or serialized for outside analysis or tools.

In [18]:
api.get_data(
    'tplm2',
    'stdmet',
    '2020-01-01',
    '2020-01-02',
    as_df=False
)

{'WDIR': {Timestamp('2020-01-01 00:00:00'): 188.0,
  Timestamp('2020-01-01 01:00:00'): 273.0,
  Timestamp('2020-01-01 02:00:00'): 286.0,
  Timestamp('2020-01-01 03:00:00'): 278.0,
  Timestamp('2020-01-01 04:00:00'): 293.0,
  Timestamp('2020-01-01 05:00:00'): 297.0,
  Timestamp('2020-01-01 06:00:00'): 296.0,
  Timestamp('2020-01-01 07:00:00'): 287.0,
  Timestamp('2020-01-01 08:00:00'): 287.0,
  Timestamp('2020-01-01 09:00:00'): 282.0,
  Timestamp('2020-01-01 10:00:00'): 286.0,
  Timestamp('2020-01-01 11:00:00'): 276.0,
  Timestamp('2020-01-01 12:00:00'): 271.0,
  Timestamp('2020-01-01 13:00:00'): 275.0,
  Timestamp('2020-01-01 14:00:00'): 266.0,
  Timestamp('2020-01-01 15:00:00'): 269.0,
  Timestamp('2020-01-01 16:00:00'): 270.0,
  Timestamp('2020-01-01 17:00:00'): 260.0,
  Timestamp('2020-01-01 18:00:00'): 252.0,
  Timestamp('2020-01-01 19:00:00'): 257.0,
  Timestamp('2020-01-01 20:00:00'): 271.0,
  Timestamp('2020-01-01 21:00:00'): 269.0,
  Timestamp('2020-01-01 22:00:00'): 281.0,
  T
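A nested mapping keyed by timestamps, like the one above, cannot be passed to `json.dumps` directly, because JSON object keys must be strings. The sketch below shows one possible conversion. It uses `datetime` as a stand-in for pandas `Timestamp` (which also provides `isoformat`), and the helper name `to_json_ready` is hypothetical.

```python
import json
from datetime import datetime

# Stand-in for the nested mapping above; datetime replaces pandas Timestamp.
data = {
    "WDIR": {datetime(2020, 1, 1, 0): 188.0, datetime(2020, 1, 1, 1): 273.0},
}


def to_json_ready(nested):
    """Convert timestamp keys to ISO 8601 strings so `json` accepts them."""
    return {
        column: {stamp.isoformat(): value for stamp, value in series.items()}
        for column, series in nested.items()
    }


print(json.dumps(to_json_ready(data), indent=2))
```

ISO 8601 strings sort chronologically and parse back with `datetime.fromisoformat`, which makes them a convenient key format for round trips.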

### Concluding remarks

The NDBC API is a living project, and is open both to community requests and code contributions.  Please feel free to submit a pull request or open an issue on the [GitHub repository](https://www.github.com/cdjellen/ndbc-api) if you have any questions or suggestions.

Thank you for your time and have an excellent rest of your day, wherever in the world you are!