In [None]:
import logging

import hvplot.pandas
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import shapely
import xarray as xr

import searvey

## Retrieve Station Metadata

To retrieve station metadata, use the `get_ioc_stations()` function:

In [None]:
ioc_stations = searvey.get_ioc_stations()
len(ioc_stations)

In [None]:
ioc_stations.sample(3).sort_index()

In [None]:
ioc_stations.columns

In [None]:
world_plot = ioc_stations.hvplot(geo=True, tiles=True, hover_cols=["ioc_code", "location"])
world_plot.opts(width=800, height=500)

## Retrieve station metadata from an arbitrary polygon

We can filter the IOC stations using any shapely object. E.g. to select only the stations on the East Coast of the US:

In [None]:
east_coast = shapely.geometry.box(-85, 25, -65, 45)
east_stations = searvey.get_ioc_stations(region=east_coast)
len(east_stations)

In [None]:
east_stations.hvplot.points(geo=True, tiles=True)
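
To make the idea of a region filter concrete, here is a minimal sketch of a bounding-box test in plain Python. It is not how `searvey` implements the `region` argument; the station rows are made up for illustration and the `in_box()` helper is hypothetical. Only the bounds `(-85, 25, -65, 45)` come from the cell above.

```python
# Hypothetical station records: (ioc_code, longitude, latitude).
# These rows are invented for illustration; they are not real IOC metadata.
stations = [
    ("aaa", -74.0, 40.5),   # inside the box
    ("bbb", -80.1, 25.8),   # inside the box
    ("ccc", 2.3, 48.9),     # outside (Europe)
    ("ddd", -122.4, 37.8),  # outside (US West Coast)
]


def in_box(lon, lat, min_lon, min_lat, max_lon, max_lat):
    """Return True if the point lies inside or on the edge of the box."""
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Same argument order as shapely.geometry.box(minx, miny, maxx, maxy).
east_coast_bounds = (-85, 25, -65, 45)

selected = [
    code for code, lon, lat in stations
    if in_box(lon, lat, *east_coast_bounds)
]
print(selected)
```

A rectangle is the simplest case; passing a shapely geometry to `region` also allows non-rectangular polygons, which this sketch does not cover.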

## Retrieve IOC station data

The function for retrieving data is called `fetch_ioc_station()`. 

In its simplest form it requires only the `station_id` (i.e. the station's `ioc_code`) and retrieves the last week of data:

In [None]:
df = searvey.fetch_ioc_station("acap2")
df

We can also explicitly specify the start and the end date. E.g. to retrieve the first 10 days of May 2024:

In [None]:
df = searvey.fetch_ioc_station(
    station_id="alva",
    start_date=pd.Timestamp("2024-05-01"),
    end_date=pd.Timestamp("2024-05-10"),
    progress_bar=False,
)
df

If we request more than 30 days, multiple HTTP requests are sent to the IOC servers via multithreading and the responses are merged into a single dataframe.

In this case, setting `progress_bar=True` can be useful for monitoring the progress of the HTTP requests.
For example, to retrieve data for the first 5 months of 2020:

In [None]:
df = searvey.fetch_ioc_station(
    station_id="alva",
    start_date=pd.Timestamp("2020-01-01"),
    end_date=pd.Timestamp("2020-06-01"),
    progress_bar=True,
)
df
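
The following sketch illustrates how a long date range could be split into windows of at most 30 days, one per HTTP request. It is only an illustration of the idea described above: the `split_date_range()` helper and the `max_days` parameter are hypothetical, and the way `searvey` actually chunks requests may differ.

```python
from datetime import datetime, timedelta


def split_date_range(start, end, max_days=30):
    """Split [start, end) into consecutive chunks of at most max_days days."""
    step = timedelta(days=max_days)
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # The last chunk is clipped so that it never extends past `end`.
        chunk_end = min(chunk_start + step, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


# Same date range as the request above.
chunks = split_date_range(datetime(2020, 1, 1), datetime(2020, 6, 1))
for chunk_start, chunk_end in chunks:
    print(chunk_start.date(), "->", chunk_end.date())
```

Each window could then be fetched independently (e.g. in a thread pool) and the partial results concatenated in chronological order.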

Keep in mind that each IOC station may return dataframes with different sensors/columns. For example, the station in the Bahamas returns several of them:

In [None]:
bahamas = searvey.fetch_ioc_station(
    station_id="setp1",
    start_date=pd.Timestamp("2020-05-25"),
    end_date=pd.Timestamp("2020-05-30"),
    progress_bar=False,
)
bahamas
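
Because the available columns vary between stations, it can help to check which sensors are present before using them. Below is a minimal sketch in plain Python; the `available_sensors()` helper is hypothetical and the column lists are made up (only `rad` is a sensor name that appears in this notebook).

```python
def available_sensors(columns, wanted):
    """Return the wanted sensor names that are actually present, in order."""
    present = set(columns)
    return [name for name in wanted if name in present]


# Hypothetical column sets for two stations, invented for illustration.
station_a_columns = ["rad"]
station_b_columns = ["bat", "prs", "rad"]

wanted = ["rad", "prs"]
print(available_sensors(station_a_columns, wanted))
print(available_sensors(station_b_columns, wanted))
```

The same helper should work with a dataframe's columns, e.g. `available_sensors(bahamas.columns, wanted)`, since it only iterates over the column names.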

Furthermore, not all of these time series are ready to be used as they are.

E.g. we can see that in the last days of May the `rad` sensor was offline for some time:

In [None]:
bahamas.rad.hvplot()
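
Outages like this can also be located programmatically. The sketch below assumes an outage shows up as missing values (`NaN` or `None`) in the series, which may not match how the IOC data actually encodes it; the `missing_runs()` helper and the sample values are hypothetical.

```python
import math
from datetime import datetime, timedelta


def missing_runs(samples):
    """Return (first_missing, last_missing) timestamps for each run of NaN/None values."""
    runs = []
    run_start = None
    previous_time = None
    for time, value in samples:
        is_missing = value is None or math.isnan(value)
        if is_missing and run_start is None:
            # A new run of missing values begins here.
            run_start = time
        elif not is_missing and run_start is not None:
            # The run ended at the previous sample.
            runs.append((run_start, previous_time))
            run_start = None
        previous_time = time
    if run_start is not None:
        # The series ends while values are still missing.
        runs.append((run_start, previous_time))
    return runs


# Made-up hourly samples, invented for illustration.
t0 = datetime(2020, 5, 25)
values = [1.2, 1.3, float("nan"), float("nan"), 1.1, None]
samples = [(t0 + timedelta(hours=i), v) for i, v in enumerate(values)]

runs = missing_runs(samples)
for first, last in runs:
    print("missing from", first, "to", last)
```

If an outage instead appears as absent rows, the same loop could compare consecutive timestamps against the expected sampling interval.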