# Frequently Asked Questions

## How do I import SynopticPy?
Simply `import synoptic` to get access to all the service classes. The `datetime` library is useful when requesting data, and you'll likely need the `polars` package for DataFrame manipulation.

In [2]:
from datetime import datetime, timedelta

import synoptic
import polars as pl

## What Synoptic weather API services are available?

All the Synoptic Weather API services are available with the exception of `qcsegments`, which is not implemented (I have never needed it). There are essentially two types of API services.

Data Services

- `synoptic.TimeSeries` : Request time series data from a station or stations.
- `synoptic.Latest` :  Request latest data from a station or stations.
- `synoptic.NearestTime` :  Request data from a station or stations nearest a specific time. Very similar to the Latest service.
- `synoptic.Precipitation` : Request precipitation data from a station or stations.
- `synoptic.Latency` : Request latency information for a station or stations.
- `synoptic.Metadata` : Request station metadata, like location, name, etc.

Metadata Services

- `synoptic.QCTypes` : Table of Synoptic's quality control types.
- `synoptic.Variables` : Table of Synoptic's variable definitions.
- `synoptic.Networks` : Table of Synoptic's available networks.
- `synoptic.NetworkTypes` : Table of Synoptic's network type categories.


## What is included in a Synoptic Services class instance?

In general, most instances return the following attributes:

1. All capitalized attributes like `SUMMARY`, `STATION`, `UNITS`, `QC_SUMMARY` are dictionaries copied from the returned JSON and attached to the instance for convenience.
1. `df` is the long-format **Polars DataFrame** of the `STATION` data.
1. `endpoint` is the URL for the requested API service.
1. `help_url` is the website for the documentation for the service.
1. `json` is the returned json from the API request loaded into a Python dictionary.
1. `params` are the user-specified parameters used to make the request.
1. `response` is the object from the requests library, `requests.get(...)`.
1. `service` is the requested Synoptic API service type.
1. `token_source` is where SynopticPy found the token.
1. `url` is the full URL used to make the API request.
1. `verbose` indicates whether details about what SynopticPy is doing are printed to the screen (i.e., poor-man's logging).

Let's take a look at the attributes of a Metadata service instance...

In [3]:
# Get station metadata for a single, specific station
s = synoptic.Metadata(stid="KSLC", verbose=True)

print(f"{s.endpoint=}")
print(f"{s.service=}")
print(f"{s.help_url=}")
print(f"{s.token_source=}")
# print(f"{s.params=}")
# print(f"{s.url=}")
print(f"{s.verbose=}")
s.df

🚚💨 Speedy delivery from Synoptic metadata service.
📦 Received data from 1 stations.
s.endpoint='https://api.synopticdata.com/v2/stations/metadata'
s.service='metadata'
s.help_url='https://docs.synopticdata.com/services/weather-data-api'
s.token_source='Config File: /home/blaylock/.config/SynopticPy'
s.verbose=True


id,stid,name,elevation,latitude,longitude,mnet_id,state,timezone,elev_dem,period_of_record_start,period_of_record_end,is_restricted,is_active
u32,str,str,f64,f64,f64,u32,str,str,f64,"datetime[μs, UTC]","datetime[μs, UTC]",bool,bool
53,"""KSLC""","""Salt Lake City, Salt Lake City…",4226.0,40.77069,-111.96503,1,"""UT""","""America/Denver""",4235.6,1997-01-01 00:00:00 UTC,2024-09-14 05:20:00 UTC,False,True


## What is the DataFrame structure?

SynopticPy returns all data as long-format Polars DataFrames. This means that for data requests, each row in the dataframe is a single unique observation.

Why? This makes it easy to archive the data locally, such as in a Parquet file.

I leave it to the user to reshape the DataFrame however they want using Polars' extensive and efficient processing. For instance, long-format DataFrames can be _pivoted_ to make a DataFrame with each column as a different variable.

In [4]:
df = synoptic.TimeSeries(
    stid="ukbkb",
    recent=timedelta(hours=6),
    vars=["air_temp", "wind_speed", "wind_direction"],
).df

df.head()

🚚💨 Speedy delivery from Synoptic timeseries service.


📦 Received data from 1 stations.


date_time,variable,sensor,derived,value,units,id,stid,name,elevation,latitude,longitude,mnet_id,state,timezone,elev_dem,period_of_record_start,period_of_record_end,is_restricted,is_active
"datetime[μs, UTC]",str,u32,bool,f64,str,u32,str,str,f64,f64,f64,u32,str,str,f64,"datetime[μs, UTC]","datetime[μs, UTC]",bool,bool
2024-09-14 00:30:00 UTC,"""air_temp""",1,False,23.889,"""Celsius""",37032,"""UKBKB""","""EW2355 Spanish Fork""",4734.0,40.09867,-111.62767,65,"""UT""","""America/Denver""",4740.8,2013-03-13 00:00:00 UTC,2024-09-14 05:15:00 UTC,False,True
2024-09-14 00:45:00 UTC,"""air_temp""",1,False,22.778,"""Celsius""",37032,"""UKBKB""","""EW2355 Spanish Fork""",4734.0,40.09867,-111.62767,65,"""UT""","""America/Denver""",4740.8,2013-03-13 00:00:00 UTC,2024-09-14 05:15:00 UTC,False,True
2024-09-14 01:00:00 UTC,"""air_temp""",1,False,22.222,"""Celsius""",37032,"""UKBKB""","""EW2355 Spanish Fork""",4734.0,40.09867,-111.62767,65,"""UT""","""America/Denver""",4740.8,2013-03-13 00:00:00 UTC,2024-09-14 05:15:00 UTC,False,True
2024-09-14 01:16:00 UTC,"""air_temp""",1,False,21.667,"""Celsius""",37032,"""UKBKB""","""EW2355 Spanish Fork""",4734.0,40.09867,-111.62767,65,"""UT""","""America/Denver""",4740.8,2013-03-13 00:00:00 UTC,2024-09-14 05:15:00 UTC,False,True
2024-09-14 01:31:00 UTC,"""air_temp""",1,False,20.556,"""Celsius""",37032,"""UKBKB""","""EW2355 Spanish Fork""",4734.0,40.09867,-111.62767,65,"""UT""","""America/Denver""",4740.8,2013-03-13 00:00:00 UTC,2024-09-14 05:15:00 UTC,False,True


In [5]:
df.pivot(index=["date_time", "stid"], on="variable", values="value")

date_time,stid,air_temp,wind_speed,wind_direction
"datetime[μs, UTC]",str,f64,f64,f64
2024-09-14 00:30:00 UTC,"""UKBKB""",23.889,0.0,
2024-09-14 00:45:00 UTC,"""UKBKB""",22.778,0.0,
2024-09-14 01:00:00 UTC,"""UKBKB""",22.222,0.0,
2024-09-14 01:16:00 UTC,"""UKBKB""",21.667,0.0,
2024-09-14 01:31:00 UTC,"""UKBKB""",20.556,0.0,
…,…,…,…,…
2024-09-14 05:15:00 UTC,"""UKBKB""",15.556,0.0,
2024-09-14 05:30:00 UTC,"""UKBKB""",15.556,0.895,0.0
2024-09-14 05:45:00 UTC,"""UKBKB""",15.556,0.448,91.0
2024-09-14 06:00:00 UTC,"""UKBKB""",15.556,0.0,


## Why does SynopticPy return Polars DataFrames?

Previous versions of SynopticPy used Pandas. Pandas is popular, but is starting to be antiquated. I have been using Polars for over a year and love it! These are the reasons why I re-wrote SynopticPy from the ground up using [Polars](https://docs.pola.rs/user-guide/getting-started/).

1. **_Personal learning exercise:_** I wanted to get better at using Polars, and rewriting SynopticPy was a great chance to do that. I'm also using class inheritance, which is not something I have used before, so I'm experimenting with that too.

1. **_Improve maintainability:_** Older versions of SynopticPy had some quirks I wanted to fix. The best way to fix those quirks was to re-write the package.

1. **_Locally Archiving Synoptic Data:_** Synoptic limits the amount of data you can request in one API request. Also, in a research setting I need to use and re-use data lots of times as I'm experimenting. It doesn't make sense to keep getting data from the API every time I need to use the data. Instead, I should store the data locally after I get it from Synoptic. A long-format Polars DataFrame can be written to Parquet format, which has much smaller file sizes than JSON files.
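
Since a single request is limited in size, one way to archive a long period is to split it into shorter windows and make one request per window. The helper below is a hypothetical sketch, not part of SynopticPy; `time_windows` and the 4-day step are my own choices. The commented-out request assumes `synoptic.TimeSeries` accepts `start` and `end` datetimes.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end


windows = list(
    time_windows(datetime(2024, 1, 1), datetime(2024, 1, 11), timedelta(days=4))
)

for w_start, w_end in windows:
    # Each window would become one request saved to its own Parquet file,
    # for example (assumed call, not executed here):
    #   df = synoptic.TimeSeries(stid="wbb", start=w_start, end=w_end).df
    #   df.write_parquet(f"wbb_{w_start:%Y%m%d}.parquet")
    print(w_start, "->", w_end)
```

Writing one file per window keeps each request small and lets you re-download a single window if something goes wrong.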


## What are the benefits of saving Synoptic data to Parquet instead of raw JSON?

If I need to reuse data many times (e.g., while researching a case study), I don't want to keep asking Synoptic for the data; I want to get the data once and save it to local disk. Also, Synoptic restricts how much data you can retrieve in a single API request. If you need a long time series, you will need to make multiple API calls. Saving the DataFrame to a Parquet file saves disk space and gives faster loading times.

To demonstrate, let's collect a timeseries of 5 days of data for all the stations within 10 miles of WBB.

1. Write the raw JSON to a JSON file.
1. Write the Polars DataFrame to a Parquet file.

- How large is the JSON file versus Parquet file? _In the example below, Parquet is about 19x smaller than JSON, because it is efficiently compressed._
- How long does it take to load a JSON file versus Parquet file? _Parquet is faster to load into memory, it's already organized in a clean table, you can read only select rows if you want, and it's easy to read multiple files at a time._

In [6]:
s = synoptic.TimeSeries(radius="wbb,10", recent=timedelta(days=5))
print(f"Number of rows: {len(s.df):,}")

🚚💨 Speedy delivery from Synoptic timeseries service.


📦 Received data from 89 stations.
Number of rows: 1,447,543


In [7]:
import json
from pathlib import Path

filepath = Path("sample_timeseries.json")
parquet = filepath.with_suffix(".parquet")

# Write raw data to JSON
with open(filepath, "w") as f:
    json.dump(s.json, f, indent=4)


# Write DataFrame to Parquet
s.df.write_parquet(parquet)

print(f"JSON file size: {filepath.stat().st_size / 1000 / 1000:.2f} MB")
print(f"Parquet size: {parquet.stat().st_size / 1000 / 1000:.2f} MB")

JSON file size: 43.66 MB
Parquet size: 2.24 MB


In [13]:
%%time
# Read the JSON file
with open(filepath, "r") as json_file:
    data = json.load(json_file)

CPU times: user 114 ms, sys: 133 ms, total: 247 ms
Wall time: 245 ms


In [9]:
%%time
_ = pl.read_parquet(parquet)

CPU times: user 141 ms, sys: 117 ms, total: 259 ms
Wall time: 109 ms


## How can I split a long-format DataFrame by station?
Polars makes this easy; use `df.partition_by('stid')` to get a list of DataFrames, each DataFrame with its own station.

In [10]:
df = synoptic.TimeSeries(
    stid="ukbkb,kslc",
    recent=timedelta(minutes=30),
    vars="air_temp",
).df

df.partition_by("stid")


🚚💨 Speedy delivery from Synoptic timeseries service.


📦 Received data from 2 stations.


[shape: (6, 20)
 ┌─────────────┬──────────┬────────┬─────────┬───┬────────────┬────────────┬────────────┬───────────┐
 │ date_time   ┆ variable ┆ sensor ┆ derived ┆ … ┆ period_of_ ┆ period_of_ ┆ is_restric ┆ is_active │
 │ ---         ┆ ---      ┆ ---    ┆ ---     ┆   ┆ record_sta ┆ record_end ┆ ted        ┆ ---       │
 │ datetime[μs ┆ str      ┆ u32    ┆ bool    ┆   ┆ rt         ┆ ---        ┆ ---        ┆ bool      │
 │ , UTC]      ┆          ┆        ┆         ┆   ┆ ---        ┆ datetime[μ ┆ bool       ┆           │
 │             ┆          ┆        ┆         ┆   ┆ datetime[μ ┆ s, UTC]    ┆            ┆           │
 │             ┆          ┆        ┆         ┆   ┆ s, UTC]    ┆            ┆            ┆           │
 ╞═════════════╪══════════╪════════╪═════════╪═══╪════════════╪════════════╪════════════╪═══════════╡
 │ 2024-09-14  ┆ air_temp ┆ 1      ┆ false   ┆ … ┆ 1997-01-01 ┆ 2024-09-14 ┆ false      ┆ true      │
 │ 05:54:00    ┆          ┆        ┆         ┆   ┆ 00:00:00   ┆ 05

## How to get Metadata for stations of interest?

In [11]:
synoptic.Metadata(radius="kmry,5").df

🚚💨 Speedy delivery from Synoptic metadata service.


📦 Received data from 34 stations.


id,stid,name,elevation,latitude,longitude,mnet_id,state,timezone,elev_dem,distance,period_of_record_start,period_of_record_end,is_restricted,is_active
u32,str,str,f64,f64,f64,u32,str,str,f64,f64,"datetime[μs, UTC]","datetime[μs, UTC]",bool,bool
276,"""KMRY""","""Monterey Regional Airport""",167.0,36.59047,-121.84875,1,"""CA""","""America/Los_Angeles""",170.6,0.0,1997-04-12 00:00:00 UTC,2024-09-14 05:35:00 UTC,false,true
2478,"""DMB""","""BAMI1""",26.0,36.61,-121.87,31,"""CA""","""America/Los_Angeles""",0.0,1.79,2000-05-07 00:00:00 UTC,2003-02-06 21:50:00 UTC,false,false
2489,"""MBA""","""BAMI12""",75.0,36.62,-121.9,31,"""CA""","""America/Los_Angeles""",0.0,3.5,2000-05-07 00:00:00 UTC,2003-02-06 20:50:00 UTC,false,false
3619,"""RTGC1""","""FORT ORD #2""",490.0,36.626944,-121.786389,2,"""CA""","""America/Los_Angeles""",469.2,4.28,2001-10-11 00:00:00 UTC,2012-10-31 03:34:00 UTC,false,false
18515,"""CMEC1""","""CARMEL RIVER NEAR CARMEL 3E""",45.0,36.53917,-121.87944,106,"""CA""","""America/Los_Angeles""",52.5,3.93,2006-12-16 00:00:00 UTC,2007-12-28 18:15:00 UTC,false,false
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
155497,"""026PG""","""Parker Flats Cutoff""",436.0,36.62987,-121.79182,229,"""CA""","""America/Los_Angeles""",419.9,4.17,2020-08-25 03:14:00 UTC,2024-09-14 05:40:00 UTC,false,true
166603,"""G0246""","""GW0246 Monterey""",314.0,36.57217,-121.7975,65,"""CA""","""America/Los_Angeles""",301.8,3.11,2021-07-26 22:09:00 UTC,2024-09-14 05:36:00 UTC,false,true
236415,"""NDBC46240""","""CABRILLO POINT, MONTEREY BAY, …",0.0,36.626,-121.907,286,"""CA""","""America/Los_Angeles""",,4.06,2024-04-25 11:26:00 UTC,2024-09-14 04:56:00 UTC,false,true
236973,"""NDBCMEYC1""","""9413450 - MONTEREY, CA""",7.87,36.605,-121.889,286,"""CA""","""America/Los_Angeles""",,2.45,2024-04-27 21:36:00 UTC,2024-09-14 04:36:00 UTC,false,true


## What if I don't know Polars? I'm sticking with Pandas.
Converting a Polars DataFrame to a Pandas DataFrame is simple. Just use `df.to_pandas()` and you've got it! (This requires both `pandas` and `pyarrow` to be installed.)

In [12]:
df = synoptic.Metadata(radius="kmry,5").df
print("Right now I'm a", type(df))

df = df.to_pandas()
print("Now I'm a", type(df))


🚚💨 Speedy delivery from Synoptic metadata service.
📦 Received data from 34 stations.
Right now I'm a <class 'polars.dataframe.frame.DataFrame'>
Now I'm a <class 'pandas.core.frame.DataFrame'>
