Vonng/isd

ISD: Integrated Surface Data

SYNOPSIS

Download, parse, and visualize the Integrated Surface Dataset (ISD).

It covers 30,000 meteorological stations and sub-hourly observation records from 1900 to 2021.

Quick Start

Running make all sets up everything.

Internet access (GitHub and NOAA) is required.

Set Up a Baseline

Running make baseline creates a minimal usable installation via:

make sql        # load isd database schema into postgres (via PGURL env)
make ui         # setup grafana dashboards
make download   # download metadata (dict) & parsers
make load-meta  # load metadata into the database

Get This Year's Daily Summary

Get the latest daily observation summaries (daily, monthly, yearly).

NOTICE: Data is downloaded directly from NOAA (check your proxy if it is too slow). Expect about 60 MB per year, around 3-4 GB of original zipped files in total, and 20 GB in the database.

Running make reload loads a minimal dataset (this year so far) into the database.

make get-daily   # get the latest daily observation summary (latest year, e.g. 2021)
make load-daily  # load the latest daily data into the database (latest year, e.g. 2021)
make refresh     # refresh monthly & yearly data based on daily data

The ISD Daily and ISD Hourly datasets are updated on a rolling basis every day. Rerun these commands to pick up the daily updates.

Get This Year's Hourly Raw Data

Get the latest raw hourly observation data (not recommended).

WARNING: Hourly raw data is a large and noisy dataset: around 5 GB per year, about 100 GB of original zipped files in total, and 1 TB in the database.

Running make reload-hourly loads minimal raw data (this year so far) into the database.

make get-hourly   # get the latest raw hourly observation data (latest year, e.g. 2021)
make load-hourly  # load the latest hourly data into the database (latest year, e.g. 2021)

Load More Historical Data

You can download hourly and daily data for a specific year.

# bin/get-daily.sh <year> will get the specified year's daily observation summary (1929-2021)
bin/get-daily.sh 2020     # get 2020 data

# bin/get-hourly.sh <year> will get the specified year's raw hourly observation data (1900-2021)
bin/get-hourly.sh 2020    # get 2020 data

Then load them into the database with the parsers:

# bin/load-daily.sh <PGURL> <year> will load <year>'s daily summary into the PGURL database
bin/load-daily.sh service=meta 2020     # note: some dirty rows may violate constraints

# bin/load-hourly.sh <PGURL> <year> will load <year>'s raw hourly data into the PGURL database
bin/load-hourly.sh service=meta 2020
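To backfill a range of years rather than a single one, the scripts above can be driven from a loop. The sketch below is written in Python (the repo itself ships shell scripts) and assumes the bin/*.sh scripts are run from the repository root and that service=meta is your connection string. backfill_commands and backfill are hypothetical helpers, and the year bounds are simply the ranges quoted in this README. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Year ranges quoted in this README (adjust if NOAA publishes newer years).
YEAR_RANGES = {"daily": (1929, 2021), "hourly": (1900, 2021)}


def backfill_commands(kind, first, last, pgurl="service=meta"):
    """Build get/load script invocations for years first..last (inclusive)."""
    lo, hi = YEAR_RANGES[kind]
    if not (lo <= first <= last <= hi):
        raise ValueError(f"{kind} data covers {lo}-{hi}, got {first}-{last}")
    commands = []
    for year in range(first, last + 1):
        commands.append([f"bin/get-{kind}.sh", str(year)])            # download the tarball
        commands.append([f"bin/load-{kind}.sh", pgurl, str(year)])   # parse and load it
    return commands


def backfill(kind, first, last, pgurl="service=meta", dry_run=True):
    """Print each command; when dry_run is False, also run it and stop on the first failure."""
    for cmd in backfill_commands(kind, first, last, pgurl):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    backfill("daily", 2018, 2020)  # dry run: only prints the six commands
```

Pass dry_run=False to actually run them. Because check=True raises on the first failing script, a failed download is never followed by an attempt to load missing data.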

Data

Dataset

Dataset       Sample                 Document                 Comments
ISD Hourly    isd-hourly-sample.csv  isd-hourly-document.pdf  (Sub-)hourly observation records
ISD Daily     isd-daily-sample.csv   isd-daily-format.txt     Daily summary
ISD Monthly   N/A                    isd-gsom-document.pdf    Not used; generated from daily data
ISD Yearly    N/A                    isd-gsoy-document.pdf    Not used; generated from monthly data
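Monthly and yearly data are derived rather than downloaded (make refresh rebuilds them from the daily data). As an illustration only, here is one way to roll daily rows up into monthly rows in Python. The aggregation rules (mean of daily means, minimum of minimums, maximum of maximums, sum of daily precipitation, missing values skipped) are assumptions for this sketch rather than the repo's actual SQL, and monthly_from_daily is a hypothetical name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def _present(values):
    """Drop missing (None) measurements."""
    return [v for v in values if v is not None]


def monthly_from_daily(rows):
    """Roll daily rows up to one summary per (station, first day of month).

    Each row is a dict with keys: station, ts (datetime.date), temp_mean,
    temp_min, temp_max, prcp. None marks a missing value.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["station"], row["ts"].replace(day=1))].append(row)

    monthly = {}
    for key, days in groups.items():
        temps = _present(d["temp_mean"] for d in days)
        mins = _present(d["temp_min"] for d in days)
        maxs = _present(d["temp_max"] for d in days)
        prcp = _present(d["prcp"] for d in days)
        monthly[key] = {
            "temp_mean": round(mean(temps), 1) if temps else None,
            "temp_min": min(mins) if mins else None,
            "temp_max": max(maxs) if maxs else None,
            "prcp_sum": round(sum(prcp), 1) if prcp else None,
            "days": len(days),  # days with a row, whether or not every field was present
        }
    return monthly


if __name__ == "__main__":
    sample = [
        {"station": "01001099999", "ts": date(2020, 1, 1), "temp_mean": 1.0,
         "temp_min": -2.0, "temp_max": 3.0, "prcp": 0.5},
        {"station": "01001099999", "ts": date(2020, 1, 2), "temp_mean": 2.0,
         "temp_min": -1.0, "temp_max": 5.0, "prcp": None},
    ]
    print(monthly_from_daily(sample))
```

Yearly rows could be built the same way from monthly rows, but averaging monthly means weights every month equally regardless of how many days were observed; weighting by the days count avoids that.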

Hourly data: original tarball size 105 GB; table size 1 TB (+600 GB of indexes).

Daily data: original tarball size 3.2 GB; table size 24 GB.

We recommend 2 TB of storage for a full installation and at least 40 GB for a daily-data-only installation.
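These recommendations line up with the sizes quoted above. A rough Python check, assuming 1 TB = 1000 GB and that the downloaded tarballs stay on disk after loading (neither is stated in this README); estimate_gb is a hypothetical helper:

```python
# Sizes quoted in this README, in GB. Assumes 1 TB = 1000 GB.
HOURLY_TARBALLS = 105
HOURLY_TABLE = 1000
HOURLY_INDEXES = 600
DAILY_TARBALLS = 3.2
DAILY_TABLE = 24


def estimate_gb(include_hourly=True, keep_tarballs=True):
    """Rough disk usage; keep_tarballs assumes downloads are not deleted after loading."""
    total = DAILY_TABLE + (DAILY_TARBALLS if keep_tarballs else 0)
    if include_hourly:
        total += HOURLY_TABLE + HOURLY_INDEXES + (HOURLY_TARBALLS if keep_tarballs else 0)
    return total


if __name__ == "__main__":
    print(f"full installation: ~{estimate_gb():.0f} GB (recommended: 2 TB)")
    print(f"daily data only:   ~{estimate_gb(include_hourly=False):.0f} GB (recommended: 40 GB)")
```

That comes to about 1,732 GB for everything and about 27 GB for daily data alone, so the 2 TB and 40 GB figures leave some headroom.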

Schema

Data schema definition

Station

CREATE TABLE isd.station
(
    station    VARCHAR(12) PRIMARY KEY,
    usaf       VARCHAR(6) GENERATED ALWAYS AS (substring(station, 1, 6)) STORED,
    wban       VARCHAR(5) GENERATED ALWAYS AS (substring(station, 7, 5)) STORED,
    name       VARCHAR(32),
    country    VARCHAR(2),
    province   VARCHAR(2),
    icao       VARCHAR(4),
    location   GEOMETRY(POINT),
    longitude  NUMERIC GENERATED ALWAYS AS (Round(ST_X(location)::NUMERIC, 6)) STORED,
    latitude   NUMERIC GENERATED ALWAYS AS (Round(ST_Y(location)::NUMERIC, 6)) STORED,
    elevation  NUMERIC,
    period     daterange,
    begin_date DATE GENERATED ALWAYS AS (lower(period)) STORED,
    end_date   DATE GENERATED ALWAYS AS (upper(period)) STORED
);
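The usaf and wban columns are generated by slicing the station id, which presumes ids are the 11-character USAF (6) + WBAN (5) concatenation used in NOAA's file names. The same split in Python, as a minimal sketch (split_station is a hypothetical helper); note that SQL substring() is 1-based while Python slices are 0-based:

```python
def split_station(station):
    """Split an ISD station id into (usaf, wban).

    Equivalent to the generated columns of isd.station:
      usaf = substring(station, 1, 6)  -> Python station[0:6]
      wban = substring(station, 7, 5)  -> Python station[6:11]
    """
    if len(station) != 11:
        raise ValueError(f"expected an 11-character USAF+WBAN id, got {station!r}")
    return station[0:6], station[6:11]


if __name__ == "__main__":
    print(split_station("01001099999"))
```

For example, split_station("01001099999") returns ("010010", "99999").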

Hourly Data

CREATE TABLE isd.hourly
(
    station    VARCHAR(11) NOT NULL,
    ts         TIMESTAMP   NOT NULL,
    temp       NUMERIC(3, 1),
    dewp       NUMERIC(3, 1),
    slp        NUMERIC(5, 1),
    stp        NUMERIC(5, 1),
    vis        NUMERIC(6),
    wd_angle   NUMERIC(3),
    wd_speed   NUMERIC(4, 1),
    wd_gust    NUMERIC(4, 1),
    wd_code    VARCHAR(1),
    cld_height NUMERIC(5),
    cld_code   VARCHAR(2),
    sndp       NUMERIC(5, 1),
    prcp       NUMERIC(5, 1),
    prcp_hour  NUMERIC(2),
    prcp_code  VARCHAR(1),
    mw_code    VARCHAR(2),
    aw_code    VARCHAR(2),
    pw_code    VARCHAR(1),
    pw_hour    NUMERIC(2),
    data       JSONB
) PARTITION BY RANGE (ts);
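Both isd.hourly and isd.daily are declared PARTITION BY RANGE (ts), so rows can only be inserted once a partition covering their timestamp exists. A sketch that generates one partition per year; the <table>_<year> naming and the yearly_partition_ddl helper are assumptions, not necessarily what the repo's sql/ scripts use:

```python
from datetime import date


def yearly_partition_ddl(parent, year):
    """Return DDL for one yearly range partition of a table partitioned BY RANGE (ts)."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)  # range partition upper bounds are exclusive in PostgreSQL
    return (
        f"CREATE TABLE IF NOT EXISTS {parent}_{year} "
        f"PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )


if __name__ == "__main__":
    for year in range(2019, 2022):
        print(yearly_partition_ddl("isd.hourly", year))
```

Because the upper bound is exclusive, FROM ('2020-01-01') TO ('2021-01-01') covers every timestamp in 2020 without overlapping the 2021 partition.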

Daily Data

CREATE TABLE isd.daily
(
   station     VARCHAR(12) NOT NULL,
   ts          DATE        NOT NULL,
   temp_mean   NUMERIC(3, 1),
   temp_min    NUMERIC(3, 1),
   temp_max    NUMERIC(3, 1),
   dewp_mean   NUMERIC(3, 1),
   slp_mean    NUMERIC(5, 1),
   stp_mean    NUMERIC(5, 1),
   vis_mean    NUMERIC(6),
   wdsp_mean   NUMERIC(4, 1),
   wdsp_max    NUMERIC(4, 1),
   gust        NUMERIC(4, 1),
   prcp_mean   NUMERIC(5, 1),
   prcp        NUMERIC(5, 1),
   sndp        NUMERIC(5, 1),
   is_foggy    BOOLEAN,
   is_rainy    BOOLEAN,
   is_snowy    BOOLEAN,
   is_hail     BOOLEAN,
   is_thunder  BOOLEAN,
   is_tornado  BOOLEAN,
   temp_count  SMALLINT,
   dewp_count  SMALLINT,
   slp_count   SMALLINT,
   stp_count   SMALLINT,
   wdsp_count  SMALLINT,
   visib_count SMALLINT,
   temp_min_f  BOOLEAN,
   temp_max_f  BOOLEAN,
   prcp_flag   CHAR,
   PRIMARY KEY (ts, station)
) PARTITION BY RANGE (ts);
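Two common ways a dirty row can violate this schema are a value too large for its NUMERIC(precision, scale) column and a duplicate (ts, station) primary key. PostgreSQL rounds extra fractional digits to the column's scale and then rejects values whose integer part needs more than precision - scale digits, so NUMERIC(3, 1) accepts -99.9 to 99.9. The Python sketch below pre-checks both conditions before loading; fits_numeric and dedupe_on_pk are hypothetical helpers, and keeping the first duplicate is an arbitrary policy chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_numeric(value, precision, scale):
    """Check whether value can be stored in a PostgreSQL NUMERIC(precision, scale) column."""
    if value is None:
        return True  # the measurement columns are nullable
    # Round to `scale` fractional digits, ties away from zero, as PostgreSQL does.
    rounded = Decimal(str(value)).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    return abs(rounded) < Decimal(10) ** (precision - scale)


def dedupe_on_pk(rows):
    """Yield only the first row for each (ts, station) pair, matching PRIMARY KEY (ts, station)."""
    seen = set()
    for row in rows:
        key = (row["ts"], row["station"])
        if key not in seen:
            seen.add(key)
            yield row
```

Note that 99.95 fails NUMERIC(3, 1) even though it looks in range: it rounds to 100.0 first.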

Parser

There are two parsers, isdd and isdh. They take NOAA's original yearly tarballs as input and generate CSV output that can be consumed directly by PostgreSQL's COPY command.

NAME
	isdh -- Integrated Surface Dataset Hourly Parser

SYNOPSIS
	isdh [-i <input|stdin>] [-o <output|stdout>] [-p <profpath>] [-d] [-c] [-v]

DESCRIPTION
	The isdh program takes an ISD hourly yearly tarball as input
	and generates CSV as output.

OPTIONS
	-i	<input>		input file, stdin by default
	-o	<output>	output file, stdout by default
	-p	<profpath>	pprof file path (disabled by default)
	-v			verbose progress report
	-d			de-duplicate rows (raw, ts-first, hour-first)
	-c			add comma-separated extra columns
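Because the output is plain CSV, it can be streamed straight into PostgreSQL without an intermediate file. Below is a Python sketch of isdh -i <tarball> | psql <PGURL> -c "COPY isd.hourly FROM STDIN (FORMAT csv)". It assumes isdh's default columns (without -c) line up with isd.hourly and that psql is on the PATH; isdh_copy_commands and isdh_to_postgres are hypothetical names, and the repo's own bin/load-hourly.sh may do this differently.

```python
import subprocess


def isdh_copy_commands(tarball, pgurl, table="isd.hourly"):
    """argv lists for: isdh -i <tarball> | psql <pgurl> -c 'COPY <table> FROM STDIN (FORMAT csv)'"""
    parse = ["isdh", "-i", tarball]
    load = ["psql", pgurl, "-c", f"COPY {table} FROM STDIN (FORMAT csv)"]
    return parse, load


def isdh_to_postgres(tarball, pgurl, table="isd.hourly"):
    """Stream isdh's CSV output into psql through a pipe, with no temporary file."""
    parse, load = isdh_copy_commands(tarball, pgurl, table)
    producer = subprocess.Popen(parse, stdout=subprocess.PIPE)
    loader = subprocess.run(load, stdin=producer.stdout)
    producer.stdout.close()  # drop our read end so isdh gets SIGPIPE if psql quit early
    parse_status = producer.wait()
    if parse_status != 0 or loader.returncode != 0:
        raise RuntimeError(f"isdh exited with {parse_status}, psql with {loader.returncode}")
```

For example, isdh_to_postgres("2020.tar.gz", "service=meta") (the tarball name is illustrative) raises if either side of the pipe fails, so a parse error is not silently reported as a successful load.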

UI

ISD Overview

Dashboard definition

ISD Station

Dashboard definition

ISD Monthly

Dashboard definition