Add BUFR file creation script and fix unit test bug in `test_get_bufr.py` #263

ladsmund · 2024-06-24T08:26:13Z

PR Description

New Feature: Added script for recreating BUFR files, create_bufr_files, as a CLI script in setup.py.
Bug Fix: Resolved issue in test_get_bufr.py that caused unit tests to fail.

BaptisteVandecrux

Looks good!

I did not have much to say about your create_bufr_files. But going through those files triggered a few questions that I listed below. Sorry if they fall outside the scope of the PR. Feel free to move them to issues if you want to deal with them later.

I'm approving anyway since the code is fully functional.

src/pypromice/postprocess/create_bufr_files.py

BaptisteVandecrux · 2024-06-25T08:36:38Z

src/pypromice/postprocess/station_configurations.toml

I've noticed some missing stations:

THU_U2v3 (replacing THU_U2 that has been decommissioned, confirm with Liam)

FRE

JAR & SWC (but decommissioned)

KAN_Tv3 (newly installed)

QAS_B (unknown type, see with Robert)

Check out these lines:

pypromice/src/pypromice/postprocess/station_configurations.toml

Lines 511 to 514 in 3958fb1

[WEG_B]
stid = "WEG_B"
station_site = "NUK_U"
project = "Wegener"

Related issue: #256

I've also been wondering about the comment = "v3_bad":

pypromice/src/pypromice/postprocess/station_configurations.toml

Lines 567 to 576 in 3958fb1

[KPC_Lv3]
stid = "KPC_Lv3"
station_site = "KPC_L"
project = "Promice"
station_type = "mobile"
wmo_id = "04428"
export_bufr = false
comment = "v3_bad"
skipped_variables = []
positions_update_timestamp_only = false

Does that mean that we shouldn't use them?
KPC_L (the v2 station) has comment = "use_v3"

src/pypromice/test/bufr_export/test_get_bufr_integration.py

BaptisteVandecrux · 2024-06-25T08:55:43Z

src/pypromice/postprocess/real_time_utilities.py

+    if last_valid_index not in df_limited.index:
+        logger.info("No valid data limited period")
+        return None


This should solve #235, which no longer shows up at SDM (because of maintenance) but might still be showing up at one of the LYN stations.

BaptisteVandecrux · 2024-06-25T08:59:08Z

src/pypromice/postprocess/real_time_utilities.py

+    if last_valid_index not in df_limited.index:
+        logger.info("No valid data limited period")
+        return None
+
    # Apply smoothing to z_boom_u
    # require at least 2 hourly obs? Sometimes seeing once/day data for z_boom_u
    df_limited = rolling_window(df_limited, "z_boom_u", "72H", 2, 1)


Rounding to 1 decimal is quite drastic. Don't you think we can provide the height at cm precision?

Later on I also see that the gps_alt is smoothed and only a single digit is kept:

pypromice/src/pypromice/postprocess/real_time_utilities.py

Line 154 in 3958fb1

df_limited, alt_valid = linear_fit(df_limited, "gps_alt", 1)

I can see that there are differences in the accuracy of the height measurements: gps_alt is accurate to about 10 cm (at least judging from the rounding that is done), while z_boom_u is accurate to about 1-5 cm (depending on the surface roughness). It would be great to evaluate our estimated heights in NRT (such as done here) against the precise measurements done by Jakob's GNSS during fieldwork. Maybe the linear regression (or, in the future, the loess smoothing of gps_alt) does a good job of removing the noise and already achieves <5 cm accuracy, making it possible to provide more digits for the height estimates.

Good point about the z_boom_u measurements. It would make sense to use more digits. I'll use 3 digits.

It would definitely make sense to explore better methods for elevation measurements.
Accuracy:
As I understand it, our current accuracy is around ±10 m, far from cm-level. As you suggested, we might be able to improve these measurements by incorporating fieldwork data from Jacob.
Precision:
Smoothing filters can help enhance the precision of the measurements.
I have not investigated the data yet, but if you suggest <5cm precision, I can increase the number of digits to 2.

I am trying my best to steer between the terms accuracy vs precision 🙈😅.

PS: The BUFR schema uses 0.1m resolution for the elevation measurements. We are currently not using the z_boom_u values.

Maybe it would be better to avoid hard assumptions about precision at this point and avoid rounding. The variables are rounded anyway in the BUFRVariables class and the eccodes export.
🤔
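To illustrate why rounding early can matter, here is a minimal sketch using Python's `decimal` module (chosen only to keep the arithmetic exact; the pipeline itself works on floats). The elevation value and the `round_to` helper are made up for illustration. Rounding to 2 decimals first and then to the 0.1 m BUFR resolution can give a different result than rounding once at export:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to(value: Decimal, step: str) -> Decimal:
    """Round `value` to the resolution given by `step`, e.g. "0.1"."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)


elevation = Decimal("1234.449")  # made-up smoothed elevation in metres

# Round once, at export, to the 0.1 m resolution of the BUFR schema.
rounded_once = round_to(elevation, "0.1")

# Round to 2 decimals in the smoothing step, then again at export.
rounded_twice = round_to(round_to(elevation, "0.01"), "0.1")

print(rounded_once, rounded_twice)  # 1234.4 1234.5
```

Leaving the values unrounded until the export step avoids this kind of double-rounding drift.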

BaptisteVandecrux · 2024-06-25T10:08:44Z

src/pypromice/postprocess/get_bufr.py

Some comments on the part of the code that you have not modified (it's not so often that I look at these scripts)

variable naming:

pypromice/src/pypromice/postprocess/get_bufr.py

Line 271 in 3958fb1

sufficient_wx_data, sufficient_position_data = min_data_check(latest_data)

I don't understand what _wx_ stands for.

position seeding:

I was afraid the position seeding was not triggered when I saw:

pypromice/src/pypromice/postprocess/get_bufr.py

Line 338 in 3958fb1

positions_seed_path: Optional[Path] = None,

and that it is not mentioned in the operational scripts:
https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/9cc7b5083514d45f93a7a30863b561a06b28b4fc/bufr_processor.sh#L17-L22

But it turned out that it is set to a non-None default value in the CLI:

pypromice/src/pypromice/postprocess/get_bufr.py

Lines 104 to 110 in 3958fb1

parser.add_argument(
    "--position_seed",
    default=DEFAULT_POSITION_SEED_PATH,
    type=Path,
    required=False,
    help="Path to csv file with seed values for output positions.",
)

So maybe the default value should be used instead of None when defining the get_bufr function:

pypromice/src/pypromice/postprocess/get_bufr.py

Line 338 in 3958fb1

positions_seed_path: Optional[Path] = None,
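A minimal sketch of that suggestion, assuming the constant can be shared between the CLI and the function signature. `get_bufr_sketch` and `parse_arguments` are stand-ins written for this example, and the path assigned to `DEFAULT_POSITION_SEED_PATH` is a placeholder, not the real value from get_bufr.py:

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence

# Placeholder value; the real default path is defined in get_bufr.py.
DEFAULT_POSITION_SEED_PATH = Path("positions_seed.csv")


def get_bufr_sketch(
    positions_seed_path: Optional[Path] = DEFAULT_POSITION_SEED_PATH,
) -> Optional[Path]:
    """Stand-in for get_bufr: only reports which seed file would be used."""
    return positions_seed_path


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Build the CLI parser with the same default as the function above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--position_seed",
        default=DEFAULT_POSITION_SEED_PATH,
        type=Path,
        required=False,
        help="Path to csv file with seed values for output positions.",
    )
    return parser.parse_args(argv)


args = parse_arguments([])
# Calling the function directly and going through the CLI give the same seed file.
print(get_bufr_sketch() == get_bufr_sketch(args.position_seed))  # True
```

With this layout a caller has to pass `None` explicitly to disable seeding, instead of getting that behaviour silently when calling the function from Python.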

Yet I saw the following errors/warning for KAN_B in the latest log on glacio01:

2024-06-25 11:11:51,533; INFO; pypromice.postprocess.get_bufr; ####### Processing KAN_B #######
2024-06-25 11:11:51,533; INFO; pypromice.postprocess.get_bufr; Generating /mnt/data/aws/pypromice_aws/aws-bufr/BUFR_out/KAN_B.bufr from /mnt/data/aws/pypromice_aws/aws-l2/tx/KAN_B/KAN_B_hour.csv
2024-06-25 11:11:51,683; INFO; pypromice.postprocess.get_bufr; No valid instantaneous timestamps!
2024-06-25 11:11:51,686; WARNING; pypromice.postprocess.get_bufr; No position information available for KAN_B

It would be great to have:

- something more informative than No valid instantaneous timestamps!
- another message after No position information available for KAN_B if the position seed is being used
- a clear message if there's no position info in the latest data and the position seed is failing/missing.
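The position-related messages requested above could look roughly like the sketch below. `resolve_position`, the `Position` tuple, and the message wording are all hypothetical; this is not how get_bufr.py is structured, only an illustration of logging which source a position came from:

```python
import logging
from typing import Mapping, Optional, Tuple

logger = logging.getLogger("bufr_sketch")

Position = Tuple[float, float, float]  # (latitude, longitude, altitude)


def resolve_position(
    stid: str,
    latest: Optional[Position],
    seed: Mapping[str, Position],
) -> Optional[Position]:
    """Pick a position for `stid` and log which source was used."""
    if latest is not None:
        return latest
    logger.warning("No position information in the latest data for %s", stid)
    if stid in seed:
        logger.info("Using position seed for %s: %s", stid, seed[stid])
        return seed[stid]
    logger.error("No position seed entry for %s either; skipping export", stid)
    return None
```

Each branch logs exactly one outcome, so the log shows whether the seed was used or the station was skipped.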

height of instruments

Right now, even though z_boom_u is smoothed, I haven't seen it being used to define attributes of the bufr files (did I just miss it?). When seeing lines like:

pypromice/src/pypromice/postprocess/get_bufr.py

Lines 584 to 586 in 3958fb1

heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD = (
    station_configuration.anemometer_from_station_ground
)

I really think that the snow surface is important for the interpretation of the wspd values (but also t and rh). It should be reported either as an additional variable describing the snow thickness above ground (in which case the current setup, where the bare ice surface is taken as ground, can be kept) or by defining the snow surface as the ground itself.

additional filtering when instruments are too close to the ground

I guess we should be more picky with the values sent to WMO. Maybe measurements taken lower than 30-50 cm above the surface should be filtered out because blowing snow might interfere with the instruments.
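A minimal sketch of such a filter, assuming the current sensor height above the surface is available as a number. The function name, the constant, and the 0.5 m threshold are illustrative choices, not existing pypromice code:

```python
from typing import Optional

# Assumed threshold in metres; the comment above suggests 0.3 to 0.5 m.
MIN_SENSOR_HEIGHT_M = 0.5


def filter_low_sensor(
    value: Optional[float],
    sensor_height_m: Optional[float],
    min_height_m: float = MIN_SENSOR_HEIGHT_M,
) -> Optional[float]:
    """Return None when the sensor is known to sit below `min_height_m`."""
    if sensor_height_m is not None and sensor_height_m < min_height_m:
        return None
    # Height unknown or high enough: pass the measurement through unchanged.
    return value


print(filter_low_sensor(4.2, 2.6))  # 4.2
print(filter_low_sensor(4.2, 0.3))  # None
```

Passing values through when the height is unknown is a design choice made for this sketch; a stricter variant could drop them instead.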

Variable Naming

I tried to change as little as possible of the original code and I don't know why it is called "wx".
The flag sufficient_wx_data is true if both air temperature and pressure are present at the latest time. This is required for exporting the BUFR file for a station.
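As a rough sketch of that rule, assuming the latest record is a mapping from column name to value. The column names `t_i` and `p_i` and both function names are assumptions made for this example, not taken from `min_data_check`:

```python
import math
from typing import Mapping, Optional


def is_present(value: Optional[float]) -> bool:
    """True for a real number, False for None or NaN."""
    return value is not None and not math.isnan(value)


def has_sufficient_wx_data(latest: Mapping[str, Optional[float]]) -> bool:
    """Both air temperature and pressure must be present in the latest record."""
    # "t_i" and "p_i" are assumed column names for temperature and pressure.
    return is_present(latest.get("t_i")) and is_present(latest.get("p_i"))


print(has_sufficient_wx_data({"t_i": -12.3, "p_i": 850.1}))  # True
print(has_sufficient_wx_data({"t_i": -12.3, "p_i": float("nan")}))  # False
```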

Position Seeding

I would like to avoid the position seeding input and rely only on the input data and maybe a station config.

The sonic-ranger-based heights are very unstable. DMI are using constant values for their weather stations in Greenland without considering snow cover.

Updated unittests to align with the new output dimensions

Updated test_get_bufr_integration.py

* Added detailed descriptions with references to the attributes in BUFRVariables
* Changed the attribute order to align with the exported schema
* Changed variable roundings to align with the scales defined in the BUFR schemas:
  * Latitude and longitude are set to 5. Was 6
  * heightOfStationGroundAboveMeanSeaLevel is set to 1. Was 2
  * heightOfBarometerAboveMeanSeaLevel is set to 1. Was 2
  * pressure is set to -1. Was 1. Note: The BUFRVariable unit is Pa and not hPa
  * airTemperature is set to 2. Was 1.
  * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformTempRH is set to 2. Was 4
  * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD is set to 2. Was 4
* Added unit tests to test the roundings
* Updated existing unit tests to align with corrected precision
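The rounding values listed above can be read as the `ndigits` argument of Python's built-in `round`. The lookup table below is a sketch of that reading, not the BUFRVariables implementation; `BUFR_ROUNDING`, `round_for_bufr`, and the lower-case `latitude`/`longitude` keys are assumptions made for the example:

```python
from typing import Dict

# ndigits values as listed in the commit message above.
BUFR_ROUNDING: Dict[str, int] = {
    "latitude": 5,
    "longitude": 5,
    "heightOfStationGroundAboveMeanSeaLevel": 1,
    "heightOfBarometerAboveMeanSeaLevel": 1,
    "pressure": -1,  # unit is Pa, so -1 means the nearest 10 Pa
    "airTemperature": 2,
    "heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformTempRH": 2,
    "heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD": 2,
}


def round_for_bufr(name: str, value: float) -> float:
    """Round `value` to the number of digits listed for `name`."""
    return round(value, BUFR_ROUNDING[name])


print(round_for_bufr("pressure", 101325.4))  # 101330.0
print(round_for_bufr("airTemperature", -12.3449))  # -12.34
```

A negative `ndigits` rounds to the left of the decimal point, which is why a pressure in Pa ends up at 10 Pa (0.1 hPa) resolution.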

ladsmund changed the title ~~### PR Title Add BUFR file creation script and fix unit test bug in test_get_bufr.py~~ Add BUFR file creation script and fix unit test bug in test_get_bufr.py Jun 24, 2024

ladsmund requested review from PennyHow and BaptisteVandecrux June 24, 2024 08:26

BaptisteVandecrux previously approved these changes Jun 25, 2024


ladsmund dismissed BaptisteVandecrux’s stale review via b5de50d July 3, 2024 13:27

ladsmund force-pushed the feature/bufr_regenerate branch from 3958fb1 to b5de50d Compare July 3, 2024 13:27

ladsmund changed the base branch from main to develop July 3, 2024 13:29

ladsmund force-pushed the feature/bufr_regenerate branch from b5de50d to a1227a8 Compare July 5, 2024 09:48

ladsmund added 15 commits July 8, 2024 12:54

Fixed bug in get_bufr

6057f0f

Configuration variables were too strictly validated.
* Made bufr_integration_test explicit

Added __all__ to get_bufr.py

240e3f0

Applied black code formatting

ad2cb26

Made bufr_to_csv as cli script in setup.py

8583592

* Updated read_bufr_file to use wmo_id as index

Added script to recreate bufr files

48574ea

* Added corresponding unit tests
* Added flag to raise exceptions on errors
* Added create_bufr_files.py to setup

Added test for missing data in get_bufr

116af04

- Ensure get_bufr_variables raises AttributeError when station dimensions are missing

Updated get_bufr to support static GPS heights.

0f97a28

* Bedrock stations shouldn’t depend on the noisy GPS signal for elevation.
* Added station dimension values for WEG_B
* Added corresponding unittest

Updated github/workflow to run unittests

9596b82

Added eccodes installation

Removed station_configurations.toml from repository

a56d0b1

Extracted StationConfiguration utils from get_bufr

7bb0ef6

* Added support for loading multiple station configuration files

Other
* Made ArgumentParser instantiation inline

Updated get_bufr to support station config files in folder

b31f772

Fixed test utility function get_station_configuration to create valid…

9979847

… wmo_id

Updated bufr_utilities.set_station to validate wmo id

Increased the real_time_utilities rounding precisions

0c3fcd5

ladsmund force-pushed the feature/bufr_regenerate branch from 34afcce to 0c3fcd5 Compare July 9, 2024 09:39
