Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BUFR file creation script and fix unit test bug in test_get_bufr.py #263

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

ladsmund
Copy link
Contributor

PR Description

  • New Feature: Added script for recreating BUFR files, create_bufr_files, as a CLI script in setup.py.
  • Bug Fix: Resolved issue in test_get_bufr.py that caused unit tests to fail.

@ladsmund ladsmund changed the title ### PR Title Add BUFR file creation script and fix unit test bug in test_get_bufr.py Add BUFR file creation script and fix unit test bug in test_get_bufr.py Jun 24, 2024
Copy link
Member

@BaptisteVandecrux BaptisteVandecrux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I did not have much to say about your create_bufr_files. But going through those files triggered a few questions that I listed below. Sorry if they fall outside the scope of the PR. Feel free to move them to issues if you want to deal with them later.

I'm approving anyway since the code is fully functional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed some missing stations:

  • THU_U2v3 (replacing THU_U2 that has been decomissioned, confirm with Liam)
  • FRE
  • JAR & SWC (but decomissioned)
  • KAN_Tv3 (newly installed)
  • QAS_B (unknown type, see with Robert)

Check out these lines:

[WEG_B]
stid = "WEG_B"
station_site = "NUK_U"
project = "Wegener"

Related issue: #256

I've also been wondering about the comment = "v3_bad":

[KPC_Lv3]
stid = "KPC_Lv3"
station_site = "KPC_L"
project = "Promice"
station_type = "mobile"
wmo_id = "04428"
export_bufr = false
comment = "v3_bad"
skipped_variables = []
positions_update_timestamp_only = false

Does that mean that we shouldn't use them?
KPC_L (the v2 station) has comment = "use_v3"

Comment on lines +69 to +71
if last_valid_index not in df_limited.index:
logger.info("No valid data limited period")
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should solve #235 which is not showing up at SDM anymore (because of maintenance) but that might be still showing up at one of the LYN stations

if last_valid_index not in df_limited.index:
logger.info("No valid data limited period")
return None

# Apply smoothing to z_boom_u
# require at least 2 hourly obs? Sometimes seeing once/day data for z_boom_u
df_limited = rolling_window(df_limited, "z_boom_u", "72H", 2, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rounding to 1 decimal is quite drastic. You don't think we can provide the height a cm precision?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later on I also see that the gps_alt is smoothed and only a single digit is kept:

df_limited, alt_valid = linear_fit(df_limited, "gps_alt", 1)

I can see that there are differences in the accuracy for the height measurements: gps_alt is more of the 10 cm (at least judging from the rounding that is done) while z_boom_u is more 1-5 cm (depending on the surface roughness). It would be great to evaluate our estimated heights in NRT (such as done here) with the precise measurements done by Jakob's GNSS during fieldwork. Maybe the linear regression (or in the future the loess smoothing of gps_alt) does a good job removing the noise and already achieves a <5 cm accuracy making it possible to provide more digits for the height estimations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the z_boom_u measurements. It would make sense to use more digits. I'll use 3 digits.

It would definitely make sense to explore better methods for elevation measurements.
Accuracy:
As I understand, our current accuracy is around ±10m, far from cm-level. As you suggested, we might be able to improve these measurements by incorporating fieldwork data from Jacob.
Precision:
Smoothing filters can help enhance the precision of the measurements.
I have not investigated the data yet, but if you suggest <5cm precision, I can increase the number of digits to 2.

I am trying my best to steer between the terms accuracy vs precision 🙈😅.

PS: The BUFR schema uses 0.1m resolution for the elevation measurements. We are currently not using the z_boom_u values.

Copy link
Contributor Author

@ladsmund ladsmund Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to avoid hard assumptions about precision at this and the point and avoid rounding. The variables are anyway rounded in the BUFRVariables class and the eccodes export.
🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the part of the code that you have not modified (it's not so often that I look at these scripts)

variable naming:

sufficient_wx_data, sufficient_position_data = min_data_check(latest_data)

I don't understand what _wx_ stand for.

position seeding:

I got afraid the position seeding was not triggered when seeing:

positions_seed_path: Optional[Path] = None,

and the operational scripts it is not being mentioned:
https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/9cc7b5083514d45f93a7a30863b561a06b28b4fc/bufr_processor.sh#L17-L22

But turned out that it is set as a non-None default value in the CLI:

parser.add_argument(
"--position_seed",
default=DEFAULT_POSITION_SEED_PATH,
type=Path,
required=False,
help="Path to csv file with seed values for output positions.",
)

So maybe the default value should be used instead of None when defining get_bufr function:

positions_seed_path: Optional[Path] = None,

Yet I saw the following errors/warning for KAN_B in the latest log on glacio01:

2024-06-25 11:11:51,533; INFO; pypromice.postprocess.get_bufr; ####### Processing KAN_B #######
2024-06-25 11:11:51,533; INFO; pypromice.postprocess.get_bufr; Generating /mnt/data/aws/pypromice_aws/aws-bufr/BUFR_out/KAN_B.bufr from /mnt/data/aws/pypromice_aws/aws-l2/tx/KAN_B/KAN_B_hour.csv
2024-06-25 11:11:51,683; INFO; pypromice.postprocess.get_bufr; No valid instantaneous timestamps!
2024-06-25 11:11:51,686; WARNING; pypromice.postprocess.get_bufr; No position information available for KAN_B

It would be great to have

  1. something more informative than No valid instantaneous timestamps! and
  2. another message after No position information available for KAN_B if the position seed is being used
  3. a clear message if there's no position info in the latest data and the position seed is failing/missing.

height of instruments

Right now, even though z_boom_u is smoothed, I haven't seen it being used to define attributes of the bufr files (did I just miss it?). When seeing lines like:

heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD = (
station_configuration.anemometer_from_station_ground
)

I really think that the snow surface is important for the interpretation of the wspd values (but also t and rh). It should be informed either as an additional variable describing the snow thickness above ground (then the current setup where the bare ice surface is taken as ground can be kept) or defining the snow surface as the ground itself.

additional filtering when instruments are too close to the ground

I guess we should be more picky with the values sent to WMO. Maybe measurements lower than 30-50 cm should be filtered out because blowing snow might be interfering with the instruments.

Copy link
Contributor Author

@ladsmund ladsmund Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable Naming

I tried to change as little as possible of the original code and I don't now why it is called "wx".
The flag sufficient_wx_data is true if both air temperature and pressure are present at the latest time. This is required for exporting the BUFR file for a station.

Copy link
Contributor Author

@ladsmund ladsmund Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Position Seeding

I would like to avoid the positions seeding input and rely only on the input data and maybe a station config

Configuration variables were to strictly validated.
* Made bufr_integration_test explicit
* Updated read_bufr_file to use wmo_id as index
* Added corresponding unit tests
* Added flag to raise exceptions on errors
* Added create_bufr_files.py to setup
The sonic ranger based heights are very unstable. DMI are using constant values for their weather stations in Greenland without considering snow cover.

Updated unittests to align with the new output dimensions
Updated test_get_bufr_integration.py
- Ensure get_bufr_variables raises AttributeError when station dimensions are missing
* Bedrock stations shouldn’t depend on the noisy GPS signal for elevation.
* Added station dimension values for WEG_B
* Added corresponding unittest
Added eccodes installation
* Added support for loading multiple station configuration files

Other
* Made ArgumentParser instantiation inline
… wmo_id

Updated bufr_utilities.set_station to validate wmo id
* Added detailed descriptions with references to the attributes in BUFRVariables
* Change the attribute order to align with the exported schema
* Changed variable roundings to align with the scales defined in the BUFR schemas:
  * Latitude and longitude is set to 5. Was 6
  * heightOfStationGroundAboveMeanSeaLevel is set to 1. Was 2
  * heightOfBarometerAboveMeanSeaLevel is set to to 1. Was 2
  * pressure is set to -1. Was 1. Note: The BUFRVariable unit is Pa and not hPA
  * airTemperature is set to 2. Was 1.
  * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformTempRH is set to 2. Was 4
  * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD is set to 2. Was 4
 * Added unit tests to test the roundings
* Updated existing unit tests to align with corrected precision
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants