-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BUFR file creation script and fix unit test bug in test_get_bufr.py
#263
base: develop
Are you sure you want to change the base?
Conversation
test_get_bufr.py
test_get_bufr.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I did not have much to say about your create_bufr_files
. But going through those files triggered a few questions that I listed below. Sorry if they fall outside the scope of the PR. Feel free to move them to issues if you want to deal with them later.
I'm approving anyway since the code is fully functional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed some missing stations:
- THU_U2v3 (replacing THU_U2 that has been decomissioned, confirm with Liam)
- FRE
- JAR & SWC (but decomissioned)
- KAN_Tv3 (newly installed)
- QAS_B (unknown type, see with Robert)
Check out these lines:
pypromice/src/pypromice/postprocess/station_configurations.toml
Lines 511 to 514 in 3958fb1
[WEG_B] | |
stid = "WEG_B" | |
station_site = "NUK_U" | |
project = "Wegener" |
Related issue: #256
I've also been wondering about the comment = "v3_bad"
:
pypromice/src/pypromice/postprocess/station_configurations.toml
Lines 567 to 576 in 3958fb1
[KPC_Lv3] | |
stid = "KPC_Lv3" | |
station_site = "KPC_L" | |
project = "Promice" | |
station_type = "mobile" | |
wmo_id = "04428" | |
export_bufr = false | |
comment = "v3_bad" | |
skipped_variables = [] | |
positions_update_timestamp_only = false |
Does that mean that we shouldn't use them?
KPC_L
(the v2 station) has comment = "use_v3"
if last_valid_index not in df_limited.index: | ||
logger.info("No valid data limited period") | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should solve #235 which is not showing up at SDM anymore (because of maintenance) but that might be still showing up at one of the LYN stations
if last_valid_index not in df_limited.index: | ||
logger.info("No valid data limited period") | ||
return None | ||
|
||
# Apply smoothing to z_boom_u | ||
# require at least 2 hourly obs? Sometimes seeing once/day data for z_boom_u | ||
df_limited = rolling_window(df_limited, "z_boom_u", "72H", 2, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rounding to 1 decimal is quite drastic. You don't think we can provide the height a cm precision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later on I also see that the gps_alt is smoothed and only a single digit is kept:
df_limited, alt_valid = linear_fit(df_limited, "gps_alt", 1) |
I can see that there are differences in the accuracy for the height measurements: gps_alt
is more of the 10 cm (at least judging from the rounding that is done) while z_boom_u
is more 1-5 cm (depending on the surface roughness). It would be great to evaluate our estimated heights in NRT (such as done here) with the precise measurements done by Jakob's GNSS during fieldwork. Maybe the linear regression (or in the future the loess smoothing of gps_alt
) does a good job removing the noise and already achieves a <5 cm accuracy making it possible to provide more digits for the height estimations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the z_boom_u measurements. It would make sense to use more digits. I'll use 3 digits.
It would definitely make sense to explore better methods for elevation measurements.
Accuracy:
As I understand, our current accuracy is around ±10m, far from cm-level. As you suggested, we might be able to improve these measurements by incorporating fieldwork data from Jacob.
Precision:
Smoothing filters can help enhance the precision of the measurements.
I have not investigated the data yet, but if you suggest <5cm precision, I can increase the number of digits to 2.
I am trying my best to steer between the terms accuracy vs precision 🙈😅.
PS: The BUFR schema uses 0.1m resolution for the elevation measurements. We are currently not using the z_boom_u values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be better to avoid hard assumptions about precision at this and the point and avoid rounding. The variables are anyway rounded in the BUFRVariables class and the eccodes export.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the part of the code that you have not modified (it's not so often that I look at these scripts)
variable naming:
sufficient_wx_data, sufficient_position_data = min_data_check(latest_data) |
I don't understand what
_wx_
stand for.
position seeding:
I got afraid the position seeding was not triggered when seeing:
positions_seed_path: Optional[Path] = None, |
and the operational scripts it is not being mentioned:
https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/9cc7b5083514d45f93a7a30863b561a06b28b4fc/bufr_processor.sh#L17-L22
But turned out that it is set as a non-None default value in the CLI:
pypromice/src/pypromice/postprocess/get_bufr.py
Lines 104 to 110 in 3958fb1
parser.add_argument( | |
"--position_seed", | |
default=DEFAULT_POSITION_SEED_PATH, | |
type=Path, | |
required=False, | |
help="Path to csv file with seed values for output positions.", | |
) |
So maybe the default value should be used instead of None
when defining get_bufr
function:
positions_seed_path: Optional[Path] = None, |
Yet I saw the following errors/warning for KAN_B in the latest log on glacio01:
2024-06-25 11:11:51,533; INFO; pypromice.postprocess.get_bufr; ####### Processing KAN_B #######
2024-06-25 11:11:51,533; INFO; pypromice.postprocess.get_bufr; Generating /mnt/data/aws/pypromice_aws/aws-bufr/BUFR_out/KAN_B.bufr from /mnt/data/aws/pypromice_aws/aws-l2/tx/KAN_B/KAN_B_hour.csv
2024-06-25 11:11:51,683; INFO; pypromice.postprocess.get_bufr; No valid instantaneous timestamps!
2024-06-25 11:11:51,686; WARNING; pypromice.postprocess.get_bufr; No position information available for KAN_B
It would be great to have
- something more informative than
No valid instantaneous timestamps!
and - another message after
No position information available for KAN_B
if the position seed is being used - a clear message if there's no position info in the latest data and the position seed is failing/missing.
height of instruments
Right now, even though z_boom_u
is smoothed, I haven't seen it being used to define attributes of the bufr files (did I just miss it?). When seeing lines like:
pypromice/src/pypromice/postprocess/get_bufr.py
Lines 584 to 586 in 3958fb1
heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD = ( | |
station_configuration.anemometer_from_station_ground | |
) |
I really think that the snow surface is important for the interpretation of the
wspd
values (but also t
and rh
). It should be informed either as an additional variable describing the snow thickness above ground (then the current setup where the bare ice surface is taken as ground can be kept) or defining the snow surface as the ground itself.
additional filtering when instruments are too close to the ground
I guess we should be more picky with the values sent to WMO. Maybe measurements lower than 30-50 cm should be filtered out because blowing snow might be interfering with the instruments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable Naming
I tried to change as little as possible of the original code and I don't now why it is called "wx".
The flag sufficient_wx_data
is true if both air temperature and pressure are present at the latest time. This is required for exporting the BUFR file for a station.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Position Seeding
I would like to avoid the positions seeding input and rely only on the input data and maybe a station config
3958fb1
to
b5de50d
Compare
b5de50d
to
a1227a8
Compare
Configuration variables were to strictly validated. * Made bufr_integration_test explicit
* Updated read_bufr_file to use wmo_id as index
* Added corresponding unit tests * Added flag to raise exceptions on errors * Added create_bufr_files.py to setup
The sonic ranger based heights are very unstable. DMI are using constant values for their weather stations in Greenland without considering snow cover. Updated unittests to align with the new output dimensions Updated test_get_bufr_integration.py
- Ensure get_bufr_variables raises AttributeError when station dimensions are missing
* Bedrock stations shouldn’t depend on the noisy GPS signal for elevation. * Added station dimension values for WEG_B * Added corresponding unittest
Added eccodes installation
* Added support for loading multiple station configuration files Other * Made ArgumentParser instantiation inline
… wmo_id Updated bufr_utilities.set_station to validate wmo id
* Added detailed descriptions with references to the attributes in BUFRVariables * Change the attribute order to align with the exported schema * Changed variable roundings to align with the scales defined in the BUFR schemas: * Latitude and longitude is set to 5. Was 6 * heightOfStationGroundAboveMeanSeaLevel is set to 1. Was 2 * heightOfBarometerAboveMeanSeaLevel is set to to 1. Was 2 * pressure is set to -1. Was 1. Note: The BUFRVariable unit is Pa and not hPA * airTemperature is set to 2. Was 1. * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformTempRH is set to 2. Was 4 * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD is set to 2. Was 4 * Added unit tests to test the roundings * Updated existing unit tests to align with corrected precision
34afcce
to
0c3fcd5
Compare
PR Description
create_bufr_files
, as a CLI script insetup.py
.test_get_bufr.py
that caused unit tests to fail.