Message structure not identified correctly #49

Closed
sandorkertesz opened this issue Mar 30, 2023 · 3 comments · Fixed by #50

Comments

@sandorkertesz
Collaborator

sandorkertesz commented Mar 30, 2023

pdbufr uses an in-memory cache to identify and reuse the message structure while it processes the messages in a BUFR file. Cache entries are identified by the following header keys and contain all the keys for a given message (structure):

"edition", "masterTableNumber", "numberOfSubsets", "unexpandedDescriptors", "delayedDescriptorReplicationFactor"

So if there is already a cache entry for a given message, the list of keys is taken from the cache instead of being read from the message with the key iterator.
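The caching scheme described above can be sketched roughly as follows. This is only an illustration, not pdbufr's actual code: message headers are modelled as plain dictionaries, and `structure_key`, `keys_for_message` and the `read_keys` callback are hypothetical names introduced here.

```python
# Header keys used to identify a message structure (as listed in this issue).
CACHE_KEYS = (
    "edition",
    "masterTableNumber",
    "numberOfSubsets",
    "unexpandedDescriptors",
    "delayedDescriptorReplicationFactor",
)


def _as_tuple(value):
    """Wrap scalars and convert lists so every value can be concatenated."""
    if isinstance(value, (list, tuple)):
        return tuple(value)
    return (value,)


def structure_key(header, cache_keys=CACHE_KEYS):
    """Flatten the values of the identifying keys into one hashable tuple."""
    flat = []
    for name in cache_keys:
        flat.extend(_as_tuple(header.get(name)))
    return tuple(flat)


def keys_for_message(header, read_keys, cache):
    """Return the key list for a message, reusing a cached list if one exists.

    `read_keys` is a hypothetical callback standing in for the (expensive)
    key iterator; it is only called on a cache miss.
    """
    key = structure_key(header)
    if key not in cache:
        cache[key] = read_keys()
    return cache[key]
```

If two messages produce the same tuple but actually contain different keys, the second lookup silently returns the key list of the first message, which is the failure reported in this issue.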

The following BUFR file contains 2 messages:

https://get.ecmwf.int/repository/test-data/pdbufr/test-data/message_structure_diff_2.bufr

and according to pdbufr their structure is identical because the values of the keys listed above are the same:

(4, 0, 1, 307096, 22061, 20058, 4024, 13012, 4024, 1, 0)

However, the first message contains more keys, as bufr_dump -p confirms.

This is the end of the first message:

#24#timePeriod=-1
depthOfFreshSnow=MISSING
#25#timePeriod=0

This is the end of the second message:

#20#timePeriod=-1
depthOfFreshSnow=MISSING
#21#timePeriod=0

The bottom line is that the message structure identification mechanism does not work correctly in pdbufr and has to be improved.

@shahramn

shahramn commented Mar 30, 2023

Let's extract the two messages and compare the output of bufr_dump -p on each:

bufr_copy message_structure_diff_2.bufr 'delme__[count].bufr'
bufr_dump -p delme__1.bufr > dump.1
bufr_dump -p delme__2.bufr > dump.2
meld dump.*

We see that the following key is different:
shortDelayedDescriptorReplicationFactor
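The same comparison can be scripted instead of using a graphical diff tool. The sketch below assumes that each relevant line of the bufr_dump -p output has the form `name=value`, optionally prefixed with a rank such as `#24#` (as in the excerpts earlier in this issue). Lines without `=` (for example, continuation lines of multi-line array values) are ignored, which is a simplification. `values_by_key` and `differing_keys` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches a leading rank prefix such as "#24#".
_RANK_PREFIX = re.compile(r"^#\d+#")


def values_by_key(dump_text):
    """Group the values found in a dump by key name, ignoring rank prefixes."""
    values = defaultdict(list)
    for line in dump_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a "name=value" line
        values[_RANK_PREFIX.sub("", name)].append(value)
    return dict(values)


def differing_keys(dump_a, dump_b, suffix=""):
    """Return sorted key names whose value lists differ between two dumps.

    `suffix` optionally restricts the result to key names ending with it.
    """
    a, b = values_by_key(dump_a), values_by_key(dump_b)
    return sorted(
        name
        for name in set(a) | set(b)
        if name.endswith(suffix) and a.get(name) != b.get(name)
    )
```

With the files produced above, calling `differing_keys` on the contents of dump.1 and dump.2 with `suffix="ReplicationFactor"` would restrict the report to the replication factor keys, leaving out the data values that naturally differ between two observations.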

@shahramn

There are 3 such keys:
shortDelayedDescriptorReplicationFactor
delayedDescriptorReplicationFactor
extendedDelayedDescriptorReplicationFactor

@sandorkertesz
Collaborator Author

sandorkertesz commented Mar 30, 2023

Does this mean that if we added shortDelayedDescriptorReplicationFactor and extendedDelayedDescriptorReplicationFactor to the key list, we could uniquely identify the message structure for all message types?
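As a rough illustration of the proposal, the identifying key list could be extended as below. This is a sketch under assumptions, not the actual fix. The header dictionaries are hypothetical: the common values are taken from the tuple quoted earlier (split into keys by assumption), while the shortDelayedDescriptorReplicationFactor values are made up for the example. Whether the extended list is sufficient for all message types is exactly the open question, and the snippet does not answer it.

```python
BASE_KEYS = (
    "edition",
    "masterTableNumber",
    "numberOfSubsets",
    "unexpandedDescriptors",
    "delayedDescriptorReplicationFactor",
)

# Proposed extension: include all three replication factor keys.
EXTENDED_KEYS = BASE_KEYS + (
    "shortDelayedDescriptorReplicationFactor",
    "extendedDelayedDescriptorReplicationFactor",
)


def structure_key(header, key_names):
    """Build a hashable key; a missing key contributes an empty tuple."""
    parts = []
    for name in key_names:
        value = header.get(name, ())
        if not isinstance(value, (list, tuple)):
            value = (value,)
        parts.append((name, tuple(value)))
    return tuple(parts)


# Hypothetical headers: identical base keys, different short replication factors.
common = {
    "edition": 4,
    "masterTableNumber": 0,
    "numberOfSubsets": 1,
    "unexpandedDescriptors": [307096, 22061, 20058, 4024, 13012, 4024],
    "delayedDescriptorReplicationFactor": [1, 0],
}
msg_a = dict(common, shortDelayedDescriptorReplicationFactor=[1])  # made-up value
msg_b = dict(common, shortDelayedDescriptorReplicationFactor=[0])  # made-up value

# The base keys cannot tell the two headers apart; the extended keys can.
same_with_base = structure_key(msg_a, BASE_KEYS) == structure_key(msg_b, BASE_KEYS)
same_with_extended = structure_key(msg_a, EXTENDED_KEYS) == structure_key(msg_b, EXTENDED_KEYS)
```

Unlike the flattened tuple shown earlier, this variant keeps each key's values grouped under the key name, so values from adjacent arrays cannot be confused with each other.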
