Context options, storage conversion, and fuzzy data matching #66
The file splitting causes our code linters to complain:
I told it to ignore those import * statements in the future. Maybe just on that one file?
Sweet, will try it out. Looks good. The only thing I'm realizing is that this fuzzy matching means I have to update my S3 PR.
@strax.takes_config(
    strax.Option(name='storage_converter', default=False,
Will this mean that the strax key changes depending on whether or not the converter is enabled? Just want to make sure it doesn't change the lineage.
It won't change the lineage. Context options are not tracked with the data, as they change how you load but not what you load.
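A minimal sketch of why a context option such as storage_converter leaves the key alone, assuming a simplified key scheme of run id, data type, and a hash of the lineage. The lineage_hash and data_key helpers, the lineage layout, and the option name some_option are hypothetical illustrations, not strax's actual code:

```python
import hashlib
import json


def lineage_hash(lineage: dict) -> str:
    """Short, deterministic hash of a lineage dict (hypothetical key scheme)."""
    blob = json.dumps(lineage, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()[:10]


def data_key(run_id: str, data_type: str, lineage: dict, context_options: dict) -> str:
    """Build a storage key from the lineage only.

    context_options is accepted only to make the point explicit: it is
    deliberately left out of the hash, because it changes how data is
    loaded, not what data is loaded.
    """
    return f"{run_id}-{data_type}-{lineage_hash(lineage)}"


# Hypothetical lineage layout: {data_type: [plugin_name, version, {option: value}]}
lineage = {"event_info": ["EventInfo", "0.0.1", {"some_option": 42}]}

key_plain = data_key("run1", "event_info", lineage, {"storage_converter": False})
key_conv = data_key("run1", "event_info", lineage, {"storage_converter": True})

# Changing a plugin config option, by contrast, does change the key.
other = {"event_info": ["EventInfo", "0.0.1", {"some_option": 43}]}
key_other = data_key("run1", "event_info", other, {"storage_converter": False})
```

Here key_plain and key_conv are identical, while key_other differs, which is the distinction drawn above between context options and config options.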
@@ -133,21 +134,21 @@ def saver(self, key, metadata, meta_only):
def get_metadata(self, key,
ambiguous='warn',
ignore_lineage=tuple(),
ignore_config=tuple()):
fuzzy_for=tuple(),
Does this mean that I need to update my S3 PR? Or just raise NotImplementedError?
Yes, if you don't have fuzzy matching implemented in a frontend, you can raise NotImplementedError whenever someone tries to use these options.
Unfortunately the best implementation of fuzzy matching is frontend specific: for filesystem directories we have to laboriously scan through directories and load jsons until we find one that matches (unless the exact match with the hash succeeds of course), for the run db we can use a mongo query.
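To make the directory case concrete, here is a minimal sketch, not strax's implementation, of both patterns discussed above: a base frontend that raises NotImplementedError when fuzzy options are used, and a directory frontend that tries the exact hash first and otherwise scans every candidate's metadata JSON. The class names, the supports_fuzzy flag, the <run_id>-<data_type>-<hash>/metadata.json layout, and the plugin names are all assumptions for illustration; in this sketch fuzzy_for names lineage entries by their data-type key.

```python
import hashlib
import json
import os
import tempfile


def lineage_hash(lineage):
    """Short deterministic hash of a lineage dict (hypothetical key scheme)."""
    return hashlib.sha1(json.dumps(lineage, sort_keys=True).encode()).hexdigest()[:10]


class StorageFrontend:
    """Hypothetical base class: fuzzy matching is opt-in per frontend."""
    supports_fuzzy = False

    def find(self, run_id, data_type, lineage, fuzzy_for=()):
        if fuzzy_for and not self.supports_fuzzy:
            # Frontends without fuzzy support refuse instead of guessing.
            raise NotImplementedError(
                f"{type(self).__name__} does not support fuzzy matching")
        return self._find(run_id, data_type, lineage, tuple(fuzzy_for))

    def _find(self, run_id, data_type, lineage, fuzzy_for):
        raise NotImplementedError


class DirectoryFrontend(StorageFrontend):
    """Hypothetical layout: <root>/<run_id>-<data_type>-<hash>/metadata.json"""
    supports_fuzzy = True

    def __init__(self, root):
        self.root = root

    def _find(self, run_id, data_type, lineage, fuzzy_for):
        # Fast path: the directory named after the exact lineage hash.
        exact = os.path.join(self.root, f"{run_id}-{data_type}-{lineage_hash(lineage)}")
        if os.path.isdir(exact):
            return exact
        if not fuzzy_for:
            return None

        def strip(lin):
            # Ignore the lineage entries named in fuzzy_for.
            return {k: v for k, v in lin.items() if k not in fuzzy_for}

        # Slow path: open every candidate's metadata and compare lineages.
        prefix = f"{run_id}-{data_type}-"
        for name in sorted(os.listdir(self.root)):
            if not name.startswith(prefix):
                continue
            with open(os.path.join(self.root, name, "metadata.json")) as f:
                stored = json.load(f)["lineage"]
            if strip(stored) == strip(lineage):
                return os.path.join(self.root, name)
        return None


# Usage demo in a throwaway directory; the plugin names are made up.
root = tempfile.mkdtemp()
stored = {"raw_records": ["PaxConverter", "0.0.1", {}],
          "records": ["PulseProcessing", "0.1.0", {}]}
wanted = {"raw_records": ["DAQReader", "0.0.0", {}],
          "records": ["PulseProcessing", "0.1.0", {}]}
saved_dir = os.path.join(root, f"run1-records-{lineage_hash(stored)}")
os.makedirs(saved_dir)
with open(os.path.join(saved_dir, "metadata.json"), "w") as f:
    json.dump({"lineage": stored}, f)

frontend = DirectoryFrontend(root)
strict = frontend.find("run1", "records", wanted)
fuzzy = frontend.find("run1", "records", wanted, fuzzy_for=("raw_records",))
```

The linear scan is the "laborious" part: its cost grows with the number of stored datasets for that run and data type, which is why a database-backed frontend would rather push the comparison into a query.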
plugin name, version, or option check is performed.
:param ignore_config: list/tuple of configuration options for which no
check is performed.
:param fuzzy_for_options: list/tuple of configuration options for which
Same question, do I have to update other frontends?
Merge and make a minor release? Then I'll test the S3 bindings more with this before merging.
@JelleAalbers I'd look at the Codacy issues, though. Those are real: an unused variable, for example.
Thanks for the review! I'll merge; maybe we can release a new version at the end of the week, just before the workshop? Yeah, some of the issues are real and we should have a look through them. It was just confusing that they came up in this PR, only because we moved some code to another file.
Sure, we can release later. I've just been using the releases for processing, but I guess I won't update yet. The pip install is still broken, but that will be fixed in the next release. We also need to update HISTORY.md.
This adds three things:
Context options
These options control how Contexts behave, e.g. on data matching and conversion. They differ from the existing, normal config options, which instead specify exactly what data you want. You set these new context options by passing additional keyword arguments to Context.__init__, or to any function that can make contexts on the fly, such as get_df and the other get_xxx functions. If you call .show_config() on a context, you get a DataFrame with the available context options and their current values:
[Screenshot: output of show_config()]
(OK, the word 'data_type' above is an artifact from re-using the config system for plugins. Probably it should be 'applies to' instead, or just be gone altogether.)
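The split between "how to load" and "what to load" can be illustrated with a toy class. This is a hypothetical stand-in, not the real strax Context: the option names come from this PR, but the defaults, the with_options helper, and the list-of-rows return value of show_config (strax returns a DataFrame) are assumptions made to keep the sketch dependency-free.

```python
class ToyContext:
    """Hypothetical stand-in for a strax Context (not its real implementation)."""

    # Option names taken from the PR text; the defaults are assumptions.
    default_context_options = {
        "storage_converter": False,
        "fuzzy_for": (),
        "fuzzy_for_options": (),
    }

    def __init__(self, config=None, **context_options):
        unknown = set(context_options) - set(self.default_context_options)
        if unknown:
            raise TypeError(f"Unknown context options: {sorted(unknown)}")
        # Plugin config: which data you want (part of the lineage).
        self.config = dict(config or {})
        # Context options: how the context loads and saves it (not in the lineage).
        self.context_config = {**self.default_context_options, **context_options}

    def with_options(self, **context_options):
        """Return a modified copy, roughly what passing options to get_df does."""
        return ToyContext(self.config, **{**self.context_config, **context_options})

    def show_config(self):
        """(name, value) rows sorted by name; strax itself returns a DataFrame."""
        return sorted(self.context_config.items())
```

For example, ToyContext(config={"some_option": 42}).with_options(storage_converter=True) yields a copy with the same plugin config and the converter switched on, while a misspelled option such as storage_convertor raises a TypeError instead of being silently ignored.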
Storage conversion
Strax saves newly computed data through every willing storage frontend, but data that exists in one frontend was not so easy to transfer to another -- see issue #25.
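The copy-on-load idea behind the storage_converter option can be sketched with in-memory frontends. DictFrontend and get_data are hypothetical names, and the "willing" check is reduced to a readonly flag; this illustrates the behaviour, not strax's code:

```python
class DictFrontend:
    """Hypothetical in-memory storage frontend."""

    def __init__(self, name, readonly=False):
        self.name = name
        self.readonly = readonly
        self.store = {}

    def load(self, key):
        return self.store.get(key)

    def save(self, key, data):
        """Save if willing; return whether the data was saved."""
        if self.readonly:
            return False
        self.store[key] = data
        return True


def get_data(frontends, key, storage_converter=False):
    """Load key from the first frontend that has it.

    With storage_converter=True, also copy it to every other willing
    frontend that does not have it yet.
    """
    for source in frontends:
        data = source.load(key)
        if data is None:
            continue
        if storage_converter:
            for target in frontends:
                if target is not source and target.load(key) is None:
                    target.save(key, data)
        return data
    raise KeyError(key)
```

Skipping targets that already hold the key avoids rewriting data that is already in place; a read-only frontend simply declines the copy.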
A boolean context option, storage_converter, can now be set. If True, any data loaded from one frontend through the regular data loading commands is also saved through all other willing storage frontends. Try it with e.g. st.get_df('event_info', storage_converter=True) if you have two storage frontends registered.

Fuzzy matching of data
Previously, strax could only load exactly the data specified by the context (registered plugins / specified config options). For example, to load data produced by a non-standard raw_records maker, such as the pax converter plugin, you'd first have to register that plugin. Sometimes this is too much trouble and you want strax to be more lenient. Two context options are available:
fuzzy_for: plugins for which no plugin name, version, or option check is performed.
fuzzy_for_options: configuration options for which no check is performed.

All new functionality has associated tests in test_core.
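The lineage comparison behind these two options can be sketched as a filter applied to both lineages before an equality check. This is a minimal illustration under assumptions: lineages are taken to look like {data_type: [plugin_name, version, {option: value}]}, fuzzy_for is applied to those data-type keys, and the plugin and option names in the usage example are made up.

```python
def filter_lineage(lineage, fuzzy_for=(), fuzzy_for_options=()):
    """Drop the parts of a lineage that fuzzy matching should ignore."""
    return {
        data_type: (plugin, version,
                    {opt: val for opt, val in config.items()
                     if opt not in fuzzy_for_options})
        for data_type, (plugin, version, config) in lineage.items()
        if data_type not in fuzzy_for
    }


def lineage_matches(stored, wanted, fuzzy_for=(), fuzzy_for_options=()):
    """Strict equality by default; lenient only where explicitly requested."""
    if not (fuzzy_for or fuzzy_for_options):
        return stored == wanted
    return (filter_lineage(stored, fuzzy_for, fuzzy_for_options)
            == filter_lineage(wanted, fuzzy_for, fuzzy_for_options))


# Hypothetical lineages: the stored data came from a pax-converter-style plugin.
wanted = {"raw_records": ["DAQReader", "0.0.0", {"input_dir": "/daq"}],
          "records": ["PulseProcessing", "0.1.0", {"hit_threshold": 15}]}
stored_pax = {"raw_records": ["PaxConverter", "0.0.1", {"pax_file": "run1.zip"}],
              "records": ["PulseProcessing", "0.1.0", {"hit_threshold": 15}]}
stored_thr = {"raw_records": ["DAQReader", "0.0.0", {"input_dir": "/daq"}],
              "records": ["PulseProcessing", "0.1.0", {"hit_threshold": 20}]}
```

With these inputs, stored_pax only matches when raw_records is listed in fuzzy_for, and stored_thr only matches when hit_threshold is listed in fuzzy_for_options; with neither option set, the comparison stays exact.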
A few minor changes:
The search_config method is now called show_config.