Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context options, storage conversion, and fuzzy data matching #66

Merged
merged 4 commits into from Aug 21, 2018

Conversation

JelleAalbers
Copy link
Member

@JelleAalbers JelleAalbers commented Aug 20, 2018

This adds three things:

Context options

These options control how Contexts behave, e.g. on data matching and conversion. They are different from the existing normal config options, which specify what exact data you want instead. You set these new context options by passing additional keyword arguments to Context.__init__, or any function that can make contexts on the fly, such as get_df and the other get_xxx functions. If you call .show_config() on a context, you get a DataFrame with available context options and their current values:

option default current data_type help
storage_converter False Context If True, save data that is loaded from one frontend through all willing other storage frontends.
fuzzy_for () Context Tuple of plugin names for which no checks for version, providing plugin, and config will be performed when looking for data.
fuzzy_for_options () Context Tuple of config options for which no checks will be performed when looking for data.

(OK, the word 'data_type' above is an artifact from re-using the config system for plugins. Probably it should be 'applies to' instead, or just be gone altogether.)

Storage conversion

Strax saves newly computed data through every willing storage frontend, but data that exists in one frontend was not so easy to transfer to another -- see issue #25.

A boolean context option storage_converter can now be set. If True, it will save any data loaded from one frontend, using regular data loading commands, through all willing other storage frontends. Try it with e.g st.get_df('event_info', storage_converter=True) if you have two storage frontends registered.

Fuzzy matching of data

Previously, strax could only load exactly the data specified by the context (registered plugins / specified config options). For example, to load data produced by a non-standard raw_records maker, such as the pax converter plugin, you'd first have to register that plugin. Sometimes this is too much trouble and you want strax to be more lenient. Two context options are available:

  • fuzzy_for: Tuple of plugin names for which no checks for versions, providing plugin, and config will be performed when looking for data.
  • fuzzy_for_options: Tuple of config options for which no checks will be performed when looking for data.

All new functionality has associated tests in test_core.

A few minor changes:

  • I split core.py into two files: context.py and config.py. The former holds only the Context class, the latter various utilities regarding configuration.
  • the search_config method is now called show_config.

@JelleAalbers
Copy link
Member Author

The file splitting causes our code linters to complain:

  • CodeFactor doesn't like our import *s in __init__.py (even though this is standard practice when combined with __all__, see e.g. https://github.com/numpy/numpy/blob/master/numpy/__init__.py#L143), and due to the splitting we now have one more.
  • Codacy seems to flag all its existing beefs with context.py as 'new issues' now that they have moved to two new files.

@tunnell
Copy link
Member

tunnell commented Aug 21, 2018

I told it to ignore those import * in future. Maybe just on that one file?

Copy link
Member

@tunnell tunnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, will try out. Looks good. Only thing I'm realizing is that this fuzzy matching means that I have to upload this S3 PR.



@strax.takes_config(
strax.Option(name='storage_converter', default=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this mean that the strax key changes based on whether or not convertor? Just wanting to make sure that it doesn't change lineage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't change the lineage. Context options are not tracked with the data, as they change how you load but not what you load.

@@ -133,21 +134,21 @@ def saver(self, key, metadata, meta_only):

def get_metadata(self, key,
ambiguous='warn',
ignore_lineage=tuple(),
ignore_config=tuple()):
fuzzy_for=tuple(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that I need to update my S3 PR? Or just raise NotImplementedError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you don't have fuzzy matching implemented in a frontend, you can raise NotImplementedError whenever someone tries to use these options.

Unfortunately the best implementation of fuzzy matching is frontend specific: for filesystem directories we have to laboriously scan through directories and load jsons until we find one that matches (unless the exact match with the hash succeeds of course), for the run db we can use a mongo query.

plugin name, version, or option check is performed.
:param ignore_config: list/tuple of configuration options for which no
check is performed.
:param fuzzy_for_options: list/tuple of configuration options for which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question, do I have to update other frontends?

@tunnell
Copy link
Member

tunnell commented Aug 21, 2018

Merge and minor? Then I'll tell the S3 bindings more with this before merge.

@tunnell
Copy link
Member

tunnell commented Aug 21, 2018

@JelleAalbers I'd look at the Codacy issues though. Those are real. Unused variable for example.

@JelleAalbers
Copy link
Member Author

Thanks for the review! I'll merge, maybe we can release a new version at the end of the week just before the workshop?

Yeah, some of the issues are real and we should have a look through them. It was just confusing they come up with this PR just because we moved some code to another file.

@JelleAalbers JelleAalbers merged commit 8a9d835 into AxFoundation:master Aug 21, 2018
@JelleAalbers JelleAalbers deleted the context_options branch August 21, 2018 09:22
@tunnell
Copy link
Member

tunnell commented Aug 21, 2018

Sure we can release later. I've just been using the releases for processing, but I guess I won't update yet. The pip install is still broken, but will fix next release. Need to update HISTORY.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants