feat!: filter values from Layout folders #4
Merged
This PR introduces the `FilesDatabase.filter_values` method and the `FilesDatabase.subsets` property.

`filter_values` lists the unique values that can be passed to a filter. Where possible, it reads this information from the folders for a quick extraction. It falls back to a full scan of the files if the layouts (i.e. the description of the folder and file hierarchy) are disabled, or if the actual file system does not match the expected layouts. A warning is emitted when the full scan is used, except when the files are not organized in folders (the `flat` case), which we consider nominal behavior.

`subsets` builds on `filter_values` to list the combinations of `SubsetUnmixer.partition_keys` that are present. This will help the user understand which datasets are mixed in the product.

There are two motivations for this PR:

- `query`, `list_files`, `map` methods will help building the queries
- The `subsets` property can also be reused to refactor how we handle the `mandatory` keys in the query methods.

Finally, a breaking change has been introduced: previously, a file that did not match the file name convention was ignored. It now raises a `LayoutMismatchError`. This change in behavior is needed to raise an exception when the files are not organized in folders and we are trying to get the filter values. The chosen implementation crops the existing layouts, removing the `flat` layout that contains only the file name convention. This means we need to raise an error during the metadata collection in the `flat` case, so that we can handle it properly and fall back to a full scan silently. The alternative would have been to configure the `LayoutVisitor` policy more finely, but this is also a good opportunity to encourage users to keep one folder per product (datasets can be mixed, but not products).

This last point showed that the `flat` case should not be the nominal case, and that it would be best if the files were organized in folders. @annesophie-cls This has an impact on the AVISO client, which should keep the remote layout if possible. At the very least, the output folder should be different for each product.
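For reviewers who want a concrete picture of the behavior described above, here is a minimal standalone sketch of the folder-first lookup with a full-scan fallback. It is not the code from this PR: the folder and file name conventions, the module-level functions `filter_values` and `subsets`, the `use_layouts` flag and the local `LayoutMismatchError` class are all hypothetical stand-ins. It also detects the `flat` case up front instead of catching an error raised during metadata collection, it treats a key that no folder level encodes like a mismatch, and its `subsets` simply scans the files rather than building on `filter_values`.

```python
import re
import warnings
from pathlib import Path


class LayoutMismatchError(Exception):
    """Stand-in for the error named in the PR; not the library's class."""


# Hypothetical convention: <mission>/<year>/<mission>_<year>_<cycle>.nc
FOLDER_LEVELS = [re.compile(r"(?P<mission>[a-z0-9]+)"), re.compile(r"(?P<year>\d{4})")]
FILE_NAME = re.compile(r"(?P<mission>[a-z0-9]+)_(?P<year>\d{4})_(?P<cycle>\d{3})\.nc")


def parse_file(path: Path) -> dict:
    """Parse one file name, raising instead of silently ignoring a mismatch."""
    match = FILE_NAME.fullmatch(path.name)
    if match is None:
        raise LayoutMismatchError(f"{path.name!r} does not match the file name convention")
    return match.groupdict()


def full_scan(root: Path) -> list:
    """Slow path: parse every file found under root."""
    return [parse_file(p) for p in sorted(root.rglob("*")) if p.is_file()]


def values_from_folders(root: Path, key: str) -> set:
    """Fast path: read the values of key from folder names only."""
    depth = next((i for i, lvl in enumerate(FOLDER_LEVELS) if key in lvl.groupindex), None)
    if depth is None:
        # Assumption of this sketch: handled like a layout mismatch.
        raise LayoutMismatchError(f"{key!r} is not encoded in any folder level")
    parents, values = [root], set()
    for i, level in enumerate(FOLDER_LEVELS[: depth + 1]):
        children = [c for p in parents for c in p.iterdir() if c.is_dir()]
        matches = [level.fullmatch(c.name) for c in children]
        if not children or None in matches:
            raise LayoutMismatchError(f"folder level {i} does not match the layout")
        if i == depth:
            values = {m.group(key) for m in matches}
        parents = children
    return values


def filter_values(root: Path, key: str, use_layouts: bool = True) -> set:
    """Unique values of key, read from folders when possible."""
    is_flat = not any(p.is_dir() for p in root.iterdir())
    if is_flat:
        # Flat case: nothing to read from folders, so scan silently.
        return {record[key] for record in full_scan(root)}
    if use_layouts:
        try:
            return values_from_folders(root, key)
        except LayoutMismatchError as error:
            warnings.warn(f"Falling back to a full scan: {error}")
    else:
        warnings.warn("Layouts are disabled, falling back to a full scan")
    return {record[key] for record in full_scan(root)}


def subsets(root: Path, partition_keys: tuple) -> set:
    """Combinations of partition key values that are actually present."""
    return {tuple(record[k] for k in partition_keys) for record in full_scan(root)}
```

With files stored as `s3a/2020/s3a_2020_001.nc`, calling `filter_values(root, "year")` only lists directories, whereas `filter_values(root, "cycle")` warns and parses every file name because no folder level carries that key in this sketch.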