Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic dataset module and specific s3_datasets module - part 2 (Create datasets_base and move enums) #1257
Generic dataset module and specific s3_datasets module - part 2 (Create datasets_base and move enums) #1257
Changes from all commits
035a637
aa99257
8ea5ca2
2ad4a2e
59d4c93
17fe992
2c333a2
7d66809
313dd89
7e287d1
c4ec66d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ideally we would have
dataset_base
be automatically activated if any of the child dataset modules (i.e.s3_datasets
) is active and not have to expose it here on the config.jsonConfiguring the
dataset_base
module as active or not does not really mean anything as it is really the child dataset modules that are meaningful to activate or notI am not sure how easy it is to do such a thing or if it requires a big shift in the loader logic but if an easy fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your comment that it is required here because of how we load ImportModes of CDK and CDK_CLI_EXTENSION - but can you explain why that is / is there an easy way to resolve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all,
datasets_base
will be added in the config.json at some point (check the full design steps in #1123) because some parameters are relevant for all datasets no only for s3_datasets or redshift_datasets (e.g. confidentiality). That is why I did not give it more thought.But if you are curious, the issue is a bit hidden, the
loader
has some methods to check that it initialized the modules correctly. With the current implementation I think that a module cannot have a module interface CDK and not have a CDK_CLI_EXTENSION if it is not a config.json module. I think the problem is not that much on the loader but on the way we are using it in the cdk proxy. I will have a closer lookThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In earlier versions such as
v2.4.0
I think we never exposeddatasets_base
as an module inconfig.json
and we had the same type of__init__.py
function where it was only supported by CDK and not CDK_CLI_EXTENSIONTrying to understand why we could do that then but can not now - playing around a bit on how these modules are imported using the loader class
Either way I tested these changes and it looks good so will approve PR but do some more digging here on the side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I found the issue as to why there is an error when we remove
datasets_base
fromconfig.json
:load_modules(modes={ImportMode.CDK_CLI_EXTENSION})
in_config
as part ofdef _load_modules(...)
s3_datasets
anddataset_sharing
at this point in timedataset_sharing
we have the following line in top level of__init__.py
that subsequently importss3_datasets
anddatasets_base
modules_check_loading_correct(...)
in loader.pyImportError
sincedatasets_base
is insys.modules.keys()
but is not inchecked_modules_names
(sincedatasets_base
is not in config / expected_load)Previously we never exposed
datasets_base
ordatasets_sharing
as configurable modules inconfig.json
so we avoided this problem. Moving the linein
backend/dataall/modules/dataset_sharing/__init__.py
to within the thedef __init__(self):
inclass SharingApiModuleInterface()
resolves this issue on loading modules but may need some additional testing to ensure share APIs still working as expected