
formalize extending common_cfg.definitions (and other things) in extensions? #6314

Closed
yarikoptic opened this issue Dec 14, 2021 · 8 comments · Fixed by #6601

Comments

@yarikoptic (Member)

Looking at http://docs.datalad.org/en/latest/customization.html (filed separately as #6313), and then skimming through the extension template and extensions, it seems we have not formalized how extensions could (or should at all) expand common_cfg.definitions (https://github.com/datalad/datalad/blob/master/datalad/interface/common_cfg.py#L46). ATM it seems to just be a matter of "documenting" those added/used config variables somewhere in docstrings, but IMHO they should be added to the aforementioned definitions. It would be particularly useful in the scope of the configuration command (#6306).

I guess we could just tell people to add entries to that (common_cfg.definitions) dictionary. But I am starting to feel that we need something like a datalad.support.extensions module which would provide formalized and centralized interfaces for extensions to extend datalad. E.g. in the case of extending configuration, it could be

def add_config_variable(name, ui, *, destination='global', default=None, type=None)

if we decide to just mimic the internal structure present ATM (maybe name it add_common_cfg to bring it closer to the current setup). If we later decide to formalize/change the internal config structure or the interface to add config variables, we would be able to provide a deprecation cycle, adjusting the initial (above) version to map into the new internal data structure, etc. So the benefits are multiple.

In the scope of #5963 we could also add an add_downloader.

@mih (Member)

mih commented Dec 15, 2021

Duplicate of #5769

@mih mih marked this as a duplicate of #5769 Dec 15, 2021
@mih (Member)

mih commented Dec 16, 2021

Importing the content from #5769, which partly overlaps with the OP:


Two modes of operations come to my mind:

Do nothing and don't fight extensions

Once imported, extensions could "monkey-patch" this dictionary and announce additional config. This would have minimal impact on performance, as it only happens when it becomes relevant. There are two major downsides: 1) the model would invite 3rd-party code to mess with the internals of datalad-core (fragile), and 2) generic tools like x_configuration would have a hard time picking these up (not impossible, though).
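Mode 1 in miniature — the dict and variable names below are simplified stand-ins for datalad.interface.common_cfg.definitions and a fictional extension setting:

```python
# Simplified stand-in for datalad core's internal registry
definitions = {
    'datalad.log.level': {'default': 'warning'},
}

# Somewhere in the extension's __init__.py, executed on import:
definitions['datalad.myext.endpoint'] = {
    'ui': ('question', {'title': 'Service endpoint'}),
    'default': 'https://example.com',
}

# Fragile: nothing validates the entry's structure, and any tool that
# reads `definitions` before the extension happens to be imported will
# simply never see this setting.
```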

Introduce update functionality that queries extensions (via entrypoint)

This will be slower, but could be implemented in a lazy-loading fashion (likely the simple dict needs to become a class). The big advantage is to have a formal interface for extensions to work with.

@yarikoptic (Member Author)

Well, in the original description of this issue I proposed a third (then, I guess) way:

Define DataLad core API for extensions to use to update internal DataLad "registries"

which is different from both modes listed in the prior comment. Unlike the 2nd (entry points), it would provide an interface in the core (not in extensions), which would allow for easier deprecation cycles if we need to change anything (if it were via entry points, "core" would need to support multiple "flavors" during deprecation and figure out which flavor of the interface an extension uses). I do not think there would be any speed penalty in comparison to "monkey-patch"ing.

@mih (Member)

mih commented Mar 23, 2022

I think a straightforward way to resolve this issue would be to turn datalad.interface.common_cfg.definitions into a class instance of a newly established facility for this purpose (third option).

I screened the code, and this class would need to support the following API subset of dict in order to be a drop-in replacement:

  • __setitem__()
  • __getitem__()
  • __contains__()
  • keys()
  • get()
  • items()

We would be able to easily add additional methods in the spirit of the proposed add_config_variable(), and future changes and deprecations would have a layer to live in.
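A rough sketch of such a drop-in class, exposing exactly the dict subset listed above plus a registration method in the spirit of add_config_variable(); the class name and entry layout are assumptions:

```python
class ConfigDefinitions:
    """Drop-in replacement for the plain definitions dict."""

    def __init__(self, initial=None):
        self._defs = dict(initial or {})

    # -- the dict API subset found in the code survey --------------------
    def __setitem__(self, key, value):
        self._defs[key] = value

    def __getitem__(self, key):
        return self._defs[key]

    def __contains__(self, key):
        return key in self._defs

    def keys(self):
        return self._defs.keys()

    def get(self, key, default=None):
        return self._defs.get(key, default)

    def items(self):
        return self._defs.items()

    # -- the formal extension-facing layer where deprecations could live --
    def add_config_variable(self, name, ui, *, destination='global',
                            default=None, type=None):
        self[name] = dict(ui=ui, destination=destination,
                          default=default, type=type)


definitions = ConfigDefinitions()  # would replace the module-level dict
```

Existing core code keeps using `definitions['...']`, `'...' in definitions`, `.get()` etc. unchanged, while extensions get the formal method.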

@mih (Member)

mih commented Mar 23, 2022

An aspect that still needs a resolution is how and when to trigger extensions to actually deposit their config definitions. In #6584 I tried to explore how fast this could be made, but it remains to be decided how such a mechanism needs to be used.

My conclusion is that unless we trigger the respective extension code centrally, the ability of extensions to configure anything will remain severely limited -- to influence core package behavior only within running extension code. However, this is already possible right now, and requires no changes.

So depending on the outcome of this decision, we can either close this or add a central entrypoint loop somewhere.

@yarikoptic (Member Author)

yarikoptic commented Mar 23, 2022

Indeed, a dedicated class/interface for compute_cfg_defaults (AKA cfg_defs AKA common_cfg.definitions) would be good! And then individual config definitions should also be typed as some @dataclass -- that would allow for type checking etc. My only worry would be if such formalization added notable performance overhead to initialization/use. But all of that is IMHO largely orthogonal to the problem that we do not have centralized interface(s) for extensions to operate on...
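Typing an individual definition as a @dataclass could look like this; the field names mirror the add_config_variable() signature proposed in the OP, and the lazy `default_fn` field is an assumption inspired by the expensive-defaults concern:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ConfigDefinition:
    """Typed definition of one config variable (illustrative fields)."""
    name: str
    ui: Any = None
    destination: str = 'global'
    default: Any = None
    # a callable, so a possibly expensive default stays lazy until needed
    default_fn: Optional[Callable[[], Any]] = None
    type: Optional[type] = None


d = ConfigDefinition('datalad.myext.batch-size', default=10, type=int)
```

With frozen instances, definitions cannot be mutated behind the registry's back, and static type checkers can verify extension code against the fields.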

... this is already possible right now, and requires no changes.

Given that we have not yet had a use case where we needed a "less limited" influence from extensions, I think that

  • it is ok "to influence core package behavior only within running extension code", as long as we add the possibility for extensions to run that code of theirs at datalad interfaces (CLI or datalad.api) import time, or upon an explicitly requested "load all extensions" (e.g. for wtf) if needed. I don't think we should "discover" extensions for a generic datalad import
  • extensions should not (as they need to now) refer to arbitrary nested and ad-hoc structures and interfaces in datalad core submodules to trigger some interface to "register" themselves
  • it would be nice DX if all "changes" an extension could make were conveniently available via "discoverable interfaces" (like it is possible to discover what methods any given class has)

ATM we use

  • entry points (these are good in that we can at least find the relevant code in datalad-core by git grep 'iter_entry_points(')
    • datalad.extensions: expected to provide an ad-hoc tuple structure collecting ad-hoc (tuple) UI interface definitions to be added. Notes: 1. a more logical entry point name could probably be datalad.interfaces; 2. we do not use this entry point in datalad core to specify its own interfaces
    • datalad.metadata.extractors: metadata extractors. Notes: we do populate that entry point even within datalad core's setup.py.
    • datalad.metadata.indexers: a new one for datalad-metalad AFAIK
    • datalad.tests: the one to run datalad tests, but I am not sure it is actually used
    • in ENH: Add ability to expand downloaders from entry points #5963 I tried to add a datalad.downloaders entry point to add more downloaders; didn't finish / forgot the immediate motivation
  • config directly:
  • an agreed-upon location (resources/procedures) taken from the top of the datalad extension's module.

What I was thinking of is a singular entry point for each extension, which would then do all desired registrations of the extension's extra features, something like

def myextension_entry_point(extensions):
    extensions.add_config(name, ui, *, destination='global', default=None, type=None)  # come up with your example ;)
    with extensions.add_interfaces_group("Containerized environments") as g:
        # optional kwargs can all be deduced from the name of the module,
        # or provided explicitly
        g.add_interface('datalad_container.containers_list'[, cls=None, pyinterface=None, cliinterface=None])
        g.add_interface('datalad_container.containers_remove')
        ...
    extensions.add_downloader(...)
    extensions.add_procedures([path='resources/procedures'])  # use the same traversal magic, but now explicitly requested

and "datalad core" would, in a single helper, iterate over all such registered datalad.extensions entry points and trigger the necessary changes so that they take effect within datalad core's internal structures/classes, thus eliminating multiple points of iteration over extensions.
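The core side of that could be sketched as a single loop over a hypothetical 'datalad.extensions' entry point group, handing each extension a registration facade (group, class, and method names here are all assumptions; the `entry_points(group=...)` signature requires Python 3.10+):

```python
from importlib.metadata import entry_points


class ExtensionRegistry:
    """Facade collecting everything an extension registers."""

    def __init__(self):
        self.configs = {}
        self.interfaces = []

    def add_config(self, name, ui=None, **kwargs):
        self.configs[name] = dict(ui=ui, **kwargs)

    def add_interface(self, modpath, **kwargs):
        self.interfaces.append((modpath, kwargs))


def load_extensions(registry):
    # the singular iteration over entry points
    for ep in entry_points(group='datalad.extensions'):
        try:
            # each extension ships one entry point function taking the facade
            ep.load()(registry)
        except Exception as e:
            # one failing extension must not break all the others
            print(f'skipping extension {ep.name}: {e}')
```

Note the per-extension try/except: as discussed in the cons below, the extension itself (or this loop) has to decide how partial registration failures are handled.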

pros:

  • clear interfaces, instead of "documented tuple-based structures"
  • completion on extensions (it would be just a submodule, or maybe some fancy class instance) -- so one can see what one can do for an extension
  • a singular location for an extension to do registration (not spread through setup.{cfg,py}, an assumed location for procedures, etc.)

cons I see in such an approach:

  • registering an extension would register all its capabilities at once, not a "per needed location" iteration over entry points. So maybe it would become slower? But maybe faster, since there is a singular iteration over entry points
  • an error (e.g. due to an incompatibility in updated interfaces of the extensions helpers) in one aspect (e.g. config) would disable importing that extension entirely, but only after it might already have been partially registered (e.g. interfaces added). To some degree this could be taken as a pro if e.g. we recommend registering config before interfaces -- if registration of config fails, most likely the interfaces shouldn't be used either
    • a pro could be that the extension might be able to decide what it wants to do about the error
  • an extension could add arbitrarily complicated logic to test for the presence of other extensions etc. in deciding what to register/do
  • we would need to organize the "internal" functioning and interfaces/structures in such a way that we do not somehow manage to delay extension registration... e.g. add_interface should not modify some internal dict of interfaces while that dict of interface definitions is being iterated over -- probably all interfaces from extensions should be kept in a separate data structure to be traversed in addition

@mih (Member)

mih commented Mar 23, 2022

The use cases are plenty. My mental model for thinking about this is clone. What would have prevented the RIA code from having to enter the core package?

@yarikoptic (Member Author)

Great use case -- clone of RIA. Unfortunately I do not have a mental map of the RIA functionality within datalad core. Is there a list of RIA-specific functionality/configuration to be moved out of "datalad core"? Then we could see what "interfaces" are needed for the datalad-ria extension and where they would need to be "applied".

mih added a commit to mih/datalad that referenced this issue Mar 30, 2022
This also replaces the previously very flexible data structure
for such items with a more rigid set of classes that implement
lazy evaluation of defaults (avoiding their possibly expensive
computation, even when they are not actually needed). They otherwise
implement all elements of the previously used API, such that they
work as a drop-in replacement.

Only register_config() is intended to be used publicly.

Fixed datalad#6314
mih added a commit to mih/datalad that referenced this issue Apr 1, 2022
(same commit message as above)

Fixed datalad#6314