Non-installed Dataset instance is using a separate, but non-global config manager #7299

Closed
mih opened this issue Feb 23, 2023 · 4 comments · Fixed by #7301
Labels
DX developer experience

Comments

@mih
Member

mih commented Feb 23, 2023

Chasing datalad/datalad-next#249 led me to realize that when calling Dataset().config for any Dataset that does not actually exist locally yet, the returned instance is NOT the same as datalad.cfg.

It is a dedicated instance, bound to the Dataset instance and parameterized like the global one, but nevertheless separate.
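
A minimal sketch of the observation (the path is hypothetical, assumed to contain no dataset):

```python
# The dataset-bound manager of a not-yet-existing dataset is a
# distinct object, despite being parameterized like the global one.
from datalad import cfg as global_cfg
from datalad.distribution.dataset import Dataset

ds = Dataset('/tmp/does-not-exist')  # hypothetical path, no repo here
assert not ds.is_installed()
# a separate instance, NOT datalad.cfg
assert ds.config is not global_cfg
```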

I think this does not make sense, and the docs provide no counter-arguments.

They say:

> Get an instance of the parser for the persistent dataset configuration.

There is a lot that is not persistent about this. For example, I can add overrides to this "persistent" configuration, but these overrides evaporate as soon as a repo materializes for this Dataset.

Moreover, calling set() on this separate ConfigManager instance will not trigger the update routines in the global manager. Even worse, the ConfigManager instance is bound to a Dataset instance and will persist as long as that parent instance persists. Given that DataLad implements a Flyweight pattern for datasets, this means that such configuration containers will survive for a surprisingly long time. Commands would "rediscover" a dataset instance for a particular path, and if there continues to be no repo, manipulations done via the global config manager will NOT be reflected in this separate instance -- even if the config items come from system or global scope!
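
To illustrate the Flyweight aspect, a sketch assuming the behavior reported here (the path is hypothetical, and `datalad.example.item` is a made-up config variable):

```python
from datalad import cfg as global_cfg
from datalad.distribution.dataset import Dataset

# Flyweight: the same path yields the very same Dataset instance...
assert Dataset('/tmp/does-not-exist') is Dataset('/tmp/does-not-exist')
# ...and with it the same long-lived, dataset-bound ConfigManager.
ds_cfg = Dataset('/tmp/does-not-exist').config

# A manipulation via the global manager (made-up variable name)...
global_cfg.set('datalad.example.item', 'some', scope='global')
# ...is not reflected in the separate instance until it reloads.
assert ds_cfg.get('datalad.example.item') is None
```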

I think this is rather undesirable complexity. I cannot come up with a reason why Dataset().config for an uninstalled dataset should not simply return datalad.cfg.

@mih mih added the DX developer experience label Feb 23, 2023
@mih
Member Author

mih commented Feb 23, 2023

Here is another side to the same problem. I now think I understand that the situation that triggered this issue led to two different config managers being used by two different commands. The problem could be paved over by making these two particular commands behave more uniformly, so that they end up using the exact same instance.

However, as a general setup, having to rely on such uniformity seems an undesirable minimum requirement.

What I sketched above still makes sense to me to do. However, it does not address a related problem:

Presume a GLOBAL config item is modified via a DATASET-bound ConfigManager. This change would be in effect immediately via all Dataset and Repo instances. However, it would not trigger an update of the central instance datalad.cfg.

However, there is only one global instance. When any ConfigManager instance performs a reload(), it can know whether it is the global instance. If it is NOT, it can also trigger a reload() in the global instance.

Such an operation could be triggered bluntly in reload() itself, or in a more targeted fashion (based on the manipulated scope) in any of the setters.
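
A sketch of that idea (not the actual patch; the underscore-prefixed helpers are hypothetical stand-ins for the real machinery):

```python
import datalad


class SyncingConfigManager:
    """Illustrative stand-in for datalad's ConfigManager."""

    def reload(self, force=False):
        self._reload_from_files(force=force)  # hypothetical helper
        # There is only one global instance. If this manager is not
        # it, also refresh the central datalad.cfg.
        if self is not datalad.cfg:
            datalad.cfg.reload(force=force)

    def set(self, var, value, scope='branch'):
        self._write_to_scope(var, value, scope)  # hypothetical helper
        # Targeted variant: propagate only when the shared 'global'
        # scope was manipulated.
        if scope == 'global' and self is not datalad.cfg:
            datalad.cfg.reload()

    def _reload_from_files(self, force=False):
        pass  # placeholder for the actual config file parsing

    def _write_to_scope(self, var, value, scope):
        pass  # placeholder for the actual write
```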

mih added a commit to mih/datalad-next that referenced this issue Feb 23, 2023
When called with `dataset=None`, `credentials()` would auto-locate
a dataset containing CWD (and use its configuration), whereas
`download()` would use the global configuration in this case.

This change makes the behavior uniform and chooses what `download()`
did as the model behavior.

This uniformization is meaningful on its own.

However, due to datalad/datalad#7299 it is
also a necessity, because otherwise credentials deposited by
`download()` remain invisible to `credentials()` -- which is the
superficial trigger for datalad#249

Closes datalad#249
@yarikoptic
Member

Forcing a reload of the global instance makes sense to me as a workaround. A workaround, because in principle the global instance, when queried, should detect modification of the global config file and reload automagically. But our magic overall is a bit flawed here AFAIK, and #5824 raised a previous general concern about that.

mih added a commit to mih/datalad that referenced this issue Feb 23, 2023
Also ping datalad#7300 re diversity of exception behavior for one and the same
issue.
mih added a commit to mih/datalad that referenced this issue Feb 23, 2023
This has seemingly no purpose or use case. It only complicates the
synchronization across needlessly many manager instances.

This addresses half of datalad#7299
mih added a commit to mih/datalad that referenced this issue Feb 23, 2023
...whenever the 'global' configuration scope was modified.
Theoretically, this should also cover a 'system' scope, but presently
`ConfigManager` does not recognize or support that.

Closes datalad#7299
@mih
Member Author

mih commented Feb 23, 2023

Forcing a reload of the global instance makes sense to me as a workaround. A workaround, because in principle the global instance, when queried, should detect modification of the global config file and reload automagically. But our magic overall is a bit flawed here AFAIK, and #5824 raised a previous general concern about that.

My concern with update-on-read is that reads are FAR more frequent than writes, and updates are also expensive. Moreover, where there is an update-on-read expectation, it would be sensible to have that expectation cover any source of configuration. Hence a modification check of the global config file alone is insufficient.

The original reason for not doing that to start with was that it would require per-filesystem implementations to get right, and it might still be limited to the timestamp resolution supported by each particular file system.
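
For illustration, a sketch of an mtime-based update-on-read check and its limitations (class name and path are hypothetical):

```python
import os
import os.path as op


class MtimeCheckedConfig:
    """Sketch: update-on-read limited to the global config file."""

    def __init__(self, path=op.expanduser('~/.gitconfig')):
        self._path = path
        self._mtime = self._stat_mtime()
        # ... parse the file here ...

    def _stat_mtime(self):
        try:
            return os.stat(self._path).st_mtime
        except FileNotFoundError:
            return None

    def get(self, var, default=None):
        # Every read costs a stat() call, although reads are FAR more
        # frequent than writes. This check also covers only ONE of
        # several configuration sources, and two writes within one
        # filesystem timestamp tick are indistinguishable.
        if self._stat_mtime() != self._mtime:
            self._mtime = self._stat_mtime()
            # ... re-parse the file here ...
        return default  # placeholder for the actual lookup
```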

yarikoptic pushed a commit to mih/datalad that referenced this issue Mar 24, 2023
Also ping datalad#7300 re diversity of exception behavior for one and the same
issue.
yarikoptic pushed a commit to mih/datalad that referenced this issue Mar 24, 2023
This has seemingly no purpose or use case. It only complicates the
synchronization across needlessly many manager instances.

This addresses half of datalad#7299
yarikoptic pushed a commit to mih/datalad that referenced this issue Mar 24, 2023
...whenever the 'global' configuration scope was modified.
Theoretically, this should also cover a 'system' scope, but presently
`ConfigManager` does not recognize or support that.

Closes datalad#7299
@yarikoptic-gitmate
Collaborator

Issue fixed in 0.18.3

yarikoptic added a commit to yarikoptic/datalad that referenced this issue Mar 28, 2023
* origin/maint: (47 commits)
  Set minimal version of typing_extensions to be 3.10.0.0
  Fix Providers caching of config files
  [skip ci] Update docs/source/changelog.rst
  [skip ci] Update CHANGELOG
  Fix spelling
  pythonic object comparison
  Fix git identity test
  [release-action] Autogenerate changelog snippet for PR 7301
  Reload global ConfigManager via any instance
  Stop returning a custom ConfigManager with Dataset[uninstalled].config
  Document datalad#7299
  [release-action] Autogenerate changelog snippet for PR 7353
  [release-action] Autogenerate changelog snippet for PR 7355
  BF: save - force git based on particular repo having uuid, not the reference ds
  [release-action] Autogenerate changelog snippet for PR 7342
  TST(BK): extend test to demonstrate datalad#7351
  Ensure that GitRepo initiated repo remains without git-annex upon Dataset.create
  Fix test for incompatible annex and rsync versions
  Do use/pass repocls into get_convoluted_situation
  Refine type of `content` argument to `SignalingThread.signal()`
  ...

 Conflicts:
	setup.py -- tqdm dependency was versioned in master due to deprecations