Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: Helper to support Git-style URL rewriting #4064

Merged
merged 6 commits into from Jan 22, 2020

Conversation

mih
Copy link
Member

@mih mih commented Jan 21, 2020

With this helper (and its use in the relevant places), we can make
DataLad acknowledge URL rewrite instructions by supporting Git's own
mechanism: url.<base>.insteadof

With this move we should be able to avoid custom configuration for
the purpose of mapping locations (like needed for the RIA support
code). The included test contains the desired mappings.

As suggested by @kyleam this should set us up better for eventually
supporting custom protocols like ria+ssh:// in a proper Git remote
helper.

With this helper (and its use in the relevant places), we can make
DataLad acknowledge URL rewrite instructions by supporting Git's own
mechanism: `url.<base>.insteadof`

With this move we should be able to avoid custom configuration for
the purpose of mapping locations (like needed for the RIA support
code). The included test contains the desired mappings.

As suggested by @kyleam this should set us up better for eventually
supporting custom protocols like `ria+ssh://` in a proper Git remote
helper.
@mih mih added the merge-if-ok label Jan 21, 2020
@codecov
Copy link

@codecov codecov bot commented Jan 21, 2020

Codecov Report

Merging #4064 into master will decrease coverage by 0.01%.
The diff coverage is 70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4064      +/-   ##
==========================================
- Coverage   89.66%   89.65%   -0.02%     
==========================================
  Files         274      274              
  Lines       36647    36667      +20     
==========================================
+ Hits        32861    32872      +11     
- Misses       3786     3795       +9
Impacted Files Coverage Δ
datalad/core/distributed/tests/test_clone.py 91.43% <0%> (+0.21%) ⬆️
datalad/core/distributed/clone.py 93.04% <0%> (+0.03%) ⬆️
datalad/config.py 97.07% <100%> (+0.09%) ⬆️
datalad/tests/test_config.py 98.97% <100%> (+0.05%) ⬆️
datalad/support/tests/test_fileinfo.py 100% <0%> (ø) ⬆️
datalad/local/tests/test_subdataset.py 100% <0%> (ø) ⬆️
datalad/support/tests/test_repodates.py 100% <0%> (ø) ⬆️
datalad/core/local/tests/test_save.py 97.86% <0%> (ø) ⬆️
datalad/interface/tests/test_ls_webui.py 100% <0%> (ø) ⬆️
datalad/tests/test_protocols.py 100% <0%> (ø) ⬆️
... and 170 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 790ab49...c844b40. Read the comment docs.

and add a test to show it in action.
while url != prev_url:
prev_url = url
# all config that applies
matches = {k: len(v) for k, v in insteadof.items()
Copy link
Member

@bpoldrack bpoldrack Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see one problem here: You match all configs (well, the longest). However, what if I have a --global config that is long and a --local overwrite which is shorter? I think this should use ConfigManager's get, check for a returned tuple and consider order therein.

Copy link
Member Author

@mih mih Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not how gitconfig works:

% git config --show-origin -l | grep 'github.*insteadof'
file:/home/mih/.gitconfig       url.git@github.com:.insteadof=github:
% git config --add 'url.git@github.com:.insteadof' 'mike:'
% git config --show-origin -l | grep 'github.*insteadof'                                                      
file:/home/mih/.gitconfig       url.git@github.com:.insteadof=github:
file:.git/config        url.git@github.com:.insteadof=mike:
% git config --get 'url.git@github.com:.insteadof'
mike:

Local overwrites global, it does not append.

Copy link
Member

@bpoldrack bpoldrack Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, my suggestion is wrong. The usual tuple considers the keys, not the values. So, get doesn't work. Still, I think same replacements should overwrite each other. So, assuming items yielding in correct order (least significant first), that insteadof dict, needs to be based on the values, I guess.

Copy link
Member Author

@mih mih Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't get it yet. What is it that this function does or doesn't that git itself does when rewriting URLs. The point here is to be as clever (no more, no less) than Git.
Here is the definition: https://git-scm.com/docs/git-config#Documentation/git-config.txt-urlltbasegtinsteadOf

Copy link
Member

@bpoldrack bpoldrack Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is to be as clever (no more, no less) than Git.

I absolutely agree.
I was talking about using the same "label" with different replacements (basically ria usecase). So, by definition the matches don't differ in length. And I thought that git itself would overwrite them, if specified at different levels. However, that doesn't seem to be the case.

/tmp/some$ git config --add --global url."example.com:".insteadOf ex:
/tmp/some$ git config --add --local url."other.com:".insteadOf ex:
/tmp/some$ git remote add test ex:/some/where
/tmp/some$ git remote -v
test    example.com:/some/where (fetch)
test    example.com:/some/where (push)

That's quite unexpected to me.

Edit: Also note, that on same length of the real URL piece the same thing happens: .git/config isn't applied, but ~/.gitconfig is.

Copy link
Member

@bpoldrack bpoldrack Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also well be, that the two of user interpret differently what length is considered by git.

When more than one insteadOf strings match a given URL, the longest match is used.

In my example I'd consider ex: the thing that needs to be matched and whose length is relevant. If it's actually the length of the replacement it's a different story. But then we need to have better wording than git itself.

Copy link
Member

@bpoldrack bpoldrack Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last for now: Tried to somehow apply local config for that ex: label above. So, length of replacement doesn't matter, length of label does (as it should judging by my understanding of the docs). However, at equal length apparently --global takes precedence over --local, which I find weird.
Not sure, whether we really account for that here.

Copy link
Member Author

@mih mih Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of the match is relevant, not the length of the replacement (see len(v) where v is the value of the config).

Copy link
Contributor

@kyleam kyleam Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like te discussion ended up being about unexpected behavior that is a consequence of git's decision to use the replacement as the key (discussed here). But I think Ben's point about the value potentially being a tuple still needs to be considered to be safe. Consider for example the unlikely event that the user has repeat values for the same key:

$ cd $(mktemp -d --tmpdir ga-XXXXXXX)
$ git init
$ git config url.'bar:'.insteadof foo:
$ git config --add url.'bar:'.insteadof foo:
$ git remote add test bar:abc
$ git remote -v
test	bar:abc (fetch)
test	bar:abc (push)

Here's what rewrite_url would currently return.

$  python -c 'from datalad import config; cfg = config.ConfigManager(); print(config.rewrite_url(cfg, "foo:abc"))'
bar:o:abc

That's because the value is a tuple and the length is calculated as 2:

$ python -c 'from datalad import cfg; print(cfg.get("url.bar:.insteadof"))'
('foo:', 'foo:')

Copy link
Member Author

@mih mih Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@mih mih added conference agenda item and removed merge-if-ok labels Jan 21, 2020
@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Jan 21, 2020

Just FTR for myself (and others):

  • git does rewrites as part of the preparing configuration record for an existing remote
    • git clone first initiates repo, then adds remote, and then operates on it
  • so it is unlikely to be able to query git solely to get a url rewritten (without creating a throw away git repo, adding remote, and querying its config)

@mih
Copy link
Member Author

@mih mih commented Jan 21, 2020

Verdict in call was to warn whenever we discover two url..insteadof settings with the same value.

if k.startswith('url.') and k.endswith('.insteadof')
}
prev_url = None
while url != prev_url:
Copy link
Contributor

@kyleam kyleam Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, git itself doesn't recursively resolve the insteadof values. Is there a concrete use case for this?

Copy link
Member Author

@mih mih Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it to be an awesome thing to do, but now I cannot think of an actual need for it. Will remove (but keep the memory of having done it close to my heart).

Copy link
Member Author

@mih mih Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mih mih removed the conference agenda item label Jan 22, 2020
mih added 4 commits Jan 22, 2020
At least until a use case can be identified.
Rational: In case of conflicts we cannot possibly determine what
is the correct thing to do, hence don't do anything, but warn.
@mih
Copy link
Member Author

@mih mih commented Jan 22, 2020

This should now address all corner-cases, except for Git's own weaknesses. I think this is ready to merge, and we should start using it.

@mih mih added the merge-if-ok label Jan 22, 2020
kyleam
kyleam approved these changes Jan 22, 2020
Copy link
Contributor

@kyleam kyleam left a comment

Thanks for the updates.

@bpoldrack bpoldrack merged commit a1a2f9b into datalad:master Jan 22, 2020
15 of 17 checks passed
@mih mih deleted the nf-url-insteadof branch Jan 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants