Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unintentional reading/writing of repo configuration (approach 2) #4078

Merged
merged 8 commits into from Jan 24, 2020

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Jan 24, 2020

This series is adapted from gh-4072 and prompted by this discussion. It replaces the core fix of that series with an approach that doesn't rely on explicitly passing --global and --system to git config. The main commit of interest is the final one. Here's the range-diff between gh-4072 and this PR:

1:  19e7a83cf = 1:  19e7a83cf TST: Demo some flavor of gh-4071
2:  58f13ad7a = 2:  58f13ad7a BF: Make git-config dump parser aware of CWD
3:  1895b0558 = 3:  1895b0558 TST: Adjust test to show config leak
4:  2332d5e30 ! 4:  6d9c2ae59 ENH: ConfigManager(source='dataset-local')
    @@ Commit message
     
         Be able to ignore global and system configuration.
     
    +    Modified-by: Kyle Meyer <kyle@kyleam.com>
    +      Absorb the "dataset is None and 'dataset-local'" guard from
    +      gh-4072's 7fe9f9a55 (BF: Prevent accidential read from AND write to
    +      undesired datasets, 2020-01-23).
    +
      ## datalad/config.py ##
     @@ datalad/config.py: class ConfigManager(object):
            Legacy option, do not use.
    @@ datalad/config.py: class ConfigManager(object):
                  raise ValueError(
                      'Unkown ConfigManager(source=) setting: {}'.format(source))
                  # legacy compat
    +@@ datalad/config.py: def __init__(self, dataset=None, dataset_only=False, overrides=None, source='any
    +         if overrides is not None:
    +             self.overrides.update(overrides)
    +         if dataset is None:
    +-            if source == 'dataset':
    ++            if source in ('dataset', 'dataset-local'):
    +                 raise ValueError(
    +                     'ConfigManager configured to read dataset only, '
    +                     'but no dataset given')
     @@ datalad/config.py: def reload(self, force=False):
                  self._store.update(self.overrides)
                  return
5:  50f1ecf2a = 5:  c905abd2e TST: Verify ConfigManager doesn't read from a random bystander dataset
6:  7fe9f9a55 < -:  --------- BF: Prevent accidential read from AND write to undesired datasets
7:  9e35ee00b < -:  --------- TST: Adjust tests for new, more correct behavior
-:  --------- > 6:  1d081af0d BF: config: Prevent accidental read/write to PWD's repository

mih and others added 6 commits Jan 22, 2020
But this is not yet the big fish, can't seem to find it...
Be able to ignore global and system configuration.

Modified-by: Kyle Meyer <kyle@kyleam.com>
  Absorb the "dataset is None and 'dataset-local'" guard from
  dataladgh-4072's 7fe9f9a (BF: Prevent accidential read from AND write to
  undesired datasets, 2020-01-23).
When ConfigManager() is given a dataset, the 'git config --list' is
executed from its top-level directory.  Otherwise, execution is from
the current working directory.  Problematically, if the PWD belongs to
a repository, its configuration is considered for both reading and
writing.  In practice this doesn't happen much in the code base.
There are two main paths to trigger it:

  1) datalad.cfg, which isn't supposed to be attached to any dataset,
     will contain the configuration for the PWD's

  2) instantiating Dataset() with a path that does not yet exist while
     the PWD belongs to another repo.  In this case, Dataset.config
     calls ConfigManager() with dataset=None.  But once the dataset is
     created, the configuration comes from the correct repository

The unmerged 7fe9f9a (BF: Prevent accidential read from AND write to
undesired datasets, dataladgh-4072) fixes the above issues by passing either
--system or --global to 'git config' when a dataset isn't specified or
its path doesn't exist.  --system versus --global is chosen based on
whether ~/.gitconfig exists.  This approach introduces other problems.
Currently the configuration from both --system and --global sources is
considered, but that change would limit it to just one of those
sources.  Also, the approach assumes that the user's "global"
configuration is in ~/.gitconfig, but it could also be in
$XDG_CONFIG_HOME/git/config.

As an alternative fix that avoids passing --system or --global
explicitly, let's prevent 'git config' from reading or writing
repository-based configuration values whenever a dataset is not given.
Do this by setting $GIT_DIR to an empty string.

One argument against this is that, as far as I can see, the empty
string behavior isn't explicitly documented.  However, based on trying
with Git v2.9.0 (2016) [*] and the latest Git (v2.25.0), it appears
this behavior is stable.  And if the behavior does change, the tests
we have in place should detect it.

Fixes datalad#4071.
Supersedes and closes datalad#4072.

[*] This was the earliest Git I had luck compiling locally.
@mih
Copy link
Member

@mih mih commented Jan 24, 2020

Thanks @kyleam !

This is much more elegant, and robust. I will close my PR in favor of this one. I will have a look at the test differences before, though.

@mih mih mentioned this pull request Jan 24, 2020
5 tasks
@mih
Copy link
Member

@mih mih commented Jan 24, 2020

I verified that #4073 works on top of this PR (this is the test that originally made my discover the underlying issue).

@mih
Copy link
Member

@mih mih commented Jan 24, 2020

I suspect that the culprit for the test failure is

    with chpwd(path):
        # ATM all the github goodness does not care about "this dataset"
        # so force "process wide" cfg to pick up our defined above oauthtoken
        cfg.reload(force=True)

which read from PWD no matter what. Use proper patch, instead of
a hack.
@codecov
Copy link

@codecov codecov bot commented Jan 24, 2020

Codecov Report

Merging #4078 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4078      +/-   ##
==========================================
+ Coverage   89.65%   89.73%   +0.08%     
==========================================
  Files         274      274              
  Lines       36667    36711      +44     
==========================================
+ Hits        32872    32941      +69     
+ Misses       3795     3770      -25
Impacted Files Coverage Δ
datalad/config.py 97.14% <100%> (+0.06%) ⬆️
datalad/support/gitrepo.py 89.73% <100%> (ø) ⬆️
datalad/tests/test_config.py 99.16% <100%> (+0.18%) ⬆️
datalad/distribution/tests/test_create_github.py 95.58% <100%> (-0.36%) ⬇️
datalad/downloaders/base.py 75.55% <0%> (+0.37%) ⬆️
datalad/support/github_.py 78.94% <0%> (+1.16%) ⬆️
datalad/downloaders/http.py 75.29% <0%> (+3.18%) ⬆️
datalad/downloaders/tests/test_http.py 61.8% <0%> (+3.4%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1a2f9b...cad795b. Read the comment docs.

It looks as if @yarikoptic can bring it back with his 2FA setup.
@mih
Copy link
Member

@mih mih commented Jan 24, 2020

I will merge this, as soon as the tests have passed.

@mih mih merged commit 0b709d2 into datalad:master Jan 24, 2020
17 checks passed
@kyleam kyleam deleted the bf-4071-alt branch Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants