Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: prevent users to create a repo in a subdir with files known to git above #3211

Merged
merged 6 commits into from Mar 11, 2019

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Mar 8, 2019

This pull request should prevent such problematic situations as it was revealed in #3068 (comment) from occurring.

This situation was probably also the origin for the situation which inspired #3139, although in that report I have come up with a minimal use case (empty git subrepo) which might have actually be a different problem.

Merge into master would conflict, and it might (or not) be wiser to just use get_content_info which is provided in there instead of a custom call to ls-files

yarikoptic added 2 commits Mar 8, 2019
…it above

Original use case which prompted the prevention is
  datalad#3068
but might also be the origin for the situation which inspired
  datalad#3139
although in that report I have come up with a minimal use case (empty git subrepo)
which might have actually be a different problem
@mih
Copy link
Member

@mih mih commented Mar 8, 2019

This smells like I have addressed this in rev-create. But I cannot check ATM. I am not convinced that this should be checked at this low-level, for efficiency reasons.

Update: it is not addressed in rev-create, but the relevant status() call is already made, and I only need to adjust the check. PR is coming.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 8, 2019

Are you creating thousands of new repositories daily? We are calling git config tens of times for any trivial user level operation to accomplish, I think it is ok to call once a targeted ls-files upon create of a new repo.
Solving it at the lowest level is IMHO preferable to guarantee that we don't fall into this trap in any scenario

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 8, 2019

The only advantage to check at the higher level is possibly an ability to yield the "impossible" result record, but even then I would keep this check in place

@mih
Copy link
Member

@mih mih commented Mar 8, 2019

I disagree.

Copy link
Contributor

@kyleam kyleam left a comment

Aside from confusion about a comment, this change looks fine to me. I don't suspect it would be too big of a performance hit, but perhaps I'm missing something.

datalad/support/gitrepo.py Show resolved Hide resolved
@@ -1379,11 +1399,14 @@ def test_custom_runner_protocol(path):
# Check that a runner with a non-default protocol gets wired up correctly.
prot = ExecutionTimeProtocol()
gr = GitRepo(path, runner=Runner(cwd=path, protocol=prot), create=True)
Copy link
Contributor

@kyleam kyleam Mar 8, 2019

Just a note: We should be able to drop these tweaks when we merge with master, which has 9d74111 (TST: gitrepo: Prepare test for revolution's _create_empty_repo(), 2019-02-28).

# use case: https://github.com/datalad/datalad/issues/3068
try:
stdout, _ = self._git_custom_command(
None, ['git', 'ls-files'], cwd=path, expect_fail=True
Copy link
Contributor

@kyleam kyleam Mar 8, 2019

Doubt it matters, but using git ls-tree HEAD (no -r) would avoid listing all the contents of a deep tree with many files under directories.

Copy link
Member Author

@yarikoptic yarikoptic Mar 8, 2019

hm...
ls-tree (without HEAD) even fails to run in a top directory of a valid (even without any file in it, just freshly initialized like in #3139) git repo:

(git-annex)hopa:~/datalad/openfmri/ds000001[master]
$> git ls-tree ; echo $?
usage: git ls-tree [<options>] <tree-ish> [<path>...]

    -d                    only show trees
    -r                    recurse into subtrees
    -t                    show trees when recursing
    -z                    terminate entries with NUL byte
    -l, --long            include object size
    --name-only           list only filenames
    --name-status         list only filenames
    --full-name           use full path names
    --full-tree           list entire tree; not just current directory (implies --full-name)
    --abbrev[=<n>]        use <n> digits to display SHA-1s

129

and that is when we also do not want to re-init probably.
in an empty one, ls-tree HEAD exits with 128 exit code.

and now I realized you were talking about not only adding HEAD but changing to ls-tree from ls-files.

FTR, ls-files with and without HEAD, returns nothing and doesn't error out in an empty repo.

So, indeed ls-tree HEAD in ds000001 takes 0.003 sec, ls-files takes 0.007. Let me switch ;) (FTR: git -c pager.config=0 config -l -z --show-origin takes also 0.003 sec)

Copy link
Member Author

@yarikoptic yarikoptic Mar 8, 2019

hm -- with ls-tree HEAD we are not catching then the case where files are already in the index but not yet committed... so we would not be catching that. Given that running ls-files in an empty subdirectory takes on that ds000001 only 0.003-0.005 sec, I would then prefer to keep it as is.

I spotted that there is also --deleted option for ls-files but add/commit/rm , create another new one, resulted in an interesting output of git status:

$> time git ls-files --deleted
git ls-files --deleted  0.00s user 0.00s system 88% cpu 0.005 total

so --deleted didn't give me the file which is still known to git but deleted/not yet committed,

Do you know a way for those to show up?

Copy link
Contributor

@kyleam kyleam Mar 8, 2019

so --deleted didn't give me the file which is still known to git but deleted/not yet committed,

Do you know a way for those to show up?

AFAIU git ls-files won't report on it because it is only concerned about files in the working tree or index.

Copy link
Member Author

@yarikoptic yarikoptic Mar 8, 2019

Right, but isn't --deleted exactly for this use case? I am now just curious what it's purpose

Copy link
Contributor

@kyleam kyleam Mar 8, 2019

Right, but isn't --deleted exactly for this use case?

I think it's to distinguish files that are in the index but not the working tree.

mih added a commit to datalad/datalad-revolution that referenced this issue Mar 8, 2019
This fixes the same issue as datalad/datalad#3211

But without another (duplicate) call to git.
@mih
Copy link
Member

@mih mih commented Mar 8, 2019

Rational for my disagreement:

  • we literally create thousands of repos every day in the tests
  • via rev-create this added call is guaranteed to be duplicate (here is the PR that implements the same check without a call datalad/datalad-revolution#93)
  • the pattern that lowest-level code "ensures" conditions is a common source of slowdown and complications
  • the only "user" of this level of the API is the crawler (and I believe it would not use this level, if it was written today)
  • a core concept of everything in -revolution is to not solve any problems at the lowest level, because that lowest level is pretty much context-free, and cannot, almost by definition, be smart

If this move or anything like it is really really needed, please give it a switch that I can easily override to turn "I know better than you" mode on. Ideally without having to reintroduce another Repo derivative class.

Thanks!

@kyleam
Copy link
Contributor

@kyleam kyleam commented Mar 8, 2019

@mih I find all of those convincing. On the call today, one of my points against this change was having to deal with the conflict in core/revolution philosophy. I'd be OK dropping this PR and accepting this as one of the many things that will eventually be fixed by the revolution. Another option would be to do this in maint, and remove it in the merge to master, but that doesn't seem worth it to me.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 8, 2019

I think we will waste more time discussing it than would be the penalty of that command on all test runs we will be "waiting" for on our CI in a month, and since

$> git grep create=True | grep -v tests/
datalad/distribution/clone.py:                GitRepo.clone(path=dest_path, url=source_, create=True)
datalad/distribution/create.py:                create=True,
datalad/distribution/create.py:                create=True,
datalad/distribution/create_test_dataset.py:    repo = RepoClass(path, create=True)
datalad/distribution/utils.py:    # set a description without `create=True`
datalad/support/annexrepo.py:                 direct=None, backend=None, always_commit=True, create=True,
datalad/support/annexrepo.py:          fresh git clone). Note that if `create=True`, then initialization
datalad/support/gitrepo.py:    def __init__(self, path, url=None, runner=None, create=True,

reminded me that there is also datalad clone, I would say that

Another option would be to do this in maint, and remove it in the merge to master, but that doesn't seem worth it to me.

should be the least common denominator ATM so we address the problem for those who might still use upcoming 0.11.4. BUT if the check is removed, and some user manages to step on that "rake" again one way or another, I will blame blame both of you! But before that you could look at the interesting case our check brought up by the failure it detected in our tests:

# TODO: There's something wrong with the nested testrepo!
# Fear mongering detected!
@with_testrepos('submodule_annex')
@with_tempfile(mkdir=True)
def test_is_installed(src, path):
    ds = Dataset(path)
    assert_false(ds.is_installed())

    # get a clone:
    AnnexRepo.clone(src, path)
    ok_(ds.is_installed())
    # submodule still not installed:
    subds = Dataset(opj(path, 'subm 1'))
    assert_false(subds.is_installed())
    subds.create()
    # get the submodule
    # This would init so there is a .git file with symlink info, which is
    # as we agreed is more pain than gain, so let's use our install which would
    # do it right, after all we are checking 'is_installed' ;)
    # from datalad.cmd import Runner
    # Runner().run(['git', 'submodule', 'update', '--init', 'subm 1'], cwd=path)
    with chpwd(path):
        get('subm 1')
    ok_(subds.is_installed())
    # wipe it out
    rmtree(ds.path)
    assert_false(ds.is_installed())

ATM fails in this PR upon subds.create() because, if I got it right, we are trying to create a new dataset (repository) in the path which is known to be associated with currently not "installed" submodule. Sure thing here we are using "high level API" and probably rev-create would catch it (?) but... but I am "ok" to keep mysteries ongoing. (I will add another comment re this test after this one)

@mih :

* a core concept of everything in -revolution is to not solve any problems at the lowest level, because that lowest level is pretty much context-free, and cannot, almost by definition, be smart

@kyleam :

one of my points against this change was having to deal with the conflict in core/revolution philosophy

IMHO such behavior should have been precluded by git itself, which for any path given to any command would probably check its "standing", but does not care to check for git init. And git is as core as it gets. So, once again IMHO, ideally this should have been fixed at git level. Since it would take time to wait for that to happen (if ever), that is why I would maintain that ideal "fix" here at the core.

@mih :

If this move or anything like it is really really needed, please give it a switch that I can easily override to turn "I know better than you" mode on.

do you see any use case for this switch to be used? (e.g. would you use it in that failing test cited above?) If the answer is "no" -- there is no need for a switch as far as I see it.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 8, 2019

Aforementioned failed test brought up the shortcoming for the proposed here "avoidance" fix. If I (user) created a subdirectory under the path associated with the submodule, we would not catch that situation with a plain ls-path (subsub/ I created myself within the test repo of the failed test):

(git-annex)hopa:~/.tmp/datalad_temp_test_is_installedim21_0b8[master]subm 1/subsub
$> git ls-files
$>

That is where I feel that higher-level analysis of paths probably done in the "fix" within rev-create would be the only way to deal with it.

…submodule

Removed the comment above the test since context was not clear,
and the only thing which was coming to mind is actually what
currently we are preventing -- that we allowed to create a
new subdataset where there is already a dataset expected
@codecov
Copy link

@codecov codecov bot commented Mar 8, 2019

Codecov Report

Merging #3211 into 0.11.x will decrease coverage by 0.38%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           0.11.x    #3211      +/-   ##
==========================================
- Coverage   90.85%   90.46%   -0.39%     
==========================================
  Files         249      249              
  Lines       32823    32840      +17     
==========================================
- Hits        29820    29708     -112     
- Misses       3003     3132     +129
Impacted Files Coverage Δ
datalad/distribution/tests/test_dataset.py 100% <100%> (ø) ⬆️
datalad/support/gitrepo.py 88.7% <100%> (-0.35%) ⬇️
datalad/support/tests/test_gitrepo.py 100% <100%> (ø) ⬆️
datalad/interface/unlock.py 87.65% <0%> (-7.41%) ⬇️
datalad/interface/tests/test_unlock.py 93.82% <0%> (-6.18%) ⬇️
datalad/support/annexrepo.py 82.6% <0%> (-6%) ⬇️
datalad/downloaders/http.py 82.53% <0%> (-2.78%) ⬇️
datalad/plugin/tests/test_export_archive.py 98.52% <0%> (-1.48%) ⬇️
datalad/tests/utils.py 87.92% <0%> (-1.36%) ⬇️
datalad/downloaders/tests/test_http.py 91.08% <0%> (-1.24%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4842edc...244542b. Read the comment docs.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 9, 2019

hm , interesting -- failures in direct mode (only)
======================================================================
ERROR: datalad.distribution.tests.test_create.test_saving_prior
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 442, in newfunc
    return t(*(arg + (d,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/tests/test_create.py", line 299, in test_saving_prior
    ds2 = create('ds2', dataset=ds1, force=True)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 491, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 479, in return_func
    results = list(results)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 428, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 521, in _process_results
    for res in results:
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/create.py", line 339, in __call__
    fake_dates=fake_dates
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/repo.py", line 146, in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/annexrepo.py", line 196, in __init__
    fake_dates=fake_dates)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 690, in __init__
    self._repo = self._create_empty_repo(path, **git_opts)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 734, in _create_empty_repo
    % (path, stdout)
RuntimeError: Failing to initialize new repository under /tmp/datalad_temp_tree_test_saving_priorj1k9zvqb/ds2 where following files are known to a repository above: .datalad/.gitattributes
.datalad/config
.gitattributes
======================================================================
ERROR: datalad.distribution.tests.test_uninstall.test_no_interaction_with_untracked_content
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 615, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/tests/test_uninstall.py", line 440, in test_no_interaction_with_untracked_content
    subds = ds.create('sub', force=True)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 603, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/dataset.py", line 524, in apply_func
    return f(**kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 491, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 479, in return_func
    results = list(results)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 428, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 521, in _process_results
    for res in results:
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/create.py", line 339, in __call__
    fake_dates=fake_dates
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/repo.py", line 146, in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/annexrepo.py", line 196, in __init__
    fake_dates=fake_dates)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 690, in __init__
    self._repo = self._create_empty_repo(path, **git_opts)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 734, in _create_empty_repo
    % (path, stdout)
RuntimeError: Failing to initialize new repository under /tmp/datalad_temp_test_no_interaction_with_untracked_contentgl9i5awl/origin/sub where following files are known to a repository above: .datalad/.gitattributes
.datalad/config
.gitattributes
======================================================================
ERROR: datalad.interface.tests.test_utils.test_discover_ds_trace
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 442, in newfunc
    return t(*(arg + (d,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 615, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/tests/test_utils.py", line 314, in test_discover_ds_trace
    {k: v for k, v in demo_hierarchy.items() if k in ['a', 'd']})
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/tests/test_utils.py", line 133, in make_demo_hierarchy_datasets
    nodeds = Dataset(node_path).create(force=True)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 603, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/dataset.py", line 524, in apply_func
    return f(**kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 491, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 479, in return_func
    results = list(results)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 428, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 521, in _process_results
    for res in results:
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/create.py", line 339, in __call__
    fake_dates=fake_dates
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/repo.py", line 146, in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/annexrepo.py", line 196, in __init__
    fake_dates=fake_dates)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 690, in __init__
    self._repo = self._create_empty_repo(path, **git_opts)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 734, in _create_empty_repo
    % (path, stdout)
RuntimeError: Failing to initialize new repository under /tmp/datalad_temp_tree_test_discover_ds_trace4wtus_kw/a where following files are known to a repository above: .datalad/.gitattributes
.datalad/config
.gitattributes
======================================================================
ERROR: datalad.metadata.tests.test_aggregation.test_aggregate_query
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 442, in newfunc
    return t(*(arg + (d,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/metadata/tests/test_aggregation.py", line 136, in test_aggregate_query
    ds.create('sub', force=True)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 603, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/dataset.py", line 524, in apply_func
    return f(**kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 491, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/wrapt/wrappers.py", line 564, in __call__
    args, kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 479, in return_func
    results = list(results)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 428, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 521, in _process_results
    for res in results:
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/create.py", line 339, in __call__
    fake_dates=fake_dates
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/repo.py", line 146, in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/annexrepo.py", line 196, in __init__
    fake_dates=fake_dates)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 690, in __init__
    self._repo = self._create_empty_repo(path, **git_opts)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 734, in _create_empty_repo
    % (path, stdout)
RuntimeError: Failing to initialize new repository under /tmp/datalad_temp_tree_test_aggregate_querya6z2ewrb/sub where following files are known to a repository above: .datalad/.gitattributes
.datalad/config
.datalad/metadata/aggregate_v1.json
.datalad/metadata/objects/someshasum
.gitattributes

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 9, 2019

ok, in direct mode it doesn't care being in a subdirectory, and just lists from the top of the repo

(git-annex)hopa:…mp/datalad_temp_tree_test_saving_prior0ihi5584/ds2[annex/direct/master]git-annex
$> git annex proxy git ls-files
.datalad/.gitattributes
.datalad/config
.gitattributes

so I will just skip the check in direct mode -- anyways that one to go.

edit: ha - we don't know if .. is potentially in annex direct mode, we are at GitRepo level, so not that "easy"

Relies on the fact that ls-files lists files in the top repo,
which unlikely to contain exactly the same files in the directory we
are trying to init in.  Then in direct mode we would result in simply
skipping this safe-guard.  Given unknown amount of direct mode usage
and its imminitent death of support in datalad - it is a reasonable
price to pay.

For indirect mode repos this check would rarely get executed since
unless inthe cases where to blow, there would be no files reported.
@mih
Copy link
Member

@mih mih commented Mar 9, 2019

Yes, rev-create also prevents conflicts with uninstalled subdatasets. #3189

@mih
Copy link
Member

@mih mih commented Mar 9, 2019

do you see any use case for this switch to be used?

If this PR is merged, rev-create needs to be RFed to unconditionally use this switch to avoid performing this redundant test.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 9, 2019

If this PR is merged, rev-create needs to be RFed to unconditionally use this switch to avoid performing this redundant test.

This PR needs to be merged to be there at least until rev-create replaces create which does not provide such protection and that is the ones in use in 0.11.x. My 1c is that the switch is not needed since I am willing to pay the price of additional 0.005 sec check for now, and when rev-create becomes a create we might just remove the check in GitRepo.

@bpoldrack
Copy link
Member

@bpoldrack bpoldrack commented Mar 9, 2019

This PR needs to be merged to be there at least until rev-create replaces create which does not provide such protection and that is the ones in use in 0.11.x. My 1c is that the switch is not needed since I am willing to pay the price of additional 0.005 sec check for now, and when rev-create becomes a create we might just remove the check in GitRepo.

See your point, but the switch doesn't hurt your intention either. It might still become useful, when rev-create becomes create, since we don't really have a solution for cloning (which means install/get are the ones to deal with that). We might run in the situation, that rev-create would require to remove that check, while some cloning still relies on it. And if it turns out, that we didn't really need the switch - well, it's a two liner. Come on.

@kyleam
Copy link
Contributor

@kyleam kyleam commented Mar 9, 2019

This PR needs to be merged to be there at least until rev-create replaces create which does not provide such protection and that is the ones in use in 0.11.x

We need this fix in the same way we need any other "fixed in revolution" change (actually, in my view less since this is a foot-shooting safeguard that has been missing for a long time).

Since you've already done the work here, it seems reasonable to merge this. Please just add the switch for rev-create to use.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 9, 2019

Failure is only that revolution testing one

  Found existing installation: datalad 0.11.3
    Uninstalling datalad-0.11.3:
Could not install packages due to an EnvironmentError: [Errno 13] 

please merge if no further comments. I all driving to JFK atm

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 11, 2019

ok, since noone likes it I guess, I will be the one to press the Merge

@yarikoptic yarikoptic merged commit 8e2d1ab into datalad:0.11.x Mar 11, 2019
2 of 3 checks passed
@kyleam
Copy link
Contributor

@kyleam kyleam commented Mar 11, 2019

ok, since noone likes it I guess, I will be the one to press the Merge

AFAICS this PR was not resolved. @mih, @bpoldrack, and I all requested a switch.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 11, 2019

ok, since noone likes it I guess, I will be the one to press the Merge

AFAICS this PR was not resolved. @mih, @bpoldrack, and I all requested a switch.

switch is not relevant here (in 0.11.x) so I will add it in the PR bringing it all to master

@kyleam
Copy link
Contributor

@kyleam kyleam commented Mar 11, 2019

switch is not relevant here (in 0.11.x) so I will add it in the PR bringing it all to master

As you wish, though I think doing it here would have been simpler. At any rate, you never communicated that this was your plan or responded to our latest comments, so I don't see why any of us would have gone ahead with a merge.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 11, 2019

switch is not relevant here (in 0.11.x) so I will add it in the PR bringing it all to master

As you wish, though I think doing it here would have been simpler. At any rate, you never communicated that this was your plan or responded to our latest comments, so I don't see why any of us would have gone ahead with a merge.

the last comment was from me

please merge if no further comments

to which there were no further comments. And I never promised that I was going to work on this switch here, where it is irrelevant.
This is my last comment here

@yarikoptic yarikoptic deleted the bf-3068 branch Mar 11, 2019
@yarikoptic yarikoptic added this to the Release 0.11.4 milestone Mar 14, 2019
yarikoptic added a commit to yarikoptic/datalad that referenced this issue Mar 19, 2019
It is often desired to reference current dataset. E.g. to add an existing
dataset as subdataset to the current dataset somewhere not at the top of the
dataset directories tree. Especially in quite nested hierarchies it
becomes a burden to figure out where is the boundary of the dataset to specify
it in -d option.  -d .  would point to the (not installed, and possibly illegit
see datalad#3211) subdataset in current directory, whenever actual need is to specify
smth like -d ../../..  .

With -d^. it becomes possible to say "refer to the current dataset I am in".

The only gotcha I am aware of is that shell might like to use "^" as
a quick substitution for previous command, that is why I specified it
as -d^., i.e. without space - then seems to work as intended -- running
"datalad add -d^. ./subds"  would add subds in current subdirectory to the
dataset without requiring full path in -d.

Depending on the outcome of datalad#3230 discussion, this change might not be really
needed, e.g. if default context for operations (including on subdatasets) would
migrate to be of "current dataset".  Otherwise, if -d would be needed to "bind"
the context to current dataset - at least with this helper it would be easy to
unambigously reference current dataset regardless of the position in the
directories hierachy.
yarikoptic added a commit that referenced this issue Apr 6, 2019
    ## 0.11.4 (Mar 18, 2019) -- get-ready

    Largely a bug fix release with a few enhancements

    ### Important

    - 0.11.x series will be the last one with support for direct mode of [git-annex][]
      which is used on crippled (no symlinks and no locking) filesystems.
      v7 repositories should be used instead.

    ### Fixes

    - Extraction of .gz files is broken without p7zip installed.  We now
      abort with an informative error in this situation.  ([#3176][])

    - Committing failed in some cases because we didn't ensure that the
      path passed to `git read-tree --index-output=...` resided on the
      same filesystem as the repository.  ([#3181][])

    - Some pointless warnings during metadata aggregation have been
      eliminated.  ([#3186][])

    - With Python 3 the LORIS token authenticator did not properly decode
      a response ([#3205][]).

    - With Python 3 downloaders unnecessarily decoded the response when
      getting the status, leading to an encoding error.  ([#3210][])

    - In some cases, our internal command Runner did not adjust the
      environment's `PWD` to match the current working directory specified
      with the `cwd` parameter.  ([#3215][])

    - The specification of the pyliblzma dependency was broken.  ([#3220][])

    - [search] displayed an uninformative blank log message in some
      cases.  ([#3222][])

    - The logic for finding the location of the aggregate metadata DB
      anchored the search path incorrectly, leading to a spurious warning.
      ([#3241][])

    - Some progress bars were still displayed when stdout and stderr were
      not attached to a tty.  ([#3281][])

    - Check for stdin/out/err to not be closed before checking for `.isatty`.
      ([#3268][])

    ### Enhancements and new features

    - Creating a new repository now aborts if any of the files in the
      directory are tracked by a repository in a parent directory.
      ([#3211][])

    - [run] learned to replace the `{tmpdir}` placeholder in commands with
      a temporary directory.  ([#3223][])

    - [duecredit][] support has been added for citing DataLad itself as
      well as datasets that an analysis uses.  ([#3184][])

    - The `eval_results` interface helper unintentionally modified one of
      its arguments.  ([#3249][])

    - A few DataLad constants have been added, changed, or renamed ([#3250][]):
      - `HANDLE_META_DIR` is now `DATALAD_DOTDIR`.  The old name should be
         considered deprecated.
      - `METADATA_DIR` now refers to `DATALAD_DOTDIR/metadata` rather than
        `DATALAD_DOTDIR/meta` (which is still available as
        `OLDMETADATA_DIR`).
      - The new `DATASET_METADATA_FILE` refers to `METADATA_DIR/dataset.json`.
      - The new `DATASET_CONFIG_FILE` refers to `DATALAD_DOTDIR/config`.
      - `METADATA_FILENAME` has been renamed to `OLDMETADATA_FILENAME`.

* tag '0.11.4': (82 commits)
  Updated CHANGELOG.md for having merged check for not being closed
  [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries
  CHANGELOG.md: Update 0.11.4 entries
  CHANGELOG.md: Adjust the format of a link
  BF: split lines on spaces and commas befoe doing sed capture of #issue
  RF: adjust tools/link_issues_CHANGELOG to have MD links as [#issue][]
  BF+DOC: Fix all links to mark them valid markdown
  BF: Fix markup bug that prevents sphinx-build from succeeding
  DOC: extend coumentation of is_interactive
  BF: guard .istty with try/except
  ENH: ui: Drop progress bars if not attached to tty
  ENH: ui: Add another name for SilentConsoleLog
  BF: check for stdin/out/err to not be closed before checking for .isatty
  RF: HANDLE_META_DIR -> DATALAD_DOTDIR (but keeping an alias for compatibility)
  RF: define/use consts DATASET_CONFIG_FILE, DATASET_METADATA_FILE, METADATA_DIR
  RF: METADATA_DIR/FILE -> OLDMETADATA_DIR/FILE
  BF: do not cause a side-effect on kwargs in @eval_results
  BF: join with ds.path when trying to see if any other metadata file is available
  RF: Move duecredit into its own section so it is not a part of install_requires
  [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries
  ...
yarikoptic added a commit to yarikoptic/datalad that referenced this issue Apr 6, 2019
    ## 0.11.4 (Mar 18, 2019) -- get-ready

    Largely a bug fix release with a few enhancements

    ### Important

    - 0.11.x series will be the last one with support for direct mode of [git-annex][]
      which is used on crippled (no symlinks and no locking) filesystems.
      v7 repositories should be used instead.

    ### Fixes

    - Extraction of .gz files is broken without p7zip installed.  We now
      abort with an informative error in this situation.  ([datalad#3176][])

    - Committing failed in some cases because we didn't ensure that the
      path passed to `git read-tree --index-output=...` resided on the
      same filesystem as the repository.  ([datalad#3181][])

    - Some pointless warnings during metadata aggregation have been
      eliminated.  ([datalad#3186][])

    - With Python 3 the LORIS token authenticator did not properly decode
      a response ([datalad#3205][]).

    - With Python 3 downloaders unnecessarily decoded the response when
      getting the status, leading to an encoding error.  ([datalad#3210][])

    - In some cases, our internal command Runner did not adjust the
      environment's `PWD` to match the current working directory specified
      with the `cwd` parameter.  ([datalad#3215][])

    - The specification of the pyliblzma dependency was broken.  ([datalad#3220][])

    - [search] displayed an uninformative blank log message in some
      cases.  ([datalad#3222][])

    - The logic for finding the location of the aggregate metadata DB
      anchored the search path incorrectly, leading to a spurious warning.
      ([datalad#3241][])

    - Some progress bars were still displayed when stdout and stderr were
      not attached to a tty.  ([datalad#3281][])

    - Check for stdin/out/err to not be closed before checking for `.isatty`.
      ([datalad#3268][])

    ### Enhancements and new features

    - Creating a new repository now aborts if any of the files in the
      directory are tracked by a repository in a parent directory.
      ([datalad#3211][])

    - [run] learned to replace the `{tmpdir}` placeholder in commands with
      a temporary directory.  ([datalad#3223][])

    - [duecredit][] support has been added for citing DataLad itself as
      well as datasets that an analysis uses.  ([datalad#3184][])

    - The `eval_results` interface helper unintentionally modified one of
      its arguments.  ([datalad#3249][])

    - A few DataLad constants have been added, changed, or renamed ([datalad#3250][]):
      - `HANDLE_META_DIR` is now `DATALAD_DOTDIR`.  The old name should be
         considered deprecated.
      - `METADATA_DIR` now refers to `DATALAD_DOTDIR/metadata` rather than
        `DATALAD_DOTDIR/meta` (which is still available as
        `OLDMETADATA_DIR`).
      - The new `DATASET_METADATA_FILE` refers to `METADATA_DIR/dataset.json`.
      - The new `DATASET_CONFIG_FILE` refers to `DATALAD_DOTDIR/config`.
      - `METADATA_FILENAME` has been renamed to `OLDMETADATA_FILENAME`.

* tag '0.11.4':
  Updated CHANGELOG.md for having merged check for not being closed
  [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries
  CHANGELOG.md: Update 0.11.4 entries
  CHANGELOG.md: Adjust the format of a link
  BF: split lines on spaces and commas befoe doing sed capture of #issue
  RF: adjust tools/link_issues_CHANGELOG to have MD links as [#issue][]
  BF+DOC: Fix all links to mark them valid markdown
  BF: Fix markup bug that prevents sphinx-build from succeeding
  DOC: extend coumentation of is_interactive
  BF: guard .istty with try/except
  BF: check for stdin/out/err to not be closed before checking for .isatty
  [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries
  CHANGELOG: Add entries for 0.11.4
  ENH: version boost and initial changes to CHANGELOG
yarikoptic added a commit to yarikoptic/datalad that referenced this issue Dec 10, 2019
It is often desired to reference current dataset. E.g. to add an existing
dataset as subdataset to the current dataset somewhere not at the top of the
dataset directories tree. Especially in quite nested hierarchies it
becomes a burden to figure out where is the boundary of the dataset to specify
it in -d option.  -d .  would point to the (not installed, and possibly illegit
see datalad#3211) subdataset in current directory, whenever actual need is to specify
smth like -d ../../..  .

With -d^. it becomes possible to say "refer to the current dataset I am in".

The only gotcha I am aware of is that shell might like to use "^" as
a quick substitution for previous command, that is why I specified it
as -d^., i.e. without space - then seems to work as intended -- running
"datalad add -d^. ./subds"  would add subds in current subdirectory to the
dataset without requiring full path in -d.

Depending on the outcome of datalad#3230 discussion, this change might not be really
needed, e.g. if default context for operations (including on subdatasets) would
migrate to be of "current dataset".  Otherwise, if -d would be needed to "bind"
the context to current dataset - at least with this helper it would be easy to
unambigously reference current dataset regardless of the position in the
directories hierachy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants