Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: handle_dirty_dataset() #752

Merged
merged 4 commits into from
Sep 9, 2016
Merged

NF: handle_dirty_dataset() #752

merged 4 commits into from
Sep 9, 2016

Conversation

mih
Copy link
Member

@mih mih commented Sep 2, 2016

Check tests inside -- food for thought re subdataset treatment!

@mih
Copy link
Member Author

mih commented Sep 2, 2016

I guess the discovered behavior regarding the treatment of unregistered submodules on save should make us reconsider the behavior of save or the presence add_to_super or both.

@mih mih added the funny label Sep 2, 2016
@coveralls
Copy link

coveralls commented Sep 2, 2016

Coverage Status

Coverage increased (+0.05%) to 86.85% when pulling 1b8f375 on mih:nf-dealwithdirty into 4755adb on datalad:master.

@codecov-io
Copy link

codecov-io commented Sep 2, 2016

Codecov Report

Merging #752 into master will decrease coverage by 2.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #752      +/-   ##
==========================================
- Coverage    88.9%   86.85%   -2.05%     
==========================================
  Files         212      216       +4     
  Lines       18774    18918     +144     
==========================================
- Hits        16691    16432     -259     
- Misses       2083     2486     +403
Impacted Files Coverage Δ
datalad/interface/utils.py 100% <100%> (ø)
datalad/interface/save.py 95.45% <100%> (+4.02%) ⬆️
datalad/interface/common_opts.py 100% <100%> (ø) ⬆️
datalad/interface/tests/test_utils.py 100% <100%> (ø)
datalad/crawler/nodes/s3.py 32.82% <0%> (-57.26%) ⬇️
datalad/tests/test_s3.py 43.47% <0%> (-56.53%) ⬇️
datalad/downloaders/s3.py 41.73% <0%> (-51.31%) ⬇️
datalad/crawler/nodes/tests/test_s3.py 49.59% <0%> (-50.41%) ⬇️
datalad/customremotes/tests/test_datalad.py 59.25% <0%> (-40.75%) ⬇️
datalad/downloaders/tests/test_s3.py 67.92% <0%> (-28.31%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ff2165...62e570c. Read the comment docs.

@coveralls
Copy link

coveralls commented Sep 2, 2016

Coverage Status

Coverage increased (+0.05%) to 86.85% when pulling 1b8f375 on mih:nf-dealwithdirty into 4755adb on datalad:master.

@yarikoptic
Copy link
Member

I had similar error message yesterday while fixing more of symlinks install... I had.gitmodules but make of submodule was full path. Git puked as well
Yours - better ask/run by git ppl

orig_state = _check_auto_save(ds, orig_state)
# XXX surprisingly this is added as a submodule, but there is no .gitmodules
# which confused even Git itself (git submodule call now fails with
# "fatal: no submodule mapping found in .gitmodules for path 'subds'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we do magic and add it explicitly?

Copy link
Member

@bpoldrack bpoldrack Sep 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on. I cannot reproduce it manually via cmdline. So maybe this is not a funny behaviour of git, but a bug in datalad create ...

Edit: I cannot even reproduce it, using python API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so may be it is git version dependent?
(building fresh annex snapshot meanwhile)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be. I'm using 2.7.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.8.1 in my case.

Copy link
Member

@yarikoptic yarikoptic Sep 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fancy...

$> cd /tmp/datalad_temp_test_dirtyMM8nrW
something@  staged  subds/

$> git status                           
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    subds/

nothing added to commit but untracked files present (use "git add" to track)
$> git annex add .                      

$> git status     
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    subds/


$> git annex add .

$> git add .

$> git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   subds

$> git diff --cached
diff --git a/subds b/subds
new file mode 160000
index 0000000..41419ed
--- /dev/null
+++ b/subds
@@ -0,0 +1 @@
+Subproject commit 41419ed65118451d95eb27e8f48801f398effa5c

so it is added as a subproject (i.e. submodule) BUT nothing added to .gitmodules:

$> git commit -m 'bu' 
[master 83ab0e9] bu
 1 file changed, 1 insertion(+)
 create mode 160000 subds

$> git status
On branch master
nothing to commit, working tree clean

$> cat .gitmodules
cat: .gitmodules: No such file or directory

$> git submodule 
fatal: no submodule mapping found in .gitmodules for path 'subds'

$> git --version
git version 2.9.3

which is imho is a git bug. it could at least add a record without url (or even with relative path as a url). I will check on git IRC

NB1 git annex add . call here is of no side-effect. checked without it -- the same story

@yarikoptic
Copy link
Member

seems something happened with the buildbot -- didn't pick up this PR changes at all

@yarikoptic
Copy link
Member

yarikoptic commented Sep 2, 2016

ha -- paying the price for renaming yourself:
Ignoring pull request action synchronize made by mih since (s)he is not listed among trusted_logins
edit: made buildbot trust mih as well - so should be tested now

@mih
Copy link
Member Author

mih commented Sep 2, 2016

The sound of jealousy....

On Fri, Sep 2, 2016 at 2:29 PM, Yaroslav Halchenko <notifications@github.com

wrote:

ha -- paying the price for renaming yourself:
Ignoring pull request action synchronize made by mih since (s)he is not
listed among trusted_logins


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#752 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAIVH6nlkFJkBB3srplikidJYDz_YyaEks5qmBaogaJpZM4Jzb1B
.

Michael Hanke
http://mih.voxindeserto.de

# XXX surprisingly this is added as a submodule, but there is no .gitmodules
# which confused even Git itself (git submodule call now fails with
# "fatal: no submodule mapping found in .gitmodules for path 'subds'"
assert_equal(ds.get_subdatasets(), [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall -- not sure if we should leave this issue without a workaround, even if it would be ugly in principle, e.g.

  • do the 'git add', 'git commit'
  • run git submodule and check for 'no submodule mapping' and if present
    • fix up for it by adding corresponding entry to .gitmodules (may be git submodule add could be ran explicitly without puking... didn't check
    • git commit --amend

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a brief chat -- I don't think that anything would be done on git side

that "Subproject" commit is added by 'git add', and kinda "independent" from all the submodule mechanism, but submodule freaks out when sees Subproject without mapping (as we discovered). As to me -- awkward design... but oh well -- we can easily workaround, so we should.

Also (re)discovered the 'subtree' mechanism. Verified that it doesn't deal in terms of "Subproject" commits -- really does merges and records state information within the commits seems to me, e.g.:

commit 7faf2ffe6661f6957e48e104d7e20a83041aba1c
Merge: cf74de7 6210ac5
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Fri Sep 2 14:35:51 2016 -0400

    Add 'tools/' from commit '6210ac540b07dd094004cdab6fce2ce9baf636fc'

    git-subtree-dir: tools
    git-subtree-mainline: cf74de722077e1ca8814e0ec9a6c893d7dbdda8d
    git-subtree-split: 6210ac540b07dd094004cdab6fce2ce9baf636fc

whenever I split tools/ into another branch, removed in master, and then 'subtree add'ed from the branch it was split into. Just beware of the beast ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can NOT easily work around it from my point of view. The thing is: We want a create beneath an existing dataset with add_to_super=False. So, adding the submodule might satisfy git, but is not, what we actually want to do.

Copy link
Member Author

@mih mih Sep 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I am also having difficulties with this. It is easy to detect when we run into this, but it is not straightforward to me what would need to be done in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get, how git ppl are able to not care for this. I mean: Think of cases, we had in mind, like having a HOMEDIR, which is a git repo. Now you can not have a repo beneath it, that is not a submodule? This should break setups of a lot of ppl, shouldn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can , just don't do git add .
On a serious note I know no one who keeps entire home in git. That is why eg I rely on symlinks to config dirs and vcs-home iirc uses GITDIR env var to point to location of actual git repos

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mih indeed what to do might depend on how you would want to add that submodule. Since we are taking about simplifying typical workflow and provide publish etc, I think it would be OK to provide specific handling the way I outlined

@mih
Copy link
Member Author

mih commented Sep 8, 2016

Kept thinking about this. I believe the issue of submodules is best dealt with in the context of save. Currently, we force-add everything in a safe -a (which is what is called in handle_dirty_dataset()), and we better change that to yield a git repo with an intact submodule representation.

My current concept would be to go through the index before the commit in save, check whether anything in there is a repo that is not listed as a submodule, add that to .gitmodules, and then commit.

Objections? @yarikoptic @bpoldrack

@bpoldrack
Copy link
Member

Sounds reasonable to me.

Check tests inside -- for for thought re subdataset treatment!
@coveralls
Copy link

coveralls commented Sep 8, 2016

Coverage Status

Coverage decreased (-2.05%) to 86.858% when pulling f38eebf on mih:nf-dealwithdirty into 5ff2165 on datalad:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-2.05%) to 86.858% when pulling f38eebf on mih:nf-dealwithdirty into 5ff2165 on datalad:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.05%) to 86.858% when pulling f38eebf on mih:nf-dealwithdirty into 5ff2165 on datalad:master.

@mih
Copy link
Member Author

mih commented Sep 8, 2016

FTR: Something is fucked up with the coverage decrease reports for the last PRs (and this one). This time there is full coverage and it still reports and 2% decrease.

@mih
Copy link
Member Author

mih commented Sep 8, 2016

Ready to merge from my POV.

Will start composing a milestone for the RF to auto-save the world over and over.

@mih mih added this to the Auto-save milestone Sep 8, 2016

# test whether the potential submodule is scheduled for saving
if sum([realpath(utf_abspath).startswith(_with_sep(realpath(f)))
for f in files]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPT: _with_sep(realpath(f)) would be called on the same set of files for every untracked file -- so why not to pre-compute all those to save at least some cycles?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

ds=ds,
path=utf_abspath, # can be ignored, we don't need the return value
relativepath=utf,
name=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what happens if that submodule wasn't listed among files? then old behavior? (git add . and having it added as a SubProject by git but not as a proper submodule?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how that could happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so

else:
       raise RuntimeError("Michael didn't see how this could happen")

? ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this would not work. If the untracked file is not covered by that list,
it would not be saved later on. Only list elements will be.

On Sep 8, 2016 2:50 PM, "Yaroslav Halchenko" notifications@github.com
wrote:

In datalad/interface/save.py
#752 (comment):

  •    # submodule functionality completely
    
  •    for utf in ds.repo.repo.untracked_files:
    
  •        utf_abspath = opj(ds.path, utf)
    
  •        if not isdir(utf_abspath):
    
  •            # this cannot be a repository
    
  •            continue
    
  •        # test whether the potential submodule is scheduled for saving
    
  •        if sum([realpath(utf_abspath).startswith(_with_sep(realpath(f)))
    
  •                for f in files]):
    
  •            # matches at least one path -> turn into submodule
    
  •            _install_subds_inplace(
    
  •                ds=ds,
    
  •                path=utf_abspath,  # can be ignored, we don't need the return value
    
  •                relativepath=utf,
    
  •                name=None)
    

so

else:
raise RuntimeError("Michael didn't see how this could happen")

? ;)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/datalad/datalad/pull/752/files/f38eebf7993ba7dbcaa0cd4959d2e63c18cd82c3#r77998567,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAIVHwfG-iszjhpvFjm4VPAIOQsPhudHks5qoAR4gaJpZM4Jzb1B
.

@coveralls
Copy link

coveralls commented Sep 8, 2016

Coverage Status

Coverage decreased (-2.05%) to 86.859% when pulling 62e570c on mih:nf-dealwithdirty into 5ff2165 on datalad:master.

@mih
Copy link
Member Author

mih commented Sep 9, 2016

Following my mantra "eh, fuck it", I am merging this now. IEBAOSM (indifference expressed by absence of significant moaning ;-)

@mih mih merged commit 807fd73 into datalad:master Sep 9, 2016
@mih mih deleted the nf-dealwithdirty branch June 24, 2017 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants