inode leak #14

Open
thomasmeeus opened this issue Dec 10, 2014 · 7 comments

@thomasmeeus
Contributor

Hi,

I think subsplit has some kind of inode-leak:

root::gitlab { .../KunstmaanBundlesCMS/.subsplit/.git }-> pwd
/home/subsplit/dflydev-git-subsplit-github-webhook/temp/KunstmaanBundlesCMS/.subsplit/.git
root::gitlab { .../KunstmaanBundlesCMS/.subsplit/.git }-> for i in *; do echo -e "$(find $i | wc -l)\t$i"; done | sort -n
1   branches
1   config
1   description
1   FETCH_HEAD
1   HEAD
1   index
1   ORIG_HEAD
1   packed-refs
3   info
6   objects
10  hooks
75  refs
319 logs
7275935 subtree-cache

Is this a bug or am I supposed to rm the subtree-cache directory on a regular basis? Our server has run out of inodes twice.

Thx

@simensen
Member

This is actually a bug in the underlying git subtree command. I submitted a patch to git subtree over a year and a half ago but the maintainer never responded to me. Every call to git subtree leaves behind a cache and it never gets cleaned up. I would suggest you add something to your build process to remove it periodically.

This should be documented as well. If you want to add it to the README send a PR. I'll try to do it myself when I have some time. I created #15 so that it doesn't get lost. :)
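
For the periodic cleanup suggested above, here is a minimal sketch, not an official subsplit feature. It assumes that git subtree creates one directory per run under `.git/subtree-cache` and that a split which is still running keeps adding entries to its own directory, so a directory untouched for a long time is probably stale. The function name `prune_subtree_cache` and the 60-minute default are made up for illustration.

```shell
# prune_subtree_cache GIT_DIR [MIN_AGE_MINUTES]
# Removes per-run directories under GIT_DIR/subtree-cache that have not been
# modified for more than MIN_AGE_MINUTES (default 60).
# Assumption: a split that is still running keeps adding entries to its own
# cache directory, so that directory's mtime stays recent.
prune_subtree_cache() (
    cache_root="$1/subtree-cache"
    min_age="${2:-60}"

    # Nothing to do if the cache has never been created.
    [ -d "$cache_root" ] || exit 0

    # -mindepth, -maxdepth and -mmin are GNU/BSD find extensions, not strict POSIX.
    find "$cache_root" -mindepth 1 -maxdepth 1 -type d -mmin "+$min_age" \
        -exec rm -rf {} +
)
```

A cron job or build step could call it as `prune_subtree_cache /path/to/repo/.subsplit/.git 120`, where the path is a placeholder. Choosing an age well above the longest split you expect keeps the heuristic on the safe side.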

@mnapoli

mnapoli commented Feb 2, 2015

@simensen do you have a link to the original issue in git subtree? I'd like to follow it to get notified if it's ever fixed…

@simensen
Member

simensen commented Feb 2, 2015

@mnapoli I sent an email (with a patch) to the maintainer on March 21st, 2013 and have received no response. I pinged them again on December 10th, 2014. I just sent another email.

My research on how to work w/ git indicated that emailing the person in charge of a specific part of the project was the best way to get changes into git. Perhaps someone else is a better candidate for receiving this patch but I haven't taken the time to look into it more. I'll attach my original email in another comment so that the information is logged.

@simensen
Member

simensen commented Feb 2, 2015

To: apenwarr@gmail.com

I'm kinda new to these email driven patch things, so please be gentle. :) I'm happy to learn! I'm going to try to learn. Also, we haven't talked before so I wanted to introduce myself and my project. Hope this isn't too wordy. :-/

A somewhat common pattern emerging in a part of the PHP community is to break a larger monolithic project out into smaller components while keeping the main development in the original combined repository. For all intents and purposes the component repositories are "read-only git subtree splits."

One of the first examples I saw of this was Symfony 2. There is a master repository ( https://github.com/symfony/symfony ) where all of the development occurs and there are several component repositories marked with the likes of "[READ-ONLY] Subtree split of the Symfony HttpFoundation Component" ( https://github.com/symfony/HttpFoundation ).

My read on the community was that there seemed to be a need to help people navigate the options of git subtree. I wanted to write something to help automate the process of managing one-way read-only subtree splits for projects that did not have the engineering resources that Sensio/Symfony has to build and manage them. So I wrote git subsplit.

https://github.com/dflydev/git-subsplit

There are a number of users now and I am starting to bump up against some limits of my knowledge of git and git subtree. I might want to pick your brain on some of those things if you are willing to walk me through them. :)

Some of the people with larger projects (zf2 is currently trying to use my project to manage its 50+ components) are running into an issue I can actually help with. Unfortunately, it is something that I think needs to be handled in git subtree rather than something that I can code myself.

Currently git subtree leaves the cache files behind. As far as I can tell, they are not actually needed except during the split command. In fact, if the split command finds that the cache directory it wants already exists, it simply deletes the directory and starts from scratch. The end result is that people will likely eventually run out of inodes as a huge number of files are created and never deleted.

At first I thought a simple rm -rf .git/subtree-cache would do the job, but this is not very safe as it would be hard to guarantee that another process is not actively running git subtree split, especially when running in an automated environment. In the case of naive post commit hooks or cron jobs that are not timed correctly, simply removing the subtree-cache directory might leave another process in an unknown state.
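
One way to narrow (not eliminate) the race described above is to delete only the cache directories whose owning process is gone. This is a sketch under assumptions: it presumes each run's cache lives in `subtree-cache/<pid>`, named after the PID of the git-subtree shell process, as the `$$` reference in the patch message suggests. The function name `remove_dead_subtree_caches` is hypothetical.

```shell
# remove_dead_subtree_caches GIT_DIR
# Assumption: each run's cache is GIT_DIR/subtree-cache/<pid>, where <pid> is
# the PID of the git-subtree shell process that created it.
remove_dead_subtree_caches() (
    cache_root="$1/subtree-cache"
    [ -d "$cache_root" ] || exit 0

    for dir in "$cache_root"/*; do
        [ -d "$dir" ] || continue
        pid=${dir##*/}

        # Skip anything whose name is not purely numeric.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac

        # kill -0 sends no signal; it only reports whether the PID can be
        # signalled. Caveat: it also fails for a live process owned by another
        # user, so run this as the same user that runs the splits.
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -rf "$dir"
        fi
    done
)
```

If a PID has been reused by an unrelated process, its directory looks alive and is simply kept until a later pass, which errs in the safe direction. A small window remains between the liveness check and the `rm`, so this is still a mitigation and not a guarantee.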

I thought maybe I could do something with $! but that only works if the git subtree split process is launched into the background. Additionally, this didn't really seem to work because I think that the $$ from git-subtree was used, and the $! was the PID from git itself. I was seeing things like PID 589 for git-subtree but 588 coming back via $!.
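
The PID mismatch described above can be reproduced without git. In this sketch the outer `sh -c` stands in for `git` and the inner one stands in for git-subtree.sh; both stand-ins are assumptions for illustration and say nothing about how git actually launches its subcommands.

```shell
# Show that $! (seen by the caller) and $$ (seen by a nested script) can differ.
pid_file=$(mktemp)

# The outer shell plays the wrapper; the inner shell prints its own $$,
# which the outer shell redirects into the temporary file.
sh -c 'sh -c "echo \$\$" >"$1"; exit 0' wrapper "$pid_file" &
wrapper_pid=$!
wait "$wrapper_pid"

inner_pid=$(cat "$pid_file")
echo "caller saw \$! = $wrapper_pid, nested script saw \$\$ = $inner_pid"
rm -f "$pid_file"
```

The two numbers differ because `$!` names the process the caller started, while `$$` is evaluated inside the child that process spawned. That is why a cache directory named after `$$` cannot be located from the caller's `$!`.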

I'm attaching a patch showing what I think might be an OK solution but I'd love some feedback on it before I continue to work on it. Is this an OK way to go? Is there any reason to want to leave the cache files around in some cases?

Also, I found your repo and I found the contrib code in git.git. I cloned git.git as I thought that is where this patch should go but found that most of the fun stuff is in .gitignore. So I'm not sure to which repository I should contribute the final patch. :)

Is your personal repo still alive in any way? Would you be willing to accept a patch to that repo? I'm sending people to your repo if they don't have git subtree in their version of git. Would be great if the final fix was there as well so that people could leverage this fix. Happy to discuss this more!

For some background you can check out this PR to git-subsplit. It took a while to figure out the actual problem. #7

Here is the patch, hopefully not too mangled. Thanks!


From 20031154fd5203e31244557c878fc595553ff244 Mon Sep 17 00:00:00 2001
From: Beau Simensen <beau@dflydev.com>
Date: Wed, 20 Mar 2013 21:43:18 -0500
Subject: [PATCH/RFC] Cleanup cache directory after split command

The subtree cache directory was previously growing out of control.
There was no safe way to remove individual cache sessions due to
the usage of `$$`. There were no safe alternatives to remove the
subtree cache in automated environments.

One could simply `rm -rf .git/subtree-cache` but this would leave
the door open for race conditions. If another subtree split command
was running at that time it would wipe out the cache for that process
leaving it in a potentially unsafe state.
---
 git-subtree.sh |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/git-subtree.sh b/git-subtree.sh
index 920c664..09d909e 100755
--- a/git-subtree.sh
+++ b/git-subtree.sh
@@ -161,6 +161,11 @@ cache_miss()
    done
 }

+cache_cleanup()
+{
+   rm -rf "$cachedir" || die "Can't delete cachedir: $cachedir"
+}
+
 check_parents()
 {
    missed=$(cache_miss $*)
@@ -638,6 +643,7 @@ cmd_split()
        say "$action branch '$branch'"
    fi
    echo $latest_new
+   cache_cleanup || exit $?
    exit 0
 }

--
1.7.9.5
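
As written, the patch calls `cache_cleanup` only on the success path at the end of `cmd_split`, so a run that dies earlier would still leave its cache behind. A possible variation is an EXIT trap, sketched here outside git-subtree. The helper `run_with_scratch_dir` and the variable `SCRATCH_DIR` are hypothetical names, not part of git-subtree or subsplit.

```shell
# run_with_scratch_dir PARENT_DIR COMMAND [ARGS...]
# Hypothetical helper: creates a private scratch directory, exports its path
# as SCRATCH_DIR, runs COMMAND, and removes the directory afterwards whether
# COMMAND succeeded or failed.
run_with_scratch_dir() (
    parent="$1"
    shift

    SCRATCH_DIR=$(mktemp -d "$parent/run.XXXXXX") || exit 1
    export SCRATCH_DIR

    # The EXIT trap fires when this subshell exits normally, with any status.
    # Termination by a signal may need extra traps, depending on the shell.
    trap 'rm -rf "$SCRATCH_DIR"' EXIT

    "$@"
    status=$?
    exit "$status"
)
```

For example, `run_with_scratch_dir /tmp sh -c 'echo "working in $SCRATCH_DIR"'` leaves nothing behind in `/tmp`, and the command's exit status is passed through to the caller.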

@mnapoli

mnapoli commented Feb 2, 2015

Thanks. Wow this sounds like an archaic workflow :p But whatever works for them…

@simensen
Member

simensen commented Feb 2, 2015

@mnapoli Yep. :) I don't know if you've seen Linus talking about his thoughts on GitHub but if you do it might make more sense. ;) In any event, you can patch your own local version of git-subtree. I've done that in a few places in the past.

@luckydonald

Would it be possible for subsplit to clean that up after each run, either automatically or via a --clear-cache option?
