Dealing with corrupt git repositories #79
I think we should figure out what's causing the corruption. Are we screwing up the locking somehow? Is there some kind of race condition with fetching and some other integration? Deleting it definitely feels suboptimal. I'd probably favor it failing rather than behaving in a possibly unintended fashion.
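One way to rule out races between overlapping deploy events would be an OS-level advisory lock around the git commands. A minimal sketch, assuming one checkout directory per repository; the `with_repo_lock` helper and lock path are hypothetical, not part of heaven:

```ruby
require 'tmpdir'

# Hypothetical sketch: serialize git operations per checkout directory
# with an exclusive advisory lock, so two deploy events can't interleave
# their fetch/reset commands. Nothing here is heaven's actual API.
def with_repo_lock(checkout_directory)
  lock_path = File.join(Dir.tmpdir, "#{File.basename(checkout_directory)}.lock")
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until any concurrent deploy releases it
    yield
  end
end
```

Advisory locks are process-wide and released automatically when the file handle closes, so a crashed deploy can't wedge later ones the way a leftover git lock file can.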
It could be something odd with

```ruby
execute_and_log(["git", "fetch"])
execute_and_log(["git", "reset", "--hard", sha])
```
That approach has been solid for us for a few years, but we don't use POSIX spawn.
Isn't it used here? https://github.com/atmos/heaven/blob/master/app/models/provider/capistrano.rb#L22

Then I might have to do some digging to see what's causing those git errors to come up.
Sorry, yeah. It is in use there. We've been using the same approach with Capistrano.
I'm at a loss. I know that it keeps happening, but there don't seem to be any obvious issues with the fetch and reset approach.
Are you possibly getting multiple events close together, where the different commands executing are leaving things in a bad state? Normally git leaves a lock file around that's pretty easy to identify, but you don't seem to be getting that message.
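For reference, the lock file git leaves behind after a killed process is `.git/index.lock`; a quick check like this (the helper name is illustrative) would confirm whether one is lingering in the checkout:

```ruby
# Illustrative helper: git writes .git/index.lock while mutating the index
# and removes it on clean exit. If a process dies mid-operation, the file
# lingers and subsequent git commands refuse to run until it's removed.
def stale_git_lock?(checkout_directory)
  File.exist?(File.join(checkout_directory, '.git', 'index.lock'))
end
```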
No, requests come in multiple times a day but rarely close together. |
I'm sure you've already tried, but
@dblandin is this still happening to you? |
Last happened a few days ago:

fsck didn't reveal any problems either:
After looking at how Capistrano handles fetching/resetting, I've modified my providers to use the following checkout and sync methods:

```ruby
def checkout(revision)
  unless File.exist?(checkout_directory)
    log "Cloning #{repository_url} into #{checkout_directory}"
    execute_and_log(["git", "clone", clone_url, checkout_directory])
  end
end

def sync(revision)
  Dir.chdir(checkout_directory) do
    log "Fetching the latest code"
    execute_and_log(["git", "config", "remote.origin.url", clone_url])
    execute_and_log(["git", "config", "remote.origin.fetch", "+refs/heads/*:refs/remotes/origin/*"])
    execute_and_log(["git", "fetch", "origin"])
    execute_and_log(["git", "fetch", "--tags", "origin"])
    execute_and_log(["git", "checkout", "--force", "-B", "deploy", sha])
  end
end
```

Hopefully this lessens the frequency of the git object errors that have crept up occasionally.
Still running into this issue unfortunately. Current working theory is that the working directory gets into this state most often after a
Do you guys rebase/force that often? We just do merges at work. |
Never directly on master, but we'll occasionally rebase branches off of master and force push the updated branch. |
I'm leaning towards using a clean temporary directory for each deploy to get around this problem. It has become a significant pain at work. I don't see us changing the way we're rebasing anytime soon. |
That's probably a good idea. rsync works well if you can keep the original copy nice and clean.
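A sketch of that clean-copy-per-deploy idea, assuming a pristine clone is maintained out of band; the helper name and paths are made up, and `FileUtils.cp_r` stands in for rsync here to keep the example self-contained:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sketch of "clean copy per deploy": keep one pristine clone
# that no deploy ever mutates, and copy its contents into a fresh temporary
# directory before each deploy. `rsync -a` against the pristine copy would
# do the same thing more efficiently on large repositories.
def fresh_working_copy(pristine_dir)
  deploy_dir = Dir.mktmpdir('deploy-')
  FileUtils.cp_r(File.join(pristine_dir, '.'), deploy_dir)
  deploy_dir
end
```

Since each deploy runs in a throwaway directory, a corrupted working tree never survives past the deploy that produced it.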
Seems pretty stable so far:

```ruby
require 'tmpdir'

# A module to include for easy access to writing to a transient filesystem
module LocalLogFile
  def working_directory
    @working_directory ||= Dir.mktmpdir
  end

  def cleanup_working_directory
    FileUtils.rm_r(working_directory)
  end
end

class DefaultProvider
  # ...
  def run!
    Timeout.timeout(timeout) do
      setup
      execute unless Rails.env.test?
      notify
      record
    end
  rescue StandardError => e
    Rails.logger.info e.message
    Rails.logger.info e.backtrace
  ensure
    update_output
    cleanup_working_directory
    status.failure! unless completed?
  end
end
```
Closing this issue for now. Wasn't able to figure out the root cause or a fix for it, but starting from a fresh working directory for each deployment seems to work well. Hopefully no one else encounters the same problem. Happy to submit a patch if this comes up for anyone else.
It's a pity that I'm only coming across this problem now. It seems there's no fix for it, right?
@Live2Learn Unfortunately I didn't come up with a better solution for this issue. Here's the commit where I setup a temp directory during deploys: https://github.com/dscout/heaven/commit/932500542745719cb460a0727cf0a3657dc8a7d9
Occasionally, I'll get git output like the following during deploys:
The deploy continues on, which usually doesn't make a big difference when using a Capistrano provider. I'm using Capistrano v3. We don't usually alter the Capistrano configuration, and the remote server will fetch the git repository itself.
But the working-directory repo obviously doesn't check out the right sha. And for other providers, this will be more problematic (such as deploying straight to S3).
Has anyone run into this problem on their own setups? Any ideas towards fixing this problem?
I usually end up deleting the working directory which forces heaven to re-clone the repository during the next deploy.
I suppose another question is: should the deploy continue on if any task during a provider fails, or should the entire deploy fail immediately and skip any following tasks?
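If the answer is fail-fast, one sketch is a raising variant of the command runner. The `execute_and_log!` helper below is hypothetical (not heaven's actual method) and just illustrates checking the exit status before moving on:

```ruby
require 'open3'

# Hypothetical fail-fast wrapper: run a command, capture combined
# stdout/stderr, and raise on any non-zero exit so later provider tasks
# never run against a checkout in an unknown state.
def execute_and_log!(command)
  output, status = Open3.capture2e(*command)
  raise "command failed (#{command.join(' ')}):\n#{output}" unless status.success?
  output
end
```

A provider built on this would abort the whole deploy at the first failed git command instead of silently deploying the wrong sha.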