Managing your Bioc code on hedgehog and github

Laurent Gatto edited this page Oct 23, 2015 · 43 revisions

Introduction

This page describes how Joe User manages his Bioconductor package joemisc in two different repositories while keeping the source code and commit messages in sync. He uses the same username, joer, on the Bioconductor hedgehog svn server and on GitHub (this is not a requirement, by the way).

Conscious of the importance of source code tracking, he created his GitHub repository early in the development of joemisc, as described here. His GitHub remote repository is git@github.com:joer/joemisc.git.

After long hours of work refining his S4 classes and his package vignette, he successfully submits joemisc to Bioconductor. His package now also has a home on the Bioc svn server: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/joemisc. He could delete the GitHub repository and work exclusively with svn, but he likes GitHub: the interface is clean, he likes the issue tracker, the repository has a wiki describing some aspects of the package, and he does not want to abandon the git log and all its carefully written commit messages.

NB: It seems that fetching a remote svn branch does not work anymore in git > 1.8.3. Let us know if you have additional details.

A solution

git-svn, with its promise of "Bidirectional operation between a Subversion repository and git", is apparently exactly what he is looking for. Groovy!

Configuration

To prepare a directory containing an existing git repository for remote svn tracking and to create a local tracking branch, Joe runs the following set of git commands:

git config --add svn-remote.hedgehog.url https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/joemisc
git config --add svn-remote.hedgehog.fetch :refs/remotes/hedgehog
git svn fetch hedgehog 
git checkout -b remotes/hedgehog
git svn rebase hedgehog

The first two lines create hedgehog, an svn remote defining the svn repository to be fetched and tracked with git-svn; the name is arbitrary. These two commands produce the following section in .git/config:

[svn-remote "hedgehog"]
	url = https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/joemisc
	fetch = :refs/remotes/hedgehog
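To see exactly what the two git config --add calls write, without touching a real package, one can replay them in a throwaway repository and read the section back. This is a minimal sketch: the temporary directory created with mktemp is an assumption of the example, and git-svn itself is not needed for this step because git config accepts arbitrary keys.

```shell
set -e
# Throwaway repository standing in for the joemisc working copy (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q

# The same two commands as above: declare an svn remote called "hedgehog".
git config --add svn-remote.hedgehog.url https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/joemisc
git config --add svn-remote.hedgehog.fetch :refs/remotes/hedgehog

# List every key of that section; the raw .git/config now contains [svn-remote "hedgehog"].
git config --get-regexp '^svn-remote\.hedgehog\.'
```

git svn fetch is the first step that actually needs git-svn and access to the svn server.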

The third line fetches the svn repository from the server in Seattle. To avoid fetching the 70000+ revisions accumulated in the Subversion repository since early 2002, Joe could instead use git svn fetch hedgehog -r 70455:HEAD, as revision 70455 is the first one relevant to his joemisc package.

The git checkout -b remotes/hedgehog command creates the remotes/hedgehog branch and checks it out (switches to that branch).

Finally, git svn rebase updates the working branch remotes/hedgehog with the latest remote svn changes; this is the equivalent of svn update. (If you get "Unable to determine upstream SVN information from working tree history", see the troubleshooting section below.)

Joe now has two local git branches:

joe@elyacin:~/dev/joemis$ git branch 
  master
* remotes/hedgehog

remotes/hedgehog tracks the remote svn branch on the hedgehog svn server in Seattle. master is the local branch that tracks origin, the remote pointing at his GitHub repository.

Using with github

Joe is now ready to implement his latest feature in joemisc.

git checkout master

He uses master for his everyday development: he makes a couple of changes to R/kungfoo.R, bumps the package version in DESCRIPTION and describes the changes in the NEWS file. He then stages and commits these changes to the local master branch with git commit -am "updated high kick function" (-a stages every modified tracked file, so a brand-new file would first need a git add). So far, these commits only exist locally, in the master branch; he pushes them to GitHub, to the master branch of the origin remote, with

git push origin master
## or only
git push
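The everyday cycle on master can be rehearsed end to end without touching GitHub. A minimal sketch, assuming a local bare repository as a stand-in for git@github.com:joer/joemisc.git and placeholder contents for R/kungfoo.R, DESCRIPTION and NEWS (the identity values are made up as well):

```shell
set -e
# Example identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME="Joe User" GIT_AUTHOR_EMAIL="joer@example.org"
export GIT_COMMITTER_NAME="Joe User" GIT_COMMITTER_EMAIL="joer@example.org"

work=$(mktemp -d)
# A local bare repository stands in for git@github.com:joer/joemisc.git.
git init -q --bare "$work/github.git"
git clone -q "$work/github.git" "$work/joemisc"
cd "$work/joemisc"
git symbolic-ref HEAD refs/heads/master   # name the (still unborn) branch master

# Initial package state (placeholder file contents).
mkdir R
echo 'kungfoo <- function() "kick"' > R/kungfoo.R
echo 'Version: 0.1.0' > DESCRIPTION
echo 'Changes in version 0.1.0' > NEWS
git add R/kungfoo.R DESCRIPTION NEWS
git commit -q -m "initial import"
git push -q -u origin master   # -u records origin/master as the upstream of master

# Everyday change: edit tracked files, commit them all with -a, push to origin.
echo 'kungfoo <- function() "high kick"' > R/kungfoo.R
echo 'Version: 0.1.1' > DESCRIPTION
echo 'Changes in version 0.1.1: updated high kick' > NEWS
git commit -q -am "updated high kick function"
git push -q origin master
```

Because the first push used -u, master has origin/master as its upstream, which is why a plain git push is enough with the default push settings.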

Using git-svn

When it's time to push the new code to the Bioconductor server, he checks out remotes/hedgehog

git checkout remotes/hedgehog

He then runs git svn rebase to pull in any changes that were committed to the svn repository in the meantime (this is important to avoid the dreaded "Unable to determine upstream SVN information from working tree history" message; see below).

git svn rebase

He then merges master (which contains the latest code) into remotes/hedgehog:

git merge master

Finally, he commits these latest updates to the remote svn server:

git svn dcommit

This replays each local commit, together with its git commit message, as an individual commit in the svn repository in Seattle. See this page to preserve the original committer's name in the svn commit message.

The svn server in Seattle and the GitHub repository are now in sync.
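Before running git svn dcommit, it can be reassuring to list what is about to be sent. As a rough approximation (an assumption of this sketch, which glosses over how git-svn locates its upstream), these are the commits reachable from HEAD but not from the svn ref refs/remotes/hedgehog. Spelling out the full ref name matters: because of the local branch created with git checkout -b remotes/hedgehog, the short name remotes/hedgehog is ambiguous, and most git commands resolve it to refs/remotes/hedgehog with a warning (git checkout prefers the local branch). The svn side is simulated with plain commits so the sketch runs without an svn server; against the real server, git svn dcommit --dry-run shows what would be committed.

```shell
set -e
# Example identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME="Joe User" GIT_AUTHOR_EMAIL="joer@example.org"
export GIT_COMMITTER_NAME="Joe User" GIT_COMMITTER_EMAIL="joer@example.org"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "last revision already in svn"

# Simulated svn tracking ref; in real use git svn fetch/rebase maintains it.
git update-ref refs/remotes/hedgehog HEAD
# Local branch with the same short name, as created by git checkout -b remotes/hedgehog.
git branch remotes/hedgehog refs/remotes/hedgehog

# New work on master that svn has not seen yet.
git commit -q --allow-empty -m "updated high kick function"
git commit -q --allow-empty -m "bumped version to 0.1.1"

# The merge step of the workflow (a fast-forward here).
git checkout -q remotes/hedgehog
git merge -q master

# Commits not yet in svn, oldest first; the full ref name avoids the ambiguity.
git log --reverse --format='%h %s' refs/remotes/hedgehog..HEAD
```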

Example configuration file (.git/config)

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = git@github.com:joer/joemisc.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[svn-remote "hedgehog"]
	url = https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/joemisc
	fetch = :refs/remotes/hedgehog

Known issues

Commits to svn require a commit message of at least 10 characters, whereas git has no such requirement. A shorter git commit message may therefore make git svn dcommit fail. If the commits, and in particular the offending one, have already been pushed to GitHub, updating the commit message is not advised, as it would change the public history of the repository. I am currently not aware of a good way to fix this; if you have one, please update this section accordingly or get in touch.
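Since a short message is easy to reword as long as it has not left the machine, one option is to check messages before pushing to GitHub or running git svn dcommit. A minimal sketch: the function name short_svn_messages, its default range refs/remotes/hedgehog..HEAD and the demo repository are all hypothetical; only the 10-character threshold comes from the rule stated above.

```shell
set -e
# Hypothetical helper: print the commits in a range whose full message is
# shorter than a minimum length (default 10, the svn limit mentioned above).
short_svn_messages() {
  local range="${1:-refs/remotes/hedgehog..HEAD}" min="${2:-10}" sha msg
  for sha in $(git rev-list "$range"); do
    # $(...) strips trailing newlines, so only the message text is measured.
    msg=$(git log -1 --format=%B "$sha")
    if [ "${#msg}" -lt "$min" ]; then
      printf '%s %s\n' "$(git rev-parse --short "$sha")" "$msg"
    fi
  done
}

# Demo in a throwaway repository (example identity and messages).
export GIT_AUTHOR_NAME="Joe User" GIT_AUTHOR_EMAIL="joer@example.org"
export GIT_COMMITTER_NAME="Joe User" GIT_COMMITTER_EMAIL="joer@example.org"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "initial import of joemisc"
git update-ref refs/remotes/hedgehog HEAD   # pretend this is the svn tip
git commit -q --allow-empty -m "typo"
git commit -q --allow-empty -m "updated high kick function"
short_svn_messages   # lists only the "typo" commit
```

Any commit it lists can still be reworded locally, for example with git commit --amend or an interactive rebase, before it is pushed anywhere.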

Troubleshooting

Unable to determine upstream SVN information

The dreaded message indicating that git and svn got out of sync is "Unable to determine upstream SVN information from working tree history". This can happen, for example, if changes were committed to hedgehog independently and one did not run git svn rebase. Inspect your git log with git log --graph --decorate --pretty=oneline --abbrev-commit --all to identify such cases.
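Since this log command comes up repeatedly when debugging git-svn, it may be worth saving as a git alias. A sketch in a scratch repository; the alias name graph is an arbitrary choice, and the alias is stored in the repository's own config (git config --global would define it for every repository):

```shell
set -e
# Example identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME="Joe User" GIT_AUTHOR_EMAIL="joer@example.org"
export GIT_COMMITTER_NAME="Joe User" GIT_COMMITTER_EMAIL="joer@example.org"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "first commit"

# "graph" is a hypothetical alias name for the inspection command above.
git config alias.graph "log --graph --decorate --pretty=oneline --abbrev-commit --all"
git --no-pager graph
```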

Useful references to sort such cases out are

If, when setting up git-svn for the first time, you get this issue, try the following. First look at the log with git log --graph --decorate --pretty=oneline --abbrev-commit --all. In my case, I get

* ffb1b97 (hedgehog) adding ProtGenerics to Manifest
* 9116531 (HEAD, origin/master, origin/HEAD, remotes/hedgehog, master) remove extra blank lines
* 7440967 update version
* 2da6931 update namespace
* fabaf4f first commit

Commit ffb1b97 is loose: it is not connected to the git commit history, so we need to graft it onto that history.

$ git log --pretty=oneline hedgehog | tail -n1                                                                  
ffb1b97b0f00a46ad3593fe4bd150d240a186616 adding ProtGenerics to Manifest

Now we want to connect ffb1b97 to the git history by making it the parent of a git commit, for example 9116531, the current head. We look up its full SHA:

$ git log --pretty=oneline master | head -n1
9116531544c5d551a92089b7f511018d48232b58 remove extra blank lines

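(Any other git commit, down to the first one, would also do. A graft replaces the parents of the commit it names, so grafting the first commit fabaf4f instead keeps the whole git history reachable.) The graft line lists the commit first, followed by its new parent: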

echo "9116531544c5d551a92089b7f511018d48232b58 ffb1b97b0f00a46ad3593fe4bd150d240a186616" >> .git/info/grafts

and now

$ git svn rebase
Current branch remotes/hedgehog is up to date.
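Recent git versions deprecate .git/info/grafts in favour of git replace --graft, which records the same parent rewrite as a replace ref under refs/replace/ (removable again with git replace -d). The sketch below reproduces the grafts line above in a scratch repository, with two unrelated root commits standing in for 9116531 and ffb1b97; whether git svn rebase treats the replacement exactly like the graft is an assumption not verified here.

```shell
set -e
# Example identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME="Joe User" GIT_AUTHOR_EMAIL="joer@example.org"
export GIT_COMMITTER_NAME="Joe User" GIT_COMMITTER_EMAIL="joer@example.org"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Stand-in for the git head (9116531 in the log above).
git commit -q --allow-empty -m "remove extra blank lines"
git_head=$(git rev-parse HEAD)

# Stand-in for the loose svn commit ffb1b97: an unrelated root commit.
git checkout -q --orphan svn-tip
git commit -q --allow-empty -m "adding ProtGenerics to Manifest"
svn_commit=$(git rev-parse HEAD)
git checkout -q master

# Equivalent of the grafts line "<git_head> <svn_commit>": the parents of
# git_head are replaced by svn_commit.
git replace --graft "$git_head" "$svn_commit"
git log --format='%h %s' master
```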

If everything else fails

As a last resort, Dan's git-svn bridge can be used temporarily to resolve desperate git-svn issues. Note, however, that when the bridge is created the two code bases are not merged: one overwrites the other (that is the "git wins" or "svn wins" option). It is thus important to make sure that either the svn or the GitHub repository includes all the changes before proceeding with this option.

Appendix I: A Simpler Way

Sometimes you don't need all the steps above. Here is a simpler way provided by Jim Hester:

If you don't already have a GitHub repository, create one and push it to GitHub (it can be empty). Then clone it:

git clone https://github.com/myusername/mypackagename.git

Go to the working copy:

cd mypackagename

Set up the svn side of the repos:

git svn init https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/mypackagename
git update-ref refs/remotes/git-svn refs/remotes/origin/master
git svn rebase
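The update-ref line makes refs/remotes/git-svn, the ref that git svn init associates with its default svn remote, point at the same commit as GitHub's master. Its effect can be checked without an svn server. In this sketch a local bare repository stands in for the GitHub repository, the git svn init step is left out, and the bare repository is seeded with one commit so that refs/remotes/origin/master exists after the clone.

```shell
set -e
# Example identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME="Joe User" GIT_AUTHOR_EMAIL="joer@example.org"
export GIT_COMMITTER_NAME="Joe User" GIT_COMMITTER_EMAIL="joer@example.org"
work=$(mktemp -d)

# Stand-in for the GitHub repository, seeded with one commit on master.
git init -q --bare "$work/mypackagename.git"
git --git-dir="$work/mypackagename.git" symbolic-ref HEAD refs/heads/master
git clone -q "$work/mypackagename.git" "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "first commit"
git -C "$work/seed" push -q origin master

# The recipe's clone, then the update-ref step (git svn init omitted here).
git clone -q "$work/mypackagename.git" "$work/mypackagename"
cd "$work/mypackagename"
git update-ref refs/remotes/git-svn refs/remotes/origin/master
git show-ref refs/remotes/git-svn refs/remotes/origin/master
```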

Whenever you need the latest changes from svn, make sure you are on the master branch and run

git svn rebase

This will change the contents of your master branch; you will then need to add, commit and push the changes to keep the git side in sync.

When you need to commit to svn, again from the master branch, run:

git svn dcommit --add-author-from
