Prerequisites:
- a basic knowledge of the Unix shell (cf. day one of bootcamp)
After completing this course you will know:
- what version control is and what it can do for you
- how to set up a new repository for your project
- how to track files, record changes and view their history
- how to revert to a previous state of your project
- how to share your code publicly
Introduction to version control
==================================
- Some common issues arise when files are not version-controlled:
A lab notebook for files:
- Lab books make lab work traceable. Analyses should also be traceable.
- Analysis steps must be recorded, and reverting to any previous step must be possible.
- This ensures that we always know exactly how a result was generated.
- The first researcher who will need to reproduce your results is likely to be you.
- Version control is a tool to keep track of file changes.
- However, version control software offers more than simply recording successive versions of a file:
- Version controlled projects can be forked, merged and shared with collaborators.
- This is useful both for collaborative work and for a single researcher.
- V1, V2, and V3 are successive versions of the script
- V4 is committed, but then a mistake is found. We revert to V3
- A new, correct V4 is committed
- V5 and V6 are successive versions of the script
- At this point, we want to implement a new feature that might be interesting, but which is experimental. In order to keep V6 clean, we create a new branch in which we can experiment with the script without damaging the stable V6
- V6b and V7b are successive versions of the experimental script
- At some point, the experimental changes are mature and we want to merge them back into the master branch. V7b and V6 are merged together into V8
- We realise we want to revert to a previous version of one function in the script. For this function, we revert to the code present in V2, keep all the rest as it is in V8 and commit it as V9
- V10 is the next commit
- Existing version control tools
- Subversion
- Bazaar
- Mercurial
- Git, which is one of the most popular ones nowadays
- Online servers for repositories
-
We want to write a cookbook of recipes. We decide to create a folder to hold all our recipes, with one file for each recipe.
-
We will use Git to track the changes in our recipe folder.
-
You will first learn how to use Git with the command line to understand how it works. Later, you can use one of the numerous Git graphical user interfaces to use Git with your projects.
-
Log into the remote server using `ssh` (GNU/Linux or Mac) or `putty` (Windows).
- For an `ssh` connection: `ssh jyybioxx@130.234.109.113`
-
Username: `jyybioxx`
-
Password: on the whiteboard!
-
Create a new folder for your recipes:
    mkdir cookbook
    # Go into the new folder
    cd cookbook
-
Create an empty file for your first recipe:
touch pancakes
-
Edit your file with `nano`. Nano is a basic text editor that can be used from the command line.
- Live demonstration!
-
Nano usage:
- `nano pancakes` to start editing
- Type text as you wish
- Use arrows to move around your text
- Press `CTRL + O` to save your edited text
- Press `CTRL + X` to exit
-
Fill in some ingredients for your pancake recipe:
    Pancake recipe
    Ingredients:
    * 500g of flour
    * 5 eggs
    * 1 liter of milk
-
Save your edited file and go back to the command line prompt.
-
Now we are ready to track our recipe file. First we need to initiate a Git repository in our project folder:
    # Make sure the current folder is the cookbook folder
    pwd
    ls
    # Initialize an empty Git repository
    git init
-
What happened?
-
Each time you want to start using version control for a project, you first have to create an empty repository with `git init` in the project folder.
-
Git stores all its information in the `.git` folder.
- Folders and files whose names start with a dot are hidden from the `ls` output by default, but you can force their display with:
    ls -a
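As a minimal sketch of what `git init` does, the commands below create a repository in a throwaway directory and list its content before and after. The temporary directory made with `mktemp -d` and the `demo_dir` variable are assumptions for illustration; during the course you would run `git init` directly in your `cookbook` folder.

```shell
# Work in a throwaway directory so that no real project is touched
# (demo_dir is a hypothetical name used only in this sketch)
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Before initialization, the listing contains no .git folder
ls -a

# Create an empty repository (-q hides the informational message)
git init -q

# The hidden .git folder now shows up in the listing
ls -a

# git status can now be used in this folder
git status
```

Deleting the `.git` folder would remove the whole recorded history, so it is best left alone.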
-
We can ask Git about the status of our current repository anytime with `git status`. Try it:
    git status
-
Git doesn't know yet which file we want to track. The first step is to specify which changes we want to record in our repository. We use the `git add` command for that:
    git add pancakes
-
What is the status now?
git status
-
Git has some changes ready to be saved (they are staged). To actually save them to the repository, we tell Git to commit the staged changes:
    # Specify a commit message after the -m option
    git commit -m "Create a recipe for pancakes"
-
What happened?
-
One of the key features of a version control system is that it attributes each change to an author. This ensures that all modifications can be traced back to the person who made them.
-
The first time you use Git, you have to configure it with your name and your email address. You have to do this only once.
-
Configure Git with:
    git config --global user.email "you@example.com"
    git config --global user.name "Your Name"
-
Try again to commit:
    # Specify a commit message after the -m option
    git commit -m "Create a recipe for pancakes"
-
It is very important to use concise and meaningful commit messages!
-
What is the current status of the repository?
-
Your list of ingredients is missing something. Update it:
    Pancake recipe
    Ingredients:
    * 500g of flour
    * 5 (or 4) eggs
    * 1 liter of milk
    * salt, oil
-
What is the status of the repository now?
-
Let's have a look at what actually changed with `git diff`:
    git diff
-
`git diff` compares lines by default, but we can make it compare word by word:
    git diff --word-diff
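The difference between the two views can be tried out with the sketch below. It runs in a throwaway repository (the `demo_dir` variable and the repository-local identity are assumptions for illustration) and changes a few words inside one line of the recipe.

```shell
# Throwaway repository (demo_dir is a hypothetical name used only here)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

echo "* 5 eggs" > pancakes
git add pancakes
git commit -q -m "Create a recipe for pancakes"

# Change a few words inside an existing line
echo "* 5 (or 4) eggs" > pancakes

# Line-based view: the old line is removed and the new line is added
git diff

# Word-based view: only the changed words are marked
git diff --word-diff

# Keep both outputs for later inspection
line_diff=$(git diff)
word_diff=$(git diff --word-diff)
```

In the word-based output, added words are wrapped in `{+ +}` and removed words in `[- -]`, which is often easier to read for prose than whole-line changes.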
-
Let's commit our changes:
git commit -m "Add missing ingredients for pancakes"
-
What happened?
-
Even if Git knows which files to track, by default it does not automatically commit changes in tracked files.
-
You first have to stage the changes by using `git add` again, and then commit them with `git commit`:
    git add pancakes
    git commit -m "Add missing ingredients for pancakes"
-
This might look inefficient, but it gives you more control over what you want to commit when several files have been changed.
-
Often, however, you want to commit all the changes in the tracked files in one go. In this case, you can use the shortcut:
    git commit -a -m "Add missing ingredients for pancakes"
    # which is equivalent to
    git commit -am "Add missing ingredients for pancakes"
-
The `-a` option tells Git to automatically stage all changes in tracked files before committing. New files that are not tracked yet are not included.
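The sketch below illustrates why the staging area gives more control: two files are modified, but only one change is staged and committed first. It assumes a throwaway repository and a second, hypothetical recipe file called `waffles` that is not part of the course material.

```shell
# Throwaway repository (demo_dir and the waffles file are hypothetical)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

# First commit: start tracking two recipe files
echo "flour" > pancakes
echo "butter" > waffles
git add pancakes waffles
git commit -q -m "Create pancake and waffle recipes"

# Change both files
echo "eggs" >> pancakes
echo "sugar" >> waffles

# Stage and commit only the pancakes change
git add pancakes
git commit -q -m "Add eggs to pancakes"

# waffles is still reported as modified
git status --short

# Commit every remaining change in tracked files in one go
git commit -q -a -m "Add sugar to waffles"

commit_count=$(git rev-list --count HEAD)
remaining_changes=$(git status --porcelain)
```

Each commit then contains one logical change, which keeps the history easier to read than a single commit mixing both recipes.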
-
Your repository history can be explored with:
git log
-
You can amend your last commit message with:
    git commit --amend -m "Add salt and oil for pancakes"
    # View history
    git log
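A small sketch of amending, run in a throwaway repository (the `demo_dir` variable, the identity settings and the first, deliberately vague commit message are assumptions for illustration). It shows that amending replaces the last commit instead of adding a new one.

```shell
# Throwaway repository (demo_dir is a hypothetical name used only here)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

echo "salt, oil" > pancakes
git add pancakes
# A commit message that is not very informative
git commit -q -m "Update"

# Replace the message of the last commit
git commit -q --amend -m "Add salt and oil for pancakes"

# The history still contains a single commit, with the new message
git log --oneline

last_message=$(git log -1 --format=%s)
commit_count=$(git rev-list --count HEAD)
```

Because the amended commit gets a new hash, amending is best reserved for commits that have not been shared with anyone yet.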
-
You can have a look at the Git log of ggplot2 for an example of history for a large project.
- Tracking a file and committing changes
- The staging area (and how to use the `-a` option)
- Amend commit messages
- Git log to explore project history
-
Add some instructions about how to make the pancake dough.
-
Commit your changes:
    git status
    git diff
    git commit -am "Add preparation instructions for the pancake recipe"
-
Add more information about the cooking method. Commit your changes.
-
Have a look at your history. Are your commit messages clear enough?
-
Let's see what the overall difference is between your latest commit and your first commit.
-
You already know how to get the difference between the last commit and your current files with `git diff`. You can also use `git diff` to compare commits.
-
Each commit is identified by a unique commit hash:
    commit d26f19ab15bf2baa9b2eaa42946689a4289546b0
    Author: Matthieu Bruneaux <matthieu.bruneaux@gmail.com>
    Date:   Thu Nov 10 14:11:21 2016 +0200

        Basics for committing

    commit 9119038c82837229fccb44e9e309d0c307b4a6c3
    Author: Matthieu Bruneaux <matthieu.bruneaux@gmail.com>
    Date:   Thu Nov 10 14:11:01 2016 +0200

        Add note about no copy-paste
-
These commit hashes can be used to specify which commits to compare with `git diff`:
    git diff 9119038c82837229fccb44e9e309d0c307b4a6c3 d26f19ab15bf2baa9b2eaa42946689a4289546b0
-
However, you don't need to always type the full hash. Often, the first characters are enough:
git diff 9119038 d26f19a
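To practise comparing commits without copying hashes by hand, the sketch below builds a three-commit history in a throwaway repository and lets Git report the hashes. The `demo_dir` variable, the identity settings and the variable names holding the hashes are assumptions for illustration.

```shell
# Throwaway repository (demo_dir is a hypothetical name used only here)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

echo "500g of flour" > pancakes
git add pancakes
git commit -q -m "Create a recipe for pancakes"

echo "5 eggs" >> pancakes
git commit -q -am "Add eggs"

echo "1 liter of milk" >> pancakes
git commit -q -am "Add milk"

# Full hashes of the first commit (the one without a parent) and the last one
first_commit=$(git rev-list --max-parents=0 HEAD)
last_commit=$(git rev-parse HEAD)

# Abbreviated forms of the same hashes
first_short=$(git rev-parse --short "$first_commit")
last_short=$(git rev-parse --short "$last_commit")

# One line per commit, with abbreviated hashes
git log --oneline

# Overall difference between the first and the last commit
git diff "$first_short" "$last_short"

# Count the added lines, ignoring the "+++" file header
added_lines=$(git diff "$first_short" "$last_short" | grep -c '^+[^+]')
```

The order of the two hashes matters: swapping them shows the same changes as removals instead of additions.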
-
Use `git diff` and commit hashes to compare your first and your last commits.
-
Use the same method to compare your first and your second commits.
-
Add some ingredients so that your pancake becomes a Hawaiian pancake:
    Pancake recipe
    Ingredients:
    * 500g of flour
    * 5 (or 4) eggs
    * 1 liter of milk
    * salt, oil
    * pineapple juice
    * coconut syrup
-
Commit your changes.
-
Unfortunately, you heard that the National Finnish Institute for Pancakes has issued an official recommendation against pineapple in pancake dough. We have to revert to the previous version.
-
To revert to a previous version of `pancakes`, find the hash of the version you want to revert to in the Git history, and type:
    # Use the appropriate hash
    git checkout f32a121 -- pancakes
-
Commit your changes.
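The whole revert cycle can be rehearsed with the sketch below. It assumes a throwaway repository with a shortened recipe; the `demo_dir` and `good_version` names are hypothetical, and the hash is read from Git instead of being typed by hand.

```shell
# Throwaway repository (demo_dir is a hypothetical name used only here)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

echo "500g of flour" > pancakes
git add pancakes
git commit -q -m "Create a recipe for pancakes"

# Remember the hash of the version we will want to go back to
good_version=$(git rev-parse --short HEAD)

echo "pineapple juice" >> pancakes
git commit -q -am "Make the pancakes Hawaiian"

# Bring back the file as it was in the earlier commit
git checkout "$good_version" -- pancakes

# The restored content is already staged
git status --short

# Record the revert as a new commit
git commit -q -m "Remove pineapple from pancakes"

restored_content=$(cat pancakes)
commit_count=$(git rev-list --count HEAD)
```

The Hawaiian version is not lost: it stays in the history and could be restored later in the same way.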
- Use diff to compare files
- Commits are identified by unique hashes
- How to revert to a previous version of a file with `git checkout`
-
You think about adding a Christmas section to your book. You want to start working in this direction, but you are not totally sure you will end up using this version.
-
Let's create a new branch for our exploratory recipes:
    git branch christmas
    git checkout christmas
-
We are now working in the `christmas` branch. Nothing we do here will affect the `master` branch, which will remain clean.
-
Run `git status`. What do you observe?
-
Modify the recipe in `pancakes`:
    Pancake recipe
    Ingredients:
    * 500g of flour
    * 5 (or 4) eggs
    * 0.5 liter of milk
    * 0.5 liter of Glögi
    * salt, oil
    * cinnamon
-
Create a new recipe in a file called `snails`:
    Snails recipe
    Ingredients:
    * Burgundy snails
    * lots of garlic butter
-
Commit the changes to `pancakes` and the new file `snails`.
-
Have a look at your repository history.
-
Switch back to the master branch with:
git checkout master
-
Have a look at your folder content and at `pancakes`.
-
You now think that this Christmas project is a good thing and want to merge it with your master branch:
git merge christmas
-
Have a look at your repository history.
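The branch-and-merge workflow above can be replayed end to end with the following sketch. It assumes a throwaway repository with shortened recipes; the `demo_dir` variable and identity settings are hypothetical, and the initial branch is explicitly named `master` because some Git installations use a different default name.

```shell
# Throwaway repository (demo_dir is a hypothetical name used only here)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Name the initial branch master, as in this course
git symbolic-ref HEAD refs/heads/master
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

echo "1 liter of milk" > pancakes
git add pancakes
git commit -q -m "Create a recipe for pancakes"

# Create the experimental branch and switch to it
git branch christmas
git checkout -q christmas

echo "cinnamon" >> pancakes
echo "Burgundy snails" > snails
# snails is a new file, so it has to be added explicitly
git add pancakes snails
git commit -q -m "Add Christmas recipes"

# Back on master, the Christmas changes are not visible
git checkout -q master
ls

# Merge the experimental branch into master
git merge -q christmas

# Both recipes are now present on master
ls

current_branch=$(git rev-parse --abbrev-ref HEAD)
```

After the merge, `git log` on `master` also lists the commit that was made on the `christmas` branch.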
-
Repositories can easily be published online and copied locally from a remote location.
-
Copying a remote repository to your computer is called cloning.
-
Go to GitHub, a platform to host repositories.
-
Search for a repository of interest you might want to copy to your computer. In this example, we will clone the recipes repository from Hadley Wickham (GitHub repo).
-
Go back to your home folder with `cd`.
-
Clone the repository of your choice locally with:
    # First, we make sure that we are in our home folder
    cd
    # We can clone somebody else's repository
    git clone https://github.com/hadley/recipes.git
    # Change the repository address if you wish
-
Now cd into the cloned repository
-
Explore the history and commits of the repository. What were the changes in the last commit? Who made them? Are there several contributors?
-
Did the author(s) use any branches?
-
Any interesting commit message?
-
Any interesting branching structure?
-
Modify one of the files and commit your changes
-
Have a look at the history and feel proud.
-
Remember: your commit messages should be clear and to the point!
Advanced - Setting up a remote repository
============================================
-
You are pretty proud of your recipes and want to do some good in the world: let's share your cookbook publicly!
-
Let's use GitHub to host a public repository of your code.
Note: you don't have to create a GitHub account if you don't want to. We understand that you might not want to create yet another account on a remote service. If you prefer not to, find a bootcamp partner who has a GitHub account and follow the next session with them.
-
Go to GitHub and create an account.
-
Log in to your GitHub account.
-
Create a new repository for your cookbook project.
-
Go back to your project directory where you wrote your own cookbook.
-
Add a remote to your repository with:
    git remote add origin https://github.com/myusername/myrepo.git/
    # Use the appropriate address, which is given on the GitHub page of your repo
- `git remote`: command to manage remote repositories
- `add`: we create a new link between our local repo and a remote server
- `origin`: this new link is called `origin` for ease of use
- `https://github.com/....`: this is the address of the remote repository
-
You are ready to push your local repository to the GitHub server:
    git push origin master
- `git push`: command to push the local repository data to remote servers
- `origin`: the name of the link to a remote server we want to use (defined when we created the remote link with `git remote add ...`)
- `master`: the branch we want to push. For now we have only been working with a single branch, called `master` by default
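If you prefer to try the remote workflow without a GitHub account, the sketch below uses a bare repository on the local disk as a stand-in for the server. This stand-in, the `work_dir` variable and the `remote.git` path are assumptions for illustration; with GitHub, the address given to `git remote add` would be the URL shown on the page of your repo.

```shell
# Throwaway working area (work_dir is a hypothetical name used only here)
work_dir=$(mktemp -d)

# A bare repository on the local disk plays the role of the remote server
git init -q --bare "$work_dir/remote.git"

# Local cookbook repository
mkdir "$work_dir/cookbook"
cd "$work_dir/cookbook"
git init -q
# Name the initial branch master, as in this course
git symbolic-ref HEAD refs/heads/master
# Repository-local identity, so the global configuration is left alone
git config user.name "Your Name"
git config user.email "you@example.com"

echo "500g of flour" > pancakes
git add pancakes
git commit -q -m "Create a recipe for pancakes"

# Link the local repository to the stand-in remote under the name origin
git remote add origin "$work_dir/remote.git"

# List the configured remotes and their addresses
git remote -v

# Send the master branch to the remote
git push -q origin master

# The remote now holds the same commit as the local master branch
local_hash=$(git rev-parse master)
remote_hash=$(git --git-dir="$work_dir/remote.git" rev-parse master)
```

The same two commands, `git remote add` and `git push`, are used with a hosted service; only the address changes.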
-
Have a look at your repository on GitHub now. What does it look like?
-
Create a README file in your project folder, fill it with interesting information and commit it to your repository.
-
Push your changes to the remote repository:
git push origin master
-
Have a look at the remote repository on GitHub (you might need to refresh the browser page).