Skip to content

Latest commit

 

History

History
645 lines (451 loc) · 17.8 KB

version-control-with-git.md

File metadata and controls

645 lines (451 loc) · 17.8 KB

Before the course

Prerequisites:

  • a basic knowledge of the Unix shell (cf. day one of bootcamp)

After the course

After completing this course you will know:

  • what is version control and what it can do for you
  • how to set up a new repository for your project
  • how to track files, record changes and view their history
  • how to revert to a previous state of your project
  • how to share your code publicly
  1. Introduction to version control ==================================

1.1 A common problem

  • Some common issues arise when files are not version-controlled:

1.2 Reproducibility of research?

A lab notebook for files:

  • Lab books make lab work traceable. Analyses should also be traceable.
  • Analysis steps must be recorded, and reverting to any previous step must be possible.
  • This ensures that we always exactly know how a result was generated.
  • The first researcher who will need to reproduce your results is likely to be you.

1.3 A possible solution: version control

  • Version control is a tool to keep track of file changes.
  • However, version control softwares offer more than simply recording successive versions of a file:
    • Version controlled projects can be forked, merged and shared with collaborators.
    • Interesting both for collaborative work and for single researcher

1.4 Example of a version control flow for a Python script

  • V1, V2, and V3 are successive versions of the script
  • V4 is committed, but then a mistake is found. We revert to V3
  • A new, correct V4 is committed
  • V5 and V6 are successive versions of the script
  • At this point, we want to implement a new feature that might be interesting, but which is experimental. In order to keep V6 clean, we create a new branch in which we can experiment with the script without damaging the stable V6
  • V6b and V7b are successive versions of the experimental script
  • At some point, the experimental changes are mature and we want to merge them back into the master branch. V7b and V6 are merged together into V8
  • We realise we want to revert to a previous version of one function in the script. For this function, we revert to the code present in V2, keep all the rest as it is in V8 and commit it as V9
  • V10 is the next commit

1.5 What are the tools for version control?

  • Existing version control tools
  • Online servers for repositories
    • BitBucket (free private repositories)
    • GitHub (free for public repositories but not for private repositories)

2 Set up your project folder

  • We want to write a cookbook of recipes. We decide to create a folder to hold all our recipes, with one file for each recipe.

  • We will use Git to track the changes in our recipe folder.

  • You will first learn how to use Git with the command line to understand how it works. Later, you can use one of the numerous Git graphical user interfaces to use Git with your projects.

2.1 Connect to the server

  • Log into the remote server using ssh (GNU/Linux or Mac) or putty (Windows)

  • For ssh connection:

    ssh jyybioxx@130.234.109.113
    
  • Username: jyybioxx

  • Password: on the whiteboard!

2.2 Create your project folder

  • Create a new folder for your recipes

    mkdir cookbook
    # Go into the new folder
    cd cookbook
    
  • Create an empty file for your first recipe:

    touch pancakes
    

2.3 Write some text

  • Edit your file with nano. Nano is a basic text editor which can be used from the command line.

  • Live demonstration!

  • Nano usage:

    • nano pancakes to start editing
    • Type text as you wish
    • Use arrows to move around your text
    • Press CTRL + O to save your edited text
    • Press CTRL + X to exit
  • Fill in some ingredients for your pancake recipe:

    Pancake recipe
    
    Ingredients:
    * 500g of flour
    * 5 eggs
    * 1 liter of milk
    
  • Save your edited file and go back to the command line prompt.

3 Git basics - Tracking files and committing changes

3.1 Initialize a Git repository

  • Now we are ready to track our recipe file. First we need to initiate a Git repository in our project folder:

    # Make sure the current folder is the cookbook folder
    pwd
    ls
    # Initialize an empty Git repository
    git init
    
  • What happened?

  • Each time you want to start using version control for a project, you have first to create an empty repository with git init in the project folder.

Where does Git store its files?

  • Git stores all its information in the .git folder.

  • Folders and files whose name starts with a dot are hidden from the ls output by default, but you can force their display with:

    ls -a
    

3.2 Check current status, track and commit your changes

  • We can ask Git about the status of our current repository anytime with git status. Try it:

    git status
    
  • Git doesn't know yet which file we want to track. The first step is to specify which changes we want to record in our repository. We use the git add command for that:

    git add pancakes
    
  • What is the status now?

    git status
    
  • Git has some changes ready to be saved (they are staged). To actually save them to the repository, we tell git to commit the staged changes:

    # Specify a commit message after the -m option
    git commit -m "Create a recipe for pancakes"
    
  • What happened?

Tell Git who you are

  • One of the key feature of a version control system is to assign each change to someone. This ensures that all modifications can be traced to their original author.

  • The first time you use Git, you have to configure it with your name and your email address. You have to do this only once.

  • Configure Git with:

    git config --global user.email "you@example.com"
    git config --global user.name "Your Name"
    

Back to the commit

  • Try again to commit:

    # Specify a commit message after the -m option
    git commit -m "Create a recipe for pancakes"
    
  • It is very important to use concise and meaningful commit messages!

  • What is the current status of the repository?

3.3 Commit more changes

  • Your list of ingredients is missing something. Update it:

    Pancake recipe
    
    Ingredients:
    * 500g of flour
    * 5 (or 4) eggs
    * 1 liter of milk
    * salt, oil
    
  • What is the status of the repository now?

  • Let's have a look at what actually change with git diff:

    git diff
    
  • git diff compare lines by default, but we can make it work by "words":

    git diff --word-diff
    
  • Let's commit our changes:

    git commit -m "Add missing ingredients for pancakes"
    
  • What happened?

The staging area

  • Even if Git knows which files to track, by default it does not commit automatically changes in tracked files.

  • You have first to stage the changes by using git add again, and then to commit them with git commit:

    git add pancakes
    git commit -m "Add missing ingredients for pancakes"
    
  • This might look inefficient, but it gives you more control over what you want to commit when several files have been changed.

  • Often, however, you want to commit all the changes in the tracked files in one go. In this case, you can use the shortcut:

    git commit -a -m "Add missing ingredients for pancakes"
    # which is equivalent to
    git commit -am "Add missing ingredients for pancakes"
    
  • The -a option tells Git to automatically add all changes in tracked files for commit.

3.4 Explore history

  • Your repository history can be explored with:

    git log
    
  • You can amend your last commit message with:

    git commit --amend -m "Add salt and oil for pancakes"
    # View history
    git log
    
  • You can have a look at the Git log of ggplot2 for an example of history for a large project.

What we learnt about in this section

  • Tracking a file and committing changes
  • The staging area (and how to use the -a option)
  • Amend commit messages
  • Git log to explore project history

4 Git basics - Commit hashes and revert to previous versions

4.1 Write some recipe instructions

  • Add some instructions about to make the pancake dough

  • Commit your changes:

    git status
    git diff
    git commit -am "Add preparation instructions for the pancake recipe"
    
  • Add more information about the cooking method. Commit your changes.

  • Have a look at your history. Are your commit messages clear enough?

4.2 Diff

  • Let's see what is the overall difference between your latest commit and the first commit you did.

  • You already know how to get the difference between the last commit and your current files with git diff. You can also use git diff to compare commits.

A word about commit hash

  • Each commit is identified by a unique commit hash

    commit d26f19ab15bf2baa9b2eaa42946689a4289546b0
    Author: Matthieu Bruneaux <matthieu.bruneaux@gmail.com>
    Date:   Thu Nov 10 14:11:21 2016 +0200
    
        Basics for committing
    
    commit 9119038c82837229fccb44e9e309d0c307b4a6c3
    Author: Matthieu Bruneaux <matthieu.bruneaux@gmail.com>
    Date:   Thu Nov 10 14:11:01 2016 +0200
    
        Add note about no copy-paste
    
    
  • These commit hashes can be used to specify which commits to compare with git diff:

    git diff 9119038c82837229fccb44e9e309d0c307b4a6c3 d26f19ab15bf2baa9b2eaa42946689a4289546b0
    
  • However, you don't need to always type the full hash. Often, the first characters are enough:

    git diff 9119038 d26f19a
    

Do the diff

  • Use git diff and commit hashes to compare your first and your last commits.

  • Use the same method to compare your first and your second commit?

4.3 Revert

  • Add some ingredients so that your pancake becomes a Hawaiian pancake:

    Pancake recipe
    
    Ingredients:
    * 500g of flour
    * 5 (or 4) eggs
    * 1 liter of milk
    * salt, oil
    * pineapple juice
    * coconut syrup
    
  • Commit your changes.

  • Unfortunately, you heard that the National Finnish Institute for Pancakes emitted an official recommendation against pineapple in pancake dough. We have to revert to the previous version.

  • To revert to a previous version of pancakes, observe the hash of the version you want to revert to in Git history, and type:

    # Use the appropriate hash
    git checkout f32a121 -- pancakes
    
  • Commit your changes.

What we learnt about in this section

  • Use diff to compare files
  • Commits are identified by unique hashes
  • How to revert to a previous version of a file with git checkout

5 Intermediate - Branching and merging

  • You think about adding a Christmas section to your book. You want to start working in this direction, but you are not totally sure you will end up using this version.

  • Let's create a new branch for our exploratory recipes:

    git branch christmas
    git checkout christmas
    
  • We are now working in the christmas branch. Everything we do here will not have any effect on the master branch, which will remain clean.

  • Run git status. What do you observe?

  • Modify the recipe in pancakes:

    Pancake recipe
    
    Ingredients:
    * 500g of flour
    * 5 (or 4) eggs
    * 0.5 liter of milk
    * 0.5 liter of Glögi
    * salt, oil
    * cinnamon
    
  • Create a new recipe in a file called snails:

    Snails recipe
    
    Ingredients:
    * Burgundy snails
    * lots of garlic butter
    
  • Commit the changes to pancakes and the new file snails

  • Have a look to your repository history

  • Switch back to the master branch with:

    git checkout master
    
  • Have a look at your folder content and at pancakes.

  • You now think that this Christmas project is a good thing and want to merge it with your master branch:

    git merge christmas
    
  • Have a look at your repository history.

6 Intermediate - Cloning a remote repository

  • Repositories can easily be published online and copied locally from a remote location.

  • Copying a remote repository to your computer is called cloning.

6.1 Find an interesting repository to clone on GitHub

  • Go to GitHub, a platform to host repositories.

  • Search for a repository of interest you might want to copy to your computer. In this example, we will clone the recipes repository from Hadley Wickham (GitHub repo).

  • Go back to your home folder with cd

  • Clone the repository of your choice locally with:

    # First, we make sure that we are in our home folder
    cd
    # We can clone somebody else's repository
    git clone https://github.com/hadley/recipes.git
    # Change the repository address if you wish
    

6.2 Explore the repository locally

  • Now cd into the cloned repository

  • Explore the history and commits of the repository. What were the changes in the last commit? Who did it? Are there several contributors?

  • Did the author(s) use any branches?

  • Any interesting commit message?

  • Any interesting branching structure?

  • Modify one of the files and commit your changes

  • Have a look at the history and feel proud.

  • Remember: your commit messages should be clear and to the point!

(original link)

  1. Advanced - Setting up a remote repository ============================================
  • You are pretty proud of your recipes and want to do good to the world: let's share your cookbook publicly!

  • Let's use GitHub to host a public repository of your code.

7.1 Create a GitHub account

Note: you don't have to create a GitHub account if you don't want to - we totally understand you might be concerned about creating yet-another-account on a remote service. So please don't feel obliged to do so, and if you prefer not to do it just find a bootcamp partner who has a GitHub account to follow the next session with him/her.

  • Go to GitHub and create an account.

  • Login to your GitHub account.

  • Create a new repository for your cookbook project.

7.2 Adding a remote to your local repository

  • Go back to your project directory where you wrote your own cookbook.

  • Add a remote to your repository with:

    git remote add origin https://github.com/myusername/myrepo.git/
    # Use the appropriate address, which is given on the GitHub page of your repo
    
    • git remote: command to manage remote repositories
    • add: we create a new link between our local repo and a remote server
    • origin: this new link is called origin for ease of use
    • https://github.com/....: this is the address of the remote repository
  • You are ready to push your local repository to the GitHub server:

    git push origin master
    
    • git push: command to push the local repository data to remote servers
    • origin: the name of the link to a remote server we want to use (defined when we created the remote link with git remote add ...)
    • master: the branch we want to push. For now we have only been working with a single, master branch called master by default
  • Have a look to your repository on GitHub now. How does it look like?

7.3 Pushing local changes to a remote server

  • Create a README file in your project folder, fill it with interesting information and commit it to your repository.

  • Push your changes to the remote repository:

    git push origin master
    
  • Have a look to the remote repository on GitHub (you might need to refresh the browser page)