2. Git hands on

Diego Garrido-Martín edited this page Oct 28, 2018 · 38 revisions

1. Hello Git

Git is software that allows you to keep track of changes made to a project over time. Git works by recording the changes you make to a project, storing those changes, then allowing you to reference them as needed. You probably already have it installed; otherwise, just follow these steps.

Let's create a folder called git_HandsOn and move to it. Then, you can use your favourite text editor (I would use nano) to create a bash script called seqClass.sh with the following content:

#!/bin/bash
seq=$1
if [[ $seq =~ ^[ACGTU]+$ ]]; then
  if [[ $seq =~ T ]]; then
    echo "The sequence is DNA"
  elif [[ $seq =~ U ]]; then
    echo "The sequence is RNA"
  else
    echo "The sequence can be DNA or RNA"
  fi
else
  echo "The sequence is not DNA nor RNA"
fi

Imagine that you have written this simple script to determine whether a sequence is RNA or DNA. To use it, you can try:

bash seqClass.sh AGTG
# The sequence is DNA
bash seqClass.sh ACGUA
# The sequence is RNA

However, you realize it has some flaws that you would like to fix:

bash seqClass.sh agtg
# The sequence is not DNA nor RNA
bash seqClass.sh UTUT
# The sequence is DNA

We will learn the basics of Git by using it to keep track of the changes made to this simple bash script. If this is the first time you use Git, you may be asked at some point to set several configuration options. Please do so as requested.

(Before continuing, also create an empty file named motifFinder.sh in the git_HandsOn folder by typing touch motifFinder.sh. We will use it later on.)

2. Basic Git workflow

git init

Now that we have started working on our script, let’s turn the git_HandsOn directory into a Git project. We do this using:

git init

The word init means initialize. The command sets up all the tools Git needs to begin tracking changes made to the project.

Task 1: Initialize an empty Git repo in the git_HandsOn folder. Notice the message Initialized empty Git repository in /path/to/git_HandsOn/.git/
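As a minimal sketch of what git init does, the commands below run it in a throwaway directory and check that the hidden .git folder appears. The temporary directory (created with mktemp) and the variable names are assumptions made for this illustration, so that your real git_HandsOn folder is left untouched.

```shell
# Throwaway directory standing in for git_HandsOn (assumption for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1

# -q suppresses the "Initialized empty Git repository" message
git init -q

# Everything Git needs to track the project lives in the hidden .git folder
if [ -d .git ]; then
  init_result="repository created"
else
  init_result="no repository"
fi
echo "$init_result"
```

Deleting the .git folder would turn the directory back into an ordinary, untracked folder.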

git status

As you modify seqClass.sh, you will be changing the contents of the working directory. You can check the status of those changes with:

git status

Task 2: Check the status of the git_HandsOn project. In the output, notice the file in red under untracked files. Untracked means that Git sees the file but has not started tracking changes yet.

Hint: Review the basic Git workflow in our slides!

git add

In order for Git to start tracking seqClass.sh, the file needs to be added to the staging area. We can add a file to the staging area with:

git add <file>

Task 3: Add seqClass.sh to the staging area and check the status of the project. In the output, notice that Git indicates the changes to be committed with new file: seqClass.sh in green text. Here Git tells us the file was added to the staging area.

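Hint: You can use git add filename1 filename2 to add more than one file to the staging area. To add all the files in the current directory use git add . (git add * also works, but it relies on shell expansion and therefore skips hidden files).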
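The change from untracked to staged can be seen compactly with git status --short, which prints a two-letter code instead of the coloured long output. The sketch below assumes a throwaway repository and a placeholder one-line seqClass.sh; both are illustrative only.

```shell
# Throwaway repository (assumption for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
echo '#!/bin/bash' > seqClass.sh     # placeholder content

before=$(git status --short)         # "??" marks an untracked file
git add seqClass.sh
after=$(git status --short)          # "A " marks a new file in the staging area

echo "before: $before"
echo "after:  $after"
```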

git commit

A commit permanently stores changes from the staging area inside the repository. To commit the changes in your file(s) you can use:

git commit -m "user-defined message"

Task 4: Make your first commit!
Hint: You can add a brief (maximum 50 char) message, in quotes, using the option -m. You can use git status to make sure there is nothing else to commit.
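A rough sketch of the add and commit cycle, run in a throwaway repository. The identity set with git config and the commit message are hypothetical values chosen for the example; on your machine Git will use whatever you configured.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo '#!/bin/bash' > seqClass.sh

git add seqClass.sh
git commit -q -m "Add seqClass.sh"

remaining=$(git status --short)            # empty when nothing is left to commit
n_commits=$(git rev-list --count HEAD)     # number of commits reachable from HEAD
echo "commits: $n_commits, pending changes: '$remaining'"
```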

git diff

Remember that we wanted to fix some flaws in our script. Among them, it does not deal with lower-case sequences. Let's fix it by converting the input sequence to upper case. Now the script should look like:

#!/bin/bash
seq=$1
seq=$(echo $seq | tr a-z A-Z)  # Note we just added this line
if [[ $seq =~ ^[ACGTU]+$ ]]; then
  if [[ $seq =~ T ]]; then
    echo "The sequence is DNA"
  elif [[ $seq =~ U ]]; then
    echo "The sequence is RNA"
  else
    echo "The sequence can be DNA or RNA"
  fi
else
  echo "The sequence is not DNA nor RNA"
fi

Notice that now this should work properly:

bash seqClass.sh agtg

Since the file is tracked, we can check the differences between the working directory and the staging area with:

git diff <file>

Task 5: Use this command to check the difference between the working directory and the staging area. Lines added to the file are marked with a + and shown in green. Then commit the changes in seqClass.sh.
Hint: Remember you should add the changes to the staging area before committing using git add <file>.
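To make the comparison concrete, the sketch below shows that git diff stops reporting a change once it has been staged, while git diff --staged (which compares the staging area with the last commit) starts reporting it. The throwaway repository, identity and two-line script are assumptions for illustration.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '#!/bin/bash\nseq=$1\n' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"

# New, not yet staged edit in the working directory
echo 'seq=$(echo $seq | tr a-z A-Z)' >> seqClass.sh

unstaged=$(git diff seqClass.sh)           # working directory vs staging area
git add seqClass.sh
after_add=$(git diff seqClass.sh)          # empty: nothing left unstaged
staged=$(git diff --staged seqClass.sh)    # staging area vs last commit

# Show only the added line, which starts with +
echo "$unstaged" | grep '^+seq'
```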

git log

Often with Git, you'll need to refer back to an earlier version of a project. Commits are stored chronologically in the repository and can be viewed with:

git log

Task 6: List your commits. In the output, notice 1) a 40-character code, called a SHA, that uniquely identifies the commit (in orange), 2) the commit author, 3) the date and time of the commit and 4) the commit message.
Hint: You may need to type q to exit the log mode.
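A short sketch of inspecting the history in a more compact form. It assumes a throwaway repository with two hypothetical commits; git log --oneline prints an abbreviated SHA and the message for each commit.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo '#!/bin/bash' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"
echo 'seq=$1' >> seqClass.sh
git commit -q -am "Read the sequence from the first argument"

# --no-pager prints directly to the terminal, so there is no log mode to exit
git --no-pager log --oneline

n_commits=$(git rev-list --count HEAD)
full_sha=$(git rev-parse HEAD)             # full SHA of the latest commit
short_sha=$(git rev-parse --short HEAD)    # abbreviated form, a prefix of the full SHA
latest_msg=$(git log -1 --format=%s)       # subject line of the latest commit
```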

3. Backtracking

When working on a Git project, sometimes we make changes that we want to get rid of. Git offers a few eraser-like features that allow us to undo mistakes during project creation.

The HEAD commit

In Git, the commit you are currently on is known as the HEAD commit. In many cases, the most recently made commit is the HEAD commit. To see the HEAD commit, enter:

git show HEAD

The output of this command will display everything the git log command displays for the HEAD commit, plus all the file changes that were committed.

Task 6: Display the last commit using git show HEAD.
Hint: You can replace HEAD with any commit, identified by the first characters of its SHA.

git checkout

What if you decided to make a change in your script, but then realized you wanted to discard that change? You could rewrite the line how it was originally, but what if you forgot the exact wording? The command:

git checkout HEAD <file>

will restore the file in your working directory to look exactly as it did when you last made a commit.

Task 7: Edit your script to modify the message it prints when the sequence is neither DNA nor RNA. Then, undo the edit using git checkout.
Hint: You can replace HEAD with any commit, identified by the first characters of its SHA.
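The following sketch walks through discarding an unwanted edit with git checkout HEAD <file>. The throwaway repository, identity and the appended line are assumptions for illustration.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '#!/bin/bash\necho "The sequence is not DNA nor RNA"\n' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"

echo 'echo "an edit I regret"' >> seqClass.sh
changed=$(git status --short)       # " M" marks an unstaged modification

git checkout HEAD seqClass.sh       # restore the file from the last commit
restored=$(git status --short)      # empty again
n_lines=$(grep -c '' seqClass.sh)   # back to the two committed lines
echo "lines after checkout: $n_lines"
```

Note that the discarded edit cannot be recovered afterwards, since it was never committed.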

At this stage, you may also find useful commands such as git rm or git clean (review our slides).

A much more general use of this command allows you to check out an old version of your working directory (git checkout <SHA>) or a file (git checkout <SHA> <file>). You can return to master using git checkout master. Try it!

git reset

What if, before you commit, you accidentally delete an important line from your script? Unthinkingly, you add the file to the staging area. However, you do not want to commit this change! You can unstage that file from the staging area using:

git reset HEAD <file>

Task 8: Remove any line from your script. Then, add the changes to the staging area, and undo this action using git reset.
Hint: This command resets the file in the staging area to be the same as the HEAD commit. It does not discard file changes from the working directory; it just removes them from the staging area, so you will need to use git checkout to recover the erased line in your working directory!
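The two-step recovery described in the hint can be sketched as follows: git reset only unstages the deletion, and git checkout then brings the line back. The throwaway repository, identity and two-line script are assumptions for illustration.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '#!/bin/bash\nseq=$1\n' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"

printf '#!/bin/bash\n' > seqClass.sh               # accidentally drop the second line
git add seqClass.sh                                # ...and stage the mistake
staged_before=$(git diff --staged --name-only)     # seqClass.sh is staged

git reset -q HEAD seqClass.sh                      # unstage only
staged_after=$(git diff --staged --name-only)      # nothing staged any more
lines_after_reset=$(grep -c '' seqClass.sh)        # the line is still missing on disk

git checkout HEAD seqClass.sh                      # now restore the working copy
lines_after_checkout=$(grep -c '' seqClass.sh)
echo "lines: $lines_after_reset after reset, $lines_after_checkout after checkout"
```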

git revert

If you make a change, stage it and commit it, and then decide that you do not want this edit anymore, you can still revert it using:

git revert HEAD

The new commit is the inverse of the last commit. This technically undoes the last commit, although it still exists in the history.

Task 9: Edit your script to add a comment line explaining what it does, stage it and commit. Then use git revert to undo your commit.
Hint: You will be asked to add a message for this action. Leave the default message and close the editor. You can use git log to check that git revert worked properly. Note that git revert does discard file changes from the working directory. (You can actually revert any commit using its SHA.) Remember that you can also use git reset to remove the last commit (see our slides). What is the difference from git revert? Additionally, you can fix a commit using git commit --amend (see our slides).
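A minimal sketch of a revert, showing that the history grows by one commit rather than shrinking. The throwaway repository, identity and commit messages are assumptions for illustration; the --no-edit option accepts the default message without opening an editor.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '#!/bin/bash\nseq=$1\n' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"

echo '# Classifies a sequence as DNA or RNA' >> seqClass.sh
git commit -q -am "Add a comment"

# Create a new commit that applies the inverse of the last one
git revert --no-edit HEAD > /dev/null

n_commits=$(git rev-list --count HEAD)    # original, edit and revert
last_msg=$(git log -1 --format=%s)        # default message: Revert "..."
n_lines=$(grep -c '' seqClass.sh)         # the comment line is gone again
echo "$n_commits commits, last: $last_msg"
```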

4. Branching

git branch

Up to this point, you have worked with a single Git branch called master. Git allows us to create branches to experiment with different versions of a project. Imagine you want to add new features to seqClass.sh. You could create a new branch and make the changes to that branch only, without affecting the master branch.

You can check which branch you are on using

git branch

and create a new branch with

git branch <branch name>

Task 10: Create a new branch named motif and check on which branch you are located.
Hint: the * (asterisk) indicates which branch you are on.

Currently, the master and motif branches are identical: they share the same exact commit history. You can switch to the new branch with

git checkout <branch name>

Task 11: Switch branch to motif. Verify that you switched branches successfully and that the commit history of both branches is identical.
Hint: Remember other uses of git checkout, such as travelling through the commit history or undoing unstaged changes in the working directory. You may also need to use git branch and git log.
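One way to verify both points is sketched below, assuming a throwaway repository with a single hypothetical commit. Because recent Git installations may name the initial branch main instead of master, the sketch reads the name of the starting branch rather than hard-coding it.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo '#!/bin/bash' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"

main_branch=$(git rev-parse --abbrev-ref HEAD)   # master (or main on newer setups)

git branch motif          # create the branch; we are still on the starting branch
git checkout -q motif     # switch to it
git branch                # the asterisk now marks motif

current=$(git rev-parse --abbrev-ref HEAD)

# Both branch names point to the same commit, so their histories are identical
if [ "$(git rev-parse "$main_branch")" = "$(git rev-parse motif)" ]; then
  same_history="yes"
else
  same_history="no"
fi
echo "on $current, same history: $same_history"
```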

Once you switch branch, you are able to make commits on motif without impact on master.

Imagine that you want to add new functionality to seqClass.sh: you want it to be able to find simple motifs in our sequences, besides printing the type of molecule (DNA or RNA).

Task 12: Modify seqClass.sh to add this new feature, by appending the following code to the end of the script:

motif=$(echo $2 | tr a-z A-Z)
if [[ -n $motif ]]; then
  echo -en "Motif search enabled: looking for motif '$motif' in sequence '$seq'... "
  if [[ $seq =~ $motif ]]; then
    echo "FOUND"
  else
    echo "NOT FOUND"
  fi
fi

Then stage and commit the changes in branch motif.

You can check that it worked by running, for instance:

bash seqClass.sh actg
# The sequence is DNA
bash seqClass.sh actg tg
# The sequence is DNA
# Motif search enabled: looking for motif 'TG' in sequence 'ACTG'... FOUND

git merge

As you believe that this new feature of seqClass.sh is very useful, you aim to update master with the changes you made to motif. motif would be the giver branch (it provides the changes) and master the receiver branch (it accepts those changes). You can do this using

git merge <branch name>

Task 12: Switch again to the master branch and merge your motif branch back to master.
Hint: Notice the output: The merge is a fast forward, as there is a linear path from the tip of master to motif. You can use git log to check that the commit history is again identical for both branches.
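A sketch of a fast-forward merge in a throwaway repository. The identity, the commit messages and the appended line are assumptions for illustration. After the merge, both branch names point to the same commit and no extra merge commit has been created.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo '#!/bin/bash' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # master (or main on newer setups)

git checkout -q -b motif                  # create motif and switch to it
echo 'motif=$2' >> seqClass.sh
git commit -q -am "Read the motif from the second argument"

git checkout -q "$main_branch"            # back to the receiver branch
merge_output=$(git merge motif)           # the giver branch is the argument
echo "$merge_output"

main_tip=$(git rev-parse "$main_branch")
motif_tip=$(git rev-parse motif)
n_parents=$(git cat-file -p HEAD | grep -c '^parent ')   # 1: an ordinary commit
```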

However, imagine that before merging the motif branch, you made some changes in master. To simulate this scenario, let's introduce changes in both branches.

Task 13: In the master branch, modify the message that seqClass.sh prints when it finds the motif, add and commit the changes. Then, switch to the motif branch and modify the message that seqClass.sh prints when it does not find the motif, add and commit the changes. Finally, merge the motif branch back into master.

Hint: Notice this is a 3-way merge: there is not a linear path and therefore a dedicated commit will be used to join the two histories.
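The scenario can be sketched with a simplified stand-in for seqClass.sh that contains only the two messages, separated by a few filler lines so that the edits do not touch neighbouring lines. The helper write_script, the identity and the wording of the messages are all hypothetical.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: rewrites the file with the two given messages
write_script() {
  printf 'echo "%s"\n# spacer\n# spacer\n# spacer\necho "%s"\n' "$1" "$2" > seqClass.sh
}

write_script "FOUND" "NOT FOUND"
git add seqClass.sh
git commit -q -m "Add motif messages"
main_branch=$(git rev-parse --abbrev-ref HEAD)
git branch motif

write_script "Motif found" "NOT FOUND"           # change on the starting branch
git commit -q -am "Reword the found message"

git checkout -q motif
write_script "FOUND" "Motif not found"           # different change on motif
git commit -q -am "Reword the not-found message"

git checkout -q "$main_branch"
git merge -q --no-edit motif                     # histories diverged: merge commit

n_parents=$(git cat-file -p HEAD | grep -c '^parent ')   # a merge commit has 2
first_line=$(head -n 1 seqClass.sh)
last_line=$(tail -n 1 seqClass.sh)
echo "parents: $n_parents"
```

The merged file contains both rewordings, because each branch changed a different region of the file.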

But what would happen if we modify exactly the same line in both branches and then try to merge them?

Task 14: Repeat the previous task but modifying in both cases the message that seqClass.sh prints when it finds the motif.

Hint: When merging, you should get the following error:

CONFLICT (content): Merge conflict in seqClass.sh
Automatic merge failed; fix conflicts and then commit the result.

You have made commits on separate branches that alter the same line in conflicting ways. Now, when you try to merge motif into master, Git does not know which version of the file to keep! This is called a merge conflict, and it must be resolved manually.

In the text editor, look at seqClass.sh. Git uses markings to indicate the HEAD (master) version of the file and the motif version of the file, like this:

<<<<<<< HEAD 
master version of line 
======= 
motif version of line 
>>>>>>> motif

Git asks us which version of the file to keep: the version on master or the version on motif. We will keep the version on motif.

Task 15: Delete the content of the line as it appears in the master branch as well as all Git's special markings including the words HEAD and motif. Then save the file, add and commit your changes.
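The whole conflict cycle can be sketched in a throwaway repository with a one-line stand-in for seqClass.sh. The identity and the wording of the messages are hypothetical, and overwriting the file with echo stands in for the manual edit you would do in your text editor.

```shell
# Throwaway repository with a hypothetical identity (assumptions for this sketch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo "FOUND"' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add found message"
main_branch=$(git rev-parse --abbrev-ref HEAD)
git branch motif

echo 'echo "Motif found (master wording)"' > seqClass.sh
git commit -q -am "Reword on the starting branch"

git checkout -q motif
echo 'echo "Motif found (motif wording)"' > seqClass.sh
git commit -q -am "Reword on motif"

git checkout -q "$main_branch"
if git merge motif > /dev/null 2>&1; then
  merge_status="merged"
else
  merge_status="conflict"        # same line changed on both sides
fi
n_markers=$(grep -c '^<<<<<<< ' seqClass.sh)   # Git wrote its markers into the file

# Stand-in for the manual edit: keep only the motif version, without markers
echo 'echo "Motif found (motif wording)"' > seqClass.sh
git add seqClass.sh
git commit -q -m "Merge motif, keeping its wording"

n_parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "$merge_status, then resolved with $n_parents parents"
```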

In Git, branches are usually a means to an end. You create them to work on a new project feature, but the final goal is to merge that feature back into the master branch. After the branch has been integrated into master, it has served its purpose and can be deleted.

Task 16: Delete the motif branch.

Hint: You can use git branch -d <branch name> for that.

5. Working with remote repositories

Even though during this course we will work with just one collaborator (that's yourself!), it is important to introduce the concept of the remote repository. Essentially, remote repositories are versions of your project that are hosted somewhere so that they are accessible to you and your collaborators from different locations. This somewhere is usually the Internet, but it can also be your local machine. For us, it will be our GitHub repository. GitHub provides a more user-friendly way to browse your commits and have a look at the changes in your files. Before continuing with the hands-on, please create a GitHub account.

Create a repository on GitHub

Create an empty repository in your GitHub account named git_HandsOn. Make sure that it is a public repository and does not contain a README.md. Make it a remote repository for the local repository you have been working with, using:

git remote add origin https://github.com/username/git_HandsOn
git push -u origin master

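Notice you should replace username in the previous command with your actual GitHub user name. This step needs to be done only once. You will be prompted to enter your GitHub username and password (GitHub no longer accepts account passwords for Git operations over HTTPS, so enter a personal access token when asked for the password). When done, you will be able to see your repository on GitHub!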

git push

When you commit the changes in your file(s), this action takes place in the local repository. However, you often want to update those changes in your remote repository. You can do so with:

git push

Task 17: Edit your script to add some comment lines explaining what every piece of code does, stage it and commit. Then push the commits to your remote repository. The changes will be visible at GitHub.
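Since a remote can also live on your local machine, the push cycle can be tried without any network access. In the sketch below a bare repository in a temporary directory stands in for GitHub; the paths, identity and commit messages are assumptions for illustration.

```shell
# A bare repository in a temporary directory stands in for GitHub (assumption)
remote=$(mktemp -d)
git init -q --bare "$remote"

# Throwaway working repository with a hypothetical identity
work=$(mktemp -d)
cd "$work" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo '#!/bin/bash' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # master (or main on newer setups)

git remote add origin "$remote"
git push -q -u origin "$main_branch"     # first push: -u records the upstream branch

echo '# Classifies a sequence as DNA or RNA' >> seqClass.sh
git commit -q -am "Add a comment"
git push -q                              # later pushes need no arguments

local_sha=$(git rev-parse HEAD)
remote_sha=$(git --git-dir="$remote" rev-parse "refs/heads/$main_branch")
echo "local:  $local_sha"
echo "remote: $remote_sha"
```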

git pull

Even though we will not deal with collaborative projects during this course, it may happen that you make some commits through the GitHub webpage (yes, you can do it too!). In that case, these commits would be stored in your GitHub repository, but not in your local repository. Thus, you need to update your local repository accordingly using:

git pull

Task 18: Through the GitHub webpage, add a README.md file explaining the usage of the script seqClass.sh and commit. It can contain just a line, or something more elaborate. Then, pull the commit to your local repository.

Hint: If you want to learn more about the Markdown language, you can have a look at this cheatsheet. Notice that you will not be able to push any change made locally until you have pulled the changes from your GitHub repo.
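The pull scenario can also be rehearsed locally. In this sketch a bare repository stands in for GitHub, and a second clone plays the role of the GitHub web editor that adds README.md; all paths, the identity and the README text are assumptions for illustration.

```shell
# A bare repository stands in for GitHub (assumption)
remote=$(mktemp -d)
git init -q --bare "$remote"

# Your local repository, with a hypothetical identity
work=$(mktemp -d)
cd "$work" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo '#!/bin/bash' > seqClass.sh
git add seqClass.sh
git commit -q -m "Add seqClass.sh"
git remote add origin "$remote"
git push -q -u origin "$(git rev-parse --abbrev-ref HEAD)"

# A second clone plays the role of the GitHub web editor
other=$(mktemp -d)
git clone -q "$remote" "$other"
cd "$other" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo 'Usage: bash seqClass.sh <sequence>' > README.md
git add README.md
git commit -q -m "Add README"
git push -q

# Back in the local repository the new commit is missing until we pull
cd "$work" || exit 1
if [ -f README.md ]; then before_pull="present"; else before_pull="missing"; fi
git pull -q
if [ -f README.md ]; then after_pull="present"; else after_pull="missing"; fi
echo "README.md before pull: $before_pull, after pull: $after_pull"
```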

git clone

You can get a local copy of other people's public GitHub repositories using:

git clone https://github.com/username/repo

Task 19: Clone the ggsashimi repository from guigolab.

6. Exercises

Exercise 1:

  • Make a new branch called fix and move to it.
  • Fix the seqClass.sh script so that it is able to classify correctly any RNA or DNA sequence.
  • Merge the fix branch back to master.
  • Make sure you add comments to explain your changes.
  • Stage and commit the changes on master in your local repository.
  • Push your commits on master to your GitHub repository.
  • Extra: stage, commit and push your changes in the fix branch to your GitHub repository.