# **Git**

## Some Prerequisites

Before we start, make sure you have **Git** and **Atom** installed.

If you are on Mac, Git is most likely already installed. Run the following command to check:

<code>git --version</code>

If Git is installed, this prints its version, for example <code>git version 2.11.1</code>.


If you are on Windows, see: https://gitforwindows.org/

If you are on Linux, see: https://git-scm.com/download/linux

Next, install **Atom**: https://atom.io/

# What is Git?
<hr>

If you've used Google Docs before, you're already familiar with a version control system for editing files. In Google Docs, you can go back to the history of versions for a document, and easily restore a prior version if you make a lot of unwanted changes. Google Docs even tells you who edited which portions of each version, so you can contact the right people for large documents.

Git is also a version control program, but it is much older and more versatile. Whereas Google Docs stores version history for specific documents, Git stores version history for entire projects (known as Git <b>repositories</b>). Unlike Google Docs, Git allows you to make <b>branches</b>, or named working versions of a project, and to switch between them quickly, whether you work locally or with a remote copy. This <i>branching</i> feature is illustrated by the Git logo:

![](git-logo.png)

## A quick example
To give you a more concrete idea of what branching is used for, here's an illustrative example.

Suppose you and your friend are working on a state-of-the-art Kaggle model. You have been working with your friend to make a basic model, storing it in the default Git branch, `master`. Now, you and your friend have a couple of different ideas on how to improve it further. You both want to start working on making a better model, and then compare which prototype performs better, but you don't want to get in the way of each other's work. So, you make a branch named `xg_boost`, and your friend makes a branch named `clockwork_rnn`. That way, both of you can independently work off of the same base code, without overwriting or affecting each other's branches.

If your model proves to be superior, you can merge your `xg_boost` branch into the `master` branch, and you can both work off of your new model. Alternatively, if your friend's model performs better, they can merge their `clockwork_rnn` branch into the `master` branch and you and your friend can work on that new model together. As one last possibility, suppose neither the `xg_boost` model nor the `clockwork_rnn` model outperforms the model in the `master` branch. Then, you can both easily switch back to the `master` branch with a single command, and try new ideas without skipping a beat. This is the value of Git, especially for complex or group projects.

# Git vs GitHub
<hr>
<code>git</code> is the command-line program that allows you to make use of the Git version control system. However, Git operates locally by default - that is, offline on your computer. In order to share your code with others, you have to use a service like <b>GitHub</b> or <b>Bitbucket</b>. They work very similarly: both let you upload your code to an online <i>repository</i> in the cloud.

The standard workflow is to work on your code locally, using <code>git</code> for version control. When you want to share the latest functioning version of your code, you <b>push</b> it to the online repository hosted by GitHub or Bitbucket. Then, even friends who aren't nearby can <b>pull</b> your online code to their local computers and work on your project remotely, pushing their own updates when needed.

# Installing Git
<hr>
You can install Git from the [Downloads](https://git-scm.com/downloads) page of the official [Git Website](https://git-scm.com/).

# Using Git Locally
<hr>
We are now going to dive into how to use <code>git</code>! Don't worry, there are just a few basic commands, and once you get used to the workflow, it will feel painless and natural.

<i>Note: for this section of the tutorial, the examples are run in a demo repository called <code>git-workshop</code>, created in your home directory.</i>

## <code> git init </code>

First, we're going to learn how to actually make a <b>Git repository</b>. A <i>repository</i> (aka repo) is essentially a folder/directory that you specially mark as being tracked by a version control system. To begin, let's make a demo folder and create a file in it:

<code>cd ~
mkdir git-workshop && cd git-workshop
echo "Hello World" > hello.txt
</code>

To initialize a directory as a repo, navigate to the folder in question and run:

<code>git init</code>
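
To see what <code>git init</code> actually does, here is a minimal sketch that runs it in a throwaway directory. The directory comes from `mktemp -d` and the variable name `demo_dir` is only an illustrative choice, so nothing in your real `git-workshop` folder is touched.

```shell
# Create a throwaway directory so this sketch never touches real work.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Before `git init`, this is an ordinary, empty directory.
ls -a

# Mark the directory as a Git repository.
git init

# Afterwards, a hidden .git folder holds all of Git's bookkeeping.
ls -a
```

Deleting the hidden `.git` folder would turn the repo back into a plain directory, which is why you normally never edit it by hand.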

## <code>git status</code>, <code>git add</code> and <code>git commit</code>

Now, let's record our files into the version control system. 

<code>git status</code>

This command should show that the file <code>hello.txt</code> is untracked. In order to have Git track this file, we use <code>git add</code>.

<code>git add</code> tells Git to track your files and stages them, that is, primes them to be "stamped" as a step in development in the later <code>git commit</code> step. You usually <code>add</code> only the files that have changed since the last <code>commit</code>, or only the files that you want to track. Alternatively, you can <code>add</code> every new and changed file in your repo with the <code>-A</code> option. Let's add all our files to be tracked.

<code>git add -A</code>

<code>git commit</code> is then used to make a named record of your work so far. All commits require a message to be attached, to describe what's changed since the last version. If you simply run <code>git commit</code>, Git will open your default command-line text editor so that you can enter a message, which can be a pain to work with. For quick messages, you can use the <code>-m</code> option to give the message as a string. Let's commit our newly <code>add</code>ed files:

<code>git commit -m "My first commit"</code>

As you can see, we've successfully tracked and committed the <code>hello.txt</code> file that we added to the empty repo.
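
The whole status, add, commit cycle can be sketched end to end as follows. This is a self-contained run in a temporary directory; the `Demo User` name and `demo@example.com` address are placeholder assumptions, set only so that the commit succeeds on a machine where Git has no identity configured yet.

```shell
# Work in a throwaway repo so the sketch is safe to run anywhere.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init

# Placeholder identity (an assumption for this sketch), stored only in this repo.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Hello World" > hello.txt

# "??" in the short status marks an untracked file.
git status --short

# Stage every new and changed file.
git add -A

# "A " now marks hello.txt as a staged, newly added file.
git status --short

# Record the staged snapshot with a message.
git commit -m "My first commit"

# The working tree is clean again, and the log shows one commit.
git status --short
git log --oneline
```

Running <code>git status</code> between steps is a cheap habit that shows exactly what the next <code>commit</code> will contain.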


## <code>git branch</code> and <code>git merge</code>

Now, we're going to give a brief illustration of how to use Git's branching mechanism. Let's make a new <i>branch</i>, edit a file, and then <i>merge</i> that branch back into the <code>master</code> branch.

To open a new Git branch, use `git checkout -b <new_branch_name>`. Let's make a branch called <code>dev</code>.


<code>cd ~/git-workshop
git checkout -b dev</code>

If you run <code>git branch</code> without any arguments, Git will display all the branches currently associated with this repo. What are all the branches in our case?

<code>* dev
  master</code>

Notice the <code>*</code> next to the <code>dev</code> branch - this indicates that <code>git checkout -b</code> not only created the new branch but also switched us onto it.
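
Here is a small sketch of creating a branch and checking which one is currently checked out. It runs in a temporary repo with a placeholder identity (both are assumptions for illustration), and it stores the name of the initial branch in a variable called `base_branch` because, depending on how Git is configured, that branch may be called `master` or `main`.

```shell
# Throwaway repo with one commit, so that branches have something to point at.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add -A
git commit -m "My first commit"

# Remember the initial branch name (master or main, depending on configuration).
base_branch="$(git rev-parse --abbrev-ref HEAD)"

# Create the dev branch and switch to it in one step.
git checkout -b dev

# List all branches; the asterisk marks the one we are on.
git branch

# Ask Git directly for the name of the current branch.
current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "Now on: $current_branch"   # prints "Now on: dev"
```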


Now, we're ready to make some changes without affecting our <code>master</code> branch. Let's rewrite the contents of <code>hello.txt</code>.

<code>echo "Hello, Berkeley" > hello.txt </code>

Let's <code>add</code> and <code>commit</code> our changes. A quick shortcut to do this is the <code>-a</code> option of <code>git commit</code>, which automatically stages every tracked file that has changed and then commits them. (Brand-new, untracked files still need <code>git add</code>.)

<code>git commit -am "Made a few changes for this demo"</code>

Here, <code>-am</code> is the same as passing <code>-a</code> and <code>-m</code> together.

If we <code>cat</code> (display the contents of) the <code>hello.txt</code> file in our <code>dev</code> branch, we see that it has in fact been overwritten:


<code>cat hello.txt</code>

However, if we switch back to the <code>master</code> branch, we find that our old version of <code>hello.txt</code> is still available once we're on that branch.

<code>git checkout master
cat hello.txt</code>

Suppose we're satisfied with our work in the <code>dev</code> branch, and now want to make it official by merging it back into the <code>master</code> branch. To do this, we use the <code>git merge</code> command. The syntax `git merge <branch>` will merge the `<branch>` branch into the branch you are currently on. Right now, we're on the <code>master</code> branch <i>(Quiz: how do we know that we are currently on <code>master</code>? How can we double-check which branch we are on?)</i>, so we run the following command to merge our <code>dev</code> branch into the <code>master</code> branch:

<code>git merge dev</code>

<i>(Notice the <code>Fast-forward</code> description of this merge - there are several types of merges Git uses, and this one is the most basic - we simply fast-forward <code>master</code> to the changes of the <code>dev</code> branch. You can learn about the other types of merges in the section below.)</i>
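
The branch, edit, merge cycle can be replayed from scratch with the sketch below. It assumes a temporary repo, a placeholder identity, and Git's default merge settings (so the merge fast-forwards); `base_branch` is an illustrative variable holding whatever the initial branch is called on your machine.

```shell
# Throwaway repo with an initial commit.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add -A
git commit -m "My first commit"
base_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main

# Make a change on a separate branch.
git checkout -b dev
echo "Hello, Berkeley" > hello.txt
git commit -am "Made a few changes for this demo"

# Back on the base branch, the old contents are still there.
git checkout "$base_branch"
cat hello.txt   # prints "Hello World"

# Merge dev into the branch we are on. The base branch has no new commits
# of its own, so Git can simply move it forward to dev's commit.
git merge dev
cat hello.txt   # prints "Hello, Berkeley"
```

Because nothing had changed on the base branch since `dev` split off, no new merge commit is needed: both branch names end up pointing at the same commit.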

# Using Git Remotely
<hr>
Now that we've covered the basics of how Git works locally, let's make use of its cloud-interfacing abilities. First, let's <code>push</code> our local <code>git-workshop</code> <code>master</code> branch to GitHub. We must first make an empty <b>GitHub</b> remote repository on the GitHub website, then <code>push</code> our local repo onto it.

## <code>git push</code>

<code>git push</code> has a very simple syntax for pushing local repositories to remote repositories - `git push <remote_repo_url> <local_branch_name>` will push the `<local_branch_name>` branch into the remote repository at `<remote_repo_url>`. (If no `<local_branch_name>` is given, Git pushes the current branch by default.) Let's push our <code>git-workshop</code> repo's <code>master</code> branch into the remote <code>git-demo</code> repository.

First, we must tell Git where our repository on GitHub lives. To do so, set up a **remote** named <code>origin</code>, a short name that stands in for the repository URL:

<code>git remote add origin {repository URL}</code>

Now you can run:

<code>git push origin master</code>
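
If you want to try the remote workflow without a GitHub account, one option is to let a local *bare* repository play the role of the remote. The sketch below does that; the paths under `work_dir`, the placeholder identity, and the `base_branch` variable are all assumptions for illustration, and the bare repo is only a stand-in for a real GitHub URL.

```shell
# A scratch area holding both the stand-in "remote" and our local repo.
work_dir="$(mktemp -d)"

# A bare repository has no working files; it only stores history,
# much like the copy a hosting service keeps.
git init --bare "$work_dir/remote.git"

# Our local repo with a single commit.
git init "$work_dir/local"
cd "$work_dir/local"
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add -A
git commit -m "My first commit"
base_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main

# Register the remote under the short name "origin" and confirm it.
git remote add origin "$work_dir/remote.git"
git remote -v

# Upload the branch, then list what the remote now holds.
git push origin "$base_branch"
git ls-remote origin
```

With a real GitHub repository, the only difference is that the path passed to <code>git remote add</code> is replaced by the repository URL.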

## <code>git pull</code>

Finally, we're going to cover the other command for working with remote repositories, <code>git pull</code>. To <code>fetch</code> and automatically <code>merge</code> any changes in the remote repo that aren't yet in your local repo, use the syntax `git pull <remote_repo_url> <remote_branch_name>`. (Again, if no `<remote_branch_name>` is specified, Git will try to use the remote branch associated with your current branch.) Let's <code>pull</code> from our remote repository in case any changes have been made to it that we aren't aware of locally.

<code>cd ~/git-workshop
git pull origin master</code>
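
To watch <code>git pull</code> bring in someone else's work, the sketch below simulates a friend on the same machine. As before, a local bare repo stands in for GitHub; the directory names `mine` and `friend`, the placeholder identities, and the `base_branch` variable are assumptions made for this illustration.

```shell
work_dir="$(mktemp -d)"
git init --bare "$work_dir/remote.git"   # stand-in for the GitHub repo

# Our repo: one commit, pushed to the remote.
git init "$work_dir/mine"
cd "$work_dir/mine"
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add -A
git commit -m "My first commit"
base_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main
git remote add origin "$work_dir/remote.git"
git push origin "$base_branch"

# The friend gets their own copy, makes a change, and pushes it.
git clone --branch "$base_branch" "$work_dir/remote.git" "$work_dir/friend"
cd "$work_dir/friend"
git config user.name "Demo Friend"        # placeholder identity (assumption)
git config user.email "friend@example.com"
echo "Hello, Berkeley" > hello.txt
git commit -am "Friend's update"
git push origin "$base_branch"

# Back in our repo, the change is not there until we pull.
cd "$work_dir/mine"
cat hello.txt   # prints "Hello World"
git pull origin "$base_branch"
cat hello.txt   # prints "Hello, Berkeley"
```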

## <code>git clone</code>

An alternative to making a local repo and then pulling remote files to it is to <code>clone</code> a remote repo directly as a new local repo on your computer. The <code>git clone</code> syntax is as follows: `git clone <remote_repo_url> <local_repo_name>`. (By default, if no `<local_repo_name>` is specified, Git will give the local repo the same name as the remote repo).

To <code>clone</code> all of the [SUSA crash-course tutorials](https://github.com/SUSA-org/crash-course) (including this one!) to your computer, run the following command in whichever directory you want to store the repo in:

<code>cd ~
mkdir susa-crash-course && cd susa-crash-course
git clone https://github.com/SUSA-org/crash-course.git </code>
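
The optional `<local_repo_name>` argument can be tried offline too. In this sketch a small local repo (with the hypothetical name `source-repo`) plays the part of the remote, and the clone is given the illustrative name `my-copy`; the placeholder identity is again an assumption.

```shell
work_dir="$(mktemp -d)"

# A small repo to clone from, standing in for a remote repository.
git init "$work_dir/source-repo"
cd "$work_dir/source-repo"
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add -A
git commit -m "My first commit"

# Clone it under a different local name.
cd "$work_dir"
git clone "$work_dir/source-repo" my-copy

# The clone has the files, the full history, and a remote named
# origin that points back to where it was cloned from.
ls my-copy
git -C my-copy log --oneline
git -C my-copy remote -v
```

Note that <code>git clone</code> sets up the <code>origin</code> remote for you, so there is no need to run <code>git remote add</code> afterwards.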

# Merge Conflicts

Scenario: you are working with a friend on a repository (on GitHub). You update a file on your branch, <code>dev</code>, and try to merge it into <code>master</code>, but your friend already updated the same file there!

Git now has two competing versions of the file: the one you are trying to merge into <code>master</code>, and the one that was already committed to <code>master</code> while you were working. Git cannot decide on its own which version to keep, so it stops and asks you to choose. This is called a **merge conflict**.

![](merge-conflict.png)

Let's create this scenario: 

<code>cd ~/git-workshop
git checkout -b evil
git checkout master
echo "Hello Cal" > hello.txt
git commit -am "made 'hello.txt' more concise"

git checkout evil
echo "dlrow olleh" > hello.txt
git commit -am "hehe"
git checkout master

git merge evil</code>

You should see:
<code>
Auto-merging hello.txt
CONFLICT (content): Merge conflict in hello.txt
Automatic merge failed; fix conflicts and then commit the result.
</code>

Thankfully, **Atom** has very good support for merge conflicts. Open your <code>git-workshop</code> repository in Atom, manually or by using the following command:

<code>cd ~/git-workshop
atom .
</code>

After you're done fixing the merge conflict, run:

<code>git commit -am "fixed merge"</code>
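
If you would rather see what the editor is doing for you, the conflict can also be resolved entirely from the command line. The sketch below recreates the scenario in a temporary repo (with a placeholder identity and an illustrative `base_branch` variable, both assumptions) and resolves it by keeping the "Hello Cal" version; which version to keep is of course your call.

```shell
# Throwaway repo with an initial commit.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"
echo "Hello World" > hello.txt
git add -A
git commit -m "My first commit"
base_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main

# Two branches change the same line in different ways.
git checkout -b evil
git checkout "$base_branch"
echo "Hello Cal" > hello.txt
git commit -am "made 'hello.txt' more concise"
git checkout evil
echo "dlrow olleh" > hello.txt
git commit -am "hehe"
git checkout "$base_branch"

# The merge stops with a conflict and a non-zero exit status.
git merge evil || echo "Merge stopped with a conflict, as expected"

# Git has written both versions into the file, separated by
# conflict markers (<<<<<<<, =======, >>>>>>>).
cat hello.txt

# Resolve by writing the contents we want, then stage and commit.
echo "Hello Cal" > hello.txt
git add hello.txt
git commit -m "fixed merge"

# The history now shows the two branches joined by a merge commit.
git log --oneline --graph
```

Editing the file by hand to remove the markers works just as well as overwriting it; what matters is that the file is in its final form before you <code>add</code> it.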

# Additional Readings
<hr>
<ul>
  <li>For more information on the command-line shell, visit the <a href="http://linuxcommand.org/lc3_learning_the_shell.php">Linux Command Line Guide</a>.</li>
  <li>For more information on how to use Git, visit the official <a href="https://git-scm.com/book/en/v2">Pro Git Book</a>.</li>
  <li>There is also a Git quick-guide cheatsheet, available <a href="https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf">here</a>.</li>
</ul>