<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Version Control

_Authors: Kiefer Katovich (San Francisco), Dave Yerrington (San Francisco), Sam Stack (Washington, D.C.)_

"Version control by filename" is better than nothing, but it has downsides that will be familiar:

![](../assets/images/version_control_by_filename.png)

We can do better.

## Demo

### Creating Our First Commit

I wrote a Python script that allows you to set a timer to run for a specified number of hours, minutes, and seconds:

```python
import argparse

import time

from tqdm import tqdm


def main(args):
    num_seconds = 60**2 * args['hours'] + 60 * args['minutes'] + args['seconds']
    for i in tqdm(range(num_seconds)):
        time.sleep(1)
    beep()


def beep():
    for _ in range(3):
        for _ in range(3):
            print('\a')
            time.sleep(.1)
        time.sleep(.5)


def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--hours', required=False, default=0, type=int)
    parser.add_argument('-m', '--minutes', required=False, default=0, type=int)
    parser.add_argument('-s', '--seconds', required=False, default=0, type=int)
    args = vars(parser.parse_args())
    return args


if __name__ == '__main__':
    args = _parse_args()
    main(args)
```

Let's put this script into a file `timer.py` in a new directory and initialize a Git repository there:

```bash
git init
```

Next, we create a snapshot of the initial state of our file, called a "commit". We do so in two steps:

1. Add the file to the "staging area":

```bash
git add timer.py
```

2. Create a snapshot of the changes called a "commit":

```bash
git commit -m "Initial commit"
```

The `-m "Initial commit"` part of this command attaches the message "Initial commit" to the snapshot. When we make changes in future commits, we will provide messages that indicate what changed.

### Changing One File

We are now ready to track changes to this script.

Let's revise the script to allow millisecond rather than second precision:

```python
import argparse

import time

from tqdm import tqdm


def main(args):
    num_milliseconds = 1000*(60**2 * args['hours'] + 60 * args['minutes'] + args['seconds']) + args['milliseconds']
    for _ in tqdm(range(num_milliseconds)):
        time.sleep(1/1000)
    beep()


def beep():
    for _ in range(3):
        for _ in range(3):
            print('\a')
            time.sleep(.1)
        time.sleep(.5)


def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--hours', required=False, default=0, type=int)
    parser.add_argument('-m', '--minutes', required=False, default=0, type=int)
    parser.add_argument('-s', '--seconds', required=False, default=0, type=int)
    parser.add_argument('-i', '--milliseconds', required=False, default=0, type=int)
    args = vars(parser.parse_args())
    return args


if __name__ == '__main__':
    args = _parse_args()
    main(args)
```
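The revised `main` converts the whole duration to milliseconds before looping. Pulling that arithmetic into a small function makes it easy to check by hand. `total_milliseconds` below is a hypothetical helper for illustration, not part of `timer.py`:

```python
def total_milliseconds(hours=0, minutes=0, seconds=0, milliseconds=0):
    """Convert a duration to whole milliseconds, mirroring the arithmetic in main()."""
    total_seconds = 60**2 * hours + 60 * minutes + seconds
    return 1000 * total_seconds + milliseconds


# 1 h 2 min 3 s 250 ms = 3,723 s plus 250 ms
print(total_milliseconds(hours=1, minutes=2, seconds=3, milliseconds=250))  # 3723250
```

The arithmetic is exact, but the timer itself will run somewhat long: each of the many loop iterations adds the overhead of `tqdm` and of waking from `time.sleep` on top of the requested millisecond.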

This command tells us which files have changed since the last commit, and whether those changes have been staged:

```bash
git status
```
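`git status` compares three things: the last commit, the staging area, and the working directory. The sketch below models each one as a plain dictionary from file name to contents (an assumption for illustration; Git really stores hashed objects) to show how files land in the groups that `git status` reports. `toy_status` is a hypothetical name, and deleted files are ignored to keep it short:

```python
def toy_status(last_commit, staging_area, working_dir):
    """Group files the way `git status` reports them (toy model, no deletions).

    Each argument maps a file name to that file's contents. In this model, as in
    Git, the staging area holds every tracked file: the last commit's version
    unless a newer version has been added.
    """
    return {
        # Staged contents differ from the last commit: "Changes to be committed".
        "to_be_committed": sorted(
            name for name, content in staging_area.items()
            if last_commit.get(name) != content
        ),
        # Tracked files edited since the last `git add`: "Changes not staged for commit".
        "not_staged": sorted(
            name for name, content in working_dir.items()
            if name in staging_area and staging_area[name] != content
        ),
        # Files Git has never been told about: "Untracked files".
        "untracked": sorted(name for name in working_dir if name not in staging_area),
    }


# After editing timer.py but before running `git add`:
print(toy_status({"timer.py": "v1"}, {"timer.py": "v1"}, {"timer.py": "v2"}))
# {'to_be_committed': [], 'not_staged': ['timer.py'], 'untracked': []}
```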

To make a commit of the changes, we run the same two steps as above:

1. Add the file to the staging area:

```bash
git add timer.py
```

2. Create a new commit with a descriptive message:

```bash
git commit -m "Use millisecond precision"
```

This command shows us the history of our commits:

```bash
git log
```

### Rolling Back One Commit

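To undo the changes from a particular commit, we can use the command below, replacing `<git hash>` with that commit's identifier: the long sequence of characters shown next to `commit` in the output of `git log` (the first seven or so characters are usually enough). `git revert` does not delete the original commit. Instead, it creates a new commit that reverses that commit's changes, so the history still shows both. By default, Git opens a text editor so you can confirm the message for the new commit.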

```bash
git revert <git hash>
```
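One way to see why `git revert` is safe: it never removes the commit you name; it appends a new commit whose changes cancel it. A toy, file-level sketch of the idea, where the function name `toy_revert` and the list-of-snapshots representation are hypothetical:

```python
def toy_revert(history, target):
    """Append a snapshot that undoes the changes made by commit history[target].

    `history` is a list of snapshots (dicts of file name -> contents), oldest
    first. This is a file-level toy: real `git revert` works on individual
    lines and can combine the undo with later, unrelated edits.
    """
    if not 0 <= target < len(history):
        raise IndexError("target must index an existing commit")
    parent = history[target - 1] if target > 0 else {}
    changed = history[target]
    new_snapshot = dict(history[-1])  # start from the latest commit
    for name in set(parent) | set(changed):
        if parent.get(name) == changed.get(name):
            continue  # the target commit did not touch this file
        if new_snapshot.get(name) != changed.get(name):
            raise ValueError(f"conflict: {name} changed again after the target commit")
        if name in parent:
            new_snapshot[name] = parent[name]  # restore the earlier contents
        else:
            del new_snapshot[name]  # the target commit created this file
    history.append(new_snapshot)  # a new commit; nothing is erased
    return history


history = [{"timer.py": "second precision"}, {"timer.py": "millisecond precision"}]
toy_revert(history, 1)
print(len(history), history[-1])  # 3 {'timer.py': 'second precision'}
```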

### Changing Two Files

Suppose we wanted to reorganize our code by moving the function `beep` to a separate file `util.py`.

The revised `timer.py`:

```python
import argparse

import time

from tqdm import tqdm

from util import beep


def main(args):
    num_milliseconds = 1000*(60**2 * args['hours'] + 60 * args['minutes'] + args['seconds']) + args['milliseconds']
    for _ in tqdm(range(num_milliseconds)):
        time.sleep(1/1000)
    beep()


def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--hours', required=False, default=0, type=int)
    parser.add_argument('-m', '--minutes', required=False, default=0, type=int)
    parser.add_argument('-s', '--seconds', required=False, default=0, type=int)
    parser.add_argument('-i', '--milliseconds', required=False, default=0, type=int)
    args = vars(parser.parse_args())
    return args


if __name__ == '__main__':
    args = _parse_args()
    main(args)
```

The new file `util.py`:

```python
import time


def beep():
    for _ in range(3):
        for _ in range(3):
            print('\a')
            time.sleep(.1)
        time.sleep(.5)
```

1. Add both files to the staging area:

```bash
git add timer.py util.py
```

2. Create a new commit with a descriptive message:

```bash
git commit -m "Move beep to a separate file"
```

## How Git Works

- The **working directory** stores the current state of the files in your repository on your hard drive.
- A **git commit** is a record of selected changes in your working directory that have occurred since the previous commit.
- When you are ready to capture some set of changes, you first add them to the **staging area** and then commit them.
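The three ideas above fit in a few lines of Python. The sketch below is a toy model: the `ToyRepo` class is hypothetical and stores files as plain strings, whereas Git stores hashed objects in the `.git` folder. One detail it does reflect from real Git: a commit freezes the whole staging area, which is why only the changes you have added end up in it.

```python
class ToyRepo:
    """Toy model of a working directory, a staging area, and a commit history."""

    def __init__(self):
        self.working_dir = {}   # file name -> contents currently "on disk"
        self.staging_area = {}  # file name -> contents as of the last add
        self.commits = []       # (message, snapshot) pairs, oldest first

    def add(self, *names):
        # Like `git add`: copy the current contents of each file into the staging area.
        for name in names:
            self.staging_area[name] = self.working_dir[name]

    def commit(self, message):
        # Like `git commit -m`: store a frozen copy of the entire staging area.
        self.commits.append((message, dict(self.staging_area)))

    def log(self):
        # Like `git log`: list commit messages, newest first.
        return [message for message, _ in reversed(self.commits)]


repo = ToyRepo()
repo.working_dir["timer.py"] = "v1"
repo.add("timer.py")
repo.commit("Initial commit")
repo.working_dir["timer.py"] = "v2"
repo.working_dir["util.py"] = "beep"
repo.add("timer.py", "util.py")  # stage the coordinated change to both files
repo.commit("Move beep to a separate file")
print(repo.log())  # ['Move beep to a separate file', 'Initial commit']
```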

## Why Is It So Complicated?

Google Docs takes snapshots of your files automatically and allows you to roll back to those snapshots. Why not use that approach with code?

- Automatic snapshots would not work well because code is usually broken while it is being edited.
- File-level histories would not work well because changes often need to be coordinated across files (e.g. when we moved `beep` from `timer.py` to `util.py`). Using a "staging area" allows us to include exactly the files whose changes we want to capture in a given commit.
- In addition, Git allows you to create "branches" where you can work on a feature separately from other features and from the stable code on your main `master` branch until that feature is ready. On hosting services such as GitHub, multiple people who are working on the same repository can use "pull requests" to review each other's changes on their branches before "merging" them into the `master` branch. You will not need to work with branches or pull requests in this course.

## GitHub

### What is GitHub?

[GitHub](https://github.com/) is:

- A hosting service for Git repositories.
- A web interface to explore Git repositories.
- A social network of programmers.

### What is GitHub Enterprise (GHE)?

[GitHub Enterprise](https://enterprise.github.com/home) is essentially GitHub with some additional privacy, security, and administrative features that make it easier for large organizations to manage their code.

### For This Course

Course materials are hosted on GitHub Enterprise. You will need a GitHub Enterprise account to access them. You should not need to use Git to manage changes to these materials.

Each of your projects will live inside a Git repository on your local machine. You will also post it to GitHub. To do so, you will need a GitHub account that is separate from your GitHub Enterprise account.

## Code-Along

### Git

#### Configure Git

If you haven't yet used Git on your machine, you will need to configure it. Run these commands with your own username and email address:

```bash
git config --global user.name "John Doe"
git config --global user.email "your_email@example.com"
```

#### Initialize a Git Repo

First, create a directory on your Desktop.

```bash
$ cd ~/Desktop
$ mkdir hello-world
```

You can place this directory under Git revision control using the following command:

```bash
$ cd hello-world      # don't forget to CD into the folder.
$ git init
```

Git will reply:

```bash
Initialized empty Git repository in <location>
```

You've now initialized a Git repository in `hello-world`, and that folder is now your working directory.

#### The .git folder

We can list the contents of this folder, including hidden files, using this command:

```bash
ls -a
```

We should see that there is now a hidden folder called `.git`. This is where all of the information about your repository is stored. You should not edit this folder yourself; the `git` program manages it for you.

#### Add a file

Let's create a new file.

```bash
$ touch a.txt
```

If we run `git status`, we should get something like:

```bash
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	a.txt

nothing added to commit but untracked files present (use "git add" to track)
```

This means that there is a new, **untracked** file. Next, tell Git to add `a.txt` to the staging area, ready to be committed.

```bash
$ git add a.txt
```

To confirm the file is staged and ready to be committed, again run `git status`.

You can alternatively stage _all_ new and modified files in the current directory (and its subdirectories) at once using the command below. Use it with care: it is easy to accidentally add files you did not mean to commit. It can be convenient when you are adding many files, as long as you check the result with `git status` before committing.

```bash
$ git add .
```

#### Commit

To permanently store the contents of the staging area in the repository, you need to run the following command:

```bash
$ git commit -m "Create a.txt"
```

You should now get something like the following:

```bash
[master (root-commit) 6dd2a9c] Create a.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 a.txt
```
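Git names each piece of content it stores in `.git` by a hash of that content. For a file's contents (a "blob"), the ID is the SHA-1 hash of a short header (the word `blob`, a space, the size in bytes, and a null byte) followed by the file's bytes. A minimal sketch, assuming a repository that uses Git's default SHA-1 object format; `git_blob_id` is a hypothetical name. Because `a.txt` is empty, its result should match what `git hash-object a.txt` prints in this repository:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a file's contents in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Commit IDs such as `6dd2a9c` are computed the same way from the commit's own content (author, date, message, and a reference to the stored files), which is why your commit ID will differ from the one shown here.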

#### Check the Log

If we want to view the commit history, we can run:

```bash
git log
```

As a result, you should see something like this:

```bash
commit 6dd2a9c3a185ea4ced6ab137b708daeac86e583a (HEAD -> master)
Author: gsganden <greg@gandenberger.org>
Date:   Sun Apr 15 21:07:18 2018 -0500

    Create a.txt
```

### GitHub

A repository on GitHub/GHE is called a **remote** repository. We can **clone** a remote repository to a local machine, creating a **local** repository, create new commits on the local repository, and then **push** those commits to GitHub/GHE. If any other changes are pushed to the remote repository, we can **pull** those changes down to our local repository to keep it up-to-date.
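Cloning and pushing can be pictured as copying commits between two histories. The toy sketch below models a branch as a list of commit IDs (the IDs such as `"a1"` are placeholders, and `clone` and `push` are hypothetical names). It mirrors one real rule: by default Git rejects a push when the remote has commits your local branch lacks, because accepting it would discard them.

```python
def clone(remote_history):
    """Like `git clone`: copy the whole history, not just the latest files."""
    return list(remote_history)


def push(local_history, remote_history):
    """Like `git push`: accept only if the remote can simply move forward.

    Histories are lists of commit IDs, oldest first.
    """
    if local_history[:len(remote_history)] != remote_history:
        raise RuntimeError("rejected: pull the remote's new commits first")
    remote_history.extend(local_history[len(remote_history):])


remote = ["a1"]
local = clone(remote)
local.append("b2")   # make a commit locally
push(local, remote)
print(remote)        # ['a1', 'b2']
```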

#### Create a Remote Repository

**Let's do this together:**

1. Go to your GitHub account.
2. Create a new repository (for example, choose `New repository` from the `+` menu in the top-right corner of the page).
3. Name your repository `hello-world`.
    - **DO NOT initialize the repository with a `README`, `.gitignore`, or license.**
4. Click the big, green `Create repository` button.

We now need to connect our local Git repository to the newly created repository on GitHub. We do so by adding a "remote": a named address to which Git can send our commits to be stored.

GitHub should now show the URL of your new repository (for an empty repository like this one, it appears in the "Quick setup" box near the top of the page). Copy that URL, which is the address of the remote repository.

Make sure you are in the `hello-world` directory before running this command, replacing `<url from GitHub>` with the URL you copied:

```bash
git remote add origin <url from GitHub>
```

#### Push to Your Remote Repository

To send commits from our local repository to our remote repository on GitHub, we use the command `git push`. We also need to give the name of the remote repository (here, `origin`, the name we chose when we ran `git remote add`) and the name of the branch (here, `master`).

```bash
git push -u origin master
```

Refresh your GitHub web page, and your files should appear.

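You only need the `-u` flag (short for `--set-upstream`) the first time you push a particular local branch to a particular remote repository. Git then remembers the link between that local branch and the corresponding remote branch, so afterward a plain `git push` or `git pull` on that branch is enough.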

#### Create a File and Push Changes

Let's create a `README.md` file and push it to GitHub! 

Any file ending with `.md` is a Markdown file: a text file with optional [Markdown formatting](https://daringfireball.net/projects/markdown/syntax). When GitHub displays a directory, it automatically renders that directory's `README.md` below the list of files.

Create a new `README.md` text file and add some text. (We'll try the command-line text editor `nano` this time!)

```bash
nano README.md
```

**Exercise (4 mins.)**

Use the same procedure as before to:

1. Add the new `README.md` file to the staging area.
2. Verify the file is in the staging area, ready to be committed.
3. Commit the file.
4. Push the commits to GitHub.

Refresh your GitHub web page, and the new `README.md` file should appear. Its contents are automatically rendered below the list of files.

$\blacksquare$

#### Pull from GitHub

On GitHub, click on README.md, then click on the pencil to edit the file. Make a few changes, then scroll to the bottom of the page, add a short message, and click "Commit changes."

Locally, we will need to `fetch` these changes and `merge` them with our local files. To do both of these steps at once, run the `pull` command:

```bash
git pull origin master
```

Use a Terminal command (for example, `cat README.md`) to confirm that your local file has changed.
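The two steps inside `git pull` can be pictured separately. In this toy sketch (hypothetical function names; branches modeled as lists of placeholder commit IDs; only the simple "fast-forward" case handled), fetching updates Git's record of the remote branch, `origin/master`, without touching your own `master` branch or your files, and merging then moves `master` forward:

```python
def fetch(refs, remote_history):
    """Like `git fetch origin`: refresh only the copy of the remote's branch."""
    refs["origin/master"] = list(remote_history)


def merge_fast_forward(refs):
    """Like `git merge origin/master` in the simple, fast-forward case."""
    local, upstream = refs["master"], refs["origin/master"]
    if upstream[:len(local)] != local:
        raise RuntimeError("diverged: real Git would make a merge commit or report conflicts")
    refs["master"] = list(upstream)


refs = {"master": ["a1", "b2"], "origin/master": ["a1", "b2"]}
remote = ["a1", "b2", "c3"]  # "c3" stands for the README edit made on GitHub
fetch(refs, remote)
print(refs["master"])        # ['a1', 'b2']: your own branch is untouched
merge_fast_forward(refs)
print(refs["master"])        # ['a1', 'b2', 'c3']
```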

#### Clone a Repository

Use the Terminal to navigate back to your Desktop and **delete your `hello-world` repository**:

```bash
cd ..
rm -rf hello-world  # -f skips prompts about Git's read-only files, so double-check the name first
```

Now go to *my* `hello-world` repository on GitHub (https://www.github.com/gsganden/hello-world), click the green "Code" button (labeled "Clone or download" in older versions of GitHub), and use the URL there to *clone* the repository:

```bash
$ git clone <URL>
```

Git should reply with something like:

```bash
Cloning into 'hello-world'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
Checking connectivity... done.
```

Run `git log` to confirm that you have retrieved not only my files but also my full Git history. By contrast, downloading a repository (for example, as a ZIP file) gives you only its current state.

## How We Will Use Git and GitHub in This Class

I try to keep your interactions with Git and GitHub simple because I would rather focus on Python and machine learning in this course.

You may download the course materials from GHE rather than cloning them. You should not need to run any Git commands on them.

You will need to put your project materials on GitHub, which requires initializing a local Git repository, creating commits, creating a remote GitHub repository, and pushing your commits to that remote repository. I encourage you to use Git as you go, making a commit every time you complete a significant amount of work, rather than leaving all of the Git commands until the end.

## Lesson Review

- The **working directory** stores the current state of the files in your repository on your hard drive.
- A **git commit** is a record of selected changes in your working directory that have occurred since the previous commit.
- When you are ready to capture some set of changes, you first add them to the **staging area** and then **commit them**.
- You can also **clone** repositories on GitHub and **push** to and **pull** from them.