# Git and Version Control (An Overview)

Adapted from the great work of the folks at [Software Carpentry](https://software-carpentry.org), from this [git repository](https://github.com/swcarpentry/git-novice), made available under the [Creative Commons Attribution
license][cc-by-human] (see the [full legal text of the CC BY 4.0
license][cc-by-legal]).


[![Piled Higher and Deeper by Jorge Cham, http://www.phdcomics.com/comics/archive_print.php?comicid=1531](../fig/phd101212s.png)](http://www.phdcomics.com)

"Piled Higher and Deeper" by Jorge Cham, http://www.phdcomics.com

## Have You Been There?  
- Multiple nearly identical versions of the same document
- Hard to tell the order of changes or which version is truly final
- Word and Google Docs have a "track changes" mode that enables a somewhat better workflow




## Reproducibility is Critical for Science (and don't think that data science isn't science)

```
"Science is facing a 'reproducibility crisis' where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments, research suggests."
```
  -[BBC News Article](http://www.bbc.com/news/science-environment-39054778)

```
"[...] Manage versions. Manage data versions. Being able to reproduce the models. What if, you know, the data disappears, the person disappears, the model disappears... And we cannot reproduce this. I have seen this hundreds of times in Bing. I have seen it every day. Like... Oh yeah, we had a good model. Ok, I need to tweak it. I need to understand it. And then... Now we cannot reproduce it. That is my biggest nightmare!"
```
  -Microsoft employee answering the question "What is your worst nightmare? (Related to Machine Learning Systems)", as quoted in [*Machine Teaching: A New Paradigm for Building Machine Learning Systems*](https://arxiv.org/abs/1707.06742v2)




## The Long History of Version Control Systems in CS/Application Development

- Automated version control systems (VCS) are nothing new.
- Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies.
- However, many of these are now considered legacy systems because of limitations in their capabilities.
- In particular, the more modern systems, such as Git and [Mercurial](http://swcarpentry.github.io/hg-novice/), are *distributed*, meaning that they do not need a centralized server to host the repository.
- New data-science-specific VCSs such as [Pachyderm](http://www.pachyderm.io) are emerging.


## GitHub and GitHub Desktop
- If you are not too familiar with the command line (or even if you are), GitHub Desktop is convenient.
- We will introduce concepts and leave it to you to do more in-depth study. 
- Help for using the desktop software is [here](https://help.github.com/desktop-beta/guides/getting-started-with-github-desktop/).
- Create an ID on the [GitHub](http://github.com/) website.

## Configuring Git

When we use Git on a new computer for the first time,
we need to configure a few things. Below are a few examples
of configurations we will set as we get started with Git:

-   our name and email address,
-   to colorize our output,
-   what our preferred text editor is,
-   and that we want to use these settings globally (i.e. for every project)

```{bash}
$ git config --global user.name "Vlad Dracula"
$ git config --global user.email "vlad@tran.sylvan.ia"
$ git config --global color.ui "auto"
```
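The commands above set the name, email address, and colors, but not the preferred text editor from the list. A minimal sketch of the full set follows. It assumes `nano` is the editor you want, and it points `HOME` at a throwaway directory so that trying it out does not touch your real `~/.gitconfig`; drop those first two lines to apply the settings for real.

```bash
# Assumption: use a throwaway HOME so the real global config is left alone
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Identity and colors, as in the commands above
git config --global user.name "Vlad Dracula"
git config --global user.email "vlad@tran.sylvan.ia"
git config --global color.ui "auto"

# Preferred text editor (assumption: nano; substitute your own editor)
git config --global core.editor "nano -w"

# Read the settings back to confirm them
git config --global --get core.editor
git config --global --list
```

The `--global` flag is what makes these settings apply to every project for the current user rather than to a single repository.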


In [None]:
!git config --global user.name "<your name>"


In [None]:
!git config --global user.email "jkuruzovich@gmail.com"

In [1]:
# Use this to confirm your settings
!git config --list

credential.helper=osxkeychain
filter.lfs.required=true
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
user.name=Jason Kuruzovich
user.email=jkuruzovich@gmail.com
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
core.ignorecase=true
core.precomposeunicode=true
remote.origin.url=https://github.com/jkuruzovich/techfundamentals-fall2017-materials.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.master.remote=origin
branch.master.merge=refs/heads/master


## Creating a Repository
Once Git is configured,
we can start using it.
Let's create a directory for our work and then move into that directory:

```{bash}
$ mkdir planets
$ cd planets
```


Then we tell Git to make `planets` a repository, a place where
Git can store versions of our files:

```{bash}
$ git init
```


If we use `ls` to show the directory's contents,
it appears that nothing has changed:

```
$ ls
```


But if we add the `-a` flag to show everything,
we can see that Git has created a hidden directory within `planets` called `.git`:

```
$ ls -a
```


```
.	..	.git
```


Git stores information about the project in this special sub-directory.
If we ever delete it,
we will lose the project's history.

We can check that everything is set up correctly
by asking Git to tell us the status of our project:

```
$ git status
```
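The steps in this section can be run end to end as one short script. This is only a sketch: it creates `planets` inside a temporary directory (an assumption made here so the example does not clutter your working folder) and then checks that the hidden `.git` directory exists.

```bash
# Assumption: work inside a temporary directory rather than the current folder
workdir="$(mktemp -d)"
mkdir "$workdir/planets"
cd "$workdir/planets"

# Turn the directory into a Git repository
git init

# The hidden .git directory only shows up with the -a flag
ls -a

# Ask Git for the state of the new, empty repository
git status
```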



In [None]:
#This is how we create a directory
!mkdir planets

In [None]:
# `!cd` runs in a temporary subshell, so use the `%cd` magic to change the notebook's directory
%cd planets
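A note on changing directories from notebook cells: each `!` command runs in its own temporary subshell, so a directory change made with `!cd` does not carry over to later cells (IPython's `%cd` magic does persist). The plain-shell sketch below shows the same effect, with a throwaway directory standing in for the course folder as an assumption.

```bash
# Assumption: a throwaway starting directory
start="$(mktemp -d)"
cd "$start"
mkdir planets

# The parentheses start a subshell: the cd only applies inside it
(cd planets && pwd)

# Back in the outer shell, the working directory is unchanged
pwd
```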

## Recording Changes via Git
- `git status` shows the status of a repository.
- Files can be stored in a project's working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
- `git add` puts files in the staging area.
- `git commit` saves the staged content as a new commit in the local repository.
- Always write a log message when committing changes.
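The points above can be sketched as one short session. The file name `mars.txt`, its contents, and the commit message are hypothetical, chosen only for illustration, and the repository lives in a temporary directory with a repository-local identity so the example runs on a fresh machine.

```bash
# Assumption: a fresh repository in a temporary directory
repo="$(mktemp -d)/planets"
git init "$repo"
cd "$repo"

# Repository-local identity so the commit works without global settings
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

# Working directory: create a file (hypothetical name and contents)
echo "Cold and dry, but everything is my favorite color" > mars.txt
git status                 # mars.txt is listed as untracked

# Staging area: mark the file for the next commit
git add mars.txt
git status                 # mars.txt is listed under changes to be committed

# Local repository: record the staged content with a log message
git commit -m "Start notes on Mars as a base"
git log --oneline
```

Running `git status` between the steps is a cheap way to see which of the three areas a file currently sits in.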

In [2]:
#First let's make sure we're still in the right directory.
#You should be in the `planets` directory.  The command `pwd` will do that.
!pwd

/Users/jasonkuruzovich/githubdesktop/techfundamentals-fall2017-materials/classes/01-overview


In [None]:
!mkdir moons 

## Git Remote Repositories
- A local Git repository can be connected to one or more remote repositories.
- Use the HTTPS protocol to connect to remote repositories until you have learned how to set up SSH.
- `git push` copies changes from a local repository to a remote repository.
- `git pull` copies changes from a remote repository to a local repository.


Let's start by sharing the changes we've made to our current project with the
world.  Log in to GitHub, then click on the icon in the top right corner to
create a new repository called `planets`:

![Creating a Repository on GitHub (Step 1)](../fig/github-create-repo-01.png)

Name your repository "planets" and then click "Create Repository":

![Creating a Repository on GitHub (Step 2)](../fig/github-create-repo-02.png)

As soon as the repository is created, GitHub displays a page with a URL and some
information on how to configure your local repository:

![Creating a Repository on GitHub (Step 3)](../fig/github-create-repo-03.png)

Copy the repository's URL from the browser, go into the local `planets` repository, and run
this command:

```
$ git remote add origin https://github.com/vlad/planets.git
```

We can check that the command has worked by running `git remote -v`. The name `origin` is a local nickname for your remote repository. We could use
something else if we wanted to, but `origin` is by far the most common choice.

Once the nickname `origin` is set up, this command will push the changes from
our local repository to the repository on GitHub:

```
$ git push origin master
```

We can pull changes from the remote repository to the local one as well:

```
$ git pull origin master
```
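To try the remote commands without a GitHub account, the sketch below uses a bare repository on the local disk as a stand-in for the GitHub repository. That stand-in, the file `mars.txt`, and the commit message are assumptions made for illustration; with a real remote you would use the HTTPS URL from GitHub instead of the local path. The branch name is looked up rather than hard-coded because newer Git installations may name the default branch `main` instead of `master`.

```bash
base="$(mktemp -d)"

# Assumption: a local bare repository plays the role of the GitHub remote
git init --bare "$base/remote.git"

# A local repository with one commit to share
git init "$base/planets"
cd "$base/planets"
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "Cold and dry" > mars.txt          # hypothetical file
git add mars.txt
git commit -m "Start notes on Mars"

# Register the remote under the nickname origin and check it
git remote add origin "$base/remote.git"
git remote -v

# Look up the current branch name (master or main, depending on Git's defaults)
branch="$(git symbolic-ref --short HEAD)"

# Copy local commits to the remote, then fetch and merge anything new from it
git push origin "$branch"
git pull origin "$branch"
```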


## GitHub Flow and Group Projects
- This is the typical workflow for introducing changes via Git in team projects.
- [https://guides.github.com/introduction/flow/](https://guides.github.com/introduction/flow/)

![](https://guides.github.com/activities/hello-world/branching.png)
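The branching idea behind this workflow can be sketched locally. On GitHub the final step would normally be a pull request that teammates review before merging; here a plain `git merge` stands in for it. The branch name `add-moons`, the file names, and the commit messages are hypothetical.

```bash
# Assumption: a fresh repository in a temporary directory
repo="$(mktemp -d)/planets"
git init "$repo"
cd "$repo"
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

# A first commit on the default branch
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -m "Start notes on Mars"
main_branch="$(git symbolic-ref --short HEAD)"

# 1. Create a branch for the new piece of work
git checkout -b add-moons

# 2. Commit changes on that branch
echo "Phobos and Deimos" > moons.txt
git add moons.txt
git commit -m "Add notes on the moons of Mars"

# 3. Merge the branch back (a stand-in for merging a reviewed pull request)
git checkout "$main_branch"
git merge --no-ff -m "Merge branch 'add-moons'" add-moons

git log --oneline --graph
```

Keeping each change on its own branch means the default branch only ever receives work that has been finished and merged deliberately.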