# Git/GitHub Tutorial


## Git vs GitHub

**Git** is the software, **GitHub** is the cloud hosting service. That is, you can use **Git** to track changes *locally* and never have to do anything with **GitHub**. 

Of course, by using **Git** with **GitHub** you get the peace of mind that comes with backing up your project to the cloud, as well as the ability to easily share code with others and work collaboratively on the same project. 

Let's start with **Git**, and work up to using **GitHub**.

## Git

If you're using **git** for the first time, take a quick second to configure your environment with your info:

```
git config --global user.name "Jeff MacInnes"
git config --global user.email "jeff.macinnes@gmail.com"
```
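
To check what Git has stored, you can read the values back. The sketch below is an illustration rather than part of the original walkthrough: it works in a throwaway repository, uses a placeholder name and email, and sets them without `--global` so they apply to that one repository only.

```shell
# Work in a throwaway directory so nothing outside it is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Without --global, the setting is stored in this repository's .git/config
# (the name and email here are placeholders)
git config user.name "Example User"
git config user.email "user@example.com"

# Read a single value back
git config --get user.name

# List every setting that applies here, along with the file it comes from
git config --list --show-origin
```

A repository-level value takes precedence over the global one, which can be handy if you use a different email address for one particular project.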

### Create a Git repository
We'll start by creating a new directory for our project, named *myProject*

In [2]:
cd /Users/jeff
mkdir myProject
cd myProject
pwd

/Users/jeff/myProject


Now that we've created a directory, we can set it up as a **Git** repository (or repo)

In [3]:
git init

Initialized empty Git repository in /Users/jeff/myProject/.git/


You can now find a hidden directory named `.git/` in the `myProject` directory. **Git** uses the `.git/` directory to log information about changes that occur to your project

You can see the hidden git directory by typing:

In [4]:
ls -la

total 0
drwxr-xr-x   3 jeff  staff    96 Jan 30 16:21 .
drwxr-xr-x+ 90 jeff  staff  2880 Jan 30 16:21 ..
drwxr-xr-x  10 jeff  staff   320 Jan 30 16:21 .git


Ok, but so far that's the **only** thing in our project directory. Let's start adding some files

### Adding files to your project

Create some quick text files to add to your project. (We're using the command line here, but you can do this with your favorite text editor as well. Atom, in particular, works really well with **Git/GitHub**.)

In [5]:
touch script1.txt
touch script2.txt

In [6]:
ls -la

total 0
drwxr-xr-x   5 jeff  staff   160 Jan 30 16:21 .
drwxr-xr-x+ 90 jeff  staff  2880 Jan 30 16:21 ..
drwxr-xr-x  10 jeff  staff   320 Jan 30 16:21 .git
-rw-r--r--   1 jeff  staff     0 Jan 30 16:21 script1.txt
-rw-r--r--   1 jeff  staff     0 Jan 30 16:21 script2.txt


We've got 2 new files in our project directory (script1.txt, script2.txt). But we haven't told **Git** to keep track of these files yet. This can be confirmed by running:

In [7]:
git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	script1.txt
	script2.txt

nothing added to commit but untracked files present (use "git add" to track)


The `git status` command tells you the current status of your repository. In this case, it shows us that there are 2 new files. BUT it points out that these files aren't being **tracked**. In **git** parlance, this means that git isn't monitoring these files for any changes that may occur. 

Anytime you create new files, you must **add** them to the repository before **git** will start keeping track of them. 

You can either **add** the files one at a time, like:

```
git add script1.txt
git add script2.txt
```

OR, you can add all of the new files at once by typing the command:

In [8]:
git add *

Now, when we check the status, we see that the 2 new files have been added, and they are ready to be committed

In [9]:
git status

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   script1.txt
	new file:   script2.txt

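For a more compact view of the same information, `git status --short` prints one line per file with a two-character status code. The following is a minimal sketch in a scratch repository, with hypothetical file names, showing how the code changes when files move from untracked to staged.

```shell
# Scratch repository (file names are placeholders for illustration)
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

touch notes1.txt notes2.txt

# Untracked files are listed with a "??" prefix
before="$(git status --short)"
echo "$before"

git add notes1.txt notes2.txt

# Newly staged files are listed with an "A" prefix
after="$(git status --short)"
echo "$after"
```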


### Committing Files

**Committing** files is like taking a snapshot of the file contents. The snapshot gets timestamped and saved in that hidden `.git/` directory. You can think of this as a bookmark that you can always go back to later. 

For instance, say you are working on an analysis script. It's working pretty well and you've been periodically committing your files. Then one day you try to add a new statistical model and the whole thing breaks. Rather than going back and removing your edits line by line, you can simply revert back to a prior snapshot of the code (or prior **commit**).

Each time you commit, you want to include a brief message to describe the state of your project:

In [10]:
git commit -m "added first two script files"

[master (root-commit) deb7ab0] added first two script files
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 script1.txt
 create mode 100644 script2.txt


At any given point in time, you can see the full **commit** history of your repository by typing:

In [11]:
git log --oneline

deb7ab0 added first two script files


Each log entry starts with a unique reference to that commit (an abbreviated hash such as `deb7ab0`), followed by the commit message.

### Making changes to files

Right now both of our script files are just empty text files. Let's add some stuff:

In [12]:
echo "run my analyses" > script1.txt

Now, when we run `git status` we see that it noticed the change we made to `script1.txt`

In [13]:
git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   script1.txt

no changes added to commit (use "git add" and/or "git commit -a")


If we want to commit these changes, we first have to tell **git** to add the modified file to the staging area again:

In [14]:
git add *
git commit -m "added important stuff to script1.txt"

[master 170d79a] added important stuff to script1.txt
 1 file changed, 1 insertion(+)
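
Before staging, it can help to review exactly what changed. The sketch below is an addition for illustration, run in a scratch repository with a hypothetical file name and a placeholder identity: `git diff` shows changes that are not yet staged, and `git diff --staged` shows what the next commit will contain.

```shell
# Scratch repository with a placeholder identity for the demo commit
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit an empty file, then modify it
touch analysis.txt
git add analysis.txt
git commit -q -m "added analysis.txt"
echo "run my analyses" > analysis.txt

# The modification is unstaged, so it appears in plain `git diff`
git diff

# After staging, `git diff` is empty and the change shows under --staged
git add analysis.txt
git diff --staged
```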


And now when we check the log, we see both of our previous commits:

In [15]:
git log --oneline

170d79a added important stuff to script1.txt
deb7ab0 added first two script files


You can also see your log history as a tree graph (makes more sense once your project gets more complex and you begin working on separate branches)

In [16]:
git log --graph

* commit 170d79a82fd1408602c2f16d7db68a0223bbc3f6
| Author: Jeff MacInnes <jeff.macinnes@gmail.com>
| Date:   Tue Jan 30 16:21:32 2018 -0800
| 
|     added important stuff to script1.txt
|  
* commit deb7ab0d79f6393c2aac38c6cdf1abcfa53b0750
  Author: Jeff MacInnes <jeff.macinnes@gmail.com>
  Date:   Tue Jan 30 16:21:25 2018 -0800
  
      added first two script files


### Reverting back to earlier versions of your repo

So far we've just added a single line to `script1.txt` and committed those changes. Let's continue by adding some more information to that script

In [17]:
echo "additional analyses" >> script1.txt

In [18]:
cat script1.txt

run my analyses
additional analyses


`script1.txt` has been modified to include the new content. We can confirm that **git** notices the changes by checking the status 

In [19]:
git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   script1.txt

no changes added to commit (use "git add" and/or "git commit -a")


Now commit the new changes

In [20]:
git add *
git commit -m "added new info to script1.txt"

[master 3acbf82] added new info to script1.txt
 1 file changed, 1 insertion(+)


In [21]:
git log --oneline

3acbf82 added new info to script1.txt
170d79a added important stuff to script1.txt
deb7ab0 added first two script files


Great! Only, imagine that the new content you just put into `script1.txt` broke your analysis and you need to go back to a previous state of your directory. 

You can do this using the **reset** command and referencing the commit you want to go back to. Let's go back in time to our last commit *before* we added the new content to `script1.txt`

#### NOTE: a hard reset discards any uncommitted changes, and those cannot be recovered. 
If you are worried about losing changes you've made *since* the last commit, there are better, albeit more complicated, ways to do this. For more information, look up "git branching".

In [22]:
git reset --hard 170d79a

HEAD is now at 170d79a added important stuff to script1.txt


Now, script1.txt only has the original content:

In [23]:
cat script1.txt

run my analyses


In [24]:
git status

On branch master
nothing to commit, working tree clean


In [25]:
git log

commit 170d79a82fd1408602c2f16d7db68a0223bbc3f6
Author: Jeff MacInnes <jeff.macinnes@gmail.com>
Date:   Tue Jan 30 16:21:32 2018 -0800

    added important stuff to script1.txt

commit deb7ab0d79f6393c2aac38c6cdf1abcfa53b0750
Author: Jeff MacInnes <jeff.macinnes@gmail.com>
Date:   Tue Jan 30 16:21:25 2018 -0800

    added first two script files
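
One way to make a hard reset less risky is to label the current commit with a branch first, so the commit you are stepping away from stays reachable. The following is a minimal sketch rather than part of the original session: it assumes a scratch repository, a placeholder identity, a hypothetical branch name `backup-before-reset`, and it uses `HEAD~1` (the commit just before the current one) in place of a hash.

```shell
# Scratch repository with two commits to script1.txt
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "run my analyses" > script1.txt
git add script1.txt
git commit -q -m "added important stuff to script1.txt"

echo "additional analyses" >> script1.txt
git add script1.txt
git commit -q -m "added new info to script1.txt"

# Label the current commit so it stays reachable after the reset
git branch backup-before-reset

# Move the current branch back one commit and overwrite the working files
git reset -q --hard HEAD~1
cat script1.txt

# The newer commit is still listed on the backup branch
git log --oneline backup-before-reset
```

This protects committed work only. Edits that were never committed are still discarded by `--hard`.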


## Integrating with GitHub

So far everything we've done is local to our computer. It's more often the case that you'll want to keep a version of your code stored remotely in the cloud. To do so, we need to link our **git** repositories to **GitHub**. In addition to cloud storage for your ever-evolving project directories, **GitHub** also offers numerous tools for collaborative research, including the ability to have multiple individuals working on the same project simultaneously. 

In order to get started, sign up for a free account at [GitHub](https://github.com).

## Starting a new repository

In the steps above, we created a new project directory and started **git** tracking it by calling 

```
git init
```

It's also possible (and often easier) to start a new project on your GitHub page itself. Log in to GitHub and choose the **New Repository** option under the menu in the top-right corner.


![alt text](images/newRepo.png)

On the next page, give your new project a name and description (optional), and make sure to **initialize it with a README** file. This is a simple markdown file that you'll find in the root level directory of every GitHub repository. This file is a helpful place to store information about your project, as it gets formatted and rendered as html on the main page of each repository: 

![altText](images/setupNew.png)

Now that you have your new repository created on GitHub, the next step is to grab a local copy of the repo on your workstation. From GitHub, copy the link that appears when you press the **Clone or download** button. 

![altText](images/gitClone.png)

Now, on your workstation, navigate to the directory where you want to copy the repository, and then type `git clone` and paste in the copied link

In [32]:
cd /Users/jeff
git clone https://github.com/jeffmacinnes/myNewProject.git

Cloning into 'myNewProject'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.


Navigate into the new repository, and run `git status` to check the current status of the repo 

In [33]:
cd myNewProject
git status

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean


### Making changes locally, then updating the remote repository

We can start working on our project just like we did before. Only now, after committing changes, we'll take the additional step to **push** those changes to the remote repository. 

Let's create a simple text file, add some content, and commit those changes just like before

In [34]:
touch myAnalysisScript.py
echo "here are all of my analyses..." > myAnalysisScript.py

git add *

In [35]:
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   myAnalysisScript.py



In [36]:
git commit -m "started working on myAnalysisScript.py"

[master 6e33dab] started working on myAnalysisScript.py
 1 file changed, 1 insertion(+)
 create mode 100644 myAnalysisScript.py


In [37]:
git log --oneline

6e33dab started working on myAnalysisScript.py
05a7099 Initial commit


We've committed the changes locally, but now we have to push the changes to the remote repository. The basic syntax is:

```
git push <remoteName> <branch>
```

* **remoteName** is the name git assigned to your remote repository when you cloned it. This is *origin* by default
* **branch** is the branch of your repo you want to push. For now, keep this the default *master* branch
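
To see which remotes a repository knows about, `git remote -v` lists each name with its URL. The sketch below is illustrative only: a local bare repository (a hypothetical `remote.git` in a temporary directory) stands in for GitHub so that it runs without a network connection or an account.

```shell
work_dir="$(mktemp -d)"
cd "$work_dir"

# A bare repository plays the role of the remote on GitHub (assumption)
git init -q --bare remote.git

# Cloning records the source as a remote named "origin"
# (git may warn that the cloned repository is empty; that is expected here)
git clone -q remote.git myClone
cd myClone

# List remote names with their fetch and push URLs
git remote -v

# Print just the URL behind "origin"
git remote get-url origin
```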

In [38]:
git push origin master

Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 344 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/jeffmacinnes/myNewProject.git
   05a7099..6e33dab  master -> master


**Success**. You pushed your changes to the remote repository. When you go to the GitHub page for the repository, you should now see the new file we created:

![altText](images/newPush.png)

### Controlling which files get tracked

Sometimes your repository will have files that you don't want **Git** to keep track of. For instance, on OSX, you'll often find annoying `.DS_Store` files in every directory. They're hidden on OSX, but they show up in `git status` and, once added, will appear in your remote repository on **GitHub**, where they clutter things up. 

You may also have data files in your project directory that you want your scripts to access, but that you don't want **Git** to track either because the data files are huge, or they contain private information about your subjects (particularly relevant if you're pushing to a public repository on **GitHub**).

Thankfully, there's an easy solution. You can create a file named `.gitignore` in the root level of your directory that tells **Git** what files to ignore. Each line in the file is a pattern, such as the relative path to a file or subdirectory in your repository that you want ignored, and patterns can include wildcards.

Let's make a subdirectory in our project where we'll store sensitive information that we don't want shared publicly, and then add some files to it: 

In [39]:
mkdir privateData

touch privateData/creditCardNumbers.txt
touch privateData/socialSecurityNumbers.txt
touch privateData/deepestFearsAndAnxieties.txt

In [40]:
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	privateData/

nothing added to commit but untracked files present (use "git add" to track)


Right now **Git** sees that we created a new directory. Next, we'll create a `.gitignore` file and add this new directory to it.

Use the wildcard character to tell it to ignore *any* file that is in the privateData directory

In [41]:
echo "privateData/*" > .gitignore

Now, when we run `git status`, the privateData directory no longer appears

In [42]:
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore

nothing added to commit but untracked files present (use "git add" to track)


Add and commit the `.gitignore` file to your repository, and then push to the remote and confirm that privateData isn't included

In [43]:
git add .gitignore
git commit -m "started .gitignore file"
git push origin master

[master d7b2ab6] started .gitignore file
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 349 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/jeffmacinnes/myNewProject.git
   6e33dab..d7b2ab6  master -> master


![altText](images/gitIgnore.png)

Our .gitignore file is there, but not our privateData directory!
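
If you are ever unsure whether a pattern is doing what you expect, `git check-ignore` reports which paths are ignored and by which rule. Here is a minimal sketch, assuming a scratch repository and hypothetical file names, that combines the `.DS_Store` and `privateData` cases described above.

```shell
# Scratch repository with a mix of files (names are placeholders)
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

mkdir privateData
touch privateData/subjects.csv .DS_Store analysis.py

# One pattern per line; lines starting with # are comments
cat > .gitignore <<'EOF'
# macOS folder metadata, in any directory
.DS_Store
# everything inside privateData
privateData/*
EOF

# -v prints the .gitignore line that matches each ignored path
git check-ignore -v .DS_Store privateData/subjects.csv

# Only the files that are not ignored remain in the status listing
git status --short
```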

## Example workflow

Git is an incredibly powerful tool with *many*, **many** different options. However, only a few simple commands are needed to start using git to version control your small projects, and using them will quickly become second nature. Here's a basic workflow:

* Get a repository
    * `git init` to create a new local repository
    * `git clone` to get a copy of an existing remote repository
    * `git pull` to get the most up-to-date version of a remote repository


* Make edits to the repository
    * `git add *` to add all new files and updated files (the shell's `*` skips hidden files such as `.gitignore`, so add those by name or use `git add .`)
  
  
* Commit your changes to create a snapshot
    * `git commit`
  
  
* Push your changes to the remote repository
    * `git push` 
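
The steps above can be strung together into one short session. This is a sketch under stated assumptions, not a recording of the original notebook: a local bare repository (hypothetical `remote.git`) stands in for GitHub, the identity is a placeholder, and `git push origin HEAD` is used so the example works whatever the default branch is called.

```shell
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for a repository hosted on GitHub (assumption: local bare repo)
git init -q --bare remote.git

# Get a repository
git clone -q remote.git myProject
cd myProject
git config user.name "Example User"
git config user.email "user@example.com"

# Make edits
echo "run my analyses" > script1.txt

# Stage the change, then commit a snapshot
git add script1.txt
git commit -q -m "added script1.txt"

# Push the current branch to the remote
git push -q origin HEAD

# Confirm that the remote-tracking branch now points at the new commit
branch="$(git rev-parse --abbrev-ref HEAD)"
git log --oneline "origin/$branch"
```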

## Additional Resources:

[Git - the Simple Guide](http://rogerdudler.github.io/git-guide/)

[Atlassian Git Tutorials](https://www.atlassian.com/git/tutorials)