# Git and GitHub

## Getting Started with Git

* Type `git` in the command line to verify it is installed
    - Go here: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git to see instructions on how to download and install
* Git is a distributed version control system where each user has a full copy of the repository
* Book on git by git: https://git-scm.com/book/en/v2. Videos are also available.

### Git is a snapshot
* Delta
<img src="https://git-scm.com/book/en/v2/images/deltas.png"/>
* Snapshot
<img src="https://git-scm.com/book/en/v2/images/snapshots.png"/>

### The Three States

* Files are either: 
    - Modified: you have changed a file in the working directory but have not staged it yet
    - Staged: you have marked the current version of a modified file to go into the next commit; it sits in the staging area
    - Committed: the change you have made is safely stored in the repository
<img src="https://git-scm.com/book/en/v2/images/areas.png"/>

Image source: https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F

### Some Essential Shell Commands

* You will eventually need to learn how to navigate the command line to take full advantage of git

* `pwd`: check your current/working directory
* `cd`: change the directory
* `cd ~` or `cd` (with no argument): go to home directory
* `cd /`: go to the root directory
* `mkdir DIRECTORY_NAME`: make a directory
* `rm FILE_NAME`: remove a file
* `rmdir DIRECTORY_NAME`: remove a directory if it is empty
* `rm -rf DIRECTORY_NAME`: remove a directory and everything in it, without asking for confirmation
* `ls`: list files in your current/working directory
* `ls -a`: list all files, including hidden files or folders (`.FILE` or `.FOLDER`)

* Learning resources: https://learn.datacamp.com/courses/introduction-to-shell and google for more
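
The commands above can be tried safely in a throwaway directory. This is a minimal sketch: `mktemp` and `touch` are not in the list above and are used here only to create a scratch folder and an empty hidden file, and the names `scratch` and `demo_dir` are made up for the example.

```shell
# Work inside a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd                      # print the current/working directory
mkdir demo_dir           # make a directory
touch demo_dir/.hidden   # create an empty hidden file
ls demo_dir              # prints nothing: hidden files are skipped
ls -a demo_dir           # lists ., .. and .hidden
rm demo_dir/.hidden      # remove the file
rmdir demo_dir           # works now that the directory is empty
```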

### Setup Git Environment
* Type `git config --list --show-origin` to check your configurations

#### Identity
* Set your name and email so that each commit is labeled with your identity. The same identity is attached to the commits you "push" to GitHub, so make sure it matches your GitHub identity.

    > `git config --global user.name "John Doe"`
    >
    > `git config --global user.email "johndoe@example.com"`

#### Editor
* Set your text editor. I recommend `emacs` but you could use any.
> `git config --global core.editor emacs`
* Check your configuration by typing `git config --list`
* Check the value of a single key such as `user.name` by typing `git config user.name`
<!--- Check unexpected config variable with git config --show-origin KEY--->

#### Credentials
* `git config --global credential.helper cache`: keep your credentials in the cache for 15 minutes (the default timeout). This avoids typing credentials every time you push changes to the remote.
* More on setting credentials: https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage#_credential_caching
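
To experiment with configuration without touching your global settings, you could set values for a single repository only. This is a sketch under that assumption: leaving out `--global` writes to the repository's own `.git/config`, and the scratch repository is created just for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Without --global, the settings apply to this repository only.
git config user.name "John Doe"
git config user.email "johndoe@example.com"
git config user.name          # read a single key back
git config --list --local     # list only this repository's settings
```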

## Git Basics

### Initialize your project

* Create a folder: `mkdir git_exp`
* The commit message should (be):
    - Descriptive but concise
    - Use active voice and present tense. For example, write `Initialize the first version` rather than `Initialized the first version`
    - Have no period
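
A folder only becomes a repository once `git init` is run inside it. The sketch below puts the steps together, from creating the folder to the first commit. It works in a temporary location, and the identity and the `README.md` content are placeholders.

```shell
cd "$(mktemp -d)"
mkdir git_exp
cd git_exp
git init -q                                   # creates the hidden .git directory
git config user.name "John Doe"               # placeholder identity, this repo only
git config user.email "johndoe@example.com"
echo "# git_exp" > README.md
git add README.md                             # stage the new file
git commit -q -m "Initialize the first version"
git log --oneline                             # one commit with the message above
```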

### Clone a project from a server like GitHub

Suppose you want to clone `stat_computing_seminar` repo from my GitHub Page. Go to https://github.com/dophos/stat_computing_seminar. Click Clone and copy the URL. The copied url is what you are going to use to clone the repo:

> `git clone https://github.com/dophos/stat_computing_seminar.git` (this is the url you copied)

You will see the folder `stat_computing_seminar`. If you want to clone the repo under a different name, say, `my_stat_computing_seminar`, type:

> `git clone https://github.com/dophos/stat_computing_seminar.git my_stat_computing_seminar`
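
The same two forms of `git clone` can be tried offline. In this sketch a local repository stands in for the GitHub URL (an assumption made so the example runs without network access); the `remote/` folder and the identity are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"
# Hypothetical stand-in for the GitHub repository: a local repo with one commit.
mkdir -p remote/stat_computing_seminar
git init -q remote/stat_computing_seminar
git -C remote/stat_computing_seminar -c user.name="Jane Doe" -c user.email="jane@example.com" \
    commit -q --allow-empty -m "Initialize"
git clone -q remote/stat_computing_seminar                            # creates ./stat_computing_seminar
git clone -q remote/stat_computing_seminar my_stat_computing_seminar  # same repo, different folder name
ls
```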

### Record Changes to the Repo

Each file in the working directory is either ***tracked*** or ***untracked***. Every file that was committed previously is tracked, and if you make any edits to it, Git will detect the change.

Any tracked file could be *unmodified*, *modified* or *staged*.
Any file that was not in the previous *snapshot* (*commit*) and has not been staged is untracked: Git will not follow changes to it until you stage it with `git add` and commit it.

In picture:
<img src="https://git-scm.com/book/en/v2/images/lifecycle.png" /> source:https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository

### Track new files with `git add`

* `git add` has multiple purposes such as:

   * track new files
   * track modified files
   * mark merge-conflicted files as resolved

* ***IMPORTANT***. What happens if you modify staged files?
    * The staged version, not the later edit, is what goes into the next commit. `git add` is a preview of the snapshot (commit) that will be taken: it tells Git to "add precisely this content to the next commit" rather than "add this file to the project" [1](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository). To include the later edit, run `git add` on the file again.
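
The effect of editing a file after staging it can be checked with a small experiment. This is a sketch in a scratch repository; the file name `notes.txt` and the identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "version 1" > notes.txt
git add notes.txt                 # stage the file as it is right now
echo "version 2" > notes.txt      # edit it again after staging
git status --short                # "AM notes.txt": staged, then modified again
git commit -q -m "Add notes"      # commits the staged content only
git show HEAD:notes.txt           # prints "version 1"
git status --short                # " M notes.txt": the second edit is still uncommitted
```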


### Preview of `git diff`
* While `git status` gives an overview of your working directory, `git diff` allows a more detailed examination of what has changed.
    * `git diff`: between the modified file and the file in the staging area
    * `git diff --staged` or `git diff --cached`: between the staged file and the last commit

* Note that if you type `git diff` right after you have committed all the modified files, you will not see any changes since every tracked file is up to date.
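
The two comparisons can be followed step by step as a change moves from the working directory to the staging area and then into a commit. A minimal sketch, assuming a scratch repository; `--stat` is added only to keep the output short.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "line two" >> notes.txt
git diff --stat             # the unstaged change is listed
git add notes.txt
git diff --stat             # empty: working directory matches the staging area
git diff --staged --stat    # the change now shows up here
git commit -q -m "Add second line"
git diff --staged --stat    # empty again: nothing is staged
```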

### Commit your changes

* `git commit` will open an editor that you have configured earlier with some comments.
* You will only be able to commit the files that you have staged, and nothing that you have not staged will be committed.
* You could commit without opening an editor by using `git commit -m "Your Commit Message"`.
* You could skip the staging step by using `git commit -a -m "Your Commit Message"`. This commits every modified file that is already tracked. Use it with caution, as you could commit unwanted changes.
* Follow the commit message convention. See this [blog post](https://chris.beams.io/posts/git-commit/) for details.
    * Use the imperative mood
    * Do not end the subject line with a period
    * Start the subject line with a capital letter
    * Separate the subject from the body with one blank line
    * Use the body to explain what and why, not how
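
As a rough illustration of `git commit -a` and of a message with a subject and a body, the sketch below uses two `-m` flags (each one becomes its own paragraph). The file names and messages are made up; note that the untracked file is left out of the commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "first" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"
echo "second" >> tracked.txt      # modify a tracked file
echo "new" > untracked.txt        # create a file Git does not track yet
# -a stages every modified tracked file; untracked files are not included.
git commit -q -a -m "Update tracked file" -m "Explain what changed and why in the body."
git status --short                # "?? untracked.txt"
git log -1 --format=%B            # subject, blank line, body
```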

### Ignoring Files

* Suppose you want to tell Git not to track certain files. You specify them in a `.gitignore` file. For example, your `.gitignore` might look like this:

>```
>*.[oa]
>*~
>```

This tells Git to ignore all files with the extension `.o` or `.a`, as well as temporary files whose names end with `~`.

* You could have a single `.gitignore` in the root directory of your project, which applies recursively to all subdirectories, or you could add `.gitignore` files in subdirectories, whose rules take precedence over those in the parent directory.
* For more examples, go [here](https://github.com/github/gitignore)
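
One way to check that the patterns behave as described is `git check-ignore`, which reports whether a path is ignored. A small sketch in a scratch repository; the file names are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '*.[oa]\n*~\n' > .gitignore      # the two patterns from the example above
touch main.o lib.a notes.txt 'notes.txt~'
git status --short                      # only .gitignore and notes.txt are listed as untracked
git check-ignore -v main.o              # shows the pattern that matched main.o
```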


### Removing Files

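* `git rm --cached <file>`: stop tracking the file but keep it on disk (from git's point of view it is deleted)
* `git rm <file>`: delete the file from disk and stage the deletion
* `git rm -f <file>`: force the deletion of a file that has staged or unstaged modifications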
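
The difference between `git rm --cached` and plain `git rm` is easiest to see on disk. A minimal sketch with two made-up files in a scratch repository:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "keep me locally" > secret.log
echo "delete me" > old.txt
git add secret.log old.txt
git commit -q -m "Add two files"
git rm -q --cached secret.log     # untrack, but leave the file on disk
git rm -q old.txt                 # remove from the index and from disk
git status --short                # both deletions are staged; secret.log is now untracked
git commit -q -m "Stop tracking both files"
ls                                # only secret.log remains
```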

### Moving Files
* You want to rename a file that is being tracked. Suppose you are trying to change the name of `README.md` to `README`. A and B do the same thing:

> A
>> `git mv README.md README`

> B 
>> `mv README.md README`
>>
>> `git rm README.md`
>>
>> `git add README`

Both of them change the name of the file, untrack the file with the old name and stage the file with new name.
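
Option A can be tried in a scratch repository as sketched below; the `README.md` content is a placeholder. After the rename, Git tracks only the new name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "Project notes" > README.md
git add README.md
git commit -q -m "Add README"
git mv README.md README           # rename on disk and stage the rename in one step
git status --short                # reports the staged rename
git commit -q -m "Rename README"
git ls-files                      # prints "README"
```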

### Viewing the Commit History

* `git log` lists your commit history in reverse chronological order.
    * `git log --oneline`: one line per commit
    * `git log --graph`: useful when you have divergent and merged commits

* Limit the log output 
    * `git log -<n>` shows the last `n` commits.
    * `git log --since=2.weeks` shows the commits in the last 2 weeks.

* For more go [here](https://git-scm.com/book/en/v2/Git-Basics-Viewing-the-Commit-History).
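
The limiting options can be combined with `--oneline`. The sketch below creates three throwaway commits and then asks for the last two; the commit messages are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
for i in 1 2 3; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "Commit $i"
done
git log --oneline                  # all three commits, newest first
git log --oneline -2               # only "Commit 3" and "Commit 2"
git log --oneline --since=2.weeks  # all three, since they were just created
```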

### Undoing Things

#### Amend your commit
* Be careful, because there are times you cannot undo the undos.
* Amending is mostly used to correct minor mistakes, such as a typo in the commit message or a modified file that you forgot to stage with the rest of the commit.
* For example, suppose you forgot to stage the file named `forgotten_file` in your previous (initial) commit. You could fix this by:
    ```
    git add forgotten_file
    git commit --amend
    ```
    The amended commit replaces the previous one, which will no longer show up in the repo history.
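
The amend workflow can be verified by counting commits before and after. In this sketch `--no-edit` (not used above) keeps the original message so that no editor opens; `main.txt` is a made-up file.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "a" > main.txt
echo "b" > forgotten_file
git add main.txt
git commit -q -m "Initial commit"       # forgotten_file was left out
git add forgotten_file
git commit -q --amend --no-edit         # replace the commit, reusing its message
git rev-list --count HEAD               # prints 1: still a single commit
git show --name-only --format= HEAD     # lists both files
```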

#### Unmodify a modified file
* Suppose you have made a change to a `file`, but you decide to discard the change and revert the file to the last committed version. To do this, type `git checkout -- file`
* ___Be very cautious___ in using this because you will lose all the changes that you have made to this file in the working directory, and the only thing you are left with is the last committed version of the file. You will not be able to recover the changes unless you have already preserved them.
    * If you want to preserve the changes but get them out of the way for now, you could use [stashing](https://git-scm.com/book/en/v2/Git-Tools-Stashing-and-Cleaning#:~:text=The%20answer%20to%20this%20issue,even%20on%20a%20different%20branch) or branching.
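
A quick way to see what `git checkout -- file` does is to try it on a disposable file. A sketch in a scratch repository with placeholder content:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "committed text" > file
git add file
git commit -q -m "Add file"
echo "unwanted edit" >> file      # a change we decide to throw away
git checkout -- file              # restore the last committed version
cat file                          # prints "committed text"
```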


### Working with remotes
* Remote repositories are versions of your project that are hosted elsewhere, e.g. on GitHub.

* `git remote`: list the shortnames of the remotes you have configured

* `git remote -v`: show the url of the configured remote that you use when you read/write from/to the remote

* `git remote add <shortname> <url>`: adds a new remote with `shortname` as its reference name. This is useful because you do not need to type the whole url each time you read from or write to the remote. For example, when you clone a repository, Git names its remote `origin` by default. This is a shorthand you could use to reference the repo instead of using the full url.

* `git fetch <remote>`: downloads the data from `remote` that you do not already have. It does not modify the files you are working on; it only adds the new data to your local repo. You could then inspect it and decide to merge it into the branch you are currently working on. Merging and branching will be explained later.

* `git pull <remote>`: fetches from the remote and merges the remote branch that your current branch is set up to track, in one step.

* `git push <remote> <branch>`: uploads the commits on your local `branch` to the branch of the same name on `remote`. This only works if you have write access to the remote.
    * If a collaborator with write access pushes before you do, your push will be rejected. To push your work, you first have to fetch your collaborator's changes and incorporate them into your branch.

* `git remote show <remote>`: will show how you have configured your local repo with `remote`.

* `git remote rename <old_name> <new_name>`: changes the old shorthand, `old_name`, to `new_name`.

* `git remote remove <remote>`: you could stop following a remote, for example, because you have a collaborator whose remote is not being updated anymore.
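
The remote commands can be practiced without a GitHub account. In this sketch a bare repository on the local disk plays the role of the remote (an assumption made for the example); `server.git` and `local` are made-up names.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git               # stand-in for a hosted remote
git init -q local
cd local
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on your Git setup
git remote add origin "$work/server.git"    # register the remote under a shortname
git remote -v                               # show its fetch and push urls
git push -q origin "$branch"                # upload the local branch
git fetch -q origin                         # nothing new to download yet
git remote rename origin upstream
git remote                                  # prints "upstream"
```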


### Tagging
* Suppose you have written a package, version 1.0. After a year, you modify some methods in the package and you want to tag this version as 2.0. Tagging marks a particular point (snapshot) of a repo.

* `git tag [-l|--list]` will list tags in alphabetical order. 
    * If you want to look at only certain tags you could specify a pattern. 
        * For example, if you only want to see tags related to version 1.x, then you would write `git tag -l "v1.*"`
        * Note that when you filter the list with a pattern, `-l` or `--list` becomes mandatory.

#### Annotated Tags
* `git tag -a <tagname> -m "<tagging message>"`
    * checksummed
    * tagger name, email
    * date
    * tagging message
    * can be signed and verified with [GNU Privacy Guard](https://gnupg.org/)
* `git show <tagname>` will show tag data along with a commit

#### Lightweight Tags
* `git tag <tagname>`
* `git show <tagname>` will show only the commit, without the extra tag information that is visible for annotated tags.

#### Tag commits in the past
* `git tag -a <tagname> <checksum>` can be used to tag a commit in the past. The checksum of a commit can be retrieved from `git log`.

#### Share tags
* `git push` does not transfer tags to the remote server, so you need to do it explicitly.
    * `git push <remote> <tagname>`
    * `git push <remote> --tags`: push all the tags that are not yet on the remote.
    * `git push <remote> --follow-tags`: push your commits together with the annotated tags that point to them; lightweight tags are skipped. There is no option yet to push only the lightweight tags.
    
#### Delete tags
* `git tag -d <tagname>`: remove a tag from local repo.
* `git push <remote> --delete <tagname>`: remove a tag from `remote`.

#### Check out tags
* `git checkout <tagname>`: puts your repo in "detached HEAD" state. In this state, any commit you make will not belong to any branch and will only be accessible with the commit hash.
* To keep the change, create a branch while accessing the tag and your commit will be accessible through this branch: `git checkout -b <branch_name> <tagname>`.
    * Note that the new change will move forward from the tag you checked out.
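
The tag commands above can be tried together in a scratch repository. This is a sketch with made-up tag names; `git cat-file -t` is not covered above and is used here only to show that an annotated tag is its own object while a lightweight tag simply points at a commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "1" > pkg.txt; git add pkg.txt; git commit -q -m "Release first version"
echo "2" > pkg.txt; git commit -q -a -m "Rewrite methods"
git tag -a v1.0 -m "Version 1.0" HEAD~1   # annotated tag on a commit in the past
git tag v1.1-lw HEAD~1                    # lightweight tag on the same commit
git tag -a v2.0 -m "Version 2.0"          # annotated tag on the latest commit
git tag -l "v1.*"                         # lists v1.0 and v1.1-lw
git cat-file -t v1.0                      # prints "tag"
git cat-file -t v1.1-lw                   # prints "commit"
git tag -d v1.1-lw                        # delete the lightweight tag locally
```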

### Git Aliases

* Allows you to use a shorthand for often used git subcommands:
    * `git config --global alias.co checkout`
    * `git config --global alias.br branch`
    * `git config --global alias.ci commit`
    * `git config --global alias.st status`
    * `git config --global alias.unstage "reset HEAD --"`
    * `git config --global alias.last "log -1 HEAD"`
* You could also alias an external command that is not one of git's subcommands. For example, to run `gitk` through the alias `visual`, type:
    * `git config --global alias.visual "!gitk"`

    * `!` lets git know that what follows is not a git subcommand.
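
Aliases can be tested without changing your global configuration by defining them for one repository only. A sketch under that assumption (no `--global`), using two of the aliases listed above:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"
# Repository-local aliases: they exist only inside this scratch repo.
git config alias.st status
git config alias.last "log -1 HEAD"
git st --short                    # same as: git status --short
git last --format=%s              # same as: git log -1 HEAD --format=%s
```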

## Git Branching

### Overview

* Branching allows you to experiment with the current working version of a package/project quickly, efficiently and safely
* A branch is just a [pointer](https://en.wikipedia.org/wiki/Pointer_(computer_programming)) to the last commit object made on that branch.
* A commit object is just a metadata file with a pointer to a tree object.
    * The tree object has a collection of pointers to files (blobs).
        * These files are the ones you captured in the snapshot (commit).
        
In picture:
<img src="https://git-scm.com/book/en/v2/images/commit-and-tree.png" /> 

* After a couple of commits, this is how the commits are organized. Notice each commit has a pointer named `parent` that points to a parent commit object:
<img src="https://git-scm.com/book/en/v2/images/commits-and-parents.png" />

* Suppose you want to create a branch named `testing`. You type `git branch testing`. This creates a pointer, `testing`, that points to the latest commit you are on:
<img src="https://git-scm.com/book/en/v2/images/two-branches.png" />

* But how do we know which branch we are on? This is tracked by another pointer, `HEAD`, which points to the branch pointer you are currently on. Whichever branch `HEAD` points to is the branch you are on, and its latest commit is what your working directory reflects:
<img src="https://git-scm.com/book/en/v2/images/head-to-master.png" />
You could check where the `HEAD` is pointing with `git log --decorate`.

* `git checkout testing`: change the branch to `testing`. In other words, point `HEAD` to `testing`. In picture:
<img src="https://git-scm.com/book/en/v2/images/head-to-testing.png" />

* Suppose you have made a commit while you were on `testing` branch. Both `testing` pointer and `HEAD` move forward with a new commit:
<img src="https://git-scm.com/book/en/v2/images/advance-testing.png" /> 

* Suppose you went back to the `master` branch and made some changes there. In picture, the structure looks like this:
<img src="https://git-scm.com/book/en/v2/images/advance-master.png" />

You could see the project has diverged after the commit `f30ab`. This is what's neat about branching--you have experimented with what you originally had without disrupting its development.

* Some commands to see branching:
    * `git log --decorate`: shows where the `HEAD` is pointing
    * `git log --oneline --decorate --graph --all`: print the branch structure with where the `HEAD` is. 
    * Note that `git log` by default does not show the whole branch structure. It only shows the history of the branch that `HEAD` is pointing to.

source:https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
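
The diverging history in the pictures can be reproduced in a scratch repository. In this sketch the default branch name is read into `main` because it may be `master` or `main` depending on your Git setup; the file names and messages are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "base" > base.txt; git add base.txt; git commit -q -m "Add base"
main=$(git rev-parse --abbrev-ref HEAD)     # name of the default branch
git branch testing                          # new pointer to the current commit
git checkout -q testing                     # HEAD now points to testing
echo "t" > t.txt; git add t.txt; git commit -q -m "Work on testing"
git checkout -q "$main"                     # HEAD back on the default branch
echo "m" > m.txt; git add m.txt; git commit -q -m "Work on default branch"
git log --oneline --decorate --graph --all  # two branches diverging from "Add base"
```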

### Merging

* Suppose you are working on some package and you have found an issue from the last stable version. So you checkout and create a new branch, `iss53` with `git checkout -b iss53`:
<img src="https://git-scm.com/book/en/v2/images/basic-branching-2.png" />

* You make progress on `iss53` and make a commit:
<img src="https://git-scm.com/book/en/v2/images/basic-branching-3.png" />

* Suppose your collaborator emails you to report an error in estimating the standard error that must be fixed ASAP. You switch back with `git checkout master`, create and check out a branch `hotfix` with `git checkout -b hotfix`, fix the issue, and commit it:
<img src="https://git-scm.com/book/en/v2/images/basic-branching-4.png" />

* You want to incorporate (merge) this change into the `master` branch. You check out `master` with `git checkout master` and merge the `hotfix` branch into it with `git merge hotfix`.

    This is called a "fast-forward" because `master` was an ancestor of `hotfix`, so all Git had to do was move the `master` pointer up to where `hotfix` is pointing:
    <img src="https://git-scm.com/book/en/v2/images/basic-branching-5.png" />
    Since we do not need the `hotfix` branch anymore, we delete it with `git branch -d hotfix`:
    <img src="https://git-scm.com/book/en/v2/images/basic-branching-6.png" />

* Suppose you have finished working on `iss53` and you hope to merge the change into `master`. At this point, your git looks like this:
<img src="https://git-scm.com/book/en/v2/images/basic-merging-1.png" />

    * When there is no conflict.
        * You type `git checkout master` and `git merge iss53`. Git creates a so-called "merge commit" that incorporates the changes in `iss53`. The merge commit has two parents:
        <img src="https://git-scm.com/book/en/v2/images/basic-merging-2.png" />
     
    * When there is conflict. 
        * You type `git checkout master` and `git merge iss53`, but git reports a conflict. This means the same part of a file was changed differently in `iss53` and `master`, and the difference must be resolved by hand. For example:
        ```
        git merge iss53
        Auto-merging script1.R
        CONFLICT (content): Merge conflict in script1.R
        Automatic merge failed; fix conflicts and then commit the result.
        ```

          If you run `git status` after the conflict, you get:
          
          ```
          On branch master
          You have unmerged paths.
            (fix conflicts and run "git commit")

          Unmerged paths:
            (use "git add <file>..." to mark resolution)

              both modified:      script1.R

          no changes added to commit (use "git add" and/or "git commit -a")
          ```
          
         * At this point, you have to resolve the conflict manually. For example, opening the file, you find this:
         ```
         <<<<<<< HEAD:script1.R
         li <- lapply(1:100,function(i){mean(x[i])})
         =======
         for(i in 1:100) li[[i]] <- mean(x[i])
         >>>>>>> iss53:script1.R
         ```
         The lines between `<<<<<<<` and `=======` come from `master` (the branch `HEAD` points to), and the lines between `=======` and `>>>>>>>` come from `iss53`. You could choose one or the other, or rewrite the code altogether. For example, you could have:
         ```
         li <- foreach(i=1:100) %do% {mean(x[i])}
         ```
         Note that the `<<<<<<<`, `=======`, and `>>>>>>>` markers have been removed. If more than one file has conflicts, do the same for each file, then mark each one as resolved by staging it with `git add`.
         
         * You could resolve the conflict with the help of a graphical tool by typing `git mergetool`. This is most helpful for merges with complicated conflicts.
         
         * Once you have staged the resolved files, running `git status` would show:
         
         ```
         On branch master
         All conflicts fixed but you are still merging.
           (use "git commit" to conclude merge)

         Changes to be committed:

             modified:   script1.R
         ```
         
         * When you commit, you could expand on commit message to include details on how the conflicts were resolved and why you made the changes the way you did if they are not clear.
         
source for pictures: https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging
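
The whole scenario, a fast-forward merge of `hotfix` followed by a conflicting merge of `iss53`, can be replayed in a scratch repository. This is a simplified sketch: the one-line `script1.R` contents are made up so that the two branches are guaranteed to conflict, and the conflict is resolved by simply overwriting the file.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "x <- 1" > script1.R
git add script1.R
git commit -q -m "Add script"
main=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on your Git setup

git checkout -q -b iss53                  # start working on the issue
echo "x <- 2" > script1.R
git commit -q -a -m "Change x for issue 53"

git checkout -q "$main"                   # urgent fix: branch off the stable branch
git checkout -q -b hotfix
echo "x <- 3" > script1.R
git commit -q -a -m "Fix urgent bug"

git checkout -q "$main"
git merge hotfix                          # fast-forward: only the branch pointer moves
git branch -d hotfix                      # safe to delete, its work is merged

git merge iss53 || echo "merge stopped on a conflict"
echo "x <- 4" > script1.R                 # resolve by hand: replace the conflict-marked content
git add script1.R                         # staging marks the conflict as resolved
git commit -q -m "Merge iss53 and reconcile x"
git log --oneline --graph                 # the merge commit has two parents
```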

### Branch Management

* `git branch`: lists your branches; the `*` next to a branch name indicates where `HEAD` is pointing
* `git branch -v`: shows the last commit on each branch
* `git branch --merged`: shows the branches that have been merged into the currently checked-out branch. Branches that show up here could be deleted with `git branch -d` because you have already incorporated their changes and you will not lose your work.
* `git branch --no-merged`: shows the branches that have not been merged into the currently checked-out branch
* You could also check the merge status relative to a branch that is not currently checked out.
    * `git branch --no-merged master`: this shows the branches that have not been merged into `master`.
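
The `--merged` and `--no-merged` filters can be compared on a repository with one finished and one unfinished branch. A sketch with made-up branch names; the default branch name is read into `main` since it depends on your Git setup.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "base" > base.txt; git add base.txt; git commit -q -m "Add base"
main=$(git rev-parse --abbrev-ref HEAD)
git branch merged_topic                   # points at the same commit: nothing left to merge
git checkout -q -b open_topic
echo "wip" > wip.txt; git add wip.txt; git commit -q -m "Start new work"
git checkout -q "$main"
git branch -v                             # every branch with its last commit
git branch --merged                       # the default branch and merged_topic
git branch --no-merged                    # open_topic
git branch -d merged_topic                # allowed: its work is already included
```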
    
### Branching Workflows

* Example of a branching workflow. Suppose you have three branches off of `master`: `dumbidea`, `iss91` and `iss91v2`.

<img src="https://git-scm.com/book/en/v2/images/topic-branches-1.png" />

* Branch structure after you have decided to discard `iss91` and merge `dumbidea` and `iss91v2` to the `master`.

<img src="https://git-scm.com/book/en/v2/images/topic-branches-2.png" />

source: https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows

### Remote Branches

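* `git ls-remote <remote>` or `git remote show <remote>`: list the branches (and other references) that exist on `<remote>`; `git remote show` also reports how your local branches are configured to track them.
* `git remote add <shortname> <url>`: register a new remote for your local repo under the name `<shortname>`.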

### Fetching and Pushing

* `git fetch <remote>`: download the commits from `<remote>` that are not yet in your local repo. It DOES NOT modify your working directory. It simply downloads the data.
* `git push <remote> <branch>`: write `<branch>` in your local repo to `<remote>` repo.

### Tracking Branches

* A tracking branch is a local branch that has a direct relationship with a remote branch.
* For example, when you clone a repository, your local `master` automatically tracks `origin/master` (the "upstream branch").
* To set up tracking branches, from the local repo, you check out a branch from a remote:
    * `git checkout -b <branch> <remote>/<branch>`: you could use this to give the tracking branch a different name from the upstream branch.
    * `git checkout --track <remote>/<branch>`
    * `git checkout <branch>`: works if you do not have this branch in the local repo and its name matches a branch on exactly one remote.
    * `git branch -u <remote>/<branch>` or `git branch --set-upstream-to=<remote>/<branch>`: use this if you already have a local branch, or have a tracking branch but want to change the upstream branch it tracks.
* `git fetch --all; git branch -vv`: check tracking branches. `git fetch --all` brings the local repo up to date with all the remotes where the upstream branches are.
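
Tracking can be set up and inspected against a local stand-in for the remote. In this sketch a bare repository on disk plays the role of `origin` (an assumption for the example), and `feature` is a made-up branch that is published on the remote before being checked out locally.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git                 # stand-in for a hosted remote
git init -q local
cd local
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "hello" > file.txt; git add file.txt; git commit -q -m "Add file"
main=$(git rev-parse --abbrev-ref HEAD)
git remote add origin "$work/server.git"
git push -q origin "$main" "$main:feature"    # publish two branches on the remote
git branch -u "origin/$main"                  # existing local branch now tracks its upstream
git checkout -q --track origin/feature        # new local branch "feature" tracking origin/feature
git fetch --all -q
git branch -vv                                # each branch with its upstream in brackets
```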

### Pulling

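* `git pull` runs `git fetch` and then `git merge` in one step. It is generally not recommended: running the two commands explicitly lets you inspect what was fetched before you merge it.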

### Delete Remote branches
* `git push origin --delete <branch>`: deletes remote branch that is now defunct. This only deletes the pointer from the server and the actual data are not deleted until the next garbage collection. Therefore, you might be able to recover the data before then.
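
Deleting a remote branch can be rehearsed against a local stand-in for the server. A sketch under that assumption; `defunct` is a made-up branch name and `git ls-remote --heads` is used to confirm that the branch is gone.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git                 # stand-in for a hosted remote
git init -q local
cd local
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "hello" > file.txt; git add file.txt; git commit -q -m "Add file"
main=$(git rev-parse --abbrev-ref HEAD)
git remote add origin "$work/server.git"
git push -q origin "$main" "$main:defunct"    # publish the default branch and a second branch
git ls-remote --heads origin                  # both branches are listed
git push -q origin --delete defunct           # remove the branch pointer on the remote
git ls-remote --heads origin                  # only the default branch remains
```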
