# Git Primer

*J. Runnoe* <br>
*August, 2023*

---
## Contents
* [Version Control](#intro)
* [Install Git](#git)
* [Practice Git](#gitpractice)
* [GitHub Workflow](#github)
* [ASTR8060 and Git](#classgit)
    * [Setting Up Your Working Directory](#setup)
    * [Best Practices for Git](#best)
    * [Class Rules for Git](#rules)
* [Common Commands](#commands)
* [Summary](#summary)

---
## Version Control <a class="anchor" id="intro"></a>

In software development, version control is a class of tools designed to help a team manage changes to code over time.  The development of codes using a common user repository is a feature of modern work in large collaborations -- it is partly how large surveys such as the Sloan Digital Sky Survey (SDSS) have been able to be so successful.  

Git, Mercurial, and Subversion (SVN) are a few examples of common version control software.  They share the practice of keeping track of how code changes as the development team fixes bugs and implements new features.  Beyond that, there are some general differences in workflow (a consistent recipe for using version control software to accomplish tasks).  SVN maintains a central repository (its main line of development is called the trunk) from which users check out a working copy.  They can then pull changes from other users to update their local copy and commit any subsequent changes that they make.  With SVN, the user does not necessarily have the entire repository stored locally.  Mercurial and Git are *distributed* version control systems, meaning that each user actually clones the entire repository locally.  All copies of the repository are created equal, so users can push and pull their changes among the distributed system as necessary.  Both also store the project history in a compact, compressed form, so keeping a full copy locally costs little space.  Mercurial and Git have many similarities, but we will be using Git in this course because it is by far the most commonly used version control system in the world.

---
## Install Git <a class="anchor" id="git"></a>

Git is an example of version control software. If you have a Mac, it is probably already installed on your computer. If not, or if you are on another platform, you can get it for any operating system from their download page: [https://git-scm.com/downloads](https://git-scm.com/downloads/).

1. Check whether Git is installed on your computer:<br>
   `$ which git` <br>
   <font color='gray'>/usr/bin/git</font>

    `$ git --version`<br>
    <font color='gray'>git version 2.37.1 (Apple Git-137.1) </font><br>

2. If Git is not installed, download the appropriate installer and follow the prompts to install it.

3. If this is your first time using Git, configure it so that it has your name and email: <br>
`$ git config --global --edit`
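If you would rather not edit the configuration file by hand, the same two settings can be written from the command line. This is a minimal sketch: the name and email below are placeholders, not values from this course, so substitute your own.

```shell
git config --global user.name "Your Name"          # placeholder: use your own name
git config --global user.email "you@example.com"   # placeholder: use your own email
git config --global --get user.name                # print the stored name to confirm
git config --global --get user.email               # print the stored email to confirm
```

Git attaches these two values to every commit you make, which is how the log records who changed what.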


---
## Practice Git <a class="anchor" id="gitpractice"></a>

Let's make a dummy repository to learn the ins and outs of using Git on your local machine. To get started, you'll need an open terminal window and your favorite text editor. In the terminal, make a dummy directory, change into it, and initialize it as a Git repository: <br>
`$ mkdir test_repo` <br>
`$ cd test_repo` <br>
`$ git init` <br>

Now let's make and commit your first file. I’ll pretend I’m making a grocery list, which I will add to the staging area and commit: <br>
`$ echo "Avocado" >> shopping.txt` <br>
`$ git add shopping.txt` <br>
`$ git commit -m "Added list of groceries."` <br>

Now you can open shopping.txt in your text editor, update the text a few times, and make several commits. Remember to use `git add` each time to add your changes to the staging area, and use an informative message with each commit. However, be wary of using `git add *` because it may pick up temporary files.
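One way to keep temporary files out of a commit is to stage files by name and to list unwanted patterns in a `.gitignore` file. The sketch below runs in a throwaway directory; the `.swp` file is a hypothetical stand-in for an editor's temporary file, and the name and email are placeholders set for this scratch repository only.

```shell
set -e
demo="$(mktemp -d)"                       # scratch directory, safe to delete afterwards
cd "$demo"
git init -q
git config user.name "Demo User"          # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "Avocado" >> shopping.txt
echo "scratch" > shopping.txt.swp         # stand-in for an editor's temporary file

git add shopping.txt                      # stage only the file we mean to track
git status --short                        # A = staged, ?? = untracked
git commit -q -m "Added list of groceries."

echo "*.swp" > .gitignore                 # ask Git to ignore files ending in .swp
git add .gitignore
git commit -q -m "Ignored editor swap files."
git status --short                        # prints nothing: the .swp file is now ignored
```

Running `git status` before each commit is a cheap way to confirm that only the intended files are staged.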

After I have updated my test file many times, my log of commits looks like this (the top is the most recent):<br>
`$ git log --oneline`<br>
<font color='gray'>8fd949c Updated car parts.<br>
38d46d7 Added car parts.<br>
8ab1d45 Updated breakfast foods.<br>
41bda62 Added breakfast foods.<br>
efdc3d5 Added avocado toast ingredients.<br>
2c689a3 Added list of groceries.<br></font>

At the end, I started adding car parts to my list of groceries. That's a little crazy, so I want to go back to an earlier version of the shopping list. There are several ways to do this in Git, but we will use `git revert` because it preserves the existing history and is therefore best for collaborative projects. If I use `git revert HEAD`, Git will make a new commit that is the inverse of the last one. Repeating that command does not step further back, because the second revert simply undoes the first. Since I have two commits' worth of car parts to undo, I can instead give a range that starts at the hash (or ID) of the commit I want to go back to:<br>
`$ git revert --no-commit 8ab1d45..HEAD` <br>
`$ git commit -m "Reverted to commit before car parts were added to grocery list."` <br>

The `--no-commit` option applies the reverts to your working files and staging area without committing each one, so you can record them as a single commit with your own, more informative message instead of the default messages generated by `git revert`. Now the crazy car parts changes are still stored in my history (and I could go back and view or adopt them if I want to), but my current working file does not include those changes and the revert is preserved in the history: <br>

`$ git log --oneline`<br>
<font color='gray'>461dff4 Reverted to commit before car parts were added to grocery list.  <br>
8fd949c Updated car parts. <br>
38d46d7 Added car parts. <br>
8ab1d45 Updated breakfast foods. <br>
41bda62 Added breakfast foods. <br>
efdc3d5 Added avocado toast ingredients. <br>
2c689a3 Added list of groceries. <br></font>
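To try a range revert without touching a real project, the sequence can be replayed in a throwaway repository. This is a sketch under stated assumptions: the list items and commit messages are invented for illustration, the identity is a placeholder, and the hash is looked up with `git rev-parse` because your hashes will differ from the ones shown above.

```shell
set -e
demo="$(mktemp -d)"                       # scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"          # placeholder identity for this repo only
git config user.email "demo@example.com"

# Build a short history: two grocery commits, then two car-part commits.
for item in "Avocado" "Eggs" "Brake pads" "Wiper blades"; do
    echo "$item" >> shopping.txt
    git add shopping.txt
    git commit -q -m "Added $item."
done

good="$(git rev-parse HEAD~2)"            # the last commit before the car parts
git revert --no-commit "$good"..HEAD      # undo everything after that commit
git commit -q -m "Reverted to commit before car parts were added to grocery list."

cat shopping.txt                          # only the two grocery items remain
git log --oneline                         # the four original commits plus the revert
```

The car-part commits are still in the log, so nothing has been lost; the revert is simply one more commit on top.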

If you just want to go back and look at the state of your directory several commits ago, you can use `git checkout` followed by the commit hash.  Please be careful to use this only to explore your directory and look at code in the previous state.  Checking out an old commit leaves you in a "detached HEAD" state, and any commits or branches you make there will complicate a shared history among your classmates.
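A look-but-don't-touch visit to an older commit might go as follows. This is a sketch in a scratch repository with invented file contents and a placeholder identity; the important part is returning to your branch before making any commits.

```shell
set -e
demo="$(mktemp -d)"                           # scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"              # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "Avocado" >> shopping.txt
git add shopping.txt
git commit -q -m "Added list of groceries."
echo "Brake pads" >> shopping.txt
git commit -q -am "Added car parts."

branch="$(git rev-parse --abbrev-ref HEAD)"   # remember which branch we started on

git checkout -q HEAD~1                        # visit the previous commit (detached HEAD)
cat shopping.txt                              # the older state: only the first item
git status                                    # reports that HEAD is detached

git checkout -q "$branch"                     # go back to the branch before committing
cat shopping.txt                              # both lines are back
```

Saving the branch name first avoids having to guess whether the default branch is called `main` or `master` on your machine.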

What about moving, renaming, and deleting files when they're version controlled with Git?  Deleting a document is simple: use `git rm` to delete and stage it, and then make a commit noting that you removed that file (alternatively, you can use the UNIX `rm` command and then either `git rm` or `git add` to stage it before the commit).  If you want to rename a file, use `git mv` rather than the UNIX command so that Git knows the file has been renamed.  If you accidentally use the UNIX `mv` command, you can rename the file back and Git will never know.  Play around with some of the tutorials in the links section until you feel comfortable with basic operations in Git. 
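The rename and delete operations described above can be tried safely in a scratch repository. In this sketch the file names and contents are invented for illustration and the identity is a placeholder.

```shell
set -e
demo="$(mktemp -d)"                       # scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"          # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "Avocado" > shopping.txt
echo "Old notes" > notes.txt
git add shopping.txt notes.txt
git commit -q -m "Added grocery list and notes."

git mv shopping.txt groceries.txt         # rename the file and stage the rename
git rm -q notes.txt                       # delete the file and stage the deletion
git status --short                        # shows the staged rename (R) and deletion (D)
git commit -q -m "Renamed grocery list and removed old notes."

git ls-files                              # only groceries.txt is still tracked
```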

---
## GitHub Workflow <a class="anchor" id="github"></a>

GitHub is a host for Git repositories. If you do not already have a (free) [GitHub](https://github.com/) account, make one now.

Git has multiple common workflows; we will use the "forking workflow" that is commonly associated with GitHub.  The recipe to track a change in Git with this workflow is to add the files that you want to track to a *staging area*.  Then, you write a message summarizing the work you have done and commit the changes. When you are ready, you can push your changes to your server-side repository and create a pull request for the "official" server-side repository, which will constitute your submission.

1. Fork the class repository on GitHub (you will need your GitHub account).

   ![github_fork](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_001.jpeg)

   The new fork on my personal account looks like this:

   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_002.jpeg)


2. Clone the new repository to your local machine. Use the URL for your fork on your GitHub account. <br>
   `$ git clone https://github.com/YOUR_USERNAME/astr8060_f23 .`

   It looks like this for me:
   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_003.jpeg)

3. Configure your remotes. Use “origin” for your GitHub repo (this was done automatically when you cloned):<br>
   `$ git remote -v` <br>
    <font color=gray>origin	https://github.com/runnoejc/astr_8060_f23/ (fetch)<br>
    origin	https://github.com/runnoejc/astr_8060_f23/ (push)<br></font>

   Set your main GitHub class repo as "upstream". We will all use the same URL for this one so that we all grab changes from the same place.<br>
   `$ git remote add upstream https://github.com/VanderbiltAstronomy/astr8060_f23.git`<br>

   Check that this worked:<br>
   `$ git remote -v` <br>
    <font color=gray>origin	https://github.com/runnoejc/astr_8060_f23/ (fetch)<br>
    origin	https://github.com/runnoejc/astr_8060_f23/ (push)<br>
    upstream	https://github.com/VanderbiltAstronomy/astr_8060_f23.git (fetch)<br>
    upstream	https://github.com/VanderbiltAstronomy/astr_8060_f23.git (push)<br></font>

   It looks like this for me:
   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_004.jpeg)

4. Now we will set up your workspace. Create your working directory in work/your_username/:<br>
   `$ cd ./work`<br>
   `$ mkdir your_username`<br>
   `$ touch your_username/.gitkeep` <br>

   Track the changes locally with:<br>
   `$ git add your_username/.gitkeep` <br>
   `$ git commit -m "Added your_username directory with a gitkeep."`<br>

   Prepare to send these changes to GitHub. First, check the *class repository* to see whether there are any changes made by classmates that you need to download:<br>
   `$ git pull upstream main`<br>
   <font color=gray>From https://github.com/VanderbiltAstronomy/astr_8060_f23 <br>
      \* branch            main       -> FETCH_HEAD<br>
      \* [new branch]      main       -> upstream/main<br>
   Already up to date.<br></font>

   Now send your changes *to your repository fork* on GitHub:<br>
   `$ git push origin main` <br>

   It looks like this for me:
   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_005.jpeg) 

5. To "submit" work, you will need to create a pull request. Go to [https://github.com/VanderbiltAstronomy/astr8060_f23](https://github.com/VanderbiltAstronomy/astr8060_f23) and select "Pull requests".<a class="anchor" id="submit"></a>

   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_006.jpeg)

   You will need to compare the state of the repositories across forks to get it to find your changes:

   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_007.jpeg)

   Select your fork from the dropdown menu:
   
   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_008.jpeg)

   Before you submit, write me a helpful note.

   ![github_forked](http://astro.phy.vanderbilt.edu/~runnojc1/media/jupyter/git_workflow_imgs/git_workflow_imgs_009.jpeg)

Overall, your workflow should look like this:
* Change code in your local repo.
* Commit to your local repo with `git add` and `git commit`.
* Pull changes from the VanderbiltAstronomy repo with `git pull upstream main`.
* Do the above as much as you want while you work.
* Push changes to your personal GitHub fork with `git push origin main`.
* Do the above as much as you want while you work.
* To submit an assignment: go to [https://github.com/VanderbiltAstronomy/astr_8060_s23](https://github.com/VanderbiltAstronomy/astr_8060_s23) and manually submit a pull request.
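The workflow in the list above can be rehearsed entirely on your own machine before you touch GitHub. In this sketch, two local "bare" repositories stand in for the class repository and your fork; the directory names (`seed`, `upstream.git`, `origin.git`, `local`) and the identities are hypothetical, and the pull request step is left out because it happens on the GitHub website.

```shell
set -e
demo="$(mktemp -d)"                            # scratch directory
cd "$demo"

# 1. A stand-in for the class repository, seeded with one commit on "main".
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Demo Instructor"         # placeholder identity
git config user.email "instructor@example.com"
mkdir work
touch work/.gitkeep
git add work/.gitkeep
git commit -q -m "Added work directory."
cd "$demo"
git clone -q --bare seed upstream.git          # plays the role of the class repo

# 2. "Fork" the class repo, then clone the fork.
git clone -q --bare upstream.git origin.git    # plays the role of your fork
git clone -q origin.git local                  # "origin" is set automatically
cd local
git config user.name "Demo Student"            # placeholder identity
git config user.email "student@example.com"
git remote add upstream "$demo/upstream.git"

# 3. Change, commit, pull from upstream, push to the fork.
mkdir work/your_username
touch work/your_username/.gitkeep
git add work/your_username/.gitkeep
git commit -q -m "Added your_username directory with a gitkeep."
git pull -q upstream main                      # nothing new yet, so this is a no-op
git push -q origin main

git log --oneline origin/main                  # the fork now holds both commits
```

Note that the stand-in for the class repository is unchanged at the end: your work only reaches it once a pull request is accepted.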

---
## ASTR8060 and Git <a class="anchor" id="classgit"></a>

In this class, we will use a Git repository to distribute course material and submit (but not grade) work. For instance, homework will be submitted by uploading code and/or documents to our Git repository by the stated deadline. 

Think of a Git repository as a collaborative directory to which multiple people have access.  The directory is special in that it uses snapshots to track how it changes with time, and logs information on who has made the changes.  That way, it's possible for any user who has a clone of the repository to track changes and use anything in the repository.  Git is not intended for tracking large data files, so think code and documents.

Every document or piece of code in the class Git repository will be available for everyone in the class to use and edit.  If you are worried about other students copying your homework (and other) submissions, note that a simple `diff` in UNIX, and/or a comparison via `git diff`, will make it obvious to me if another student has directly copied your submission.  In fact, frequently uploading your work to the Git repository as you write and develop it will make it far harder for your work to be copied than uploading it in a single submission.  It is always possible to return a Git repository to an earlier state, so any unwanted changes can be easily undone.

### Setting up your working directory <a class="anchor" id="setup"></a>

Our directory -- our Git repository -- sits on the Vanderbilt Astronomy GitHub page. The difference between this repository and the one you just practiced on is that this one will be cloned onto all of your classmates' computers in a way that can interact with your own. By the time you get here, you should have learned how to fork the class repository on GitHub and then clone it to your local machine.

Our class repository is structured as follows. Beneath the parent directory, which is called `ASTR8060`, I have put a directory `work/runnoe/`.

This directory is my personal workspace. You should similarly create a directory that is your name and beneath it you will create directories for homework assignments and data reduction tasks: <br>

`ASTR8060/work/YOUR_USERNAME` <br>
`ASTR8060/work/YOUR_USERNAME/homework` <br>

Now create a working directory with your name, plus a directory for your homework submissions. <br>
`cd ASTR8060/work/` <br>
`mkdir -p YOUR_USERNAME` <br>
`cd YOUR_USERNAME/` <br>
`mkdir homework` <br>
`cd homework/` <br>
`touch .gitkeep` <br>
`git add .gitkeep` <br>
`git commit -m "Added empty homework working dir with dummy keep file."` <br>

Note that Git won't let you add an empty directory to your repository, so adding an empty .gitkeep file is the industry standard workaround. This commit has only been made to your local version of the repository but has not yet been communicated to your classmates via our remote repository. In general, it is considered good practice to make sure your repository is completely up to date relative to the remote repo before you push any of your own changes and issue a pull request. So first, evaluate whether you are out of date:<br>

`git fetch`<br>
`git status`<br>

If your local repository is out of date due to changes made by your classmates, git status will tell you. The next step is to pull the latest state of the remote repository to your local clone and merge any changes: <br>

`git pull upstream main`<br>

The workflow described for this class will make merging changes very simple since you will never be editing the same files as your classmates. Thus, merging should largely be handled by Git and will be pretty straightforward by following the prompts. Now you are ready to push your changes to *your GitHub fork*:<br>
`git push origin main`<br>

This will send your updated `homework/` directory to your server-side GitHub repository. You can execute this workflow as much and as often as you like. When you are ready to submit work, go to your GitHub account and submit a pull request following [these instructions](#submit). Experiment with this local workflow until you are comfortable using it and understand the outputs from `git log` and `git status`. Ask if you have questions.
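When no remote is named, `git fetch` normally contacts `origin` (your fork), so checking for classmates' changes is more reliable if you name the class remote explicitly. The sketch below shows one way to count and list the commits you are missing. It runs against a local stand-in for the class repository; the directory names, file contents, and identities are hypothetical.

```shell
set -e
demo="$(mktemp -d)"                            # scratch directory
cd "$demo"

# Stand-in for the class repository with one commit on "main".
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Demo Instructor"         # placeholder identity
git config user.email "instructor@example.com"
echo "Syllabus" > README.md
git add README.md
git commit -q -m "Added README."
cd "$demo"
git clone -q --bare seed upstream.git

# Your clone, with the remote named "upstream" to match the class setup.
git clone -q -o upstream upstream.git local
git -C local config user.name "Demo Student"   # placeholder identity
git -C local config user.email "student@example.com"

# Someone else's work lands in the class repository after you cloned.
cd "$demo/seed"
echo "Homework 1" >> README.md
git commit -q -am "Added homework 1."
git push -q "$demo/upstream.git" main

# Check how far behind you are, then catch up.
cd "$demo/local"
git fetch -q upstream                          # download new commits, merge nothing
git rev-list --count HEAD..upstream/main       # number of commits you are missing
git log --oneline HEAD..upstream/main          # what those commits are
git pull -q upstream main                      # bring your branch up to date
git rev-list --count HEAD..upstream/main       # now zero
```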

### Best Practices for Git <a class="anchor" id="best"></a>

There are various examples of good Git etiquette for our workflow to help people share documents and code:

1. Always `git fetch` and then `git pull` before committing anything new to the directory.  This way, if somebody else commits something new just before you, then you won't lose track of which version of the directory you're working with.
2. Git is for storing code and documents, *not large data files*.  Large data files will make the repository slow to operate.  If you have a large data file, keep it local to your machine and share it with people in other ways. In particular, do not put the Imaging/ data set for the reduction assignments in our repository.
3. Git requires good coding practices.  Place comments within the body of your code carefully using your initials.  So, my code will have many comment lines of the form, e.g., `# JCR this line of code does this`.  When writing documents collaboratively, make similar comments if you make major changes to the document.
4. When you commit a new document using the `git commit` command, *always* provide a comment as in `git commit -m "this is what I did"`.
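For item 2, a `.gitignore` file is one way to make sure a data directory never gets staged by accident. This is a minimal sketch in a scratch repository: the file names `frame001.fits` and `reduce.py` are invented for illustration, the identity is a placeholder, and the `Imaging/` pattern assumes the data directory sits at the level of the `.gitignore` file or below it.

```shell
set -e
demo="$(mktemp -d)"                            # scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"               # placeholder identity for this repo only
git config user.email "demo@example.com"

mkdir Imaging
echo "pretend this is a large image" > Imaging/frame001.fits   # hypothetical data file
echo "print('reduce')" > reduce.py                              # hypothetical script

echo "Imaging/" > .gitignore                   # ignore the whole data directory
git add .gitignore reduce.py
git commit -q -m "Added reduction script and ignored the data directory."

git status --short                             # prints nothing: Imaging/ is ignored
git ls-files                                   # lists .gitignore and reduce.py only
git check-ignore -v Imaging/frame001.fits      # shows which rule ignores the file
```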

### Class Rules for Git <a class="anchor" id="rules"></a>

1. Do not edit code written by another member of the class without their permission.  The reason for this is twofold: a) it maintains separate spaces for each student to submit their independent work and b) it facilitates a workflow that makes merging changes to the repository straightforward.  *Editing other people's code that is placed in any directory that contains their name (i.e. `ASTR8060/theirname/HWx/somecode`) will be considered grounds for failing the course.*
2. It is permissible to read and to make a copy of any member of the class's code *after* it has been graded as a homework submission...so, on HW2, it will be permissible to raid people's *HW1* directory.  But, use other people's code by making a copy of it in your personal directory or linking to it in full.  *Do not edit it in their directory.  Editing other people's code that is placed in any directory that contains their name (i.e. `theirname/HWx/somecode`) will be considered grounds for failing the course.* If you have questions about whether it is okay to look at your classmate's code in specific situations, just ask!
3. Provide your own homework solutions.  Do not copy each other's work.  It is very easy for me to check in Git whether your homework submission greatly resembles another student's submission.  *Plagiarizing each other's homework submissions will be considered grounds for failing the course.* Feel free to discuss homework problems and issues with each other but *write your own submissions sitting by yourself.*

---
## Common Commands <a class="anchor" id="commands"></a>

`git status` <br>
This command prints the status of your repository to the screen.  It will tell you about files that are added, tracked, missing, or renamed as well as the state of your local repository clone relative to the remote. <br>

`git fetch` <br>
This command downloads any new commits from a remote repository without merging them into your work, so you can see whether there are changes since your last pull.  Issue it *before pushing any changes to the remote repository*. <br>

`git pull` <br>
This command downloads and merges changes to your local directory to mirror the current copy of the remote repository.  Issue this command in combination with `git fetch` *often*.<br>

`git push` <br>
This command sends your commit history and changes to the remote repository.  Always issue `git fetch` and `git pull` to merge any changes *before* pushing your own updates to the remote. <br>

`git add a_new_file` <br>
This adds a file to the staging area to be included in the next commit.  Issue it every time you want to commit a file.  Use `git status` to find modified files that are not staged for a commit. <br>

`git commit -m "this is what I did"` <br>
This command records the changes you have staged with `git add` as a new commit in your local repository.  The `-m` switch attaches a comment that will be logged by Git. *Always include a comment*.<br>

`git log` <br>
This command lists all of the changes to the repository.  Use `git log --oneline` for a compact, easy-to-read version.  To see recent changes, pipe it to `more` at the command line (e.g., `git log | more`).  To see changes somebody specific made, `grep` that person's username at the command line (e.g., `git log | grep runnoe -A 3`). <br>

`git ls-files` <br>
This command lists what is in the repository. This is a useful command as you may expect to find a file in the Git repository when, in fact, you forgot to add or commit that file. This command, then, will allow you to see what is actually in the repository as compared to what is in your local directory. <br>

`git diff revA..revB` <br>
This command shows the differences in the repository between commits with hashes A and B.  This allows you to track changes that people have made.  One use, e.g., would be to see who has added what to a LaTeX document since you went to bed last night. <br>
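A few of these inspection commands can be tried together in a scratch repository. This sketch uses invented file contents and a placeholder identity; `--author` and `-n` are built-in `git log` options that can replace the `grep` and `more` pipelines when you only want a filtered or shortened list.

```shell
set -e
demo="$(mktemp -d)"                            # scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"               # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "Avocado" > shopping.txt
git add shopping.txt
git commit -q -m "Added list of groceries."
first="$(git rev-parse HEAD)"                  # hash of the first commit

echo "Eggs" >> shopping.txt
git commit -q -am "Added breakfast foods."
second="$(git rev-parse HEAD)"                 # hash of the second commit

git log --oneline -n 1                         # only the most recent commit
git log --oneline --author="Demo"              # commits whose author matches "Demo"
git ls-files                                   # files Git is tracking
git diff "$first".."$second"                   # the added line appears with a leading +
```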

If you find yourself using many other different commands, feel free to add them to this notebook.

---
## Summary <a class="anchor" id="summary"></a>

At this point, all of you should:
* Have Git set up on your computer.
* Understand what version control is.
* Have beginning literacy with command-line Git.
* Have a GitHub account and a workflow for class homework submission.
* Know the class rules for Git.