## Learning Objectives {.unnumbered}

-   Apply the principles of Git to track and manage changes of a project
-   Utilize the Git workflow including pulling changes, staging modified files, committing changes, pulling again to incorporate remote changes, and pushing changes to a remote repository
-   Create and configure Git repositories using different workflows

## Introduction to Version Control

![](images/phd_comics_final.png){width="70%" fig-align="center"}

Every file in the scientific process changes. Manuscripts are edited. Figures get revised. Code gets fixed when bugs are discovered. Sometimes those fixes lead to even more bugs, leading to more changes in the code base. Data files get combined together. Sometimes those same files are split and combined again. In just one research project, we can expect thousands of changes to occur. 

These changes are important to track, and yet, we often use simplistic file names to do so. Many of us have experienced renaming a document or script multiple times with the disingenuous addition of "final" to the file name (as the comic above demonstrates).

You might think there is a better way, and you'd be right: **version control**. Version control provides an organized and transparent way to track changes in code and additional files. This practice was designed for software development, but is easily applicable to scientific programming.

There are many benefits to using version control software, including:

-   **Maintain a history** of your research project's development while keeping your workspace clean
-   **Facilitate collaboration** and transparency when working on teams
-   **Explore bugs or new features** without disrupting your team members' work
-   and more!

The version control system we'll be diving into is Git, the most widely used modern version control system in the world.


## Introduction to Git + GitHub

Before diving into the details of Git and how to use it, let's start with a motivating example that's representative of the types of problems Git can help us solve.

### A Motivating Example

Say, for example, you're working on an analysis in R and you've got it into a state you're pretty happy with. We'll call this version 1:

:::{.column-body-outset-right}
![](images/git-intro-slide01.png)
:::

You come into the office the following day and you have an email from your boss, "Hey, you know what this model needs?"

:::{.column-body-outset}
![](images/git-intro-slide02.png)
:::

You're not entirely sure what she means but you figure there's only one thing she could be talking about: more cowbell. So you add it to the model in order to really explore the space.

But you're worried about losing track of the old model so, instead of editing the code in place, you comment out the old code and put as serious a warning as you can muster in a comment above it.

:::{.column-body-outset}
![](images/git-intro-slide03.png)
:::

Commenting out code you don't want to lose is something probably all of us have done at one point or another, but it's really hard to understand why you did this when you come back years later or when you send your script to a colleague. Luckily, there's a better way: version control. Instead of commenting out the old code, we can change the code in place and tell Git to commit our change. So now we have two distinct versions of our analysis and we can always see what the previous version(s) look like.

:::{.column-body-outset}
![](images/git-intro-slide04.png)
:::

You may have noticed something else in the diagram above: Not only can we save a new version of our analysis, we can also write as much text as we like about the change in the commit message. In addition to the commit message, Git also records who made the change and when it was made.
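
The two-version story above can be sketched on the command line. This is a minimal illustration, not the exact analysis from the slides: the file name `analysis.R`, the model code, the commit messages, and the placeholder identity are all assumptions made for the demo, and everything runs in a throwaway directory.

```bash
# Placeholder identity used only for this demo (an assumption, not your real settings)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Work in a throwaway directory so nothing else on your machine is affected
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Version 1 of a hypothetical analysis script
echo 'model <- lm(y ~ x, data = df)' > analysis.R
git add analysis.R
git commit -q -m "Fit initial linear model"

# Version 2: edit the code in place instead of commenting out the old line
echo 'model <- lm(y ~ x + cowbell, data = df)' > analysis.R
git add analysis.R
git commit -q -m "Add cowbell term to model" -m "Requested by email; explores whether cowbell improves the fit."

# Show who made each change, when, and with what message (newest first)
git log --format='%h | %an | %ad | %s' --date=short

# The earlier version is still available from the history
git show HEAD~1:analysis.R
```

The last command prints the version 1 line even though the working file now contains version 2, which is what replaces the "comment out the old code" habit.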

#### With Git we can enhance our workflow: {.unnumbered}

- **Eliminate** the need for **cryptic filenames** and comments to track our work.
- Provide **detailed descriptions of our changes** through commits, making it easier to understand the reasons behind code modifications.
- Use commits to **access and even execute older versions** of our code.
- Additionally, Git offers a powerful distributed feature. **Multiple individuals can work on the same analysis concurrently** on their own computers, with the ability to merge everyone's changes together.

ADVANCED:

- Work on multiple **branches** simultaneously, allowing for parallel development, and optionally merge them together.
- **Assign meaningful tags** to specific versions of our code.


### What *exactly* are Git and GitHub?

#### Git: {.unnumbered}

-   an open-source distributed **version control** software
- designed to manage the versioning and tracking of source code files and project history
-   **operates locally** on your computer, allowing you to create repositories, and track changes
- provides features such as committing changes, branching and merging code, reverting to previous versions, and managing project history
- works directly with the files on your computer and does not require a network connection to perform most operations
- primarily used through the command-line interface (CLI, e.g. Terminal), but also has various GUI tools available (e.g. RStudio IDE)

::: {.column-margin}
![](images/git-intro.png)
:::

#### GitHub: {.unnumbered}
- **online platform** and service built around Git
- provides a **centralized hosting platform for Git repositories**
- allows us to store, manage, and collaborate on our Git repositories in the cloud
-  offers additional features on top of Git, such as a web-based interface, issue tracking, project management tools, pull requests, code review, and collaboration features
-   enables easy sharing of code with others, facilitating collaboration and contribution to open source projects
-   provides a social aspect, allowing users to follow projects, star repositories, and discover new code


::: {.column-margin}
![](images/github-intro.png)
:::

#### General Picture

- Repositories (“repos”) are the main unit of Git and GitHub. A repository is a self-contained folder with permissions, and it can be public or private.

- Files are stored in repositories.

- Repositories are “owned” by (live under) users or organizations.

- All repositories have the same structure, which makes them easy and familiar to navigate.

### Let's take a look at a repository

One of the first things to note is that every repository has the same structure. At the top is the username or organization name followed by the repository name (each repository under a given user or organization has a unique name). The URL of a repository always follows the same pattern too: "_github.com/username/repo-name_". The repository landing page shows the most recent activity in the repository, which tells you how recently the work has been updated. Below that are all the files in the repository.

![](images/palmer-penguin-repo-orientation.png)

Generally, a repository will also have a README. A README is not required, but it's best practice to have one at the top level of your repository to describe it: what will people find in this repository? Depending on the kind of repository, the README often includes installation instructions and information on how to get help. In the case of [`palmerpenguins`](https://github.com/allisonhorst/palmerpenguins), which is an R package, the README provides all the information about the package and how to install it.

![](images/palmer-penguin-readme.png)

Check out other repositories:

- [NCEAS Learning Hub Modules Website Repository](https://github.com/NCEAS/learning-hub-modules)

- [Preparation for the 2023 Ocean Health Index](https://github.com/OHI-Science/ohiprep_v2023/tree/gh-pages)



### Git and GitHub Workflow

There are different workflows for creating version-controlled repositories. Here we will describe one of them: Create a remote repo (on GitHub) first, then clone to your local computer.

#### Go to github.com and create a repository under your user or your organization
![](images/github-new-repo.png){width=20% fig-align="center"}

#### _Clone_ the repository to your local computer

![](images/github-clone.png){width=80%}
[**Clone:** download an identical copy - a 'clone' - of a repository to your local computer. Cloned repositories can still be synced with the online version(s) at your whim.]{.aside}

#### In your local computer, you work on your code, analysis, report, etc.

![](images/github-work-local.png){width=80%}

#### You _stage_ and _commit_ your changes to your local repository.
[**Stage:** Indicates which of the modified files are ready to be committed.<br>
**Commit:** Records changes to the repository and includes a descriptive message (you should always include a commit message!).]{.aside}


![](images/github-work-local.png){width=80%}




#### You _pull_ to make sure your local repository is up to date with the remote repository.

![](images/github-pull.png){width=80%}
[**Pull:** Retrieves changes from a remote repository and merges them into your local working file(s).]{.aside}


#### You _push_ your commits into the GitHub remote repository.

[**Push:** Sends local commits to a remote repository.]{.aside}

![](images/github-push.png){width=80%}




#### All together: Git Workflow Vocabulary {.unnumbered}




|Term| Action| Definition|
|----|-------|-----------|
|Clone|Clone the repository to your local computer|Downloads an identical copy - a 'clone' - of a repository to your local computer. Cloned repositories can still be synced with the online version(s) at your whim|
|Stage| You stage modified files to indicate the changes you want to commit| Indicates which of the modified files are ready to be committed|
|Commit| You commit your changes to your local repository|Records changes to the repository and includes a descriptive message (you should always include a commit message!)|
|Pull| You pull to make sure your local repository is up to date with the remote repository| Retrieves changes from a remote repository and merges them into your local working file(s)|
|Push| You push your commits into the GitHub remote repository|Sends local commits to a remote repository |


The processes described in the sections above (i.e. making changes to local working files, recording "snapshots" of them to create a versioned history of changes in a local Git repository, and sending those versions from our local Git repository to a remote repository on GitHub) are illustrated using islands, buildings, bunnies, and packages in the artwork below:

[A basic git workflow represented as two islands, one with "local repo" and "working directory", and another with "remote repo." Bunnies move file boxes from the working directory to the staging area, then with Commit move them to the local repo. Bunnies in rowboats move changes from the local repo to the remote repo (labeled "PUSH") and from the remote repo to the working directory (labeled "PULL").]{.aside}

[![Artwork by Allison Horst](images/git-workflow-allison-horst.png)](https://twitter.com/allison_horst.png)
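
The same clone, stage, commit, pull, push cycle can be sketched on the command line without a GitHub account. In this sketch a local bare repository (`remote.git`) stands in for the GitHub remote; that stand-in, the directory names, the file contents, and the placeholder identity are assumptions made so the example runs offline. With a real repository you would clone its GitHub URL instead.

```bash
# Placeholder identity used only for this demo (an assumption)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the GitHub repository: a bare repository seeded with a README
git init -q -b main seed
(cd seed && echo "# demo" > README.md && git add README.md && git commit -q -m "Initial commit")
git clone -q --bare seed remote.git

# 1. Clone: copy the "remote" repository to a local working copy
git clone -q remote.git my_project
cd my_project

# 2. Work: edit or create files as usual
echo 'print("hello")' > script.R

# 3. Stage and commit: choose what goes into the snapshot, then record it
git add script.R
git commit -q -m "Add starter script"

# 4. Pull: bring in any changes that reached the remote in the meantime
git pull -q origin main

# 5. Push: send the new commit to the remote
git push -q origin main

# The local branch and the remote-tracking branch now point at the same commit
git log --oneline --decorate -n 2
```

Nothing changed on the stand-in remote between the clone and the pull, so the pull has nothing to merge here; in a shared repository that step is where collaborators' commits arrive.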


Let's put this workflow into practice!

## Exercise 1: Create a remote repository on GitHub

::: callout-tip
## Setup

1.  Log in to [GitHub](https://github.com/)
2.  Click the **New repository** button
3.  Name it `{FIRSTNAME}_test`
4.  Add a short description
5.  Check the box to add a `README.md` file
6.  Add a `.gitignore` file using the `R` template
7.  Set the `LICENSE` to Apache 2.0
:::

If you were successful, it should look something like this:

<br>


:::{.column-body-outset}
![](images/new-repo-github.png)
:::

<br>

You've now created your first repository! It has a couple of files that GitHub created for you: `README.md`, `LICENSE`, and `.gitignore`. 

::: {.callout-note}
## `README.md` files are used to share important information about your repository
You should always add a `README.md` to the root directory of your repository -- it is a markdown file that is rendered as HTML and displayed on the landing page of your repository. This is a common place to include any pertinent information about what your repository contains, how to use it, etc.
:::

<br>


:::{.column-body-outset}
![](images/github-test-repo.png)
<!--![](images/github-test-repo-sam.png)--->
:::


For simple changes to text files, such as the `README.md`, you can make edits directly in the GitHub web interface.

::: callout-note
## Challenge

Navigate to the `README.md` file in the file listing, and edit it by clicking on the **pencil icon** (top right of file). This is a regular Markdown file, so you can add markdown text. Add a new level-2 header called "Purpose" and add some bullet points describing the purpose of the repo. When done, add a commit message, and hit the **Commit changes** button.
:::

<br>

<!-- :::{.column-page} -->

![](images/github-test-edit.png)

<!-- ::: -->

<br>

Congratulations, you've now authored your first versioned commit! If you navigate back to the GitHub page for the repository, you'll see your commit listed there, as well as the rendered `README.md` file.

<br>

<!-- :::{.column-page} -->

![](images/github-test-displayed.png)

<!-- ::: -->

<br>

The GitHub repository landing page provides us with lots of useful information. To start, we see: 

- all of the files in the remote repository
- when each file was last edited
- the commit message that was included with each file's most recent commit (which is why it's important to write good, descriptive commit messages!)

Additionally, the header above the file listing shows the most recent commit, along with its commit message, and a unique ID (assigned by Git) called a [SHA](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/about-commits#about-commits). The SHA (aka hash) identifies the specific changes made, when they were made, and by whom. If you click on the SHA, it will display the set of changes made in that particular commit.

::: {.callout-caution icon=false}
## What should I write in my commit message?

Writing effective Git commit messages is essential for creating a meaningful and helpful version history in your repository. It is crucial to avoid skipping commit messages or resorting to generic phrases like "Updates." When it comes to following best practices, there are several guidelines to enhance the readability and maintainability of the codebase.

Here are some guidelines for writing effective Git commit messages:

1. **Be descriptive and concise**: Provide a clear and concise summary of the changes made in the commit. Aim to convey the purpose and impact of the commit in a few words.

2. **Use the imperative mood**: Write commit messages in the imperative mood, as if giving a command. For example, use "Add feature" instead of "Added feature" or "Adding feature." This convention aligns with other Git commands and makes the messages more actionable.

3. **Separate subject and body**: Start with a subject line, followed by a blank line, and then provide a more detailed explanation in the body if necessary. The subject line should be a short, one-line summary, while the body can provide additional context, motivation, or details about the changes.

4. **Limit the subject line length**: Keep the subject line to 50 characters or fewer. This ensures that the commit messages are easily scannable and fit well in tools like Git logs.

5. **Capitalize and punctuate properly**: Begin the subject line with a capital letter and use proper punctuation. This adds clarity and consistency to the commit messages.

6. **Focus on the "what" and "why"**: Explain what changes were made and why they were made. Understanding the motivation behind a commit helps future researchers and collaborators (including you!) comprehend its purpose.

7. **Use present tense for subject, past tense for body**: Write the subject line in present tense as it represents the current state of the codebase. Use past tense in the body to describe what has been done.

8. **Reference relevant issues**: If the commit is related to a specific issue or task, include a reference to it. For example, you can mention the issue number or use keywords like "Fixes," "Closes," or "Resolves" followed by the issue number.
:::
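
As a rough illustration of these guidelines, the sketch below writes a commit message with a short imperative subject line, a blank line, and a body that explains the what and the why. The file, the bug being described, the issue number `#12`, and the placeholder identity are all hypothetical; the message is passed on standard input with `git commit -F -`.

```bash
# Placeholder identity used only for this demo (an assumption)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# A hypothetical change to commit
echo 'surveys <- subset(surveys, year >= 2010)' > analysis.R
git add analysis.R

# Subject line (imperative, at most 50 characters), a blank line, then a body
# explaining what changed and why. "#12" is a hypothetical issue number.
git commit -q -F - <<'EOF'
Fix off-by-one error in year filter

The filter used `>` instead of `>=`, which dropped the first
survey year from every summary table. Fixes #12.
EOF

# Check the subject line length, then view the full message
git log -1 --format=%s | awk '{ print length($0) " characters in subject" }'
git log -1 --format=%B
```

In RStudio, the same structure applies: type the subject on the first line of the commit dialog, leave a blank line, and add the body underneath.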


## Exercise 2: `clone` your repository and use Git locally in RStudio

Currently, our repository just exists on GitHub as a remote repository. It's easy enough to make changes to things like our `README.md` file (as demonstrated above) from the web browser, but that becomes a lot harder (and is discouraged) for scripts and other code files. In this exercise, we'll bring a copy of this remote repository down to our local computer (aka **clone** this repository) so that we can work comfortably in RStudio.

::: {.callout-important title="An important distinction"}
We refer to the **remote copy** of the repository that is on GitHub as the **origin repository** (the one that we cloned from), and the copy on our local computer as the **local repository**.
:::


Start by clicking the green **Code** button (top right of your file listing) and copying the URL to your clipboard (this URL represents the repository location):

:::{.column-body-outset}
![](images/github-test-clone-url.png){width="70%" fig-align="center"}
:::

RStudio makes working with Git and version controlled files easy -- to do so, you'll need to be working within an R project folder. The following steps will look similar to those you followed when first creating an R Project (see [Appendix](https://learning.nceas.ucsb.edu/2023-09-ucsb-faculty/session_12.html#create-an-r-project)), with a slight difference. Follow the instructions in the Setup box below to clone your remote repository to your local computer in RStudio:


::: callout-tip
## Setup

-   Click **File** > **New Project**
-   Select **Version Control**, then **Git**, and paste the remote repository URL (which should be copied to your clipboard) in the **Repository URL** field
-   Press **Tab**, which will auto-fill the **Project directory name** field with the same name as that of your remote repo -- while you can name the local copy of the repository anything, it's typical (and highly recommended) to use the same name as the GitHub repository to maintain the correspondence

<!-- ![](images/rstudio-clone-repo.png){width="90%" fig-align="center"} -->
![](images/rstudio-clone-repo-sam.png){width="90%" fig-align="center"}
:::

Once you click **Create Project**, a new RStudio window will open with all of the files from the remote repository copied locally. Depending on how your version of RStudio is configured, the location and size of the panes may differ, but they should all be present -- you should see a **Git** tab, as well as the **Files** tab, where you can view all of the files copied from the remote repo to this local repo. 

:::{.column-body-outset}
<!-- ![](images/github-rstudio-test.png) -->
![](images/github-rstudio-test-sam.png)
:::

You'll note that there is one new file `sam_test.Rproj`, and three files that we created earlier on GitHub (`.gitignore`, `LICENSE`, and `README.md`).

In the **Git** tab, you'll note that the one new file, `sam_test.Rproj`, is listed. This **Git** tab is the status pane that shows the current modification status of all of the files in the repository. Here, we see `sam_test.Rproj` is preceded by a **??** symbol to indicate that the file is currently untracked by Git. This means that we have not yet committed this file using Git (i.e. Git knows nothing about the file; hang tight, we'll commit this file soon so that it's tracked by Git). As you make version control decisions in RStudio, these icons will change to reflect the current version status of each of the files.
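
The status codes shown in the **Git** tab mirror what `git status --short` prints in the Terminal. The sketch below reproduces them in a throwaway repository; the file name `demo.Rproj` and the placeholder identity are assumptions for the demo, not the files from this exercise.

```bash
# Placeholder identity used only for this demo (an assumption)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# A new file that Git has never seen is reported as untracked ("??")
touch demo.Rproj
git status --short

# A tracked file that has been edited is reported as modified ("M")
echo "## Creator" >> README.md
git status --short

# Staging and committing the new file makes Git track it,
# so only the modified README remains in the status listing
git add demo.Rproj
git commit -q -m "Add RStudio project file"
git status --short
```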

Inspect the history. Click on the **History** button in the **Git** tab to show the log of changes that have occurred -- these changes will be identical to what we viewed on GitHub. By clicking on each row of the history, you can see exactly what was added and changed in each of the two commits in this repository.

<!-- ![](images/rstudio-history-1.png) -->
![](images/rstudio-history-1-sam.png)

::: callout-note
## Challenge

1.  Make a change to the `README.md` file -- this time from RStudio -- then commit the `README.md` change
2.  Add a new section to your `README.md` called "Creator" using a level-2 header. Under it include some information about yourself. **Bonus:** Add some contact information and link your email using Markdown syntax.
:::

Once you save, you'll immediately see the `README.md` file show up in the **Git** tab, marked as a modification. Select the file in the **Git** tab, and click **Diff** to see the changes that you saved (but which are not yet committed to your local repository). Newly made changes are highlighted in green.

<!-- ![](images/rstudio-status-pane.png) -->
![](images/rstudio-status-pane-sam.png)

<!-- And here's what the newly made changes look like compared to the original file. New lines are highlighted in green, while removed lines are in red. -->

<!-- ![](images/rstudio-diff.png) -->

**Commit the changes.** To commit the changes you made to the `README.md` file using RStudio's GUI (Graphical User Interface), rather than the command line:

1. **Stage** (aka add) `README.md` by clicking the check box next to the file name -- this tells Git which changes you want included in the commit and *is analogous to using the git command, `git add README.md`, in the command line*
2. **Commit** `README.md` by clicking the **Commit** button and providing a descriptive commit message in the dialog box. Press the **Commit** button once you're satisfied with your message. *This is analogous to using the git command, `git commit -m "my commit message"`, in the command line*.

![](images/rstudio-commit-1.png)

A few notes about our local repository's state:

- We still have a file, `sam_test.Rproj`, that is listed as untracked (denoted by **??** in the **Git** tab). 
- You should see a message at the top of the **Git** tab that says, `Your branch is ahead of 'origin/main' by 1 commit.`, which tells us that we have 1 commit in the local repository, but that commit has not yet been pushed up to the `origin` repository (aka remote repository on GitHub). 

**Commit the remaining project file** by staging/adding and committing it with an informative commit message.

![](images/rstudio-commit-2.png)

When finished, you'll see that no changes remain in the **Git** tab, and the repository is clean.


**Inspect the history.** Note that under **Changes**, the message now says:

`Your branch is ahead of 'origin/main' by 2 commits.`

These are the two commits that we just made, but have not yet been pushed to GitHub. 

Click on the **History** button to see a total of four commits in the local repository (the two we made directly to GitHub via the web browser and the two we made in RStudio).



**Push these changes to GitHub.** Now that we've made and committed changes locally, we can push those changes to GitHub using the **Push** button. This sends your changes to the remote repository (on GitHub) leaving your repository in a totally clean and synchronized state (meaning your local repository and remote repository should look the same). 

::: {.callout-note}
## If you are prompted to provide your GitHub username and password when **Push**ing...
it's a good indicator that you did not set your GitHub Personal Access Token (PAT) correctly. You can redo the steps outlined in the [GitHub Authentication section](https://learning.nceas.ucsb.edu/2023-09-ucsb-faculty/session_12.html#github-authentication) of the [Appendix](https://learning.nceas.ucsb.edu/2023-09-ucsb-faculty/session_12.html) to (re)set your PAT, then **Push** again.
:::

:::{.column-body-outset}
![](images/rstudio-history-3.png)
<!--![](images/rstudio-history-3-sam.png)-->
:::

If you look at the History pane again, you'll notice that the labels next to the most recent commit indicate that both the local repository (`HEAD`) and the remote repository (`origin/HEAD`) are pointing at the same version in the history. If we look at the commit history on GitHub, all the commits will be shown there as well.

<!-- :::{.column-page} -->

![](images/github-history.png)
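
The "ahead of `origin/main`" message and the synchronized state after pushing can also be checked from the Terminal. The sketch below is an offline approximation: a local bare repository stands in for GitHub, and the directory names, file names, and placeholder identity are assumptions.

```bash
# Placeholder identity used only for this demo (an assumption)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the GitHub repository, seeded with one commit
git init -q -b main seed
(cd seed && echo "# demo" > README.md && git add README.md && git commit -q -m "Initial commit")
git clone -q --bare seed remote.git
git clone -q remote.git my_project
cd my_project

# Two local commits that have not been pushed yet
echo "## Creator" >> README.md
git commit -q -am "Add Creator section to README"
touch my_project.Rproj
git add my_project.Rproj
git commit -q -m "Add RStudio project file"

# Before pushing: count the commits that exist locally but not on origin/main
git status --short --branch
ahead_before="$(git rev-list --count origin/main..HEAD)"
echo "Commits ahead before push: $ahead_before"

# After pushing: HEAD and origin/main resolve to the same commit
git push -q origin main
git rev-parse HEAD origin/main
```

When the two hashes printed by the last command are identical, the local and remote histories match, which is what the matching labels in the History pane indicate.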

::: callout-note
## Last thing, some Git configuration to suppress warning messages

Since version 2.27, Git warns you when you pull without having specified how changes from a remote repository should be integrated into your local repository (this warning is informative, but can get annoying). To suppress this warning *for this repository only*, we need to configure Git by running this line of code in the Terminal:

```{bash}
#| eval: false
#| echo: true

git config pull.rebase false
```

`pull.rebase false` selects the default strategy for pulling, in which Git first tries to auto-merge the files. If auto-merging is not possible, it will indicate a merge conflict (more on resolving merge conflicts in [Chapter 11](https://learning.nceas.ucsb.edu/2023-09-ucsb-faculty/session_11.html)).
<!--
**Note:** Unlike when we first configured Git (see [Appendix](https://learning.nceas.ucsb.edu/2023-09-ucsb-faculty/session_12.html#set-up-global-options-in-git)), we do not include the `--global` flag here (e.g. `git config --global pull.rebase false`). This sets this default strategy for this repository only (rather than globally for all your repositories). We do this because your chosen/default method of grabbing changes from a remote repository (e.g. [pulling](https://git-scm.com/docs/git-pull) vs. [rebasing](https://git-scm.com/docs/git-rebase)) may change depending on collaborator/workflow preference. -->
:::
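
To confirm that the setting was applied to the current repository only, you can read it back. The sketch below does this in a throwaway repository (an assumption for the demo; in practice you would run the last two commands inside your own project).

```bash
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Set the pull strategy for this repository only (no --global flag)
git config pull.rebase false

# Read the value back from the repository-level configuration
git config --local --get pull.rebase

# Show which configuration file the value comes from
git config --show-origin --get pull.rebase
```

The `--show-origin` output names the file that holds the value, so a path ending in `.git/config` indicates a repository-level setting rather than a global one.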





## Git resources

There's a lot we haven't covered in this brief tutorial. There are some great and much longer tutorials that cover advanced topics, such as:

-   Using Git on the command line
-   Resolving conflicts
-   Branching and merging
-   Pull requests versus direct contributions for collaboration
-   Using `.gitignore` to protect sensitive data
-   GitHub Issues - how to use them for project management and collaboration

and much, much more.


- [Pro Git Book](https://git-scm.com/book/en/v2)
- [Happy Git and GitHub for the useR](https://happygitwithr.com/)
-   [GitHub Documentation](https://docs.github.com/en/get-started/quickstart/set-up-git)
-   [Learn Git Branching](https://learngitbranching.js.org/) is an interactive tool to learn Git on the command line
-   [Software Carpentry Version Control with Git](https://swcarpentry.github.io/git-novice/)
-   Bitbucket's tutorials on [Git Workflows](https://www.atlassian.com/git/tutorials/comparing-workflows)
