# Lab Session 3: GitHub

> Friday 09-19-2025, 9AM-11AM & 1PM-3PM & 3PM-5PM
>
> Instructors: [Jimmy Butler](https://statistics.berkeley.edu/people/james-butler) & [Sequoia Andrade](https://statistics.berkeley.edu/people/sequoia-rose-andrade)

**Plan for today:**

  1. **Creating your own GitHub repos**. You've cloned repos that someone else has already created. How can you create your own to clone?

  1. **Simulating a GitHub Collaboration Pair**. Last week, we used local branches to simulate two colleagues working on the same git repository. Today, you will split into pairs, practice pushing to the same repo, and resolve merge conflicts as they arise.

  1. **Reconstructing past versions**. Suppose you want to go back to a previous version of a repository (marked by an earlier commit), or you want to recover deleted files. We'll go through both today.

  1. **Forks and Pull Requests (if time)**. Let's say we have a collaboration where you don't have write access to a repository but you would like to suggest changes. We will go over how to fork repositories, make our edits, and suggest they be merged into the main repo with a pull request.

**Useful links:**
- [JupyterHub](https://stat159.datahub.berkeley.edu/hub/login?next=%2Fhub%2F)
- [Dotfiles](https://github.com/fperez/dotfiles)

## 1. GitHub Check-In


### 1.1 Authentication

Recall that whenever we want to interact with repos hosted on GitHub, we must authenticate using the `GHAUTH.ipynb` notebook. If you don't already have it, go to the `shared` directory and make a copy of `GHAUTH.ipynb` in your home directory. If this is your first time using it, follow the one-time step at the top of the notebook. Then, run the cell and enter the 8-digit code at the provided link.

If you already have this notebook, simply open it and authenticate as usual.

### 1.2 Creating your Own GitHub Repos

We've practiced cloning repos that were already created for you and pushing changes to them (you did it for HW1!). Now we'll cover how to create your own repos.

#### Starting from Scratch

First, we'll go through how to create an empty repo on GitHub, clone it to our JupyterHub accounts, and make some edits.

1. Create a new repository on GitHub in your personal account. You can create an empty repository or fill it with some basic content (for example, a `README.md` file, which renders on the repo landing page). For now, let's create an empty repository.
![image.png](attachment:811e6135-32b2-4c99-9040-7cce1cf4efaf.png)
![image.png](attachment:e310ac78-fd19-4bdf-92ce-858f4f022f80.png)
1. Clone the repo onto JupyterHub with `git clone <repo_url>`.
1. Create a text file with any piece of text in it. Add, commit, and push your change.
1. On GitHub (the website, not your Hub session), refresh the page and verify your new file is there. Then, edit that text file and commit the change, *all within GitHub*.
1. Back on JupyterHub, pull these changes into your local repository with `git pull`. The text file should now contain the edit you just made on the GitHub website.
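To see the whole round trip in one place, here is a minimal sketch of steps 2 to 5 as a standalone bash script. It relies on assumptions that are not part of the lab setup. A local bare repository (the hypothetical `github_repo.git`) stands in for the empty GitHub repo. A second clone (`web_edit`) plays the role of editing on the GitHub website. Placeholder `GIT_AUTHOR_*`/`GIT_COMMITTER_*` variables provide an identity so commits work in a fresh environment. It also assumes git 2.28 or newer (for `-b`) and works in a throwaway temporary directory. Run it with `bash` rather than pasting it into your terminal, because `set -e` would close an interactive shell on the first error.

```bash
set -euo pipefail
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"
WORK=$(mktemp -d) && cd "$WORK"

# Step 1 stand-in: an empty bare repo plays the role of the new GitHub repo
git init --bare -q -b main github_repo.git

# Step 2: clone it (git warns that the repository is empty)
git clone -q github_repo.git my_repo
cd my_repo

# Step 3: add, commit, and push a text file
echo "hello from JupyterHub" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"
git branch -M main            # make sure the local branch is named main
git push -q -u origin main    # -u links local main to origin/main

# Step 4 stand-in: "editing on the website" simulated with a second clone
git clone -q "$WORK/github_repo.git" "$WORK/web_edit"
echo "edited on GitHub" >> "$WORK/web_edit/notes.txt"
git -C "$WORK/web_edit" commit -q -am "Edit notes.txt on GitHub"
git -C "$WORK/web_edit" push -q

# Step 5: pull the edit back into the local repo
git pull -q
cat notes.txt
```

On the Hub you would use the URL GitHub shows you instead of `github_repo.git`. The `git branch -M main` line is there because, depending on your git configuration, a fresh clone of an empty repo may start on a branch called `master`.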

#### Syncing with an Existing Local Repo

Suppose we want to create a GitHub repo, but instead of starting from scratch, we want to sync it with an existing local repo.

1. Let's first create a local git repo with some content, simulating a local repo with important info we'd like to sync to a GitHub remote repo. Create a folder called `lab3_sync_test` and initialize a git repository in it with `git init`.
1. Add a text file with some random text. Commit it.
1. Now, go back to GitHub and follow the same steps as above to create a new repo (use a different name than the last repo you created).
1. To sync this repo with your existing local repo, follow the steps under "...or push an existing repository from the command line". Try to work out what each step does, relative to your mental model of how git operates.
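For reference, here is a minimal sketch of what those commands do, written as a standalone bash script. It makes assumptions that are not in the lab. A local bare repository (the hypothetical `github_repo.git`) stands in for the new GitHub repo, so its path replaces the `https://github.com/<user>/<repo>.git` URL that GitHub shows you. Placeholder `GIT_AUTHOR_*`/`GIT_COMMITTER_*` variables let commits work in a fresh environment. It also assumes git 2.28 or newer.

```bash
set -euo pipefail
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"
WORK=$(mktemp -d) && cd "$WORK"

# Steps 1-2: a local repo with one commit and no remote yet
mkdir lab3_sync_test && cd lab3_sync_test
git init -q
echo "important local notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Step 3 stand-in: the new, empty GitHub repo
git init --bare -q -b main "$WORK/github_repo.git"

# Step 4: the commands GitHub suggests for an existing repo
git remote add origin "$WORK/github_repo.git"  # record the remote under the name "origin"
git branch -M main                             # rename the current branch to main
git push -q -u origin main                     # upload main and set origin/main as its upstream

git remote -v                                  # origin now points at the remote
git rev-parse --abbrev-ref main@{upstream}     # prints origin/main
```

Because the upstream is set, a plain `git push` or `git pull` on `main` works from then on.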

In either of the above cases, we now have a local copy of a git repo that is connected to a remote copy hosted on GitHub. For the next exercise, let's just work with the `lab3_sync_test` local git repo, although this exercise works equally well with the cloned one.

Create a new branch in your local repository (using what we learned last lab), make a change, commit it, and try to push it to GitHub. What happens? If you run `git push` on a branch you just created, you will get the error `fatal: The current branch <branch name> has no upstream branch`. This is because the branch exists only in your local repository, not yet on the remote. The first time you push a new branch, use `git push -u origin <branch name>` instead (`-u` is short for `--set-upstream`). This creates the branch on GitHub and links your local branch to it, so afterwards a plain `git push` or `git pull` works.
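A minimal sketch of this exchange as a standalone bash script. It assumes a local bare repository (the hypothetical `remote.git`) standing in for GitHub, placeholder identity variables, and git 2.28 or newer. It also assumes you have not set `push.autoSetupRemote`, since that option would make the first plain push succeed.

```bash
set -euo pipefail
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"
WORK=$(mktemp -d) && cd "$WORK"

# Setup: a remote (stand-in for GitHub) and a local clone with one commit on main
git init --bare -q -b main remote.git
git clone -q remote.git local && cd local
echo "first line" > notes.txt
git add notes.txt && git commit -q -m "Add notes.txt"
git branch -M main && git push -q -u origin main

# Create a new branch and commit on it
git switch -q -c experiment
echo "experimental idea" >> notes.txt
git commit -q -am "Try an idea"

# A plain push fails: the branch has no upstream yet
git push || echo "push failed: no upstream branch"

# First push of a new branch: create it on the remote and set the upstream
git push -q -u origin experiment

# From now on, a plain git push is enough
echo "refined idea" >> notes.txt
git commit -q -am "Refine the idea"
git push -q
```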

## 2. Collaborating on GitHub with a small team

For this part of the lab, we are going to set up a shared collaboration with one partner. This shows the basic workflow for collaborating on a project with a small team where everyone has write access to the same repository.

We will have two people, let's call them Alice and Bob, sharing a repository.  Alice will be the owner of the repository and she will give Bob write privileges. Decide on who in your pair will be Alice and who will be Bob.

### 2.1  Simple Synchronization

We begin with a simple synchronization example, much like we just did in the previous lab, but now between **two people** instead of one person.  

1. Alice creates a new repository on GitHub with some basic text files in it. For now, let's host this repository in Alice's personal GitHub account and make it public.
1. Bob clones Alice's repository.
1. Bob makes changes to a file and commits them locally.
1. Bob pushes his changes to GitHub. For Bob to be able to push, Alice first needs to add him as a collaborator on her repository, from the repository's **Settings** page on GitHub, and Bob needs to accept the invitation. One more detail: Alice must make sure the repository has permission to use the GitHub app we are using for the course! To do that, Alice first authenticates, then goes to the app's configuration page and checks that the new repository is listed.
1. Alice pulls Bob's changes into her own repository.
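The git side of these steps can be sketched as a single bash script. Collaborator access and the course GitHub app can only be set up on the website, so they are left out. Other assumptions: a local bare repo (the hypothetical `github.git`) stands in for Alice's GitHub repository. The hypothetical `alice` and `bob` helper functions run git inside each person's clone, with a placeholder identity passed through `git -c`. Git 2.28 or newer is assumed.

```bash
set -euo pipefail
WORK=$(mktemp -d) && cd "$WORK"

# Hypothetical helpers: run git inside Alice's or Bob's clone with their identity
alice() { git -C "$WORK/alice" -c user.name=Alice -c user.email=alice@example.com "$@"; }
bob()   { git -C "$WORK/bob" -c user.name=Bob -c user.email=bob@example.com "$@"; }

# Step 1: Alice's repository (bare stand-in), seeded with a text file
git init --bare -q -b main github.git
git clone -q github.git alice
echo "Project notes" > alice/notes.txt
alice add notes.txt
alice commit -q -m "Initial notes"
alice branch -M main
alice push -q -u origin main

# Step 2: Bob clones Alice's repository
git clone -q github.git bob

# Steps 3-4: Bob edits, commits locally, and pushes
echo "Bob was here" >> bob/notes.txt
bob commit -q -am "Add Bob's line"
bob push -q

# Step 5: Alice pulls Bob's changes
alice pull -q
cat alice/notes.txt
```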

Next, both of you will make non-conflicting changes and commit them locally. Then both try to push their changes:

1. Alice adds a new file, `alice.txt` to the repository and commits.
1. Alice adds a tag at this point in the repository's history. You can learn more about tags in the [git tag documentation](https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-tag.html). In which cases do you think tags are useful?
1. Bob adds `bob.txt` and commits.
1. Alice pushes to GitHub.
1. Bob tries to push to GitHub.  What happens here?

The problem is that the remote now has a commit (Alice's) that Bob's local history does not include. Bob's push would therefore not be a simple fast-forward of the remote branch, so GitHub rejects it. This forces Bob to combine the two histories on his own machine first. If that merge produces a conflict, Bob has to resolve it manually. (Git could try to do the merge on the server, but if there were a conflict, the server repository would be left in a conflicted state with no human to fix it.) The solution is for Bob to pull the changes first and then push again.
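A sketch of this whole exchange, including the fix, as a standalone bash script. As before, it assumes a local bare repo (the hypothetical `github.git`) standing in for Alice's GitHub repository. The hypothetical `alice`/`bob` helpers run git in each clone with a placeholder identity. Two details are worth knowing. First, `git push` does not send tags by default, so Alice pushes the tag (named `v0.1` here for illustration) explicitly. Second, recent git versions stop with an error on a `git pull` between diverged branches unless told whether to merge or rebase. That is why the script uses `--no-rebase`, which merges, and `--no-edit`, which accepts the default merge message.

```bash
set -euo pipefail
WORK=$(mktemp -d) && cd "$WORK"

# Hypothetical helpers: run git inside Alice's or Bob's clone with their identity
alice() { git -C "$WORK/alice" -c user.name=Alice -c user.email=alice@example.com "$@"; }
bob()   { git -C "$WORK/bob" -c user.name=Bob -c user.email=bob@example.com "$@"; }

# Setup: shared repo (stand-in for GitHub) with one commit, cloned by both
git init --bare -q -b main github.git
git clone -q github.git alice
echo "Project notes" > alice/notes.txt
alice add notes.txt && alice commit -q -m "Initial notes"
alice branch -M main && alice push -q -u origin main
git clone -q github.git bob

# Alice adds alice.txt, commits, and tags this point in history
echo "from Alice" > alice/alice.txt
alice add alice.txt && alice commit -q -m "Add alice.txt"
alice tag -a v0.1 -m "State after adding alice.txt"

# Bob adds bob.txt and commits
echo "from Bob" > bob/bob.txt
bob add bob.txt && bob commit -q -m "Add bob.txt"

# Alice pushes first; the tag must be pushed explicitly
alice push -q
alice push -q origin v0.1

# Bob's push is rejected: the remote has Alice's commit, which Bob lacks
bob push || echo "Bob's push was rejected"

# Fix: pull (fetch + merge) first, then push again
bob pull -q --no-rebase --no-edit
bob push -q
bob log --oneline --graph --all
```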

Next, Bob creates a new branch and pushes commits on it to GitHub, and then Alice retrieves the new branch into her local repository.

1. In his local repository, Bob creates a new branch (see previous lab).
1. Bob commits new changes to his branch and then tries to push these changes to the remote repository on GitHub.
1. Alice now pulls the new branch from GitHub. Does Bob's new branch show up when Alice runs `git branch`? What about `git branch --all`?
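A minimal sketch of this branch hand-off as a bash script, under the same assumptions as before. A local bare repo (the hypothetical `github.git`) stands in for GitHub, the hypothetical `alice`/`bob` helpers run git in each clone, and the branch name `bob-feature` is made up for illustration. Alice uses `git fetch` here. A `git pull` would also download the new branch, but `fetch` does so without merging anything into her current branch.

```bash
set -euo pipefail
WORK=$(mktemp -d) && cd "$WORK"

# Hypothetical helpers: run git inside Alice's or Bob's clone with their identity
alice() { git -C "$WORK/alice" -c user.name=Alice -c user.email=alice@example.com "$@"; }
bob()   { git -C "$WORK/bob" -c user.name=Bob -c user.email=bob@example.com "$@"; }

# Setup: shared repo (stand-in for GitHub) with one commit, cloned by both
git init --bare -q -b main github.git
git clone -q github.git alice
echo "Project notes" > alice/notes.txt
alice add notes.txt && alice commit -q -m "Initial notes"
alice branch -M main && alice push -q -u origin main
git clone -q github.git bob

# Steps 1-2: Bob creates a branch, commits on it, and pushes it (first push needs -u)
bob switch -q -c bob-feature
echo "feature work" > bob/feature.txt
bob add feature.txt && bob commit -q -m "Start feature"
bob push -q -u origin bob-feature

# Step 3: Alice downloads the new branch; it arrives as origin/bob-feature
alice fetch -q
alice branch                  # local branches only: bob-feature is not listed
alice branch --all            # also lists remotes/origin/bob-feature
alice switch -q bob-feature   # creates a local bob-feature tracking origin/bob-feature
```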

### 2.2 GitHub with conflicts

Follow the same workflow as before, but now try to induce a conflict. To do so:
1. Alice and Bob both change the same line of the same text file and commit their changes.
1. After committing, Bob pushes his changes to GitHub.
1. Right after, Alice tries to pull and sees a merge conflict. How do you resolve it? The same way you resolved branch merge conflicts last week!

Tired of resolving the conflict in the middle of the merge? You can always run `git merge --abort` to return to the state before the merge started, and then try to avoid the conflict before it happens.
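Here is a sketch of the conflict and both ways out of it, as a standalone bash script. The assumptions match the earlier sketches: a local bare repo (the hypothetical `github.git`) stands in for GitHub, and the hypothetical `alice`/`bob` helpers run git in each clone. The "resolution" simply overwrites the file with an agreed line. In practice you would edit the file by hand and remove the conflict markers.

```bash
set -euo pipefail
WORK=$(mktemp -d) && cd "$WORK"

# Hypothetical helpers: run git inside Alice's or Bob's clone with their identity
alice() { git -C "$WORK/alice" -c user.name=Alice -c user.email=alice@example.com "$@"; }
bob()   { git -C "$WORK/bob" -c user.name=Bob -c user.email=bob@example.com "$@"; }

# Setup: shared repo (stand-in for GitHub) with a one-line file, cloned by both
git init --bare -q -b main github.git
git clone -q github.git alice
echo "Project notes" > alice/notes.txt
alice add notes.txt && alice commit -q -m "Initial notes"
alice branch -M main && alice push -q -u origin main
git clone -q github.git bob

# Step 1: both rewrite the same line and commit locally
echo "Alice's version of the line" > alice/notes.txt
alice commit -q -am "Alice rewrites the line"
echo "Bob's version of the line" > bob/notes.txt
bob commit -q -am "Bob rewrites the line"

# Step 2: Bob pushes first
bob push -q

# Step 3: Alice pulls; the merge stops with a conflict (non-zero exit status)
alice pull -q --no-rebase --no-edit || echo "Merge conflict in notes.txt"
alice status --short          # "UU notes.txt" marks the conflicted file
cat alice/notes.txt           # shows the <<<<<<< / ======= / >>>>>>> markers

# Way out 1: abort; the repo returns to its pre-merge state
alice merge --abort

# Way out 2: merge again, fix the file, then add and commit to conclude the merge
alice pull -q --no-rebase --no-edit || true
echo "Alice and Bob agreed on this line" > alice/notes.txt
alice add notes.txt
alice commit -q --no-edit     # uses the prepared merge message
alice push -q
```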

### 2.3 Working directly in GitHub

So far we have used GitHub just to host our remote repository, but GitHub also lets us make quick changes, document issues, or simply view the state of our repository, including past versions.

Explore your GitHub repository a bit. Specifically:
1. Modify a text file and commit these changes from GitHub. 
1. How can you see the different branches on GitHub? Are they the same ones you have on your local machine?
1. If she hasn't already, Alice adds Bob as a collaborator on the repository. You will need to go to **Settings** for this.
1. Open a new issue and mention the other person in it (type `@` followed by their GitHub username).
1. Review past versions of the repository.

## 3. Reconstructing past versions

Sometimes we make accidental changes to files in a repository, or we just want to go back to a previous version. Either way, it is easy to restore old versions of files, or even recover deleted ones, as long as they were recorded in a commit.

For the next examples, we are going to use the `git checkout` command to restore past versions of a file. This can be confusing, since it is the same command we use to change branches: in a sense, `git checkout` does the work of both `git switch` and `git restore`. You can do all the following exercises with `git restore` instead of `git checkout`.

### 3.1. Restoring old versions

For this example, we are going to make modifications to one of the files in our repository and then recover some of the older versions. 

1. Make several changes to the same file in your repository, committing after each one. For example, you can add new text to `text.txt`: `echo "..." >> text.txt` appends a new line at the end of `text.txt` (with `>` you would overwrite all of its contents instead).
1. Try to restore a previous version of that file with
```bash
git checkout <commit> -- <filename>
```
or 
```bash 
git restore --source=<commit> <filename>
```
Here `<commit>` is the commit whose version of the file you want back. You can find it in the repository log (`git log`, `git slog`, `git log --all`). Note that `git slog` is not a built-in git command but a custom alias, likely from the course dotfiles. This is why informative commit messages are so important!
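A minimal sketch of this exercise as a standalone bash script. It assumes a throwaway repo (the hypothetical `history_demo`), placeholder identity variables, and git 2.28 or newer. Instead of copying hashes from the log, it uses `HEAD~N`, meaning "N commits before the current one"; a hash from `git log` works the same way. It also shows one real difference between the two commands. By default, `git restore --source` changes only the working tree, while `git checkout <commit> -- <file>` also stages the old version.

```bash
set -euo pipefail
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"
WORK=$(mktemp -d) && cd "$WORK"
git init -q -b main history_demo && cd history_demo

# Three commits, each appending one line to text.txt
echo "version 1" >> text.txt && git add text.txt && git commit -q -m "Add version 1"
echo "version 2" >> text.txt && git commit -q -am "Add version 2"
echo "version 3" >> text.txt && git commit -q -am "Add version 3"
git log --oneline                  # pick the commit you want

# Restore text.txt as of two commits ago ("Add version 1"), working tree only
git restore --source=HEAD~2 text.txt
cat text.txt                       # only "version 1"
git status --short                 # " M text.txt": modified but not staged

# Undo that, then use checkout, which also updates the index (stages the file)
git restore text.txt               # back to the committed "version 3" content
git checkout HEAD~1 -- text.txt    # version as of "Add version 2"
git status --short                 # "M  text.txt": staged, ready to commit
```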

Note: you can also inspect old versions of your files directly on GitHub.

### 3.2. Recovering deleted files

Now, let's practice deleting and recovering a specific file.

1. Remove one of the files in your test repository (or create a new file, commit it, and then remove it) using `git rm <file name>`. Take a look at this [link](https://www.atlassian.com/git/tutorials/undoing-changes/git-rm) to see some flags you can add to this command.
1. If you haven't committed the removal yet, you can recover the file from the last commit with `git checkout HEAD -- <file name>`. Note that `git checkout HEAD` without a file name does not restore anything.
1. If you have already committed the removal, you need a little more work. Use `git slog -- <filename>` (or `git log -- <filename>`) to see the history of the file you removed. The most recent commit listed is the one that deleted it, and the file does not exist in that commit, so recover it from the commit just before: `git checkout <commit>~1 -- <filename>`.
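Both cases are sketched below as a standalone bash script. It assumes a throwaway repo, a hypothetical file `precious.txt`, placeholder identity variables, and git 2.28 or newer. To find the deleting commit without copying it by hand, the script uses `git rev-list -n 1 HEAD -- <file>`, which prints the most recent commit that touched the file.

```bash
set -euo pipefail
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"
WORK=$(mktemp -d) && cd "$WORK"
git init -q -b main delete_demo && cd delete_demo
echo "keep me" > precious.txt
git add precious.txt && git commit -q -m "Add precious.txt"

# Case 1: deleted but not committed yet
git rm -q precious.txt
git checkout HEAD -- precious.txt          # restores both the index and the working tree
cat precious.txt

# Case 2: the deletion was committed
git rm -q precious.txt
git commit -q -m "Remove precious.txt"
git log --oneline -- precious.txt          # newest entry is the deleting commit
deleted_in=$(git rev-list -n 1 HEAD -- precious.txt)
git checkout "${deleted_in}~1" -- precious.txt   # take the file from the commit before the deletion
git commit -q -m "Restore precious.txt"
cat precious.txt
```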

## 4. Forks and Pull Requests `*`

Sometimes it is useful to create a completely new copy of a repository and work on that copy before merging the new changes into the main repository. This is particularly useful when working with packages or big projects, or when we want to experiment with full version control without adding changes to the main repository. In these cases we use forks, and then pull requests to merge changes across repositories (pull requests also work between branches of a single repository).

If you are not familiar with pull requests, take a look at the [pull request chapter](https://merely-useful.tech/py-rse/git-advanced.html#git-advanced-pull-requests) in the course book. A pull request asks to merge a branch of one repository into a branch of another repository (or of the same repository).

Pair up again and take on the roles of Alice and Bob.
1. Bob forks Alice's repository (`alice/test`) into his own GitHub account (`bob/test`). Take a look at [Using Other People's Work](https://merely-useful.tech/py-rse/git-advanced.html#git-advanced-fork).
1. Bob clones his repository into his local machine. 
1. Bob makes some changes and commits them.
1. Bob pushes his changes to GitHub. 
1. Once on GitHub, Bob creates a pull request to merge his changes into `alice/test`. When creating it, add Alice as a reviewer of the pull request.
1. Alice reviews the pull request and merges Bob's changes into the main repository.

Repeat this, now with the roles of Alice and Bob swapped.

## 5. Extra

If you already finished all the previous tasks, you are welcome to explore some more useful git commands!
- [show](http://www.kernel.org/pub/software/scm/git/docs/git-show.html)
- [reflog](http://www.kernel.org/pub/software/scm/git/docs/git-reflog.html)
- [rebase](http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html)
- [fetch](https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-fetch.html)

Can you think of situations where these commands would be useful? For example, instead of using `git pull`, can you do the same thing with `git fetch` followed by `git merge`? If so, what would be the advantage of doing so?
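If you want to check your answer, here is one possible sketch of `git fetch` plus `git merge` as a standalone bash script. It assumes a local bare repo (the hypothetical `github.git`) standing in for GitHub, a second clone (`colleague`) that pushes a new commit, placeholder identity variables, and git 2.28 or newer. The advantage it illustrates is that `fetch` only updates `origin/main`. This lets you inspect incoming changes before they touch your files, and `merge` then integrates them as `git pull` would.

```bash
set -euo pipefail
# Placeholder identity so commits work in a fresh environment
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"
WORK=$(mktemp -d) && cd "$WORK"

# Setup: remote (stand-in for GitHub) and your clone with one commit
git init --bare -q -b main github.git
git clone -q github.git me && cd me
echo "v1" > notes.txt && git add notes.txt && git commit -q -m "v1"
git branch -M main && git push -q -u origin main

# A collaborator pushes a new commit (simulated with a second clone)
git clone -q "$WORK/github.git" "$WORK/colleague"
echo "v2" >> "$WORK/colleague/notes.txt"
git -C "$WORK/colleague" commit -q -am "v2"
git -C "$WORK/colleague" push -q

# Step 1: fetch downloads new commits into origin/main; your files are untouched
git fetch -q origin
git log --oneline main..origin/main   # commits you don't have yet
git diff main origin/main             # how they would change your files

# Step 2: merge when you are ready (same end result as git pull here)
git merge -q origin/main
cat notes.txt
```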