# SLU06 - Git Intermediate

In SLU03, we got acquainted with the way Git and other Version Control Systems (VCS) operate and learned the basic workflow to store and track changes in local and remote repositories. Recall the Git universe and the basic commands which make the data flow between the different parts of it:

<img width="700" src="media/01_git_flow.png"/>

As a refresher (*and totally copied from SLU03*)...

>  _The local repository is the “container” that tracks the changes to your project files. It holds all the commits — a snapshot of all your files at a point in time — that have been made._
>
>  _The workspace consists of files that you are currently working on. You can think of the workspace as a file system where you can view and modify files._
>
>  _The staging area is where commits are prepared. The index compares the files in the workspace to the files in the local repository. When you make a change in the workspace, the index marks the file as "modified" before it is committed._

We briefly covered the remote repository and how to communicate with it through the `git pull` and `git push` commands, which let us incorporate changes from the remote repository into our workspace and update the remote repository with our changes, respectively.

In this SLU we'll dive deeper into the remote repository and how to make the most out of it. *Get ready!* 

## 1. The remote repository
The remote repository (also called the *remote*) holds the version history of your project, basically mirroring your local repository, and is hosted on a server that you can reach over the network. You can have several remote repositories. Collaborating with others involves managing these remotes and pushing and pulling data to and from them when you need to share work.

<img width="700" src="media/02_git_remote.png"/>

Managing remote repositories includes knowing how to add remote repositories, remove the ones that are no longer valid, manage various remote branches and define them as being tracked or not (what the heck are branches, you ask? You'll see!), and more.

Most of the work you need to do related to version control happens in your local repository: staging, committing, viewing the status, viewing logs, and so on.

The main advantage of a remote repository is that it allows collaboration with other developers. All developers work in their local repository, make modifications, and commit them locally. These changes are then uploaded to the remote repository to share with others and integrate with changes made by others.

### 1.1 Cloning a repository

You are probably wondering: *"Isn't this what I did at the beginning of this course!?"*. The answer? *Yes!*

You already created a remote repository and cloned it to your local machine when you followed [these instructions](https://github.com/LDSSA/ds-prep-course-2025/blob/main/docs/github.md#32-setup-your-workspace-repository).

You can check which remote repositories you have configured by using the command `git remote`. If you run that command inside your `ds-prep-workspace` you should see something like this:

<img width="700" src="media/03_git_remote.png"/>

The name `origin` is the default name Git gives to the server you cloned from.

If you run `git remote -v` (for *verbose*), the output also includes the URL of each connection.

<img width="700" src="media/04_git_remote_v.png"/>

As you remember from SLU03 - Git basics:
- the `git pull` command is used to fetch content from a remote repository and immediately update the local repository and workspace to match that content
- the `git push` command is used to upload local repository content to a remote repository

With the `git remote -v` command you can see what URLs git is using to make the connection to the remote repository for these operations.
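If you want to try these commands without touching your real workspace, here is a minimal sketch in a throwaway repository. The remote URL is a made-up placeholder (an assumption for illustration only); adding a remote just records the URL and does not contact the server.

```bash
# Throwaway repository so nothing touches your real workspace
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Register a remote called "origin" (placeholder URL, nothing is downloaded)
git remote add origin https://example.com/your-user/ds-prep-workspace.git

# Names only
git remote

# Names plus the URLs used for fetching (pull) and pushing
git remote -v
```

In a repository you cloned, Git has already done the `git remote add origin ...` step for you.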

Another important command covered in SLU03 was `git status` (*hands down the most used git command in all the land!*). This command lets you see the status of your workspace and staging area files. 

Here's the status of the repository of the instructor who worked on these learning materials in 2022:

<img width="700" src="media/05_git_status.png"/>

The `git status` command is super fast and you can use it offline, meaning you don't need to have an active network connection to use it. This is possible because you are checking the status of your local Git repository and don't need to communicate with the remote repository for it.

One simple way to know if your local repo is out of date in comparison to the remote repository is to use the command `git remote show origin`. To update it, the `git pull` command is your friend!
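Here is a rough sketch of that check, run entirely on your own disk. It assumes a bare repository (`server.git`) standing in for the remote and a second clone (`teammate`) that pushes a commit behind your back; all directory names, file contents and identities are hypothetical.

```bash
base="$(mktemp -d)"
cd "$base"

# "server.git": a bare repository on disk plays the role of the remote
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/main

# "mine": your local repository, with one commit pushed to the server
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "- Coffee" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"
git remote add origin "$base/server.git"
git push -q -u origin main

# "teammate": a second clone that pushes a new commit
cd "$base"
git clone -q server.git teammate
cd teammate
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "- Beer" >> groceries.txt
git commit -q -am "add beer"
git push -q origin main

# Back in your repository: ask the remote how things compare
cd "$base/mine"
git remote show origin     # the push line for main should say "local out of date"
git fetch -q origin        # download the new commits without changing your files
git status -sb             # first line shows "[behind 1]"
git pull -q                # now bring main up to date
```

`git fetch` followed by `git status` is a handy alternative when you want to see how far behind you are before actually pulling.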

### 1.2 Stash changes

You already know how to save changes permanently in the repository using the trusty `git commit` command. Now imagine the following (hypothetical) situation:

You are midway through the development of Feature-X. Your code is a complete tangled mess.  
Your manager enters the room asking you to solve BUG-XPTO. The development of Feature-X is completely unrelated to BUG-XPTO.  
You surely cannot commit, or you'll be ridiculed by your colleagues and never be asked to play darts again! So, what do you do?

This is where `git stash` comes to the rescue! As I have read somewhere: *stash is like a clipboard on steroids!* ;) 

<img width="250" src="media/06_keep_calm_git_stash.png"/>

`git stash` takes all your staged and unstaged changes to tracked files and puts them aside (if you want to include untracked files you must add the `-u` or `--include-untracked` option). You are left with a workspace in the same state as your latest commit.

Let's look at an example. This is what the status of the instructor's local repo looks like after making a few changes:

<img width="700" src="media/07_git_status_before_stash.png"/>

As you can see, there are different files/folders with different status:
- The first one has already been added to the staging area with the `git add` command. *(Changes to be committed)*
- The second one is already tracked in the local repository but has been modified since its last commit. *(Changes not staged for commit)*
- The last one has never been added to the local repository. *(Untracked files)*

Since there are untracked files (the last one), we can stash them using the `-u` option.

<img width="700" src="media/08_git_stash.png"/>

Notice how the result of stash identifies the latest commit of your local repo and how the status shows that there is nothing to commit? *It's magic!* 

*But now how do I get my work back?!*

To get the changes out of your stash, you use the command `git stash pop`. The changes you stashed will be removed from the stash and applied to your workspace again. *Super duper awesome magic!*

<img width="700" src="media/09_git_stash_pop.png"/>

You can store multiple stashes, do partial stashes, *pop* specific stashes, and much more. But this is outside the scope of this SLU. What you've learned should be enough to keep you out of trouble (*I hope so!*) :)
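To recap the stash round trip, here is a small sketch you can run in a throwaway repository. The file names and contents are invented for the example.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity for the demo commit
git config user.email "demo@example.com"
echo "stable code" > feature.txt
git add feature.txt
git commit -q -m "add feature file"

# Work in progress: one modified tracked file and one brand-new (untracked) file
echo "half-finished idea" >> feature.txt
echo "scratch notes" > notes.txt

git stash -u          # -u also stashes the untracked notes.txt
git status --short    # prints nothing: the workspace is clean
git stash list        # one entry, stash@{0}

# ... fix BUG-XPTO and commit it here ...

git stash pop         # re-apply the changes and drop them from the stash
git status --short    # the modified and the untracked file are back
```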

## 2. Git branches

Now we've reached the scariest, but also the most powerful, part of Git. The wonderful world of BRANCHES!! Woohoo! *Party everywhere!*

What the heck is it, you ask? Well... In simple terms, a branch is a parallel version of the code that can be modified without affecting the other versions. This is extremely useful for collaborative work, to introduce new features, for anything really... Branches are cool. *Trust me.* 

Branching in Git is fast and easy and is strongly (!!!!!!!) encouraged.

### 2.1 The main branch

The main branch in Git is called **main** (*surprising, I know*), although it used to be called **master**, a name you will still find in older repositories and tutorials. The main branch is created by Git when you create the repository.

All the changes you do in the other branches should, sooner or later, be integrated (called *merging* in git) into the **main** branch. 
When you work on a new feature or fix a bug, you create a new branch to isolate your changes. This makes it harder for unstable code to find its way to the main codebase. If this is true for a single person, imagine working on a team where teammates are working on the same codebase (a.k.a. repository) but on different features or bug fixes. Life would be extremely hard without branching.

When you start a new repository, you, by default, are in the **main** branch (or **master** on older repositories). All your pushes are going to this branch and you don't need to specify it.

<img width="700" src="media/10_master_branch.svg"/>

When you create a new branch, let's say for a *crazy experiment*, you are creating a copy of your main branch at that time.

<img width="700" src="media/11_crazy_experiment_branch.svg"/>

As you continue your development, the two branches will continue to co-exist on their own, independently of each other until you purposefully merge them together (coming soon!). This process can be repeated *indefinitely*.

<img width="400" src="media/12_branchs_coexisting.png"/>

*Do not mess with the master (the main branch ruined my pun...)*

Why? The main branch should be where your stable code lives. It's supposed to be your production-ready code that you can deploy at any moment. If you want to work on a new feature or experiment with something, you branch out. Only when you're sure the code is stable should it be merged to main. 

For the purpose of the Prep Course, you are using your workspace repository more as a cloud *so we'll ignore the advice of people wiser than us and keep pushing to main... #rebels*

<img width="600" src="media/13_branchs_merging_to_main.png"/>

### 2.2 Creating and switching branches

To demonstrate the use of branches let's pretend we want to work on two new features:
- **Feature A** is a list of books we want to read.
- **Feature B** is a list of foods to try.

We know better than to create these new features directly in the main branch because *your boss will get sad and cry*. So we should create two new branches, one for each of the features.

Let's first create a file with some groceries, stage and commit it in the *main* branch.

![14_git_create_file_on_main_branch.png](media/14_git_create_file_on_main_branch.png)

#### 2.2.1 Feature A branch
Let's start by creating the branch for Feature A. We'll name it **"feature_a_books"** and create it by using the command `git branch <branch name>`.

<img width="500" src="media/15_git_create_branch_feature_a.png"/>

So, the `git branch <branch name>` command has no output and `git status` says that we're still on the main branch. *Did anything go wrong?* No!

The `git branch` command just creates the branch but does not switch you to it. You continue to work in whatever branch you were in before. 

To switch branches in git you use the command `git checkout <branch name>`. You can also list the branches with the command `git branch`. The branch identified with an `*` is the currently active one.

<img width="500" src="media/16_git_checkout_feature_a.png"/>

#### 2.2.2 Feature B branch

To create the branch for Feature B, I'll use the command `git checkout -b <branch name>`. The `-b` option of the *checkout* command is very handy as it creates the branch and immediately makes it the active one. This is the most common way of creating a branch.

<img width="500" src="media/17_git_checkout_feature_b.png"/>

Now let's create a file with some foods, stage and commit it.

<img width="500" src="media/18_git_create_and_commit_foods.png"/>

#### 2.2.3 Checking the changes

Let's check the commits log on each of the branches:

<img width="550" src="media/19_git_log_branches.png"/>

- *main* still has the same two commits
- *feature_a_books* has a new commit with the message "add books to read"
- *feature_b_foods* has a new commit with the message "add foods list"

Let's compare one final thing in the three branches: the files in each one of them. I'll list only the files with the extension `.txt` to remove some clutter from the screen.

<img width="500" src="media/20_list_files_in_branches.png"/>

- *main* has only the file `groceries.txt`
- *feature_a_books* has the files `groceries.txt` and `books_to_read.txt`; the latter was added in the commit with the message "add books to read"
- *feature_b_foods* has the files `groceries.txt` and `foods_to_try.txt`; the latter was added in the commit with the message "add foods list"
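If you'd like to replay this section yourself, the sketch below rebuilds roughly the same situation in a throwaway repository. The file contents are invented, and it assumes both feature branches start from *main*. It also shows a way to compare branches without switching between them: `git log --oneline <branch>` and `git ls-tree --name-only <branch>`.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"          # placeholder identity for the demo commits
git config user.email "demo@example.com"
echo "- Coffee" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"

# Feature A: create the branch first, then switch to it
git branch feature_a_books
git checkout -q feature_a_books
echo "- Dune" > books_to_read.txt
git add books_to_read.txt
git commit -q -m "add books to read"

# Feature B: go back to main, then create and switch in one step
git checkout -q main
git checkout -q -b feature_b_foods
echo "- Ramen" > foods_to_try.txt
git add foods_to_try.txt
git commit -q -m "add foods list"

# Compare the branches without leaving the current one
for branch in main feature_a_books feature_b_foods; do
  echo "== $branch =="
  git log --oneline "$branch"          # commits reachable from this branch
  git ls-tree --name-only "$branch"    # files in the branch's latest commit
done
```

Git 2.23 and later also offer `git switch <branch name>` and `git switch -c <branch name>` as alternatives to `git checkout` and `git checkout -b` for switching and creating branches.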

### 2.3 Pushing the branches to the remote

When you're pushing a branch to the remote for the first time, you need to specify some extra options in the command. That's because the remote doesn't know about the branch yet: it only exists in the local repository. Conveniently, Git shows the exact syntax you should use when you `git push` a branch for the first time.

*Make sure that you are on the correct branch before pushing!*

<img width="700" src="media/21_git_push_new_branch.png"/>

The syntax for pushing a branch that does not exist in the remote is `git push --set-upstream origin <branch name>`. After the branch is created in the remote repository, a `git push` is enough to update the remote.
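Here is a minimal sketch of that first push, using a bare repository on disk (`server.git`, a hypothetical stand-in for GitHub) so it runs without a network connection. File names and contents are made up.

```bash
base="$(mktemp -d)"
cd "$base"
git init -q --bare server.git             # stands in for the GitHub remote

git init -q workspace
cd workspace
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "- Coffee" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"
git remote add origin "$base/server.git"
git push -q --set-upstream origin main

# A new local branch that the remote has never seen
git checkout -q -b feature_a_books
echo "- Dune" > books_to_read.txt
git add books_to_read.txt
git commit -q -m "add books to read"

# First push of the new branch: name the remote and the branch
git push -q --set-upstream origin feature_a_books    # -u is the short form

# From now on a plain push is enough on this branch
echo "- Neuromancer" >> books_to_read.txt
git commit -q -am "add another book"
git push -q

git branch -vv      # shows which remote branch each local branch tracks
```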

After pushing, we can see the branches on GitHub.

<img width="1500" src="media/22_github_branches.png"/>

### 2.4 A final note on creating branches

When you create a new branch, it starts at the latest commit of the branch you created it from (where you "branched off"), so it shares all of that branch's history up to that point. Git does not actually copy any files or commits; a branch is just a pointer to a commit. After that, commits made on one branch stay on that branch until you merge. You can create branches from the main branch or any other branch, depending on where in the project you plan to add new features.

### 2.5 Merging branches

![23_git_merge_battle.gif](media/23_git_merge_battle.gif)

We can't keep new features and commits in separate branches forever. We'll eventually want to integrate the changes from one branch to main (or another branch). This integration of branches is called **merging** and is done with the `git merge` command.

It's very easy to perform a merge, but you must be careful to get the order of the operations right:
1. Check out the branch that you want to merge into. Ex: if you want to merge into main, you check out main.
2. Call `git merge` with the name of the branch you want to merge into the current branch.

It's super easy to get this wrong as a beginner, so be careful!

So back to our repository... This is what it looks like after we created the two feature branches:

![24_diagram_two_branches.png](media/24_diagram_two_branches.png)

We see that the *main* is still pointing to the same commit, and each one of the feature branches is pointing to the commit with the changes we made in that branch.

#### 2.5.1 Merging *feature_a_books* branch (fast-forward)

Now let's merge the branch *feature_a_books* into the *main*. To do that we run the following commands:

```bash
git checkout main
git merge feature_a_books
```

And it should output something like:

```
Updating 52f0f1a..b17f0f8
Fast-forward
 books_to_read.txt | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 books_to_read.txt
```

Notice the text *Fast-forward* above. I will explain that in a minute.

`git log --oneline` should now output:

```
b17f0f8 (HEAD -> main, origin/feature_a_books, feature_a_books) add books to read
52f0f1a (origin/main, origin/HEAD) add groceries list
74b7996 add groceries.txt file
4ab27a7 Initial commit
```

Our diagram, after merging, looks like this:

![25_diagram_merge_feature_a.png](media/25_diagram_merge_feature_a.png)

You see the **fast-forward** above? Well... 

Do you notice how the *main* and *feature_a_books* branches are both pointing to the commit *b17f0f8*? Git did this merge simply by adding the extra commit to the *main* branch.

Since there was a linear path from the tip (latest commit) of the *main* branch to the tip of the *feature_a_books* branch, all git had to do was move the tip of *main* to the tip of *feature_a_books*. This is called a **fast-forward merge**. In other words, a fast-forward merge could happen because *main* did not have any extra commits compared to *feature_a_books*.
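A quick way to convince yourself that a fast-forward only moves the branch pointer is the sketch below, run in a throwaway repository with invented file contents. It uses the `--ff-only` option, which makes `git merge` refuse to do anything other than a fast-forward.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "- Coffee" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"

git checkout -q -b feature_a_books
echo "- Dune" > books_to_read.txt
git add books_to_read.txt
git commit -q -m "add books to read"

# main has no commits of its own since the branch point, so Git can fast-forward
git checkout -q main
git merge --ff-only feature_a_books

# Both branch names now resolve to the same commit hash
git rev-parse main feature_a_books
```

No new commit was created: the history of *main* is simply the history of *feature_a_books*.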

#### 2.5.2 Merging *feature_b_foods* branch (3-way merge)

Our repository looks like this now (I removed *feature_a_books* branch for simplicity):

![26_diagram_before_3_way_merge.png](media/26_diagram_before_3_way_merge.png)

There isn't a linear path from *main* to *feature_b_foods*, so a fast-forward merge will not be possible in this situation. These branches have diverged and *main* now has an extra commit that *feature_b_foods* doesn't have.

Git will have to merge the branches using a **3-way merge.** This creates a new *merging commit* on the active branch to join the two branch histories.

We merge the two branches with our known commands:

```bash
git checkout main
git merge feature_b_foods
```

Git will open the text editor for us to enter the commit message. It should look something like this:

```
Merge branch 'feature_b_foods'

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
```

We will save it as it is and exit the text editor, and then something like the following should show up in the command line (recent Git versions name the 'ort' strategy instead of 'recursive'):

```
Merge made by the 'recursive' strategy.
 foods_to_try.txt | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 foods_to_try.txt

```

The commit and merge are complete. The resulting diagram after all the merges is:

![27_diagram_merges_complete.png](media/27_diagram_merges_complete.png)

A new commit (*1f0f464*) was created, and our local *main* branch is pointing to it. Notice that we did all these merges in the local repo but didn't push them to the remote yet. I'll skip that part here for the sake of brevity, but you should always do it!!
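The sketch below reproduces a 3-way merge in a throwaway repository. As an assumption for brevity, *main* diverges through a direct commit rather than through an earlier merge, and `--no-edit` accepts the default merge message so no editor opens. File contents are invented.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "- Coffee" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"

git checkout -q -b feature_b_foods
echo "- Ramen" > foods_to_try.txt
git add foods_to_try.txt
git commit -q -m "add foods list"

# main moves on too, so the two branches diverge
git checkout -q main
echo "- Beer" >> groceries.txt
git commit -q -am "add beer"

# No linear path any more: Git creates a merge commit
git merge --no-edit feature_b_foods

git log --oneline --graph          # draws the two histories joining
git rev-list --parents -n 1 HEAD   # the merge commit followed by its two parents
```

The two parent hashes on the last line are what distinguish a merge commit from an ordinary one.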

### 2.6 Merge conflicts

Sometimes things don't go so smoothly...

![28_merge_conflict_is_comming.jpeg](media/28_merge_conflict_is_comming.jpeg)

Merge conflicts are common. My hope in this section is that you understand what they are, why they happen, and some simple strategies on how to solve them.

Most of the time, git will be able to merge automatically, making the necessary changes to the affected files. But there are situations where it cannot make the decisions all by itself. Examples of these situations:
- the branches you are trying to merge have changes in the same line of the same file
- one branch deleted a file, and the other branch made modifications to that file

They all go like this: two branches edit the same content in conflicting ways.

#### 2.6.1 Making a merge conflict

I'll go ahead and make changes to my repo to force a merge conflict. No new commands here, just the usual stuff. I will create two new branches, and both will edit line 6 of the file *groceries.txt*. One branch will change it to "- Chocolate ice-cream" and the other to "- Strawberry ice-cream".

```bash
# start from the main
git checkout main
git checkout -b chocolate
# edit line 6 of groceries.txt to "- Chocolate ice-cream"
git add groceries.txt
git commit -m "change Ice-cream to Chocolate ice-cream"
# we want both branches to be based off the main, so we'll checkout from the main again
git checkout main
git checkout -b strawberry
# edit line 6 of groceries.txt to "- Strawberry ice-cream"
git add groceries.txt
git commit -m "change Ice-cream to Strawberry ice-cream"
```

Now let's merge the branch *chocolate* into *main*.

```bash
git checkout main
git merge chocolate
```

It should be a fast-forward merge, without any conflict.

```
Updating 1f0f464..0666af0
Fast-forward
 groceries.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
```

Now line 6 of the file "groceries.txt" in the branch *main* should have "- Chocolate ice-cream".

Now let's merge *strawberry* into *main*.

```bash
# you should already be in main, so no need to checkout, but you never know :)
git checkout main
git merge strawberry
```
And BAM! **Merge conflict**!

<img width="500" src="media/29_merge_conflict.jpeg"/>

```
Auto-merging groceries.txt
CONFLICT (content): Merge conflict in groceries.txt
Automatic merge failed; fix conflicts and then commit the result.
```
A conflict happened in the file *groceries.txt*. This is because line 6 changed from "- Ice-cream" to "- Chocolate ice-cream" in the branch *main* and from "- Ice-cream" to "- Strawberry ice-cream" in the branch *strawberry*. 

### 2.7 Solving a merge conflict

There are several ways you can solve merge conflicts. We'll explain a few.

#### 2.7.1 Abort the merge
If you have no clue why the conflict is happening or are not sure which version you want to keep, abort the merge and inspect the branches to gain some clarity. You abort a merge with the command `git merge --abort`. The merge is aborted, and everything goes back to the state it was in before you started the merge.
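Here is a compact sketch of bailing out of a conflicted merge. It recreates the ice-cream conflict in a throwaway repository, using a shortened three-line `groceries.txt` as a stand-in for the real file.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
printf '%s\n' "- Beer" "- Ice-cream" "- Canned beans" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"

# Two branches change the same line in different ways
git checkout -q -b chocolate
printf '%s\n' "- Beer" "- Chocolate ice-cream" "- Canned beans" > groceries.txt
git commit -q -am "change Ice-cream to Chocolate ice-cream"
git checkout -q main
git checkout -q -b strawberry
printf '%s\n' "- Beer" "- Strawberry ice-cream" "- Canned beans" > groceries.txt
git commit -q -am "change Ice-cream to Strawberry ice-cream"

git checkout -q main
git merge -q chocolate            # fast-forward, no conflict
git merge strawberry || true      # conflict: git merge exits with an error code

git status --short                # "UU groceries.txt" marks the unmerged file
git merge --abort                 # give up on this merge
git status --short                # prints nothing: back to the pre-merge state
```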

#### 2.7.2 Checkout ours
If you know that you want to keep the *main* version of the conflicted file and ignore the incoming changes, you use the command `git checkout --ours groceries.txt`.

*ours* just means the branch you're in. In this example, it's *main*.

After you do this, line 6 of the file should have "- Chocolate ice-cream".

Note: even if you keep your version of the file with `checkout --ours`, you still have to stage (`git add`) and commit the file for the merge to be completed.

#### 2.7.3 Checkout theirs
If you know that you want to keep the *strawberry* version of the conflicted file and ignore the file in the *main*, then you use the command `git checkout --theirs groceries.txt`.

*theirs* just means the incoming branch. In this example, it's *strawberry*.

After you do this, line 6 of the file should have "- Strawberry ice-cream".

Finally, you have to stage (`git add`) and commit the file for the merge to be completed.
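Put together, resolving with *theirs* could look like the sketch below (swap `--theirs` for `--ours` to keep the *main* version instead). It rebuilds the conflict in a throwaway repository with a shortened, hypothetical `groceries.txt`.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
printf '%s\n' "- Beer" "- Ice-cream" "- Canned beans" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"

git checkout -q -b chocolate
printf '%s\n' "- Beer" "- Chocolate ice-cream" "- Canned beans" > groceries.txt
git commit -q -am "change Ice-cream to Chocolate ice-cream"
git checkout -q main
git checkout -q -b strawberry
printf '%s\n' "- Beer" "- Strawberry ice-cream" "- Canned beans" > groceries.txt
git commit -q -am "change Ice-cream to Strawberry ice-cream"

git checkout -q main
git merge -q chocolate            # fast-forward
git merge strawberry || true      # stops with a conflict in groceries.txt

git checkout --theirs groceries.txt    # take the incoming (strawberry) version
git add groceries.txt                  # tell Git the conflict is resolved
git commit -q --no-edit                # finish the merge with the default message

cat groceries.txt
```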

#### 2.7.4 Edit the conflicted file
There are situations where you cannot simply choose *ours* or *theirs*:
- you are not sure which version to keep
- you want to keep both changes
- there are multiple conflicts within the file, and you want to keep `ours` for some of them and `theirs` for others
- etc.

In this case you edit the conflicted file with a text editor. You should see something like this:

```
Quarantine Groceries:
- Toilet paper
- Popcorn
- Coffee
- Beer
<<<<<<< HEAD
- Chocolate ice-cream
=======
- Strawberry ice-cream
>>>>>>> strawberry
- Canned beans
```

Git annotates the conflicted file with git conflict markers, marking the areas where it needs help deciding what to do:
- `<<<<<<< HEAD` identifies the start of the section of the current branch
- `- Chocolate ice-cream` this is the value committed in the current branch (*main*)
- `=======` marks the end of the changes on the current branch and the start of the changes in the incoming branch
- `- Strawberry ice-cream` this is the value committed in the incoming branch
- `>>>>>>> strawberry` marks the end of the changes in the incoming branch and identifies the branch

You simply (ok, not always this simple) have to edit the file so that it looks exactly how you want it to. You have to get rid of the git conflict markers. We'll choose to keep both ice-creams, so our file will look like this:

```
Quarantine Groceries:
- Toilet paper
- Popcorn
- Coffee
- Beer
- Chocolate ice-cream
- Strawberry ice-cream
- Canned beans
```

You then have to stage (`git add`) and commit the file for the merge to be completed.
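As a sketch of the manual route, the script below recreates the conflict, lists the conflicted files, "edits" the file (here by overwriting it, as a stand-in for your text editor), and checks that no conflict markers are left before committing. The marker check with `grep` is my own suggested safety net, not something Git requires, and the file is a shortened, hypothetical version of `groceries.txt`.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
printf '%s\n' "- Beer" "- Ice-cream" "- Canned beans" > groceries.txt
git add groceries.txt
git commit -q -m "add groceries list"

git checkout -q -b chocolate
printf '%s\n' "- Beer" "- Chocolate ice-cream" "- Canned beans" > groceries.txt
git commit -q -am "change Ice-cream to Chocolate ice-cream"
git checkout -q main
git checkout -q -b strawberry
printf '%s\n' "- Beer" "- Strawberry ice-cream" "- Canned beans" > groceries.txt
git commit -q -am "change Ice-cream to Strawberry ice-cream"

git checkout -q main
git merge -q chocolate            # fast-forward
git merge strawberry || true      # stops with a conflict in groceries.txt

git diff --name-only --diff-filter=U    # files that still have conflicts

# Stand-in for editing by hand: write the version we want, without markers
printf '%s\n' "- Beer" "- Chocolate ice-cream" "- Strawberry ice-cream" "- Canned beans" > groceries.txt

# Only commit if no conflict marker is left in the file
if grep -q -E '^(<<<<<<<|=======|>>>>>>>)' groceries.txt; then
  echo "conflict markers still present" >&2
else
  git add groceries.txt
  git commit -q -m "Merge branch 'strawberry', keeping both ice-creams"
fi
```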

### 2.8 A note about Jupyter notebook conflicts

Although editing the file provides the most flexibility while merging, you'll find it tough to do with Jupyter notebook files. They are long `json` files, not easily read by human eyes. You're better off keeping to the `checkout --ours` or `checkout --theirs` methods.

## 3. Collaboration

As stated in SLU03, Git was designed to solve the problem of multiple people working on the same code. For example, all of the learning materials are a collaborative effort between instructors and the quality assessment team to bring you the best possible source of knowledge. (*If you think I do a lot of typos... QA has full days with me!*)

There are two ways to collaborate: 
- you contribute to the same repository, or 
- you work in different repositories and combine them at the end.

Let's look at some things you can do to collaborate on existing code.

### 3.1 Create an issue

One of the easiest ways to collaborate in a repository is through the creation of **issues**. You can use issues to track ideas, propose enhancements, or alert to bugs. They are like an email, *sort of*, associated with the remote repository.

Check out our [Prep Course GitHub issue section](https://github.com/LDSSA/ds-prep-course-2025/issues). (This screenshot is from two years ago, so hopefully we already solved those issues.)

<img width="700" src="media/30_create_an_issue.png"/> 

As you can see, you can create a new issue with the green button in the top right corner. You have to define a title and the body of the issue (where you explain what's wrong). You can assign collaborators who should work on the issue, add labels, attach files, and a lot more!

A message from our Documentation Queen: 

    Always provide an actionable.

An **actionable** is a concrete request for action - it can be a solution proposal, a feedback request, a bug fix... It does not need to be the solution, but it needs to be clear what action the assignee needs to take.

Lack of clarity might get your issue closed without it being addressed! 

*Found a typo in the learning materials?* Open an issue! 

### 3.2 Fork a repository

If you want to contribute to an existing project to which you don’t have push access, you can **fork** the project. When you fork a project, GitHub will make a copy of the project that is entirely yours - it lives in your namespace, and you can push to it.

<img width="700" src="media/31_git_clone_vs_git_fork.png"/>

To fork a project, visit the project page and click on the “Fork” button at the top-right of the page. After a few seconds, you’ll be taken to your new project page, with your own writeable copy of the code.

<img width="150" src="media/32_git_fork.png"/>

### 3.3 Create a pull request

Above, we were merging branches just like that, but the most common way to merge branches is through **pull requests**. A pull request (PR) lets you tell others about changes you've pushed to a branch in a GitHub repository that you want to merge with the existing code. The PR allows your collaborators to review the proposed changes and suggest improvements. When everyone is happy with the proposed changes, the pull request is approved and your code can be merged.

<img width="700" src="media/33_pull_request.png"/>

A pull request can be done from a branch in the repository or from a forked repository. GitHub has a tab where you can open a PR and see all open (unmerged) PRs.

<img width="350" src="media/34_pull_request_in_remote.png"/> 

### 3.4 The GitHub flow

GitHub is designed around pull requests. This flow works whether you’re collaborating with a tightly-knit team in a single shared repository or a globally-distributed company or a network of strangers contributing to a project through dozens of forks.

Here’s how it generally works:

- Fork the project.

- Create a topic branch from the main/master.

- Make some commits to improve the project.

- Push this branch to your GitHub project.

- Open a pull request on GitHub.

- Discuss and optionally continue committing.

- The project owner merges or closes the pull request.

- Sync the updated main/master back to your fork.
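The Git side of this flow can be rehearsed offline. The sketch below is only an approximation under loud assumptions: two bare repositories on disk (`upstream.git` for the original project, `fork.git` for your fork) stand in for GitHub, the branch name `fix-typo` and all file contents are invented, and the pull request itself is simulated by pushing straight to the upstream *main*, since reviewing and merging PRs happens on the GitHub website.

```bash
base="$(mktemp -d)"
cd "$base"

# Stand-in for the original project on GitHub, seeded with one commit
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/main
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any Git version
git config user.name "Project Owner"
git config user.email "owner@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q "$base/upstream.git" main
cd "$base"

# 1. Fork the project (roughly what the "Fork" button does)
git clone -q --bare upstream.git fork.git

# Clone your fork and register the original project as a second remote
git clone -q fork.git work
cd work
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add upstream "$base/upstream.git"

# 2-4. Topic branch, commit, push the branch to your fork
git checkout -q -b fix-typo
echo "Typo fixed" >> README.md
git commit -q -am "fix typo in README"
git push -q -u origin fix-typo

# 5-7. Open the PR on GitHub, discuss, owner merges (simulated here)
git push -q upstream fix-typo:main

# 8. Sync the updated main back to your local repo and your fork
git checkout -q main
git pull -q upstream main
git push -q origin main

git remote -v      # origin is your fork, upstream is the original project
```

Naming the original project's remote `upstream` is a widespread convention, not a requirement; any name works.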

## 4. Git is love, Git is life

So this was quite a lot... Git is a very complex system but at the root of it all, it's incredibly simple and that's got to be magic! Right now it looks like a lot, but remember that you've already done most of these things.

The rest? It comes with practice!

There are a few resources available online to help you navigate the wonderful world of version control, which will accompany you throughout the rest of the Prep Course.

- https://www.atlassian.com/git/tutorials/what-is-version-control
- https://www.git-tower.com/learn/git/ebook/en/command-line/basics/what-is-version-control#start

And if you want to practice without fear of messing with local repositories, here's a good resource:
- https://learngitbranching.js.org/?locale=en_US

I hope you enjoyed these Learning Units! You're now ready to face the Exercise Notebook. 

And remember....

<img width="700" src="media/35_in_case_of_fire.jpg"/> 