%%html
<style>
    .gray {
        background-color: #EEEEEE;
    }
</style>

<font color=red>
This note records how links can be embedded in markdown cells. The link to the Workbench tutorial can be found
</font> 
<A Href="https://carpentries.github.io/lesson-development-training/infrastructure.html">here</A>.

**<font size="4">Episodes</font>** (to be updated)

1. Why use Git as a (digital) humanities scholar?
2. How does Git work?
3. Different ways of using Git
4. Let's get a bit more technical: what are Git and GitHub?
5. Installing Git and creating a GitHub account
6. The logic of the world of Git

...

7. Git best practices (important rules and methods for collaborating as a team on a project)

**<font size="4">Objectives</font>**

This course is aimed at learners from the humanities with little or no Git or programming experience (try to be even more specific about the target audience and their background knowledge) who wish to integrate big data and digital methods of data analysis into their research. It walks learners through the concepts of Git and the best practices for using it in solo or collaborative projects.

**<font size="4">Desired Outcome</font>**

By the end of this course, learners will have gained a solid understanding of what Git is and how it can be used, through GitHub and its desktop app, to collaborate with other team members on a digital humanities project involving any file type and folder structure.

**<font size="4">Motivation of the Audience</font>**

Address the concrete needs and wishes of people already working in the digital humanities:

- What challenges have they been facing?
- How would learning Git help them make progress in their collaborative projects?

**<font size="4">Prerequisite Knowledge</font>**

- Knowing how to work with a web browser
- Basic knowledge of working with the terminal

# Why use Git as a (digital) humanities scholar?

No matter which area of the humanities you work in, and whether or not you use digital methods and tools, learning to use Git can be very beneficial. This episode explores why it is worth learning Git and how it can contribute to your research, whether you carry it out individually or in collaboration with a team.

In this episode, I am going to briefly touch upon concrete problems that most humanities scholars have come across in their research projects. For now, I will only introduce the problems and will not address the solution, which is Git. So the purpose of this episode is to first make sense of the use cases of Git, before actually learning about how to implement it. 

Anyone in the humanities who has worked on a book or an article knows how file names such as **final_article.docx**, **final_final_article.docx**, **final_final_article_reviewed_by_gs.docx** and **final_final_article_reviewed_by_gs_commented_by_dh.docx** come about. Ever longer file names, each marking a certain development phase of the original article, can keep appearing for a long time and leave you with multiple versions of the same article. As long as only one person is working on the article and deciding which changes to make and which comments or review points to ignore, this **<font color=blue>version control</font>** system can work reasonably well. However, after some time the owner of the files may no longer remember which stage of the article is stored in which file, or what exactly distinguishes **final_article.docx** from **final_final_article.docx**, unless they read both versions again in full. In other words, a file name does not offer enough space for extensive **<font color=blue>documentation</font>** of work in progress.

See also: <A Href="https://swcarpentry.github.io/git-novice/01-basics.html">automated version control</A> (Credit: The Carpentries)

But what if multiple people are working on the same file at the same time? One way of developing a document collaboratively is to use cloud services that allow synchronous, shared writing, such as Google Docs or Google Sheets. However, if your collaborators on a Google Docs document delete or change parts of the text that others wrote earlier, it becomes very difficult and confusing to compare the current version with previous ones, to see whether the versions conflict, or to synthesize two collaborators' versions of the same paragraph without simply deleting one and replacing it with the other. In other words, **<font color=blue>collaboration</font>** on the same file or files becomes confusing when you do not want to overwrite other people's contributions with your own but would rather merge everybody's contributions.

If you are writing code, the problems of **<font color=blue>version control</font>**, **<font color=blue>documentation</font>** and **<font color=blue>collaboration</font>** can become even more pressing, because different people work on different project files and folders at the same time. This is where Git can come to your rescue.

# How does Git work?

To develop a deep understanding of what Git is and how it works, it helps to use your imagination. Imagine that you and your partner are raising a child. Let us call her Lily. As soon as Lily enters your lives, each of you develops your own plans and wishes for her future: you want Lily to become an internationally known athlete, whereas your partner wants her to grow up to be a prominent scientist. You both want the best for Lily, but each of you has your own way of raising and educating her, and you realize that she cannot follow both life paths at once. This is when the wizard 🧙🏻 of Git comes to help you by inviting you to an experiment.

The wizard of Git makes you an offer: he keeps the real infant Lily with him and gives you and your partner each a cloned Lily instead. He also hands each of you a magical polaroid camera and a photo album. From then on, each of you enters your own alternative reality, taking your cloned Lily, camera and album with you. Each parent raises their cloned Lily in their own alternative reality exactly the way they want to and documents important moments of her life with the camera. For example, the first photo in your album might show your cloned Lily at the age of 3 in a swimsuit, standing next to a swimming pool. You put this photo in the album and write a note next to it: "Lily at the age of 3, just after her first swimming lesson." You and your partner each raise your cloned Lily up to the age of 18 and document the key moments of her growth in your own alternative world, using the magical camera and the album. As soon as Lily turns 18 in each alternative reality, the magic ends: the cloned Lilies disappear, and you and your partner both return to the real world, carrying only your photo albums with you.

In the real world, nothing has changed since you entered the two alternative realities. Lily is still an infant, and now you want her to grow up to the age of 18 in the blink of an eye, going through every single stage that you and your partner documented with your cameras. The wizard of Git is here to help you do that. He looks at the photos and checks whether the two parallel realities can be merged, so that Lily goes through every life stage documented on both sides. Sometimes the realities can be merged. For example, in your alternative reality Lily lived in Johannesburg at the age of 10, going to school and taking swimming lessons in her free time. You have a photo in your album that shows her during practice, with a note that gives exactly this information. In your partner's alternative reality, Lily also lived in Johannesburg and went to the same school at the age of 10, but she spent her free time focusing more deeply on mathematics. Your partner's photo of the ten-year-old Lily shows her smiling proudly at the camera, holding the solution to a maths problem in her hand. These two events in the lives of the two cloned Lilies can be merged: the real Lily goes to school in Johannesburg at the age of 10 and divides her free time between solving mathematics problems and excelling at swimming.

But sometimes conflicts between the two alternative realities make merging difficult. If your cloned Lily was in Brazil at the age of 17, training full-time as a professional swimmer, whereas your partner's cloned Lily lived in South Korea at that age, taking full-time professional maths courses, it is not clear what the real Lily's life should have been. Here, you and your partner need to talk to each other and adjust your alternative realities so that the Git wizard can merge them into the *real* reality. Going through both albums, resolving the conflicts and merging the two alternative realities up to Lily's 18th birthday finally results in a real Lily who has suddenly grown from an infant into an 18-year-old, with memories of the merged life experiences from both alternative realities.

This imaginary story illustrates how Git works. Just replace Lily with any project you are developing collaboratively with a team, the photos and their notes with *commits*, the alternative realities with *branches*, and the final merging of the two realities with a *merge*, which on GitHub is usually requested through a *pull request* (GitLab calls it a *merge request*). The real Lily stands for the final product of your project.

Now that you know how Git works and what logic lies behind it, you can move on to the following episodes to learn about the technical aspects of Git.

# Different ways of using Git

You can use Git locally on your computer to keep track of the development of your solo projects. You can also use a remote Git hosting service, such as GitHub or GitLab, to work on a project collaboratively with a team.

In both cases, you can work with Git through the *Command Line Interface (CLI)*, typing commands into the *terminal* of your operating system. This is what most professional programmers do. It is also possible to use a *Graphical User Interface (GUI)* for Git. A GUI lets you interact with your computer through pictures and symbols instead of just words: it is what you see on your screen when you use programs with buttons, menus, and icons that you can click on to do things. So, instead of typing commands, you use your mouse or touchscreen to navigate and control your computer more visually.

In this lesson, you are going to learn how to use the CLI to interact with the online GitHub server and collaborate with your team members on the projects you develop together. I have opted for the CLI because, in my opinion, it is much easier to use than GitHub's GUI.

# Let's get a bit more technical: what are Git and GitHub?

In this episode, we can finally bring the contents of the previous episodes together and explain what Git is in more technical terms. Just like other programs you use on your computer every day, Git is a piece of software with its own logic and functionality. So, in order to use it, you first have to install it on your computer. You will learn how to install Git on different operating systems in the following episode.

Git's use cases include the ones mentioned in episode one: 

1. version control
2. documentation of the development phases
3. collaborative project development

## The logic behind Git

Usually, the projects you develop consist of files stored in folders on your personal computer. But remember: in the world of Git, objects and the relationships between them have different names than in common operating systems such as Windows or macOS. When you project the magical world of Git onto the folder structure you have already created on your computer, the main folder containing your files is no longer called a *folder* but a *repository*. When you first introduce your local project to the wizard of Git, the contents of your project's main folder are the same as the contents of the *main* repository in the world of Git.

🡺 Note: The *main* repository used to be called *master*; the name was changed in response to anticolonial critique.

When you introduce a certain folder to the wizard of Git, that is, when you *initialize* Git within that folder, Git starts watching the folder as a *repository*, along with every change that takes place within it, including adding new files and changing or deleting existing ones. Whenever you want to take a snapshot of the current state of your main repository and its contents to document that stage of its development, you first *add* the files and folders whose changes you want to include in the snapshot. This places them on Git's *stage*, ready to be photographed for the imaginary photo album in which you document the stages of the project's development. But remember: taking a snapshot of a certain stage of development is not enough. You also have to write a note explaining exactly which stage of development the snapshot documents. In the world of Git, the snapshot together with its note is called a *commit*, and the note itself is the *commit message*.

So, let us go through all the steps of documentation once more:

1. You *initialize* Git within the main folder of your project, introducing that folder to Git as the project's *main repository*. You only need to do this once per project.
2. Whenever you want to save a certain stage of the project's development, you *add* the files and folders whose changes you wish to document to Git's *stage*. In other words, you *stage* these files and folders.
3. You write a *commit*, explaining which stage of the project's development you are documenting at this moment.
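To try the three steps yourself, you can run the following commands in a terminal. This is a minimal sketch under a few assumptions of our own: it works in a fresh, empty scratch folder created with `mktemp -d`, a hypothetical plain-text file `logo.txt` stands in for a real image such as "logo.jpg", Vlad Dracula's name and email (see the section on setting up Git) are set for this one repository only, and `git init -b main`, which names the first branch *main*, needs Git 2.28 or newer.

```shell
# Work in a fresh, empty scratch folder standing in for your project folder
cd "$(mktemp -d)"

# Step 1: initialize Git here (once per project);
# -b main names the first branch "main" (Git 2.28 or newer)
git init -b main

# Name and email recorded with every commit, set for this repository only
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

# A plain-text placeholder for the logo file
echo "logo draft: sharp edges, yellow" > logo.txt

# Step 2: put the file on Git's stage
git add logo.txt
git status --short    # "A  logo.txt" means: staged as a new file

# Step 3: commit the staged snapshot together with its note (the commit message)
git commit -m "Add first draft of the logo"

# The album so far: one commit, shown as a short ID followed by its message
git log --oneline
```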

### Branching: Creating virtual realities within the world of Git

As long as you are working in the main repository, no virtual realities are created: Git only tracks changes and records snapshots with their notes. This means that as long as you keep working in the main folder of your project and the main Git repository, your local files and folders are the same as the ones in the Git repository, and Git only records their changes when you ask it to. The real magic of Git begins when you create *branches*, or virtual realities, from the main repository.

Now imagine you want to test some alternatives for the project you are working on, but you are not sure whether they will turn out the way you imagine. For example, suppose you are a graphic designer designing a logo for a client. You have already come up with a draft of the logo, saved in the file "logo.jpg", and would like to test whether it would look nicer with rounded edges and orange instead of yellow. One way of doing so would be to create a copy of your draft file, rename it to, say, "logo_round_edges_and_orange.jpg", and keep working on the copy. But, as mentioned in the first episode, this can soon lead to chaos: you end up with multiple files with similar names on your computer and quickly lose track of which alternative design is in which file. Here is how the wizard of Git can help you:

In the world of Git, it is possible to create alternative realities. If the reality consists of the concrete files and folders that you have stored on your hard drive, Git allows you to create alternative versions of these files and folders without changing the actual files and folders. This is the most important part of the magic of Git! 

**Remember!** <font color=blue> Whereas the folder and file systems on your computer follow a spatial logic (e.g. you look for the *path* of a file, and a certain file is *inside* a certain folder), the logic of Git is temporal: Git can create *simultaneous* parallel realities in which different adjustments are made to the files stored on your computer. Git can also record different *stages* of the development of certain files and folders. </font>

You create these alternative realities by *branching*. The wizard of Git imagines the temporal aspect of your project as a tree. As long as you are working on the main repository and saving changes directly to its files and folders, you are on the *trunk* of the tree, or its *main branch*. When you, as the graphic designer, decide to create an alternative reality to test your new idea, you create a new *branch* stemming from the *main branch* and work there. When you create a new branch, you first have to give it a name. In the graphic designer example, this name could be "round_edges_and_orange". When you enter this branch, you will see your files and folders as they were on your hard drive when you created the branch. So here, you keep working on the file "logo.jpg", make your desired changes, and save them. But if you leave the round_edges_and_orange branch, return to the main branch, and open the same file "logo.jpg", you will see that it is still in the state it was in before you created the new branch and changed the file. Remember the example of Lily and the wizard of Git? Lily, or the "logo.jpg" file on the main branch, remains unchanged, while you keep changing and developing it further in a virtual reality. Just as on the main branch, you can *add* your file to Git's stage and write a *commit* on the new branch at each meaningful stage of development, just as you documented Lily's growth with the magical camera and added each photo to the album with a short note.
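Branching can be sketched in the terminal as follows. The setup lines recreate a repository with one committed draft in a scratch folder so that the example runs on its own; `logo.txt` is a hypothetical text stand-in for "logo.jpg". The sketch uses `git switch`, available since Git 2.23; many older tutorials use `git checkout -b` and `git checkout` for the same purpose.

```shell
# Setup: a fresh repository with one committed draft of the logo
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"

# Create the alternative reality and step into it
git switch -c round_edges_and_orange

# Change the logo inside the new branch and commit the change there
echo "logo draft: round edges, orange" > logo.txt
git commit -am "Round the edges and change yellow to orange"

# Step back into the main branch: the file is as it was before branching
git switch main
cat logo.txt    # prints: logo draft: sharp edges, yellow

# List the branches; * marks the one you are currently on
git branch
```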

### Merging new branches with the main branch

When you are finished with your adjustments, it is time to compare the resulting design in the "round_edges_and_orange" branch with the one on the *main branch*. If you do not like the alternative design, you can simply delete the branch in which you created it and keep working on the main branch, making the final adjustments to the logo there. But if you *do* prefer the alternative design, you should *merge* it into the *main* branch. Remember: the only *real* reality in the world of Git consists of the files and folders saved on your hard drive, and when you are on the main branch, they show the main branch's state. If you copy them onto a USB stick or email them to somebody, they are transferred in that state. In other words, the other branches created from the main branch do not become reality until they are *merged* into it. Before merging, they are only *alternative* realities that exist in a magical world.
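A merge is carried out from the receiving branch: you first go to *main* and then merge the other branch into it. The sketch below recreates both branches in a scratch folder, again with the hypothetical `logo.txt` standing in for "logo.jpg". Because *main* has not changed since the branch was created, Git can simply move *main* forward to the branch's latest commit (a so-called fast-forward merge).

```shell
# Setup: main holds the first draft; round_edges_and_orange holds the new design
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"
git switch -q -c round_edges_and_orange
echo "logo draft: round edges, orange" > logo.txt
git commit -q -am "Round the edges and change yellow to orange"

# Merge from the receiving side: switch to main, then merge the branch into it
git switch main
git merge round_edges_and_orange
cat logo.txt    # prints: logo draft: round edges, orange

# The alternative reality has become reality, so its branch can be deleted
git branch -d round_edges_and_orange

# Had you rejected the design instead, you would discard the unmerged branch with
# "git branch -D round_edges_and_orange" (capital -D forces the deletion)
```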

Just as in Lily's example, conflicts may occur when you try to merge different branches into the main branch. In Lily's example, the conflict occurred because somebody else had also created their own branch, i.e. their own virtual reality, and had developed it differently from you. Conflicts like these occur when collaborating with other people through a remote repository hosting service such as GitHub. For now, let us look at conflicts between a new branch and the main branch, which can also occur, for example when the same file has been changed on both branches.

How you resolve a merge conflict depends on the kind of file you are working on. With image files, it is very difficult to synthesize the two images from the two branches, so you usually have to choose one of them. If you opt for the new logo developed in the "round_edges_and_orange" branch while merging that branch into the main branch, it will replace the one you originally had on your hard drive, with sharp edges and yellow as the dominant color. After merging the new branch into the main branch, you can delete the new branch, because you no longer need the alternative reality you created in it. Alternatively, if you prefer your original logo design for now but would like to keep the alternative one and maybe return to it later, you can keep the branch containing the alternative logo and continue working on the original one on the main branch, without merging the two branches.
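The following sketch provokes such a conflict on purpose and resolves it by choosing one version, as you would with an image. It assumes a hypothetical text file `logo.txt` that is changed on both branches, so Git cannot combine the changes by itself. `git checkout --theirs` keeps the version from the branch being merged in; `--ours` would keep the one already on *main*.

```shell
# Setup: a first draft on main and a new design on round_edges_and_orange
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"
git switch -q -c round_edges_and_orange
echo "logo draft: round edges, orange" > logo.txt
git commit -q -am "Round the edges and change yellow to orange"

# Meanwhile, the same file is also changed on main, so the two realities diverge
git switch -q main
echo "logo draft: sharp edges, dark yellow" > logo.txt
git commit -q -am "Darken the yellow"

# The merge stops and reports a conflict instead of guessing
git merge round_edges_and_orange || echo "Conflict: choose one version of logo.txt"

# Keep the version from the branch being merged in ("theirs")
git checkout --theirs logo.txt

# Mark the conflict as resolved and conclude the merge with a commit
git add logo.txt
git commit -m "Merge round_edges_and_orange, keeping the orange logo"
```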

So far, you have learned the logic of Git when using it on your personal computer. But what happens when you collaborate with a team on a project, with each team member working from their own computer? This question is answered in the next section, which teaches you how to use a remote *Git repository hosting service*, namely GitHub.

**TIP:** It is also possible to create branches from a new branch you have created and merge them into it before merging the new branch into the main branch. We will not cover this in the current lesson, but now that you know the logic of Git and how branching implements it, it should not be difficult for you to use this kind of *second-order branching* (a term I have adapted from George Spencer-Brown) whenever it is meaningful and necessary for your project.

<font color=red> Important question: how do you return to an earlier commit and revert the changes made since then? </font>

## GitHub: A Git repository hosting service

If you are collaborating with a team on the same project, your partners also need access to the main repository you have created on your computer and must be able to *clone* it onto their own computers. So instead of copying the project's source folder and its files onto a USB stick and handing it to your partners in person, or emailing them the entire contents of the source folder, you can synchronize your local world of Git with a remote world of Git and share the remote repository with the other collaborators on the project. There are many different *Git repository hosting services*, such as GitHub, GitLab, Bitbucket, Azure DevOps Server, etc. Here, we are going to learn how to use GitHub.

To put your local repository in the cloud, you first have to create an account on a Git repository hosting service (in our case, GitHub), create a repository there, and tell your local repository where that remote repository lives. After doing so, you *push* your local repository to GitHub and make it accessible to your teammates. Your teammates then *clone* that repository onto their own computers. Now you all have the same source folder of the project, together with all its files, on your computers. All of you also have the world of Git projected onto your local folders and files, which means that the wizard of Git is following your repositories and their contents and can work with them. Now you can start collaborating with your teammates on the project.
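You can rehearse this push-and-clone cycle without a GitHub account. In the sketch below, which is a stand-in rather than the real GitHub setup, a local *bare* repository (one that stores only the history, with no working files) plays the role of the GitHub repository, and the hypothetical folders `my_computer` and `teammate_computer` play the two computers. With GitHub, `../github_repo.git` would be replaced by the address shown on your repository's page, of the form `https://github.com/USERNAME/REPOSITORY.git` with your own names filled in, and you would have to authenticate.

```shell
# Work in a fresh scratch folder that holds "GitHub" and two "computers"
cd "$(mktemp -d)"

# A bare repository (history only, no working files) stands in for the GitHub repo
git init -q --bare -b main github_repo.git

# On your computer: a local repository with one commit
mkdir my_computer && cd my_computer
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"

# Tell your repository where the remote lives, under the usual name "origin",
# then push main there (-u remembers origin/main for later pushes and pulls)
git remote add origin ../github_repo.git
git push -u origin main
cd ..

# On your teammate's computer: clone the remote repository
git clone github_repo.git teammate_computer
cat teammate_computer/logo.txt    # prints: logo draft: sharp edges, yellow
```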

### How GitHub works

Returning to the previous example, let us suppose that you are the owner of the logo design project and therefore also the owner of the project's GitHub repository. You are collaborating with another person on this project who lives in a different time zone. So right now, while you are working on the project, your collaborator is asleep and you are the only person changing the files in the project's repository. In this case, you only need one additional step compared to using Git locally: after *add*ing the files you have been working on to Git's stage and *commit*ting the changes, you also *push* these changes to GitHub. This keeps your remote repository on GitHub synchronized with your local repository.

At the end of your working day, you have made certain changes to the logo and pushed all of them to the GitHub repository. Your colleague, who has previously *cloned* the project's repository onto their computer, is now awake and wants to continue working on the project from where you stopped. The first thing they have to do is re-synchronize their local Git repository with the remote one by *pull*ing the remote repository onto their own computer. Just as when merging a new branch into the main branch locally, conflicts can occur here too: if your collaborator has changed the same files locally since their last pull, their version may conflict with the files you updated. So before continuing to work on the project files, your collaborator must resolve any conflicts and make sure that their local files now correspond to the main branch of the remote repository on GitHub.
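The working day described above can be sketched with a local stand-in for GitHub: a bare repository (history only, no working files) in place of the GitHub repository, and two folders in place of your computer and your colleague's. All folder names, file contents and commit messages are hypothetical.

```shell
# Setup: the stand-in for GitHub, your repository pushed to it, and your colleague's clone
cd "$(mktemp -d)"
git init -q --bare -b main github_repo.git
mkdir my_computer && cd my_computer
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"
git remote add origin ../github_repo.git
git push -q -u origin main
cd ..
git clone -q github_repo.git colleague_computer

# Your working day: change, stage, commit, and push to the remote
cd my_computer
echo "logo draft: round edges, yellow" > logo.txt
git add logo.txt
git commit -m "Round the edges of the logo"
git push
cd ..

# Your colleague's morning: pull your pushed commits into their copy
cd colleague_computer
git pull
cat logo.txt    # prints: logo draft: round edges, yellow
cd ..
```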



### Best practices for collaborating with a team using GitHub

As you can imagine, it can be very time-consuming and sometimes frustrating if you and your collaborator both keep working on the main branch, pushing it to the GitHub *repo* (short for *repository*), and then having the other person spend a lot of time resolving conflicts on their own computer. To make collaboration easier and each step of the process easier to follow, it is recommended not to work on the main branch directly but to create branches and work on those. This way, you and your collaborator each create your own branches from the main branch, work on them, and push them to GitHub. The main branch thus remains unchanged for both of you while you keep developing the project on your respective branches.
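Working on your own branch and publishing it can be sketched as follows, with a local bare repository standing in for GitHub and hypothetical branch and file names. `git push -u origin <branch>` creates a branch of the same name on the remote, ready for a pull request, while *main* on the remote is left untouched.

```shell
# Setup: a bare stand-in for GitHub holding one commit on main, plus your repository
cd "$(mktemp -d)"
git init -q --bare -b main github_repo.git
mkdir my_computer && cd my_computer
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"
git remote add origin ../github_repo.git
git push -q -u origin main

# Instead of committing to main, branch off and work there
git switch -c round_edges_and_orange
echo "logo draft: round edges, orange" > logo.txt
git commit -am "Round the edges and change yellow to orange"

# Publish the branch under the same name on the remote; main stays as it was
git push -u origin round_edges_and_orange
```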

**TIP:** Does this not remind you of Lily and how you and your partner set off into different virtual realities with your own cameras and photo albums?

If you and your collaborator work simultaneously on the same feature of the project, you may face a lot of problems when trying to merge both of your branches into the main branch on GitHub. It is recommended that you communicate constantly with your collaborator and decide who is going to work on which feature, in order to minimize the number of conflicts. For example, if you are writing an article together, you can agree on who is going to work on which chapter. If you create your own branch from the main branch and develop chapter 1 there, and your partner does the same with chapter 2, it is quite easy to merge both of your branches into the main branch on GitHub. This way, you see both versions of both chapters on GitHub and tell GitHub to replace the existing chapter 1 in the main branch with your version and the existing chapter 2 with your collaborator's version.
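The chapter example can be sketched in a single local repository. To keep it short, both "people" work in the same scratch repository here; in a real project each of you would push your branch to GitHub and the merges would go through pull requests. The file names `chapter1.txt` and `chapter2.txt` and the branch names are hypothetical. Because the two branches change different files, Git merges them without any conflict.

```shell
# Setup: an article with two chapters committed on main
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "Chapter 1: first draft" > chapter1.txt
echo "Chapter 2: first draft" > chapter2.txt
git add chapter1.txt chapter2.txt
git commit -q -m "Add first drafts of both chapters"

# You revise chapter 1 on your own branch
git switch -q -c revise_chapter1
echo "Chapter 1: revised by you" > chapter1.txt
git commit -q -am "Revise chapter 1"

# Your partner revises chapter 2 on theirs, also starting from main
git switch -q main
git switch -q -c revise_chapter2
echo "Chapter 2: revised by your partner" > chapter2.txt
git commit -q -am "Revise chapter 2"

# The branches touched different files, so both merge into main without conflicts
git switch -q main
git merge revise_chapter1
git merge --no-edit revise_chapter2    # --no-edit accepts the default merge message
cat chapter1.txt chapter2.txt          # prints both revised chapters
```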

The question you may ask here is: who resolves conflicts when different branches are merged into the main branch? In a big project, it can cause a lot of trouble if every collaborator merges branches into the main branch and makes important decisions on how to resolve conflicts. Best practice is to have only one person make such decisions. This person may be the project manager or the owner of the main repo on GitHub. If you are the owner of the main repo, your collaborator does not merge their branch into the main branch directly. Instead, they push their branch to GitHub and create a *pull request* there: rather than merging the branch themselves, they ask you to *pull* it into the main branch. GitHub then notifies you of the pull request, and you check on GitHub whether there are conflicts between your collaborator's branch and the main branch. You then decide how to resolve the conflicts and accept or decline the pull request. Of course, it is always best to resolve conflicts in conversation with your collaborator, so that everybody understands how, and on the basis of what information, they were resolved.

When working on larger projects, the cycle of creating new branches, developing them locally, pushing them to GitHub and finally creating pull requests to merge them into the main branch repeats many times. Therefore, the first thing you do at the start of each working day should be to pull everything from the GitHub repo into your local repo, so that your project files are up to date, and work from there.
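A start-of-day routine might look like the sketch below. The setup lines simulate a colleague, with the made-up name Mina Harker, who pushed a branch overnight to a local bare repository standing in for GitHub; all names are hypothetical. `git fetch` downloads all branches from the remote without changing your files, `git pull` then brings your local *main* up to date, and `git branch -r` lists the branches that exist on the remote.

```shell
# Setup: a bare repository standing in for GitHub, with one commit on main
cd "$(mktemp -d)"
git init -q --bare -b main github_repo.git
mkdir my_computer && cd my_computer
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "logo draft: sharp edges, yellow" > logo.txt
git add logo.txt
git commit -q -m "Add first draft of the logo"
git remote add origin ../github_repo.git
git push -q -u origin main
cd ..

# Setup: overnight, a colleague (made-up name) pushes a branch of their own
git clone -q github_repo.git colleague_computer
cd colleague_computer
git config user.name "Mina Harker"
git config user.email "mina@example.org"
git switch -q -c darker_yellow
echo "logo draft: sharp edges, dark yellow" > logo.txt
git commit -q -am "Darken the yellow"
git push -q -u origin darker_yellow
cd ../my_computer

# Start of your day: go to main and download everything new from the remote
git switch main
git fetch origin
# Bring your local main up to date with the remote's main
git pull
# See which branches exist on the remote (others' branches are for reading, not working on)
git branch -r

# Begin today's work on a fresh branch of your own
git switch -c round_edges
```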

**Remember:** This pulling (from the GitHub repo to your local repo) is different from a pull request on GitHub, in which you ask the repo owner or project manager to pull your branch into the main branch.

**Remember:** The idea behind creating branches when collaborating on a project is that everybody works on their own branches. Although you can see all branches in their latest state of development when you pull them onto your computer, you should never work on other people's branches unless their owners ask you to. The rule of thumb is: every bird sits on its own branch. No shared branches, unless they are desired or necessary under very specific circumstances.

Now that you know the logic behind Git and GitHub and the best practices for using them locally and remotely, it is time to get your hands dirty and start using Git.

# Installing Git and creating a GitHub account

## Installing Git (Credit: <A Href="https://swcarpentry.github.io/git-novice/#installing-git">The Carpentries</A>)

<u>On Windows</u>: Git should be installed on your computer as part of your Bash install (see the <A Href="https://carpentries.github.io/workshop-template/install_instructions/#shell">Shell installation instructions</A>).

<u>On MacOS</u>: See <A Href="https://carpentries.github.io/workshop-template/install_instructions/#git">this link</A>. 

<u>On Linux</u>: If Git is not already available on your machine you can try to install it via your distro's package manager. For Debian/Ubuntu run `sudo apt-get install git` and for Fedora run `sudo dnf install git`.
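Whichever system you use, you can check in a terminal whether Git is available by asking for its version. The exact version number printed will differ from machine to machine.

```shell
# Print the installed Git version; an error such as "command not found"
# means Git is not installed or not on your PATH
git --version
```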


## Creating a GitHub account (Credit: <A Href="https://swcarpentry.github.io/git-novice/#creating-a-github-account">The Carpentries</A>)


1. Go to https://github.com and follow the “Sign up” link at the top-right of the window.
2. Follow the instructions to create an account.
3. Verify your email address with GitHub.
4. Configure multifactor authentication (see below).

**Multi-factor Authentication**

In 2023, GitHub introduced a requirement for all accounts to have multi-factor authentication (2FA) configured for extra security. Several options exist for setting up 2FA; they are summarised here:

1. If you already use an authenticator app, like Google Authenticator or Duo Mobile on your smartphone for example, add GitHub to that app.
2. If you have access to a smartphone but do not already use an authenticator app, install one and add GitHub to the app.
3. If you do not have access to a smartphone or do not want to install an authenticator app, you have two options:
   - set up 2FA via text message (list of countries where authentication by SMS is supported), or
   - use a hardware security key like YubiKey or the Google Titan key.

The GitHub documentation provides more details about configuring 2FA.

## Preparing your working directory (Credit: <A Href="https://swcarpentry.github.io/git-novice/index.html#preparing-your-working-directory">The Carpentries</A>)

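We’ll do our work in the Desktop folder, so make sure you change your working directory to it with:

<div class="gray">

$ cd

$ cd Desktop

</div>

The first command, `cd` on its own, takes you to your home folder; the second moves you into its Desktop folder.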

## Setting up Git (Credit: <A Href="https://swcarpentry.github.io/git-novice/02-setup.html">The Carpentries</A>)

When we use Git on a new computer for the first time, we need to configure a few things. Below are a few examples of configurations we will set as we get started with Git:

* our name and email address,
* what our preferred text editor is,
* and that we want to use these settings globally (i.e. for every project).

On a command line, Git commands are written as `git verb options`, where `verb` is what we actually want to do and `options` is additional optional information which may be needed for the verb. So here is how a person named Vlad Dracula sets up their new laptop:

<div class="gray">
    
$ git config --global user.name "Vlad Dracula"

$ git config --global user.email "vlad@tran.sylvan.ia"
    
</div>

Please use your own name and email address instead of Dracula’s. This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to GitHub, Bitbucket, GitLab or another Git host server after this lesson will include this information.

For this lesson, we will be interacting with GitHub and so the email address used should be the same as the one used when setting up your GitHub account. If you are concerned about privacy, please review GitHub’s instructions for keeping your email address private.

**Keeping your email private:**

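If you elect to use a private email address with GitHub, use that same address for the `user.email` value, e.g. `username@users.noreply.github.com`, replacing `username` with your GitHub username. Newer accounts have a private address that also starts with a numeric ID (`ID+username@users.noreply.github.com`), so it is safest to copy the exact address shown in your GitHub email settings.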
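To check what Git has stored, you can read a single value back with `git config --get`. The sketch below sets Vlad's name and email and reads them back. So that it does not overwrite your real settings, it first points `HOME` at a throwaway folder; that step is an assumption made only for this demonstration, so leave it out when you configure your own computer. If you use GitHub's private address, it goes in place of Vlad's email.

```shell
# Demonstration only: a temporary HOME, so Git looks for its global
# settings there and your real ~/.gitconfig stays untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

git config --global user.name "Vlad Dracula"
git config --global user.email "vlad@tran.sylvan.ia"

# Read the values back; --get prints only the stored value.
git config --global --get user.name    # prints: Vlad Dracula
git config --global --get user.email   # prints: vlad@tran.sylvan.ia
```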

**Line endings:**

As with other keys, when you press Enter (↵), or Return on a Mac, your computer encodes this input as a character. Different operating systems use different character(s) to represent the end of a line. (You may also hear these referred to as newlines or line breaks.) Because Git uses these characters to compare files, they may cause unexpected issues when a file is edited on different machines. Though it is beyond the scope of this lesson, you can read more about this issue in the Pro Git book.

You can change the way Git recognizes and encodes line endings with the `core.autocrlf` setting of `git config`. The following settings are recommended:

<u>On macOS and Linux</u>:

<div class="gray">
    
$ git config --global core.autocrlf input
    
</div>

<u>And on Windows</u>:

<div class="gray">
    
$ git config --global core.autocrlf true
    
</div>
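If you move between computers, a small script can pick the recommended value for you. This sketch guesses the operating system from `uname -s` (in Git Bash on Windows the name starts with `MINGW` or `MSYS`). That detection is an assumption covering the common cases, not every possible setup. As before, the temporary `HOME` only keeps the demonstration away from your real settings.

```shell
# Demonstration only: keep the real ~/.gitconfig untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

case "$(uname -s)" in
  MINGW*|MSYS*|CYGWIN*)
    # Windows: LF becomes CRLF on checkout, CRLF becomes LF when files are staged.
    git config --global core.autocrlf true ;;
  *)
    # macOS and Linux: CRLF becomes LF when files are staged, nothing on checkout.
    git config --global core.autocrlf input ;;
esac

git config --global --get core.autocrlf   # prints: true or input
```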

**Setting up your favorite text editor:**

To set up your favorite text editor, please follow this table: 

![image.png](attachment:image.png)

In this lesson, we will set Notepad as the default editor for Windows users and Sublime Text for macOS users.
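Because the table is only an image, here are the two commands this lesson relies on in text form, as listed in The Carpentries' setup episode credited above. The `case` on `uname -s` is an illustrative assumption, as is the `nano -w` fallback for Linux, for which this lesson does not choose an editor. If Sublime Text is installed somewhere other than `/Applications`, adjust the path.

```shell
# Demonstration only: keep the real ~/.gitconfig untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

case "$(uname -s)" in
  MINGW*|MSYS*|CYGWIN*)
    # Windows: Notepad.
    git config --global core.editor "c:/Windows/System32/notepad.exe" ;;
  Darwin)
    # macOS: Sublime Text; -n opens a new window, -w waits until it is closed.
    git config --global core.editor "/Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl -n -w" ;;
  *)
    # Assumption: nano as a Linux fallback; -w turns off hard line wrapping.
    git config --global core.editor "nano -w" ;;
esac

git config --global --get core.editor
```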

**Setting your default branch:**

Git (2.28+) allows configuration of the name of the branch created when you initialize any new repository. Here we set it to `main`, so that it matches the default of the cloud service you will eventually use:

<div class="gray">
    
$ git config --global init.defaultBranch main
    
</div>

**Default Git branch naming:**

By default, Git will create a branch called `master` when you create a new repository with `git init` (as explained in the next episode). This term evokes the racist practice of human slavery, and the software development community has moved to adopt more inclusive language.

In 2020, most Git code hosting services transitioned to using `main` as the default branch. For example, any new repository created on GitHub or GitLab defaults to `main`. However, Git has not yet made the same change. As a result, local repositories must be manually configured to have the same `main` branch name as most cloud services.

For versions of Git prior to 2.28, the change can be made on an individual repository level. The command for this is in the next episode. Note that if this value is unset in your local Git configuration, the `init.defaultBranch` value defaults to `master`.
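To see the effect of `init.defaultBranch`, you can create a throwaway repository and ask Git which branch `HEAD` points to; `git symbolic-ref --short HEAD` prints that name even before the first commit. The temporary folders (and the variable name `demo_repo`) are assumptions of this sketch, so nothing on your Desktop changes.

```shell
# Demonstration only: temporary HOME and repository folder.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global init.defaultBranch main

demo_repo="$(mktemp -d)"   # hypothetical stand-in for a project folder
git init -q "$demo_repo"

# Which branch will the first commit go to?
git -C "$demo_repo" symbolic-ref --short HEAD   # prints: main
```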

The five commands we just ran above only need to be run once: the flag `--global` tells Git to use the settings for every project, in your user account, on this computer.

Let’s review those settings and test our `core.editor` right away:

<div class="gray">
    
$ git config --global --edit
    
</div>

Let’s close the file without making any additional changes. Remember, since typos in the config file will cause issues, it’s safer to view the configuration with:

<div class="gray">
    
$ git config --list
    
</div>
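When a setting is not what you expect, it helps to know which file it comes from. `git config --list --show-origin` prints every setting Git sees together with the file that defines it, while `git config --global --list` shows only your user-wide settings. The temporary `HOME` is again an assumption that keeps your real configuration untouched.

```shell
# Demonstration only: keep the real ~/.gitconfig untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name "Vlad Dracula"

# Only the global (user-wide) settings, as key=value lines.
git config --global --list

# Every setting Git sees, each prefixed with the file it was read from.
git config --list --show-origin
```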

If necessary, change your configuration using the same commands, for example to choose another editor or to update your email address. You can do this as many times as you want.

# Start using Git

From now on, we will work with Git only through the computer's terminal. This means you will not have the visual aids you are used to from programs with graphical interfaces, such as File Explorer on Windows or Finder on macOS. But do not let the command line intimidate you: with a little practice, you will get the hang of the commands and how they refer to your local folders and files.

## Creating a repository (Credit: <A Href="https://swcarpentry.github.io/git-novice/03-create.html">The Carpentries</A>)

Once Git is configured, we can start using it. 

For the sake of practice, let us imagine that you are writing an article together with your collaborator. First, let us create a new directory in the Desktop folder for our work and then change the current working directory to the newly created one. We will name this directory "research".  

<div class="gray">
    
$ cd ~/Desktop

$ mkdir research

$ cd research

</div>

Now that we are inside the research directory, we can tell the wizard of Git to *initialize* this directory as a repository:

<div class="gray">
    
$ git init
    
</div>

If we use `ls` to show the directory’s contents, it appears that it is empty:

<div class="gray">
    
$ ls
    
</div>

But if we add the `-a` flag to show everything, we can see that Git has created a hidden directory within research called .git:

<div class="gray">
    
$ ls -a
    
</div>

<div class="gray">
    
**Output:** 
    
.	..	.git
    
</div>

Git uses this special subdirectory to store all the information about the project, including the tracked files and sub-directories located within the project’s directory. If we ever delete the .git subdirectory, we will lose the project’s history.
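Because `.git` is hidden, you can also ask Git directly whether you are inside a repository: `git rev-parse --is-inside-work-tree` prints `true` there (and reports an error outside one). In this sketch a temporary folder, held in the hypothetical variable `demo_repo`, stands in for `research`.

```shell
demo_repo="$(mktemp -d)"   # hypothetical stand-in for ~/Desktop/research
cd "$demo_repo"
git init -q

ls -a                                 # lists ., .. and .git
git rev-parse --is-inside-work-tree   # prints: true
```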

Next, we will make sure the default branch is called *main*. Depending on your settings and your version of Git, this may already be the case; see the setup episode (previous episode) for more information on this change.

Remember that Git's commands and flags are not chosen at random: each of them stands for something. For example, in the following command, `-b` stands for "branch".

<div class="gray">

$ git checkout -b main

</div>

<div class="gray">
    
**Output:** 
    
Switched to a new branch 'main'    
</div>
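With Git 2.28 or newer, you can also name the branch at the moment you create the repository, which makes the separate `git checkout -b main` step unnecessary. A minimal sketch, again in a throwaway folder (the hypothetical `demo_repo`) standing in for `research`:

```shell
demo_repo="$(mktemp -d)"   # hypothetical stand-in for ~/Desktop/research
cd "$demo_repo"

# -b (long form: --initial-branch) names the first branch when the repository is created.
git init -q -b main
git status   # reports: On branch main
```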

We can check that everything is set up correctly by asking Git to tell us the status of our project:

<div class="gray">
    
$ git status
    
</div>

<div class="gray">
    
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)
    
</div>

If you are using a different version of git, the exact wording of the output might be slightly different.

## Tracking changes

Now it is time to create a file inside your repository and start working on it. First, navigate to the research directory if you have left it:

<div class="gray">
    
$ cd ~/Desktop/research
    
</div>

Now let us create a file called article.txt that will contain some early notes for the research article you are about to write. On Windows, we do so by creating the file and immediately opening it in Notepad with the following command:

<div class="gray">
    
$ start notepad article.txt
    
</div>

If you are a macOS user and use Sublime Text as your editor, run the following command in the terminal (this requires Sublime Text's `subl` command-line tool to be installed):

<div class="gray">
    
$ subl article.txt
    
</div>

Now type the text below into the newly created file article.txt and save the file:

<div class="gray">
    
1. introduction
    
2. literature review
    
</div>
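If you are on Linux, or simply prefer to stay in the terminal, you can create the same file without opening an editor. This sketch writes the two lines with `printf`, where each `\n` ends a line. On your own computer you would run only the `printf` line inside `research`; the temporary folder (the hypothetical `demo_repo`) just keeps the example self-contained.

```shell
demo_repo="$(mktemp -d)"   # hypothetical stand-in for ~/Desktop/research
cd "$demo_repo"

# Write the two list items, one per line.
printf '1. introduction\n2. literature review\n' > article.txt

cat article.txt
```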

Let us verify that the file was properly created by running the list command (ls):

<div class="gray">
    
$ ls

**Output:**

article.txt
    
</div>

Since we have told the wizard of Git to keep an eye on the research directory by turning it into a repository, it should have noticed that something has changed there, namely that a new file has been created. We can verify this by asking Git for the current *status* of the project:

<div class="gray">
    
$ git status

**Output:** 

On branch main

No commits yet

Untracked files:
   (use "git add <file>..." to include in what will be committed)

article.txt

nothing added to commit but untracked files present (use "git add" to track)
    
</div>

The “untracked files” message means that there is a file in the directory that Git has not been told to keep track of yet. We can tell Git to track a file using git add:

<div class="gray">
    
$ git add article.txt
    
</div>

and then check that the right thing happened:

<div class="gray">
    
$ git status

**Output:**

On branch main

No commits yet

Changes to be committed:
    
(use "git rm --cached <file>..." to unstage)

new file:   article.txt
    
</div>
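For a more compact overview, `git status --short` (or `-s`) prints one line per file with a two-character code: `??` marks an untracked file, and `A` in the first column marks a new file that has been staged. This sketch rebuilds the situation above in a throwaway repository (the hypothetical `demo_repo`):

```shell
demo_repo="$(mktemp -d)"   # hypothetical stand-in for ~/Desktop/research
cd "$demo_repo"
git init -q -b main

printf '1. introduction\n2. literature review\n' > article.txt
git status --short   # ?? article.txt   (untracked)

git add article.txt
git status --short   # A  article.txt   (new file, staged)
```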

Git now knows that it is supposed to keep track of article.txt, but it has not recorded these changes as a commit yet. To get it to do that, we need to run one more command:

<div class="gray">
    
$ git commit -m "paper structure was defined."

**Output:**

[main (root-commit) f22b25e] paper structure was defined.

 1 file changed, 2 insertions(+)
 
 create mode 100644 article.txt
    
</div>

When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision) and its short identifier in this example is f22b25e. Your commit may have another identifier.

We use the -m flag (for “message”) to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run git commit without the -m option, Git will launch our text editor so that we can write a longer message.

Good commit messages start with a brief (<50 characters) statement about the changes made in the commit. Generally, the message should complete the sentence “If applied, this commit will ...”.
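When one line is not enough, you can pass `-m` more than once: Git uses the first as the summary line and adds each further `-m` as a separate paragraph of the message. In this sketch the throwaway repository, the repository-only name and email, and the message text are all hypothetical; the summary follows the “If applied, this commit will ...” pattern.

```shell
demo_repo="$(mktemp -d)"   # hypothetical throwaway repository
cd "$demo_repo"
git init -q -b main
# Identity for this demo repository only (no --global).
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

printf '1. introduction\n2. literature review\n' > article.txt
git add article.txt
git commit -q -m "Define the paper structure" \
  -m "Start with an introduction and a literature review; the argument comes later."

git log -1 --format=%s   # prints: Define the paper structure
git log -1 --format=%b   # prints the second paragraph
```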

If we run git status now:

<div class="gray">
    
$ git status

**Output:**

On branch main
nothing to commit, working tree clean
    
</div>

it tells us everything is up to date. If we want to know what we have done recently, we can ask Git to show us the project’s history using git log:

<div class="gray">
    
$ git log

**Output:**

commit f22b25e3233b4645dabd0d81e651fe074bd8e73b

Author: name lastname <name@mail.com>

Date:   Thu Aug 22 09:51:46 2024 -0400

paper structure was defined.
    
</div>

Git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
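Once the history grows, the full listing gets long. `git log --oneline` condenses each commit to its short identifier and the first line of its message, and `-n` limits how many commits are shown. A sketch with two commits (the second anticipating the next step of this episode), made in a throwaway repository with a hypothetical name and email:

```shell
demo_repo="$(mktemp -d)"   # hypothetical throwaway repository
cd "$demo_repo"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

printf '1. introduction\n2. literature review\n' > article.txt
git add article.txt
git commit -q -m "paper structure was defined."

printf '3. argument\n4. conclusion\n' >> article.txt
git add article.txt
git commit -q -m "add two further chapter titles"

git log --oneline        # newest first, one line per commit
git log --oneline -n 1   # only the most recent commit
```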

**Where are my changes?**

If we run ls at any point while we are working with the article file, we will still see just one file called article.txt. That is because Git saves information about files’ history in the special .git directory mentioned earlier so that our filesystem does not become cluttered (and so that we cannot accidentally edit or delete an old version).

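Remember the idea, mentioned earlier, that Git projects a temporal dimension onto the spatial dimension of your local directories and files? The hidden .git directory is how it does that: the folder shows the present state of your files, while .git keeps their past.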

Now let us add more information to the file by running one of the following commands in the terminal, the first on Windows and the second on macOS:

<div class="gray">
    
$ start notepad article.txt

$ subl article.txt
    
</div>

In article.txt, add a third and a fourth item to the list, as below, and save the file again.

<div class="gray">
    
1. introduction
    
2. literature review
    
3. argument
    
4. conclusion
    
</div>

When you run git status now, the wizard of Git tells you that a file it already knows about has been modified:

<div class="gray">
    
$ git status

**Output**

On branch main

Changes not staged for commit:

(use "git add <file>..." to update what will be committed)

(use "git checkout -- <file>..." to discard changes in working directory)

modified:   article.txt

no changes added to commit (use "git add" and/or "git commit -a")
    
</div>

The last line is the key phrase: “no changes added to commit”. You have changed this file and saved the changes on your local drive, but you have not told Git you will want to save those changes (which you do with git add) nor have you saved them (which you do with git commit). So let us do that now. 

It is good practice to always review your changes before saving them. You do this using git diff. This shows you the differences between the current state of the file and the most recently saved version:

<div class="gray">
    
$ git diff

**Output**

diff --git a/article.txt b/article.txt
index df0654a..315bf3a 100644
--- a/article.txt
+++ b/article.txt
@@ -1,2 +1,4 @@
 1. introduction
    
 2. literature review
    
+3. argument
    
+4. conclusion

<font color=red>
This output, together with the others, should be replaced by the real messages that are shown while performing these operations. 
</font>
    
</div>

The output is cryptic because it is actually a series of commands for tools like editors and patch telling them how to reconstruct one file given the other. If we break it down into pieces:

1. The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
2. The second line tells exactly which versions of the file Git is comparing; df0654a and 315bf3a are unique computer-generated labels for those versions.
3. The third and fourth lines once again show the name of the file being changed.
4. The remaining lines are the most interesting: they show us the actual differences and the lines on which they occur. In particular, the + marker in the first column shows where we added a line.
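Note that plain `git diff` only shows changes that have *not* been staged yet: once you run `git add`, it prints nothing. To review what is about to be committed, use `git diff --staged` (a synonym of `--cached`). A sketch in a throwaway repository with a hypothetical identity:

```shell
demo_repo="$(mktemp -d)"   # hypothetical throwaway repository
cd "$demo_repo"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

printf '1. introduction\n2. literature review\n' > article.txt
git add article.txt
git commit -q -m "paper structure was defined."

printf '3. argument\n4. conclusion\n' >> article.txt
git diff            # shows +3. argument and +4. conclusion

git add article.txt
git diff            # prints nothing: the change is staged now
git diff --staged   # shows the staged change instead
```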

After reviewing the change, it is time to add it to Git's staging area and commit it:

<div class="gray">
    
$ git add article.txt

$ git commit -m "add two further chapter titles"

**Output**

[main 34961b1] add two further chapter titles

1 file changed, 2 insertions(+)
    
</div>

<font color=red>
Add one more line to the article.txt file so that exploring the history makes sense. It should look like this in the end: 
    
1. introduction   
Introduce the question and the strategy to answer it. 
2. literature review
3. argument
4. conclusion

</font>

## Exploring history (Credit: <A Href="https://swcarpentry.github.io/git-novice/05-history.html#top">The Carpentries</A>)

As we saw in the previous episode, we can refer to commits by their identifiers. You can refer to the most recent commit of the working directory by using the identifier HEAD:

<div class="gray">

$ git diff HEAD article.txt
    
**OUTPUT:**
    
<font color=red>
paste the real output
</font>
    
</div>

which is the same as what you would get if you leave out HEAD (try it). The real goodness in all this is when you can refer to previous commits. We do that by adding `~1` (where “~” is “tilde”, pronounced [til-duh]) to refer to the commit one before HEAD.

<div class="gray">
    
$ git diff HEAD~1 article.txt

**OUTPUT:**
    
<font color=red>
paste the real output
</font>
</div>

If we want to see the differences between older commits we can use git diff again, but with the notation HEAD~1, HEAD~2, and so on, to refer to them:

<div class="gray">

$ git diff HEAD~2 article.txt

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

We could also use `git show`, which shows us the changes we made in an older commit as well as its commit message, rather than the differences between a commit and our working directory that we see with `git diff`.

<div class="gray">

$ git show HEAD~2 article.txt

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

In this way, we can build up a chain of commits. The most recent end of the chain is referred to as HEAD; we can refer to previous commits using the `~` notation, so `HEAD~1` means “the previous commit”, while `HEAD~123` goes back 123 commits from where we are now.
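To find out which commit a `HEAD~N` expression points to, you can ask `git rev-parse` to translate it into an identifier and compare the result with `git log --oneline`. The sketch builds three commits in a throwaway repository; the third commit and its message are a simplified, hypothetical stand-in for the extra line planned for article.txt.

```shell
demo_repo="$(mktemp -d)"   # hypothetical throwaway repository
cd "$demo_repo"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

printf '1. introduction\n2. literature review\n' > article.txt
git add article.txt && git commit -q -m "paper structure was defined."
printf '3. argument\n4. conclusion\n' >> article.txt
git add article.txt && git commit -q -m "add two further chapter titles"
printf 'Introduce the question and the strategy to answer it.\n' >> article.txt
git add article.txt && git commit -q -m "describe the introduction"

git log --oneline                     # three commits, newest first
git rev-parse --short HEAD~1          # the commit before the latest one
git rev-parse --short HEAD~2          # the very first commit
git diff HEAD~2 HEAD -- article.txt   # everything added since the first commit
```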

We can also refer to commits using those long strings of digits and letters that git log displays. These are unique IDs for the changes, and “unique” really does mean unique: every change to any set of files on any computer has a unique 40-character identifier. Our first commit was given the ID <font color=red> f22b25e3233b4645dabd0d81e651fe074bd8e73b </font>, so let us try this:

<div class="gray">

$ git diff f22b25e3233b4645dabd0d81e651fe074bd8e73b article.txt

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

That is the right answer, but typing out random 40-character strings is annoying, so Git lets us use just the first few characters (typically seven for normal size projects):

<div class="gray">

$ git diff f22b25e article.txt

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>
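`git rev-parse` also converts between the two forms of an identifier: for `HEAD` (or any other way of naming a commit) it prints the full 40-character ID, and with `--short` the abbreviated one. A sketch in a throwaway repository with one commit; the variable names `full` and `short` are only illustrative.

```shell
demo_repo="$(mktemp -d)"   # hypothetical throwaway repository
cd "$demo_repo"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
printf '1. introduction\n2. literature review\n' > article.txt
git add article.txt && git commit -q -m "paper structure was defined."

full=$(git rev-parse HEAD)            # 40 hexadecimal characters
short=$(git rev-parse --short HEAD)   # usually the first 7 of them
echo "$full"
echo "$short"

# Git accepts the short form wherever a commit is expected.
git log -1 --format=%s "$short"       # prints: paper structure was defined.
```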

All right! So we can save changes to files and see what we have changed. Now, how can we restore older versions of things? Let us suppose we edit article.txt once more, save it, and then change our mind about this latest update (the “ill-considered change”) before committing it.

git status now tells us that the file has been changed, but those changes have not been staged:

<div class="gray">

$ git status

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

We can put things back the way they were by using git checkout:

<div class="gray">

$ git checkout HEAD article.txt

<font color=red>
as second command, open the file in notepad or sublime text. 
</font>

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

As you might guess from its name, git checkout checks out (i.e., restores) an old version of a file. In this case, we are telling Git that we want to recover the version of the file recorded in HEAD, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead:

<div class="gray">

$ git checkout f22b25e article.txt

<font color=red>
as second command, open the file in notepad or sublime text. 
</font>

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

<div class="gray">

$ git status

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

Notice that the changes are currently in the staging area. Again, we can put things back the way they were by using git checkout:

<div class="gray">

$ git checkout HEAD article.txt

**OUTPUT:**
    
<font color=red>
paste the real output
</font>

</div>

It is important to remember that we must use the commit number that identifies the state of the repository before the change we are trying to undo. A common mistake is to use the number of the commit in which we made the change we are trying to discard. In the example below, we want to retrieve the state from before the most recent commit (HEAD~1), which is commit f22b25e:

![image.png](attachment:image.png)
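Since Git 2.23 there is also `git restore`, a command meant only for restoring files (`git checkout` also switches branches), and newer versions of `git status` suggest it in their hints. The sketch below replays this section in a throwaway repository: it discards an unstaged change, brings back the first commit's version of article.txt, and then returns to the latest one. The variable names and the extra line of text are hypothetical.

```shell
demo_repo="$(mktemp -d)"   # hypothetical throwaway repository
cd "$demo_repo"
git init -q -b main
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

printf '1. introduction\n2. literature review\n' > article.txt
git add article.txt && git commit -q -m "paper structure was defined."
first=$(git rev-parse --short HEAD)
printf '3. argument\n4. conclusion\n' >> article.txt
git add article.txt && git commit -q -m "add two further chapter titles"

# An ill-considered change that is saved but not staged...
printf 'An ill-considered change\n' >> article.txt
# ...is discarded; with nothing staged this matches `git checkout HEAD article.txt`.
git restore article.txt

# Bring back the first commit's version in both the staging area and the
# working directory, like the `git checkout <commit> article.txt` command above.
git restore --source="$first" --staged --worktree article.txt
cat article.txt

# Return to the latest committed version.
git restore --source=HEAD --staged --worktree article.txt
```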

# Ignoring things (Credit: <A Href="https://swcarpentry.github.io/git-novice/06-ignore.html">The Carpentries</A>)

**<font color=orange>
The development of this lesson will be paused here, until I have finished the tasks regarding the data challenges. 
</font>**