![Logo](Figures/git.png){ align=right width="130"}

# Git


***This tutorial was originally written by [Nicki Skafte Detlefsen](https://github.com/SkafteNicki/dtu_mlops), and adapted by [Hadeel Moustafa](https://hadeel-moustafa.github.io/Portfolio/).***

Proper collaboration with other people requires that you can work on the same codebase in an organized manner.
This is the reason that **version control** exists. Simply stated, it is a way to keep track of:

* Who made changes to the code
* When the changes happened
* What changes were made

For a full explanation, please see this [page](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F).

It is also important to note that GitHub is not git! Git is the version control tool itself, while GitHub is the
dominant service for hosting git repositories. It is not the only provider of free repository hosting, though; see
[bitbucket](https://bitbucket.org/product/) or [gitlab](https://about.gitlab.com/) for some other examples.


## Initial config

1. [Install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) on your computer and make sure
    that your installation is working by running `git help` in a terminal, which should print the help message for
    git.

2. Create a [GitHub](https://github.com/) account if you do not already have one.

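3. To avoid typing in your GitHub credentials every time you push changes, you can store them once and for all on
    your local machine. Git also needs your name and email, which it records as the author of every commit:

    ```bash
    # type in a terminal
    git config --global credential.helper store
    git config --global user.name "<your name>"
    git config --global user.email <email>
    ```

    Note that the `store` helper saves your credentials unencrypted in a file in your home directory, and that GitHub
    requires a personal access token instead of your account password when you push over HTTPS.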

## Git overview

The simplest way to think of version control is as a graph of nodes with lines connecting them:

![Image](Figures/git_branch.png){ width="1000" }

Each node, which we call a *commit*, is uniquely identified by a hash string. Each commit stores what our code
looked like at that point in time (when we made the commit), and using the hashes we can easily
go back to a specific point in time.
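
To see hashes in action, here is a minimal sketch. It assumes git 2.28 or newer (for `git init -b`) and works in a throwaway directory created with `mktemp -d`; the file contents and the identity are made-up placeholders. It makes two commits, prints the history, and then uses the hash of the first commit to bring `file.py` back to its earlier state:

```bash
#!/usr/bin/env bash
# Sketch: commits are identified by hashes, and a hash lets us go back in time.
# Assumes git >= 2.28 (for `git init -b`); everything happens in a throwaway directory.
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q -b main
git config user.name "Example User"          # placeholder identity, stored only in this repository
git config user.email "user@example.com"

echo "print('version 1')" > file.py
git add file.py
git commit -q -m "Add first version of file.py"
first_hash="$(git rev-parse HEAD)"            # full hash of the commit we just made

echo "print('version 2')" > file.py
git commit -q -am "Change file.py to version 2"

git log --oneline                             # one line per commit: short hash + message

# Bring file.py back to how it looked in the first commit
git restore --source="$first_hash" file.py
cat file.py                                   # prints: print('version 1')
```

`git restore --source` only changes the working directory, so the old version shows up as a modification in `git status`, which you can commit if you want to keep it.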

Commits are made up of local changes that we make to our code. A basic workflow for
adding commits is shown below:

![Image](Figures/git_structure.PNG){ width="1000" }

Assuming that we have made some changes in our local *working directory* and that we
want these updates to be online in the *remote repository*, we have to do the following steps:

* First we run the command `git add`. This moves our changes to the *staging area*. While changes are only in the
    staging area, they are easy to undo: `git restore --staged <file>` unstages them again. The staged changes are not
    yet part of a commit, so no commit hash has been assigned to them, and we can still overwrite them.

* To take our code from the *staging area* and turn it into a commit, we run `git commit`, which adds a node to the
    graph locally. Again, note that we have not pushed the commit to the online *repository* yet.

* Finally, we want others to be able to use the changes that we made. A simple `git push` uploads our
    commit to the remote repository.
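
The three steps above can be tried out without touching GitHub. The following sketch (assuming git 2.28 or newer) uses a local *bare* repository as a stand-in for the remote; the paths, file contents and identity are placeholders. The comments show the short `git status` output at each stage, and the last lines show how a staged change can still be undone:

```bash
#!/usr/bin/env bash
# Sketch of the add -> commit -> push workflow. A local *bare* repository plays the
# role of the remote, so nothing is sent to GitHub. Assumes git >= 2.28.
set -eu

demo_dir="$(mktemp -d)"
git init -q --bare -b main "$demo_dir/remote.git"    # stand-in for the remote repository
git init -q -b main "$demo_dir/work"                 # our local working directory
cd "$demo_dir/work"
git config user.name "Example User"                  # placeholder identity
git config user.email "user@example.com"
git remote add origin "$demo_dir/remote.git"

echo "print('hello')" > file.py
git status --short        # "?? file.py": untracked, only in the working directory
git add file.py
git status --short        # "A  file.py": in the staging area, not committed yet
git commit -q -m "Add file.py"      # the staged snapshot becomes a commit with its own hash
git push -q -u origin main          # the commit now also exists in the remote repository

# Staged changes are easy to undo before they are committed
echo "print('oops')" >> file.py
git add file.py
git restore --staged file.py        # unstage: the edit is still in the working directory
git restore file.py                 # discard the edit from the working directory as well
```

Against a real GitHub repository the workflow is the same; `origin` simply points at a GitHub URL instead of a local folder.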

Of course, the real power of version control is the ability to make branches, as in the image below

![Image](Figures/git_branches.png){ width="1000" }


<figcaption>
<a href="https://dev.to/juanbelieni/creating-an-alias-for-deleting-useless-git-branches-105j"> Image credit </a>
</figcaption>


Each branch can contain code that is not present on other branches. This is useful when many developers
are working together on the same project.
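
A minimal sketch of this, again in a throwaway repository with placeholder names. It assumes git 2.28 or newer, and `experiment` is a made-up branch name; `git switch` is the newer alternative to `git checkout` for changing branches:

```bash
#!/usr/bin/env bash
# Sketch: code committed on one branch is not visible on another.
# Assumes git >= 2.28; `experiment` is a made-up branch name.
set -eu

demo_dir="$(mktemp -d)"
git init -q -b main "$demo_dir/repo"
cd "$demo_dir/repo"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "print('stable code')" > main.py
git add main.py
git commit -q -m "Add main.py"

git switch -q -c experiment                  # create a new branch and move onto it
echo "print('trying something')" > idea.py
git add idea.py
git commit -q -m "Try out an idea"

git switch -q main                           # back on main, idea.py is gone from the working directory
ls                                           # prints: main.py
git branch                                   # lists both branches; '*' marks the current one
```

Switching back with `git switch experiment` brings `idea.py` back, since it is stored in that branch's commit.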

### ❔ Exercises

1. In your GitHub account, create a repository to which you will upload a file `file.py`.

    1. After creating the repository, clone it to your computer

        ```bash
        git clone https://github.com/my_user_name/my_repository_name.git
        ```

    2. Move/copy the file you created into the repository (and any others that you want).

    3. Stage the file for a commit using the `git add` command.

    4. Commit the files using the `git commit` command, where you use the `-m` argument to provide a commit message (1).
        { .annotate }

        1. :man_raising_hand: Writing good commit messages is a skill in itself. A commit message should be short but
            informative about the work you are trying to commit. Try to practise writing good commit messages
            throughout the course. You can see
            [this guideline](https://github.com/joelparkerhenderson/git-commit-message) for help.

    5. Finally, push the files to your repository using `git push`. Make sure to check online that the files have been
        updated in your repository.

    6. You can always use the command `git status` to check where you are in the process of making a commit.

    7. Also check out the `git log` command, which shows the history of commits that you have made.

2. Make sure that you understand how to make branches, as this will allow you to try out code changes without
    messing with your working code. Creating a new branch can be done using:

    ```bash
    # create a new branch
    git checkout -b <my_branch_name>
    ```

    Afterwards, you can use `git checkout` (1) to change between branches (remember to commit your work!).
    Try adding something (a file, a new line of code etc.) to the newly created branch, commit it and
    try changing back to the main branch afterwards. You should see that whatever you added on the branch
    is not present on the main branch.
    { .annotate}

    1. :man_raising_hand: The `git checkout` command is used for a lot of different things in git. It can be used to
        change branches, to revert changes and to create new branches. More modern alternatives are `git switch`
        (for changing and creating branches) and `git restore` (for reverting changes).

3. As a final exercise we want to simulate a *merge conflict*, which happens when two users try to commit changes
    to exactly the same lines of code in the codebase, and git is not able to resolve how the different commits should
    be integrated.

    1. In your browser, open your favorite repository (it could be the one you just worked on), go to any file of
        your choosing, click the edit button (the pencil icon) and make some change to the file. For example, if
        you choose a Python file you can just import some random packages at the top of the file. Commit the change.

    2. Make sure not to pull the change you just made to your local computer. Instead, locally make changes to the
        same lines of the same file and commit them.

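    3. Now try to `git pull` the online changes. What should (hopefully) happen is that git tells you that it found
        a merge conflict that needs to be resolved. (If git instead refuses with a message about divergent branches,
        run `git pull --no-rebase` to explicitly ask for a merge.) Open the file and you should see something like this: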

        ```txt
        <<<<<<< HEAD
        this is some content to mess with
        content to append
        =======
        totally different content to merge later
        >>>>>>> master
        ```

        This should be interpreted as: everything between `<<<<<<<` and `=======` is the change made by your
        local commit, and everything between `=======` and `>>>>>>>` is the change you are trying to pull. To fix
        the merge conflict, you have to edit the code so that the two "cells" work together. When you are done,
        remove the markers `<<<<<<<`, `=======` and `>>>>>>>`.

    4. Finally, stage the resolved file with `git add`, commit the merge and try to push.

4. (VS Code & Git) The above exercises have focused on how to use git from the terminal, which we highly recommend
    learning. However, if you are using a proper editor, it most likely also has built-in support for version
    control. We recommend getting familiar with these features (here is a tutorial for
    [VS Code](https://code.visualstudio.com/docs/editor/versioncontrol)).
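
If you would rather rehearse the merge conflict exercise above without a browser, the sketch below provokes the same kind of conflict locally. It is a stand-in under stated assumptions: git 2.28 or newer, a made-up branch called `other` playing the role of the online change, and `main` playing the role of your local commit. Because it uses `git merge` directly instead of `git pull`, the bottom conflict marker names the branch `other`:

```bash
#!/usr/bin/env bash
# Sketch: provoke and resolve a merge conflict in a single local repository.
# Assumes git >= 2.28; the branch `other` stands in for the change made in the browser.
set -eu

demo_dir="$(mktemp -d)"
git init -q -b main "$demo_dir/repo"
cd "$demo_dir/repo"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "this is some content to mess with" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

git switch -q -c other                       # the "online" change...
echo "the edit made online" > notes.txt
git commit -q -am "Edit notes.txt on other"

git switch -q main                           # ...and a local change to the same line
echo "my local edit" > notes.txt
git commit -q -am "Edit notes.txt on main"

if ! git merge other; then                   # git stops and reports a conflict
    cat notes.txt                            # shows the <<<<<<< / ======= / >>>>>>> markers
fi

# Resolve: keep both edits, with no markers left, then finish the merge
printf 'my local edit\nthe edit made online\n' > notes.txt
git add notes.txt                            # marks the conflict as resolved
git commit -q --no-edit                      # creates the merge commit with git's default message
```

How the two sides are combined is a judgment call; keeping both lines is just one possible resolution.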

## 🧠 Knowledge check

1. How do you know if a certain directory is a git repository?

    ??? success "Solution"

        You can check if there is a `.git` directory. Alternatively, you can use the `git status` command, which
        reports "not a git repository" outside of one.

2. What is the `.gitignore` file used for?

    ??? success "Solution"

        The `.gitignore` file tells git which untracked files to ignore, so that for example a `git add .` command
        does not stage them. This is useful for files that are not part of the codebase but are needed for the code
        to run (e.g. data files), or files that contain sensitive information (e.g. `.env` files that contain API
        keys and passwords).

3. You have two branches - *main* and *devel*. What sequence of commands would you need to execute to make sure that
    *devel* is in sync with *main*?

    ??? success "Solution"

        ```bash
        git checkout main
        git pull
        git checkout devel
        git merge main
        ```

4. What best practices are you familiar with regarding version control?

    ??? success "Solution"

        * Use a descriptive commit message
        * Make each commit a logical unit
        * Incorporate others' changes frequently
        * Share your changes frequently
        * Coordinate with your co-workers
        * Don't commit generated files
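
To see `.gitignore` in action, the following sketch creates a throwaway repository with a hypothetical `.gitignore` that excludes a `.env` file and a `data/` folder, and shows that `git add .` skips both (assuming git 2.28 or newer for `git init -b`):

```bash
#!/usr/bin/env bash
# Sketch: files matching .gitignore patterns are skipped by `git add .`.
# The patterns (.env and data/) and the file contents are hypothetical examples.
set -eu

demo_dir="$(mktemp -d)"
git init -q -b main "$demo_dir/repo"
cd "$demo_dir/repo"

printf '.env\ndata/\n' > .gitignore          # ignore a secrets file and a whole data folder
echo "API_KEY=not-a-real-key" > .env
mkdir data
echo "1,2,3" > data/raw.csv
echo "print('hello')" > file.py

git add .
git status --short          # only ".gitignore" and "file.py" are staged ("A")
git check-ignore -v .env    # prints the .gitignore file, line and pattern that matched .env
```

`git check-ignore -v` is handy when a file is unexpectedly missing from a commit and you want to know which rule excluded it.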




## References

```bibtex
@misc{skafte_mlops,
    author       = {Nicki Skafte Detlefsen},
    title        = {Machine Learning Operations},
    howpublished = {\url{https://github.com/SkafteNicki/dtu_mlops}},
    year         = {2024}
}
```