# Agenda

* Intro: Code Versioning
* Simple Workflows with Git
* Non-linear Workflows: Branching & Merging

# Before We Start

* Who knows version control? Who knows Git?
* Have you ever used Git? How?
* What is the most complex use case you have used it for?

# Intro

Git is *awesome*. That's the truth.

Git is *simple* (once you get its core concepts)

Git is *powerful* (once you practice with it).

#### Stop fighting Git, and it stops fighting back.

# Today's Goals

* Don't appear like a crazy git-nerdy guy ❌

* Understand **why** and **how** Git will help with your workflow

* Learn Git fundamentals in a hands-on, interactive workshop. There are no stupid questions, so don't be shy!

* Become a more *educated developer* who plays well with others

The ultimate goal for today is to unlearn this:

![](https://imgs.xkcd.com/comics/git.png)

# The Git Parable (Revisited)

I'm standing on the shoulders of giants:
* [The Git Parable](https://tom.preston-werner.com/2009/05/19/the-git-parable.html), an awesome blog post by Tom Preston-Werner

* [Think like a git](https://think-like-a-git.net/sections/git-makes-more-sense-when-you-understand-x.html), a gem of a website

* [Git For Ages 4 And Up](https://www.youtube.com/watch?v=1ffBJ4sVUb4), the nicest explanation ever made

# The Source Tree

Every project starts as a set of source code files in a directory on your local PC. This is your *source tree*.

The recipe is simple:

1. Apply changes and save them.
2. Run the code.
3. Go to step 1 (... until it works)

<img src="assets/parable-1.png" width=350 img/>



### What's the problem?

Every time you save files, the previous contents are **overwritten**. There's no *history* of the changes made.

# Snapshots

The *educated developer* keeps track of the work done.

We can simply make a copy of the source tree, each time with a different name.

<img src="assets/parable-2.png" width=450 img/>
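The manual copy described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Git stores data: the `take_snapshot` helper and the `snapshot-N` naming scheme are assumptions made for this example.

```python
import shutil
import tempfile
from pathlib import Path


def take_snapshot(source_tree: Path, snapshots_dir: Path) -> Path:
    """Copy the whole source tree into a new, numbered snapshot directory."""
    snapshots_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: snapshot-1, snapshot-2, ...
    next_number = len(list(snapshots_dir.glob("snapshot-*"))) + 1
    target = snapshots_dir / f"snapshot-{next_number}"
    shutil.copytree(source_tree, target)
    return target


# Demo in a throwaway directory, so no real project is touched.
workspace = Path(tempfile.mkdtemp())
source = workspace / "my-awesome-project"
source.mkdir()
(source / "model.py").write_text("print('v1')\n")

first = take_snapshot(source, workspace / "snapshots")

# Overwriting the file no longer loses the old version.
(source / "model.py").write_text("print('v2')\n")
second = take_snapshot(source, workspace / "snapshots")

print(first.name, "->", (first / "model.py").read_text().strip())
print(second.name, "->", (second / "model.py").read_text().strip())
```

Every snapshot is a full copy, so the history survives later edits, at the cost of duplicating every file each time.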

### What's the problem?

Each snapshot is just a bunch of files, with no **meaningful names** for them

# Snapshots with Names
The *educated developer* always **clearly communicates** their intended changes, even to their *future self*.

We may add a file with a meaningful message.

<img src="assets/parable-3.png" width=450 img/> 
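As a rough sketch of the message file idea, assuming each snapshot is a plain directory: the `MESSAGE.txt` file name and both helper functions are hypothetical, chosen only for this illustration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

MESSAGE_FILE = "MESSAGE.txt"  # hypothetical file name


def describe_snapshot(snapshot_dir: Path, message: str) -> None:
    """Store a human-readable message and a timestamp next to the files."""
    timestamp = datetime.now(timezone.utc).isoformat()
    (snapshot_dir / MESSAGE_FILE).write_text(f"{message}\n{timestamp}\n")


def history(snapshots_dir: Path) -> list[str]:
    """Return 'name: message' lines for every snapshot, sorted by name."""
    lines = []
    for snapshot in sorted(snapshots_dir.iterdir()):
        message = (snapshot / MESSAGE_FILE).read_text().splitlines()[0]
        lines.append(f"{snapshot.name}: {message}")
    return lines


# Demo with two empty snapshot directories in a throwaway location.
workspace = Path(tempfile.mkdtemp())
for name, message in [
    ("snapshot-1", "add simple dataset and model"),
    ("snapshot-2", "add recall and precision metrics"),
]:
    snapshot = workspace / name
    snapshot.mkdir()
    describe_snapshot(snapshot, message)

for line in history(workspace):
    print(line)
```

Sorting by directory name is a simplification that only keeps the right order for fewer than ten snapshots.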

### What's the problem?
We lack a way to know which version is the "current" active one at any given moment.

This may seem like a "who cares" problem, but it only gets worse when you start working on different tasks.

# Keep a Pointer

We can add a "special file" that acts as a *pointer* to the current version.

<img src="assets/parable-4.png" width=450 img/>
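The pointer file could look like the sketch below. The `CURRENT` file name and the two helpers are assumptions for illustration; the point is only that "which version is active" becomes a single small file that is cheap to rewrite.

```python
import tempfile
from pathlib import Path

POINTER_FILE = "CURRENT"  # hypothetical name for the "special file"


def set_current(snapshots_dir: Path, snapshot_name: str) -> None:
    """Point the special file at an existing snapshot."""
    if not (snapshots_dir / snapshot_name).is_dir():
        raise ValueError(f"unknown snapshot: {snapshot_name}")
    (snapshots_dir / POINTER_FILE).write_text(snapshot_name + "\n")


def get_current(snapshots_dir: Path) -> str:
    """Read back the name of the snapshot that is currently active."""
    return (snapshots_dir / POINTER_FILE).read_text().strip()


# Demo with three empty snapshot directories in a throwaway location.
snapshots = Path(tempfile.mkdtemp())
for name in ("snapshot-1", "snapshot-2", "snapshot-3"):
    (snapshots / name).mkdir()

set_current(snapshots, "snapshot-3")
latest = get_current(snapshots)

# Going back in time only rewrites the pointer, not the snapshots.
set_current(snapshots, "snapshot-2")
previous = get_current(snapshots)

print(latest, previous)
```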

### What's the problem here?

The workflow is so cumbersome and tedious you might consider changing careers (can't blame you).

Don't worry, Git to the rescue 🔥

## Someone said DAG?

Before introducing Git, we need to understand one of its most fundamental concepts. Working with Git is basically the same as working with a Directed Acyclic Graph (DAG):

* Each snapshot is a node

<img src="assets/parable-graph-1.png" width=450 img/>

* Nodes are connected by parent-child relationships

<img src="assets/parable-graph-2.png" width=450 img/>

* Directed

<img src="assets/parable-graph-3.png" width=450 img/>

* Labeled: attach meaningful labels

<img src="assets/parable-graph-4.png" width=450 img/>
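The four properties above can be modelled with two plain dictionaries. This is a toy sketch: the node names `A` to `E` and the label names are made up for the example and do not come from a real repository.

```python
# Each node (snapshot) only records its parents: edges point back in time.
parents = {
    "A": [],          # first snapshot, no parent
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],       # a second child of B: the history is no longer a chain
    "E": ["C", "D"],  # a node with two parents
}

# Labels are just meaningful names attached to nodes.
labels = {"main": "E", "experiment": "D"}


def ancestors(start: str) -> list[str]:
    """Return `start` plus every node reachable by following parent edges."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(parents[node])
    return seen


print(sorted(ancestors(labels["main"])))
print(sorted(ancestors(labels["experiment"])))
```

Because edges only point from a node to its parents, following them can never loop back to where it started, which is what makes the graph acyclic.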

# Hands on: The Local Workflow

In [1]:
!mkdir my-awesome-project

In [3]:
# just change the current working directory to avoid cd command in the next cells
%cd my-awesome-project

/home/prfina/git-collaboration-workflow-workshop/my-awesome-project


First of all, let's set up Git

In [4]:
! git config user.name "<your name>" && git config user.email "<your email>"

In [5]:
! git config user.name

<your name>


In [6]:
! git config user.email

<your email>


Let's create a new git *repository*

In [7]:
!git init .

Initialized empty Git repository in /home/prfina/git-collaboration-workflow-workshop/my-awesome-project/.git/


Let's create some files and make changes

In [8]:
!touch dataset.py model.py

In [9]:
!git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	dataset.py
	model.py

nothing added to commit but untracked files present (use "git add" to track)


Let's snapshot a first version.

In [9]:
!git add dataset.py model.py

In [10]:
!git commit -m "add simple dataset and model"

[main (root-commit) 9462dd9] add simple dataset and model
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dataset.py
 create mode 100644 model.py


Let's make some more changes 

In [11]:
!touch metrics.py

In [12]:
!git add metrics.py

In [13]:
!git commit -m "add recall and precision metrics"

[main 0565ab7] add recall and precision metrics
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 metrics.py


In [14]:
!echo "def rocauc()" > metrics.py

In [15]:
!git status

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   metrics.py

no changes added to commit (use "git add" and/or "git commit -a")


In [16]:
!git add metrics.py

In [17]:
!git commit -m "add roc-auc metric"

[main e8b1eb2] add roc-auc metric
 1 file changed, 1 insertion(+)


Let's see the *history*

In [18]:
!git log --oneline

e8b1eb2 (HEAD -> main) add roc-auc metric
0565ab7 add recall and precision metrics
9462dd9 add simple dataset and model


## Unravel the Magic 

What's a git repository, really?

In [19]:
!ls -l .git

total 40
-rw-rw-r--  1 prfina prfina   19 nov  3 13:09 COMMIT_EDITMSG
-rw-rw-r--  1 prfina prfina   92 nov  3 13:09 config
-rw-rw-r--  1 prfina prfina   73 nov  3 13:09 description
-rw-rw-r--  1 prfina prfina   21 nov  3 13:09 HEAD
drwxrwxr-x  2 prfina prfina 4096 nov  3 13:09 hooks
-rw-rw-r--  1 prfina prfina  297 nov  3 13:09 index
drwxrwxr-x  2 prfina prfina 4096 nov  3 13:09 info
drwxrwxr-x  3 prfina prfina 4096 nov  3 13:09 logs
drwxrwxr-x 12 prfina prfina 4096 nov  3 13:09 objects
drwxrwxr-x  4 prfina prfina 4096 nov  3 13:09 refs


Every `git` command reads or changes these files in some way.

`git add` *prepares* the next commit, letting you choose which files (or parts of them) to include.

<img src="assets/working-staging-repo.png" width=600 img/>

This workflow provides greater flexibility, allowing you to create commits where **related** changes are grouped together.
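A toy model of the three areas may help here. It is a deliberately simplified sketch: the dictionaries and the `add`/`commit` functions are stand-ins invented for this example, while the real index stores more than file contents.

```python
working_tree = {"dataset.py": "v1", "model.py": "v1"}
index = {}    # the staging area: what the next commit will contain
commits = []  # each commit keeps its own copy of the index


def add(path: str) -> None:
    """Stage the current content of one file."""
    index[path] = working_tree[path]


def commit(message: str) -> None:
    """Snapshot whatever is staged, ignoring unstaged edits."""
    commits.append({"message": message, "files": dict(index)})


add("dataset.py")
add("model.py")
commit("add simple dataset and model")

# Edit both files, but stage only one of them.
working_tree["dataset.py"] = "v2"
working_tree["model.py"] = "v2"
add("model.py")
commit("tune the model")

print(commits[1]["files"])
```

The second commit contains the new `model.py` but still the old `dataset.py`, because the edit to `dataset.py` was never staged.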

`git commit` snapshots the staged files.

In [20]:
!git log

commit e8b1eb203280afafd80e1f0a085b8ab24ee7136d (HEAD -> main)
Author: Pio Raffaele Fina <pio.fina@syndiag.ai>
Date:   Mon Nov 3 13:09:40 2025 +0100

    add roc-auc metric

commit 0565ab727c5de315cc740f809b662a634e94462d
Author: Pio Raffaele Fina <pio.fina@syndiag.ai>
Date:   Mon Nov 3 13:09:39 2025 +0100

    add recall and precision metrics

commit 9462dd98afe06d2f0010083ead8370e6a404d5cd
Author: Pio Raffaele Fina <pio.fina@syndiag.ai>
Date:   Mon Nov 3 13:09:39 2025 +0100

    add simple dataset and model


Commit: snapshot + context
   * An *immutable* snapshot of files (the what)
   * Has a message (the why)
   * Has an *id* that looks random but is not: it is a hash of the commit's content (see SHA-1, Git's default hash) (the who)
   * Records the parent commit(s), author, date, and other metadata (the where/when).
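To see why the id is not random, here is a minimal sketch of how Git derives an object id, assuming the default SHA-1 object format (the 40-character ids in the log above). The `git_object_id` helper is a name made up for this example. It is shown on file contents ("blob" objects); commit ids are computed the same way over the commit's own content.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash a header plus the body, as Git's SHA-1 object format does."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Every empty file (like the ones created with `touch`) gets the same id.
empty_id = git_object_id("blob", b"")
print(empty_id)

# Any change to the content produces a different id.
changed_id = git_object_id("blob", b"def rocauc()\n")
print(changed_id)
```

The same content always yields the same id, which is why an id can identify a snapshot without any central counter.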


`git checkout` lets us move throughout the history

In [21]:
! git checkout HEAD~1

Note: switching to 'HEAD~1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 0565ab7 add recall and precision metrics


In [22]:
! git switch - # go back to the previous branch, leaving the detached HEAD state

Previous HEAD position was 0565ab7 add recall and precision metrics
Switched to branch 'main'


`HEAD` is just a reference to the commit you currently have checked out, usually indirectly through the current branch.
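A small sketch of how that reference can be followed by hand. The `resolve_head` helper is hypothetical and handles only the simple case where each branch is stored as its own file under `refs/heads`; the demo builds a fake `.git` directory rather than reading a real one, reusing the commit ids from the log above.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Return the commit id HEAD points to (simple file-based refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch; the branch file holds the commit id.
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # Detached HEAD: the file holds the commit id directly.
    return head


# Build a fake .git directory for the demo.
git_dir = Path(tempfile.mkdtemp()) / ".git"
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "refs" / "heads" / "main").write_text(
    "e8b1eb203280afafd80e1f0a085b8ab24ee7136d\n"
)

# On a branch: HEAD -> main -> commit.
(git_dir / "HEAD").write_text("ref: refs/heads/main\n")
on_branch = resolve_head(git_dir)

# Detached: HEAD -> commit.
(git_dir / "HEAD").write_text("0565ab727c5de315cc740f809b662a634e94462d\n")
detached = resolve_head(git_dir)

print(on_branch)
print(detached)
```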

# Q&A Recap

* What is Git? Why should you use it?
* What is a repository?
* What is a commit?
* What are the main pieces of information that Git uses to let us move throughout the history?
* What is the staging area?

# Non-linear Workflows: Branching & Merging

Development workflows are non-linear:
* Another task with higher priority may need to be addressed first (e.g. bug fixes)
* In data science, you want to experiment with alternative hypotheses that require small changes to your code

Remember DAGs? Git lets you work with a DAG, not just a chain.

<img src="assets/git-branches.png" width=600 img/>

What is a branch? 
* Conceptually: it's an independent line of work
* Practically: a "*reference*" that points to the commit at the branch's tip

Q&A: Why just the branch's tip and not a list of commits?
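One way to explore the question above is a toy model in which a branch is nothing more than a name mapped to a commit id. The commit ids `c1` to `c4` and the `commit`/`log` helpers are invented for this sketch, and `log` only follows the first parent of each commit.

```python
parents = {}               # commit id -> list of parent ids
branches = {"main": None}  # branch name -> id of the commit at its tip


def commit(branch: str, commit_id: str) -> None:
    """Add a commit on top of a branch: only the tip pointer moves."""
    tip = branches[branch]
    parents[commit_id] = [] if tip is None else [tip]
    branches[branch] = commit_id


def log(branch: str) -> list[str]:
    """Rebuild a branch's history by following parents from its tip."""
    history = []
    current = branches[branch]
    while current is not None:
        history.append(current)
        current = parents[current][0] if parents[current] else None
    return history


commit("main", "c1")
commit("main", "c2")

# Creating a branch just copies a pointer; no commit is duplicated.
branches["my-new-branch"] = branches["main"]
commit("my-new-branch", "c3")
commit("main", "c4")

print(log("my-new-branch"))
print(log("main"))
```

Since every commit records its parents, the tip alone is enough to recover the whole line of work.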

## Merging

![](https://images.squarespace-cdn.com/content/v1/6139a3d5291bae3f69278e96/eb6aaa6d-de5b-4672-93a1-86ab929336d5/git-merging.png)

# Hands on: Branching & Merging

In [24]:
! git branch

* main


In [25]:
! git switch -c my-new-branch

Switched to a new branch 'my-new-branch'


### Some Resources

#### Readings
* [Pro Git book](https://git-scm.com/book/en/v2): the reference book for Git; everything you need to know is probably there.
* [The Git Parable](https://tom.preston-werner.com/2009/05/19/the-git-parable.html), an awesome blog post by Tom Preston-Werner (more on the "why" than the "how")
* [Think like a git](https://think-like-a-git.net/sections/git-makes-more-sense-when-you-understand-x.html), a gem of a website
* [Git For Ages 4 And Up](https://www.youtube.com/watch?v=1ffBJ4sVUb4), the nicest explanation ever made

#### Tooling
* [Git-school](https://git-school.github.io/visualizing-git/#free) for playing with Git visually: type CLI commands and visualize how the graph changes
* [Using Git with VS Code](https://code.visualstudio.com/docs/sourcecontrol/overview)
* If you don't like the default, there's a [nice extension](https://marketplace.visualstudio.com/items?itemName=mhutchie.git-graph) to visualize git graphs

#### Misc
* [Awesome Git](https://github.com/dictcp/awesome-git): collected resources curated by the community