### CS424

Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics, NUI Galway

# Lecture 6: Version Control

Experience shows that during the development and maintenance of software, the
files containing the code and other data change a lot over time.  Moreover, a
typical software project consists of many files.  With incremental
development in mind, one wants to:

* **avoid the loss** of valuable information

* support **collaboration** of groups of developers on many files

* **document change**: who changed what, when, and why

* ...

Solution: keep copies of every version of every file
under a simple administrative layer!  That's the purpose of
**version control systems**.

## The `git` Version Control System

`git` keeps the files comprising a project in **repositories**.

A **repository** is a database containing all the information needed to retain
and manage the **revisions** and history of a project.  This includes
a complete copy of the entire project.

The repository sits in a (hidden) folder `.git` at the top level of the
working copy of the project.  It maintains two **primary data structures**: the
**object store** and the **index**.  The object store can be used to
make copies (clones) of the project, usually as part of a distributed
version control workflow.  The index is transitory and private to a particular
repository.
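To make the layout concrete, here is a minimal sketch that locates the repository belonging to a working copy by walking up the folder hierarchy until a `.git` folder is found. The function name `find_repository` is made up for this illustration, and the sketch assumes that `.git` is a plain directory (in some setups, such as linked worktrees, it is a file instead).

```python
from pathlib import Path
from typing import Optional


def find_repository(start: Path) -> Optional[Path]:
    """Return the nearest enclosing `.git` folder, or None if there is none.

    Hypothetical helper for illustration; assumes `.git` is a directory.
    """
    current = start.resolve()
    # Check the starting folder first, then each of its ancestors in turn.
    for folder in [current, *current.parents]:
        candidate = folder / ".git"
        if candidate.is_dir():
            return candidate
    return None
```

Calling `find_repository(Path.cwd())` from anywhere inside a working copy should return the path of its `.git` folder.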

## The Object Store

The Object Store forms the heart of a `git` repository.
It contains the original files, together with log messages,
author information, dates, and other information needed to rebuild
any version or branch of the project.

`git`'s higher-level data structures work with (only) four types
of objects: blobs, trees, commits, and tags.

### Blobs

Each version of a file is stored as a **blob**.
Blobs are "binary large objects": a blob is any file
(without metadata) whose internal structure is
ignored by `git`.
Each blob has a **unique identifier**, derived from its
content, which is used to refer to it.
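As an illustration of how such an identifier can be derived from the content alone, the following sketch computes a blob identifier as the SHA-1 hash of a short header followed by the file's bytes. It assumes a repository that uses SHA-1, `git`'s default object format; the function name `blob_id` is chosen for this example only.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the identifier of a blob holding `content`.

    Sketch assuming the default SHA-1 object format: the hash covers
    the header "blob <size>" plus a NUL byte, followed by the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty file always gets the same identifier, in every repository.
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the content, two files with identical content share one blob, regardless of their names. The result can be compared against the output of `git hash-object <file>`.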

### Trees

A **tree** object represents one level of directory
information. It records blob identifiers, path names,
and some metadata for the files in one directory, and it
can refer to other trees (for the subdirectories).
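A tree can be sketched along the same lines: each entry consists of a mode (the metadata), a name, and the identifier of a blob or another tree, and the tree's own identifier is a hash over all of its entries. The sketch below assumes the SHA-1 object format; `tree_id` is a name made up for this example, and the plain sort by name is a simplification of the ordering `git` applies to subdirectories.

```python
import hashlib
from typing import List, Tuple

# One entry: (mode, name, identifier as 40 hex digits),
# e.g. mode "100644" for a regular file, "40000" for a subdirectory.
Entry = Tuple[str, str, str]


def tree_id(entries: List[Entry]) -> str:
    """Compute the identifier of a tree with the given entries (sketch)."""
    body = b""
    # Simplification: sort entries by their name bytes.
    for mode, name, object_id in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0"
        body += bytes.fromhex(object_id)  # identifiers are stored in binary
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# A directory without any entries yields the well-known "empty tree".
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Since a tree's identifier depends on the identifiers of its entries, changing a single file changes the identifier of every tree on the path up to the project's top-level directory.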

### Commits

A **commit** object holds metadata for each change performed on the
repository: author, committer, commit date, and a log message.
It points to a tree object that captures the current state of the repository
as a snapshot, and to its **parent** commit.  The initial commit has no
parent, and a commit that merges branches has more than one.
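The parent pointers turn the commits into a linked history that can be followed backwards from the most recent commit. The following toy model illustrates this; the class `Commit` and the function `history` are invented for this sketch, the identifiers and names in the usage example are placeholders, and each commit is limited to at most one parent for simplicity.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: metadata, a tree identifier, and an optional parent."""
    tree: str                          # identifier of the snapshot's tree
    author: str
    committer: str
    date: str
    message: str
    parent: Optional["Commit"] = None  # None for the initial commit


def history(commit: Optional[Commit]) -> Iterator[Commit]:
    """Yield the given commit and then all its ancestors, newest first."""
    while commit is not None:
        yield commit
        commit = commit.parent


# Usage with placeholder data: three commits, each pointing to its parent.
first = Commit("tree-1", "alice", "alice", "2024-01-01", "Initial commit")
second = Commit("tree-2", "bob", "bob", "2024-01-02", "Add parser", first)
third = Commit("tree-3", "alice", "alice", "2024-01-03", "Fix typo", second)

for c in history(third):
    print(c.date, c.author, c.message)
```

The loop prints the log messages from the newest commit back to the initial one, which is the order in which a history listing is usually presented.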


### Tags

A **tag** assigns a human readable name to an object, usually a commit.

## The Index

The index is a temporary and dynamic binary file
that captures a version of the project.
The developer edits, adds or deletes files
in the working copy.
These changes can be **staged** in the index until
the developer is ready to commit a set of changes.
This setup allows for a gradual transition from
one state of the project to another (better)
state, and thus supports incremental development.
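
The interplay between staging and committing can be illustrated by a small toy model. It is a sketch only: the class `ToyRepository` and its methods are invented for this example, file contents are kept as plain strings, and the real index is a binary file with considerably more bookkeeping.

```python
from typing import Dict, List, Tuple


class ToyRepository:
    """Toy model of an index and a list of committed snapshots."""

    def __init__(self) -> None:
        # The index: path -> content proposed for the next commit.
        self.index: Dict[str, str] = {}
        # The history: (log message, snapshot) pairs, oldest first.
        self.commits: List[Tuple[str, Dict[str, str]]] = []

    def stage(self, path: str, content: str) -> None:
        """Record a new or edited file in the index."""
        self.index[path] = content

    def remove(self, path: str) -> None:
        """Record the deletion of a file in the index."""
        self.index.pop(path, None)

    def commit(self, message: str) -> Dict[str, str]:
        """Freeze the current index as a snapshot and add it to the history."""
        snapshot = dict(self.index)  # copy, so later staging cannot alter it
        self.commits.append((message, snapshot))
        return snapshot


repo = ToyRepository()
repo.stage("README.md", "draft")
repo.stage("main.py", "print('hi')")
first = repo.commit("Initial version")

repo.stage("README.md", "final")   # only this change is staged
second = repo.commit("Polish the README")
```

In this model, `first` still holds the draft of `README.md` after the second commit, while `second` contains the final text together with the unchanged `main.py`: each commit is a complete snapshot, and the index carries the project from one snapshot to the next.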