Let's learn Git by building it (Part 1)

I intend to write this whole project in Rust to learn the language along the way.

How does Git handle files and directories?

In Git, each file and folder is represented by a Git object. Git objects are stored in the .git/objects folder. Two of the main types of Git objects are blobs and trees.

Blobs

A blob is a git object that contains the content of a file.

Trees

A tree is a Git object that lists the direct contents of a directory, that is, its immediate files and subdirectories.

OID (Object IDentifier)

You might be wondering: how do I know which Git object represents which file or directory? Git creates a unique identifier for each Git object. This unique identifier is called an OID (Object IDentifier).

The OID is created by hashing the Git object with the SHA-1 algorithm. SHA-1 produces a 160-bit digest, which Git writes as a 40-character hexadecimal string. This string is the OID of the Git object.

Let's take an example. Let's say we have a file named foo.txt with the following content:

Hello World!

The OID of this file will be generated by using:

SHA-1("blob 12\0Hello World!")

So what is happening here?

First, we write the type of the Git object, here blob (it would be tree for a directory).
Then, we add a space as a separator.
Then, we add the length of the file content in bytes: 12.
Then, we add a null character \0, which separates the header from the content.
Finally, we add the content of the file: Hello World! (12 bytes, with no trailing newline).

The SHA-1 algorithm will generate the following hash:

c57eff55ebc0c54973903af5f72bac72762cf4f4

So now we have a unique identifier for our file. Let's try to do that with git.

mkdir git-objects
cd git-objects
git init
# printf instead of echo: no trailing newline, so the file is exactly 12 bytes
printf 'Hello World!' > foo.txt
git add foo.txt

Now, go inside the .git/objects folder. You may be wondering where the file named c57eff55ebc0c54973903af5f72bac72762cf4f4 is. Having too many files in the same directory can slow the file system down, so Git spreads the objects across subdirectories.

Git uses the first 2 characters of the OID as a directory name and the remaining 38 characters as the file name. So in our case, the file is stored in the c5 directory under the name 7eff55ebc0c54973903af5f72bac72762cf4f4.
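
As a small sketch of this 2/38 split, the following Python function maps an OID to its location under .git/objects. The name `object_path` and the `git_dir` parameter are assumptions made for the illustration.

```python
from pathlib import PurePosixPath


def object_path(oid: str, git_dir: str = ".git") -> PurePosixPath:
    """Map a 40-character hex OID to its object file path (hypothetical helper)."""
    if len(oid) != 40:
        raise ValueError("expected a 40-character SHA-1 OID")
    # First 2 characters name the directory, the remaining 38 name the file.
    return PurePosixPath(git_dir) / "objects" / oid[:2] / oid[2:]


if __name__ == "__main__":
    print(object_path("c57eff55ebc0c54973903af5f72bac72762cf4f4"))
    # .git/objects/c5/7eff55ebc0c54973903af5f72bac72762cf4f4
```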

Let's try to see the content of the file.

cat .git/objects/c5/7eff55ebc0c54973903af5f72bac72762cf4f4

OK, all of this is fun, but so far we only have the name of the file, and SHA-1 cannot be reversed. So how do we get the content of the file back?

Git objects content

Each Git object holds the exact content of the file or directory it represents. If you tried to display the object file in the previous step, you should have seen something unreadable. This is completely intended.

Each object is meant to be stored in a database so that it can be restored later. Files can be large, though, and splitting one file across several objects would break the OID concept we just explained.

To save space, Git compresses each object with zlib. The resulting binary output is what is written to the object file.
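
The compression step can be sketched with Python's zlib module. The sketch assumes, as Git does for objects stored this way, that the header is compressed together with the content. The names `encode_object` and `decode_object` are hypothetical.

```python
import zlib


def encode_object(content: bytes, obj_type: str = "blob") -> bytes:
    """Build the bytes of an object file: zlib(header + content)."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return zlib.compress(header + content)


def decode_object(raw: bytes) -> bytes:
    """Undo the compression and return header + content."""
    return zlib.decompress(raw)


if __name__ == "__main__":
    stored = encode_object(b"Hello World!")
    print(stored)                 # compressed bytes, unreadable as text
    print(decode_object(stored))  # b'blob 12\x00Hello World!'
```

The compressed bytes are what makes the object file look like garbage when printed with cat; decompressing them gives back the header followed by the original content.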

If you want to see the content of the file, you need to decompress the content of the git object. To do that, you can use the git cat-file command.

git cat-file -p c57eff55ebc0c54973903af5f72bac72762cf4f4

You should see the content of the file.

Git objects header

Each git object has a header. The header is used to store the type of the git object and the length of the content of the git object.

To see the header of a git object, you can use the git cat-file command.

git cat-file -t c57eff55ebc0c54973903af5f72bac72762cf4f4

You should see the type of the git object: blob.

git cat-file -s c57eff55ebc0c54973903af5f72bac72762cf4f4

You should see the size of the content of the git object: 12.
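
To make the header layout concrete, here is a minimal Python sketch that splits decompressed object bytes into type, size, and content. It only mimics what the -t and -s options report; `parse_object` is a hypothetical helper, not how git cat-file is implemented.

```python
from typing import Tuple


def parse_object(data: bytes) -> Tuple[str, int, bytes]:
    """Split decompressed object bytes into (type, size, content)."""
    # The header ends at the first null byte.
    header, _, content = data.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), int(size), content


if __name__ == "__main__":
    obj_type, size, content = parse_object(b"blob 12\0Hello World!")
    print(obj_type, size, content)  # blob 12 b'Hello World!'
```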

Side note about SHA-1

SHA-1 is no longer considered secure. It is possible to craft two different contents that produce the same SHA-1 hash. This is called a collision. This is why Git is moving toward SHA-256, which recent 2.x versions support as an opt-in object format.

BragdonD/rgit
