plumbing command `git hash-object`, which takes some data, stores it in your `.git/objects` directory (the object database), and gives you back the unique key that now refers to that data object.

ref: [Git Objects](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects)

In [24]:
import subprocess, shlex

blob_text = "explore git internal\n"
blob_object_sha1 = subprocess.run(shlex.split('git hash-object -t blob -w --stdin'), 
                                  input=blob_text, capture_output=True, encoding='utf-8').stdout.strip()
print(blob_object_sha1)

7ce5af5ef26cedb844923e88c8a6b07f401626d4


You can examine that content with the `git cat-file` command, which is something of a Swiss army knife for inspecting Git objects. The `-p` option used below pretty-prints an object's content based on its type.

In [27]:
blob_raw_cat_file_result = subprocess.run(shlex.split('git cat-file -p ' + blob_object_sha1),
                                          capture_output=True, encoding='utf-8').stdout
print(blob_raw_cat_file_result)

explore git internal



All objects share the following characteristics:
- they are all deflated (compressed) with zlib,
- and they have a header that specifies both their type and the size of the data in the object.

In [29]:
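import zlib
# Assumes the notebook runs one directory below the repository root and that the
# object is still stored loose (not packed): objects/<first 2 hex chars>/<remaining 38>.
with open('../.git/objects/{0}/{1}'.format(blob_object_sha1[:2], blob_object_sha1[2:]), 'rb') as f:
    decompressed_content = zlib.decompress(f.read())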
print(decompressed_content)

b'blob 21\x00explore git internal\n'
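
The cell above reads an object that Git wrote. As a rough sketch of the opposite direction, the snippet below builds the same kind of object in memory: header, SHA-1 key, zlib-deflated payload, and the path under the object database. `build_loose_object` is a hypothetical helper introduced only for illustration, and it assumes a SHA-1 repository; it does not write anything to disk.

```python
import hashlib
import zlib


def build_loose_object(data: bytes, obj_type: str = "blob"):
    """Return (sha1_hex, relative_path, compressed_bytes) for a loose object.

    Hypothetical helper for illustration; it does not touch the filesystem.
    """
    # Header: "<type> <size in bytes>\0", followed by the raw content.
    store = f"{obj_type} {len(data)}".encode("ascii") + b"\x00" + data
    sha1_hex = hashlib.sha1(store).hexdigest()
    # Loose objects live under objects/<first 2 hex chars>/<remaining 38>.
    relative_path = f"objects/{sha1_hex[:2]}/{sha1_hex[2:]}"
    return sha1_hex, relative_path, zlib.compress(store)


sha1_hex, relative_path, compressed = build_loose_object(b"explore git internal\n")
print(sha1_hex)
print(relative_path)
print(zlib.decompress(compressed))
```

The compressed bytes need not be identical to the file Git writes, since that depends on the compression level; only the inflated content and its hash have to agree.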


All objects can be validated by verifying that
- (a) the SHA-1 hash of their inflated content (header plus data) matches the object's name, and
- (b) the object successfully inflates to a stream of bytes that forms a sequence of `<ascii type without space> + <space> + <ascii decimal size> + <byte\0> + <binary object data>`.

In [37]:
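# parse blob object file: "<type> <size>\0<data>"
import re
# raw bytes literal, so \d reaches the regex engine instead of being an invalid string escape
p = re.compile(rb'^(blob|tree|commit|tag) (\d+)\x00(.*)$', re.DOTALL)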
m = p.match(decompressed_content)
[object_type, content_length, content] = m.groups()
print(object_type)
print(content_length)
print(content)

b'blob'
b'21'
b'explore git internal\n'


In [40]:
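# the size declared in the header must equal the length of the data
assert len(content) == int(content_length)
# the data must round-trip to the text that was hashed
assert content.decode('utf-8') == blob_text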

In [44]:
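# re-construct sha1; the size in the header is the byte length of the content, not the character count
blob_bytes = blob_text.encode('utf-8')
raw_content = 'blob {0}\x00'.format(len(blob_bytes)).encode('ascii') + blob_bytes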
print(raw_content)

import hashlib
computed_sha1 = hashlib.sha1(raw_content).hexdigest()
print(computed_sha1)

b'blob 21\x00explore git internal\n'
7ce5af5ef26cedb844923e88c8a6b07f401626d4


In [45]:
assert computed_sha1 == blob_object_sha1
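
The two validation checks, (a) and (b), can also be combined into a single function. The following is a minimal sketch, not Git's own verification code: `validate_loose_object` and `OBJECT_RE` are hypothetical names, the repository is assumed to use SHA-1, and the input is the raw (still compressed) content of a loose object file.

```python
import hashlib
import re
import zlib

# <ascii type> <space> <ascii decimal size> <NUL> <binary object data>
OBJECT_RE = re.compile(rb"(blob|tree|commit|tag) (\d+)\x00(.*)", re.DOTALL)


def validate_loose_object(compressed: bytes, expected_sha1: str) -> bool:
    """Hypothetical helper: apply checks (a) and (b) to one loose object."""
    try:
        store = zlib.decompress(compressed)
    except zlib.error:
        return False  # (b) fails: not a valid zlib stream
    m = OBJECT_RE.fullmatch(store)
    if m is None:
        return False  # (b) fails: header is malformed
    _type, size, data = m.groups()
    if int(size) != len(data):
        return False  # (b) fails: declared size differs from the data length
    # (a) the SHA-1 of the inflated bytes must equal the object's key
    return hashlib.sha1(store).hexdigest() == expected_sha1


store = b"blob 21\x00explore git internal\n"
key = hashlib.sha1(store).hexdigest()
good = zlib.compress(store)
bad_size = zlib.compress(b"blob 99\x00explore git internal\n")

print(validate_loose_object(good, key))         # True
print(validate_loose_object(bad_size, key))     # False
print(validate_loose_object(b"not zlib", key))  # False
```

To check the object created earlier, the same function could be called with the bytes read from its file under `.git/objects` and `blob_object_sha1` as the expected key.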