In [3]:
import hashlib

In [4]:
"markov"

'markov'

Next, we convert the string into bytes. In Python the only visible change is the `b` prefix added to the front of the literal.

In [5]:
bytes("markov", "utf-8")

b'markov'
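As a small aside, `str.encode` is an equivalent and more common spelling of the same conversion. The sketch below (variable names are only illustrative) checks that both forms give the same bytes and that `decode` reverses the step:

```python
word = "markov"

# Two equivalent ways to turn a str into UTF-8 bytes
via_constructor = bytes(word, "utf-8")
via_encode = word.encode("utf-8")

print(via_constructor == via_encode)   # the two spellings agree
print(list(via_encode))                # each byte viewed as an integer
print(via_encode.decode("utf-8"))      # decoding recovers the original str
```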

We can unpack the bytes for each letter in the word `markov` with a loop; iterating over a bytes object yields each byte as an integer. Then we can _hash_ the entire word's bytes to see that the hashing function converts them into something... well, different.

In [6]:
for letter in bytes("markov", "utf-8"):
    print(letter)
m = hashlib.sha256()
m.update(bytes("markov", "utf-8"))
m.hexdigest()

109
97
114
107
111
118


'43cedaa98924112f2f89c3277a0c1f7ecc487350c22d428ebd2f61dc32389787'

If we make the small, subtle change of one letter, from `markov` to `markon`, only the last byte changes. The hash, however, is _dramatically_ different.

In [7]:
for letter in bytes("markon", "utf-8"):
    print(letter)
m = hashlib.sha256()
m.update(bytes("markon", "utf-8"))
m.hexdigest()

109
97
114
107
111
110


'd9a3984bfb9ffb3d8287bee1398821f4adfe09f7f9daf80d56e5006f6e6ab866'
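To put a number on "dramatically different", one option is to count how many of the 256 output bits differ between the two digests. This is a minimal sketch; `sha256_bits` is a hypothetical helper name, not part of `hashlib`. For a well-behaved hash function, roughly half the bits would be expected to flip.

```python
import hashlib

def sha256_bits(text: str) -> int:
    """Return the SHA-256 digest of text (UTF-8 encoded) as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

# XOR leaves a 1 bit wherever the two digests disagree
differing = bin(sha256_bits("markov") ^ sha256_bits("markon")).count("1")
print(f"{differing} of 256 bits differ")
```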

Each position in the hex digest is a hexadecimal digit: one of the 10 numerals or the letters `a` through `f`, for a total of 16 possibilities. There are 64 characters in the resulting hash, so the total number of possible hashes is 16^64, which equals 2^256 (SHA-256 produces a 256-bit output).

In [81]:
f"{16**64} possible combinations of 64 characters of 16 options"

'115792089237316195423570985008687907853269984665640564039457584007913129639936 possible combinations of 64 characters of 16 options'
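A quick way to check the length and alphabet of a hex digest directly is sketched below. It uses only the standard library, and the variable name `digest` is just for illustration.

```python
import hashlib

digest = hashlib.sha256(b"markov").hexdigest()

print(len(digest))                               # number of characters in the hex digest
print(set(digest) <= set("0123456789abcdef"))    # only hexadecimal digits appear
print(16**64 == 2**256)                          # 64 hex digits encode exactly 256 bits
```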

Moving on, we can again see that the one-letter difference makes a dramatic difference in the hash:

In [8]:
a = hashlib.sha256()
a.update(bytes("markov", "utf-8"))
print(a.hexdigest())
b = hashlib.sha256()
b.update(bytes("markon", "utf-8"))
print(b.hexdigest())

43cedaa98924112f2f89c3277a0c1f7ecc487350c22d428ebd2f61dc32389787
d9a3984bfb9ffb3d8287bee1398821f4adfe09f7f9daf80d56e5006f6e6ab866


But what if we did the updates back-to-back?

In [83]:
c = hashlib.sha256()
c.update(bytes("markov", "utf-8"))
c.update(bytes("markon", "utf-8"))
c.hexdigest()

'2cff5a6595c2c90cafb14a83398e52a161374c4f47496cf09abe53b83acd04b6'

The result is different from either of the previous hashes. Calling `.update()` repeatedly feeds more data into the same hash object, so the digest is that of everything passed in so far, concatenated in order. Below is an example that gives the same result by concatenating the bytes with `+` instead of calling `.update()` twice.

In [9]:
d = hashlib.sha256()
d.update(bytes("markov", "utf-8")+bytes("markon", "utf-8"))
d.hexdigest()

'2cff5a6595c2c90cafb14a83398e52a161374c4f47496cf09abe53b83acd04b6'
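The same equivalence holds however the data is split up, which is what makes `.update()` handy for hashing data that arrives in pieces. A minimal sketch, where `hash_incrementally` is a hypothetical helper name:

```python
import hashlib

def hash_incrementally(chunks):
    """Feed each bytes chunk to a single SHA-256 object via update()."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

chunks = [b"markov", b"markon"]
one_shot = hashlib.sha256(b"".join(chunks)).hexdigest()

print(hash_incrementally(chunks) == one_shot)
# Where the chunk boundaries fall does not matter, only the resulting byte stream
print(hash_incrementally([b"mar", b"kovmar", b"kon"]) == one_shot)
```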

Does the order of the bytes objects matter? YES. QUITE A BIT!! (Notice that the hash is again dramatically different from the previous value.)

In [10]:
e = hashlib.sha256()
e.update(bytes("markon", "utf-8")+bytes("markov", "utf-8"))
e.hexdigest()

'0e6bae302ba74bdfb42162b34f17bf1d19bf2bdab50621d3a85084018cb4453e'

Now let's consider a situation in which we have a ledger of transactions, like the one below:

In [12]:
ledger_block = """
Akanksha pays Jose 200
Jose pays Chase 100
Jose pays Aimon 40
Jose pays Theo 40
Salma pays Shazia 30
Aimon pays Jessica 500"""

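We could hash every line of this ledger into one combined hash using a for-loop. Note that `split("\n")` drops the newline characters, so the lines are fed to the hash back-to-back with no separator between them: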

In [87]:
f = hashlib.sha256()
for line in ledger_block.split("\n"):
    f.update(bytes(line,"utf-8"))
f.hexdigest()
    

'05971ec9f83ba4cbac6325d667d192f6349a8d6154d797576b6d02bdb086baa2'
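One caveat of hashing line by line after `split("\n")` is that the line boundaries themselves are not part of the hashed data. The sketch below uses two made-up ledgers and hypothetical helpers `hash_lines` and `hash_whole` to illustrate this; hashing the whole text in one go is one way to keep the newlines in the hash.

```python
import hashlib

def hash_lines(text: str) -> str:
    """Hash a ledger line by line; the newlines are dropped by split()."""
    h = hashlib.sha256()
    for line in text.split("\n"):
        h.update(line.encode("utf-8"))
    return h.hexdigest()

def hash_whole(text: str) -> str:
    """Hash the full text, newlines included."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

ledger_a = "Jose pays Aimon 40\nJose pays Theo 40"
ledger_b = "Jose pays Aimon 4\n0Jose pays Theo 40"  # line break moved by one character

# Both ledgers yield the same byte stream once the newlines are dropped
print(hash_lines(ledger_a) == hash_lines(ledger_b))
# Keeping the newlines makes the two ledgers distinguishable
print(hash_whole(ledger_a) == hash_whole(ledger_b))
```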

This gives us a hash that can be reproduced only by having _exactly_ those transactions in _exactly_ that order. Even a small change that would be difficult to detect by eye changes the hash, so the hash can act as a tamper-detection mechanism. For example, the `scam_block` ledger below has a tiny addition made to the original ledger; can you spot the difference?

In [13]:
scam_block = """
Akanksha pays Jose 200
Jose pays Chase 100
Jose pays Aimon 4000
Jose pays Theo 40
Salma pays Shazia 30
Aimon pays Jessica 500"""

Jose is now paying Aimon $4000 instead of $40. Let's see what happens when we hash the ledger.

In [14]:
e = hashlib.sha256()
for line in scam_block.split("\n"):
    e.update(bytes(line,"utf-8"))
e.hexdigest()

'712ac912e61300edba3bbb3b148e0cbb84d161ce990ffab6cea00794f1962a85'

Notice that the hash is _very_ different from the last one, so we can conclude that some modification to the ledger has been made. Although we don't know _what the change is_, it's immediately apparent that something has changed and should be inspected. This is a great way to check extremely large datasets for differences or changes over time.
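Once the hashes disagree, a line-by-line comparison can locate the change. The following is a rough sketch on a shortened copy of the ledger; `digest` and `changed_lines` are hypothetical helper names, and the comparison assumes both ledgers have the same number of lines.

```python
import hashlib

def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_lines(original: str, suspect: str):
    """Return (line_number, original_line, suspect_line) for each differing line.
    Assumes both ledgers have the same number of lines."""
    pairs = zip(original.strip().split("\n"), suspect.strip().split("\n"))
    return [(n, a, b) for n, (a, b) in enumerate(pairs, start=1) if a != b]

original = """
Akanksha pays Jose 200
Jose pays Chase 100
Jose pays Aimon 40
Jose pays Theo 40"""
suspect = original.replace("Aimon 40", "Aimon 4000")

if digest(original) != digest(suspect):      # cheap check first
    for n, before, after in changed_lines(original, suspect):
        print(f"line {n}: {before!r} -> {after!r}")
```

The hash comparison is the cheap first step; the slower line-by-line scan only needs to run when the hashes differ.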

Finally, let's consider the situation in which we have a ledger that is composed of many pages, which we'll nickname "blocks":

In [15]:
block_1 = """
Akanksha pays Sam 30
Sam pays Chase 91
Sam pays Aimon 40
Sam pays Theo 19
Salma pays Shannon 23
Aimon pays Margret 89"""

block_2 = """
Bohr pays Boltzmann 8
Boltzmann pays Gibbs 19
Hemholtz pays Fuller 12
Hemholtz pays Lavoisier 94
Fuller pays Bohr 17
Lavoisier pays Carnot 91"""

block_3 = """
Euclid pays Ramanujan 55
Descartes pays Euler 34
Descartes pays Euler 12
Gauss pays Turing 314
Pascal pays Gauss 218
Gauss pays Pascal 70"""

We could take each of those blocks and hash it as a single giant string. However, to do something even more interesting, we can take the previous block's hash, tack on the next block's bytes, and hash them together. That way, the only way to reproduce each successive hash is to have _exactly_ the current block, _exactly_ the previous block, and so on back to the start. This process of chaining the hashes together is referred to as a `blockchain`!

In [17]:

g = hashlib.sha256()
g.update(bytes(block_1,"utf-8"))
first_block = g.hexdigest()

blocks = [block_2,block_3]
blockchain = [first_block]


for block in blocks:
    h = hashlib.sha256()
    h.update(bytes(blockchain[-1],"utf-8") + bytes(block,"utf-8"))
    block_hash = h.hexdigest()
    blockchain.append(block_hash)
blockchain

['2c64a519378828ad696519c3b4bc015db11cc8331deb6ec121d7a36e0386002e',
 '07f67cc7d09c5f0588385d392faf3f614f4f26e1d9a4083250f6c1f3bc73193d',
 'ceb395af34c5ea2be2badf519bf8f1e29658731c331fb14dd53cf79382fcdfae']

Each of these outputs comes from chaining the successive blocks together. The power of this technique is that it makes it practically impossible to change the historical ledger unnoticed: any edit changes that block's hash and therefore the hash of _every single block that came after it_, so all of them would have to be recalculated as well. THAT'S A LOT OF WORK.
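The knock-on effect of an edit can be sketched with toy data. `build_chain` is a hypothetical helper that follows the same scheme as the loop above (first block hashed alone, then previous hex digest plus block text); the block contents are made up for illustration.

```python
import hashlib

def build_chain(blocks):
    """Hash the first block alone, then hash (previous hex digest + block) for the rest."""
    chain = []
    for block in blocks:
        previous = chain[-1] if chain else ""
        chain.append(hashlib.sha256((previous + block).encode("utf-8")).hexdigest())
    return chain

blocks = ["A pays B 1", "B pays C 2", "C pays A 3"]   # toy data
tampered = ["A pays B 100"] + blocks[1:]              # edit only the first block

honest_chain = build_chain(blocks)
tampered_chain = build_chain(tampered)

# Indices of the hashes that no longer match after the edit
mismatches = [i for i, (x, y) in enumerate(zip(honest_chain, tampered_chain)) if x != y]
print(mismatches)
```

Editing the first block should change every hash in the chain, whereas editing only the last block would change just the final hash.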

Finally, to clarify, here is the second block's hash without the chaining. Notice that it does not match the second hash in the blockchain list above.

In [18]:
p = hashlib.sha256()
p.update(bytes(block_2,"utf-8"))
p.hexdigest()

'6c577b126af66b5593c796e60e0b19e7fc526c0702a3b563d3b6d6b1e3ce1043'