# Appendix: The Bytes Data Type

The functions in the `hashlib` module require [bytes](https://docs.python.org/3/library/stdtypes.html#bytes-objects) objects. Strings can be converted with the `encode` and `decode` methods. For custom objects, two possible approaches are:

1. Converting the object to JSON and then encoding the JSON string to bytes
1. Implementing a `__bytes__` method

However, hashes are usually computed over plain types such as strings, which are easily converted to bytes.
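
As a minimal sketch of why the conversion matters, the snippet below passes a plain string to `hashlib.sha256` and then repeats the call with the encoded bytes. The variable names are illustrative only.

```python
import hashlib

text = "Hello World!"

# Passing a str directly is rejected: hashlib only accepts bytes-like objects.
rejected = False
try:
    hashlib.sha256(text)
except TypeError as error:
    rejected = True
    print(f"str rejected: {type(error).__name__}")

# Encoding first produces a bytes object that hashlib accepts.
digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(f"sha256: {digest}")
```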

## Data Conversions

In [74]:
data_string = "Hello World!"
data_bytes = data_string.encode("utf-8")
data_hex = data_bytes.hex()
data_decoded = data_bytes.decode("utf-8")
data_hex_bytes = bytes.fromhex(data_hex)

print(f"     Original String: {data_string}")
print(f"From String to Bytes: {data_bytes}")
print(f"   From Bytes to Hex: {data_hex}")
print(f"From Bytes to String: {data_decoded}")
print(f"   From Hex to Bytes: {data_hex_bytes}")

     Original String: Hello World!
From String to Bytes: b'Hello World!'
   From Bytes to Hex: 48656c6c6f20576f726c6421
From Bytes to String: Hello World!
   From Hex to Bytes: b'Hello World!'


### Using the `binascii` module

The [binascii module](https://docs.python.org/3/library/binascii.html) exposes two utility functions: [hexlify](https://docs.python.org/3/library/binascii.html#binascii.hexlify), which converts bytes to hex, and [unhexlify](https://docs.python.org/3/library/binascii.html#binascii.unhexlify), which does the reverse conversion.

**Important note:** the `hexlify` function returns a bytes object, whereas the `.hex()` method of bytes returns a string.
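
The difference in return types can be checked directly. The following is a small sketch (variable names are illustrative) showing that both calls produce the same hex digits, only wrapped in different types:

```python
import binascii

data_bytes = "Hello World!".encode("utf-8")

hex_from_method = data_bytes.hex()                # a str
hex_from_binascii = binascii.hexlify(data_bytes)  # a bytes object

print(type(hex_from_method).__name__, hex_from_method)
print(type(hex_from_binascii).__name__, hex_from_binascii)

# Decoding the hexlify result as ASCII yields the same string that .hex() returns.
same_digits = hex_from_binascii.decode("ascii") == hex_from_method
print(same_digits)
```

Keeping this in mind avoids comparing a `str` against a `bytes` value, which is never equal in Python 3.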

In [77]:
import binascii

data_string = "Hello World!"
data_bytes = data_string.encode("utf-8")
data_hex = binascii.hexlify(data_bytes)
data_hex_bytes = binascii.unhexlify(data_hex)

print(f"     Original String: {data_string}")
print(f"From String to Bytes: {data_bytes}")
print(f"   From Bytes to Hex: {data_hex}")
print(f"   From Hex to Bytes: {data_hex_bytes}")

     Original String: Hello World!
From String to Bytes: b'Hello World!'
   From Bytes to Hex: b'48656c6c6f20576f726c6421'
   From Hex to Bytes: b'Hello World!'


## Examples

### Example with plain strings

In [45]:
import hashlib

data = "Hello World!"
data_bytes = data.encode("utf-8")
data_decoded = data_bytes.decode("utf-8")
data_hashed = hashlib.sha256(data_bytes).hexdigest()
print(f"Original: {data}")
print(f" Encoded: {data_bytes}")
print(f" Decoded: {data_decoded}")
print(f"  Hashed: {data_hashed}")

Original: Hello World!
 Encoded: b'Hello World!'
 Decoded: Hello World!
  Hashed: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069


### Example with custom objects

In [46]:
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass
class Person:
    first_name: str
    last_name: str
    
    def __bytes__(self):
        dictionary_representation = asdict(self)
        json_representation = json.dumps(dictionary_representation)
        return json_representation.encode("utf-8")
    
    @classmethod
    def from_bytes(cls, bytes_object):
        string_representation = bytes_object.decode("utf-8")
        dictionary_representation = json.loads(string_representation)
        return cls(**dictionary_representation)
    
person = Person("John", "Doe")
person_bytes = bytes(person)
person_decoded = Person.from_bytes(person_bytes)
person_hashed = hashlib.sha256(person_bytes).hexdigest()

print(f"Original: {person}")
print(f" Encoded: {person_bytes}")
print(f" Decoded: {person_decoded}")
print(f"  Hashed: {person_hashed}")

Original: Person(first_name='John', last_name='Doe')
 Encoded: b'{"first_name": "John", "last_name": "Doe"}'
 Decoded: Person(first_name='John', last_name='Doe')
  Hashed: fee485b19074e0b0b2856ae5f27fcdd67ff12204cbff73c5eaa10b1aac887042
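
When the JSON route is used for hashing, the serialized bytes depend on key order. The sketch below assumes a hypothetical helper, `to_canonical_bytes` (not part of the original example), that passes `sort_keys=True` and fixed separators to `json.dumps` so that equal data always yields equal bytes and therefore equal hashes:

```python
import hashlib
import json

def to_canonical_bytes(obj_dict):
    # Hypothetical helper: sort_keys and fixed separators make the output
    # independent of key insertion order and of default whitespace.
    return json.dumps(obj_dict, sort_keys=True, separators=(",", ":")).encode("utf-8")

person_a = {"first_name": "John", "last_name": "Doe"}
person_b = {"last_name": "Doe", "first_name": "John"}  # same data, different order

hash_a = hashlib.sha256(to_canonical_bytes(person_a)).hexdigest()
hash_b = hashlib.sha256(to_canonical_bytes(person_b)).hexdigest()
print(hash_a == hash_b)

# Without sort_keys, the serialized bytes follow insertion order and differ.
plain_a = json.dumps(person_a).encode("utf-8")
plain_b = json.dumps(person_b).encode("utf-8")
print(plain_a == plain_b)
```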


## Random Bytes

There are many ways to generate randomness or pseudo-randomness, some of which are considered **insecure** and others **secure**.
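
As a rough illustration of the distinction, the sketch below contrasts the seedable `random` module with `secrets`. The seed value and sizes are arbitrary choices for the example:

```python
import random
import secrets

# Two generators seeded with the same value produce identical "random" output.
# Anyone who learns or guesses the seed can reproduce the sequence, which is
# why the random module is unsuitable for secrets.
first = random.Random(42).getrandbits(128)
second = random.Random(42).getrandbits(128)
print(first == second)

# secrets draws from the operating system's randomness source and offers
# no way to seed it, so its output is not reproducible.
token_a = secrets.token_bytes(16)
token_b = secrets.token_bytes(16)
print(len(token_a), len(token_b))
```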

### Using the `secrets` module

Python 3.6 introduced the [`secrets`](https://docs.python.org/3/library/secrets.html) module to conveniently generate several types of **secure** random bytes.

The relevant functions are the `token_*` functions, each of which takes a length parameter (the number of random bytes). The more bytes, the harder the token is to guess; see [this resource](https://docs.python.org/3/library/secrets.html#how-many-bytes-should-tokens-use) for more information. Moreover, this [video](https://www.youtube.com/watch?v=S9JGmA5_unY) illustrates how secure 32 bytes (256 bits) of randomness is.

A hexadecimal token uses 2 characters per byte. To generate shorter strings that can still be inserted in a URL (e.g. for password reset tokens), `token_urlsafe` can be used. It Base64-encodes the bytes with a URL-safe alphabet, using about 1.3 characters per byte, so the resulting string is roughly a third shorter than the hexadecimal one.
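
The length difference can be checked directly. This is a small sketch using 32 random bytes, an arbitrary size chosen for the example:

```python
import secrets

nbytes = 32
hex_token = secrets.token_hex(nbytes)
url_token = secrets.token_urlsafe(nbytes)

print(len(hex_token))  # 64: two hex characters per byte
print(len(url_token))  # 43: unpadded URL-safe Base64 of 32 bytes

# The URL-safe alphabet is limited to letters, digits, '-' and '_'.
url_alphabet = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_")
print(set(url_token) <= url_alphabet)
```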

There are other ways to generate random bytes in Python, but using `secrets` has been common practice since Python 3.6. For other options, see this [detailed answer](https://stackoverflow.com/questions/42190663/questions-about-python3-6-os-urandom-os-getrandom-secrets).

In [55]:
import secrets

length = 15

print(f"           Secure Random Bytes: {secrets.token_bytes(length)}")
print(f"     Secure Random Bytes (Hex): {secrets.token_hex(length)}")
print(f"Secure Random Bytes (URL-safe): {secrets.token_urlsafe(length)}") # Base64, not hex. See URLSafe Chapter

           Secure Random Bytes: b'\xf8J\x96\x89\xa50\xb0\xf5;c=\x84DZ\x03'
     Secure Random Bytes (Hex): d5dc04032cc6028e220917821c3922
Secure Random Bytes (URL-safe): 5exkWi2QSTfWPKwWt3BA


### Comparing Secrets

To avoid [timing attacks](https://www.wikiwand.com/en/Timing_attack), it is important to **NOT** use `==` when comparing secrets. For that purpose the `secrets` module exposes the function [`compare_digest`](https://docs.python.org/3/library/secrets.html#secrets.compare_digest), which is the [`hmac`](https://docs.python.org/3/library/hmac.html) module's [analogous function](https://docs.python.org/3/library/hmac.html#hmac.compare_digest) re-exported.

For a demonstration of this type of attack, see this [demo](https://www.youtube.com/watch?v=XThL0LP3RjY).
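
To see what such an attack exploits, the sketch below uses a hypothetical `naive_equal` function (not a library API) that stops at the first mismatching byte and reports how many bytes it examined. Actual timings are noisy, so the count of examined bytes stands in for elapsed time:

```python
import hmac

def naive_equal(a: bytes, b: bytes):
    """Early-exit comparison; returns (result, number of bytes examined)."""
    if len(a) != len(b):
        return False, 0
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return False, index + 1
    return True, len(a)

real_token = b"s3cr3t-token-value"
guess_wrong_start = b"X3cr3t-token-value"
guess_wrong_end = b"s3cr3t-token-valuX"

# The amount of work leaks how many leading bytes of the guess were correct.
print(naive_equal(real_token, guess_wrong_start))  # (False, 1)
print(naive_equal(real_token, guess_wrong_end))    # (False, 18)

# compare_digest is designed so that its running time does not depend on
# where the first difference occurs.
print(hmac.compare_digest(real_token, guess_wrong_start))
print(hmac.compare_digest(real_token, guess_wrong_end))
```

An attacker who can measure the work done by the naive version can recover a secret one byte at a time, which is the leak `compare_digest` is meant to close.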

In [76]:
import secrets

# Excessively large length for better illustration
length = 1000
real_token = secrets.token_bytes(length)
guess_token_all_wrong = secrets.token_bytes(length)
# Flip the bits of the last byte so the guess is guaranteed to differ in exactly one byte
guess_token_all_but_one = real_token[:-1] + bytes([real_token[-1] ^ 0xFF])

print(f"Does the all-wrong guess match? {secrets.compare_digest(real_token, guess_token_all_wrong)}")
print(f"Does the all-but-one guess match? {secrets.compare_digest(real_token, guess_token_all_but_one)}")
print(f"Does the real token match? {secrets.compare_digest(real_token, real_token)}")

Does the all-wrong guess match? False
Does the all-but-one guess match? False
Does the real token match? True
