# PostgreSQL & Python Hash Functions

In this notebook, we explore hash functions for textual data in PostgreSQL and Python:
- PostgreSQL: `digest()` with 'md5' and 'sha256' from the `pgcrypto` extension.
- Python equivalents: `hashlib.md5()`, `hashlib.sha256()`
- Concept of hash collisions
- Example: storing hash values for integrity checks
- Educational exercise: find collisions in a trivial hash function

We'll use a dedicated table `hash_examples` for demonstration.

In [8]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [9]:
%sql postgresql://fahad:secret@localhost:5432/people

---
## 1. Create a Table for Hash Examples

We'll create `hash_examples` to hold sample strings and store their hash values.

In [10]:
%%sql
DROP TABLE IF EXISTS hash_examples;
CREATE TABLE hash_examples (
    id SERIAL PRIMARY KEY,
    sample_text VARCHAR(100)
);

 * postgresql://fahad:***@localhost:5432/people
Done.
Done.


[]

In [11]:
%%sql
INSERT INTO hash_examples (sample_text) VALUES
('password123'),
('hello world'),
('fahad_shah'),
('pythonrocks');

 * postgresql://fahad:***@localhost:5432/people
4 rows affected.


[]

---
## 2. PostgreSQL Hash Functions

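The `digest()` function used below is provided by the `pgcrypto` extension, which must be enabled first. (PostgreSQL also ships a built-in `md5()` function; `digest()` covers several algorithms through one interface.) `digest()` returns a binary `bytea` value, so the query wraps it in `encode(..., 'hex')` to get a readable hex string.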

In [12]:
%%sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

 * postgresql://fahad:***@localhost:5432/people
Done.


[]

In [13]:
%%sql
SELECT
    sample_text,
    ENCODE(DIGEST(sample_text, 'md5'), 'hex') AS md5_hash,
    ENCODE(DIGEST(sample_text, 'sha256'), 'hex') AS sha256_hash
FROM hash_examples;

 * postgresql://fahad:***@localhost:5432/people
4 rows affected.


sample_text,md5_hash,sha256_hash
password123,482c811da5d5b4bc6d497ffa98491e38,ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
hello world,5eb63bbbe01eeed093cb22bb8f5acdc3,b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
fahad_shah,49b53a7757d01474f6b9b60e60c4930e,fe33388b1237c29b92510e76609d7850e39133f4849ab791ef7947ca841b2a6d
pythonrocks,f553a2caad5cfa4673928ad9e507ac51,426a8d3a8b027fefd39ddb2b66a7c697e229ba1e75b8da2abbbfd622444f3b7e
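The `ENCODE(DIGEST(...), 'hex')` pattern above separates two steps: computing the raw digest bytes and rendering them as hex text. As a rough Python analogy (a sketch, not part of the original notebook), `hashlib` exposes the same distinction through `digest()` and `hexdigest()`. The variable names here are illustrative.

```python
import hashlib

text = "hello world"
data = text.encode("utf-8")  # hash functions operate on bytes, not on str

# Raw digests: fixed-size byte strings, comparable to the bytea returned by DIGEST()
raw_md5 = hashlib.md5(data).digest()
raw_sha256 = hashlib.sha256(data).digest()

# Hex rendering: two characters per byte, comparable to ENCODE(..., 'hex')
hex_md5 = raw_md5.hex()
hex_sha256 = raw_sha256.hex()

print(len(raw_md5), len(hex_md5))        # 16 bytes -> 32 hex characters
print(len(raw_sha256), len(hex_sha256))  # 32 bytes -> 64 hex characters
print(hex_md5)
print(hex_sha256)
```

The output length depends only on the algorithm, never on the length of the input.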


---
## 3. Python `hashlib` Comparison

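Python's standard-library `hashlib` module produces exactly the same hash values, because MD5 and SHA-256 are standardized algorithms. The match depends on hashing the same bytes: `t.encode()` defaults to UTF-8, and these ASCII-only samples have the same byte representation in PostgreSQL under any ASCII-compatible database encoding.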

In [14]:
import hashlib

texts = ['password123', 'hello world', 'fahad_shah', 'pythonrocks']

for t in texts:
    md5 = hashlib.md5(t.encode()).hexdigest()
    sha256 = hashlib.sha256(t.encode()).hexdigest()
    print(f"Text: {t}\nMD5: {md5}\nSHA256: {sha256}\n")

Text: password123
MD5: 482c811da5d5b4bc6d497ffa98491e38
SHA256: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

Text: hello world
MD5: 5eb63bbbe01eeed093cb22bb8f5acdc3
SHA256: b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

Text: fahad_shah
MD5: 49b53a7757d01474f6b9b60e60c4930e
SHA256: fe33388b1237c29b92510e76609d7850e39133f4849ab791ef7947ca841b2a6d

Text: pythonrocks
MD5: f553a2caad5cfa4673928ad9e507ac51
SHA256: 426a8d3a8b027fefd39ddb2b66a7c697e229ba1e75b8da2abbbfd622444f3b7e
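The introduction lists storing hash values for integrity checks as a use case. A minimal sketch of that idea follows, assuming the stored digests are the SHA-256 values printed above; the dictionary and the helper name `matches_stored_digest` are hypothetical stand-ins for a real lookup against `hash_examples`.

```python
import hashlib
import hmac

# Digests copied from the output above; in practice they would be read back from storage.
stored_sha256 = {
    "hello world": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
    "password123": "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f",
}

def matches_stored_digest(data: bytes, expected_hex: str) -> bool:
    """Recompute the SHA-256 digest of data and compare it with the stored hex value."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two hex strings
    return hmac.compare_digest(actual_hex, expected_hex)

expected = stored_sha256["hello world"]

unchanged = matches_stored_digest("hello world".encode("utf-8"), expected)
modified = matches_stored_digest("hello world!".encode("utf-8"), expected)
# Same characters, different bytes: the encoding is part of what gets hashed
other_encoding = matches_stored_digest("hello world".encode("utf-16"), expected)

print(unchanged, modified, other_encoding)
```

Only the unchanged UTF-8 input matches. Both a one-character edit and a different text encoding change the bytes, and therefore the digest.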



---
## 4. Concept: Hash Collisions

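- A **collision** occurs when two different inputs produce the same hash output.
- Because a hash maps arbitrarily many inputs to a fixed-size output, collisions must exist for every hash function. For SHA-256 none is publicly known and finding one is considered computationally infeasible; for simple, non-cryptographic hash functions they are easy to find.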
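How hard collisions are to find depends heavily on the output size. As an illustration (a sketch, not a statement about full SHA-256), the snippet below deliberately weakens SHA-256 by keeping only its first 2 bytes. With 65,536 possible outputs, a collision is guaranteed within 65,537 distinct inputs and usually appears far sooner. The helper names and the `msg-N` inputs are made up for this example.

```python
import hashlib

def truncated_sha256(text: str, n_bytes: int = 2) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a deliberately weakened hash)."""
    return hashlib.sha256(text.encode("utf-8")).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash 'msg-0', 'msg-1', ... until two inputs share a truncated digest."""
    seen = {}  # truncated digest -> first input that produced it
    counter = 0
    while True:
        candidate = f"msg-{counter}"
        key = truncated_sha256(candidate, n_bytes)
        if key in seen:
            # Return both colliding inputs and how many inputs were hashed in total
            return seen[key], candidate, counter + 1
        seen[key] = candidate
        counter += 1

first, second, attempts = find_collision()
print(f"{first!r} and {second!r} collide after {attempts} inputs")
print(truncated_sha256(first).hex(), truncated_sha256(second).hex())
```

The two inputs agree only on the truncated value; their full 256-bit digests still differ.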

In [15]:
# Educational: Trivial hash function (sum of ord values modulo 10)
def trivial_hash(s):
    return sum(ord(c) for c in s) % 10

examples = ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
for e in examples:
    print(f"{e} -> hash: {trivial_hash(e)}")

# Notice the collisions: every permutation of 'abc' hashes to the same value because the character sum ignores order

abc -> hash: 4
acb -> hash: 4
bac -> hash: 4
bca -> hash: 4
cab -> hash: 4
cba -> hash: 4
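Permutations are not the only source of collisions here. With just 10 possible outputs, any 11 distinct strings must contain at least one colliding pair. One way to look for collisions, sketched below, is to group candidate strings by their hash value. The candidate list and the helper `group_by_hash` are illustrative choices; `trivial_hash` is repeated so the snippet runs on its own.

```python
from collections import defaultdict

def trivial_hash(s):
    # Same trivial hash as above: sum of character codes modulo 10
    return sum(ord(c) for c in s) % 10

def group_by_hash(strings):
    """Map each hash value to the list of strings that produce it."""
    buckets = defaultdict(list)
    for s in strings:
        buckets[trivial_hash(s)].append(s)
    return dict(buckets)

# 'ad' and 'bc' are not permutations of each other but have the same character sum (197)
candidates = ['ad', 'bc'] + [f"item{i}" for i in range(11)]

buckets = group_by_hash(candidates)
# Any bucket holding more than one string is a set of colliding inputs
collisions = {h: group for h, group in buckets.items() if len(group) > 1}

for h, group in sorted(collisions.items()):
    print(f"hash {h}: {group}")
```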


---
## Notes

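- MD5 is cryptographically broken (practical collision attacks exist) and should not be used for security purposes; use SHA-256 or stronger.
- PostgreSQL's `pgcrypto` extension provides in-database hashing (`digest()`), password hashing (`crypt()` with `gen_salt()`), and encryption functions.
- Hashing is one-way: no function recovers the original text from a hash. Short or common inputs such as `password123` can still be recovered by hashing guesses and comparing the results, so a hash does not protect a weak value.
- Never store passwords in plaintext. A bare MD5 or SHA-256 hash is not sufficient either, because it is fast to compute and unsalted. Use a salted, deliberately slow password-hashing function, such as `crypt()` with `gen_salt('bf')` in `pgcrypto`, or `hashlib.pbkdf2_hmac()` or `hashlib.scrypt()` in Python.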
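
The note on password storage can be illustrated with a minimal sketch of salted password hashing. It assumes PBKDF2-HMAC-SHA256 from `hashlib`; the iteration count, the 16-byte salt, and the function names are assumptions chosen for this example, not recommendations from the notebook.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch; tune for your hardware and policy

def hash_password(password, salt=None):
    """Derive a key from the password with PBKDF2-HMAC-SHA256 and a random salt."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each stored password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password, salt, expected):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)

# Store the salt and the derived key together; the password itself is never stored
salt, stored = hash_password("password123")

print(verify_password("password123", salt, stored))  # correct password
print(verify_password("password124", salt, stored))  # wrong password

# The same password hashed again gets a new salt, so the stored value differs
salt2, stored2 = hash_password("password123")
print(stored != stored2)
```

Because each password gets its own salt, identical passwords no longer produce identical stored values, and the iteration count makes each guess more expensive for an attacker.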