<a href="https://colab.research.google.com/github/V-Nayak/python-basics/blob/main/Hashing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **What Is Hashing in Python?**

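Hashing converts input data, such as a string, file, or object, into a fixed-size value, typically an integer or a short sequence of bytes. This value, called the hash or digest, is reproducible: the same input always yields the same output. It is not guaranteed to be unique, because two different inputs can occasionally map to the same hash.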

Hashing plays a significant role in detecting data manipulation and enhancing security. An application can compute a hash value for a file, message, or other piece of data, store that hash securely, and later recompute it to verify that the data has not been tampered with.

## **Python’s Built-In Hashing Function**

Python’s built-in hashing function, hash(), returns an integer value representing the input object. Python uses this hash value to determine the object’s location in a hash table, the data structure that implements dictionaries and sets.

In [5]:
my_string = "hello world"

# Calculate the hash value of the string
hash_value = hash(my_string)

# Print the string and its hash value
print("String: ", my_string)
print("Hash value: ", hash_value)

String:  hello world
Hash value:  4727557598447582474


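The hash value stays the same within a single Python process. For strings and bytes, it usually differs when the code runs in a new process, because Python randomizes these hashes with a per-process seed unless the PYTHONHASHSEED environment variable is set.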

In [6]:
my_string = "hello world"

# Calculate the hash value of the string
hash_value1 = hash(my_string)
hash_value2 = hash(my_string)

# Print the string and both hash values
print("String: ", my_string)
print("Hash value: ", hash_value1)
print("Hash value: ", hash_value2)

String:  hello world
Hash value:  4727557598447582474
Hash value:  4727557598447582474
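
The difference between processes is hard to see inside one notebook session, so the following sketch starts fresh interpreters with an explicit PYTHONHASHSEED. The helper name `hash_in_new_process` is hypothetical and only serves this illustration; the seeds 1 and 2 are arbitrary.

```python
import os
import subprocess
import sys


def hash_in_new_process(text, seed):
    """Return hash(text) as computed by a fresh interpreter started with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


same_seed_a = hash_in_new_process("hello world", 1)
same_seed_b = hash_in_new_process("hello world", 1)
other_seed = hash_in_new_process("hello world", 2)

# Same seed: the two processes agree on the hash value.
print(same_seed_a == same_seed_b)
# Different seed: the hash value is expected to change.
print(same_seed_a == other_seed)
```

Because of this randomization, values returned by hash() should not be written to disk or shared between programs.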


## **Hashing Use Cases**
With a suitable hashing function, you can cover a range of use cases. Hashing functions work well for:

Password storage —
Hashing is one of the best ways to store user passwords in modern systems. Instead of saving the password itself, an application stores a salted hash of it, which Python can compute with standard library modules such as hashlib.

Caching —
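Hashing a function’s arguments produces a key under which the function’s output can be stored, which saves time when the function is called again with the same arguments.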

Data retrieval —
Python’s built-in dictionary is implemented as a hash table, which lets it retrieve values by key quickly.

Digital signatures —
Digital signature schemes typically sign a hash of the message rather than the message itself, so hashing helps verify that a signed message is authentic and unaltered.

File integrity checks —
Hashing can check a file’s integrity during its transfer and download.
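
As a small illustration of the caching use case, the sketch below uses `functools.lru_cache` from the standard library, which stores results in a dictionary keyed by the call arguments and therefore requires those arguments to be hashable. The function `slow_square` and the counter `call_count` are made up for this example.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n):
    """Pretend-expensive computation whose results are cached by argument."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed and stored in the cache
second = slow_square(12)  # served from the cache, body does not run again

print(first, second)
print("Function body executed", call_count, "time(s)")

# Unhashable arguments such as lists cannot be used as cache keys.
try:
    slow_square([1, 2, 3])
except TypeError as error:
    print("Cannot cache:", error)
```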

## **Limitations of Hashing**
Although Python’s built-in hash() function works well for tasks such as dictionary lookups, its limitations make it unsuitable for security purposes. Here’s how:

Collision attacks —
A collision occurs when two different inputs produce the same hash value. An attacker who can deliberately construct colliding inputs could bypass security measures that rely on hash values for authentication or data integrity checks.

Limited input size —
Since hash functions produce a fixed-size output regardless of the input’s size, there are more possible inputs than possible outputs, so some distinct inputs must share a hash value. The shorter the output, the easier such collisions are to find.

Predictability —
A hash function is deterministic by design, giving the same output every time you provide the same input. Attackers can take advantage of this by precomputing hash values for many likely inputs, such as common passwords, and then comparing them to target hashes to find a match. Rainbow tables are a well-known form of this precomputation attack.

To prevent attacks and keep your data safe, use secure **hashing algorithms** designed to resist such vulnerabilities.
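
One common defense against precomputed tables is to mix a random salt into each password before hashing it with a deliberately slow function. The following is a minimal sketch using `hashlib.pbkdf2_hmac` from the standard library. The helper names `hash_password` and `verify_password`, the 16-byte salt, and the iteration count are assumptions chosen for illustration, not recommended production settings.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value only; tune for real systems


def hash_password(password, salt=None):
    """Return (salt, digest) for the password, generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt_a, digest_a = hash_password("correct horse")
salt_b, digest_b = hash_password("correct horse")

# The same password hashed with two different salts gives two different digests,
# so a single precomputed table cannot cover both.
print(digest_a != digest_b)
print(verify_password("correct horse", salt_a, digest_a))
print(verify_password("wrong guess", salt_a, digest_a))
```

The salt is stored next to the digest; it does not need to be secret, only unique per password.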

## **Using hashlib for Secure Hashing in Python**
Instead of using the built-in Python hash(), use **hashlib** for more secure hashing. This module from Python’s standard library offers a variety of hash algorithms, including MD5, SHA-1, and the more secure SHA-2 family (SHA-256, SHA-384, SHA-512, and others). MD5 and SHA-1 remain available mainly for compatibility: practical collision attacks exist against both, so prefer SHA-2 for anything security-sensitive.
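
To compare these algorithms side by side, the short sketch below hashes the same bytes with each one through `hashlib.new` and reports the digest length in bits. The input `b"welcome"` and the dictionary name `digest_bits` are arbitrary choices for this example.

```python
import hashlib

algorithms = ("md5", "sha1", "sha256", "sha384", "sha512")
digest_bits = {}

for name in algorithms:
    # hashlib.new looks up an algorithm by name and hashes the given bytes.
    digest = hashlib.new(name, b"welcome").digest()
    digest_bits[name] = len(digest) * 8  # bytes to bits
    print(f"{name}: {digest_bits[name]} bits")
```

The output size is fixed per algorithm and does not depend on the length of the input.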

## **MD5**
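The widely used MD5 algorithm produces **a 128-bit hash value.** Because practical collision attacks against MD5 are known, use it only for non-security purposes such as detecting accidental corruption.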

In [7]:
import hashlib

text = "welcome"
hash_object = hashlib.md5(text.encode())
print(hash_object.hexdigest())

40be4e59b9a2a2b5dffb918c0e86b3d7


## **SHA-1**
The SHA-1 hash function produces **a 160-bit hash value**. Like MD5, it is no longer considered collision-resistant and should be avoided in new security-sensitive code.

In [8]:
text = "Welcome"
hash_object = hashlib.sha1(text.encode())
print(hash_object.hexdigest())

ca4f9dcf204e2037bfe5884867bead98bd9cbaf8


## **SHA-256**
The SHA-2 family offers several hash functions. The hashlib sha256 constructor uses one of them, SHA-256, which produces a 256-bit hash value and is considerably more secure than MD5 or SHA-1.

Programmers often use SHA-256 in cryptographic applications such as digital signatures and message authentication codes.

In [10]:
hash_object = hashlib.sha256(text.encode())
print(hash_object.hexdigest())

0e2226b5235f0ff94a276eb4d07a3bfea74b7e3b8b85e9efca6c18430f041bf8
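
For the message authentication code use case, the standard library `hmac` module can be combined with SHA-256. The sketch below is an illustration under stated assumptions: the key, the message, and the helper name `is_authentic` are invented for this example, and a real key would come from a secure source rather than the code itself.

```python
import hashlib
import hmac

secret_key = b"demo-key-not-for-production"  # placeholder key for illustration
message = b"transfer 100 credits to account 42"

# The sender computes a tag over the message using the shared key.
tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def is_authentic(key, received_message, received_tag):
    """Recompute the tag and compare it to the received one in constant time."""
    expected = hmac.new(key, received_message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_tag)


print(is_authentic(secret_key, message, tag))
# A modified message no longer matches the original tag.
print(is_authentic(secret_key, b"transfer 900 credits to account 42", tag))
```

Unlike a plain SHA-256 digest, the tag cannot be recomputed by someone who does not know the key.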


Not every use of hashing is about security. For feature hashing, the **MurmurHash function** is often used because it is fast and tends to produce relatively few collisions. It is not a cryptographic hash, so it should not be relied on for authentication or data integrity checks.