# **Hash Table**

## **Introduction:**
### Data science and machine learning workloads depend on efficient data storage and retrieval, and the Hash Table is one of the data structures that makes this possible. In this blog, we will explore the fundamentals of Hash Tables, look at their real-world applications in both fields, and build a simple implementation in Python.

## **What is a Hash Table?**
### A Hash Table, also known as a Hash Map, is a data structure that stores key-value pairs and supports insertion, retrieval, and search in constant time on average. It uses a technique called hashing: a hash function converts each key into an index in an underlying array, so the value can be accessed directly instead of scanning every entry. This keeps lookups fast even when the dataset is large, although distinct keys can hash to the same index (a collision), which the implementation has to handle.
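As a small illustration, Python's built-in `dict` is itself a hash table. The sketch below shows insertion, retrieval, and a membership test; the `inventory` name and its contents are made up for this example.

```python
# Python's built-in dict is a hash table: each key is hashed to locate its slot.
inventory = {}

# Insertion: constant time on average
inventory["apple"] = 10
inventory["banana"] = 20

# Retrieval by key: constant time on average
apple_count = inventory["apple"]

# Membership test: no scan over all entries is needed
has_cherry = "cherry" in inventory

# .get() returns a default instead of raising KeyError for a missing key
cherry_count = inventory.get("cherry", 0)

print(apple_count, has_cherry, cherry_count)  # 10 False 0
```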

## **Hash Table Applications in Data Science:**

### **Data Indexing and Searching:** Hash Tables excel at exact-match indexing and lookup. In data science, this capability proves invaluable when building search engines, recommendation systems, and data catalogs, where an index maps each term or identifier to the records that contain it. A lookup goes straight to the matching entry instead of scanning the whole dataset, which improves system performance and keeps response times low for users.
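One way to picture this is a tiny inverted index, the kind of structure a search feature might use. This is only a sketch: the `documents` corpus and the `build_inverted_index` and `search` helpers are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text
documents = {
    1: "hash tables enable fast lookup",
    2: "search engines rely on fast indexing",
    3: "hash functions map keys to buckets",
}

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the sorted ids of documents containing the word."""
    return sorted(index.get(word.lower(), set()))

index = build_inverted_index(documents)
print(search(index, "hash"))  # [1, 3]
print(search(index, "fast"))  # [1, 2]
print(search(index, "tree"))  # []
```

Each query is a single dictionary lookup, so its cost does not grow with the number of documents.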

### **Data Deduplication:** Duplicate data can be a major hurdle in data processing and analysis. Hash Tables facilitate efficient deduplication by hashing unique identifiers or attributes and storing them as keys. This approach enables the identification and removal of redundant data entries, saving storage space and enhancing data quality.
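A minimal sketch of this idea, assuming each record carries an `email` field that acts as its unique identifier (the records and the `deduplicate` helper are invented for illustration):

```python
# Hypothetical records; "email" is assumed to be the unique identifier
records = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "bo@example.com", "name": "Bo"},
    {"email": "ANA@example.com", "name": "Ana M."},
    {"email": "cy@example.com", "name": "Cy"},
]

def deduplicate(rows, key_field):
    """Keep the first row seen for each normalized key, preserving order."""
    seen = set()  # a set is a hash table that stores keys only
    unique_rows = []
    for row in rows:
        key = row[key_field].strip().lower()
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows

unique = deduplicate(records, "email")
print([row["name"] for row in unique])  # ['Ana', 'Bo', 'Cy']
```

Because each membership check is constant time on average, the whole pass is linear in the number of records rather than quadratic.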

### **Caching:** In data science pipelines, caching frequently accessed data can significantly boost computational efficiency. Hash Tables can serve as caching mechanisms, storing precomputed results or intermediate data. This reduces computational overhead and speeds up iterative processes, enabling faster data analysis and model training.
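The caching pattern can be sketched with a plain dictionary. Here `expensive_square` is a stand-in for any costly computation, and the `stats` counter exists only to show how often the real work runs; both are assumptions made for the example.

```python
# Dict-backed cache: maps an input to its previously computed result
cache = {}
stats = {"computed": 0}

def expensive_square(n):
    """Stand-in for a costly computation; counts how often it really runs."""
    stats["computed"] += 1
    return n * n

def cached_square(n):
    """Return the cached result if present, otherwise compute and store it."""
    if n not in cache:
        cache[n] = expensive_square(n)
    return cache[n]

results = [cached_square(n) for n in [4, 4, 5, 4, 5]]
print(results)            # [16, 16, 25, 16, 25]
print(stats["computed"])  # 2
```

Five requests trigger only two real computations. The standard library's `functools.lru_cache` decorator packages the same idea and adds a size limit with eviction.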

## **Hash Table Applications in Machine Learning:**

### **Feature Engineering:** Feature engineering is a critical step in building machine learning models. The hashing at the core of Hash Tables can be used to efficiently encode categorical variables into numerical representations. Each category is hashed to an index in a fixed-length vector, so no category-to-index dictionary has to be stored, which makes it possible to handle large, high-cardinality categorical features in a memory-efficient manner.
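The following sketch shows one possible way to hash a categorical value into a fixed-length vector. The helper names, the bucket count of 8, and the choice of SHA-256 are all assumptions for illustration. A cryptographic digest is used only because it is deterministic across runs, whereas Python's built-in `hash()` for strings is randomized per process by default.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Deterministically map a string to a bucket index in [0, num_buckets)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def hash_encode(category, num_buckets=8):
    """Return a fixed-length vector with a 1 at the hashed bucket index."""
    vector = [0] * num_buckets
    vector[hash_bucket(category, num_buckets)] = 1
    return vector

cities = ["Lagos", "Lima", "Oslo", "Lagos"]
encoded = [hash_encode(city) for city in cities]
for city, vector in zip(cities, encoded):
    print(city, vector)
```

The vector length stays at 8 no matter how many distinct categories appear. The trade-off is that two different categories can land in the same bucket, so the bucket count is usually chosen large enough to keep such collisions rare.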

### **Text Mining and Natural Language Processing (NLP):** Hash Tables find applications in NLP tasks, such as document classification, sentiment analysis, and information retrieval. Hash functions can convert text tokens into indices of a fixed-length numerical vector, a technique known as feature hashing or the "hashing trick." This keeps the dimensionality bounded regardless of vocabulary size and handles large text corpora efficiently, facilitating faster model training and inference.
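A rough sketch of the hashing trick for text, assuming whitespace tokenization and a 16-dimensional output (both arbitrary choices for this example). `zlib.crc32` is used as the hash function simply because it gives the same result on every run.

```python
import zlib

def hashed_bag_of_words(text, num_features=16):
    """Count tokens into a fixed-length vector using the hashing trick."""
    vector = [0] * num_features
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1
    return vector

doc_a = hashed_bag_of_words("the movie was great great fun")
doc_b = hashed_bag_of_words("the plot was slow")
print(doc_a)
print(doc_b)
```

Both documents map to vectors of the same length without building a vocabulary first. In practice, a library implementation such as scikit-learn's `HashingVectorizer` would likely be a better choice than hand-rolled code.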

### **Distributed Computing:** In distributed machine learning frameworks, Hash Tables are utilized for parallel and distributed processing. They allow for the efficient distribution of data across nodes by mapping data partitions to specific computing resources. This enables scalable and distributed training of machine learning models on large datasets, improving performance and reducing processing time.
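Hash-based partitioning can be sketched as follows. The `assign_node` and `partition` helpers, the `user-N` keys, and the three-node cluster are hypothetical; real frameworks have their own partitioners.

```python
import hashlib
from collections import defaultdict

def assign_node(key, num_nodes):
    """Pick a node by hashing the key, so every worker computes the same answer."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def partition(keys, num_nodes):
    """Group keys by the node they hash to."""
    shards = defaultdict(list)
    for key in keys:
        shards[assign_node(key, num_nodes)].append(key)
    return dict(shards)

user_ids = [f"user-{i}" for i in range(10)]
shards = partition(user_ids, num_nodes=3)
for node, keys in sorted(shards.items()):
    print(node, keys)
```

Since the assignment depends only on the key and the node count, no central lookup table is needed. One known limitation of this simple modulo scheme is that changing the number of nodes reassigns most keys.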

class Hashtable:
    def __init__(self, size=7):
        """
        Initializes a Hashtable object.

        Args:
            size (int): The size of the hashtable. Defaults to 7.
        """
        # Initialize a Hashtable object
        self.data_map = [None] * size

    def __hash(self, key):
        """
        Private method to calculate the hash value for a given key.

        Args:
            key (str): The key to calculate the hash value for.

        Returns:
            int: The hash value of the key.
        """
        # Private method to calculate the hash value for a given key
        my_hash = 0
        for letter in key:
            # Add each character's code point multiplied by 23 (a prime), then
            # take the result modulo the table size so it is a valid index
            my_hash = (my_hash + ord(letter) * 23) % len(self.data_map)
        return my_hash

    def print_table(self):
        """
        Prints the contents of the hashtable.
        """
        for i, val in enumerate(self.data_map):
            print(i, ' : ', val)

    def set_item(self, key, value):
        """
        Sets a key-value pair in the hashtable.

        Args:
            key (str): The key of the item.
            value (any): The value of the item.
        """
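        # Set a key-value pair in the hashtable
        index = self.__hash(key)
        if self.data_map[index] is None:
            # If the bucket is empty, create a new list
            self.data_map[index] = []
        for pair in self.data_map[index]:
            if pair[0] == key:
                # If the key already exists, overwrite its value
                pair[1] = value
                return
        # Otherwise, append the new key-value pair to the bucket
        self.data_map[index].append([key, value])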

    def get_item(self, key):
        """
        Retrieves the value associated with a given key from the hashtable.

        Args:
            key (str): The key to retrieve the value for.

        Returns:
            any: The value associated with the key, or None if the key does not exist.
        """
        # Retrieve the value associated with a given key from the hashtable
        index = self.__hash(key)
        if self.data_map[index] is not None:
            for stored_key, stored_value in self.data_map[index]:
                if stored_key == key:
                    # If the key is found, return its value
                    return stored_value
        # If the key is not found, return None
        return None

    def keys(self):
        """
        Retrieves all the keys in the hashtable.

        Returns:
            list: A list of all the keys in the hashtable.
        """
        # Retrieve all the keys in the hashtable
        all_keys = []
        for bucket in self.data_map:
            if bucket is not None:
                for stored_key, _ in bucket:
                    # Append each key to the list of keys
                    all_keys.append(stored_key)
        return all_keys


## **Test Code for Hash Table:**

# Create a Hashtable object
hash_table = Hashtable()

# Test the set_item() function
hash_table.set_item("apple", 10)
hash_table.set_item("banana", 20)
hash_table.set_item("cherry", 30)
hash_table.set_item("date", 40)
hash_table.set_item("mango", 50)
hash_table.set_item("pineapple", 60)
hash_table.set_item("tomato", 70)

# Test the print_table() function
hash_table.print_table()

# Test the get_item() function
print(hash_table.get_item("banana"))  # Output: 20
print(hash_table.get_item("date"))  # Output: 40
print(hash_table.get_item("grape"))  # Output: None

# Test the keys() function
print(hash_table.keys())  # Output: ['banana', 'date', 'apple', 'mango', 'cherry', 'tomato', 'pineapple']



0  :  [['banana', 20]]
1  :  None
2  :  [['date', 40]]
3  :  [['apple', 10], ['mango', 50]]
4  :  [['cherry', 30], ['tomato', 70]]
5  :  [['pineapple', 60]]
6  :  None
20
40
None
['banana', 'date', 'apple', 'mango', 'cherry', 'tomato', 'pineapple']


## **Conclusion:**
### Hash Tables are versatile data structures that are widely used in data science and machine learning. Their fast retrieval, indexing, and deduplication, together with their applications in feature engineering, NLP, caching, and distributed computing, make them indispensable tools in the modern data landscape. The implementation above shows that the core idea fits in a few dozen lines of Python: hash the key to a bucket, then store or look up the key-value pair there.