# Hash Table Implementation Documentation

## Node Class

### `Node(key, value)`

- **Description:**
  - A node in a bucket's singly linked list. Each node holds one key-value pair.
- **Parameters:**
  - `key` (any): The key associated with the node.
  - `value` (any): The value associated with the node.
- **Attributes:**
  - `key` (any): The key associated with the node.
  - `value` (any): The value associated with the node.
  - `next` (Node or `None`): Reference to the next node in the same bucket, set when keys collide; `None` for the last node in the chain.
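
As a minimal sketch of how `next` chains colliding entries, the snippet below links three nodes by hand and walks the chain. The keys and values are made up for illustration.

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None  # no successor until another node is chained on


# Three hypothetical entries that are assumed to share one bucket.
head = Node(7, "a")
head.next = Node(127, "b")
head.next.next = Node(247, "c")

# Walk the chain from the head until `next` is None.
keys = []
current = head
while current is not None:
    keys.append(current.key)
    current = current.next

print(keys)  # [7, 127, 247]
```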

## HashTable Class

### `HashTable(size, dispersion_function)`

- **Description:**
  - A simple hash table that resolves collisions by separate chaining: each slot holds a linked list of nodes.
- **Parameters:**
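  - `size` (int): The number of buckets in the hash table.
  - `dispersion_function` (function): The hash function that maps a key to a bucket index. Its result is used directly as a list index, so it must return an integer from `0` to `size - 1`.
- **Attributes:**
  - `size` (int): The number of buckets in the hash table.
  - `table` (list): A list of length `size`; each slot holds the head node of a chain, or `None` if the bucket is empty.
  - `dispersion_function` (function): The hash function used for key dispersion.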
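
The constructor does not wrap the result of `dispersion_function`, so the function itself has to return a valid index. One possible way to build such a function, assuming keys are hashable Python objects, is to reduce Python's built-in `hash` modulo the table size. The helper name `make_dispersion` is hypothetical and not part of the original code.

```python
def make_dispersion(size):
    """Return a function mapping any hashable key to an index in range(size)."""
    def dispersion(key):
        # Python's % always yields a non-negative result for a positive modulus.
        return hash(key) % size
    return dispersion


dispersion = make_dispersion(120)
print(dispersion(125))               # 5, since hash(125) == 125 in CPython
print(0 <= dispersion("Ana") < 120)  # True
```

String hashes are randomized per process in CPython, so the bucket chosen for a string key can differ between runs; this does not matter for a table that lives only in memory.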

### `insert(key, value)`

- **Description:**
  - Inserts a key-value pair into the hash table.
- **Parameters:**
  - `key` (any): The key of the element to be inserted.
  - `value` (any): The value associated with the key.
- **Steps:**
  1. Calculate the hash index using the dispersion function.
  2. If the index is empty, create a new node with the key and value and place it at the index.
  3. If the index is not empty, traverse the linked list to its last node and append a new node with the key and value. Existing keys are not checked, so inserting the same key twice stores two nodes.
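
The steps above can be sketched as a standalone function over a plain list of buckets. This is an illustration rather than the class method itself; `insert_into` and `index_of` are hypothetical names.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value, self.next = key, value, None


def insert_into(table, index_of, key, value):
    index = index_of(key)                # step 1: compute the bucket index
    if table[index] is None:             # step 2: empty bucket, node becomes the head
        table[index] = Node(key, value)
        return
    current = table[index]               # step 3: walk to the tail and append
    while current.next is not None:
        current = current.next
    current.next = Node(key, value)


def index_of(key):
    return key % 4                       # assumed dispersion function


table = [None] * 4
insert_into(table, index_of, 1, "one")
insert_into(table, index_of, 5, "five")  # 5 % 4 == 1, so it collides with key 1

print(table[1].key, table[1].next.key)   # 1 5
```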

### `search(key)`

- **Description:**
  - Searches for a key in the hash table and returns the associated value.
- **Parameters:**
  - `key` (any): The key to search for.
- **Returns:**
  - `value` (any): The value associated with the key if found, otherwise `None`. A key stored with the value `None` is therefore indistinguishable from a missing key.
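
A lookup only has to scan the chain in one bucket. The sketch below shows this on a hand-built table; `search_in` and `index_of` are hypothetical names, and `SimpleNamespace` stands in for the `Node` class to keep the example short.

```python
from types import SimpleNamespace as N  # stand-in for the Node class


def search_in(table, index_of, key):
    current = table[index_of(key)]       # jump straight to the key's bucket
    while current is not None:           # then scan only that bucket's chain
        if current.key == key:
            return current.value
        current = current.next
    return None                          # chain exhausted: key is absent


def index_of(key):
    return key % 4                       # assumed dispersion function


# One bucket holding the chain 1 -> 5; the other buckets are empty.
chain = N(key=1, value="one", next=N(key=5, value="five", next=None))
table = [None, chain, None, None]

print(search_in(table, index_of, 5))     # five
print(search_in(table, index_of, 9))     # None (9 % 4 == 1, but no node has key 9)
```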

### `print()`

- **Description:**
  - Prints the contents of the hash table.
- **Steps:**
  1. Loop through each index in the hash table.
  2. If the index is not empty, print the key and value of each node in the linked list.
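
Printing every node means following each chain to its end, not stopping at the head. A minimal sketch of that traversal, written as a generator so the output can be inspected; `iter_entries` is a hypothetical helper and `SimpleNamespace` stands in for `Node`.

```python
from types import SimpleNamespace as N  # stand-in for the Node class


def iter_entries(table):
    """Yield (index, key, value) for every node, bucket by bucket."""
    for index, head in enumerate(table):
        current = head
        while current is not None:       # follow the chain, not just the head
            yield index, current.key, current.value
            current = current.next


chain = N(key=1, value="one", next=N(key=5, value="five", next=None))
table = [None, chain, None]

for index, key, value in iter_entries(table):
    print(index, key, value)
# 1 1 one
# 1 5 five
```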

## `eliminate_duplicate` Function

### `eliminate_duplicate(dataset_path)`

- **Description:**
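  - Reads a CSV file with pandas, takes the first column as the key and the second column as the value, and inserts each row into a hash table only if its key is not already present, so the first occurrence of each key is kept. The table is created with 120 buckets and an identity dispersion function, so keys are expected to be integers from 0 to 119.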
- **Parameters:**
  - `dataset_path` (str): The path to the CSV dataset file.
- **Returns:**
  - `table_hash` (HashTable): A hash table containing unique key-value pairs from the dataset.
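
A dependency-free sketch of the same idea, assuming the first column is the key and the second is the value. It swaps pandas for the standard `csv` module and the custom `HashTable` for a built-in `dict`, so it illustrates the first-occurrence-wins logic rather than the original implementation. The sample rows and the name `unique_rows` are invented for the example.

```python
import csv
import io


def unique_rows(csv_file):
    reader = csv.reader(csv_file)
    next(reader)                         # skip the header row
    unique = {}
    for row in reader:
        key, value = row[0], row[1]
        if key not in unique:            # keep only the first occurrence of a key
            unique[key] = value
    return unique


# Hypothetical data: key "0" appears twice with different values.
sample = io.StringIO("Nome,CPF\n0,111\n1,222\n0,333\n")
result = unique_rows(sample)
print(result)                            # {'0': '111', '1': '222'}
```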

## Example Usage

```python
# Example Usage
dataset_path = 'hashwork2.csv'

# Create a hash table with unique entries
table_duplicates = eliminate_duplicate(dataset_path)

# Print the unique entries in the hash table
table_duplicates.print()
```

```python
import pandas as pd


class Node:
    """A single key-value entry in a bucket's linked list."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None  # next node in the same bucket, if any


class HashTable:
    """Hash table with separate chaining."""

    def __init__(self, size, dispersion_function):
        self.size = size
        self.table = [None] * size
        # Must return an index in range(size); the result is used as-is.
        self.dispersion_function = dispersion_function

    def insert(self, key, value):
        index = self.dispersion_function(key)
        if self.table[index] is None:
            self.table[index] = Node(key, value)
        else:
            # Collision: walk to the end of the chain and append.
            current = self.table[index]
            while current.next:
                current = current.next
            current.next = Node(key, value)

    def search(self, key):
        index = self.dispersion_function(key)
        current = self.table[index]
        while current:
            if current.key == key:
                return current.value
            current = current.next
        return None

    def print(self):
        # Print every node in every non-empty bucket, not just the head.
        for head in self.table:
            current = head
            while current is not None:
                print(current.key, current.value)
                current = current.next


def eliminate_duplicate(dataset_path):
    # Identity dispersion: assumes integer keys in range(120).
    table_hash = HashTable(size=120, dispersion_function=lambda x: x)

    # read_csv opens the file itself, so no separate open() is needed.
    data = pd.read_csv(dataset_path)
    for i in range(data.shape[0]):
        key = data.iloc[i, 0]
        value = data.iloc[i, 1]
        # Insert only the first occurrence of each key.
        if table_hash.search(key) is None:
            table_hash.insert(key, value)

    return table_hash
```

```python
dataset_path = 'hashwork2.csv'
data = pd.read_csv(dataset_path)
display(data)
```

```text
Unnamed: 0,Nome,CPF
0,0,921839647
1,1,891260418
2,2,428333760
3,3,889012710
4,4,329709734
...,...,...
115,35,412670347
116,36,455566596
117,37,385159824
118,38,609404242
```


```python
table_duplicates = eliminate_duplicate(dataset_path)
table_duplicates.print()
```

```text
0 921839647
1 891260418
2 428333760
3 889012710
4 329709734
5 293922567
6 677278754
7 535599646
8 227574400
9 540762707
10 759666623
11 109874223
12 573294108
13 318974492
14 974158098
15 579693882
16 400527797
17 628966798
18 656045727
19 669405406
20 526409491
21 140981520
22 119545694
23 119021991
24 307732353
25 245137317
26 696777790
27 520153550
28 702115240
29 599920477
30 438168227
31 446442570
32 481998390
33 969285225
34 871279987
35 412670347
36 455566596
37 385159824
38 609404242
39 105555225
```
