# Module 5: Searching and Hashing

## Searching

A searching algorithm typically returns `True` or `False` to indicate whether an item is present in the data.

### Sequential Search

A sequential search checks the items in a list one at a time until the desired element is found. In the best case, when the element is the first item in the list, the search is O(1). If the element is not in the list, we must traverse the entire list to confirm that, so the search is O(n). A successful search (`True`) takes about n/2 comparisons on average, which is still O(n), while an unsuccessful search (`False`) always takes all n comparisons.
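A minimal sketch of the loop described above. The function name `sequential_search` is our own, not a built-in; it returns as soon as a match is found, which is why the best case is a single comparison.

```python
def sequential_search(alist, item):
    """Return True if item is in alist, checking one element at a time."""
    for element in alist:  # visit items in order, front to back
        if element == item:
            return True  # stop as soon as the item is found
    return False  # every item was checked without a match


original_list = [54, 26, 93, 17, 77, 31]
print(sequential_search(original_list, 53))  # False: all 6 items were checked
print(sequential_search(original_list, 54))  # True: found on the first comparison
```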

Python's built-in `in` operator performs a sequential search when used on a list.

In [12]:
original_list = [54,26,93,17,77,31]

print(53 in original_list)
print(77 in original_list)

False
True


### Binary Search

Binary search is a "divide and conquer" method that halves the search space at each step. It requires the data to be sorted.

If we know our desired element is less than the middle element, we only need to search the first half. Similarly, if it is greater than the middle element, we only need to search the second half.

By comparing the target with the middle value of the list, and repeating that comparison on the remaining half until we either find the exact value or are left with an empty list, we get an O(log n) search. The trade-off is that the list must already be sorted: sorting takes O(n log n), so binary search pays off most when the same sorted data is searched many times.

The following cells illustrate the process by performing one halving step per call. This is not how binary search is normally written in practice.

In [93]:
binary_list = [6,13,14,25,33,43,51,53,64,72,84,93,95,96,97]
target = 33


def binary_search(alist, target, tlo=0, thi=None):
    """Perform ONE halving step of binary search (for illustration only).

    alist is the current sorted sublist; tlo and thi are the indices of
    alist[0] and alist[-1] in the original list. Returns the remaining
    sublist with its bounds, or a message once the search is finished.
    """
    if thi is None:
        thi = tlo + len(alist) - 1

    if not alist:
        return "Target could not be found"

    mid_index = len(alist) // 2
    if alist[mid_index] == target:
        return f"The index of the target value is {tlo + mid_index}"

    if alist[mid_index] > target:
        # Target can only be in the left half, before the middle element
        return alist[:mid_index], tlo, tlo + mid_index - 1

    # Target can only be in the right half, after the middle element
    return alist[mid_index + 1:], tlo + mid_index + 1, thi



In [97]:
first_cut, tlo, thi = binary_search(binary_list, target, 0, len(binary_list) - 1)
print(first_cut, tlo, thi)

[6, 13, 14, 25, 33, 43, 51] 0 6


In [98]:
second_cut, tlo, thi = binary_search(first_cut, target, tlo, thi)
print(second_cut, tlo, thi)

[33, 43, 51] 4 6


In [99]:
third_cut, tlo, thi = binary_search(second_cut, target,tlo,thi)
print(third_cut,tlo,thi)

[33] 4 4


In [100]:
print(binary_search(third_cut,target,tlo,thi))
print(binary_list[4])

The index of the target value is 4
33
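For comparison, here is a sketch of binary search written the way it is usually done: a single loop that narrows inclusive `low`/`high` bounds instead of slicing the list. The name `binary_search_iterative` is our own; it returns the index of the target, or `-1` if the target is absent.

```python
def binary_search_iterative(alist, target):
    """Return the index of target in the sorted list alist, or -1 if absent."""
    low, high = 0, len(alist) - 1  # inclusive bounds of the remaining search space
    while low <= high:
        mid = (low + high) // 2
        if alist[mid] == target:
            return mid
        if alist[mid] > target:
            high = mid - 1  # target can only be in the left half
        else:
            low = mid + 1  # target can only be in the right half
    return -1  # search space is empty: target is not in the list


binary_list = [6, 13, 14, 25, 33, 43, 51, 53, 64, 72, 84, 93, 95, 96, 97]
print(binary_search_iterative(binary_list, 33))  # 4
print(binary_search_iterative(binary_list, 50))  # -1
```

Returning `-1` for a missing target is one common convention. Python's standard library `bisect.bisect_left(a, x)` takes a different approach and returns the position where `x` would be inserted to keep `a` sorted.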


### Dictionary Search

A dictionary stores key-value pairs, and retrieving a value by its key takes O(1) time on average. Searching a dictionary for a key (for example, `key in d`) is therefore also O(1) on average.
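A short illustration of the difference between searching keys and searching values. The dictionary `lookup` and its labels are made up for this demo; note that `in` on a dictionary checks only keys, while `in` on `.values()` falls back to a sequential scan.

```python
# Map each number from the earlier list to a label (labels invented for this demo)
lookup = {54: "a", 26: "b", 93: "c", 17: "d", 77: "e", 31: "f"}

print(77 in lookup)            # True: `in` on a dict checks keys via hashing
print(53 in lookup)            # False
print("e" in lookup)           # False: values are not searched by `in`
print("e" in lookup.values())  # True, but this scans the values one by one
```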

## Hashing

Python dictionaries are so efficient because they utilize **hashing**.

Suppose we have a contiguous region of memory with 11 slots. We can map any item to one of those slots with a mathematical function that has only 11 possible outcomes, such as the remainder after dividing by 11.

This function would be an example of a **hash function**, which converts a key into an index.

### Basic Hash Example

In [101]:
original_list = [54,26,93,17,77,31]
table = [None] * 11  # 11 empty slots

def h(key):
    return key % 11  # remainder method: every integer key maps to a slot 0-10

for item in original_list:
    table[h(item)]=item

print(table)

[77, None, None, None, 26, 93, 17, None, None, 31, 54]
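The remainder method above only applies directly to integer keys. One simple approach for strings, assumed here for illustration, is to sum the character codes and then take the remainder. The function `hash_string` is a hypothetical name, and Python's own `hash()` uses a more sophisticated scheme.

```python
def hash_string(key, table_size=11):
    """Hash a string by summing its character codes, then taking the remainder."""
    return sum(ord(ch) for ch in key) % table_size


print(hash_string("cat"))  # 99 + 97 + 116 = 312, and 312 % 11 = 4
print(hash_string("act"))  # also 4: anagrams always land in the same slot
```

The second call shows why a hash function alone is not enough: different keys can map to the same slot.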


### Retrieving from Hash

By using the hash function, we can compute an item's index directly from its key value.

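A simple way to build a dictionary on top of this is to keep two separate lists that match 1:1, one for keys and one for values. A key and its value are stored at the same index, so hashing the key also locates the value.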
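The parallel-lists idea can be sketched as a toy map. This is a minimal sketch, assuming integer keys and the remainder hash from above; the class `SimpleMap` and its methods are hypothetical names. It refuses to store a key whose slot is already taken by a different key, since handling that case is the job of collision resolution, and it is a teaching model rather than how CPython's `dict` is laid out in memory.

```python
class SimpleMap:
    """Toy map using two parallel lists: slots[i] holds a key, data[i] its value."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size  # keys
        self.data = [None] * size   # values, aligned 1:1 with slots

    def hash_function(self, key):
        return key % self.size  # remainder method

    def put(self, key, value):
        index = self.hash_function(key)
        if self.slots[index] is not None and self.slots[index] != key:
            # A different key already occupies this slot: a collision.
            # This toy version does not resolve collisions, so it refuses.
            raise ValueError(f"collision at slot {index}")
        self.slots[index] = key
        self.data[index] = value  # value goes at the same index as its key

    def get(self, key):
        index = self.hash_function(key)
        if self.slots[index] == key:
            return self.data[index]
        return None  # key not present


m = SimpleMap()
m.put(93, "ninety-three")
m.put(26, "twenty-six")
print(m.get(93))  # ninety-three
print(m.get(23))  # None
```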

In [105]:
target = 23
print(table[h(target)]==target)
target = 93
print(table[h(target)]==target)

False
True


### Collision Resolution

If two items map to the same index (a **collision**), there are a few ways to resolve it.

**Linear probing** resolves a collision by placing the item in the next empty slot, wrapping around to the start of the table if necessary. However, this makes retrieval less efficient: a search starts at the hashed index but, if the item is not there, must keep checking consecutive slots until it either finds the item or reaches an empty slot, since the item may have been moved along by linear probing.
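A minimal sketch of linear probing with the remainder hash, assuming integer keys and no deletions (deleting items would break the "stop at an empty slot" rule). The helper names `hash_function`, `put_linear_probing` and `search_linear_probing` are hypothetical.

```python
def hash_function(key, size):
    """Remainder method, as in the basic example above."""
    return key % size


def put_linear_probing(table, key):
    """Store key at its hash slot, or at the next empty slot, wrapping around."""
    size = len(table)
    index = hash_function(key, size)
    for _ in range(size):  # each slot is tried at most once
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
        index = (index + 1) % size  # probe the next slot
    raise ValueError("hash table is full")


def search_linear_probing(table, key):
    """Return True if key is stored, probing forward until an empty slot."""
    size = len(table)
    index = hash_function(key, size)
    for _ in range(size):
        if table[index] is None:
            return False  # insertion would have used this slot, so key is absent
        if table[index] == key:
            return True
        index = (index + 1) % size
    return False  # the table is full and key was not found


probe_table = [None] * 11
for item in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    put_linear_probing(probe_table, item)
print(probe_table)  # [77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
print(search_linear_probing(probe_table, 20))  # True
```

Keys 44, 55 and 20 were all pushed out of their home slots (0, 0 and 9), so finding 20 means checking six slots instead of one.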

**Separate chaining** stores a linked list at each slot of the hash table. All items that hash to the same slot go into that slot's list, and a search follows the list to reach the colliding items.
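A minimal sketch of separate chaining, assuming integer keys and the remainder hash. For brevity, Python lists stand in for the linked lists described above; the helper names `build_chained_table` and `search_chained` are hypothetical.

```python
def build_chained_table(items, size=11):
    """Separate chaining: each slot holds a bucket of all keys that hash there."""
    table = [[] for _ in range(size)]  # one independent bucket per slot
    for item in items:
        bucket = table[item % size]
        if item not in bucket:  # avoid storing duplicates
            bucket.append(item)
    return table


def search_chained(table, key):
    """Hash to one bucket, then scan only that bucket."""
    return key in table[key % len(table)]


chained_table = build_chained_table([54, 26, 93, 17, 77, 31, 44, 55, 20])
print(chained_table[0])                   # [77, 44, 55]: three keys collided at slot 0
print(search_chained(chained_table, 20))  # True
print(search_chained(chained_table, 23))  # False
```

Unlike linear probing, colliding items never occupy other slots, so a search only ever scans a single bucket.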

The key to efficient hashing is minimizing collisions. A larger table generally produces fewer collisions but uses more memory, so table size is a trade-off.
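To see the trade-off concretely, a hypothetical helper `count_collisions` can count how many keys land in an already-occupied home slot (ignoring where probing or chaining would then put them) for a few table sizes. The load factor, items divided by slots, measures how full the table is. With these particular keys, larger tables produce fewer collisions at the cost of more empty slots; that is a general tendency, not a guarantee for every key set.

```python
def count_collisions(items, size):
    """Count items whose home slot (key % size) was already claimed by an earlier item."""
    used = set()
    collisions = 0
    for item in items:
        slot = item % size
        if slot in used:
            collisions += 1
        else:
            used.add(slot)
    return collisions


items = [54, 26, 93, 17, 77, 31, 44, 55, 20]
for size in (11, 23, 31):
    load_factor = len(items) / size  # fraction of slots the items would fill
    print(size, f"{load_factor:.2f}", count_collisions(items, size))
# 11 0.82 3
# 23 0.39 2
# 31 0.29 1
```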