# Search

## Linear Search and Using Indirection to Access Elements

In [0]:
import time
import sys

The *python_search()* function times Python's *in* operator on a list, to check whether its running time is consistent with a linear search for the element in the sequence.

In [0]:
def python_search(L, e):
  print('List length:', len(L))
  print('Size:', sys.getsizeof(L))
  print('Searching for', e)
  t1 = time.time()
  if e in L:
    print('Found it')
  else:
    print('Does not exist in the list')
  t2 = time.time()
  print('Time consumed:', t2 - t1)

For packing and unpacking in Python, see [this Stack Overflow question](https://stackoverflow.com/questions/6967632/unpacking-extended-unpacking-and-nested-extended-unpacking).

In [3]:
L1 = [*range(1000000)]
L2 = [*range(10000000)]
L3 = [*range(100000000)]
python_search(L1, 999999)
python_search(L2, 9999999)
python_search(L3, 99999999)

List length: 1000000
Size: 9000112
Searching for 999999
Found it
Time consumed: 0.012149810791015625
List length: 10000000
Size: 90000112
Searching for 9999999
Found it
Time consumed: 0.12440061569213867
List length: 100000000
Size: 900000112
Searching for 99999999
Found it
Time consumed: 1.1572279930114746
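
In the output above, the time grows roughly tenfold each time the list length grows tenfold, which is what a linear search would produce. As a point of comparison, here is a minimal sketch of a linear search. The names `linear_search` and `count_examined` are made up for this illustration, and the sketch is not a description of how CPython implements *in* internally.

```python
def linear_search(L, e):
    """Return True if e is in L, examining elements left to right."""
    for item in L:
        if item == e:
            return True
    return False

def count_examined(L, e):
    """Return how many elements are examined before the search stops."""
    examined = 0
    for item in L:
        examined += 1
        if item == e:
            break
    return examined

small = list(range(1000))
print(linear_search(small, 999))    # target is the last element
print(count_examined(small, 999))   # every element is examined
print(count_examined(small, 0))     # target is the first element
```

Searching for the last element, as the cells above do, is the worst case for a successful search.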


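*linear_search_sorted()* exploits the fact that the list is sorted: it stops as soon as it reaches an element larger than *e*. This improves the average running time. 
However, it does not change the worst-case complexity of the algorithm. 
In the worst case, each element of L is examined.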

In [0]:
def linear_search_sorted(L, e):
  """Assumes L is a list, the elements of which are in ascending order.
    Returns True if e is in L and False otherwise"""
  print('List length:', len(L))
  print('Size:', sys.getsizeof(L))
  print('Searching for', e)
  t1 = time.time()       
  for i in range(len(L)):
    if L[i] == e:
      t2 = time.time()
      print('Time consumed:', t2 - t1)
      return True
    if L[i] > e:
      t2 = time.time()
      print('Time consumed:', t2 - t1)
      return False
  t2 = time.time()      
  print('Time consumed:', t2 - t1)
  return False

In [0]:
print(linear_search_sorted(L1, 99999))
print(linear_search_sorted(L1, 55555))
print(linear_search_sorted(L1, 9999999))

## Binary Search
### Pre-condition
Binary search assumes that the list to search is sorted in ascending order.

### Steps
1. Pick an index, i, that divides the list L roughly in half.
2. Ask if L[i] == e.
3. If not, ask whether L[i] is larger or smaller than e.
4. Depending upon the answer, search either the left or right half of L for e.
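
The steps above can be sketched as a loop. This is a minimal iterative variant written for illustration; the name `binary_search_iter` is an assumption and does not appear elsewhere in this notebook.

```python
def binary_search_iter(L, e):
    """Assumes L is sorted in ascending order. Returns True if e is in L."""
    low, high = 0, len(L) - 1
    while low <= high:
        mid = (low + high) // 2   # step 1: index that splits the range in half
        if L[mid] == e:           # step 2: is this the element?
            return True
        if L[mid] > e:            # steps 3 and 4: continue in the left half
            high = mid - 1
        else:                     # or continue in the right half
            low = mid + 1
    return False                  # range is empty, so e is not in L

print(binary_search_iter([1, 3, 5, 7, 9], 7))
print(binary_search_iter([1, 3, 5, 7, 9], 4))
```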

### Why not call b_search() directly?
*binary_search_rec()* is a **wrapper function**. It does no searching itself; it provides a simple interface for client code and passes the call through to *b_search()*.

Clients do not call *b_search()* directly because callers should not be concerned with the parameters *low* and *high*. These parameters are implementation details and should be hidden behind the abstraction.

The complexity of *b_search()* depends on the number of recursive calls, which is O(log(len(L))) because each call roughly halves the range being searched.

### Decrementing function
- The decrementing function for *b_search()* is high - low.
- On each recursive call, the value of the decrementing function is strictly less than its value on the previous call.
- When its value reaches 0, the recursion terminates.
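
To make the decrementing function visible, the sketch below records high - low on every call of a recursive binary search. The helper `trace_b_search` and its `gaps` parameter are hypothetical names introduced only for this illustration.

```python
def trace_b_search(L, e, low, high, gaps):
    """Recursive binary search that appends high - low to gaps on each call."""
    gaps.append(high - low)
    if high == low:
        return L[low] == e
    mid = (low + high) // 2
    if L[mid] == e:
        return True
    if L[mid] > e:
        if low == mid:  # nothing left to search
            return False
        return trace_b_search(L, e, low, mid - 1, gaps)
    return trace_b_search(L, e, mid + 1, high, gaps)

data = list(range(16))
gaps = []
found = trace_b_search(data, 15, 0, len(data) - 1, gaps)
print(found, gaps)  # True [15, 7, 3, 1, 0]
```

The recorded values shrink by roughly half on each call, which is why the number of calls grows with the logarithm of the list length.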


In [0]:
def binary_search_rec(L, e):
    """Assumes L is a list, the elements of which are in ascending order.
       Returns True if e is in L and False otherwise"""
    
    def b_search(L, e, low, high):
        # Decrements high - low
        if high == low:
            return L[low] == e
        mid = (low + high) // 2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid: # nothing left to search
                return False
            else:
                return b_search(L, e, low, mid - 1)
        else:
            return b_search(L, e, mid + 1, high)
        
    if len(L) == 0:
        return False
    else:
        return b_search(L, e, 0, len(L) - 1)

In [0]:
binary_search_rec(L1, 999999)

# Sorting

## list.sort() vs sorted()

In [0]:
import random
random_list = random.sample(range(1000), 10)

In [0]:
sorted_random_list = sorted(random_list)

In [0]:
print(random_list)
print(sorted_random_list)

In [0]:
random_list.sort()

In [0]:
random_list

### Selection Sort
Loop invariant: the list is partitioned into a sorted prefix L[0:i] and a suffix L[i:len(L)], and no element of the prefix is larger than the smallest element of the suffix.

- Base Case: At start of first iteration, prefix is empty, suffix is the entire list.
- Induction step: move one element from the suffix to the prefix.
  - Append the minimum element of the suffix to the end of the prefix.
- Termination: When loop is exited, prefix includes the entire list and suffix is empty.
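
The invariant can be checked mechanically. The sketch below is a selection sort variant that asserts the invariant at the start of every pass. The name `sel_sort_checked` is an assumption for this example, and the assertions add extra work that a real sort would not do.

```python
def sel_sort_checked(L):
    """Sorts L in place in ascending order, asserting the invariant each pass."""
    for suffix_start in range(len(L)):
        prefix, suffix = L[:suffix_start], L[suffix_start:]
        # Invariant: prefix is sorted and none of its elements exceeds the suffix minimum.
        assert prefix == sorted(prefix)
        assert not prefix or prefix[-1] <= min(suffix)
        # Induction step: move the minimum of the suffix to the end of the prefix.
        min_index = min(range(suffix_start, len(L)), key=L.__getitem__)
        L[suffix_start], L[min_index] = L[min_index], L[suffix_start]

values = [5, 2, 9, 1, 5, 6]
sel_sort_checked(values)
print(values)  # [1, 2, 5, 5, 6, 9]
```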


The complexity of *sel_sort()* is O(len(L)\*\*2), quadratic in the length of L.
- The inner loop performs O(len(L)) steps on each pass.
- The outer loop runs O(len(L)) passes.
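
One way to see the quadratic growth is to count comparisons. The sketch below is a counting variant written for this illustration (the name `sel_sort_count` is an assumption). Its inner loop starts one position after `suffix_start`, so it makes exactly n(n - 1)/2 comparisons for a list of length n.

```python
def sel_sort_count(L):
    """Selection sort that returns the number of element comparisons made."""
    comparisons = 0
    for suffix_start in range(len(L)):              # outer loop: len(L) passes
        for i in range(suffix_start + 1, len(L)):   # inner loop: scan the suffix
            comparisons += 1
            if L[i] < L[suffix_start]:
                L[suffix_start], L[i] = L[i], L[suffix_start]
    return comparisons

for n in (10, 100, 1000):
    data = list(range(n, 0, -1))  # reversed input
    print(n, sel_sort_count(data), n * (n - 1) // 2)
```

Multiplying the list length by 10 multiplies the comparison count by roughly 100.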


In [0]:
def sel_sort(L):
  """Assumes that L is a list of elements that can be compared
    using <. Sorts L in ascending order"""
  suffix_start = 0
  while suffix_start != len(L):
    # look at each element in suffix
    for i in range(suffix_start, len(L)):
      if L[i] < L[suffix_start]:
        #swap position of elements
        L[suffix_start], L[i] = L[i], L[suffix_start]
    suffix_start += 1

## Merge Sort
Merge sort is a divide-and-conquer algorithm. A divide-and-conquer algorithm is characterized by:
- A threshold input size, below which the problem is not subdivided (sometimes called the recursive base).
- The size and number of sub-instances into which an instance is split.
- The algorithm used to combine sub-solutions.

The complexity of *merge_sort()* is log-linear, O(n log(n)), where n is len(L).
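
As a rough check of the log-linear claim, the sketch below counts element comparisons in a compact merge sort and prints them next to n * log2(n). The function `merge_sort_count` is a self-contained stand-in written for this example and is not the notebook's own implementation.

```python
import math

def merge_sort_count(L):
    """Returns (sorted copy of L, number of element comparisons)."""
    if len(L) < 2:                 # threshold: lists of length 0 or 1
        return L[:], 0
    middle = len(L) // 2           # split into two sub-instances
    left, left_count = merge_sort_count(L[:middle])
    right, right_count = merge_sort_count(L[middle:])
    merged, i, j = [], 0, 0
    comparisons = left_count + right_count
    while i < len(left) and j < len(right):   # combine step
        comparisons += 1
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])        # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged, comparisons

for n in (8, 64, 1024):
    result, count = merge_sort_count(list(range(n, 0, -1)))
    print(n, count, round(n * math.log2(n)))
```

The comparison count stays at or below n * log2(n) because merging two lists of total length m needs at most m - 1 comparisons.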

In [0]:
def merge(left, right, compare):
    """Assumes left and right are sorted lists and 
       compare defines an ordering on the elements.
       Returns a new sorted (by compare) list containing
       the same elements as (left + right) would contain."""
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while (i < len(left)) :
        result.append(left[i])
        i += 1
    while (j < len(right)):
        result.append(right[j])
        j += 1
    return result

def merge_sort(L, compare = lambda x, y: x < y):
    """Assumes L is a list, compare defines an ordering
       on elements of L.
       Returns a new sorted list with the same elements as L"""
    if len(L) < 2:
        return L[:]
    else:
        middle = len(L) // 2
        left = merge_sort(L[:middle], compare)
        right = merge_sort(L[middle:], compare)
        return merge(left, right, compare)

In [0]:

print(merge_sort([1, 5, 12, 18, 19, 20, 2, 3, 4, 17]))
L = [2, 1, 4, 5, 3]
print(merge_sort(L), merge_sort(L, lambda x, y: x > y))

### Exploiting Functions as Parameters

In [0]:
def last_name_first_name(name1, name2):
    arg1 = name1.split(' ')
    arg2 = name2.split(' ')
    if arg1[1] != arg2[1]:
        return arg1[1] < arg2[1]
    else: #last names the same, sort by first name
        return arg1[0] < arg2[0]

def first_name_last_name(name1, name2):
    arg1 = name1.split(' ')
    arg2 = name2.split(' ')
    if arg1[0] != arg2[0]:
        return arg1[0] < arg2[0]
    else: #first names the same, sort by last name
        return arg1[1] < arg2[1]

In [0]:
L =  ['Tom Brady', 'Eric Grimson', 'Gisele Bundchen']
new_L = merge_sort(L, last_name_first_name)
print('Sorted by last name =', new_L)
new_L = merge_sort(L, first_name_last_name)
print('Sorted by first name =', new_L)

### Sorting in Python

In [0]:
L = [3, 5, 2]
D = {'a':12, 'c':5, 'b':'dog'}
print(sorted(L))
print(L)
L.sort()
print(L)

### Sorting Dictionary
- When the *sorted()* function is applied to a dictionary, it returns a sorted list of the dictionary's keys. 
- Dictionaries have no *sort()* method, so calling *D.sort()* raises an AttributeError. 
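
The sketch below shows a few common ways to sort dictionary contents with *sorted()*. The dictionary `scores` is hypothetical data chosen so that all values are mutually comparable.

```python
scores = {'a': 12, 'c': 5, 'b': 7}  # hypothetical data with comparable values

print(sorted(scores))                                # ['a', 'b', 'c']
print(sorted(scores.values()))                       # [5, 7, 12]
print(sorted(scores.items(), key=lambda kv: kv[1]))  # pairs ordered by value

try:
    scores.sort()  # dict has no sort() method
except AttributeError as err:
    print('AttributeError:', err)
```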

In [0]:
print(sorted(D))
D.sort()

### Additional parameters for *list.sort()* and *sorted()* function
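- *key* parameter: a function of one argument that is applied to each element to produce the value used for comparison (for example, *len*). It is not a two-argument comparison function.
- *reverse* parameter: if True, sorts in descending order; the default, False, sorts in ascending order.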
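
The sketch below illustrates both parameters. Because *key* expects a one-argument function, a two-argument comparison function has to be adapted first; the standard library's `functools.cmp_to_key` does this. The function `compare_length` is a hypothetical example.

```python
from functools import cmp_to_key

names = ['Tom Brady', 'Eric Grimson', 'Gisele Bundchen']

# key: a one-argument function; items are ordered by the values it returns.
by_last_name = sorted(names, key=lambda name: name.split(' ')[1])

# A two-argument comparison function must be wrapped with cmp_to_key.
def compare_length(a, b):
    """Negative if a is shorter than b, zero if equal, positive if longer."""
    return len(a) - len(b)

by_length_desc = sorted(names, key=cmp_to_key(compare_length), reverse=True)

print(by_last_name)    # ['Tom Brady', 'Gisele Bundchen', 'Eric Grimson']
print(by_length_desc)  # ['Gisele Bundchen', 'Eric Grimson', 'Tom Brady']
```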

In [0]:
L = [[1, 2, 3], (3, 2, 1, 0), 'abc']
print(sorted(L, key = len, reverse = True))

## Hash Tables

In [0]:
class intDict(object):
    """A dictionary with integer keys"""
    
    def __init__(self, num_buckets):
        """Create an empty dictionary"""
        self.buckets = []
        self.num_buckets = num_buckets
        for i in range(num_buckets):
            self.buckets.append([])
    
    def add_entry(self, key, dict_val):
        """Assumes key an int. Adds an entry."""
        hash_bucket = self.buckets[key%self.num_buckets]
        for i in range(len(hash_bucket)):
            if hash_bucket[i][0] == key:
                hash_bucket[i] = (key, dict_val)
                return
        hash_bucket.append((key, dict_val))

    def get_value(self, key):
        """Assumes key an int.
           Returns value associated with key"""
        hash_bucket = self.buckets[key%self.num_buckets]
        for e in hash_bucket:
            if e[0] == key:
                return e[1]
        return None
    
    def __str__(self):
        entries = []
        for b in self.buckets:
            for e in b:
                entries.append(str(e[0]) + ':' + str(e[1]))
        # join avoids a trailing comma and also handles an empty dictionary
        return '{' + ','.join(entries) + '}'

In [0]:
import random
D = intDict(17)
for i in range(20):
    # choose a random int in the range 0 to 10**5 - 1
    key = random.choice(range(10**5))
    D.add_entry(key, i)
print('The value of the intDict is:')
print(D)
print('\n', 'The buckets are:')
for hash_bucket in D.buckets: # violates abstraction barrier
    print(' ' , hash_bucket)
            

### Complexity of *get_value()*
- If there are no collisions, lookup is O(1), since each bucket has length 0 or 1.
- If every key hashes to the same bucket (the worst case for collisions), lookup is O(n), where n is the number of entries in the dictionary. 
- Making the hash table large enough reduces the number of collisions (a trade-off between space and time).
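
The space and time trade-off can be seen by measuring the longest bucket for different table sizes. The sketch below assumes the same `key % num_buckets` hashing as above; the helper `longest_bucket` and the chosen table sizes are illustrative assumptions.

```python
import random

def longest_bucket(keys, num_buckets):
    """Hash int keys with key % num_buckets; return the longest bucket length."""
    counts = [0] * num_buckets
    for key in keys:
        counts[key % num_buckets] += 1
    return max(counts)

random.seed(0)  # fixed seed so that runs are repeatable
keys = random.sample(range(10**5), 1000)  # 1000 distinct keys

for num_buckets in (1, 17, 1000, 100003):
    print(num_buckets, longest_bucket(keys, num_buckets))
```

With a single bucket, all 1000 keys collide and lookup degrades to a linear scan. With more buckets than possible key values (100003 here), every distinct key gets its own bucket, at the cost of a much larger table.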