Dan Shea  
2021-06-02  
#### Binary Search
http://rosalind.info/problems/bins/

Binary search is the ultimate divide-and-conquer algorithm. To find a key $k$ in a large file containing keys $A[1 \ldots n]$ in sorted order, we first compare $k$ with $A[n/2]$, and depending on the result we recurse either on the first half of the file, $A[1 \ldots n/2]$, or on the second half, $A[n/2+1 \ldots n]$. The recurrence now is $T(n)=T(n/2)+O(1)$. Plugging into the master theorem (with $a=1,b=2,d=0$) we get the familiar solution: a running time of just $O(\log n)$.

Source: Algorithms by Dasgupta, Papadimitriou, Vazirani. McGraw-Hill. 2006.
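
For reference, the master theorem step in the quoted passage can be written out. This sketch assumes the form of the theorem used in the cited textbook, $T(n) = aT(n/b) + O(n^d)$ with constants $a > 0$, $b > 1$, $d \geq 0$, for which the case $d = \log_b a$ gives $T(n) = O(n^d \log n)$:

$$
\log_b a = \log_2 1 = 0 = d \quad\Longrightarrow\quad T(n) = O(n^d \log n) = O(n^0 \log n) = O(\log n).
$$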

##### Problem
The problem is to find a given set of keys in a given array.

__Given:__ Two positive integers $n \leq 10^5$ and $m \leq 10^5$, a sorted array $A[1 \ldots n]$ of integers from $-10^5$ to $10^5$ and a list of $m$ integers $-10^5 \leq k_1, k_2, \ldots, k_m \leq 10^5$.

__Return:__ For each $k_i$, output an index $1 \leq j \leq n$ s.t. $A[j]=k_i$ or $-1$ if there is no such index.

__Sample Dataset__
```
5
6
10 20 30 40 50
40 10 35 15 40 20
```
__Sample Output__
```
4 1 -1 -1 4 2
```

In [5]:
def binary_search_iter(n, l, idx):
    # recursive helper: idx tracks the position of the current pivot in the original list
    # if the list is empty, just return -1
    if not l:
        return -1
    # if the list is a singleton, compare and return the idx or -1 based on results of the comparison
    if len(l) == 1:
        if n == l[0]:
            return idx
        else:
            return -1
    # if n equals the pivot of the search space, return the list idx
    if n == l[len(l)//2]:
        return idx
    # if n is less than the pivot, take the left partition
    if n < l[len(l)//2]:
        l = l[0:len(l)//2]
        # the pivot moves left by the length of the new list minus its pivot index
        # (i.e. the number of elements from the new pivot to the end of the left partition)
        offset = len(l) - len(l)//2
        # update our list index
        idx -= offset
    # otherwise n is greater than the pivot, and we take the right partition
    else:
        l = l[len(l)//2+1:]
        # the offset is the new pivot index plus 1 (the pivot index is relative to the new search space)
        # (e.g. if the new pivot index is 1 we add 2: one step to reach element 0 of the right
        # partition, then one more to reach its pivot)
        offset = len(l)//2 + 1
        idx += offset
    
    return binary_search_iter(n, l, idx)

def binary_search(n, l):
    # select an initial pivot by floor dividing the length of the list by 2
    idx = len(l)//2
    result = binary_search_iter(n, l, idx)
    # Rosalind expects 1-based indices, so either return the -1 if not found, or add one to the index found
    # since python is using 0-based indices
    if result == -1:
        return result
    else:
        return result + 1
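
The version above slices the list at every step, and slicing a Python list copies the selected elements. As a possible alternative, here is a minimal sketch that tracks the search space with two indices instead. The name `binary_search_indices` and the variables `lo`, `hi` and `mid` are illustrative and not part of the original notebook.

```python
def binary_search_indices(key, arr):
    """Return a 1-based index of key in the sorted list arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    # the search space is arr[lo..hi], inclusive on both ends
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == key:
            return mid + 1  # convert to the 1-based index Rosalind expects
        if key < arr[mid]:
            hi = mid - 1  # keep the left partition
        else:
            lo = mid + 1  # keep the right partition
    return -1


sample = [10, 20, 30, 40, 50]
queries = [40, 10, 35, 15, 40, 20]
print(' '.join(str(binary_search_indices(k, sample)) for k in queries))
```

On the sample dataset this prints `4 1 -1 -1 4 2`, matching the sample output.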

In [6]:
def parse_input_return_answer(filename):
    with open(filename, 'r') as fh:
        # skip the first two lines of input as we don't need them
        next(fh)
        next(fh)
        # Get the list, split it on whitespace, map the values to integers and build the list
        # (split() with no argument also tolerates repeated spaces and the trailing newline)
        l = list(map(int, next(fh).split()))
        # Get the search values
        vals = list(map(int, next(fh).split()))
        results = []
        for val in vals:
            results.append(binary_search(val, l))
        return ' '.join(map(str, results))
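
To check the parsing logic without a file on disk, the same steps can be sketched against a string, using `bisect_left` from Python's standard `bisect` module for the lookup. The function name `solve_bins_text` and the variable `sample_text` are hypothetical and used only for this illustration.

```python
from bisect import bisect_left


def solve_bins_text(text):
    """Solve a dataset in the four-line format above, given as a string."""
    lines = text.strip().splitlines()
    # lines[0] and lines[1] hold n and m; the lengths are taken from the data itself
    arr = list(map(int, lines[2].split()))
    keys = list(map(int, lines[3].split()))
    results = []
    for key in keys:
        pos = bisect_left(arr, key)  # leftmost insertion point in the sorted list
        found = pos < len(arr) and arr[pos] == key
        results.append(pos + 1 if found else -1)  # 1-based index or -1
    return ' '.join(map(str, results))


sample_text = """5
6
10 20 30 40 50
40 10 35 15 40 20
"""
print(solve_bins_text(sample_text))
```

If the array contains duplicates, `bisect_left` lands on the leftmost match, which is still a valid answer since the problem accepts any matching index.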

In [7]:
parse_input_return_answer('sample.txt')

'4 1 -1 -1 4 2'

In [9]:
parse_input_return_answer('rosalind_bins.txt')

'8042 3411 8856 5984 7880 3984 3330 2468 843 5576 2918 -1 3019 1432 1497 -1 2869 4684 5529 7064 97 -1 5019 2068 4815 8566 3003 -1 3295 712 5462 2593 1024 3714 1129 4424 1311 460 5525 7682 7005 5166 202 5990 8600 5811 91 2394 8489 6793 6535 8588 8545 1561 5211 3142 7081 1750 1687 5465 6367 2035 5546 5816 3548 740 7703 2667 3483 6683 3609 8913 2427 1680 -1 3495 4944 4203 8661 3098 3773 4078 1738 5742 8150 5567 7155 7546 7876 665 5435 5706 -1 6302 2529 3691 293 7590 5103 4351 4570 6337 1421 2030 313 3419 5282 4573 4520 5842 973 4238 8827 2304 6254 336 3404 7054 3853 5925 5553 6689 8692 6054 68 5171 5983 5937 2593 7824 3955 4002 3282 5296 3941 2798 2403 532 7060 7000 443 2597 1056 1481 1496 1024 4111 902 5634 1401 5972 1768 204 8557 7117 6563 6333 375 -1 6197 2044 5929 5414 5511 7075 4071 4198 8764 1925 3978 374 8030 862 5551 4752 7265 1050 8744 5809 5230 6447 8298 2592 2598 7362 7121 1411 882 8594 550 4113 969 -1 2426 3547 93 6247 6879 3631 1114 7709 1077 5685 8696 6420 5729 6105 6982 486