## Welcome! 

### This talk will introduce you to searching, sorting, and sharing using your favorite language and mine, Python!

### Python is a great language to quickly prototype and is backed by a great open-source community.

<img src="python_ecosystem.jpg">

**Note: I adapted a ton of images from <a href="http://interactivepython.org/runestone/static/pythonds/index.html">Interactive Python</a>. Check them out if you want to learn more about Python!**

### We'll start with searching, and grow outwards from there.

#### We want to start by importing numpy's random module, to generate random integers for our list of values

In [1]:
from IPython.display import display
import numpy.random as random

#### Let's initialize a list ```random_values``` of 1000 random integers in the range [0, 1000) using numpy's ```random.randint``` function (<a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html">documentation</a>)

In [2]:
# Will hold 1000 integers in the range 0 to 999 (randint's upper bound is exclusive)
random_values = [random.randint(1000) for _ in range(1000)]
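Since everyone's list comes out different, here's a rough sketch of a reproducible variant. It swaps in the standard library's `random` module (an assumption on my part, not what the cell above uses), and the names `make_values` and `SEED` are made up for illustration:

```python
import random as std_random  # aliased so it doesn't shadow the notebook's numpy alias

SEED = 516  # hypothetical seed, picked only for illustration


def make_values(seed, count=1000, upper=1000):
    """Return `count` pseudo-random integers in the range [0, upper)."""
    rng = std_random.Random(seed)  # a private generator, so global state is untouched
    return [rng.randrange(upper) for _ in range(count)]


seeded_values = make_values(SEED)
```

Calling `make_values` twice with the same seed gives back the same list, so everyone following along would see the same values.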

### So how would we go about finding a particular key value in the list?

Well, to be completely sure whether or not our key is in the list, we have to iterate through and ask: <br>

    "Is the value at my current index equal to my key value (the value I'm looking for)?"
    
If **yes**: "Awesome! Return ```True``` and break or whatever." <br>
If **not**: "Lame. Keep on looking, though."

#### Let's print out the contents of our ```random_values``` list

In [3]:
# Let's take a look at our values list
print(random_values) # alternatively, a bare random_values on the last line of the cell would also display our list

[715, 479, 734, 712, 500, 162, 410, 397, 272, 198, 270, 256, 46, 952, 624, 202, 280, 466, 637, 167, 730, 916, 194, 894, 846, 339, 386, 515, 323, 207, 387, 981, 444, 616, 19, 316, 457, 522, 354, 559, 603, 387, 47, 143, 431, 448, 299, 753, 615, 282, 389, 536, 719, 511, 918, 917, 634, 805, 127, 991, 109, 407, 87, 237, 455, 79, 574, 692, 682, 878, 416, 451, 57, 595, 433, 399, 862, 202, 165, 767, 703, 709, 613, 350, 276, 722, 241, 878, 74, 690, 738, 194, 831, 346, 968, 480, 312, 716, 110, 580, 638, 497, 340, 899, 444, 521, 277, 129, 240, 305, 240, 635, 865, 961, 488, 823, 834, 567, 164, 193, 363, 638, 683, 435, 328, 921, 913, 315, 842, 897, 999, 153, 56, 160, 592, 923, 978, 880, 69, 447, 442, 358, 315, 298, 98, 634, 855, 763, 211, 307, 353, 413, 343, 408, 330, 451, 614, 7, 909, 866, 963, 156, 702, 872, 588, 962, 990, 690, 240, 621, 975, 449, 632, 989, 515, 52, 542, 97, 135, 146, 464, 758, 359, 814, 115, 44, 147, 650, 989, 516, 99, 63, 719, 297, 616, 233, 801, 258, 249, 89, 295, 960, 234, 89

## Now we can linearly search three ways: the bad way, the better way, and the Pythonic way.

<img src="linear_search.png">

### The bad way is to hard-code a loop through our values list.
_Thought experiment_: Why is this bad? It works fine, right?

Let's assume we're looking for the value ```516``` within our list. Let's code up our implementation (Note: for some of you, ```516``` won't be found in the list. That's fine!):

In [4]:
key = 516
for value in random_values:
    if value == key: 
        print("Found it!")
        break

Found it!


Here's what it looks like, visually:

<img src="linear_search_done.png">

You might have noticed that this isn't modular. We would have to rewrite each of these lines – key definition and iteration structure – for any possible list we want to iterate over. Kinda tedious.

### Now let's do it the semi-right way and make our linear search into a function

We define a function ```linear_search```. What will the function need? We'll need an iterable object (e.g. a list) and we'll need a key value to search for. <br>
So we say:

In [5]:
def linear_search(iterable, key):
    found = False
    for value in iterable:
        if value == key:
            found = True
            break
            
    return found

### Why is this better than hardcoding our loop? Because this way, we can do more complex tasks, say:

```Given all numbers in the range of 0 to 1000, check if each number is in the random_values list. 
If it is, print "Found x" (where x is the number)```

Let's do just that!

In [6]:
# Create our normal range from 0 to 999
normal_values = list(range(1000))

# For each item in our normal_values list
for i in normal_values:
    # If our linear search evaluates to True, we print the number
    if linear_search(random_values, i):
        print("Found {}!".format(i))

Found 2!
Found 4!
Found 5!
Found 6!
Found 7!
Found 8!
Found 9!
Found 10!
Found 11!
Found 12!
Found 13!
Found 14!
Found 16!
Found 17!
Found 19!
Found 20!
Found 21!
Found 22!
Found 23!
Found 24!
Found 28!
Found 29!
Found 32!
Found 34!
Found 37!
Found 38!
Found 39!
Found 44!
Found 45!
Found 46!
Found 47!
Found 49!
Found 51!
Found 52!
Found 53!
Found 54!
Found 55!
Found 56!
Found 57!
Found 59!
Found 62!
Found 63!
Found 64!
Found 65!
Found 66!
Found 67!
Found 68!
Found 69!
Found 70!
Found 72!
Found 74!
Found 76!
Found 77!
Found 78!
Found 79!
Found 81!
Found 82!
Found 83!
Found 84!
Found 85!
Found 87!
Found 89!
Found 92!
Found 95!
Found 96!
Found 97!
Found 98!
Found 99!
Found 101!
Found 102!
Found 103!
Found 106!
Found 109!
Found 110!
Found 112!
Found 114!
Found 115!
Found 116!
Found 117!
Found 118!
Found 119!
Found 120!
Found 121!
Found 124!
Found 127!
Found 128!
Found 129!
Found 131!
Found 132!
Found 135!
Found 136!
Found 138!
Found 139!
Found 142!
Found 143!
Found 144!
Found 146!
Found 14

### Now let's do it the Pythonic way
Python has this nifty membership operator called ``in``, the same keyword we've been writing in our ``for`` loops! Let's revisit our two previous examples using the ``in`` operator.

In [7]:
key = 516
key in random_values

True

In [8]:
for i in normal_values:
    if i in random_values:
        print("Found {}!".format(i))

Found 2!
Found 4!
Found 5!
Found 6!
Found 7!
Found 8!
Found 9!
Found 10!
Found 11!
Found 12!
Found 13!
Found 14!
Found 16!
Found 17!
Found 19!
Found 20!
Found 21!
Found 22!
Found 23!
Found 24!
Found 28!
Found 29!
Found 32!
Found 34!
Found 37!
Found 38!
Found 39!
Found 44!
Found 45!
Found 46!
Found 47!
Found 49!
Found 51!
Found 52!
Found 53!
Found 54!
Found 55!
Found 56!
Found 57!
Found 59!
Found 62!
Found 63!
Found 64!
Found 65!
Found 66!
Found 67!
Found 68!
Found 69!
Found 70!
Found 72!
Found 74!
Found 76!
Found 77!
Found 78!
Found 79!
Found 81!
Found 82!
Found 83!
Found 84!
Found 85!
Found 87!
Found 89!
Found 92!
Found 95!
Found 96!
Found 97!
Found 98!
Found 99!
Found 101!
Found 102!
Found 103!
Found 106!
Found 109!
Found 110!
Found 112!
Found 114!
Found 115!
Found 116!
Found 117!
Found 118!
Found 119!
Found 120!
Found 121!
Found 124!
Found 127!
Found 128!
Found 129!
Found 131!
Found 132!
Found 135!
Found 136!
Found 138!
Found 139!
Found 142!
Found 143!
Found 144!
Found 146!
Found 14
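To convince ourselves these are all the same idea, here's a small sketch comparing three spellings of the membership check. The early-return `linear_search` is a slightly shortened variant of the one defined earlier, and `sample_values` is just the first few numbers from the printout, so treat both as illustration only:

```python
def linear_search(iterable, key):
    """Hand-written scan that returns as soon as the key turns up."""
    for value in iterable:
        if value == key:
            return True
    return False


sample_values = [715, 479, 734, 712, 500, 162]  # first few values from the printout

results = {}
for key in (500, 516):
    by_function = linear_search(sample_values, key)
    by_operator = key in sample_values
    by_any = any(value == key for value in sample_values)
    results[key] = (by_function, by_operator, by_any)
    print(key, by_function, by_operator, by_any)
```

On a list, `in` still walks the elements one by one, so it's the same linear search with nicer syntax.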

## Let's talk about binary search
So linear search is cool and all, but what about something faster? Well, we can improve our searching if we *know* that our collection is in sorted order.

Let's sort our ```random_values``` list:

In [9]:
sorted_values = sorted(random_values)

Now we can take advantage of our sorted values and say: compare my ```key``` against the middle value within my list. From there, we evaluate:

**Is my ```key``` value greater than the list's value at the middle index? Is it less than? Equal to?**

If our ```key``` is found, then we're done. For our purposes, let's say our key is *greater than* the value at the middle of the list. Since our list is sorted, we **know** we won't find it *below* the middle value. Therefore, we can eliminate *half of our search space* and only consider the upper half of our list when re-searching.

<img src="bin_search.png">

Let's implement binary search recursively:

In [10]:
def rec_binary_search(list_of_values, key):
    # if our list is empty, we can't find key
    if len(list_of_values) == 0:
        return "{} was not found".format(key)
    else:
        
        mid = len(list_of_values) // 2
        if key > list_of_values[mid]:
            return rec_binary_search(list_of_values[mid+1:], key)
        elif key < list_of_values[mid]:
            return rec_binary_search(list_of_values[:mid], key)
        else:
            return "{} was found".format(key)
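One thing worth knowing: each slice (`list_of_values[mid+1:]`) copies up to half of the list, so the recursive version above does extra copying on the way down. A possible variant, sketched here with hypothetical `left`/`right` bounds and a plain `True`/`False` return value instead of the message strings, keeps the list intact and just narrows the window:

```python
def rec_binary_search_bounds(list_of_values, key, left=0, right=None):
    """Recursive binary search over list_of_values[left..right], no slicing."""
    if right is None:
        right = len(list_of_values) - 1

    # an empty window means the key can't be in the list
    if left > right:
        return False

    mid = (left + right) // 2
    if key > list_of_values[mid]:
        # key can only be in the upper half of the window
        return rec_binary_search_bounds(list_of_values, key, mid + 1, right)
    if key < list_of_values[mid]:
        # key can only be in the lower half of the window
        return rec_binary_search_bounds(list_of_values, key, left, mid - 1)
    return True
```

The list must already be sorted, same as before.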

Let's implement binary search iteratively:

In [11]:
def iter_binary_search(list_of_values, key):
    left_index = 0
    right_index = len(list_of_values) - 1
    
    while left_index <= right_index:
        mid = (left_index + right_index) // 2
        if key > list_of_values[mid]:
            left_index = mid + 1
        elif key < list_of_values[mid]:
            right_index = mid - 1
        else:
            return "{} was found".format(key)
        
    return "{} was not found".format(key)

In [12]:
list_of_values = [17, 20, 26, 31, 44, 54, 55, 65, 77, 93]
for value in list_of_values:
    print(rec_binary_search(list_of_values, value))
    print(iter_binary_search(list_of_values, value))

17 was found
17 was found
20 was found
20 was found
26 was found
26 was found
31 was found
31 was found
44 was found
44 was found
54 was found
54 was found
55 was found
55 was found
65 was found
65 was found
77 was found
77 was found
93 was found
93 was found
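Just like linear search has `in`, binary search has a ready-made helper: the standard library's `bisect` module. Here's a minimal sketch; the wrapper name `bisect_contains` is my own, and it assumes the list is already sorted:

```python
from bisect import bisect_left


def bisect_contains(sorted_list, key):
    """Return True if key is in sorted_list (which must already be sorted)."""
    # bisect_left gives the leftmost index where key could be inserted
    index = bisect_left(sorted_list, key)
    return index < len(sorted_list) and sorted_list[index] == key


list_of_values = [17, 20, 26, 31, 44, 54, 55, 65, 77, 93]
present = [bisect_contains(list_of_values, v) for v in list_of_values]
missing = [bisect_contains(list_of_values, v) for v in (5, 18, 100)]
```

Every entry of `present` is `True` and every entry of `missing` is `False`, which also covers the "not found" case the loop above never exercises.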


Cool! So that wraps up searching in Python. Next, we introduce sorting.

# Sorting

Sorting is the (not-so) simple task of taking a collection of values/objects/etc. and arranging them in a cohesive, sorted order (ascending *or* descending). We all (hopefully) know the basic sorting methods, including selection, insertion, and bubble sorts. 

We'll introduce two more advanced sorting methods -- mergesort and quicksort -- that are much quicker than the previously mentioned sorting algorithms. Note that we won't be covering the **very advanced, objectively necessary** <a href="http://rosettacode.org/wiki/Sorting_algorithms/Sleep_sort">sleep sort</a> or <a href="https://en.wikipedia.org/wiki/Bogosort">bogo sort</a> algorithms*.

With that, let's approach mergesort.

\* I'm totally kidding.


## Mergesort

Mergesort is a recursive sorting algorithm that splits a list in half until it's dealing with lists of size 0 or 1, which are sorted by definition. Ergo, if the list has a size of 2 or more, we split it in half and invoke mergesort on both halves.

A vital part of mergesort is the *merge* operation, which is done once each half is sorted. Merging is where we take two sorted sub-lists and combine them into a single sorted list.

Let's see if we can visualize it. Take our list:

                        | 10 | 12 | 34 | 58 | 43 | 25 | 19 | 61 | 49 | 32 |

We note that its length is greater than 1, so (by our definition) we can't assume it's sorted. We break the problem in half by recursively calling mergesort on each half for as long as its length is greater than 1. I've put an asterisk * next to the arrays that are considered sorted. Ergo:

                        | 10 | 12 | 34 | 58 | 43 | 25 | 19 | 61 | 49 | 32 |
                                   /                           \
            | 10 | 12 | 34 | 58 | 43 |                       | 25 | 19 | 61 | 49 | 32 |
                     /      \                                         /      \
          | 10 | 12 |        | 34 | 58 | 43 |              | 25 | 19 |        | 61 | 49 | 32 |
              / \                /      \                      / \                /      \
        | 10 |*  | 12 |*   | 34 |*       | 58 | 43 |     | 25 |*  | 19 |*   | 61 |*       | 49 | 32 |
                                             / \                                              / \
                                       | 58 |*  | 43 |*                                | 49 |*   | 32 |*
                                       
Once we've gotten to the point where each array is of size 1, we join them back together in sorted order. Consider our right-most branch, containing ```49``` in our left array and ```32``` in our right array. In this case, we iterate through both arrays and compare the values at the front of each (here, just ```49``` and ```32```). Ergo, joining them together results in the array: ```| 32 | 49 |*```. Visually joining the lists back, we see:

        | 10 |*  | 12 |*       | 34 |*  | 58 |* | 43 |*  | 25 |*  | 19 |*  | 61 |*  | 49 |*  | 32 |*
        |--- merge ---|                 |-- merge ---|   |--- merge ---|            |--- merge ---|
          | 10 | 12 |*                   | 43 | 58 |*      | 19 | 25 |*               | 32 | 49 |*
                               |------- merge -------|                     |------- merge --------|
                                  | 34 | 43 | 58 |*                           | 32 | 49 | 61 |*
          |----------------- merge ------------------|     |--------------- merge ----------------|
                  | 10 | 12 | 34 | 43 | 58 |*                     | 19 | 25 | 32 | 49 | 61 |*
                  |------------------------------- merge ----------------------------------|
                            | 10 | 12 | 19 | 25 | 32 | 34 | 43 | 49 | 58 | 61 |*
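Before the full algorithm, it may help to look at the merge step from the diagram on its own. This is only a sketch: the helper name `merge` is my own, and it builds a new list rather than writing back into the original:

```python
def merge(left, right):
    """Combine two already-sorted lists into one new sorted list."""
    merged = []
    left_index = 0
    right_index = 0

    # take the smaller front value until one list runs out
    while left_index < len(left) and right_index < len(right):
        if left[left_index] <= right[right_index]:
            merged.append(left[left_index])
            left_index += 1
        else:
            merged.append(right[right_index])
            right_index += 1

    # at most one of these still has values left; copy them over as-is
    merged.extend(left[left_index:])
    merged.extend(right[right_index:])
    return merged


last_merge = merge([10, 12, 34, 43, 58], [19, 25, 32, 49, 61])
```

`last_merge` matches the bottom row of the diagram, and `merge([49], [32])` gives `[32, 49]`, just like the right-most branch.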

Nothing to it, right? Let's implement a recursive mergesort:

In [18]:
def mergesort(list_of_values):

    # if we have a list longer than 1, break it up into two at the midpoint
    if len(list_of_values) > 1:
        
        # get our midpoint
        mid = len(list_of_values) // 2
        
        # break our list into two halves -- left and right -- by slicing
        left_half = list_of_values[:mid]
        right_half = list_of_values[mid:]
        
        # recursively sort each half
        mergesort(left_half)
        mergesort(right_half)
        
        # at this point, we assume the left and right halves are sorted
        # hold index values for each halved array, plus our original
        left_index = 0
        left_length = len(left_half)
        
        right_index = 0
        right_length = len(right_half)
        
        merged_index = 0
        
        # iterate through each list while you can do so for both (very important!)
        # and compare the values at our respective indices
        while left_index < left_length and right_index < right_length:
        
            # whichever value is *smaller* is placed into the merged array,
            # and we increment the respective index to look at the next value
            # (on a tie we take from the left half, which keeps equal values in their original order)
            if left_half[left_index] <= right_half[right_index]:
                list_of_values[merged_index] = left_half[left_index]
                left_index += 1
            
            else:
                list_of_values[merged_index] = right_half[right_index]
                right_index += 1
        
            # regardless, we increase the merged list's index
            merged_index += 1
    
        # if we've run through one of the halves, we can just copy the other
        # these two blocks of code do just that: copy whatever's left (if anything)
        # in both arrays into the merged list_of_values
        while left_index < left_length:
            list_of_values[merged_index] = left_half[left_index]
            left_index += 1
            merged_index += 1

        while right_index < right_length:
            list_of_values[merged_index] = right_half[right_index]
            right_index += 1
            merged_index += 1

Let's test out our merge sort on our list ```random_values```

In [19]:
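# list(...) makes a real copy, so the unsorted order survives in copy
# (plain assignment would only give the same list a second name)
copy = list(random_values)
mergesort(random_values)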
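The cell above doesn't print anything, so here's one way we might double-check an in-place sort. The helper names `is_sorted` and `check_in_place_sort` are hypothetical, and `list.sort` is used as a stand-in so the snippet runs on its own:

```python
def is_sorted(values):
    """True if every value is <= the one after it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def check_in_place_sort(sort_function, values):
    """Run an in-place sort on a copy and compare against Python's sorted()."""
    working = list(values)   # copy, so the caller's list is left alone
    sort_function(working)   # expected to sort `working` in place
    return is_sorted(working) and working == sorted(values)


sample = [10, 12, 34, 58, 43, 25, 19, 61, 49, 32]
ok = check_in_place_sort(list.sort, sample)  # list.sort is only a stand-in here
```

In the notebook, `check_in_place_sort(mergesort, random_values)` should work the same way, since `mergesort` also sorts its argument in place.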

Sounds good! Now we move on to *quicksort*.

## Quicksort
Quicksort is, like Mergesort, a divide-and-conquer algorithm, but it has the added benefit of sorting in place (that said, space isn't worth much these days) and is, anecdotally, usually faster than Mergesort by a non-trivial amount. Whereas Mergesort is guaranteed an *n log(n)* runtime, where *n* is the size of the input data, Quicksort varies between *n log(n)* on average and *n^2* in a worst-case scenario.

So what's Quicksort like? Quicksort, for starters, is tricky because we'll need to define three different functions that together make up Quicksort: a general wrapper, a splitter, and a partition-solver. Under the hood, Quicksort uses the idea of a **pivot**, which acts as a sort of "filter" for our values. Partitioning moves the pivot to its position within the final, sorted array (also called the **split position**), with smaller values to its left and larger values to its right; the list is then split around that position into two recursive subproblems.

There are many different ways to pick a pivot, but let's go with something easy: the first value in the collection.
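Here's a rough sketch of how those three pieces could fit together with a first-value pivot. The function names (`quicksort`, `quicksort_helper`, `partition`) and the two-marker partition scheme are my own choices for illustration, not the only way to do it:

```python
def quicksort(list_of_values):
    """General wrapper: sort list_of_values in place."""
    quicksort_helper(list_of_values, 0, len(list_of_values) - 1)


def quicksort_helper(list_of_values, first, last):
    """Splitter: partition the window, then recurse on each side of the split."""
    if first < last:
        split = partition(list_of_values, first, last)
        quicksort_helper(list_of_values, first, split - 1)
        quicksort_helper(list_of_values, split + 1, last)


def partition(list_of_values, first, last):
    """Move the pivot (first value in the window) to its split position."""
    pivot = list_of_values[first]
    left_mark = first + 1
    right_mark = last

    while True:
        # advance left_mark past values that belong left of the pivot
        while left_mark <= right_mark and list_of_values[left_mark] <= pivot:
            left_mark += 1
        # pull right_mark back past values that belong right of the pivot
        while right_mark >= left_mark and list_of_values[right_mark] >= pivot:
            right_mark -= 1
        if right_mark < left_mark:
            break
        # both marks sit on out-of-place values, so swap them
        list_of_values[left_mark], list_of_values[right_mark] = (
            list_of_values[right_mark], list_of_values[left_mark])

    # right_mark is now the split position: drop the pivot there
    list_of_values[first], list_of_values[right_mark] = (
        list_of_values[right_mark], list_of_values[first])
    return right_mark


values = [10, 12, 34, 58, 43, 25, 19, 61, 49, 32]
quicksort(values)
```

After the call, `values` is sorted in place. Note that a first-value pivot is exactly what triggers the *n^2* case on input that's already sorted, and the recursion gets as deep as the list is long, so a long sorted list could run into Python's recursion limit.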



## Feel free to check out any of the following links for more resources on Python:
<a href="http://interactivepython.org/runestone/static/pythonds/index.html">Interactive Python</a>: A great open source repository for interactive textbooks, including one on problem solving in Python. <br>
<a href="http://interactivepython.org/runestone/static/pythonds/SortSearch/TheMergeSort.html">Mergesort</a>

<a href="http://nbviewer.ipython.org/">nbviewer</a>

<a href="http://www.amazon.com/Python-Cookbook-Alex-Martelli/dp/0596007973/">Python Cookbook</a>: Good collection of problems to solve and projects to undertake using Python <br>
<a href="http://flask.pocoo.org/docs/0.10/">Flask</a>: A microframework for Python web development. <br>

# Thanks!