## Basic sorting algorithms
----

### Exercise: Selection Sort
Write the function ```SelectionSort(coll)``` that returns a sorted list with the elements in *coll*. 
You have to implement the Selection Sort algorithm.

In [1]:
# my_list: 10 randomly generated integers in the range 1 to 100
from random import randint, seed

In [2]:
## Your implementation here!!!

def SelectionSort(A):
    for i in range(len(A)):
        min_pos = i
        for j in range(i+1, len(A)):
            if A[j] < A[min_pos]:
                min_pos = j
        A[i], A[min_pos] = A[min_pos], A[i]
    return A

In [3]:
## Check the correctness of your implementation!

#--------------
my_list= []
for _ in range(10):
    my_list.append(randint(1,100))    
#--------------

def test_sortedness(my_list):
    return my_list == sorted(my_list)

print("original one:", my_list)
print("algo one:", SelectionSort(my_list))

assert test_sortedness( SelectionSort(my_list) ), "Must be increasing!"

original one: [53, 42, 24, 7, 14, 8, 46, 71, 71, 58]
algo one: [7, 8, 14, 24, 42, 46, 53, 58, 71, 71]

### Exercise: Insertion Sort


In [4]:
## Your implementation here!!!

def InsertionSort(A):
    
    for actual_pos in range(1, len(A)):
        key = A[actual_pos]
        prec_pos = actual_pos-1
        
        while prec_pos >= 0 and key < A[prec_pos]:
            A[prec_pos+1] = A[prec_pos]
            prec_pos -= 1
        A[prec_pos+1] = key
    return A

In [7]:
## Check the correctness of your implementation!

#--------------
my_list= []
for _ in range(10):
    my_list.append(randint(1,100))    
#--------------

print("original one: {}".format(my_list))
print("algo one: {}".format(InsertionSort(my_list)))

assert test_sortedness( InsertionSort(my_list) ), "Must be increasing!"

original one: [56, 9, 41, 3, 16, 18, 89, 71, 9, 84]
algo one: [3, 9, 9, 16, 18, 41, 56, 71, 84, 89]


### Comparators

You have learned that many sorting algorithms are based on comparisons. 
They obtain an ordered sequence by comparing pairs of elements. 

It's often very useful to define our own way to compare elements. Any comparator that implies a total order 
is a good one. 

For example, assume you have a list of tuples. Each tuple stores information about a person. 
If you sort this list, the final ordering is the *lexicographic* one: first we compare the first components, 
then the second components for tuples with the same first component, and so on.
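
As a small illustration of this default behaviour, the sketch below sorts a handful of made-up `(name, age)` tuples. The list `people` and its contents are assumptions introduced only for this example.

```python
# Hypothetical data: each tuple is (name, age)
people = [("Bob", 25), ("Alice", 30), ("Bob", 19), ("Alice", 22)]

# Tuples are compared component by component: names first, then ages
# for tuples that share the same name.
ordered = sorted(people)
print(ordered)
# [('Alice', 22), ('Alice', 30), ('Bob', 19), ('Bob', 25)]
```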
 
However, you may want to impose your own ordering. For example, sort people by name, then by increasing age, and so on. 

This is possible by implementing your own comparator and letting ```.sort()``` and ```sorted()``` use it.

### How? 
You know that comparison-based algorithms sort a sequence by comparing pairs of elements. 
Thus, a comparator is a function that takes two elements, say a and b, and compares them.

The result of a comparison is a value smaller than $0$, if a must precede b in the ordering. 
The result is larger than $0$, if b must precede a. The result is $0$, if we do not care about their relative order.
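
To make the sign convention concrete, here is a minimal sketch that only calls a comparator and interprets its result, without sorting anything. The names `by_length` and `describe` are hypothetical helpers made up for this illustration.

```python
def by_length(a, b):
    # Hypothetical comparator: shorter strings precede longer ones
    return len(a) - len(b)

def describe(cmp, a, b):
    # Translate the sign of cmp(a, b) into words
    result = cmp(a, b)
    if result < 0:
        return f"{a!r} precedes {b!r}"
    if result > 0:
        return f"{b!r} precedes {a!r}"
    return f"{a!r} and {b!r} may appear in either order"

print(describe(by_length, "a", "bbb"))    # 'a' precedes 'bbb'
print(describe(by_length, "cccc", "bb"))  # 'bb' precedes 'cccc'
print(describe(by_length, "ab", "cd"))    # 'ab' and 'cd' may appear in either order
```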

For example, we can use the following comparator to sort numbers in reverse order.

In [8]:
def my_cmp(a, b):
    if a > b: return -1  # larger elements come first
    if a < b: return 1
    return 0             # equal elements: no preference

In [9]:
# shorter version
def my_cmp(a, b): 
    return b-a # negative when a is larger than b, so a comes first

To use our own comparator with ```.sort()``` and ```sorted()```, we have to use the ```functools.cmp_to_key(cmp)``` function. This converts our comparator into a function that can be passed as the argument for the parameter ```key```. 


In [10]:
import functools

print( sorted(list(range(10)), key=functools.cmp_to_key(my_cmp)) )

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
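
The same mechanism covers the person example mentioned earlier. The sketch below assumes made-up `(name, city, age)` tuples and a hypothetical comparator `cmp_person` that orders by name, then by increasing age, ignoring the city.

```python
import functools

# Hypothetical data: each tuple is (name, city, age)
people = [
    ("Bob", "Pisa", 25),
    ("Alice", "Rome", 30),
    ("Bob", "Milan", 19),
    ("Alice", "Turin", 22),
]

def cmp_person(p, q):
    # Compare by name first
    if p[0] != q[0]:
        return -1 if p[0] < q[0] else 1
    # Same name: the younger person comes first (the city is ignored)
    return p[2] - q[2]

ordered = sorted(people, key=functools.cmp_to_key(cmp_person))
print(ordered)
# [('Alice', 'Turin', 22), ('Alice', 'Rome', 30), ('Bob', 'Milan', 19), ('Bob', 'Pisa', 25)]
```

Note that the default lexicographic order would place `('Alice', 'Rome', 30)` before `('Alice', 'Turin', 22)`, because the city would be compared before the age.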



------
### Exercise: Strange orderings
Given a list, write and test comparators to obtain the following orderings:
- Even numbers precede odd ones. Even numbers are sorted in increasing order, while odd ones are sorted in decreasing order.
- Strings are sorted in non-increasing order based on their lengths. Strings having the same length are sorted in non-increasing lexicographic order. 

In [11]:
my_list = list(range(10))
my_list2 = ["a", "b", "aba", "cad", "zzzz", "aaaa"]

In [12]:
## Your implementation here!!!

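#1 even numbers precede odd ones
#1----------
def IsItEven(num):
    return num % 2 == 0

def my_cmp_numbers(a, b):
    # Different parity: the even number comes first
    if IsItEven(a) != IsItEven(b):
        return -1 if IsItEven(a) else 1
    # Both even: increasing order
    if IsItEven(a):
        return a - b
    # Both odd: decreasing order
    return b - a

print(sorted(my_list, key=functools.cmp_to_key(my_cmp_numbers)))
#1----------

#2 strings are sorted by non-increasing length, ties in non-increasing lexicographic order
#2----------
def my_cmp_strings(a, b):
    # Longer strings come first
    if len(a) != len(b):
        return len(b) - len(a)
    # Same length: lexicographically larger strings come first
    if a == b:
        return 0
    return 1 if a < b else -1

print(sorted(my_list2, key=functools.cmp_to_key(my_cmp_strings)))
#2----------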

[0, 2, 4, 6, 8, 9, 7, 5, 3, 1]
['zzzz', 'aaaa', 'cad', 'aba', 'b', 'a']


### Exercise: Insertion Sort with a comparator
Write the function ```InsertionSortComp(coll, cmp)``` that returns a sorted list with the elements in *coll* using 
```cmp``` as a comparator.

In [18]:
## Your implementation here!!!

def cmp(a,b):
    if a < b:
        return -1
    if a > b:
        return 1
    return 0

def InsertionSortComp(coll, cmp):
    
    for actual_pos in range(1, len(coll)):
        key = coll[actual_pos]
        prec_pos = actual_pos-1
        
        # Shift right every element that must come after key
        while prec_pos >= 0 and cmp(key, coll[prec_pos]) < 0 :
            coll[prec_pos+1] = coll[prec_pos]
            prec_pos -= 1
        # Place key in the gap left by the shifted elements
        coll[prec_pos+1] = key
    return coll

In [19]:
## Test your implementation here by using a comparator.

def test_sortedness(my_list, cmp):
    # Compute the expected result first: InsertionSortComp sorts its argument in place
    expected = sorted(my_list, key = functools.cmp_to_key(cmp))
    return InsertionSortComp(list(my_list), cmp) == expected

assert test_sortedness(my_list, cmp), "Must be sorted"
assert test_sortedness(my_list[::-1], cmp), "Must be sorted"
assert test_sortedness(my_list2, cmp), "Must be sorted"

-----

### Exercise: Intersection of two lists
Write a function ```intersection_slow(l1, l2)``` which returns the intersection of the two lists l1 and l2.

Use the trivial algorithm that runs in $\Theta(|l1|\times|l2|)$ time. 

In [107]:
## Your implementation here!!!

def intersection_slow(l1,l2):
    l3 = []
    for x in l1:
        for y in l2:
            if x == y:
                # Record each common element only once
                if x not in l3:
                    l3.append(x)
                # x has been found in l2: no need to scan the rest of l2
                break
    return l3

In [90]:
## Test here your implementation 

l1 = [3, 5, 1, 2]
l2 = [1, 4, 6, 2]

print(intersection_slow(l1,l2))

assert set(intersection_slow(l1, l2)) == set([1, 2]), "Urca"

[1, 2]


----
### Exercise: Faster intersection of two lists
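Write a function ```intersection_fast(l1, l2)``` which returns the intersection of the two lists l1 and l2.

Assume that both l1 and l2 are sorted! This allows the intersection to be computed with a single pass over the two lists, in $\Theta(|l1|+|l2|)$ time in the worst case.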

In [113]:
## Your implementation here!!!

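def intersection_fast(l1,l2):
    # Precondition: l1 and l2 are already sorted, so there is no need to sort them again
    l3 = []
    
    i = 0
    j = 0
    
    while i < len(l1) and j < len(l2):
        if l1[i] == l2[j]:
            # Common element: keep it and advance in both lists
            l3.append(l1[i])
            i += 1
            j += 1
            
        elif l1[i] < l2[j]:
            # l1[i] is too small to occur in the rest of l2
            i += 1
            
        else:
            # l2[j] is too small to occur in the rest of l1
            j += 1
        
    return l3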

In [114]:
## Test here your implementation 

l1 = sorted([3, 5, 1, 2])
l2 = sorted([1, 4, 6, 2])

print(intersection_fast(l1,l2))
assert set(intersection_fast(l1, l2)) == set([1, 2]), "Urca"

[1, 2]


----
### Exercise: Your own search engine
You are given a collection of texts and you want to build your own search engine. People at Google are already very scared!

Modern search engines are based on a data structure called *Inverted Index*. 

Each document of the collection is assigned an identifier, starting from 0.
An inverted index stores a list, called *inverted list*, for each term of the collection.
The list for a term *t* contains the identifiers of all the documents containing term *t*. The list is sorted.

For example,

````
C = ["dog cat elephant monkey",  "dog lion tiger", "fish dog dog cat cow"]

````

The list of term *cat* is [0,2], the list of *elephant* is [0].

Given two terms, an AND query reports all the documents containing both terms. For example, 
for *query("cat", "dog")*, the result is [0, 2].
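
As a reference for what an AND query should return, the sketch below writes three inverted lists of the example collection by hand and intersects them with Python sets. The names `lists` and `and_query_reference` are assumptions made for this illustration, and the set-based intersection is only a cross-check, not the merge-based algorithm the exercise is about.

```python
# Inverted lists of three terms of the example collection C, written by hand
lists = {"cat": [0, 2], "dog": [0, 1, 2], "elephant": [0]}

def and_query_reference(lists, t1, t2):
    # Documents containing both terms, reported in increasing order of identifier
    return sorted(set(lists[t1]) & set(lists[t2]))

print(and_query_reference(lists, "cat", "dog"))       # [0, 2]
print(and_query_reference(lists, "dog", "elephant"))  # [0]
print(and_query_reference(lists, "cat", "elephant"))  # [0]
```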

Your goal is to implement a simple search engine. Do the following. 

- Given the collection, build a dictionary that maps each term to its inverted list. Observe that 
each document occurs at most once in each list. 
- Implement a function *query* which answers an AND query. 

In [115]:
## Your implementation here!!!

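def build_index(C):
    index = {}

    for doc_id, doc in enumerate(C):
        # set() keeps each term once per document, so doc_id is appended
        # at most once to each inverted list
        terms = set(doc.split())
        for term in terms:
            if term in index:
                index[term].append(doc_id)
            else:
                index[term] = [doc_id]
                
    # doc_id only increases, so every inverted list is already sorted
    return index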

def query(index, t1, t2):
    l1 = index[t1]
    l2 = index[t2]
    return intersection_fast(l1,l2)

In [116]:
## Test here your implementation 

C = ["dog cat elephant monkey",  "dog lion tiger", "fish dog dog cat cow"]

index = build_index(C)
assert query(index, "cat", "dog") == [0, 2], "Urca"