## Basic sorting algorithms
----

### Exercise: Selection Sort
Write the function ```SelectionSort(coll)``` that returns a sorted list with the elements in *coll*. 
You have to implement the Selection Sort algorithm.

In [None]:
## Your implementation here!!!

def SelectionSort(coll):
  for i in range(len(coll)):
    ind = i
    for j in range(i+1,len(coll)): 
      if coll[j] < coll[ind]:
        ind = j
    coll[i],coll[ind] = coll[ind],coll[i]
  return coll




SelectionSort([3,2,5,1])

[1, 2, 3, 5]

In [None]:
## Check the correctness of your implementation!
def test_sortedness(my_list):
    return my_list == sorted(my_list)

my_list = list(range(10))[::-1]

print(SelectionSort(my_list))

assert test_sortedness( SelectionSort(my_list) ), "Must be increasing!"

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
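
The check above uses a single reversed list. A broader check could compare the result with ```sorted()``` on many random inputs. The following is a minimal sketch, assuming a hypothetical helper ```check_sort``` and a local copy of selection sort (named ```selection_sort``` here, and working on a copy of its input) so that the cell runs on its own.

```python
import random

def selection_sort(coll):
    """Local copy of the selection sort above, so this cell runs on its own."""
    coll = list(coll)  # work on a copy; the caller's list is left untouched
    for i in range(len(coll)):
        ind = i
        for j in range(i + 1, len(coll)):
            if coll[j] < coll[ind]:
                ind = j
        coll[i], coll[ind] = coll[ind], coll[i]
    return coll

def check_sort(sort_fn, trials=200, max_len=20, seed=0):
    """Hypothetical helper: compare sort_fn with sorted() on random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, max_len))]
        if sort_fn(list(data)) != sorted(data):
            return False
    return True

print(check_sort(selection_sort))
```

The random lists include empty lists and lists with repeated values, two cases that a single reversed range does not cover.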


### Exercise: Insertion Sort
Write the function ```InsertionSort(coll)``` that returns a sorted list with the elements in *coll*. 
You have to implement the Insertion Sort algorithm.

In [None]:
## Your implementation here!!!

def InsertionSort(coll):
  for i in range(1,len(coll)):
    key = coll[i]
    j = i-1
    while j >= 0 and coll[j] > key:
      coll[j+1] = coll[j]
      j -= 1
    coll[j+1] = key 
  return coll


print(InsertionSort([9,8,7,6,5,4,3,2,1,0]))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [None]:
## Check the correctness of your implementation!

my_list = list(range(10))[::-1]

print(InsertionSort(my_list))

assert test_sortedness( InsertionSort(my_list) ), "Must be increasing!"

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


### Comparators

You have learned that many sorting algorithms are based on comparisons. 
They obtain an ordered sequence by comparing elements. 

It is often very useful to define our own way to compare elements. Any comparator that induces a total order 
is a valid choice. 

For example, assume you have a list of tuples, where each tuple stores information about a person. 
If you sort this list, the final ordering is *lexicographic*: first we compare the first components, 
then the second components for tuples with the same first component, and so on.
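
A small sketch of this default behavior, using made-up ```(name, age)``` tuples as sample data (an assumption, not part of the exercise):

```python
# Hypothetical sample data: (name, age) tuples.
people = [("Bob", 25), ("Alice", 30), ("Bob", 20), ("Alice", 22)]

# Tuples compare component by component, so names are compared first
# and ages only break ties between equal names.
print(sorted(people))
# [('Alice', 22), ('Alice', 30), ('Bob', 20), ('Bob', 25)]
```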
 
However, you may want to impose your own ordering. For example, sort people by name, then by increasing age, and so on. 

This is possible by implementing your own comparator and letting ```.sort()``` and ```sorted()``` use it.

### How? 
You know that comparison-based algorithms sort a sequence by comparing pairs of elements. 
Thus, a comparator is a function that takes two elements, say a and b, and compares them.

The result of a comparison is a value smaller than $0$ if a must precede b in the ordering. 
The result is larger than $0$ if b must precede a. The result is $0$ if the relative order of a and b does not matter.

For example, we can use the following comparator to sort numbers in reverse order.

In [None]:
def my_cmp(a, b): 
    return b - a  # a comes first if it is larger than b

To use our own comparator with ```.sort()``` and ```sorted()```, we have to use the ```functools.cmp_to_key(cmp)``` function. It converts our comparator into a function that can be passed as the argument for the parameter ```key```. 


In [None]:
import functools

print(sorted(list(range(10)), key=functools.cmp_to_key(my_cmp)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
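
The same mechanism works for the tuples describing people mentioned earlier. The following is a minimal sketch with made-up ```(name, age)``` data and a hypothetical comparator ```by_name_then_oldest```. As a variation, it puts older people first among those with the same name, so the result differs from the default tuple ordering.

```python
import functools

# Hypothetical sample data: (name, age) tuples.
people = [("Bob", 25), ("Alice", 30), ("Bob", 20), ("Alice", 22)]

def by_name_then_oldest(a, b):
    """Negative if a goes first, positive if b goes first, 0 if tied."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1  # names in increasing order
    return b[1] - a[1]                   # same name: larger age first

print(sorted(people, key=functools.cmp_to_key(by_name_then_oldest)))
# [('Alice', 30), ('Alice', 22), ('Bob', 25), ('Bob', 20)]
```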



------
### Exercise: Strange orderings
Given a list, write and test comparators to obtain the following orderings:
- Even numbers precede odd ones. Even numbers are sorted in non-decreasing order, while odd ones are sorted in non-increasing order.
- Strings are sorted in non-increasing order based on their lengths. Strings having the same length are sorted in non-increasing lexicographic order. 

In [None]:
my_list = list(range(10))
my_list2 = ["a", "b", "aba", "cad", "zzzz", "aaaa"]

In [None]:
def my_cmp1(a,b):
  if a%2 == 0:
    if b % 2 == 0:
      return a-b   # both even: the smaller one comes first
    else:
      return - 1   # a is even and b is odd: a comes first
  else:
    if b % 2 == 0:
      return 1     # a is odd and b is even: b comes first
    else:
      return b-a  # both odd: the larger one comes first

print(sorted(my_list,key=functools.cmp_to_key(my_cmp1)))

[0, 2, 4, 6, 8, 9, 7, 5, 3, 1]


In [None]:
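def my_cmp2(a,b):
  if len(a)== len(b):
    if a == b:
      return 0     # identical strings: their relative order does not matter
    elif a<b:
      return 1     # same length: the lexicographically larger string comes first
    else:
      return -1
  else:
    return len(b)-len(a)     # the longer string comes first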

print(sorted(my_list2,key=functools.cmp_to_key(my_cmp2)))

['zzzz', 'aaaa', 'cad', 'aba', 'b', 'a']


### Exercise: Insertion Sort with a comparator
Write the function ```InsertionSort(coll, cmp)``` that returns a sorted list with the elements in *coll*, using 
```cmp``` as a comparator.

In [None]:
## Your implementation here!!!

def cmp(a,b):
  return a-b


my_list = [3,8,5,2,1,23,21,13]


def InsertionSort(A, cmp):
    for i in range(1, len(A)):   # A[0] alone is already a sorted prefix
        key = A[i]
        j = i-1
        while j >= 0 and cmp(key, A[j])<0:   #  cmp(key, A[j]) < 0 means key must precede A[j]
            A[j+1] = A[j]
            j -= 1
        A[j+1] = key
    return A  

print(InsertionSort(my_list,cmp))


[1, 2, 3, 5, 8, 13, 21, 23]


In [None]:
## Test your implementation here using the comparators from the previous exercise.

def test_sortedness(my_list, cmp):
    # Sort a copy so that the in-place InsertionSort does not modify the caller's list.
    return InsertionSort(list(my_list), cmp) == sorted(my_list, key = functools.cmp_to_key(cmp))          

assert test_sortedness(my_list, my_cmp1), "Must be sorted"
assert test_sortedness(my_list2, my_cmp2), "Must be sorted"

-----

### Exercise: Intersection of two lists
Write a function ```intersection_slow(l1, l2)``` which returns the intersection of the two lists l1 and l2.

Use the trivial algorithm that runs in $\Theta(|l1|\times|l2|)$ time. 

In [None]:
## Your implementation here!!!
def intersection_slow(l1,l2):
  l = []
  for i in l1:
    for j in l2:
      if i == j:
        l.append(i)
        break
  return l


In [None]:
## Test your implementation here

l1 = [3, 5, 1, 2]
l2 = [1, 4, 6, 2]

assert set(intersection_slow(l1, l2)) == set([1, 2]), "Urca"

----
### Exercise: Faster intersection of two lists
Write a function ```intersection(l1, l2)``` which returns the intersection of the two lists l1 and l2.

Assume that both l1 and l2 are sorted!

In [None]:
## Your implementation here!!!

def intersection(l1,l2):
  l = []
  i = 0
  j = 0
  while i<len(l1) and j<len(l2):
    if l1[i] < l2[j]:
      i += 1
    elif l1[i] > l2[j]:
      j += 1
    else:
      l.append(l1[i])
      i += 1
      j += 1
  return l

l1 = sorted([3, 5, 1, 2,16])
l2 = sorted([1, 4, 6, 2,12,11,14,16])

print(intersection(l1,l2))


[1, 2, 16]


In [None]:
## Test your implementation here

l1 = sorted([3, 5, 1, 2])
l2 = sorted([1, 4, 6, 2])

assert set(intersection(l1, l2)) == set([1, 2]), "Urca"

----
### Exercise: Your own search engine
You are given a collection of texts and you want to build your own search engine. People at Google are already very scared!

Modern search engines are based on a data structure called *Inverted Index*. 

Each document of the collection is assigned an identifier, starting from 0.
An inverted index stores a list, called *inverted list*, for each term of the collection.
The list for a term *t* contains the identifiers of all the documents containing term *t*. The list is sorted.

For example,

````
C = ["dog cat elephant monkey",  "dog lion tiger", "fish dog dog cat cow"]

````

The list for the term *cat* is [0,2], and the list for *elephant* is [0].

Given two terms, an AND query reports all the documents containing both terms. For example, 
for *query("cat", "dog")* the result is [0, 2].

Your goal is to implement a simple search engine. Do the following. 

- Given the collection, build a dictionary that maps each term to its inverted list. Observe that 
each document occurs at most once in each list. 
- Implement a function *query* which answers an AND query. 

In [None]:
## Your implementation here!!!

C = ["dog cat elephant monkey",  "dog lion tiger", "fish dog dog cat cow"]


def build_index(C):
  index = {}
  for i,sent in enumerate(C):
    for word in sent.split():
      if word not in index:      # first time we see this word: start an empty inverted list
        index[word] = []
      if i not in index[word]:  # record each document at most once per word
        index[word].append(i)
  return index

index = build_index(C)

def query(index, t1, t2):
    return [x for x in index[t1] if x in index[t2]]

print(build_index(C))
print(query(index,"cat","dog"))

{'dog': [0, 1, 2], 'cat': [0, 2], 'elephant': [0], 'monkey': [0], 'lion': [1], 'tiger': [1], 'fish': [2], 'cow': [2]}
[0, 2]
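
The ```query``` above tests ```x in index[t2]``` for every ```x```, which scans the second list each time. Since inverted lists are sorted, the two-pointer merge from the faster intersection exercise could be used instead. The following is a minimal sketch, assuming a hypothetical function name ```query_merge``` and assuming that unknown terms yield an empty result (a choice made here, not stated in the exercise). It includes its own compact ```build_index``` so that the cell runs on its own.

```python
def build_index(C):
    """Map each term to the sorted list of documents containing it."""
    index = {}
    for doc_id, text in enumerate(C):
        for word in set(text.split()):   # set() drops repeats within a document
            index.setdefault(word, []).append(doc_id)
    return index

def query_merge(index, t1, t2):
    """AND query via a linear two-pointer merge of two sorted lists."""
    l1, l2 = index.get(t1, []), index.get(t2, [])
    result, i, j = [], 0, 0
    while i < len(l1) and j < len(l2):
        if l1[i] < l2[j]:
            i += 1
        elif l1[i] > l2[j]:
            j += 1
        else:
            result.append(l1[i])
            i += 1
            j += 1
    return result

C = ["dog cat elephant monkey", "dog lion tiger", "fish dog dog cat cow"]
index = build_index(C)
print(query_merge(index, "cat", "dog"))   # [0, 2]
print(query_merge(index, "cat", "lion"))  # []
```

Document identifiers are appended in increasing order, so every inverted list is sorted by construction and the merge runs in time linear in the total length of the two lists.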


In [None]:
## Test your implementation here

C = ["dog cat elephant monkey",  "dog lion tiger", "fish dog dog cat cow"]

index = build_index(C)
assert query(index, "cat", "dog") == [0, 2], "Urca"