# **GREEDY ALGORITHMS LAB**


---





First things first, let's quickly recap the main properties we saw in the slides:

### **Greedy-choice property**

The first key ingredient is the greedy-choice property: we can assemble a globally
optimal solution by making locally optimal (greedy) choices. 

In other words, when
we are considering which choice to make, we make the choice that looks best in
the current problem, without considering results from subproblems.

### **We design greedy algorithms according to the following sequence of steps:**

  1. Cast the optimization problem as one in which we make a choice and are left
with one subproblem to solve.
  2. Prove that there is always an optimal solution to the original problem that makes
the greedy choice, so that the greedy choice is always safe.
  3. Demonstrate optimal substructure by showing that, having made the greedy
choice, what remains is a subproblem with the property that if we combine an
optimal solution to the subproblem with the greedy choice we have made, we
arrive at an optimal solution to the original problem.

### **Do Greedy Algorithms always work?**

The answer is **NO**, of course.

So when do they work?

And when don't they?

It depends on the problem at hand. Let's see...

---

## **Knapsack problem**




A thief robbing Casas Pedro finds $n$ containers.
The items in the $i$th container are worth $v_i$ reais and weigh $w_i$
kilograms, where $v_i$ and $w_i$ are integers.

The thief wants to take as valuable a load as possible, but he can carry at most $W$ 
kilograms in his knapsack, for some integer $W$.
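One way to state this formally (a standard formulation, added here for reference) is to let $x_i$ be the fraction of container $i$ that the thief takes:

$$
\max_{x_1,\dots,x_n} \; \sum_{i=1}^{n} v_i x_i
\quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \le W, \qquad 0 \le x_i \le 1 \text{ for all } i.
$$

Restricting every $x_i$ to $\{0, 1\}$ (take the whole container or leave it) gives the 0-1 version of the problem.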

Which items should he take?

Let's assume the items are (a) nuts, (b) dried grapes, and (c) manioc flour, and that:

(a) is worth 60 reais and weighs 10 kg;

(b) is worth 100 reais and weighs 20 kg;

(c) is worth 120 reais and weighs 30 kg.

Let's also assume the knapsack can store no more than 40 kg.

What's the optimal theft?

![title](Fig1.PNG)

For now, assume the thief may take any fraction of a container's contents; this is the fractional version of the problem. To solve it, we first compute the value per kilogram $v_i/w_i$ for each item.

Following a greedy strategy, the thief begins by taking as much as possible of
the item with the greatest value per kg. If the supply of that item is exhausted
and he can still carry more, he takes as much as possible of the item with the next greatest value per kg, and so forth, until he reaches his weight limit $W$.
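As a quick illustration, the value per kilogram of each item in the example can be computed and sorted in a couple of lines. This sketch assumes the same `(weight, value)` tuple layout as the lab's `items` list; the names `densities` and `by_density` are ours.

```python
# (weight in kg, value in reais) for (a) nuts, (b) dried grapes, (c) manioc flour
items = [(10, 60), (20, 100), (30, 120)]

# Value per kilogram v_i / w_i for each item
densities = [value / weight for weight, value in items]
print(densities)  # [6.0, 5.0, 4.0]

# Greedy order: highest value per kg first
by_density = sorted(items, key=lambda item: item[1] / item[0], reverse=True)
print(by_density)  # [(10, 60), (20, 100), (30, 120)]
```

Here the greedy order happens to match the input order: (a), then (b), then (c).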


**Now it's your turn!**

Implement in the next cell an algorithm that correctly finds the optimal theft.

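```python
items = [(10, 60), (20, 100), (30, 120)]  # (weight in kg, value in reais)
sack = 40                                 # knapsack capacity W in kg

def solve_knapsack(items, sack):
    """Greedy solution to the fractional knapsack problem.

    Returns the list of [weight, value, value_per_kg] entries taken
    and the total value of the load.
    """
    lista = []
    knapsack = []
    totalValue = 0
    # Build [weight, value, value per kg] for every item
    for item in items:
        newItem = [item[0], item[1], item[1] / item[0]]
        lista.append(newItem)
    # Highest value per kg first
    lista.sort(key=lambda x: x[2], reverse=True)
    i = 0
    while sack > 0 and i < len(lista):
        # How many times the whole item fits in the remaining capacity
        cabe = sack / lista[i][0]
        if cabe >= 1:
            # The whole item fits: take all of it
            knapsack.append(lista[i])
            sack -= lista[i][0]
            totalValue += lista[i][1]
        else:
            # Only a fraction fits: take that fraction of its weight and value
            part = [lista[i][0] * cabe, lista[i][1] * cabe, lista[i][2]]
            knapsack.append(part)
            sack -= part[0]
            totalValue += part[1]
        i += 1
    return knapsack, totalValue


print('Theft max value is: ', solve_knapsack(items, sack)[1])
```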


Theft max value is:  200.0


If your algorithm got it right, the answer should be:

- 10 kg of (a), worth 60 reais;
- 20 kg of (b), worth 100 reais;
- 10 kg of (c), worth 40 reais;

for a theft worth 200 reais in total.

![title](Fig16.0.png)

But what if we had a slightly different problem?

The problem we solved is known as the 'Fractional Knapsack Problem'. We shall now examine the **'0-1 Knapsack Problem'**:



Imagine the containers were all locked and could not be opened (or, equivalently, that the items were indivisible).

If the thief has to choose whether to take or to leave the whole container for each item, what would be the optimal solution?

Think, and write the maximum value in the next cell.

```python
max_value = None  # Your answer here (replace None with a number)
```

```python
# Checks your answer (the expected value is deliberately obfuscated)
n = int(10**(10**(10**(10**(10-10))-10+10/10)-10+10/10))
correct = int(n*(n-1)*(n/5))
if max_value == correct:
    print('That is correct.')
else:
    print('The correct answer is ' + str(correct) + '.')
```

As you can see, this optimal theft is quite different from the optimal theft in the fractional problem.

The question is: *'Can you write a greedy algorithm that solves this second problem?'*

```python
items = [(10, 60), (20, 100), (30, 120)]
sack = 40

def solve_second_knapsack(items, sack):
    # Your code here
    raise NotImplementedError
```


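Hopefully you didn't try for too long.

The answer is *'No, you can't.'*

As we stated before, not all problems can be solved by greedy algorithms, and this example shows how hard it can be to tell in advance whether a given problem can: the 0-1 problem looks almost identical to the fractional one, yet taking the item with the best value per kg first no longer leads to an optimal load.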
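To see the failure concretely, the sketch below applies the same value-per-kg rule to the 0-1 version (a container is taken only if it fits whole) and compares it with an exhaustive search over all subsets. The functions `greedy_01` and `best_01` are hypothetical helpers written for this illustration, and the brute force is only practical for a handful of items, since it checks all $2^n$ subsets.

```python
from itertools import combinations

def greedy_01(items, capacity):
    """Greedy by value per kg, taking each container whole or not at all."""
    total = 0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if weight <= capacity:  # the whole container still fits
            capacity -= weight
            total += value
    return total

def best_01(items, capacity):
    """Exhaustive search over every subset of containers (2**n of them)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best

items = [(10, 60), (20, 100), (30, 120)]
print(greedy_01(items, 40))  # 160: takes (a) and (b), then (c) no longer fits
print(best_01(items, 40))    # 180: (a) + (c) fills the 40 kg exactly
```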

Now, let's see a different application:


## **Huffman codes**


---



Let's implement the awesome Huffman coding algorithm! The objective here is to encode a string as bits, compressing the data without losing any information.

### Overview:
- Choose a text to compress;
- Calculate the initial number of bits (we will use this to measure the improvement);
- Create a dictionary with each character as a key and its frequency as the value (e.g. `{'A': 3, 'B': 2}`);
- Implement the Huffman algorithm (don't worry, we will provide some guidance)


Here is some code to turn your string into bits. After converting your text, just calculate the length.

```python
# Code from:
# https://stackoverflow.com/questions/10237926/convert-string-to-list-of-bits-and-viceversa/41892777

def tobits(s):
    result = []
    for c in s:
        bits = bin(ord(c))[2:]                # binary digits of the code point, without '0b'
        bits = '00000000'[len(bits):] + bits  # left-pad to 8 bits
        result.extend([int(b) for b in bits])
    return result

test = 's'
print(tobits(test))
len(tobits(test))
```

[0, 1, 1, 1, 0, 0, 1, 1]


8
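Since `tobits` pads every character to 8 bits, the uncompressed size of a plain ASCII string is simply 8 bits per character. The check below assumes ASCII-only input; a character with a code point above 255 (such as '€') yields more than 8 bits, because the padding only fills up to 8.

```python
def tobits(s):
    # Same helper as above: at least 8 bits per character, most significant bit first
    result = []
    for c in s:
        bits = bin(ord(c))[2:]
        bits = '00000000'[len(bits):] + bits
        result.extend([int(b) for b in bits])
    return result

text = 'Try any text you want'
# For ASCII-only text, the size is exactly 8 bits per character
print(len(tobits(text)), 8 * len(text))  # 168 168
```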

Create a dictionary containing the frequency of each symbol that appears in your string.

```python
test = 'Try any text you want'
# Write your code here. Your output should be something like
# {'a': 2, 'y': 3,...}

def frequency_dict(text):
    all_freq = {}

    for i in text:
        if i in all_freq:
            all_freq[i] += 1
        else:
            all_freq[i] = 1
    return all_freq

frequency_dict(test)
```

{'T': 1,
 'r': 1,
 'y': 3,
 ' ': 4,
 'a': 2,
 'n': 2,
 't': 3,
 'e': 1,
 'x': 1,
 'o': 1,
 'u': 1,
 'w': 1}
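Python's standard library already provides this kind of counting as `collections.Counter`, a `dict` subclass. The short check below compares it with an equivalent of `frequency_dict` on the same test string.

```python
from collections import Counter

def frequency_dict(text):
    # Equivalent of the hand-written version above
    all_freq = {}
    for ch in text:
        all_freq[ch] = all_freq.get(ch, 0) + 1
    return all_freq

test = 'Try any text you want'
counts = Counter(test)
print(counts.most_common(2))                 # [(' ', 4), ('y', 3)]
print(dict(counts) == frequency_dict(test))  # True
```

`most_common` lists ties in the order the characters were first encountered, which is why 'y' comes before 't'.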

```python
# https://gist.github.com/mreid/fdf6353ec39d050e972b
# Example Huffman coding implementation
# Distributions are represented as dictionaries of { 'symbol': probability }
# (raw counts work as well, since only their relative order matters)
# Codes are dictionaries too: { 'symbol': 'codeword' }

def huffman(p):

    # Degenerate case of a single symbol: give it a 1-bit code
    if len(p) == 1:
        return {symbol: '0' for symbol in p}

    # Base case of only two symbols, assign 0 or 1 arbitrarily
    if len(p) == 2:
        return dict(zip(p.keys(), ['0', '1']))

    # Create a new distribution by merging lowest prob. pair.
    # Merged nodes are keyed by concatenating symbol names, which is safe
    # when every symbol is a single character, as in this lab.
    p_prime = p.copy()
    a1, a2 = lowest_prob_pair(p)
    p1, p2 = p_prime.pop(a1), p_prime.pop(a2)
    p_prime[a1 + a2] = p1 + p2

    # Recurse and construct code on new distribution
    c = huffman(p_prime)
    ca1a2 = c.pop(a1 + a2)
    c[a1], c[a2] = ca1a2 + '0', ca1a2 + '1'

    return c

def lowest_prob_pair(p):
    '''Return pair of symbols from distribution p with lowest probabilities.'''
    assert len(p) >= 2  # Ensure there are at least 2 symbols in the dist.

    sorted_p = sorted(p.items(), key=lambda x: x[1])
    return sorted_p[0][0], sorted_p[1][0]

# Example execution

ex1 = {'a': 10, 'e': 15, 'i': 12, 's': 3, 't': 4, 'p': 13, 'n': 1}
huffman(ex1)
```

{'e': '10',
 'i': '00',
 'p': '01',
 'a': '111',
 't': '1100',
 'n': '11010',
 's': '11011'}
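The recursive implementation above re-sorts the whole distribution at every merge, which costs $O(n^2 \log n)$ time for $n$ symbols. A common alternative keeps the partial trees in a binary min-heap (Python's `heapq`), so finding the two lowest-frequency subtrees costs $O(\log n)$ instead of a full sort. The sketch below is our own, not part of the cited gist; `huffman_heap` is a hypothetical name, and an insertion counter breaks ties so the heap never has to compare dictionaries. Its tie-breaking may assign different codewords than `huffman`, but every Huffman code for the same frequencies has the same total encoded length.

```python
import heapq
from itertools import count

def huffman_heap(freq):
    """Build a Huffman code {symbol: codeword} from {symbol: frequency}."""
    if len(freq) == 1:  # single symbol: give it a 1-bit code
        return {sym: '0' for sym in freq}
    tiebreak = count()  # unique counter so heap entries never compare dicts
    # Each heap entry: (total frequency, tiebreak, {symbol: partial codeword})
    heap = [(f, next(tiebreak), {sym: ''}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        # Prepend one bit: 0 for the left subtree, 1 for the right subtree
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

ex1 = {'a': 10, 'e': 15, 'i': 12, 's': 3, 't': 4, 'p': 13, 'n': 1}
codes = huffman_heap(ex1)
# Total encoded length: frequency times codeword length, summed over symbols
print(sum(ex1[s] * len(codes[s]) for s in ex1))  # 146, same total as huffman(ex1)
```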

#### Calculating the compression

Now that your Huffman coding algorithm is working, write a simple function to calculate the new
total number of bits.
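In other words, if $f(c)$ is the number of occurrences of character $c$ in the text and $\ell(c)$ is the length of its Huffman codeword, the compressed size and the uncompressed size (assuming `tobits` uses 8 bits per character, i.e. ASCII text) are:

$$
B_{\text{Huffman}} = \sum_{c} f(c)\,\ell(c), \qquad B_{\text{original}} = 8 \sum_{c} f(c).
$$

For the `ex1` frequencies above, reading the codeword lengths off the printed code gives $B_{\text{Huffman}} = 15 \cdot 2 + 12 \cdot 2 + 13 \cdot 2 + 10 \cdot 3 + 4 \cdot 4 + 1 \cdot 5 + 3 \cdot 5 = 146$ bits, against $B_{\text{original}} = 8 \cdot 58 = 464$ bits for a 58-character text.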

```python
# Function that calculates the number of bits
# after running the Huffman coding algorithm
test = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
"""

def new_size_huffman(all_freq):
    codes = huffman(all_freq)  # build the code once, not once per symbol
    bits = 0
    for symbol in all_freq:
        # each occurrence of the symbol costs len(codeword) bits
        bits += len(codes[symbol]) * all_freq[symbol]
    return bits

compressed = new_size_huffman(frequency_dict(test))
uncompressed = len(tobits(test))
print('Compressed total of bits   = ', compressed)
print('Uncompressed total of bits = ', uncompressed)
print('Ratio = ', round(compressed / uncompressed, 2))
```

Compressed total of bits   =  2554
Uncompressed total of bits =  4608
Ratio =  0.55
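Compression is only useful here because it is lossless. A Huffman code is prefix-free (no codeword is a prefix of another), so a bit string can be decoded from left to right: read bits until they match a codeword, emit that symbol, and start again. The sketch below hard-codes the code table printed for `ex1` above; `encode` and `decode` are hypothetical helper names.

```python
# Code table printed by huffman(ex1) above
codes = {'e': '10', 'i': '00', 'p': '01', 'a': '111',
         't': '1100', 'n': '11010', 's': '11011'}

def encode(text, codes):
    # Concatenate the codeword of every character
    return ''.join(codes[ch] for ch in text)

def decode(bits, codes):
    lookup = {word: sym for sym, word in codes.items()}  # codeword -> symbol
    out, current = [], ''
    for b in bits:
        current += b
        if current in lookup:  # prefix-free: the first match is the right one
            out.append(lookup[current])
            current = ''
    return ''.join(out)

bits = encode('pennants', codes)
print(len(bits), 8 * len('pennants'))  # 31 64
print(decode(bits, codes))             # pennants
```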


# That's all!

---

# APPENDIX

Slides with the Huffman coding example

![title](Huffman1.png)

![title](Huffman3.png)