# DSCI 512 Lab 2

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict, Counter

### Instructions
rubric={mechanics:3}

Follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions/).

## Exercise 1: time complexity of recursive functions

For each of the following recursive functions, determine the time complexity as a function of the input $n$ and briefly justify your answer. Assume $n$ is a positive integer.
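One way to sanity-check an answer is to count how many calls a function makes for a few values of $n$ and see how the count grows. Below is a minimal sketch of that idea. It uses a hypothetical function `halve` that is not one of the exercises, and a hypothetical helper `count_calls`.

```python
def halve(n, counter):
    """Hypothetical example (not one of the exercises): recurse on n // 2."""
    counter["calls"] += 1  # record this call
    if n <= 1:
        return
    halve(n // 2, counter)


def count_calls(n):
    """Return the number of calls that halve makes for input n."""
    counter = {"calls": 0}
    halve(n, counter)
    return counter["calls"]


# The count grows by 3 each time n is multiplied by 8,
# which is what logarithmic growth looks like.
for n in [8, 64, 512]:
    print(n, count_calls(n))
```

The same counting trick can be applied to any of the functions below, though the justification you write should still come from reasoning about the code.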

#### 1.1
rubric={reasoning:3}

In [None]:
def titled(n):
    if n >= 0:
        print('n: ', n)
        return titled(n-1)
    else:
        return "sandwich"

In [None]:
titled(15)

#### 1.2
rubric={reasoning:3}

In [None]:
def untitled(n):
    if n < 0:
        return "sandwich"
    else:
        print('n: ', n)
        return untitled(n-2)

In [None]:
untitled(8)

#### 1.3
rubric={reasoning:3}

In [None]:
def does_nothing(n):
    print('n:', n)
    if n == 0:
        return
    does_nothing(n-1)
    does_nothing(n-1)

In [None]:
does_nothing(3)

#### 1.4
rubric={reasoning:3}

In [None]:
def does_nothing_more_slowly(n):
    print(n)
    if n == 0:
        return
    does_nothing_more_slowly(n-1)
    does_nothing_more_slowly(n-1)
    does_nothing_more_slowly(n-1)

In [None]:
does_nothing_more_slowly(3)

#### (challenging) 1.5
rubric={reasoning}

In [None]:
def looprec(n):
    print("Hello!")
    print('N: ', n)
    for i in range(n):
        looprec(n-1)

In [None]:
looprec(3)

#### (challenging) 1.6
rubric={reasoning}

In this exercise, determine the **space** complexity of `hello` in terms of $n$.

In [None]:
def hello(n):
    if n == 0:
        return 1
    return hello(n-1) + hello(n-1)

In [None]:
hello(4)
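For recursive code, the space used is usually tied to how many calls are on the call stack at the same time, which is not necessarily the total number of calls made. A minimal sketch of how the deepest point of a recursion could be recorded, using a hypothetical function `countdown` (not one of the exercises) that passes its current depth along as an argument:

```python
def countdown(n, depth=1, stats=None):
    """Hypothetical example: recurse n times and return the deepest call depth."""
    if stats is None:
        stats = {"max_depth": 0}
    # Remember the deepest point reached so far.
    stats["max_depth"] = max(stats["max_depth"], depth)
    if n > 0:
        countdown(n - 1, depth + 1, stats)
    return stats["max_depth"]


print(countdown(5))
```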

## Exercise 2: recursive sum
rubric={accuracy:3,quality:3}

Write a recursive function `rec_sum` that takes in a list of numbers and sums up the numbers in the list. If the list is empty, it should return `0`. No loops, `sum`, or numpy operations allowed! And, as usual, a docstring is required.
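As a reminder of the general shape of recursion on a list (a base case for the empty list, plus a recursive call on a smaller list), here is a sketch on a different problem: counting elements without `len`. The name `rec_len` is hypothetical and not part of the lab.

```python
def rec_len(items):
    """
    Count the elements of a list recursively.

    Parameters
    ----------
    items : list
        the list whose elements are counted

    Returns
    -------
    int
        the number of elements in the list
    """
    if not items:  # base case: the empty list has zero elements
        return 0
    # recursive case: one element plus the count of the rest
    return 1 + rec_len(items[1:])


print(rec_len(["a", "b", "c"]))
```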

In [None]:
# An empty list
assert rec_sum([]) == 0

# A list with one element
assert rec_sum([32]) == 32

# A list with all positive numbers
assert rec_sum([1, 2, 3, 4, 5]) == 15

# A list with negative numbers
assert rec_sum([1, 2, 3, 4, -5]) == 5

## Exercise 3: recursive graphics
rubric={accuracy:1,reasoning:1}

In this exercise you will use recursion to draw the Sierpinski triangle. An image of one such triangle is shown below.

<img width="500" src="sierpinski_6_smaller.png">

To help you do this, we are providing some code in the cell below. The `draw_triangle` function draws a triangle for you. When you are done calling `draw_triangle` as many times as you wish, call `show_triangles` once to render the image nicely.

You do not need to understand how the code below works. You only need to understand how to use it. In other words, read the docstrings, but you don't need to read the code inside the functions.

In [None]:
def draw_triangle(x, y, side):
    """
    Draw an equilateral triangle at (x,y) with side length `side`.

    Parameters:
    -----------
    x : float
        the x-coordinate of the *midpoint* of the triangle base
    y : float
        the y-coordinate of the *base* of the triangle
    side : float
        the length of each side of the triangle
    """
    height = np.sqrt(3)*side/2
    plt.plot([x-side/2.0, x+side/2.0, x, x-side/2.0], [y, y, y+height, y], 'k')


def show_triangles(save=False):
    """
    Make the Sierpinski triangle image look pretty.

    Parameters:
    -----------
    save : bool, optional
        Whether or not to save the image to a file (default: False).
    """
    plt.gcf().set_size_inches(10, 8.6)
    plt.axis('scaled')
    plt.axis('off')
    plt.tick_params(labelbottom=False, labelleft=False)
    if save:
        plt.tight_layout()
        plt.savefig('sierpinski.png')
    plt.show()

draw_triangle(0, 0, 1)  # example: a single triangle (depth=0)
show_triangles()        # show the triangle

Another example is given below: a Sierpinski triangle with depth 1, drawn without using recursion but just by calling `draw_triangle` three times. The point of this is that we provide you with (most of) the geometry, so you can focus on recursion and be less likely to get stuck on the geometry aspects.

In [None]:
draw_triangle(-0.25, 0, 0.5)
draw_triangle(+0.25, 0, 0.5)
draw_triangle(0, (0.5*(np.sqrt(3)/2)), 0.5)
show_triangles()
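The three calls above follow a pattern that holds for any triangle, not just one at the origin. A minimal sketch of that geometry, assuming the same conventions as `draw_triangle` (`(x, y)` is the midpoint of the base); `sub_triangles` is a hypothetical helper name, not part of the provided code:

```python
import math


def sub_triangles(x, y, side):
    """
    Return (x, y, side) for the three half-size triangles inside a triangle.

    (x, y) is the midpoint of the base, matching draw_triangle.
    """
    half = side / 2
    half_height = math.sqrt(3) / 2 * half  # height of a half-size triangle
    return [
        (x - half / 2, y, half),    # bottom left
        (x + half / 2, y, half),    # bottom right
        (x, y + half_height, half), # top
    ]


# For the unit triangle at the origin this reproduces the three calls above.
for args in sub_triangles(0, 0, 1):
    print(args)
```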

Your tasks are as follows:

1. Write a recursive function `sierpinski` that takes four arguments: the coordinates `x` and `y`, the side length of the outermost triangle, `size`, and the depth `n`. Then, use your function to reproduce the figure above of the Sierpinski triangle with depth 6. Note: your code should call `show_triangles` only once, outside the recursive function (**not** within it).

2. What is the big-O running time of your code, as a function of $n$?

## Exercise 4: tricky recursive code
rubric={reasoning:1}

Explain what the following code does, and how it works:

In [None]:
def f(letters, n):
    """
    Does something mysterious.

    Parameters
    ----------
    letters : str 
        ?????
    n : int 
        ?????

    Returns
    -------
    ??? 
        ?????   

    """

    if n == 0:
        return [""]

    return [letter + l for letter in letters for l in f(letters, n-1)]

In [None]:
f("MDS!", 1)

## (optional) Exercise 5: Factorio recipes
rubric={reasoning}

[Factorio](https://www.factorio.com/) is a popular online strategy game in which you build and maintain factories. In this game, objects are built out of other objects. Let's load the data and then explain further:

In [None]:
recipe_df = pd.read_csv("recipes-factorio.csv")
recipe_df.columns = ["output", "input", "quantity", "raw"]
recipe_df["raw"] = recipe_df["raw"].astype(bool)

In [None]:
recipe_df.head()

I'll turn this into a dictionary of dictionaries for easier access:

In [None]:
def recipe_to_dict(recipe_df, item):
    z = recipe_df[recipe_df["output"] == item]
    return dict(zip(z["input"], z["quantity"]))

recipes = {item : recipe_to_dict(recipe_df, item) for item in recipe_df["output"]}

Let's look at building a _Roboport_ as an example.

In [None]:
recipes["roboport"]

What we see here is that, to construct a Roboport we need 45 advanced circuits, 45 iron gear wheels, and 45 steel plates. But how do you make advanced circuits? You need to make the advanced circuits out of other ingredients as well:

In [None]:
recipes["advanced-circuit"]

This continues until we get to what the game considers "raw" ingredients, such as copper plates:

In [None]:
recipes["copper-cable"]

(Optional note: the "0.5" means that one copper plate is used to produce two copper cables; see [here](https://wiki.factorio.com/Copper_cable).)
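Because quantities are stored per unit of output, the requirement for several units is the per-unit quantity multiplied by the number of units. A minimal sketch of that arithmetic, using a hand-typed recipe with the 0.5 value quoted above rather than the loaded data; `scale_recipe` is a hypothetical helper name:

```python
# Hand-typed for illustration: one copper cable uses half a copper plate.
copper_cable_recipe = {"copper-plate": 0.5}


def scale_recipe(recipe, count):
    """Return the inputs needed to make `count` units of a recipe's output."""
    return {item: quantity * count for item, quantity in recipe.items()}


# 270 copper cables need 270 * 0.5 = 135 copper plates.
print(scale_recipe(copper_cable_recipe, 270))
```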

It is useful to know the total raw ingredient requirement for an item when crafting things by hand, because you can quickly check whether you have those raw ingredients in your inventory.

We can see the full list of raw ingredients required to build a roboport by looking at the [Factorio Wiki's page on Roboport](https://wiki.factorio.com/Roboport):

![](roboport-factorio.png)

What this page shows is both the recipe or _immediate ingredients_ (advanced circuit, iron gear wheel, steel plate) that we saw earlier, as well as the total _raw ingredients_: 225 copper plates, 180 iron plates, 90 plastic bars, and 45 steel plates. Where did these numbers come from? Well... 1 roboport requires:

- 45 advanced circuits, which require
  - 90 electronic circuits, which require
    - 270 copper cables, which require
      - **135 copper plates** (raw)
    - and **90 iron plates** (raw)
  - **90 plastic bars** (raw)
  - 180 copper cables, which require
    - **90 copper plates** (raw)
- 45 iron gear wheels, which require
  - **90 iron plates** (raw)
- **45 steel plates** (raw)
  
Add up everything in bold and you get the 225 copper plates, 180 iron plates, 90 plastic bars and 45 steel plates.
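The addition can be checked with a short tally. This sketch hard-codes the bold quantities from the breakdown above (it does not read the recipe data), and the variable names are only for illustration:

```python
from collections import Counter

# The bold (raw) entries from the roboport breakdown, in order of appearance.
raw_leaves = [
    ("copper-plate", 135),
    ("iron-plate", 90),
    ("plastic-bar", 90),
    ("copper-plate", 90),
    ("iron-plate", 90),
    ("steel-plate", 45),
]

totals = Counter()
for item, quantity in raw_leaves:
    totals[item] += quantity  # the same raw item can appear in several branches

print(dict(totals))
```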

It's a bit complicated to explain which of the ingredients are considered "raw" ingredients for the purpose of the game, but I will provide the set of items for you here:

In [None]:
raw_ingredients = (set(recipe_df["input"]) | set(recipe_df["output"])) - set(recipe_df[~recipe_df["raw"]]["output"])

**Your task:**

Write a recursive function `get_raw_ingredients` that takes in the recipes, the set of raw ingredients, and the name of an item, and returns _the raw ingredients for that item_ as a dictionary mapping each raw ingredient to the quantity required. Some tests are provided below.

In [None]:
assert get_raw_ingredients(recipes, raw_ingredients, 'roboport') == {
    'copper-plate' : 225,
    'iron-plate'   : 180,
    'plastic-bar'  : 90,
    'steel-plate'  : 45
}

In [None]:
assert get_raw_ingredients(recipes, raw_ingredients, 'copper-ore') == {
    'copper-ore' : 1
}

In [None]:
assert get_raw_ingredients(recipes, raw_ingredients, 'automation-science-pack') == {
    'copper-plate' : 1,
    'iron-plate'   : 2
}

![](red-science-pack.png)

In [None]:
# concrete is considered raw because it can't be crafted by hand
# but if we remove it from the set of raw ingredients we can break it down further
assert get_raw_ingredients(recipes, raw_ingredients-{'concrete'}, 'concrete') == {
    'iron-ore'    : 0.1,
    'stone-brick' : 0.5,
    'water'       : 10
}

In [None]:
assert get_raw_ingredients(recipes, raw_ingredients, 'satellite') == {
    'battery'         : 500,
    'copper-plate'    : 4787.5,
    'iron-plate'      : 1825,
    'plastic-bar'     : 500,
    'processing-unit' : 100,
    'rocket-fuel'     : 50,
    'steel-plate'     : 700
}

![](satellite-factorio.png)

## Exercise 6: Set implementation with BSTs

In this exercise, you will implement a set data structure based on a binary search tree. You will write the tree as a Python class. We are providing some starter code for you below. 

#### 6(a)
rubric={accuracy:4,quality:4}

Implement a recursive method `insert` that takes a new element and inserts it into the tree. Your function should work by recursively calling `insert` on the left or right subtree depending on whether the new value is less than or greater than the tree's value, respectively. If the element is already in the tree, then the call to `insert` should do nothing. 

Hint: When inserting something into the tree, you should be creating a new `TreeSet` object with `TreeSet()`, then inserting the value into this newly created `TreeSet`, and then making sure this new `TreeSet` is stored in your current `TreeSet` as either `self.left` or `self.right`.

In [None]:
class TreeSet:
    """
    A set implementation based on a binary tree.
    """

    def __init__(self):
        self.value = None
        self.left = None
        self.right = None

    
    
    def contains(self, value):
        """
        Check to see if the binary tree has a certain value 

        Parameters
        ----------
        value : object
            the value to search for within the tree

        Returns
        -------
        bool 
            if contained in the tree returns True, otherwise False  

        Example
        --------
        >>> my_set = TreeSet() 
        >>> my_set.insert("Attempt") 
        >>> my_set.contains("Failure")
        False
        """
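        # An empty tree contains nothing; this also avoids comparing with None below.
        if self.value is None:
            return False

        if value == self.value:
            return True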

        if value < self.value:
            if self.left is None:
                return False
            else:
                return self.left.contains(value)
        else:
            if self.right is None:
                return False
            else:
                return self.right.contains(value)

    def __str__(self, s=""):
        """
        A crude way to print the tree. A better way would be to print the tree by depth. 

        Note: __str__ is a special method, like __init__, that returns a string representation of an object.

        Parameters
        ----------
        s : str
           the starting string value. Default is empty string

        Returns
        -------
        str 
            aggregated items in the set

        Example
        --------
        >>> my_set = TreeSet() 
        >>> my_set.insert("Try")
        >>> my_set.insert("your")
        >>> my_set.insert("best")
        >>> print(my_set)
        Try, best, your, 
        """

        if self.value is None:
            return "(An empty tree)"

        if self.left is not None:
            s += self.left.__str__()

        s += str(self.value) + ", "

        if self.right is not None:
            s += self.right.__str__()

        return s

In [None]:
my_set = TreeSet()
my_set.insert("today")
my_set.insert("hello")
my_set.insert("data science")
my_set.insert("jerry")
my_set.insert("apple")
my_set.insert("17")
my_set.insert("hello")
print(my_set)

In [None]:
assert my_set.contains("data science")
assert my_set.contains("apple")
assert not my_set.contains("18")
assert not my_set.contains("blah")

In [None]:
my_set = TreeSet()
my_set.insert(3)
my_set.insert(5)
my_set.insert(10)
print(my_set)

#### 6(b)
rubric={accuracy:3}

In lecture 2, we empirically timed the searching operation using four approaches:

1. Linear search on an unsorted list
2. Binary search on a sorted list
3. Python's `in` operator on an unsorted list
4. `in` with Python's built-in `set`
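For reference, the first two approaches might look roughly like the following. This is a sketch rather than the lecture's code: the function names are hypothetical, and the binary search relies on `bisect_left` from the standard library instead of a hand-written loop.

```python
from bisect import bisect_left


def linear_search(data, key):
    """Check every element in turn; works on unsorted data."""
    for item in data:
        if item == key:
            return True
    return False


def binary_search(sorted_data, key):
    """Find the insertion point for key, then check if key is there; requires sorted data."""
    i = bisect_left(sorted_data, key)
    return i < len(sorted_data) and sorted_data[i] == key


values = [7, 3, 9, 1]
print(linear_search(values, 9), binary_search(sorted(values), 9))
```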

Similar code to that from lecture, for just Python's `set`, is reproduced below for your convenience:

In [None]:
list_sizes = [100, 1000, 10_000, 100_000, 1_000_000]

results = defaultdict(list)
results["size"] = list_sizes

key = -1
x_all = np.random.randint(1e8, size=max(list_sizes))

for list_size in list_sizes:
    print('List size: ', list_size)
    x = x_all[:list_size]
    
    x_set = set(x)
    time = %timeit -q -o -r 1 (key in x_set)
    results["Python set in"].append(time.average)

In [None]:
df = pd.DataFrame(results)
df

Empirically measure the speed of `contains` with your `TreeSet` implementation for the same list sizes, add the timings to the DataFrame as a new column, and print out the DataFrame.

(Note: for reasons of speed, we only go up to $n=10^6$ here. Populating the `TreeSet` objects with $10^7$ items would take a long time.)
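The `%timeit` magic used above only works in IPython/Jupyter. If you prefer plain Python, the standard library's `timeit` module gives a comparable measurement. A minimal sketch, shown here with a built-in `set` and an arbitrary size and loop count chosen for illustration:

```python
import timeit

x_set = set(range(1000))  # build the structure once, outside the timed statement
key = -1                  # a key that is not in the set

n_loops = 10_000
total_seconds = timeit.timeit(lambda: key in x_set, number=n_loops)
average_seconds = total_seconds / n_loops  # average time per lookup

print(average_seconds)
```

The point to carry over is that the data structure is populated before timing starts, so only the lookup itself is measured.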

#### 6(c)
rubric={reasoning:5}

Discuss your results from the previous part. How do Python's `set` and your `TreeSet` compare? Specifically:

- Which method is faster?
- What is the theoretical time complexity of `in` with a `set`, and `contains` with a `TreeSet`?
- Are the empirical results consistent with the theoretical time complexities?
- Are the results what you expected, overall?

#### (optional) 6(d)
rubric={reasoning:1}

Now, also time the `insert` function from `TreeSet` and compare it to the speed of `add` from Python's `set`. This time, $n$ is **not** the number of elements we are inserting. Rather, you are measuring the speed of inserting _one_ value into the set, and $n$ is the current size of the set before insertion. 

Note: you'll have to be a bit careful setting this up. If you repeatedly insert the same value into the set, is the experiment valid?

Discuss the results.