
# <center>Python - Recursive Algorithms - Practice Solutions <a class="tocSkip"></center>
# <center>QTM 350: Data Science Computing <a class="tocSkip"></center>    
# <center>Davi Moreira <a class="tocSkip"></center>

## Introduction <a class="tocSkip">
<hr>


This topic material is based on [Professor Mike Gelbart's Algorithms and Data Structures course](https://github.com/UBC-MDS/DSCI_512_alg-data-struct). It was adapted for our purposes.

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict, Counter

## Exercise: tricky recursive code

Explain what the following code does, and how it works:

In [20]:
def f(letters, n):
    """
    Does something mysterious.

    Parameters
    ----------
    letters : str 
        ?????
    n : int 
        ?????

    Returns
    -------
    ??? 
        ?????   

    """

    if n == 0:
        return [""]

    return [letter + l for letter in letters for l in f(letters, n-1)]

In [23]:
f("QTM!", 1)

['Q', 'T', 'M', '!']

**Answer:**

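The function `f` returns every string of length `n` that can be formed from the characters in `letters`, with repetition allowed and order significant (these strings are called "combinations" below, in the informal sense). It uses recursion to build them from the base case upward: each character from `letters` is prepended to every previously formed string of length `n-1` until strings of the desired length `n` are complete. Let's break down how this function operates: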

1. **Parameters**:
   - `letters`: a string whose characters are used to form the combinations. The characters need not be distinct, but repeated characters produce duplicate results.
   - `n`: an integer representing the length of each combination to be generated.

2. **Return Value**:
   - The function returns a list of strings, each string being one of the possible combinations of characters from `letters` of length `n`.

**Recursive Functionality:**

1. **Base Case**:
   - When `n` is 0, the function returns a list containing an empty string (`[""]`). This base case is crucial as it provides the terminating condition for the recursion and acts as the starting point for constructing combinations.

2. **Recursive Case**:
   - When `n` is greater than 0, the function constructs combinations by:
     - Iterating over each character in `letters`.
     - For each character, it recursively calls itself with `n-1`, which generates all combinations of length `n-1` from the given `letters`.
     - It then prepends the current character to each combination obtained from the recursive call and collects these new combinations in a list.
   
   - This process uses list comprehension to iterate over each `letter` in `letters` and for each letter, iterate over each combination `l` generated by `f(letters, n-1)`, forming new combinations by concatenating `letter` with `l`.

**How It Works:**

Suppose `letters = "ab"` and `n = 2`. The function operates as follows:

- Call `f("ab", 2)`:
  - For `letter = 'a'`, it needs combinations from `f("ab", 1)`.
  - `f("ab", 1)` returns `['a', 'b']`.
  - It constructs `['a' + 'a', 'a' + 'b']` → `['aa', 'ab']`.
  - For `letter = 'b'`, it repeats with `f("ab", 1)`.
  - It constructs `['b' + 'a', 'b' + 'b']` → `['ba', 'bb']`.
  - Combines these to form `['aa', 'ab', 'ba', 'bb']`.

- Each level of recursion builds upon the results of the previous level, expanding each combination until the full length `n` is reached.
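
As a cross-check, the output of `f` should match the Cartesian product of `letters` with itself `n` times. The sketch below assumes that reading and compares `f` against `itertools.product` from the standard library. The helper name `all_strings` is hypothetical and only used for this comparison.

```python
from itertools import product


def f(letters, n):
    """Return all length-n strings over the characters in letters."""
    if n == 0:
        return [""]
    return [letter + l for letter in letters for l in f(letters, n - 1)]


def all_strings(letters, n):
    """Hypothetical non-recursive equivalent built on itertools.product."""
    return ["".join(chars) for chars in product(letters, repeat=n)]


result = f("ab", 2)
print(result)                           # ['aa', 'ab', 'ba', 'bb']
print(result == all_strings("ab", 2))   # True
print(len(f("QTM!", 3)))                # 64, i.e. 4 ** 3
```

The last line illustrates that the output has `len(letters) ** n` entries, so the list grows exponentially in `n`.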


## Exercise: Factorio recipes

In the game [Factorio](https://www.factorio.com/), you build objects out of other objects. Let me load the data and then explain further:

In [24]:
recipe_df = pd.read_csv("data/recipes-factorio_2021-October.csv")
recipe_df.columns = ["output", "input", "quantity", "raw"]
recipe_df["raw"] = recipe_df["raw"].astype(bool)

In [25]:
recipe_df.head()

| | output | input | quantity | raw |
|---|---|---|---|---|
| 0 | speed-module | electronic-circuit | 5.0 | False |
| 1 | speed-module-2 | speed-module | 4.0 | False |
| 2 | speed-module-2 | advanced-circuit | 5.0 | False |
| 3 | speed-module-2 | processing-unit | 5.0 | False |
| 4 | speed-module-3 | speed-module-2 | 5.0 | False |


Let's turn this into a dictionary of dictionaries for easier access:

In [26]:
def recipe_to_dict(recipe_df, item):
    z = recipe_df[recipe_df["output"] == item]
    return dict(zip(z["input"], z["quantity"]))

recipes = {item : recipe_to_dict(recipe_df, item) for item in recipe_df["output"]}

Let's look at building a _Roboport_ as an example.

In [27]:
recipes["roboport"]

{'steel-plate': 45.0, 'iron-gear-wheel': 45.0, 'advanced-circuit': 45.0}

What we see here is that, to construct a Roboport we need 45 advanced circuits, 45 iron gear wheels, and 45 steel plates. But how do you make advanced circuits? You need to make the advanced circuits out of other ingredients as well:

In [28]:
recipes["advanced-circuit"]

{'electronic-circuit': 2.0, 'plastic-bar': 2.0, 'copper-cable': 4.0}

This continues until we get to what the game considers "raw" ingredients, such as copper plates:

In [29]:
recipes["copper-cable"]

{'copper-plate': 0.5}

(Optional note: the "0.5" means that one copper plate is used to produce two copper cables; see [here](https://wiki.factorio.com/Copper_cable).)

It is useful to know the total raw ingredient requirement for an item when crafting things by hand, because you can quickly check whether you have those raw ingredients in your inventory.

We can see the full list of raw ingredients required to build a roboport by looking at the [Factorio Wiki's page on Roboport](https://wiki.factorio.com/Roboport):

![](img/roboport-factorio.png)

What this page shows is both the recipe or _immediate ingredients_ (advanced circuit, iron gear wheel, steel plate) that we saw earlier, as well as the total _raw ingredients_: 225 copper plates, 180 iron plates, 90 plastic bars, and 45 steel plates. Where did these numbers come from? Well... 1 roboport requires:

- 45 advanced circuits, which require
  - 90 electronic circuits, which require
    - 270 copper cables, which require
      - **135 copper plates** (raw)
    - and **90 iron plates** (raw)
  - **90 plastic bars** (raw)
  - 180 copper cables, which require
    - **90 copper plates** (raw)
- 45 iron gear wheels, which require
  - **90 iron plates** (raw)
- **45 steel plates** (raw)
  
Add up everything in bold and you get the 225 copper plates, 180 iron plates, 90 plastic bars and 45 steel plates.
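
The totals can be double-checked with plain arithmetic. This is only a sketch with hard-coded per-unit quantities: the advanced-circuit and copper-cable numbers come from the recipes shown earlier, while the electronic-circuit (3 copper cables and 1 iron plate) and iron-gear-wheel (2 iron plates) quantities are inferred from the breakdown in the list above and are assumptions, not values read from the data file.

```python
# Immediate ingredients of one roboport.
advanced_circuits = 45
iron_gear_wheels = 45
steel_plates = 45                                  # raw

# One advanced circuit: 2 electronic circuits, 2 plastic bars, 4 copper cables.
electronic_circuits = advanced_circuits * 2        # 90
plastic_bars = advanced_circuits * 2               # 90 (raw)
cables_direct = advanced_circuits * 4              # 180

# Assumed: one electronic circuit needs 3 copper cables and 1 iron plate.
cables_via_circuits = electronic_circuits * 3      # 270
iron_via_circuits = electronic_circuits * 1        # 90 (raw)

# Assumed: one iron gear wheel needs 2 iron plates.
iron_via_gears = iron_gear_wheels * 2              # 90 (raw)

# One copper cable needs 0.5 copper plates.
copper_plates = (cables_direct + cables_via_circuits) * 0.5

totals = {
    "copper-plate": copper_plates,
    "iron-plate": iron_via_circuits + iron_via_gears,
    "plastic-bar": plastic_bars,
    "steel-plate": steel_plates,
}
print(totals)
# {'copper-plate': 225.0, 'iron-plate': 180, 'plastic-bar': 90, 'steel-plate': 45}
```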

It's a bit complicated to explain which of the ingredients are considered "raw" ingredients for the purpose of the game, but I will provide the set of items for you here:

In [31]:
raw_ingredients = (set(recipe_df["input"]) | set(recipe_df["output"])) - set(recipe_df[~recipe_df["raw"]]["output"])
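
To see what this set expression does, here is a toy illustration that mirrors it with plain Python sets. The two rows are made up for the example and are not taken from the data file; the sketch assumes, as the expression implies, that the `raw` flag describes the row's `output` item.

```python
# Two hypothetical recipe rows, in the same column layout as recipe_df.
rows = [
    {"output": "iron-gear-wheel", "input": "iron-plate", "quantity": 2.0, "raw": False},
    {"output": "iron-plate", "input": "iron-ore", "quantity": 1.0, "raw": True},
]

# Every item that appears anywhere, as an input or as an output.
all_items = {r["input"] for r in rows} | {r["output"] for r in rows}

# Items produced by at least one recipe that is NOT flagged raw.
crafted = {r["output"] for r in rows if not r["raw"]}

# Whatever is left is treated as a raw ingredient.
toy_raw = all_items - crafted
print(sorted(toy_raw))  # ['iron-ore', 'iron-plate']
```

In this toy data, `iron-plate` stays in the raw set even though a recipe produces it, because that recipe is flagged as raw.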

**Your task is to write a recursive function that takes in the recipes, the set of raw ingredients, and the name of an item, and returns _the raw ingredients for that item_ as a dictionary.** Some tests are provided below.

**Answer:**

In [32]:
import pandas as pd

def recipe_to_dict(recipe_df, item):
    z = recipe_df[recipe_df["output"] == item]
    return dict(zip(z["input"], z["quantity"]))

# Recursive function to accumulate raw ingredients
def get_raw_ingredients(recipes, raw_ingredients, item, quantity=1):
    """
    Recursively finds all the raw ingredients required to craft an item.

    Parameters
    ----------
    recipes : dict
        A dictionary where the keys are items and the values are dictionaries
        of the ingredients needed to make an item.
    raw_ingredients : set
        A set containing all raw ingredients in the recipes.
    item : str
        The item for which to find the raw ingredients.
    quantity : float, optional
        The quantity of the item to craft.

    Returns
    -------
    dict
        A dictionary of raw ingredients and their total quantities needed to craft the item.
    """
    # Base case: if the item is a raw ingredient, return it with its required quantity
    if item in raw_ingredients:
        return {item: quantity}
    
    # Recursive case: if the item is made from other items, find the raw materials for each component
    raw_totals = {}
    for component, qty in recipes.get(item, {}).items():
        component_raw = get_raw_ingredients(recipes, raw_ingredients, component, quantity * qty)
        for raw_item, raw_qty in component_raw.items():
            if raw_item in raw_totals:
                raw_totals[raw_item] += raw_qty
            else:
                raw_totals[raw_item] = raw_qty
    return raw_totals

# Example usage:
recipe_df = pd.read_csv("data/recipes-factorio_2021-October.csv")
recipe_df.columns = ["output", "input", "quantity", "raw"]
recipe_df["raw"] = recipe_df["raw"].astype(bool)

recipes = {item: recipe_to_dict(recipe_df, item) for item in recipe_df["output"]}
raw_ingredients = (set(recipe_df["input"]) | set(recipe_df["output"])) - set(recipe_df[~recipe_df["raw"]]["output"])

# Assuming 'item_name' is the item for which you want to find the raw ingredients
item_name = 'speed-module-3'
raw_ingredients_for_item = get_raw_ingredients(recipes, raw_ingredients, item_name)
print(raw_ingredients_for_item)


{'iron-plate': 160.0, 'copper-plate': 300.0, 'plastic-bar': 60.0, 'processing-unit': 30.0}


In [33]:
assert get_raw_ingredients(recipes, raw_ingredients, 'roboport') == {
    'copper-plate' : 225,
    'iron-plate'   : 180,
    'plastic-bar'  : 90,
    'steel-plate'  : 45
}

In [34]:
assert get_raw_ingredients(recipes, raw_ingredients, 'copper-ore') == {
    'copper-ore' : 1
}

In [35]:
assert get_raw_ingredients(recipes, raw_ingredients, 'automation-science-pack') == {
    'copper-plate' : 1,
    'iron-plate'   : 2
}

![](img/red-science-pack.png)

In [36]:
# concrete is considered raw because it can't be crafted by hand
# but if we remove it from the set of raw ingredients we can break it down further
assert get_raw_ingredients(recipes, raw_ingredients-{'concrete'}, 'concrete') == {
    'iron-ore'    : 0.1,
    'stone-brick' : 0.5,
    'water'       : 10
}

In [37]:
assert get_raw_ingredients(recipes, raw_ingredients, 'satellite') == {
    'battery'         : 500,
    'copper-plate'    : 4787.5,
    'iron-plate'      : 1825,
    'plastic-bar'     : 500,
    'processing-unit' : 100,
    'rocket-fuel'     : 50,
    'steel-plate'     : 700
}

![](satellite-factorio.png)

## Exercise: Set implementation with BSTs

In this exercise, you will implement a set data structure based on a binary search tree. You will write the tree as a Python class. We are providing some starter code for you below. 

###

Implement a recursive function `insert` that takes a new element and inserts it into the tree. Your function should work by recursively calling `insert` on the left or right subtree depending on whether the new value is less than or greater than the tree's value, respectively. If the element is already in the tree, then the call to `insert` should do nothing. 

Hint: When inserting something into the tree, you should be creating a new `TreeSet` object with `TreeSet()`, then inserting the value into this newly created `TreeSet`, and then making sure this new `TreeSet` is stored in your current `TreeSet` as either `self.left` or `self.right`.

In [38]:
class TreeSet:
    """
    A set implementation based on a binary tree.
    """

    def __init__(self):
        self.value = None
        self.left = None
        self.right = None

    
    
    def contains(self, value):
        """
        Check to see if the binary tree has a certain value 

        Parameters
        ----------
        value : object
            the value to search for within the tree

        Returns
        -------
        bool 
            if contained in the tree returns True, otherwise False  

        Example
        --------
        >>> my_set = TreeSet() 
        >>> my_set.insert("Attempt") 
        >>> my_set.contains("Failure")
        False
        """
        if self.value is None:
            return False  # An empty tree contains nothing.

        if value == self.value:
            return True

        if value < self.value:
            if self.left is None:
                return False
            else:
                return self.left.contains(value)
        else:
            if self.right is None:
                return False
            else:
                return self.right.contains(value)

    def __str__(self, s=""):
        """
        A crude way to print the tree. A better way would be to print the tree by depth. 

        Note: __str__ is a special method, like __init__, that returns a string representation of an object.

        Parameters
        ----------
        s : str
           the starting string value. Default is empty string

        Returns
        -------
        str 
            aggregated items in the set

        Example
        --------
        >>> my_set = TreeSet() 
        >>> my_set.insert("Try")
        >>> my_set.insert("your")
        >>> my_set.insert("best")
        >>> print(my_set)
        Try, best, your,
        """

        if self.value is None:
            return "(An empty tree)"

        if self.left is not None:
            s += self.left.__str__()

        s += str(self.value) + ", "

        if self.right is not None:
            s += self.right.__str__()

        return s

**Answer:**

In [48]:
class TreeSet:
    """
    A set implementation based on a binary tree.
    """

    def __init__(self):
        self.value = None
        self.left = None
        self.right = None

    def insert(self, value):
        """
        Inserts a value into the tree.

        Parameters
        ----------
        value : object
            The value to insert into the tree.

        Returns
        -------
        None
        """
        if self.value is None:
            self.value = value
            return

        if self.value == value:
            return  # If the value is already in the tree, do nothing.

        if value < self.value:
            if self.left is None:
                self.left = TreeSet()  # Create a new subtree if necessary.
            self.left.insert(value)  # Recursively insert into the left subtree.
        else:
            if self.right is None:
                self.right = TreeSet()  # Create a new subtree if necessary.
            self.right.insert(value)  # Recursively insert into the right subtree.

    
    def contains(self, value):
        """
        Check to see if the binary tree has a certain value 

        Parameters
        ----------
        value : object
            the value to search for within the tree

        Returns
        -------
        bool 
            if contained in the tree returns True, otherwise False  

        Example
        --------
        >>> my_set = TreeSet() 
        >>> my_set.insert("Attempt") 
        >>> my_set.contains("Failure")
        False
        """
        if self.value is None:
            return False  # An empty tree contains nothing.

        if value == self.value:
            return True

        if value < self.value:
            if self.left is None:
                return False
            else:
                return self.left.contains(value)
        else:
            if self.right is None:
                return False
            else:
                return self.right.contains(value)

    def __str__(self, s=""):
        """
        A crude way to print the tree. A better way would be to print the tree by depth. 

        Note: __str__ is a special method, like __init__, that returns a string representation of an object.

        Parameters
        ----------
        s : str
           the starting string value. Default is empty string

        Returns
        -------
        str 
            aggregated items in the set

        Example
        --------
        >>> my_set = TreeSet() 
        >>> my_set.insert("Try")
        >>> my_set.insert("your")
        >>> my_set.insert("best")
        >>> print(my_set)
        Try, best, your,
        """

        if self.value is None:
            return "(An empty tree)"

        if self.left is not None:
            s += self.left.__str__()

        s += str(self.value) + ", "

        if self.right is not None:
            s += self.right.__str__()

        return s

In [43]:
my_set = TreeSet()
my_set.insert("today")
my_set.insert("hello")
my_set.insert("data science")
my_set.insert("jerry")
my_set.insert("apple")
my_set.insert("17")
my_set.insert("hello")
print(my_set)

17, apple, data science, hello, jerry, today, 


In [44]:
assert my_set.contains("data science")
assert my_set.contains("apple")
assert not my_set.contains("18")
assert not my_set.contains("blah")

In [45]:
my_set = TreeSet()
my_set.insert(3)
my_set.insert(5)
my_set.insert(10)
print(my_set)

3, 5, 10, 


###

In this topic, we empirically timed the searching operation using four approaches:

1. Linear search on an unsorted list
2. Binary search on a sorted list
3. Python's `in` operator on an unsorted list
4. `in` with Python's built-in `set`

Similar code to that from lecture, for just Python's `set`, is reproduced below for your convenience:

In [40]:
list_sizes = [100, 1000, 10_000, 100_000, 1_000_000]

results = defaultdict(list)
results["size"] = list_sizes

key = -1
x_all = np.random.randint(1e8, size=max(list_sizes))

for list_size in list_sizes:
    print('List size: ', list_size)
    x = x_all[:list_size]
    
    x_set = set(x)
    time = %timeit -q -o -r 1 (key in x_set)
    results["Python set in"].append(time.average)

List size:  100
List size:  1000
List size:  10000
List size:  100000
List size:  1000000


In [None]:
df = pd.DataFrame(results)
df

Empirically measure the speed of `contains` with your `TreeSet` implementation, and then add the timings to the DataFrame. Print out the DataFrame.

(Note: for reasons of speed, we only go up to $n=10^6$ here. Populating the `TreeSet` objects with $10^7$ items would take a long time.)

**Answer:**

In [46]:
from collections import defaultdict
import numpy as np
import pandas as pd
import timeit

# Assuming the TreeSet class implementation with the insert and contains methods is defined above.

list_sizes = [100, 1000, 10_000, 100_000, 1_000_000]

results = defaultdict(list)
results["size"] = list_sizes

key = -1
x_all = np.random.randint(1e8, size=max(list_sizes))

for list_size in list_sizes:
    print('List size: ', list_size)
    x = x_all[:list_size]
    
    # Time Python built-in set
    x_set = set(x)
    set_time = timeit.timeit(lambda: key in x_set, number=1)
    results["Python set in"].append(set_time)
    
    # Populate TreeSet and time the contains method
    x_tree_set = TreeSet()
    
    for num in x:
        x_tree_set.insert(num)
    
    tree_set_time = timeit.timeit(lambda: x_tree_set.contains(key), number=1)
    results["TreeSet contains"].append(tree_set_time)

# Convert results to DataFrame and print
df = pd.DataFrame(results)
print(df)


List size:  100
List size:  1000
List size:  10000
List size:  100000
List size:  1000000
      size  Python set in  TreeSet contains
0      100   1.207998e-06          0.000008
1     1000   4.579924e-07          0.000001
2    10000   2.500019e-07          0.000001
3   100000   3.749883e-07          0.000003
4  1000000   7.499912e-07          0.000003
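
Each measurement above comes from a single call (`number=1`), so the timings are noisy. A steadier measurement, sketched here under the assumption that repeating the lookup many times is acceptable, uses `timeit.repeat` and reports the best per-call time. The helper name `time_lookup` and the small example set are hypothetical.

```python
import timeit


def time_lookup(func, number=1000, repeat=5):
    """Return the best per-call time in seconds over several repeated runs."""
    # Each entry in runs is the total time for `number` calls of func.
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number


x_set = set(range(100_000))
key = -1  # not in the set, as in the experiment above

per_call = time_lookup(lambda: key in x_set)
print(f"{per_call:.2e} seconds per lookup")
```

Taking the minimum over the repeats is a common choice because slower runs usually reflect interference from other processes rather than the code being measured.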


###

Discuss your results from the previous part. How do Python's `set` and your `TreeSet` compare? Specifically:

- Which method is faster?
- What is the theoretical time complexity of `in` with a `set`, and `contains` with a `TreeSet`?
- Are the empirical results consistent with the theoretical time complexities?
- Are the results what you expected, overall?

**Answer:**

The results suggest the following:

- **Speed Comparison**:
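  - For every list size from 100 to 1,000,000, Python's built-in `set` is faster than the `TreeSet`. The `set` timings range from about 2.5 × 10^-7 to 1.2 × 10^-6 seconds, whereas the `TreeSet` timings range from about 1 × 10^-6 to 8 × 10^-6 seconds. For both, the slowest measurement is at the smallest size (100), which likely reflects warm-up noise in a single-run measurement rather than a real trend.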

- **Theoretical Time Complexity**:
  - For a Python built-in `set`, which is implemented as a hash table, the average-case time complexity for the `in` operation is O(1).
  - For the `TreeSet` `contains` operation, the cost is proportional to the height of the tree. If the tree is balanced, that is O(log n), and inserting keys in random order, as done here, gives a height of O(log n) in expectation. If the keys are inserted in sorted (or nearly sorted) order, the tree degenerates into a chain and the worst-case time complexity is O(n).

- **Consistency with Theoretical Time Complexities**:
  - The `set` timings stay roughly flat while the size grows by four orders of magnitude, which is consistent with the average-case O(1) time complexity.
  - The `TreeSet` `contains` time does not increase much as the size grows, which is consistent with the logarithmic growth expected when the tree is reasonably balanced, as random insertion order tends to produce. Even at its best, O(log n) is still expected to be slower than the constant-time lookup of the built-in `set`. With single-run timings this noisy, though, the data cannot clearly distinguish O(log n) from O(1).
  
- **Expectations**:
  - Generally, one would expect Python's built-in `set` to be faster due to its highly optimized hash table implementation compared to a simple binary tree for `TreeSet`. Especially in large datasets, the difference in performance can be substantial.
  - The empirical results are somewhat expected. However, one might have anticipated a more noticeable slowdown for `TreeSet` at the largest list size. Random data does not create the worst-case (degenerate) tree, and the Python overhead for recursion and object handling is likely a significant part of the measured time, masking the difference in time complexities.

Overall, these results indicate that while the `TreeSet` performs reasonably well, Python's built-in `set` is faster due to its O(1) average-case lookup time, and the empirical results reflect this advantage. It's worth noting that these tests measure only lookup speed and do not account for other factors, such as the ability to traverse the elements in sorted order, which a binary search tree provides and a hash-based `set` does not.
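
Whether `contains` behaves like O(log n) or O(n) depends on the height of the tree, which in turn depends on insertion order. The sketch below is a minimal illustration, not the `TreeSet` class from the exercise: it uses a hypothetical `Node` class with iterative `bst_insert` and `height` helpers, because a recursive insert is likely to hit Python's default recursion limit on sorted input of this size.

```python
import random


class Node:
    """Minimal BST node, used only for this illustration."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def bst_insert(root, value):
    """Insert value iteratively (no recursion) and return the root."""
    if root is None:
        return Node(value)
    node = root
    while True:
        if value == node.value:
            return root  # already present, nothing to do
        side = "left" if value < node.value else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(value))
            return root
        node = child


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    best = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(values):
    """Build a BST by inserting values in the given order."""
    root = None
    for v in values:
        root = bst_insert(root, v)
    return root


n = 1000
random.seed(0)
shuffled = random.sample(range(n), n)

random_height = height(build(shuffled))   # small relative to n
sorted_height = height(build(range(n)))   # equals n: a chain of right children
print(random_height, sorted_height)
```

With sorted input every node has only a right child, so the height equals `n` and a lookup may have to visit every node. With shuffled input the height stays far smaller than `n`.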


# <center>Have fun!<a class="tocSkip"></center>