# Algorithms and Data Structures with Python - Assignment 2

University of Amsterdam / Faculty of Economics and Business

Bachelor Business Analytics Y2 2024-2025

Author: [Julien Rossi](mailto:j.rossi@uva.nl)

* This assignment is made for auto-grading with CodeGrade in Canvas
* **DO NOT CHANGE COMMENTS** such as: `# CodeGrade Tag Question1`
* Tampering with these comments will result in a reduced grade
* Each answer must be placed in the single cell that carries such a comment
* During debugging, you are free to use as many cells as you want
* At grading time, CodeGrade will only consider these few cells

# Reference Material
* ITP Week 1 for function definitions, type hinting
* ITP Week 2 for class and attributes, type hinting
* ITP Week 3 for dunder methods
* ADS Tutorial 1 for `@dataclass` and `field` usage
* ADS Lecture 2 for implementation of Stack
* Python general manual for list and string manipulation (insertion, slicing, iteration with `for`)
* Python general manual for iterating and branching (`for`, `continue`, `break`, `if`, `while`)

# Question 1 - Solving the 1x2x2 Rubik's Cube (50 points)

## Storyline
* You have a 1x2x2 Rubik's cube
* This [video](https://www.youtube.com/watch?v=eK96X6tQyD0) shows how to solve
* Read this [wikiHow](https://www.wikihow.com/Solve-a-2x2-Rubik%27s-Cube) page about the 2x2x2 cube, which also has diagrams for faces and moves.

This is a solved cube:

<p align="left"> <img src=https://static.wikia.nocookie.net/speedsolving/images/8/8e/1x2x2.jpg/revision/latest?cb=20210709170236></p>

The 6 **FACES** are identified by letters. We look at a solved cube oriented as shown in the picture:
* `"F"` is the FRONT face (white when solved)
* `"U"` is the UP face (blue when solved)
* `"R"` is the RIGHT face (red when solved)
* `"L"` is the LEFT face (orange when solved)
* `"D"` is the DOWN face (green when solved)
* `"B"` is the BACK face (yellow when solved)

The **MOVES** you can make with the cube are annotated as follows:
* `"R"` turn the RIGHT face clockwise by a half-turn (this is the movement in the picture, the "Z" cube moves up)
* `"U"` turn the UP face clockwise by a half-turn
* `"L"` turn the LEFT face clockwise by a half-turn
* `"D"` turn the DOWN face clockwise by a half-turn

A **SCRAMBLE** is a sequence of moves, such as `"RURLDU"`.

We represent a face by a 4-letter code (for faces with 4 tiles: FRONT, BACK) or a 2-letter code (for faces with 2 tiles).
* Each letter stands for a color: `"R" (red), "G" (green), "B" (blue), "Y" (yellow), "W" (white), "O" (orange)`
* Colors appear in this order: top-left, top-right, bottom-left, bottom-right
* In the solved state, the front face is encoded as `"WWWW"`
* At the end of the `"R"` move, the front face is encoded as `"WYWY"`
* If the `"Z"` tile were colored red, the encoding after the move `"L"` would be `"YWYR"`

We represent a cube as a dictionary:
* Keys are the 1-letter codes for the faces (see above)
* Values are the 2- or 4-letter codes for the colors

This dictionary can be turned into a single string:
* concatenate all 4-letter and 2-letter codes, in the following key order: `"FBUDLR"`
* The solved state results in the string `"WWWWYYYYBBGGOORR"`
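
The concatenation described above can be sketched as follows. The helper name `faces_to_string` is an assumption made for this illustration; it is not part of the required interface.

```python
KEY_ORDER = "FBUDLR"  # face order used for the string representation

solved_faces = {
    "F": "WWWW",
    "B": "YYYY",
    "U": "BB",
    "D": "GG",
    "L": "OO",
    "R": "RR",
}

def faces_to_string(faces: dict[str, str]) -> str:
    """Concatenate the face codes in KEY_ORDER (hypothetical helper)."""
    return "".join(faces[key] for key in KEY_ORDER)

print(faces_to_string(solved_faces))  # WWWWYYYYBBGGOORR
```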

The moves are represented as position permutations of this string. You can find them in the code delimited by `### DO NOT CHANGE`.
* Given a text representation of the dictionary (such as `x = "WWWWYYYYBBGGOORR"`)
* the return value of `apply(x, move="U")` is the string representation of the cube AFTER doing the move `"U"`
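
To see what a position permutation does, it can help to label each tile position with a distinct letter. The sketch below reuses the permutation for `"U"` from the given code; the function name `apply_permutation` and the labelled string are assumptions made for this illustration.

```python
# Permutation for the move "U", copied from the given code.
U = [4, 5, 2, 3, 0, 1, 6, 7, 9, 8, 10, 11, 14, 13, 12, 15]

def apply_permutation(x: str, permutation: list[int]) -> str:
    """Position i of the result takes the character found at x[permutation[i]]."""
    return "".join(x[i] for i in permutation)

labelled = "ABCDEFGHIJKLMNOP"  # one distinct letter per tile position
after_u = apply_permutation(labelled, U)

print(after_u)                        # EFCDABGHJIKLONMP
print(apply_permutation(after_u, U))  # ABCDEFGHIJKLMNOP
```

Applying the same move twice returns the original string, which matches the fact that each move is a half-turn.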

There are 2 additional objects:
* `POSSIBLE_MOVES`: a list of moves
* `MOVE_2_TIME`: a dictionary indicating how long each move takes (not all moves are equal, some are easier to do than others)


___



About `default_factory`:
* It specifies a function that is called without arguments to produce the value of a field when that field is not passed to the constructor

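```python
from dataclasses import dataclass, field

def create_thing():
    return 42

@dataclass
class Sample:
    # create_thing() is called only when `thing` is not given to the constructor
    thing: int = field(default_factory=create_thing)

a = Sample()
print(a.thing)   # 42

b = Sample(thing=57)
print(b.thing)   # 57
```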


## Assignment
* Code a function named `solved_cube` that returns the dictionary corresponding to the solved cube


* Code a class named `CubeState`, which represents one cube in a specific state
  * use `dataclass`
  * there is one non-mandatory attribute `faces`:
    * it is a dictionary representing the cube, see above for description (don't forget the complete type hinting for a dictionary)
    * the default value corresponds to the solved cube, as seen on the diagram
    * use the argument `default_factory` of the function `field` to achieve this without the need for `__post_init__`
  * add the dunder method corresponding to `repr`
    * return the string representation of the dictionary `faces`
  * add the dunder method corresponding to `hash`
    * return the result of calling the `hash` on the string representation of the dictionary
  * add a method `move` with one argument: the move to execute (a string)
    * return a new object `CubeState` with the state of the cube AFTER the move (use the function `apply` as explained above)
    * assume the existence of a function `from_repr` that can generate a CubeState object from its text representation


* Code a function `from_repr` that takes in a text representation of a cube state (see above for description)
  * return a new object `CubeState` with the `faces` dictionary correctly set up


* Code a class `Cube`, which represents a single cube
  * use `dataclass`
  * it has one optional attribute `state` which is a `CubeState`
  * use `default_factory` to have the default `CubeState()` as default value, without using `__post_init__`
  * add a method `move`, which changes the cube's state to the new state AFTER the move, and returns the cube itself
  * add a method `scramble`
    * receives a scramble (see above for description)
    * executes all the moves
    * change the cube's state to the new state AFTER executing all of the moves in the specified order
    * return the cube itself


* Code a function `generate_graph` that returns a networkx directed graph
  * This is the graph with all possible states of a cube
  * Nodes in the graph are objects `CubeState`
  * Each edge represents a single move from one state to another. It is directed from a state to the state AFTER the move
  * Each edge has attributes:
    * `move`: the 1-letter code for the move,
    * `time`: the time it takes to make the move (see description)
    * `turn`: it's always 1
  * In this cube, each move is also its own inverse
  * So whenever the edge `A->"U"->B` exists, the edge `B->"U"->A` exists as well

Implement the following algorithm for the generation of the graph:
```
Given an empty directed graph G
Given a Stack S that contains the solved state

As long as the stack is not empty:
  pop the state on top of the stack: start
  For each possible move:
    get the state after executing the move from start: after
    if there are already edges (start, after) and (after, start) in the graph:
      go to the next move
    else:
      add the edges (start, after), (after, start) with the proper edge attributes
      push the state after onto the stack S

Return G
```
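
The traversal above can be tried out without `networkx` or `CubeState`. The following is only a sketch under simplifying assumptions: states are plain strings, the graph is a dictionary keyed by `(source, target)` pairs, and the names `PERMUTATIONS`, `MOVE_TIME` and `explore` are made up for this illustration. The permutations and times are copied from the given code.

```python
R = [0, 6, 2, 4, 3, 5, 1, 7, 8, 10, 9, 11, 12, 13, 15, 14]
U = [4, 5, 2, 3, 0, 1, 6, 7, 9, 8, 10, 11, 14, 13, 12, 15]
L = [7, 1, 5, 3, 4, 2, 6, 0, 11, 9, 10, 8, 13, 12, 14, 15]
D = [0, 1, 6, 7, 4, 5, 2, 3, 8, 9, 11, 10, 12, 15, 14, 13]

PERMUTATIONS = {"R": R, "U": U, "L": L, "D": D}
MOVE_TIME = {"R": 1, "U": 2, "L": 2, "D": 3}
SOLVED = "WWWWYYYYBBGGOORR"

def apply(x: str, move: str) -> str:
    return "".join(x[i] for i in PERMUTATIONS[move])

def explore(start: str) -> dict[tuple[str, str], dict]:
    """Return {(source, target): attributes} for every move reachable from start."""
    edges: dict[tuple[str, str], dict] = {}
    stack = [start]
    while stack:
        current = stack.pop()
        for move in PERMUTATIONS:
            after = apply(current, move)
            # Both directions already known: go to the next move.
            if (current, after) in edges and (after, current) in edges:
                continue
            attributes = {"move": move, "time": MOVE_TIME[move], "turn": 1}
            edges[(current, after)] = attributes
            edges[(after, current)] = attributes
            stack.append(after)
    return edges

edges = explore(SOLVED)
states = {state for edge in edges for state in edge}
print(len(states))  # 24, the node count checked in the Self-Test
```

The loop terminates because a state is pushed only when at least one new edge is recorded, and the number of possible edges is finite.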

* Code a class `Solution`, using `dataclass`, with attributes `moves` (string) and `time` (integer), and no method

* Code a class `CubeSolver` that has one non-init attribute `_G` of type networkx directed graph
  * the value of `_G` is the result of calling `generate_graph`
  * you can use `default_factory` to avoid having `__post_init__`
  * implement the dunder method that lets an object be called like a function: `x = CubeSolver(); x(cube_to_solve)`
    * solves the cube by finding the shortest path from the cube's state to the solved state
    * it will have 2 arguments: an object `Cube` that has been scrambled, and a string `minimize`
    * `minimize` indicates which edge attribute should be minimized for the shortest path (should be `"time"` or `"turn"`)
    * returns an object `Solution`:
      * the `moves` is a string indicating which moves to do to solve the cube (e.g. `"URL"`)
      * the `time` indicates the total time for these moves
  * implement a method `max_moves`
    * No argument
    * Returns the length of the longest shortest path between any pair of nodes in the graph
    * It is called the `diameter` of a graph. It is up to you to locate the networkx function that computes this value
    * For this graph, it should be 4
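
The effect of `minimize` can be illustrated without `networkx`. The sketch below is not the expected solution: it swaps in a hand-written Dijkstra search over plain strings, and the names `PERMUTATIONS`, `MOVE_TIME` and `cheapest_solution` are assumptions made for this illustration. In the assignment itself, the shortest path should come from the graph built by `generate_graph`.

```python
import heapq

R = [0, 6, 2, 4, 3, 5, 1, 7, 8, 10, 9, 11, 12, 13, 15, 14]
U = [4, 5, 2, 3, 0, 1, 6, 7, 9, 8, 10, 11, 14, 13, 12, 15]
L = [7, 1, 5, 3, 4, 2, 6, 0, 11, 9, 10, 8, 13, 12, 14, 15]
D = [0, 1, 6, 7, 4, 5, 2, 3, 8, 9, 11, 10, 12, 15, 14, 13]

PERMUTATIONS = {"R": R, "U": U, "L": L, "D": D}
MOVE_TIME = {"R": 1, "U": 2, "L": 2, "D": 3}
SOLVED = "WWWWYYYYBBGGOORR"

def apply(x: str, move: str) -> str:
    return "".join(x[i] for i in PERMUTATIONS[move])

def cheapest_solution(start: str, minimize: str) -> tuple[int, str]:
    """Return (cost, moves) of a cheapest path from start to the solved state."""
    queue = [(0, "", start)]  # (cost so far, moves so far, state)
    best = {start: 0}         # cheapest known cost per state
    while queue:
        cost, moves, state = heapq.heappop(queue)
        if state == SOLVED:
            return cost, moves
        if cost > best[state]:
            continue  # stale queue entry, a cheaper route was found later
        for move in PERMUTATIONS:
            step = MOVE_TIME[move] if minimize == "time" else 1
            after = apply(state, move)
            new_cost = cost + step
            if new_cost < best.get(after, float("inf")):
                best[after] = new_cost
                heapq.heappush(queue, (new_cost, moves + move, after))
    raise ValueError("the solved state cannot be reached")

scrambled = SOLVED
for move in "URU":
    scrambled = apply(scrambled, move)

print(cheapest_solution(scrambled, "time"))  # total time 4, as in the Self-Test
print(cheapest_solution(scrambled, "turn"))  # 3 moves
```

With `minimize="turn"` several 3-move answers are equally good, so which one is returned depends on tie-breaking; with `minimize="time"` the cost differs between them.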
    
    
    

## What you deliver
* function `solved_cube()`
* function `from_repr(txt)`
* class `CubeState`, using `@dataclass`, with attributes having the correct name and correct type, with the dunder method for `repr`
* class `Cube`, using `@dataclass`, with attributes of correct name and type
* function `generate_graph`
* class `Solution` (same instructions for all classes)
* class `CubeSolver`, with dunder method for `()`, and `max_moves`

## Grading

Lines refer to the line numbers within the cell "Self-Test".

Question | Description | Self-test Lines | Points
---|---|---:|---:
Q1.1 | Function `solved_cube` | 17-20 | 1
Q1.2 | Class `CubeState` definition | 22-23 | 1
Q1.3 | Constructor `CubeState()` | 24 | 1
Q1.4 | Operator `CubeState.repr` | 25 | 2
Q1.5 | Operator `CubeState.hash` | 26 | 1
Q1.6 | Function `from_repr` | 28-30 | 2
Q1.7 | Method `CubeState.move` | 32-35 | 2
Q1.8 | Class `Cube` definition| 37-38 | 2
Q1.9 | Constructor `Cube()` | 39-40 | 1
Q1.10 | Method `Cube.move` | 41-45 | 2
Q1.11 | Method `Cube.scramble` | 47-53 | 2
Q1.12 | Function `generate_graph` | 55-59 | 20
Q1.13 | Class `Solution` definition | 61-62 | 1
Q1.14 | Class `CubeSolver` definition | 64-65 | 2
Q1.15 | Operator `CubeSolver.call` | 67-70 | 8
Q1.16 | Method `CubeSolver.max_moves` | 72 | 2
**TOTAL** | | | **50**

## How to
* Respect all method / attributes / arguments names given in the assignment


In [6]:
# CodeGrade Tag Question1

# Write your solution in one single cell:
# function solved_cube
# class CubeState
# class Cube
# function generate_graph
# class Solution
# class CubeSolver

### DO NOT CHANGE

R = [0,6,2,4,3,5,1,7,8,10,9,11,12,13,15,14]
U = [4,5,2,3,0,1,6,7,9,8,10,11,14,13,12,15]
L = [7,1,5,3,4,2,6,0,11,9,10,8,13,12,14,15]
D = [0,1,6,7,4,5,2,3,8,9,11,10,12,15,14,13]

MOVE_2_PERMUTATION: dict[str, list[int]] = {
    "R": R,
    "U": U,
    "L": L,
    "D": D,
}

POSSIBLE_MOVES = list(MOVE_2_PERMUTATION.keys())

MOVE_2_TIME: dict[str, int] = {
    "R": 1,
    "U": 2,
    "L": 2,
    "D": 3
}

def apply(x: str, move: str) -> str:
    return "".join(x[i] for i in MOVE_2_PERMUTATION[move])

KEY_ORDER = "FBUDLR"

def solved_cube() -> dict:
    return {
        "F": "WWWW",
        "B": "YYYY",
        "U": "BB",
        "D": "GG",
        "L": "OO",
        "R": "RR"
    }

### END OF GIVEN CODE

from dataclasses import dataclass, field

import networkx as nx

@dataclass
class CubeState:
    faces: dict[str, str] = field(default_factory=solved_cube)

    def __repr__(self) -> str:
        return "".join(self.faces[face] for face in KEY_ORDER)

    def __hash__(self) -> int:
        return hash(repr(self))

    def move(self, move: str) -> 'CubeState':
        next_state = apply(repr(self), move)
        return from_repr(next_state)

def from_repr(representation: str) -> CubeState:
    i = 0
    sides = {}
    for key in KEY_ORDER:
        size = 4 if key in "FB" else 2
        sides[key], i = representation[i:i+size], i + size
    return CubeState(sides)

@dataclass
class Cube:
    state: CubeState = field(default_factory=CubeState)

    def move(self, move: str) -> 'Cube':
        self.state = self.state.move(move)
        return self

    def scramble(self, scramble: str) -> 'Cube':
        for move in scramble:
            self.move(move)
        return self

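def generate_graph() -> nx.DiGraph:
    G = nx.DiGraph()
    stack = [CubeState()]  # the exploration starts from the solved state
    visited = set()  # hashes of the states already taken off the stack

    while stack:
        current = stack.pop()
        visited.add(hash(current))

        for move in POSSIBLE_MOVES:
            new_state = current.move(move)
            # Follow the move when it reaches an unvisited state or a pair not yet linked
            if (hash(new_state) not in visited) or (not G.has_edge(current, new_state)):
                # Each move is its own inverse, so both directions get the same attributes
                G.add_edge(current, new_state, move=move, time=MOVE_2_TIME[move], turn=1)
                G.add_edge(new_state, current, move=move, time=MOVE_2_TIME[move], turn=1)
                stack.append(new_state)

    return G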

@dataclass
class Solution:
    moves: str
    time: int

@dataclass
class CubeSolver:

    _G: nx.DiGraph = field(default_factory=generate_graph, init=False)

    def __call__(self, cube: Cube, minimize: str = "time") -> Solution:
        return self._get_solution(cube, minimize)

    def _get_solution(self, cube: Cube, minimize: str) -> Solution:

        path = self._get_shortest_path(cube, minimize)

        moves = self._get_moves_from_path(path)

        time = self._get_time_from_path(path, minimize)

        return Solution(moves, time)

    def _get_shortest_path(self, cube: Cube, minimize: str):
        return nx.shortest_path(
            self._G,
            source=cube.state,
            target=CubeState(),
            weight=minimize
        )

    def _get_moves_from_path(self, path) -> str:
        return "".join(
            self._G[path[i]][path[i+1]]['move']
            for i in range(len(path)-1)
        )

    def _get_time_from_path(self, path, minimize: str) -> int:
        total_time = 0
        for i in range(len(path)-1):
            edge_data = self._G.get_edge_data(path[i], path[i+1])
            total_time += edge_data['time']
        return total_time

    def max_moves(self) -> int:
        return nx.diameter(self._G)

## Self-Test

The tests compare hashes of the expected values, so you can test your code without the solution being given away.

In [7]:
import hashlib
from dataclasses import is_dataclass
from dataclasses import fields
from dataclasses import _MISSING_TYPE

def hash_dict(d: dict[str, str]) -> str:
    txt = "/".join(f"{k}.{d[k]}" for k in KEY_ORDER)
    return hashlib.md5(txt.encode("utf-8")).hexdigest()

def hash_string(txt: str) -> str:
    return hashlib.md5(txt.encode("utf-8")).hexdigest()

def hash_fields(class_) -> str:
    txt = "/".join(sorted([f"{f.name}.{f.type.__qualname__}.{f.default_factory is not _MISSING_TYPE}.{f.init}" for f in fields(class_)]))
    return hashlib.md5(txt.encode("utf-8")).hexdigest()

edge_dict = solved_cube()
assert isinstance(edge_dict, dict)
assert all(isinstance(k, str) and isinstance(v, str) for k, v in edge_dict.items())
assert hash_dict(edge_dict) == "85cd11a43db135b797e07a9dc589f5d8"
print(hash_fields(CubeState))
assert is_dataclass(CubeState)
assert hash_fields(CubeState) == "7ad126216856463a3b71bc32a2a234d6"
st = CubeState()
assert hash_string(repr(st)) == "6f4eba0ca8981827e2561360f19699dc"
assert hash(st) == hash(repr(st))

st = from_repr("ABCDEFGHIJKLMNOP")
assert isinstance(st, CubeState)
assert hash_dict(st.faces) == "fc0597ef3fbcc8c6f58350a6b2e1dd4a"

st = CubeState().move("U")
assert isinstance(st, CubeState)
assert hash_dict(st.faces) == "c243be7271abebd5e07b10ac787a1a83"
assert hash_string(repr(st)) == "3c88da78ea6089c2a3265a897080e940"

assert is_dataclass(Cube)
assert hash_fields(Cube) == "dfe4b5064f68649223c875573b8b84ec"
c = Cube()
assert hash_string(repr(c)) == "04bf06504d70b64039f0596d05854dc9"
assert c.move("U") is c

c = Cube()
assert hash_string(repr(c.move("U"))) == "629e4d1c7e3404229ca9bf6f066165b4"
assert hash_string(repr(c.move("U"))) == "04bf06504d70b64039f0596d05854dc9"

c = Cube()
assert c.scramble("RULD") is c

c = Cube()
assert hash_string(repr(c.scramble("UU"))) == "04bf06504d70b64039f0596d05854dc9"
assert hash_string(repr(c.scramble("RRUULLDD"))) == "04bf06504d70b64039f0596d05854dc9"
assert hash_dict(Cube().scramble("URUL").state.faces) == "7f717a45980c292e01ec436da63e33cb"

G = generate_graph()
assert hash_string(str(type(G))) == "6729c3d6b00507a7d5966ccaaef0bb40"
assert G.number_of_nodes() == 24
assert CubeState() in G
assert Cube().scramble("RULD").state in G

assert is_dataclass(Solution)
assert hash_fields(Solution) == "aa12bd1950dc028ea646bf47d78ccb65"

assert is_dataclass(CubeSolver)
assert hash_fields(CubeSolver) == "fca69c463563a73c49bbfac554c09f1b"

solver = CubeSolver()
assert solver(cube=Cube().scramble("RU"), minimize="turn") == Solution(moves='UR', time=3)
assert solver(cube=Cube().scramble("URU"), minimize="turn") in [Solution(moves='DLD', time=8), Solution(moves='RUR', time=4), Solution(moves='URU', time=5), Solution(moves='LDL', time=7)]
assert solver(cube=Cube().scramble("URU"), minimize="time") == Solution(moves='RUR', time=4)

assert solver.max_moves() == 4

print("✅ Self-Test OK")

7ad126216856463a3b71bc32a2a234d6
✅ Self-Test OK


In [8]:
def move(self, move: str) -> 'CubeState':
    new_faces = apply(str(self), move)
    return CubeState(faces=from_repr(new_faces).faces)

# Question 2 - Car Traffic in Amsterdam Centre (30 points)

## Storyline

### Data
* We have created a graph of the streets in the center of Amsterdam
* It is a directed graph, based on the allowed travel direction for ordinary cars (excluding buses and service vehicles)
* The function `get_data()` will build the graph
* The variable `G` is an object of class `networkx.DiGraph`
* Nodes are intersections, while Edges are streets (or portions of streets)
* Edge attributes indicate street data (length, name, etc.)

### Study
* Based on this data we want to identify the most *important* streets in Amsterdam
* There is a graph metric called "edge betweenness centrality": see [Wikipedia](https://en.wikipedia.org/wiki/Betweenness_centrality)
  * It indicates how often a specific edge is on the shortest path between 2 nodes of the graph
  * Edges with a high centrality are crucial to car flow in the city
* In the last part, we will study all the trips starting from the REC-A building (node `SRC = 46349206`)
  * For each other node in the graph we obtain the shortest path, weighted by distance
  * We compute the straight line distance, based on the coordinates of the start and arrival nodes (node attribute `x` is longitude, `y` is latitude)
  * We are looking for the Top-10 destinations with the highest ratio `driving distance / straight line`


## Assignment
* For this part, you have less detailed instructions

* Code a function `add_travel_time` that computes the time it takes to travel along each edge
  * The function has one argument: `graph`, a `networkx` directed graph
  * It modifies this graph and returns it
  * Use the edge attributes `"length"` (length of the street in **meters**) and `"maxspeed"` (maximum speed in **kilometers per hour**)
  * `"maxspeed"` might be a string `"30"` or a list `["50", "30"]`
  * When `"maxspeed"` is a list, use the first item in the list
  * When `"maxspeed"` does not exist as an edge attribute, take `50` as the default value
  * Create a new attribute `"time_s"` that contains the time it takes to travel along the edge in **seconds** (be careful with units)
  * There are 3 *test* edges: `(0, 1), (0, 2), (0, 3)` representing all cases, use them for testing (see test code)
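
The unit conversion described above can be checked on a few values before touching the graph. This is a minimal sketch: the function name `travel_time_s` and the sample lengths are assumptions made for this illustration, not values taken from the data set.

```python
def travel_time_s(length_m: float, maxspeed=None) -> float:
    """Seconds needed to cover length_m at maxspeed (km/h, given as str, list or None)."""
    if maxspeed is None:
        maxspeed = 50  # default when the attribute is missing
    if isinstance(maxspeed, list):
        maxspeed = maxspeed[0]  # keep the first item of the list
    speed_m_per_s = float(maxspeed) * 1000 / 3600  # km/h -> m/s
    return length_m / speed_m_per_s

print(travel_time_s(500, "50"))          # about 36 seconds
print(travel_time_s(600, ["30", "50"]))  # about 72 seconds
print(travel_time_s(1000))               # about 72 seconds with the default of 50 km/h
```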


* Code 3 functions:
  * They all accept one directed graph `graph` as argument
  * They all return a dictionary, where keys are edge identifiers and values are float numbers:
  * `get_c_e_hops` returns the edge betweenness centrality of all the edges of `graph`, unweighted
  * `get_c_e_distance` returns the edge betweenness centrality of all the edges of `graph`, weighted by distance
  * `get_c_e_time` returns the edge betweenness centrality of all the edges of `graph`, weighted by time
  * It is a part of the assignment to identify the proper function in `networkx` and the correct arguments


* Code a function `get_top_10_streets`:
  * takes in a graph `graph` and a dictionary `edge_dict` similar to `c_e_hops`, `c_e_distance`, `c_e_time`
  * returns a list where each element is a tuple `(a, b)` where:
    * `a` is the street name of the edge
    * `b` is the value associated to this edge in `edge_dict`
  * returns only the Top-10 edges with regard to their associated value
  * returns the list sorted in descending order of the value
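
The ranking step can be sketched on the three test edges. Here a plain dictionary `street_names` stands in for the edge attribute `"name"` of the graph, and the helper `top_n` is a name chosen for this illustration.

```python
# Stand-in for the "name" edge attribute of the graph (illustrative data).
street_names = {(0, 1): "1st Street", (0, 2): "2nd Street", (0, 3): "3rd Street"}
scores = {(0, 1): 10, (0, 2): 12, (0, 3): 5}

def top_n(names: dict, values: dict, n: int = 10) -> list[tuple[str, float]]:
    """Pair each street name with its value, highest values first, at most n items."""
    pairs = [(names[edge], value) for edge, value in values.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]

print(top_n(street_names, scores))
# [('2nd Street', 12), ('1st Street', 10), ('3rd Street', 5)]
```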


* Code a function `get_top_10_ratio`
  * Takes as argument a directed graph `graph` and the identifier of a node `src` (an integer)
  * Compute the Top-10 list of nodes with the highest ratio `driving distance / straight line` (see description above)
  * Return this as a list of node identifiers
  * It is a list of 10 node identifiers, sorted in descending order of the ratio
  * use `geopy` to compute distances between 2 points
  * there is a `networkx` function that computes the shortest paths from one node to all the other nodes in the graph
  * the shortest path should be computed with the Dijkstra algorithm (you don't need to code the algorithm)
  * In any case where the ratio can't be computed (no shortest path, or zero distance), consider the ratio to be `0.0`
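
The ratio and its special cases can be sketched with the standard library alone. Note the substitution: the assignment asks for `geopy`, whereas this sketch uses a haversine formula on a spherical Earth so that it runs without extra packages, and `geopy`'s geodesic distance uses an ellipsoid, so its values differ slightly. The names `haversine_m` and `detour_ratio` are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def detour_ratio(driving_m, straight_m: float) -> float:
    """driving distance / straight line, or 0.0 when the ratio cannot be computed."""
    if driving_m is None or straight_m == 0:
        return 0.0  # no shortest path, or zero straight-line distance
    return driving_m / straight_m

# Node attribute `y` is the latitude and `x` is the longitude.
straight = haversine_m(52.0, 4.0, 53.0, 4.0)  # one degree of latitude, roughly 111 km
print(round(straight))
print(detour_ratio(1500.0, 1000.0))  # 1.5
print(detour_ratio(None, 1000.0))    # 0.0
```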


* The whole cell must run under 2 minutes in the CodeGrade environment
  * Solution runs in 16 seconds on a standard laptop




## Grading

Lines refer to the line numbers within the cell "Self-Test".

Question | Description | Self-test Lines | Points
---|---|---:|---:
Q2.1 | Function `add_travel_time` | 16-19 | 5
Q2.2 | Function `get_c_e_hops` | 21 | 5
Q2.3 | Function `get_c_e_distance` | 22 | 5
Q2.4 | Function `get_c_e_time` | 23 | 5
Q2.5 | Function `get_top_10_streets` | 25-28 | 5
Q2.6 | Function `get_top_10_ratio` | 30 | 5
**TOTAL** | | | **30**

In [9]:
# CodeGrade Tag Question2
import networkx as nx
from geopy.distance import great_circle
from geopy.distance import geodesic
from typing import List, Tuple, Any

def add_travel_time(graph: nx.DiGraph) -> nx.DiGraph:
    conversion_factor = 1000 / 3600  # 1 km/h expressed in meters per second

    for u, v, data in graph.edges(data=True):
        length = data.get('length', 0)
        maxspeed = data.get('maxspeed', 50)  # default speed limit in km/h

        # "maxspeed" may be a list such as ["50", "30"]: keep the first item
        if isinstance(maxspeed, list):
            maxspeed = maxspeed[0]
        maxspeed = float(maxspeed)

        # meters / (meters per second) = seconds
        data['time_s'] = length / (maxspeed * conversion_factor)

    return graph

def get_c_e_hops(graph):
    return nx.edge_betweenness_centrality(graph)

def get_c_e_distance(graph):
    return nx.edge_betweenness_centrality(graph, weight='length')

def get_c_e_time(graph):
    return nx.edge_betweenness_centrality(graph, weight='time_s')

def get_top_10_streets(graph, edge_dict):
    edge_list = [(graph[u][v]['name'], value) for (u, v), value in edge_dict.items() if 'name' in graph[u][v]]

    top_10 = sorted(edge_list, key=lambda x: x[1], reverse=True)[:10]

    return top_10

def get_top_10_ratio(graph, src):
    shortest_paths = nx.single_source_dijkstra_path_length(graph, src, weight='length')
    ratios = {}

    src_coords = (graph.nodes[src]['y'], graph.nodes[src]['x'])

    for target in shortest_paths:
        if target != src:
            target_coords = (graph.nodes[target]['y'], graph.nodes[target]['x'])
            straight_line_distance = geodesic(src_coords, target_coords).meters

            if straight_line_distance > 0:
                ratio = shortest_paths[target] / straight_line_distance
            else:
                ratio = 0.0

            ratios[target] = ratio

    top_10 = sorted(ratios, key=ratios.get, reverse=True)[:10]

    return top_10


### DO NOT CHANGE

import json
import networkx as nx

def get_data() -> nx.DiGraph:
    with open("amsterdam.json") as src:
        js = json.load(src)

    return nx.node_link_graph(js)

### END OF GIVEN CODE



## Self-Test

In [10]:
import hashlib
import time
from typing import Any

def hash_dict(d: dict[str, float]) -> str:
    txt = "/".join(f"{k}.{round(d[k], 4)}" for k in sorted(list(d.keys())))
    return hashlib.md5(txt.encode("utf-8")).hexdigest()

def hash_string(txt: str) -> str:
    return hashlib.md5(txt.encode("utf-8")).hexdigest()

def hash_list(data: list[Any]) -> str:
    txt = "/".join(str(x) for x in data)
    return hashlib.md5(txt.encode("utf-8")).hexdigest()


start_time = time.time()
G = get_data()

G = add_travel_time(graph=G)
assert all("time_s" in G.edges[x] for x in G.edges())
assert G.edges[(0, 1)]["time_s"] == 36.0
assert G.edges[(0, 2)]["time_s"] == 72.0
assert G.edges[(0, 3)]["time_s"] == 72.0

c_e_hops = get_c_e_hops(graph=G)
c_e_distance = get_c_e_distance(graph=G)
c_e_time = get_c_e_time(graph=G)

assert hash_dict(c_e_hops) == "782c614f6d7cbcde8bf7a1e535395f76"
assert hash_dict(c_e_distance) == "0e16fa58f41b78b028b98c3b0869bcaa"
assert hash_dict(c_e_time) == "44e8da10321f1e9ea64dd7221d0d715b"

assert get_top_10_streets(graph=G, edge_dict={(0, 1): 10, (0, 2): 12, (0, 3): 5}) == [('2nd Street', 12), ('1st Street', 10), ('3rd Street', 5)]
assert hash_list(get_top_10_streets(graph=G, edge_dict=c_e_hops)) == "001372730f2369f4b0b86d7de61d9a94"
assert hash_list(get_top_10_streets(graph=G, edge_dict=c_e_distance)) == "5a415cd3b129be61365e81bfa9d6be56"
assert hash_list(get_top_10_streets(graph=G, edge_dict=c_e_time)) == "bbb3c77552d3817dce2cb20624b7f2a8"

top_10_ratio = get_top_10_ratio(graph=G, src=46349206)
assert hash_list(top_10_ratio) == "6ee4db971f3a11d7c40c2e5c7b21652c"
end_time = time.time()

print("✅ Self-Test OK")
print(f"⏲️ {end_time - start_time:.1f} secs")

✅ Self-Test OK
⏲️ 32.7 secs


# Question 3 - Graph Data Structure (20 points)

## Storyline

* You will code an implementation of the Graph data structure from scratch
* A graph is a set of nodes, and a set of edges
* It has operations to add new nodes, new edges


## Assignment
* You can't use the `networkx` module; it will be deactivated in the CodeGrade environment for this question
* Create a class `Graph` by using `dataclass`
* This class has 2 attributes
  * `nodes` is a set of integers (each element in `nodes` is an integer)
  * `edges` is a set of tuples of 2 integers (each element in `edges` is a tuple `(src, dst)` where `src` and `dst` are integers)
* Dunder method for `in` operator (`__contains__`)
  * One argument: `node_or_edge_or_path` either an integer or a tuple of 2 integers or a list of integers
  * Returns a boolean
  * When `node_or_edge_or_path` is an integer: returns `True` when the integer is in the set of nodes, `False` otherwise
  * When `node_or_edge_or_path` is a tuple: returns `True` when the tuple is in the set of edges, `False` otherwise
  * When `node_or_edge_or_path` is a list of integers: returns `True` when the list is a valid path, i.e. it has at least 2 nodes and there is an edge between each pair of consecutive nodes in the list
  * When something is wrong with `node_or_edge_or_path`, it should raise a `ValueError` exception (no points for that, but you can try it)
* Dunder method for `str` operator (`__str__`)
  * returns a string
  * this string is the sorted list of edges
    * first in ascending order of the source node of the edges
    * then in ascending order of the destination node of the edges
    * For example if the edges are `{(0, 1), (3, 2), (1, 1)}` then the string is `"[(0, 1), (1, 1), (3, 2)]"`
    * Which is the result of `str([(0, 1), (1, 1), (3, 2)])`
* Dunder method for `hash` operator (`__hash__`)
  * returns an integer
  * this integer is the integer value of the SHA3-224 hash of the string representing the object, modulo `2**61 - 1`
* Method `add_nodes`
  * One argument `nodes`: a list of integers
  * Does not return anything
  * Adds all the nodes to the current set of nodes in the graph
* Method `add_edges`
  * One argument `edges`: a list of tuples of 2 integers
  * Does not return anything
  * Adds all of the edges to the current set of edges in the graph
  * If needed, adds new nodes to the graph
* Method `is_valid_path`
  * One argument `path`: a list of integers
  * Returns a boolean
  * Evaluates whether the nodes mentioned in `path` can be visited in this order
  * This means that there is an edge between each consecutive item in `path`
* Method `in_degree`
  * One argument `node`: an integer
  * Returns an integer
  * Returns the in-degree of the node
* Method `out_degree`
  * One argument `node`: an integer
  * Returns an integer
  * Returns the out-degree of the node
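
Several of the rules above can be tried on a bare set of edges before wrapping them in the `Graph` class. This is only a sketch: it works on a module-level set `edges`, and the function and variable names are chosen for this illustration.

```python
import hashlib

edges = {(0, 1), (3, 2), (1, 1)}

# Tuples sort by source node first, then by destination node.
as_text = str(sorted(edges))
print(as_text)  # [(0, 1), (1, 1), (3, 2)]

# SHA3-224 digest of the string, read as an integer, modulo 2**61 - 1.
digest = hashlib.sha3_224(as_text.encode("utf-8")).hexdigest()
hash_value = int(digest, 16) % (2**61 - 1)

def is_valid_path(path: list[int]) -> bool:
    """True when there are at least 2 nodes and every consecutive pair is an edge."""
    return len(path) >= 2 and all(pair in edges for pair in zip(path, path[1:]))

in_degree_of_1 = sum(1 for _, dst in edges if dst == 1)   # edges arriving at node 1
out_degree_of_1 = sum(1 for src, _ in edges if src == 1)  # edges leaving node 1

print(is_valid_path([0, 1, 1]), is_valid_path([0, 3]))  # True False
print(in_degree_of_1, out_degree_of_1)                  # 2 1
```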






## Grading

Lines refer to the line numbers within the cell "Self-Test".

Question | Description | Self-test Lines | Points
---|---|---:|---:
Q3.1 | Build `Graph` | 2-4 | 2
Q3.2 | Operator `in` | 6-24 | 6
Q3.3 | Method `add_edges` | 26-27 | 2
Q3.4 | Method `add_nodes` | 29-31 | 2
Q3.5 | Method `in_degree` | 37 | 2
Q3.6 | Method `out_degree` | 38 | 2
Q3.7 | Operator `str` | 39 | 2
Q3.8 | Operator `hash` | 40 | 2
**TOTAL** | | | **20**

In [11]:
# CodeGrade Tag Question3

import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple, Set

@dataclass
class Graph:
    nodes: Set[int] = field(default_factory=set)
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def __post_init__(self):
        for edge in self.edges:
            self.nodes.update(edge)

    def __contains__(self, node_or_edge_or_path):
        if isinstance(node_or_edge_or_path, int):
            return node_or_edge_or_path in self.nodes

        elif isinstance(node_or_edge_or_path, tuple) and len(node_or_edge_or_path) == 2:
            return node_or_edge_or_path in self.edges

        elif isinstance(node_or_edge_or_path, list) and len(node_or_edge_or_path) >= 2:
            return self.is_valid_path(node_or_edge_or_path)

        else:
            raise ValueError("Input must be an integer, a tuple of two integers, or a list of integers")

    def __str__(self):
        sorted_edges = sorted(self.edges)
        return str(sorted_edges)

    def __hash__(self):
        graph_str = str(self)
        sha3_224_hash = hashlib.sha3_224(graph_str.encode()).hexdigest()
        return int(sha3_224_hash, 16) % (2**61 - 1)

    def add_nodes(self, nodes: List[int]):
        self.nodes.update(nodes)

    def add_edges(self, edges: List[Tuple[int, int]]):
        for edge in edges:
            src, dst = edge
            self.nodes.update([src, dst])
            self.edges.add(edge)

    def is_valid_path(self, path: List[int]) -> bool:
        return all((path[i], path[i+1]) in self.edges for i in range(len(path) - 1))

    def in_degree(self, node: int) -> int:
        return sum(1 for src, dst in self.edges if dst == node)

    def out_degree(self, node: int) -> int:
        return sum(1 for src, dst in self.edges if src == node)

## Self-Test

In [13]:
## Self-Test
g = Graph(nodes={1, 2, 3, 4}, edges={(4, 5), (1, 2), (3, 4), (1, 3), (2, 3), (2, 5), (3, 1), (5, 2), (3, 4)})
assert isinstance(g.nodes, set) and len(g.nodes) == 5 and all(isinstance(x, int) for x in g.nodes)
assert isinstance(g.edges, set) and len(g.edges) == 8 and all(isinstance(x, tuple) and len(x) == 2 and all(isinstance(y, int) for y in x) for x in g.edges)

assert (3, 4) in g
assert 5 in g
assert 6 not in g

for bad_value in ["amsterdam", (1, 3, 4), (1.4, 2), ["a", "b", "c", "d"], [1]]:
    try:
        _ = bad_value in g
        print(f"🙋 Raise an exception for {bad_value}")
    except ValueError:
        pass

assert [1, 3, 4, 5, 2] in g
assert [1, 2, 4, 5, 2] not in g

g.add_edges([(2, 4)])
assert [1, 2, 4, 5, 2] in g

g.add_edges([(5, 10), (12, 13), (25, 5)])
assert all(x in g for x in [10, 12, 13, 25])
assert len(g.nodes) == 9

g.add_nodes([5, 3, 89])
assert len(g.nodes) == 10
assert 89 in g

assert g.in_degree(5) == 3
assert g.out_degree(4) == 1
assert str(g) == "[(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 1), (3, 4), (4, 5), (5, 2), (5, 10), (12, 13), (25, 5)]"
assert hash(g) == 2214868959501319296

print("✅ Self-Test OK")

🙋 Raise an exception for (1.4, 2)
🙋 Raise an exception for ['a', 'b', 'c', 'd']
✅ Self-Test OK
