# Assignment 6
## Group Members:
* ### Nils Dunlop, e-mail: gusdunlni@student.gu.se
* ### Francisco Alejandro Erazo Piza, e-mail: guserafr@student.gu.se

## Problem 1

In [1]:
import math
from collections import deque

class Graph:
    # Initialize the graph components
    def __init__(self, gdict=None):
        # Use a fresh dict per instance instead of a shared mutable default
        self.gdict = gdict if gdict is not None else {}
        self.distance = {}
        self.colour = {}
        self.predecessor = {}
        self.finish = {}

    # Return all vertices of the graph
    def get_vertices(self):
        return list(self.gdict.keys())

    # Return all edges of the graph
    def get_edges(self):
        edges = []
        for vertex in self.gdict:
            for next_vertex in self.gdict[vertex]:
                if (vertex, next_vertex) not in edges:
                    edges.append((vertex, next_vertex))
        return edges

    # Print the shortest path from source s to vertex v
    def print_path(self, s, v):
        if v in self.gdict.keys():
            if v == s:
                print(s)
            elif self.predecessor[v] is None:
                print("There is no path from", s, "to", v, "exists.")
            else:
                self.print_path(s, self.predecessor[v])
                print(v)
        else:
            print("Node with key", v, "is not in the graph.")

    # Initialize distance and predecessor for each vertex
    def initialise_single_source(self, s):
        for v in self.get_vertices():
            self.distance[v] = math.inf
            self.predecessor[v] = None
        self.distance[s] = 0

    # Get weight of the edge from u to v
    def get_weight(self, u, v):
        return self.gdict[u][v]

    # Update distance and predecessor if a shorter path is found
    def relax(self, u, v):
        if self.distance[v] > self.distance[u] + self.get_weight(u, v):
            self.distance[v] = self.distance[u] + self.get_weight(u, v)
            self.predecessor[v] = u

    # Dijkstra's algorithm for finding the shortest path
    def dijkstra(self, s):
        # Initialize distances and predecessors for all nodes
        self.initialise_single_source(s)
        
        # Priority queue to process nodes
        # Initialize the priority queue with the source node and a distance of 0
        priority_queue = [(0, s)]

        while priority_queue:
            # Sort by distance in descending order so the closest node ends up last
            priority_queue.sort(reverse=True, key=lambda x: x[0])
            
            # Pop the last entry, i.e. the node with the shortest distance
            current_cost, current_node = priority_queue.pop()

            # Relax the edges for each neighbor of the current node
            for neighbor in self.gdict[current_node]:
                old_distance = self.distance[neighbor]
                self.relax(current_node, neighbor)

                # If the distance to the neighbor has been updated, add it to the queue
                if self.distance[neighbor] != old_distance:
                    priority_queue.append((self.distance[neighbor], neighbor))

    # DAG shortest path algorithm for directed acyclic graphs
    def dag_shortest_path(self, s):
        # Initialize distances and predecessors for all nodes
        self.initialise_single_source(s)
        
        # Get the nodes in topological order
        sorted_nodes = self.topological_sort()

        for node in sorted_nodes:
            # Relax all edges from the current node to its neighbors
            for neighbor in self.gdict[node]:
                self.relax(node, neighbor)

    # Topological sort using depth-first search
    def topological_sort(self):
        # Deque to store the nodes in topological order
        sorted_nodes = deque()
        
        # Set to track visited nodes
        visited = set()
        
        # Depth-first search function
        def dfs(node):
            # Mark the current node as visited
            visited.add(node)
            
            # Recursively visit all unvisited neighbors
            for neighbor in self.gdict[node]:
                if neighbor not in visited:
                    dfs(neighbor)
                    
            # Once all neighbors are visited, add the current node to the start of the sorted list
            sorted_nodes.appendleft(node)

        # Start DFS from each unvisited node to ensure all nodes are processed
        for node in self.gdict:
            if node not in visited:
                dfs(node)

        return sorted_nodes

    # Print the shortest path using Dijkstra's algorithm
    def shortest_path_dijkstra(self, s, v):
        self.dijkstra(s)
        self.print_path(s, v)
        print(f"Total time: {self.distance[v]} minutes\n")

    # Print the shortest path using the DAG shortest path algorithm
    def shortest_path_dag(self, s, v):
        self.dag_shortest_path(s)
        self.print_path(s, v)
        print(f"Total time: {self.distance[v]} minutes\n")

# Input graphs
adjacency = {
    "r": {"s": 5, "t": 3},
    "s": {"x": 6, "t": 2},
    "t": {"x": 7, "y": 4, "z": 2},
    "x": {"y": -1, "z": 1},
    "y": {"z": -2},
    "z": {}
}

adjacency2 = {
    "s": {"t": 5, "y": 10},
    "t": {"x": 1, "y": 2},
    "x": {"z": 4},
    "y": {"x": 9,"t": 3, "z": 2},
    "z": {"s": 7,"x": 6}
}

graph = Graph(adjacency2)
graph2 = Graph(adjacency)
graph.shortest_path_dijkstra("s", "z")
graph2.shortest_path_dag("s", "z")

s
t
y
z
Total time: 9 minutes

s
x
y
z
Total time: 3 minutes
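
The `dijkstra` method above re-sorts its list on every iteration. A binary heap is a common alternative for the priority queue. The following is a minimal sketch of that idea, not part of the `Graph` class: `dijkstra_heap` is a hypothetical standalone function that assumes the same `{u: {v: weight}}` dictionary layout and non-negative weights.

```python
import heapq
import math


def dijkstra_heap(gdict, source):
    """Return (distance, predecessor) dicts for a graph given as {u: {v: weight}}."""
    distance = {v: math.inf for v in gdict}
    predecessor = {v: None for v in gdict}
    distance[source] = 0
    heap = [(0, source)]

    while heap:
        dist_u, u = heapq.heappop(heap)
        # Skip stale entries that were superseded by a shorter distance
        if dist_u > distance[u]:
            continue
        for v, weight in gdict[u].items():
            candidate = dist_u + weight
            if candidate < distance[v]:
                distance[v] = candidate
                predecessor[v] = u
                heapq.heappush(heap, (candidate, v))

    return distance, predecessor


# Same graph as adjacency2 above
adjacency2 = {
    "s": {"t": 5, "y": 10},
    "t": {"x": 1, "y": 2},
    "x": {"z": 4},
    "y": {"x": 9, "t": 3, "z": 2},
    "z": {"s": 7, "x": 6},
}

distance, predecessor = dijkstra_heap(adjacency2, "s")
print(distance["z"])  # 9, matching the output above
```

Outdated heap entries are skipped when popped instead of being removed eagerly, which keeps the code short at the cost of a slightly larger heap.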



## Problem 2

In [2]:
import tarfile

def extract_tar_to_dict(tar_file_path):
    extracted_files = {}

    # Open the tar file and extract the files
    try:
        with tarfile.open(tar_file_path, 'r:gz') as tar:
            for item in tar:
                # Skip the files starting with ._ since they are not tram data
                if item.name.startswith('Data/._'):
                    continue
                # Extract the tram data files
                if item.isfile() and 'tram' in item.name and item.name.endswith('.txt'):
                    file = tar.extractfile(item)
                    if file:
                        tram_dict = {}

                        for line in file:
                            # Decode the line and skip it if it is blank
                            text = line.decode('utf-8', errors='ignore').strip().lower()
                            if not text:
                                continue
                            # Split on the last comma into the tram stop and time
                            tram_stop, time = text.rsplit(',', 1)
                            tram_dict[tram_stop.strip()] = int(time)
                            
                        # Add the tram data to the dictionary
                        tram_number = item.name.replace('Data/', '').replace('.txt', '')
                        extracted_files[tram_number] = tram_dict
    # Handle exceptions
    except FileNotFoundError:
        print(f"{tar_file_path} not found.")
    except tarfile.ReadError:
        print(f"{tar_file_path} is not a tar file.")

    return extracted_files


tar_file_path = 'Data_A6.tar.gz'
extracted_dict = extract_tar_to_dict(tar_file_path)

In [3]:
def get_complete_tram_data(extracted_dict):
    reverse_dict = {}

    for tram, stops in extracted_dict.items():
        # Simply reverse the order of the dictionary items
        reversed_stops = dict(reversed(stops.items()))

        # Add the reverse stops to the dictionary
        reverse_tram = f"{tram}_reverse"
        reverse_dict[reverse_tram] = reversed_stops

    return {**extracted_dict, **reverse_dict}, reverse_dict


def get_tram_hubs(complete_tram_data, extracted_dict):
    # Set to store all unique tram stops and tram hubs
    all_tram_stops, tram_hubs = set(), set()

    # Dictionary to count the number of connections each tram stop has
    connections_count = {}

    # Populate the 'all_tram_stops' set with unique stops from 'complete_tram_data'
    for inner_dict in complete_tram_data.values():
        for key in inner_dict.keys():
            all_tram_stops.add(key.lower())

    # Initialize the dictionary with zeros for each tram stop
    for tram_stop in all_tram_stops:
        connections_count[tram_stop] = 0

    # For each tram line count the number of connections for each stop
    for tram_line in extracted_dict.values():
        stops = list(tram_line.keys())
        for i in range(len(stops) - 1):
            current_stop = stops[i]
            next_stop = stops[i + 1]
            connections_count[current_stop] += 1  # Increment connection count for the current stop
            connections_count[next_stop] += 1     # Increment connection count for the next stop

    # Identify tram hubs as stops with 3 or more connections
    for stop, count in connections_count.items():
        if count >= 3:
            tram_hubs.add(stop)

    # Return the set of tram hubs
    return tram_hubs


complete_tram_data, reverse_dict = get_complete_tram_data(extracted_dict)
tram_hubs = get_tram_hubs(complete_tram_data, extracted_dict)

In [4]:
def get_terminal_stops(extracted_dict):
    terminal_stops = []

    # Iterate over each tram line in the extracted_dict
    for line, stops in extracted_dict.items():
        # Get the first stop
        first_key = list(stops.keys())[0]
        # Get the last stop
        last_key = list(stops.keys())[-1]

        # Append both the first and last stops, converted to lowercase
        terminal_stops.append(first_key.lower())
        terminal_stops.append(last_key.lower())

    # Convert terminal_stops to a set to remove duplicates and then convert back to list
    return list(set(terminal_stops))

# Fetch terminal stops
terminal_stops = get_terminal_stops(extracted_dict)

# Combine tram hubs and terminal stops into a single sorted list
# The set union removes duplicates; the stop names are already lowercase
all_special_stops = sorted(tram_hubs.union(terminal_stops))

In [5]:
def build_tram_network_graph(extracted_dict, all_special_stops):
    graph = {}

    # Initialize the graph with terminal stops and hubs
    for stop in all_special_stops:
        graph[stop] = {}
        
    # Create edges for each tram line
    for tram_lines, stops in extracted_dict.items():
        stop_names = list(stops.keys())
        for i in range(len(stop_names) - 1):
            # Get the current and next stop names and time
            current_stop = stop_names[i]
            next_stop = stop_names[i + 1]
            current_time = stops[current_stop]

            # Only process if both stops are either terminal stops or hubs
            if current_stop in graph and next_stop in graph:
                # Check if there's already a connection and pick the shorter time if so
                # This was done to account for the variability in connection times
                if next_stop in graph[current_stop]:
                    graph[current_stop][next_stop] = min(graph[current_stop][next_stop], current_time)
                else:
                    graph[current_stop][next_stop] = current_time

                # Also add the reverse connection
                # The time in the reverse connection is also current_time due to the assignment description
                # "(assume that the time between stops A and B will be the same as the time between stops B and A)."
                if current_stop in graph[next_stop]:
                    graph[next_stop][current_stop] = min(graph[next_stop][current_stop], current_time)
                else:
                    graph[next_stop][current_stop] = current_time

    return graph

tram_network_graph = build_tram_network_graph(extracted_dict, all_special_stops)

In [6]:
graph = Graph(tram_network_graph)

# Shortest route from Chalmers to Centralstationen using Dijkstra's algorithm
print("Shortest route from Chalmers to Centralstationen using Dijkstra's algorithm:")
graph.shortest_path_dijkstra("chalmers", "centralstationen")

# Shortest route from Chalmers to Centralstationen using DAG shortest path algorithm
print("Shortest route from Chalmers to Centralstationen using DAG shortest path algorithm:")
graph.shortest_path_dag("chalmers", "centralstationen")

# Shortest route from Saltholmen to Chalmers using Dijkstra's algorithm
print("Shortest route from Saltholmen to Chalmers using Dijkstra's algorithm:")
graph.shortest_path_dijkstra("saltholmen", "chalmers")

# Shortest route from Saltholmen to Chalmers using DAG shortest path algorithm
print("Shortest route from Saltholmen to Chalmers using DAG shortest path algorithm:")
graph.shortest_path_dag("saltholmen", "chalmers")

Shortest route from Chalmers to Centralstationen using Dijkstra's algorithm:
chalmers
kapellplatsen
vasaplatsen
grönsakstorget
domkyrkan
brunnsparken
centralstationen
Total time: 10 minutes

Shortest route from Chalmers to Centralstationen using DAG shortest path algorithm:
chalmers
korsvägen
scandinavium
ullevi södra
centralstationen
Total time: 11 minutes

Shortest route from Saltholmen to Chalmers using Dijkstra's algorithm:
saltholmen
roddföreningen
långedrag
hinsholmen
käringberget
tranered
hagen
nya varvsallén
kungssten
sandarna
sannaplan
mariaplan
marklandsgatan
botaniska trädgården
sahlgrenska huvudentré
medicinaregatan
wavrinskys plats
chalmers
Total time: 28 minutes

Shortest route from Saltholmen to Chalmers using DAG shortest path algorithm:
There is no path from saltholmen to chalmers exists.
Total time: inf minutes



### Observations
1. **Variability in Connection Times:** Some pairs of adjacent stops, such as 'Gamlestads torg' and 'Ejdergatan', are served by several tram lines whose travel times differ. In our graph, each such connection is represented by the shortest of these times, so some trams may take longer than the graph suggests.
2. **Standardizing Stop Names:** We noticed discrepancies in the casing of tram stop names. To avoid any duplication and ensure consistency in our data representation, all tram stop names were converted to lowercase.
3. **Asymmetric Travel Times:** Some connections displayed different travel times depending on the direction (A->B versus B->A). However, based on the assignment description, we assumed that the travel time between A and B is the same as the travel time between B and A. This assumption may not fully reflect real-world scenarios, where traffic, tram frequency, or other factors could affect travel times, but it was necessary to ensure the validity of our results.
4. **Limitation with DAG Shortest Path Algorithm:** The DAG (Directed Acyclic Graph) shortest path algorithm was unsuitable for certain paths, such as from Saltholmen to Chalmers. The tram network graph is not acyclic: every connection is stored in both directions, so each pair of connected stops already forms a cycle. This limitation should be kept in mind when choosing algorithms for analysis.
5. **Final Output:** The computed travel time from 'Chalmers' to 'Centralstationen' was 10 minutes using Dijkstra's algorithm, compared to 11 minutes using the DAG shortest path algorithm. Traveling from 'Saltholmen' to 'Chalmers' took 28 minutes using Dijkstra's algorithm, whereas the DAG shortest path algorithm found no path between these two stops because the tram network is not acyclic.
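
Observation 4 suggests checking whether a graph is acyclic before choosing the DAG algorithm. Below is a minimal sketch of such a check, assuming the same `{u: {v: weight}}` dictionary layout in which every neighbour is also a key. `has_cycle` is a hypothetical helper that is not part of the `Graph` class, and the two small graphs are made-up examples.

```python
def has_cycle(gdict):
    """Return True if the directed graph {u: {v: weight}} contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {v: WHITE for v in gdict}

    def visit(u):
        colour[u] = GREY
        for v in gdict[u]:
            if colour[v] == GREY:
                return True  # back edge to a node on the current path
            if colour[v] == WHITE and visit(v):
                return True
        colour[u] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in gdict)


# Made-up examples: a one-way chain and the same chain with two-way edges
one_way = {"a": {"b": 2}, "b": {"c": 1}, "c": {}}
two_way = {"a": {"b": 2}, "b": {"a": 2, "c": 1}, "c": {"b": 1}}

print(has_cycle(one_way))  # False
print(has_cycle(two_way))  # True
```

Because the sketch is recursive, very long chains of stops could hit Python's recursion limit; an explicit stack would avoid that.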

#### Explanation for the build_tram_network_graph() implementation
The function begins by initializing a dictionary that maps each tram hub and terminal stop to an empty dictionary. It then iterates over the stops of each tram line and creates edges between consecutive stops. Following the assignment's guideline of 'the approximate time (in minutes) from that tram stop to the tram stop that is given on the next line of the file', it uses current_time, the value stored at the current stop, as the travel time to the next stop. This time is stored in both directions, from the current stop to the next and vice versa, in accordance with the assignment's provision that travel times are symmetric (i.e., 'the time between stops A and B will be the same as between B and A'). If a connection between two stops already exists, the function keeps the shorter time, which accounts for variable connection times. The graph is then returned.
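
The steps described above can be illustrated on made-up data. The sketch below is a simplified version of the idea rather than the `build_tram_network_graph` function itself: it skips the hub and terminal stop filter, and the function name, tram lines, stop names, and times are all hypothetical.

```python
def build_symmetric_graph(lines):
    """Build {stop: {neighbour: minutes}} from {line: {stop: minutes_to_next_stop}}."""
    graph = {}
    for stops in lines.values():
        names = list(stops)
        for current_stop, next_stop in zip(names, names[1:]):
            minutes = stops[current_stop]
            # Store the connection in both directions
            for a, b in ((current_stop, next_stop), (next_stop, current_stop)):
                neighbours = graph.setdefault(a, {})
                # Keep the shorter time when two lines share a connection
                neighbours[b] = min(neighbours.get(b, minutes), minutes)
    return graph


# Hypothetical lines: the time stored at a stop is the time to the next stop
toy_lines = {
    "tram_a": {"north": 3, "centre": 2, "south": 0},
    "tram_b": {"north": 4, "centre": 5, "east": 0},
}

toy_graph = build_symmetric_graph(toy_lines)
print(toy_graph["north"])   # {'centre': 3}
print(toy_graph["centre"])  # {'north': 3, 'south': 2, 'east': 5}
```

Both toy lines serve the connection between `north` and `centre`, with 3 and 4 minutes respectively, so the graph keeps 3 minutes in both directions.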

#### Interpretation of the results
**Shortest route from Chalmers to Centralstationen:**

<u>Dijkstra's path:</u>

1. Chalmers → Kapellplatsen = 2 minutes
2. Kapellplatsen → Vasaplatsen = 2 minutes
3. Vasaplatsen → Grönsakstorget = 1 minute
4. Grönsakstorget → Domkyrkan = 1 minute
5. Domkyrkan → Brunnsparken = 2 minutes
6. Brunnsparken → Centralstationen = 2 minutes

Total time: 10 minutes

<u>DAG shortest path:</u>

1. Chalmers → Korsvägen = 5 minutes
2. Korsvägen → Scandinavium = 1 minute
3. Scandinavium → Ullevi södra = 2 minutes
4. Ullevi södra → Centralstationen = 3 minutes

Total time: 11 minutes

**Shortest route from Saltholmen to Chalmers:**

<u>Dijkstra's path:</u>

The shortest path from Saltholmen to Chalmers, using Dijkstra's algorithm, consists of 17 legs between 18 stops (including both endpoints) and takes a total of 28 minutes. The full path is shown below:
1. Saltholmen → Roddföreningen = 1 minute
2. Roddföreningen → Långedrag = 1 minute
3. Långedrag → Hinsholmen = 1 minute
4. Hinsholmen → Käringberget = 2 minutes
5. Käringberget → Tranered = 1 minute
6. Tranered → Hagen = 1 minute
7. Hagen → Nya Varvsallén = 2 minutes
8. Nya Varvsallén → Kungssten = 1 minute
9. Kungssten → Sandarna = 2 minutes
10. Sandarna → Sannaplan = 1 minute
11. Sannaplan → Mariaplan = 1 minute
12. Mariaplan → Marklandsgatan = 5 minutes
13. Marklandsgatan → Botaniska Trädgården = 3 minutes
14. Botaniska Trädgården → Sahlgrenska Huvudentré = 3 minutes
15. Sahlgrenska Huvudentré → Medicinaregatan = 1 minute
16. Medicinaregatan → Wavrinskys Plats = 1 minute
17. Wavrinskys Plats → Chalmers = 1 minute

Total time: 1 + 1 + 1 + 2 + 1 + 1 + 2 + 1 + 2 + 1 + 1 + 5 + 3 + 3 + 1 + 1 + 1 = 28 minutes
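
The total can be double-checked by summing the leg times; the list below simply copies the 17 values from the path above.

```python
# Leg times in minutes, copied from the 17 legs listed above
leg_times = [1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 5, 3, 3, 1, 1, 1]

print(len(leg_times))  # 17
print(sum(leg_times))  # 28
```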

**Why are Dijkstra's algorithm and the DAG shortest path algorithm producing different results?**

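The DAG shortest path algorithm is only correct for Directed Acyclic Graphs, and our tram network contains cycles, so it is not a DAG. The dag_shortest_path method relaxes the outgoing edges of each node exactly once, in the order returned by topological_sort. On a cyclic graph no valid topological order exists, so some edges are relaxed before the distance of their start node is final, which can lead to non-optimal results. Dijkstra's algorithm correctly computes the shortest path on cyclic graphs, as long as there are no negative weights. Therefore, since the tram network graph is cyclic and travel times cannot be negative, Dijkstra's algorithm produces the correct result.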
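
A small made-up graph illustrates the effect. The sketch below re-creates the DFS ordering and the single relaxation pass as standalone functions (hypothetical names, same `{u: {v: weight}}` layout) and compares the result with a heap-based Dijkstra on a two-way toy graph.

```python
import heapq
import math
from collections import deque


def dfs_order(gdict):
    """Reverse DFS finishing order, mirroring topological_sort above."""
    order, visited = deque(), set()

    def dfs(node):
        visited.add(node)
        for neighbour in gdict[node]:
            if neighbour not in visited:
                dfs(neighbour)
        order.appendleft(node)

    for node in gdict:
        if node not in visited:
            dfs(node)
    return list(order)


def relax_once_in_order(gdict, source):
    """Relax the outgoing edges of each node exactly once, in DFS order."""
    distance = {v: math.inf for v in gdict}
    distance[source] = 0
    for u in dfs_order(gdict):
        for v, weight in gdict[u].items():
            distance[v] = min(distance[v], distance[u] + weight)
    return distance


def dijkstra(gdict, source):
    """Reference distances for non-negative weights."""
    distance = {v: math.inf for v in gdict}
    distance[source] = 0
    heap = [(0, source)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if dist_u > distance[u]:
            continue  # stale entry
        for v, weight in gdict[u].items():
            if dist_u + weight < distance[v]:
                distance[v] = dist_u + weight
                heapq.heappush(heap, (distance[v], v))
    return distance


# Hypothetical two-way graph: the direct edge a-b is slow, the detour via c is fast
toy = {
    "a": {"b": 10, "c": 1},
    "b": {"a": 10, "c": 1, "d": 1},
    "c": {"a": 1, "b": 1},
    "d": {"b": 1},
}

print(dfs_order(toy))                      # ['a', 'b', 'd', 'c']
print(relax_once_in_order(toy, "a")["d"])  # 11
print(dijkstra(toy, "a")["d"])             # 3
```

Here `b` is processed while its distance is still 10, so `d` is set to 11. The later improvement of `b` to 2 via `c` is never propagated, because `b` is not processed again.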

**Why does the DAG shortest path algorithm not find a path between Saltholmen and Chalmers?**

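The DAG shortest path algorithm requires a graph without cycles. In the tram network, trams travel back and forth between stops, so every connection forms a cycle. As a result, the order produced by topological_sort is not a valid topological order: a stop can be processed while its distance is still infinite, in which case relaxing its edges updates nothing, and the stop is never processed again. If this cuts off the destination, its distance stays infinite, which is why no path between Saltholmen and Chalmers is found.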
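
The same effect can be reproduced on a three-stop toy line. This is a minimal sketch with hypothetical stop names and times, written as a standalone function that mimics the single relaxation pass in DFS order.

```python
import math
from collections import deque


def relax_once_in_dfs_order(gdict, source):
    """Mimic dag_shortest_path: one relaxation pass in reverse DFS finishing order."""
    order, visited = deque(), set()

    def dfs(node):
        visited.add(node)
        for neighbour in gdict[node]:
            if neighbour not in visited:
                dfs(neighbour)
        order.appendleft(node)

    for node in gdict:
        if node not in visited:
            dfs(node)

    distance = {v: math.inf for v in gdict}
    distance[source] = 0
    for u in order:
        for v, weight in gdict[u].items():
            distance[v] = min(distance[v], distance[u] + weight)
    return list(order), distance


# Hypothetical two-way line: target <-> middle <-> source
toy = {
    "target": {"middle": 2},
    "middle": {"target": 2, "source": 3},
    "source": {"middle": 3},
}

order, distance = relax_once_in_dfs_order(toy, "source")
print(order)     # ['target', 'middle', 'source']
print(distance)  # {'target': inf, 'middle': 3, 'source': 0}
```

The DFS starts at the first dictionary key, so `target` and `middle` are processed before `source` has passed on any distance. `middle` is updated afterwards, but its edge to `target` is never relaxed again, so `target` stays at infinity even though a 5-minute route exists.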