# Practical 2: Deterministic Dynamic Programming

Author: FIRSTNAME  LASTNAME

Student Number: n00000000

### Learning Outcomes:
In this practical you will address the following learning outcomes:
- Control Sequences
- Cost Functions
- Deterministic Dynamic Programming

We will require the following library for this practical. Import all necessary libraries before running the code:

In [None]:
import numpy as np

## Part A: Deterministic Shortest Path
The Shortest Path Problem (SPP) involves finding the shortest path between two specific nodes in a weighted graph. In a graph, nodes represent points or locations, and edges represent connections between those points. A weighted graph includes numerical values associated with each edge, indicating the "cost" or "weight" of traveling between the connected nodes. The goal of the shortest path problem is to determine the path from a starting node to a target node that has the minimum total cost among all possible paths. The cost of a path is the sum of the costs of the edges along that path. Consider the following example:
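Before turning to the example, here is a minimal Python sketch of the path-cost definition above. It assumes a graph stored as a dictionary in which `toy_graph[u][v]` is the cost of the edge u → v. The graph, its node names, and the helpers `path_cost` and `all_paths` are hypothetical and only for illustration. `all_paths` also assumes the graph has no cycles.

```python
# Hypothetical weighted graph: toy_graph[u][v] is the cost of the edge u -> v.
toy_graph = {
    "S": {"X": 2, "Y": 5},
    "X": {"T": 6},
    "Y": {"T": 1},
    "T": {},
}

def path_cost(graph, path):
    """Sum the edge costs along `path`; raise if an edge does not exist."""
    total = 0
    for u, v in zip(path, path[1:]):
        if v not in graph[u]:
            raise ValueError(f"no edge {u} -> {v}")
        total += graph[u][v]
    return total

def all_paths(graph, start, goal):
    """Enumerate every path from start to goal (assumes the graph is acyclic)."""
    if start == goal:
        return [[goal]]
    return [[start] + rest
            for nxt in graph[start]
            for rest in all_paths(graph, nxt, goal)]

for p in all_paths(toy_graph, "S", "T"):
    print(p, path_cost(toy_graph, p))
```

In this toy graph the cheaper first edge (S → X, cost 2) leads to the more expensive path (total 8, versus 6 for S → Y → T). Choosing each edge greedily therefore does not always give the shortest path.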

Tom, who resides in City "A", is planning a journey towards City "H". Given his limited funds, he has devised a strategic plan to spend each night during his expedition at the abode of a friend. Tom has friends in cities "B", "C", "D", "E", "F", and "G".

Tom is mindful of optimizing his energy expenditure, and he is aware of the limited distances he can cover each day. On the first day of travel, he can comfortably reach City "B", "C", or "D". On the second day, he can reach City "E", "F", or "G". Ultimately, Tom can reach his destination, City "H", on the third day.

To conserve energy and navigate his journey efficiently, Tom must strategically decide where to spend each night along the route. It's imperative for him to consider the energy requirements between cities, which are outlined in the subsequent table. By skillfully selecting his overnight stops, Tom can ensure his expedition is both cost-effective and successful.

| From \ To | A | B | C | D | E | F | G | H |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| **A** | / | 333 | 282 | 230 | / | / | / | / |
| **B** | / | / | / | / | 553 | 280 | 370 | / |
| **C** | / | / | / | / | 470 | 404 | 522 | / |
| **D** | / | / | / | / | 268 | 606 | 767 | / |
| **E** | / | / | / | / | / | / | / | 807 |
| **F** | / | / | / | / | / | / | / | 450 |
| **G** | / | / | / | / | / | / | / | 603 |

The left-hand column lists the departure cities and the top row lists the arrival cities. For instance, the value "333" in the first row is the energy required to travel from City "A" to City "B", and "/" means there is no direct route between the two cities. Consider the following questions:
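Before answering these questions, it may help to have the table in machine-readable form. The sketch below shows one possible encoding as a dense NumPy matrix: the row is the departure city, the column is the arrival city, and each "/" is stored as `np.inf` so that a missing route can never be selected as a minimum. The names `energy`, `edges`, and `idx` are illustrative only.

```python
import numpy as np

cities = ["A", "B", "C", "D", "E", "F", "G", "H"]
idx = {c: i for i, c in enumerate(cities)}  # city name -> row/column index

# Routes transcribed from the energy table: (departure, arrival, energy).
edges = [
    ("A", "B", 333), ("A", "C", 282), ("A", "D", 230),
    ("B", "E", 553), ("B", "F", 280), ("B", "G", 370),
    ("C", "E", 470), ("C", "F", 404), ("C", "G", 522),
    ("D", "E", 268), ("D", "F", 606), ("D", "G", 767),
    ("E", "H", 807), ("F", "H", 450), ("G", "H", 603),
]

# Start with "no route" everywhere, then fill in the routes that exist.
energy = np.full((len(cities), len(cities)), np.inf)
for src, dst, cost in edges:
    energy[idx[src], idx[dst]] = cost

print(energy[idx["A"], idx["B"]])            # 333.0
print(np.isinf(energy[idx["A"], idx["E"]]))  # True: no direct route A -> E
```

The Q6 code below uses an adjacency-list dictionary (`graph`) instead, which stores only the routes that exist.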

### Q1
Based on the above description and the energy table, construct a graphical representation of the shortest path problem for Tom's journey between cities. (You may use any drawing tool, e.g., PowerPoint or a hand-drawn sketch.)

< Answer Here >

### Q2
By inspecting your graph, identify the path with the lowest energy expenditure to the destination.

< Answer Here >

### Q3
For every possible path, calculate its cost by hand using the "cost-to-go" functions. (Hint: use your graph.)

< Answer Here >

### Q4
Examine all the paths and their "cost-to-go" values from Q3, and identify the path with the lowest energy consumption. Does this align with your intuition in Q2? Explain the outcome.

< Answer Here >

### Q5
Use the dynamic programming (DP) algorithm to identify the optimal path, i.e., the path with the minimum total energy. Does this match the path from Q4?
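For reference, one standard way to write the deterministic DP (backward) recursion for this problem is shown below. The notation is introduced here for illustration. $x_k$ is the city occupied at stage $k$, with $x_0 = A$ and $x_3 = H$. $U_k(x_k)$ is the set of cities reachable from $x_k$ in one day. $g(x_k, u_k)$ is the table entry for the move $x_k \to u_k$. Since the action is the next city, $x_{k+1} = u_k$.

$$
J_3(H) = 0, \qquad
J_k(x_k) = \min_{u_k \in U_k(x_k)} \Big[\, g(x_k, u_k) + J_{k+1}(u_k) \,\Big], \quad k = 2, 1, 0.
$$

The minimum total energy is $J_0(A)$. The minimizing $u_k$ at each stage gives the optimal next city. In the Q6 code, `value_function` plays the role of $J$, and the outer loop runs $k$ from `num_stage-2` down to 0.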

< Answer Here >

### Q6
Complete the following code to implement the deterministic dynamic programming algorithm for the SPP. Some setup code has been provided.

In [None]:
# Define the nodes at each stage as a dictionary. The keys 0-3 are the stages,
# and the values 0-7 are the node indices for Cities "A"-"H", respectively.
nodes = {
    0: [0],
    1: [1,2,3],
    2: [4,5,6],
    3: [7],
}

# Define the actions and their costs. The keys 0-7 are the node indices for Cities "A"-"H";
# each value is a list of (next_node, energy_cost) tuples for the moves available from that city.
graph = {
    0: [(1,333), (2,282), (3,230)],
    1: [(4,553), (5,280), (6,370)],
    2: [(4,470), (5,404), (6,522)],
    3: [(4,268), (5,606), (6,767)],
    4: [(7,807)],
    5: [(7,450)],
    6: [(7,603)],
    7: [],
}
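As an optional sanity check, the hypothetical helper `check_staged_graph` below confirms that a staged graph is consistent. It is not part of the provided setup code. It checks that every edge leads from stage k to stage k+1, that no cost is negative, and that the terminal node has no outgoing edges. It is shown on a small made-up graph so that it runs on its own.

```python
def check_staged_graph(nodes, graph):
    """Return a list of problems; an empty list means the graph matches its stages."""
    problems = []
    last = max(nodes)  # index of the terminal stage
    for k, stage_nodes in nodes.items():
        for n in stage_nodes:
            if k == last:
                # The terminal node must not have any outgoing edges.
                if graph[n]:
                    problems.append(f"terminal node {n} has outgoing edges")
                continue
            for nxt, cost in graph[n]:
                if nxt not in nodes[k + 1]:
                    problems.append(f"edge {n}->{nxt} does not lead to stage {k + 1}")
                if cost < 0:
                    problems.append(f"edge {n}->{nxt} has negative cost {cost}")
    return problems

# Small hypothetical example: stage 0 = {0}, stage 1 = {1, 2}, stage 2 = {3}.
toy_nodes = {0: [0], 1: [1, 2], 2: [3]}
toy_graph = {0: [(1, 4), (2, 1)], 1: [(3, 2)], 2: [(3, 7)], 3: []}
print(check_staged_graph(toy_nodes, toy_graph))  # []
```

Calling `check_staged_graph(nodes, graph)` on the dictionaries above should likewise return an empty list. A non-empty result points to a transcription error.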

In [None]:
num_stage = len(nodes)  # The number of stages
num_nodes = len(graph)  # The number of nodes
value_function = np.zeros(num_nodes)  # Initialize the value function (cost-to-go) for each node
value_function[num_nodes-1] = 0  # Terminal cost at City "H"
optimal_action = np.zeros(num_nodes)  # Initialize the optimal action at each node
optimal_action[num_nodes-1] = num_nodes-1  # The terminal node maps to itself
optimal_path_index = list(nodes[0])  # Start the path at the initial node (a copy, so nodes[0] is not modified)

cities = ["A", "B", "C", "D", "E", "F", "G", "H"]  # The city nodes

# Deterministic dynamic programming: backward recursion from stage num_stage-2 down to stage 0
for k in range(num_stage-2, -1, -1):
    for n in nodes[k]:
        values = []
        num_action = len(graph[n])
        
        # Hint: compute the value for each action, and append to the values list
        ### START CODE HERE ###

        
        
        ###  END CODE HERE ###
        
        value_function[n] = np.min(values)  # Choose the minimum value
        optimal_action[n] = graph[n][np.argmin(values)][0]  # Extract the action with minimum value

# Obtain the optimal path
optimal_path = ["A"]
for k in range(1, num_stage):
    action = optimal_action[int(optimal_path_index[-1])]
    optimal_path_index.append(int(action))
    optimal_path.append(cities[int(action)])
    
# Print the results
print('Optimal Cost:', value_function[0])
print('Optimal Path:', optimal_path)

### Q7
Compare the result of your DP implementation to the by-hand computation in Q4 and Q5.

< Answer Here >

## Part B: Traveling Salesperson
In the Traveling Salesperson Problem (TSP), a salesperson must visit each city in a set exactly once and return to the starting city while minimizing the total distance or cost traveled. The goal is to find the shortest route that visits all cities and returns to the starting point. Unlike the SPP above, in the TSP every city is directly reachable from every other city.
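As a minimal sketch of this definition, the brute-force search below tries every tour on a hypothetical 4-city distance matrix. The matrix is not the Q8 cost table, and the helpers `tour_cost` and `brute_force_tsp` are illustrative names. Exhaustive search is used here only to make the definition concrete; it is not the dynamic programming approach studied in this practical.

```python
from itertools import permutations

# Hypothetical symmetric distances between cities 0..3 (NOT the Q8 costs).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_cost(dist, tour):
    """Cost of visiting `tour` in order and then returning to tour[0]."""
    legs = zip(tour, tour[1:] + tour[:1])  # consecutive pairs plus the closing leg
    return sum(dist[a][b] for a, b in legs)

def brute_force_tsp(dist, start=0):
    """Exhaustively try all (n-1)! orderings of the non-start cities."""
    others = [c for c in range(len(dist)) if c != start]
    best = min(permutations(others), key=lambda p: tour_cost(dist, [start, *p]))
    tour = [start, *best]
    return tour, tour_cost(dist, tour)

tour, cost = brute_force_tsp(dist)
print("best tour:", tour, "cost:", cost)
```

With a symmetric matrix each tour appears twice, once in each direction. Here [0, 1, 3, 2] and [0, 2, 3, 1] both cost 18, and `min` returns the first one found. The number of orderings grows factorially with the number of cities.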

Consider the following example. There are four cities, "A", "B", "C", and "D". The salesperson starts from City "A" and wishes to find the minimum-cost (shortest-distance) route that visits each of the other cities exactly once and returns to City "A".

### Q8
Given the following costs (travel distances), use your intuition to propose the best control sequence, and explain why you chose it.

| Cities | A | B | C | D | 
|:---------:|:---------:|:---------:|:---------:|:---------:|
| **A** | 0 | 5 | 1 | 15 | 
| **B** | 5 | 0 | 20 | 4 | 
| **C** | 1 | 20 | 0 | 3 |
| **D** | 15 | 4 | 3 | 0 | 

< Answer Here >

### Q9
For every possible control sequence, calculate its cost by hand using the "cost-to-go" functions of the cities it visits.

< Answer Here >

### Q10
Considering all possible sequences from Q9, identify the optimal control sequence. Is this the control sequence you expected? Explain why.

< Answer Here >

### Q11
Use the dynamic programming algorithm to identify the best (optimal) control sequence.

< Answer Here >

### Q12
Compare the control sequence obtained from the "cost-to-go" functions (Q10) with the one from the dynamic programming algorithm (Q11). Are these sequences identical? Does this outcome align with your initial expectations?

< Answer Here >

### Q13
In the above, we started and finished in city A. How would the optimal control sequence change if we were to start and finish in a different city?

< Answer Here >

### Q14
In the above, we considered a problem with 4 states (cities). How would you handle a larger problem (e.g., with 20 cities)?

< Answer Here >

### Q15
Given the costs in Q8, complete the following code to implement the deterministic dynamic programming algorithm for the TSP starting from City "A".

In [None]:
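# Travel costs from Q8; the diagonal is inf (instead of 0) so a city is never chosen as its own successor
cityCosts = np.array([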
  [float('inf'), 5, 1, 15],
  [5, float('inf'), 20, 4],
  [1, 20, float('inf'), 3],
  [15, 4, 3, float('inf')]  
])

In [None]:
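# Create the dynamic programming function.
# Hint: the cell below calls it as DeterministicDP(0, tempCost, ActionSet) and expects it to
# return a (cost, action) pair, so define it as 'def DeterministicDP(...): ... return cost, action'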
### START CODE HERE ###
    

    
    
### END CODE HERE ###

In [None]:
numCity = 4  # set number of cities
Path = ['A']  # initialize the path with the terminal state
Cost = [0]  # initialize the terminal cost
runningCost = 0  # initialize the running cost
# Create the action set ['A', 'B', 'C', 'D']
ActionSet = []
for ii in range(numCity):
    ActionSet.append(chr(ord('A') + ii))

# Establish Terminal State City
currentCity = 'A'

# build the tour backwards until it has numCity + 1 entries (all cities plus the return to 'A')
while len(Path) <= numCity:
    # cost of reaching currentCity from each city (its column in cityCosts)
    tempCost = np.array(cityCosts[:,ActionSet.index(currentCity)])
    if len(Path) < numCity:
        # if we haven't visited all cities
        for ii in Path:
            # set the cost of every city already in the path to infinity
            tempCost[ActionSet.index(ii)] = float('inf')
    else:
        # after we have visited all cities
        for ii in Path[:-1]:
            # exclude every city except the last entry (the start city 'A'), so the tour closes
            tempCost[ActionSet.index(ii)] = float('inf')

    # apply the DP step
    newCost, newAction = DeterministicDP(0, tempCost, ActionSet)
    Path.insert(0,newAction) # add action to the START of the path
    Cost.insert(0,newCost) # add the cost
    runningCost += newCost # add to running cost
    currentCity = newAction # update location

print('City Set:', ActionSet)
print('Travel Costs:\n', cityCosts)
print('Final Cost:', runningCost)
print('Cost:', Cost)
print('Path:', Path)

## Part C: Discussion

### Q16
Discuss the issues you have encountered when using the deterministic dynamic programming algorithm.

< Answer Here >

### Q17
Consider the TSP and assume that the salesperson needs to make a round trip every day, visiting all cities and starting and ending in City "A". Unfortunately, the travel cost between cities is random, and the salesperson does not know these values in advance. Suggest a procedure that will help the salesperson optimize their route choices over time.

< Answer Here >