This is the script used for solving the pick-up and delivery problem (PDP) with reinforcement learning (RL).

-- TODO: add more description here: explain the capacity constraint and why the PDP is hard, and give a general overview of the RL framework. Maybe also something on RNNs.



First, we import useful packages that we will use later on.

In [1]:
import matplotlib.pyplot as plt

import numpy as np 
import pandas as pd
import torch 
import random
import itertools    

%matplotlib inline

Next, we define some useful parameters.

In [2]:
random_seed = 1234  # fixed seed, so that results are reproducible and runs can be compared
random.seed(random_seed)
number_of_clients = 10 # number of clients to be visited
max_demand = 0.5  #max (normalized) demand of a client 
min_demand = 0.05 #min (normalized) demand of a client

To start simple, I made the following assumptions.
Some of them will be relaxed later on; others will be justified.

* The grid is the unit square [0,1] x [0,1]
* The depot is always in the middle, [0.5,0.5] (this can be changed by feeding the decoder the position of the depot as its first input)
* The vehicle capacity is normalized to 1, and demands are expressed as fractions of it
* Distances are Euclidean
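The normalization behind these assumptions can be sketched as follows. This is a hypothetical helper (normalize_instance is not part of the notebook), assuming an instance comes as raw depot, pick-up and delivery coordinates, raw demands and a vehicle capacity. Coordinates are shifted and divided by one common span, so they fit in the unit square and all Euclidean distances are scaled by the same factor; demands are divided by the capacity. Note that the depot generally does not land on [0.5,0.5], so a real instance would still need the depot-as-input extension mentioned above.

```python
def normalize_instance(depot, pickups, deliveries, demands, capacity):
    """Hypothetical helper: map raw coordinates into the unit square and
    demands into fractions of the vehicle capacity.

    depot: (x, y); pickups, deliveries: lists of (x, y); demands: raw demands.
    Returns the normalized depot, pick-ups, deliveries, demands and the span
    (multiply a normalized distance by span to get it back in original units).
    """
    points = [depot] + pickups + deliveries
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # a single scale for both axes keeps distances proportional (no distortion)
    span = max(max(xs) - x_min, max(ys) - y_min) or 1.0

    def scale(p):
        return ((p[0] - x_min) / span, (p[1] - y_min) / span)

    return (scale(depot),
            [scale(p) for p in pickups],
            [scale(p) for p in deliveries],
            [d / capacity for d in demands],
            span)

dep, pu, de, dem, span = normalize_instance(
    (20, 15), [(0, 0), (20, 10)], [(40, 20), (10, 30)], [5, 8], 20)
print(dep, pu, de, dem, span)
# (0.5, 0.375) [(0.0, 0.0), (0.5, 0.25)] [(1.0, 0.5), (0.25, 0.75)] [0.25, 0.4] 40
```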

In addition, a few things are not clear to me about the paper we build on:
* Why don't they compare with best-known solutions?
* How do we fit an instance from the literature into something we can use?

PS: remind Jacopo to explain to Katharina why the maximum travel distance of the vehicle should be 2√2 times the side of the grid (which is one). Short answer: with the depot in the middle, even a client that must be picked up in [0,0] and delivered in [1,1] can still be served, since the tour depot → [0,0] → [1,1] → depot has length √2/2 + √2 + √2/2 = 2√2.
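A quick numerical check of this worst case, using only the standard library (math.dist needs Python 3.8 or later). It compares the exact tour length with the length obtained when each leg is rounded to two decimals, as ComputeDistance does later, and with the cap written in ExistFeasibleRoute as round(2*np.sqrt(2)).

```python
import math

depot = (0.5, 0.5)
# worst case from the note: pick up at [0, 0], deliver at [1, 1]
tour = [depot, (0.0, 0.0), (1.0, 1.0), depot]
legs = list(zip(tour, tour[1:]))

exact = sum(math.dist(a, b) for a, b in legs)              # sqrt(2)/2 + sqrt(2) + sqrt(2)/2
rounded = sum(round(math.dist(a, b), 2) for a, b in legs)  # 0.71 + 1.41 + 0.71, per-leg rounding
cap_in_code = round(2 * math.sqrt(2))                      # round() without ndigits returns the int 3

print(round(exact, 4), round(rounded, 2), cap_in_code)     # 2.8284 2.83 3
```

The worst case fits under the cap as coded, with some slack. Had the cap been the unrounded 2√2, the per-leg rounding would give 2.83 > 2.8284 and the worst case would be rejected.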
We could also add normalized time windows, but I have to think more about it: it would make sense to translate travel distances into travel times and then normalize them, but then how do we deal with the constraint on the maximum travel distance of a vehicle? Maybe it is more interesting to consider time windows instead of a constraint on the maximum travel distance.

Hereafter we define a class and a function:
* ClientClass, whose constructor generates a new random client;
* NewInstancePDP, which returns a random instance of the PDP (given the normalization, an instance of the PDP is completely defined by its clients).

These will be useful to generate random test instances.

In [3]:
class ClientClass:
    X_p = None    # cartesian coordinate for the pick up
    Y_p = None
    X_d = None    # cartesian coordinate for the delivery
    Y_d = None
    Demand = None
    VisitedPickUp = None   # Boolean variables to be used later on
    VisitedDelivery = None
    
    def __init__(self, max_demand, min_demand):
        self.X_p = round(random.random(), 2) # the rounding is just to keep 'reasonable' numbers
        self.Y_p = round(random.random(), 2)
        self.X_d = round(random.random(), 2)
        self.Y_d = round(random.random(), 2)
        # resample until pick-up and delivery are at least 0.05 apart (avoids unreasonably close pairs)
        while round(np.linalg.norm([self.X_p - self.X_d,self.Y_p - self.Y_d],2),2)<0.05:
            self.X_p = round(random.random(), 2) 
            self.Y_p = round(random.random(), 2)
            self.X_d = round(random.random(), 2)
            self.Y_d = round(random.random(), 2)
        self.Demand = round(random.uniform(min_demand, max_demand),2)
        self.VisitedPickUp = False
        self.VisitedDelivery = False
        
def NewInstancePDP(number_of_clients, max_demand, min_demand):
    
    Clients = []
    for i in range(number_of_clients):
        Clients.append(ClientClass(max_demand, min_demand))
    
    return Clients
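ClientClass and NewInstancePDP draw from the global random generator, so an instance depends on everything else that consumed it before. A possible variant, sketched below, keeps the same sampling rules but draws from a dedicated random.Random(seed); sample_client and new_instance are hypothetical names (not from the notebook) and return plain dicts instead of ClientClass objects.

```python
import math
import random

def sample_client(rng, max_demand, min_demand, min_gap=0.05):
    """Hypothetical stand-alone counterpart of ClientClass.__init__, returning a dict."""
    while True:
        x_p, y_p, x_d, y_d = (round(rng.random(), 2) for _ in range(4))
        # same rejection rule as ClientClass: resample while pick-up and delivery are too close
        if round(math.dist((x_p, y_p), (x_d, y_d)), 2) >= min_gap:
            break
    demand = round(rng.uniform(min_demand, max_demand), 2)
    return {"pickup": (x_p, y_p), "delivery": (x_d, y_d), "demand": demand}

def new_instance(number_of_clients, max_demand, min_demand, seed):
    """Hypothetical counterpart of NewInstancePDP with its own random.Random(seed)."""
    rng = random.Random(seed)
    return [sample_client(rng, max_demand, min_demand) for _ in range(number_of_clients)]

a = new_instance(10, 0.5, 0.05, seed=1234)
b = new_instance(10, 0.5, 0.05, seed=1234)
print(a == b)  # True: same seed, same instance
```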

In the next cell we test a greedy policy which, from its current position, always visits the closest node among:
* the pick-up locations of clients not visited yet, restricted to those that can still be served feasibly (see Mask);
* the delivery locations of clients currently on board.

To do so, we create two more classes.
One class (RouteClass) keeps information about the (partial) route.
The other (SolutionClass) collects all the routes belonging to a solution.
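The greedy rule above can be sketched in a stripped-down form: a single vehicle, a capacity check only (no maximum travel distance and no look-ahead over permutations as in Mask and ExistFeasibleRoute), and clients given as hypothetical dicts. Unlike RouteClass, this sketch also counts the final leg back to the depot.

```python
import math

DEPOT = (0.5, 0.5)

def greedy_route(clients, capacity=1.0):
    """Single-vehicle nearest-node greedy (simplified sketch).

    clients: list of dicts with 'pickup' and 'delivery' (x, y) tuples and a 'demand'.
    Allowed moves: the delivery of any client on board, or the pick-up of any
    unvisited client whose demand still fits in the vehicle.
    """
    pos, load, length = DEPOT, 0.0, 0.0
    on_board, done = set(), set()
    route = [DEPOT]
    while True:
        candidates = [(c["delivery"], i) for i, c in enumerate(clients) if i in on_board]
        candidates += [(c["pickup"], i) for i, c in enumerate(clients)
                       if i not in on_board and i not in done
                       and load + c["demand"] <= capacity]
        if not candidates:
            break
        # closest node; ties go to deliveries, which are listed first (as in GreedyOrdering)
        node, i = min(candidates, key=lambda cand: math.dist(pos, cand[0]))
        if i in on_board:            # delivery: the client leaves the vehicle
            on_board.remove(i)
            done.add(i)
            load -= clients[i]["demand"]
        else:                        # pick-up: the client enters the vehicle
            on_board.add(i)
            load += clients[i]["demand"]
        length += math.dist(pos, node)
        pos = node
        route.append(node)
    length += math.dist(pos, DEPOT)  # leg back to the depot
    route.append(DEPOT)
    return route, length, done

clients = [
    {"pickup": (0.2, 0.5), "delivery": (0.8, 0.5), "demand": 0.6},
    {"pickup": (0.3, 0.5), "delivery": (0.1, 0.5), "demand": 0.6},
]
route, length, done = greedy_route(clients)
print(route)             # [(0.5, 0.5), (0.3, 0.5), (0.1, 0.5), (0.2, 0.5), (0.8, 0.5), (0.5, 0.5)]
print(round(length, 2))  # 1.4
```

In the example, client 0 cannot be picked up while client 1 is on board (0.6 + 0.6 > 1), so the vehicle delivers client 1 first. Clients left in neither set (demand above capacity) would need another route, which is what solveGreedy does by opening new routes.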

In [4]:
class RouteClass:
    
    ClientsVisited = None   # list of ClientClass: clients already picked up AND delivered
    Distance = None         # float: distance travelled so far
    Capacity = None         # float: current load of the vehicle
    PickedUp = None         # list of ClientClass: clients picked up but NOT yet delivered
    Route = None            # list of [x_i, y_i] pairs: the nodes visited so far, in order
    LocationsToVisit = None # list of [x_i, y_i, d_i] triples: nodes still to be visited,
                            # where d_i is the change in load at that node
    
    def __init__(self):
        
        self.ClientsVisited = []
        self.Distance = 0
        self.Capacity = 0
        self.Route = []
        self.PickedUp = []
        self.LocationsToVisit = [[0.5,0.5,0]] # 0.5,0.5 is the depot location; 0 is the (fake) depot demand
        
    def FindGreedyRoute(self, Clients):
        
        self.Route.append([0.5, 0.5]) # this is the position of the depot
        Reachables = self.Mask(Clients)
        
        while len(Reachables)>0:
            
            Reachables = self.GreedyOrdering(Reachables)
            client = Reachables[0]
            if client not in self.PickedUp: # it means it is a pick up
                Next = [client.X_p,client.Y_p]
                self.PickedUp.append(client)
                self.Capacity+= client.Demand
                self.LocationsToVisit.append([client.X_d,client.Y_d, -client.Demand])
                # sanity check: no two pending entries [x, y, d] may be identical
                for i in range(len(self.LocationsToVisit)):
                    for j in range(i + 1, len(self.LocationsToVisit)):
                        if self.ComputeDistance(self.LocationsToVisit[i],self.LocationsToVisit[j])==0:
                            raise ValueError('locations %d and %d coincide: %s, %s' % (
                                i, j, self.LocationsToVisit[i], self.LocationsToVisit[j]))
            else:                                 # while this is a delivery
                Next = [client.X_d,client.Y_d]
                self.ClientsVisited.append(client)
                self.PickedUp.remove(client)
                self.LocationsToVisit.remove([client.X_d,client.Y_d, -client.Demand])
                self.Capacity-= client.Demand
            distance = self.ComputeDistance(self.Route[-1],Next)
            self.Route.append(Next)
            self.Distance+= distance
            Reachables = self.Mask(Clients)
        self.Route.append([0.5,0.5]) # final node is again the depot (this last leg is NOT added to self.Distance)
            
    def GreedyOrdering(self,Reachables):
        
        Aux1 = [[r, self.ComputeDistance(self.Route[-1], [r.X_d,r.Y_d])] for r in Reachables if r in self.PickedUp]
        Aux2 = [[r, self.ComputeDistance(self.Route[-1], [r.X_p,r.Y_p])] for r in Reachables if r not in self.PickedUp]
        Aux = Aux1+Aux2
        Aux.sort(key = lambda x: x[1])
        Aux = [a[0] for a in Aux]
        
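        if len(Aux)!=len(Reachables):
            # sanity check: every reachable client must appear exactly once
            raise RuntimeError('GreedyOrdering lost clients: %d deliveries + %d pick-ups = %d, expected %d'
                               % (len(Aux1), len(Aux2), len(Aux), len(Reachables)))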
        
        return Aux
    
    def ComputeDistance(self,A,B):
        
        #round to the second decimal of the second norm of [x_1,y_1]-[x_2,y_2]
        return round(np.linalg.norm(np.array(A)-np.array(B),2),2)
            
    def Mask(self, Clients):
        
        Unvisited = [c for c in Clients if c not in self.ClientsVisited and c not in self.PickedUp]
        ToAdd = []
        for new in Unvisited:
            loc_pu = [new.X_p, new.Y_p, new.Demand]
            loc_de = [new.X_d, new.Y_d, -new.Demand]
            if self.ExistFeasibleRoute(loc_pu, loc_de):
                ToAdd.append(new)
        
        return self.PickedUp + ToAdd
    
    def ExistFeasibleRoute(self, loc_pu, loc_de):
        
        # temporarily remove the depot because it is always the last node
        self.LocationsToVisit.remove([0.5,0.5,0])
        # temporarily add the pick-up and delivery locations of the candidate client
        self.LocationsToVisit.append(loc_pu)
        self.LocationsToVisit.append(loc_de)
        # obtaining all the possible permutations
        Permutations = [list(p) for p in list(itertools.permutations(self.LocationsToVisit))]
        
        # adding again the depot to the locations to be visited
        self.LocationsToVisit.append([0.5,0.5,0])
        # removing the new client from the locations to visit
        self.LocationsToVisit.remove(loc_pu)
        self.LocationsToVisit.remove(loc_de)
        # removing the permutations where the pick up of the new client is after its delivery
        Permutations = [p for p in Permutations if self.FindIndex(p,loc_pu)<self.FindIndex(p,loc_de)]
        # adding the depot as last node to be visited and adding the last visited node as first node
        for p in Permutations:
            p.append([0.5,0.5,0])
            p.insert(0, self.Route[-1]+[0])
        
        for permutation in Permutations:
            distance = self.Distance
            capacity = self.Capacity
            for i in range(len(permutation)-1):
                capacity+= permutation[i][2]
                if capacity > 1:
                    Boolean = False
                    break
                A = permutation[i][0:-1]
                B = permutation[i+1][0:-1]
                distance+= self.ComputeDistance([A[0],A[1]],[B[0],B[1]])
                if distance > round(2*np.sqrt(2)):   # maximum travel distance; round() without ndigits gives 3
                    Boolean = False
                    break
                Boolean = True
            if Boolean:
                break
        
        return Boolean
    
    def FindIndex(self, p,loc):
        
        # return the position of loc in permutation p, matching on the
        # coordinates only (the last entry, the load change, is ignored)
        
        for index in range(len(p)):
            if self.ComputeDistance(p[index][0:-1],loc[0:-1])==0:
                return index
        raise ValueError('cannot find %s within: %s' % (loc, p))
        
class SolutionClass:
    
    Cost = None
    Routes = None
    Clients = None
    ClientsVisited = None
    mode = None
    NN = None
    
    def __init__(self, Clients, mode, NN):
        self.Clients = Clients
        self.ClientsVisited = []
        self.Routes = []
        self.Cost = 0
        self.mode = mode
        self.NN = NN
        
    def solve(self):
        
        if self.mode == 'greedy':
            return self.solveGreedy()
        else:
            if self.NN==0:
                raise ValueError("if you're not solving it greedily, then you need to assign a neural network to NN")
            raise NotImplementedError('the neural-network policy is yet to come')
    
    def solveGreedy(self):
        
        ClientsToVisit = [c for c in self.Clients if c not in self.ClientsVisited]
        while len(ClientsToVisit)>0:
            newRoute = RouteClass()
            self.Routes.append(newRoute)
            newRoute.FindGreedyRoute(ClientsToVisit)
            self.ClientsVisited = self.ClientsVisited + newRoute.ClientsVisited
            ClientsToVisit = [c for c in self.Clients if c not in self.ClientsVisited]
        self.UpdateCost()
    
    def UpdateCost(self):
        
        self.Cost = sum([r.Distance for r in self.Routes])
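ExistFeasibleRoute enumerates every ordering of the pending locations and tests each one. The test for a single ordering can be isolated as below; this is a sketch that roughly mirrors the inner loop (two-decimal rounding per leg, load checked against a capacity of 1), with locations written as hypothetical (x, y, delta_load) tuples. It does not enforce pick-up before delivery, which is why the notebook filters the permutations first. Since k pending locations give k! orderings (half of which survive the precedence filter for the new client), the enumeration grows very quickly.

```python
import math

def ordering_is_feasible(start, ordering, distance, load, max_distance, capacity=1.0):
    """Check one visiting order against the distance and capacity limits.

    start: current position (x, y); ordering: list of (x, y, delta_load) ending
    with the depot (delta_load 0); distance, load: values accumulated so far.
    """
    pos = start
    for x, y, delta in ordering:
        distance += round(math.dist(pos, (x, y)), 2)  # same 2-decimal rounding as ComputeDistance
        if distance > max_distance:
            return False
        load += delta                                 # +demand at a pick-up, -demand at a delivery
        if load > capacity:
            return False
        pos = (x, y)
    return True

depot = (0.5, 0.5, 0.0)
order = [(0.2, 0.5, 0.3), (0.8, 0.5, -0.3), depot]    # pick-up, delivery, back to the depot

print(ordering_is_feasible((0.5, 0.5), order, 0.0, 0.0, 3))  # True: about 1.2 travelled, max load 0.3
print(ordering_is_feasible((0.5, 0.5), order, 0.0, 0.8, 3))  # False: the load would reach 1.1
print(ordering_is_feasible((0.5, 0.5), order, 2.5, 0.0, 3))  # False: the distance would exceed 3
print([math.factorial(k) for k in (4, 8, 12)])              # [24, 40320, 479001600]
```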

In [5]:
def SolvePDP(Clients, mode, NN=0):
    Solution = SolutionClass(Clients, mode, NN=NN)  # pass the caller's NN through (it was hard-coded to 0)
    Solution.solve()
    
    return Solution

In [7]:
Clients = NewInstancePDP(number_of_clients, max_demand, min_demand)
Solution = SolvePDP(Clients, 'greedy')

print('Cost', Solution.Cost)
for r in Solution.Routes:
    print('duration', r.Distance)
        
for r in Solution.Routes:
    for c in r.ClientsVisited:
        print(Clients.index(c))

Cost 6.84
duration 2.46
duration 2.57
duration 1.81
2
4
5
6
9
3
7
1
8
0
