This project is my first attempt to build a tabular Q-learning agent for a complicated, large-scale combinatorial optimisation problem: the Stochastic Home Health Care Scheduling and Routing Problem, a variant of the well-known Stochastic Vehicle Routing Problem (SVRP). The task in such problems is to assign the demands of many customers to a fleet of vehicles at an optimised cost. This project aims to train an agent that makes its own decisions and generates a routing plan.
This repo is academically oriented.
The SVRP has been studied for decades, and there is now a substantial body of papers covering various applications, theories, mathematical models, objectives, and constraints. In this project, I wanted to use the Q-learning technique and see whether this popular RL algorithm can achieve satisfactory performance on a large-scale stochastic combinatorial problem.
Since RL algorithms require an MDP, the SVRP in this project is modelled hierarchically, with an MDP at the exterior level and a CCP model at the interior level: the CCP model captures the criteria of a single vehicle's routing process, while the MDP describes the process in which a vehicle is chosen as input to the CCP model at each state. Until it reaches the absorbing state (either out of vehicles or out of demand), every episode of the MDP repeats three steps:
- observe the current state, which in practice is some information about the targets still available to visit;
- take an action, i.e., choose a vehicle;
- observe the reward derived from the solution returned by the algorithm that solves the CCP model.
The objective of the model is to maximise the fulfilled demand while minimising waiting cost; the QL agent is expected to learn a plan that satisfies this objective as well as possible. A minimal sketch of the resulting training loop is given below.
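For illustration only, here is a hedged sketch of how such a hierarchical loop could look in Python. The environment interface (`observe_state`, `is_absorbing`, `solve_ccp_with_aco`) and the state encoding are assumptions made for this sketch; they do not mirror the actual interfaces in QL_BWACO.py.

```python
import random
from collections import defaultdict

# Hypothetical sketch of the exterior-level tabular Q-learning loop.
# env.observe_state() / env.solve_ccp_with_aco() stand in for the real
# environment and the BWACO routine; their signatures are assumptions.

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2
Q = defaultdict(float)                          # Q[(state, vehicle)] -> value

def train_episode(env, vehicles):
    state = env.observe_state()                 # abstracted demand/resource info
    while not env.is_absorbing():               # out of vehicles or out of demand
        # epsilon-greedy choice of the next vehicle to dispatch
        if random.random() < EPSILON:
            vehicle = random.choice(vehicles)
        else:
            vehicle = max(vehicles, key=lambda v: Q[(state, v)])

        # interior level: solve the CCP model for this vehicle with ACO and
        # obtain the reward (fulfilled demand minus waiting cost)
        reward, next_state = env.solve_ccp_with_aco(vehicle)

        # standard tabular Q-learning update
        best_next = 0.0 if env.is_absorbing() else max(
            Q[(next_state, v)] for v in vehicles)
        Q[(state, vehicle)] += ALPHA * (reward + GAMMA * best_next
                                        - Q[(state, vehicle)])
        state = next_state
```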
So far, however, the algorithm does not perform very well on the instances, which is probably due to the following problems:
- The state definition of the MDP does not fully represent the environment.
- The reward function of the MDP is not well shaped, so it fails to lead the agent towards a sensible policy.
- The ACO algorithm that solves the CCP model is essentially a randomised search, so it commonly produces different sub-solutions at the same state of the MDP, which makes it harder for the QL agent to learn the right behaviour.
- Learning-based algorithms (QL, DQN, CNNs, etc.) need a huge amount of training data to learn sensible results.
However, this project was part of a research subject that is now finished, and my original motivation was curiosity about how well reinforcement learning techniques could perform on large-scale combinatorial optimisation problems. I put it on GitHub as a reference for anyone who wants to investigate this direction further, or as a practice example of implementing Q-learning on a combinatorial optimisation problem. Here are some suggestions that might help this project make further progress:
- Better shape the reward function.
- The state definition here is a heavy abstraction of the resource and demand information, which turned out to be insufficient for the QL agent to learn enough about the environment; a better approximation or abstraction of the environment is worth exploring (a possible richer encoding is sketched after this list).
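As one concrete direction, the state could carry a coarse but multi-dimensional summary of the remaining demand and resources rather than a single abstract value. The fields, attribute names, and bucket sizes below are purely illustrative assumptions, not the encoding used in QL_BWACO.py.

```python
def encode_state(remaining_demands, available_vehicles, current_time,
                 demand_bucket=10, time_bucket=60):
    """Illustrative state abstraction: discretise a few environment features
    into a hashable tuple that can serve as a tabular Q-learning state key."""
    total_quantity = sum(d.quantity for d in remaining_demands)  # assumed attribute
    return (
        len(remaining_demands) // demand_bucket,   # coarse count of open demands
        total_quantity // demand_bucket,           # coarse total remaining quantity
        len(available_vehicles),                   # vehicles still idle
        int(current_time) // time_bucket,          # coarse time of day
    )
```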
Beyond the points mentioned above, there are plenty of directions this project could move in. I hope you can learn something useful from this repo :) For detailed information, please check out the Wiki pages: Wiki
- The 2 MS Visio files (.vsdx) are flow charts of the BWACO and QL algorithms.
- The 4 MS Excel files (.xlsx) are instance data.
- The 2 CSV files (.csv) are initial preference settings for relevant instances.
- QL_BWACO.py contains the Python code for the algorithms.
- main.py is the main Python file to run.
- Set up the experimental instance and nurse resources at the beginning of "main".
- Run 'main.py'.
- After training, new files appear in the root folder (solution files with extensions such as .png and .xls); these are the results.
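The setup step amounts to editing a few values near the top of main.py before launching it; the variable names below are purely illustrative assumptions rather than the actual names used in this repo.

```python
# Hypothetical example of the configuration edited before running main.py.
INSTANCE_FILE = "instance.xlsx"       # one of the provided .xlsx instances
PREFERENCE_FILE = "preferences.csv"   # matching initial preference settings
NUM_NURSES = 10                       # available nurse resources
NUM_EPISODES = 5000                   # Q-learning training episodes
```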