# ICAPS24 SkDecide Tutorial: solving PDDL problems with classical planning and reinforcement learning solvers

Alexandre Arnold, Guillaume Povéda, Florent Teichteil-Königsbuch

Credits to [IMACS](https://imacs.polytechnique.fr/) and especially to Nolwen Huet

This notebook shows how to solve PDDL problems in scikit-decide via the great [Unified Planning](https://unified-planning.readthedocs.io/en/latest/) framework and its third-party engines from the [AIPlan4EU](https://github.com/aiplan4eu) project. We also demonstrate how to call scikit-decide solvers from Unified Planning, which allows solving PDDL problems with the simulation-based solvers embedded in scikit-decide.

## Prerequisites

Concerning the Python kernel to use for this notebook:
- If running locally, be sure to use an environment with
  - `scikit-decide[all]`,
  - `folium` (graph rendering over Earth maps),
  - `up-skdecide` (bridge between unified-planning and scikit-decide, see below) from github [repo](https://github.com/aiplan4eu/up-skdecide.git).
- If running on colab, the next cell does it for you.
- If running on binder, the environment should be ready.

In [None]:
# On Colab: install the library
on_colab = "google.colab" in str(get_ipython())
if on_colab:
    import glob
    import json
    import os
    import sys

    using_nightly_version = True

    if using_nightly_version:
        # look for nightly build download url
        release_curl_res = !curl -L   -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/airbus/scikit-decide/releases/tags/nightly
        release_dict = json.loads(release_curl_res.s)
        release_download_url = sorted(
            release_dict["assets"], key=lambda d: d["updated_at"]
        )[-1]["browser_download_url"]
        print(release_download_url)

        # download and unzip
        !wget --output-document=release.zip {release_download_url}
        !unzip -o release.zip

        # get proper wheel name according to python version used
        wheel_pythonversion_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
        wheel_path = glob.glob(
            f"dist/scikit_decide*{wheel_pythonversion_tag}*manylinux*.whl"
        )[0]

        skdecide_pip_spec = f"{wheel_path}[all]"
    else:
        skdecide_pip_spec = "scikit-decide[all]"

    # install scikit-decide with all extras + folium + up-skdecide
    !pip install {skdecide_pip_spec} folium git+https://github.com/aiplan4eu/up-skdecide.git

    # download utility modules (that are in the same repo)
    if not os.path.exists("flight_planning_utils.py"):
        !wget https://raw.githubusercontent.com/airbus/scikit-decide/master/notebooks/icaps24/flight_planning_utils.py

We import the packages that will be used in this notebook.

In [None]:
import datetime
import os
import sys

import unified_planning as up
from openap.extra.aero import cas2mach, ft, kts
from ray.rllib.algorithms.dqn import DQN
from unified_planning.environment import get_environment
from unified_planning.io import PDDLReader
from unified_planning.shortcuts import (
    GE,
    BoolType,
    Fluent,
    InstantaneousAction,
    Int,
    IntType,
    Object,
    OneshotPlanner,
    Problem,
    SimulatedEffect,
    UserType,
)

from skdecide.hub.domain.flight_planning import (
    AircraftState,
    FlightPlanningDomain,
    H_Action,
    PerformanceModelEnum,
    PhaseEnum,
    RatingEnum,
    V_Action,
    WeatherDate,
)
from skdecide.hub.domain.flight_planning.flightplanning_utils import (
    plot_network_adapted,
)
from skdecide.hub.domain.up import UPDomain
from skdecide.hub.solver.iw import IW
from skdecide.hub.solver.ray_rllib import RayRLlib
from skdecide.hub.solver.up import UPSolver
from skdecide.utils import rollout

## Solving PDDL problems via the scikit-decide bridge to Unified Planning solvers

For the purpose of demonstration, we show how to solve a simplistic `blocksworld` instance with 4 blocks. Since we are relying on PDDL engines from Unified Planning (e.g. `fast-downward`, `ENHSP`, `tamer`, etc.), you are free to try more challenging benchmarks!

In [None]:
if not os.path.exists("bw-domain.pddl"):
    !wget https://raw.githubusercontent.com/potassco/pddl-instances/master/ipc-2000/domains/blocks-strips-typed/domain.pddl
    !mv domain.pddl bw-domain.pddl

if not os.path.exists("bw-instance.pddl"):
    !wget https://raw.githubusercontent.com/potassco/pddl-instances/master/ipc-2000/domains/blocks-strips-typed/instances/instance-1.pddl
    !mv instance-1.pddl bw-instance.pddl

reader = PDDLReader()
up_problem = reader.parse_problem("bw-domain.pddl", "bw-instance.pddl")
up_problem.add_quality_metric(up.model.metrics.MinimizeSequentialPlanLength())

We now create a `skdecide.hub.domain.UPDomain` which embeds a Unified Planning [problem](https://unified-planning.readthedocs.io/en/latest/problem_representation.html#).

In [None]:
domain_factory = lambda: UPDomain(up_problem)
domain = domain_factory()

Once the `UPDomain` is created, we can call the `skdecide.hub.solver.UPSolver`, which forwards the solving process to a Unified Planning engine and then casts the resulting plan back into the scikit-decide action format defined by `skdecide.hub.domain.UPDomain`.

Here we specifically call the `fast-downward` [engine](https://github.com/aiplan4eu/up-fast-downward), after which we execute the resulting plan with `skdecide.utils.rollout()`.

In [None]:
assert UPSolver.check_domain(domain)
with UPSolver(
    domain_factory=domain_factory,
    operation_mode=OneshotPlanner,
    name="fast-downward",
    engine_params={"output_stream": sys.stdout},
) as solver:
    solver.solve()
    rollout(
        domain,
        solver,
        num_episodes=1,
        max_steps=100,
        max_framerate=30,
        outcome_formatter=None,
    )

However, thanks to the unified API of scikit-decide, we can also call any of scikit-decide's native planners that are compatible with the features of `UPDomain`, even though these planners were not specifically designed for PDDL problems!

Looking more closely at `UPDomain`'s characteristics, we see that it inherits from `DeterministicPlanningDomain`, which is itself a shortcut for the following features: `Domain`, `SingleAgent`, `Sequential`, `DeterministicTransitions`, `Actions`, `Goals`, `DeterministicInitialized`, `Markovian`, `FullyObservable`, and `PositiveCosts`.

In particular, scikit-decide's implementation of the [Iterated Width](https://dl.acm.org/doi/10.5555/3007337.3007433) planner is compatible with these characteristics. To compute Iterated Width's novelty measures, we must provide the state features as vectors. To do so, we pass the parameter `state_encoding="vector"` to the `UPDomain` constructor. The state feature vector used by Iterated Width is then simply the state vector itself (hence `state_features=lambda d, s: s`).
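To give an intuition of what the novelty measure does with vector states, here is a minimal, pure-Python sketch of the width-1 pruning rule (IW(1)): a breadth-first search that only keeps a generated state if it makes at least one (feature index, feature value) atom true for the first time. This is not scikit-decide's `IW`, which iterates over increasing widths and uses its own node ordering; the function `iw1` and the toy one-hot "robot on a line" domain are hypothetical illustrations.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

State = Tuple[int, ...]


def iw1(
    initial: State,
    successors: Callable[[State], Iterable[Tuple[Hashable, State]]],
    is_goal: Callable[[State], bool],
) -> Optional[List[Hashable]]:
    """Hypothetical sketch: breadth-first search pruning states of novelty > 1.

    A generated state has novelty 1 if at least one of its atoms, i.e. a
    (feature index, feature value) pair, was never seen before. As with
    `state_features=lambda d, s: s`, the feature vector is the state itself.
    """
    seen_atoms = set(enumerate(initial))
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, next_state in successors(state):
            new_atoms = set(enumerate(next_state)) - seen_atoms
            if not new_atoms:
                continue  # novelty > 1: the state is pruned
            seen_atoms |= new_atoms
            queue.append((next_state, plan + [action]))
    return None


# Toy example: a robot on a line of 5 cells, encoded as a one-hot position vector.
N = 5


def one_hot(position: int) -> State:
    return tuple(int(i == position) for i in range(N))


def successors(state: State) -> Iterable[Tuple[str, State]]:
    position = state.index(1)
    if position < N - 1:
        yield "right", one_hot(position + 1)
    if position > 0:
        yield "left", one_hot(position - 1)


plan = iw1(one_hot(0), successors, lambda s: s[-1] == 1)
print(plan)  # ['right', 'right', 'right', 'right']
```

Going back "left" is always pruned here because it only reproduces atoms that were already seen, which is how novelty keeps the search small; a richer feature function would change which states count as novel.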

In [None]:
domain_factory = lambda: UPDomain(up_problem, state_encoding="vector")
domain = domain_factory()

with IW(
    domain_factory=domain_factory,
    state_features=lambda d, s: s,
    node_ordering=lambda a_gscore, a_novelty, a_depth, b_gscore, b_novelty, b_depth: a_novelty
    > b_novelty,
) as solver:
    solver.solve()
    rollout(
        domain,
        solver,
        num_episodes=1,
        max_steps=100,
        max_framerate=30,
        outcome_formatter=None,
    )

## Using scikit-decide solvers from Unified Planning

The library [`up-skdecide`](https://github.com/aiplan4eu/up-skdecide) from AIPlan4EU's GitHub project provides a Unified Planning engine which converts a Unified Planning problem into a `skdecide.hub.domain.UPDomain`, then forwards the solving process to a compatible scikit-decide solver.

In the following, we define a robot moving problem with *simulated action effects*, which PDDL solvers typically struggle to handle. Scikit-decide solvers such as Reinforcement Learning solvers or Iterated Width are not specific to PDDL logic and are thus generally (much) less efficient than PDDL-specific solvers, but they naturally handle simulated action effects.

In the example below, we simulate the battery discharge of the robot when it moves. In real problems, such a discharge usually results from a complex underlying physics simulation that cannot easily be modeled in basic PDDL.
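Why are simulated effects easy for simulation-based solvers? Such a solver only needs to *call* the effect function to obtain the successor state; it never reasons about the function symbolically. The following pure-Python sketch illustrates this on the same robot chain (10 locations, 10 battery units drained per move, moves forbidden below 10 units). The names `apply_move` and `simulated_discharge` are hypothetical and this is not Unified Planning's API.

```python
from typing import FrozenSet, Tuple

# Hypothetical pure-Python stand-in for the robot problem (not Unified Planning's API).
State = Tuple[int, int]  # (robot location index, battery charge)


def simulated_discharge(battery: int) -> int:
    # Black-box "simulation": the solver just calls it; here a move drains 10 units.
    return battery - 10


def apply_move(state: State, l_from: int, l_to: int, links: FrozenSet[Tuple[int, int]]) -> State:
    location, battery = state
    # Symbolic preconditions: connected(l_from, l_to), robot_at(l_from), battery >= 10.
    if (l_from, l_to) not in links or location != l_from or battery < 10:
        raise ValueError(f"move({l_from}, {l_to}) is not applicable in {state}")
    # Symbolic effect on the location, simulated effect on the battery.
    return (l_to, simulated_discharge(battery))


NLOC = 10
links = frozenset((i, i + 1) for i in range(NLOC - 1))
state: State = (0, 100)
for i in range(NLOC - 1):
    state = apply_move(state, i, i + 1, links)
print(state)  # (9, 10)
```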

In [None]:
Location = UserType("Location")
robot_at = up.model.Fluent("robot_at", BoolType(), l=Location)
battery_charge = Fluent("battery_charge", IntType(0, 100))
connected = up.model.Fluent("connected", BoolType(), l_from=Location, l_to=Location)

move = up.model.InstantaneousAction("move", l_from=Location, l_to=Location)
l_from = move.parameter("l_from")
l_to = move.parameter("l_to")
move.add_precondition(connected(l_from, l_to))
move.add_precondition(robot_at(l_from))
move.add_precondition(GE(battery_charge(), 10))
move.add_effect(robot_at(l_from), False)
move.add_effect(robot_at(l_to), True)


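def fun(problem, state, actual_params):
    # Simulated effect of `move`: read the current battery charge and drain 10 units.
    # In a real problem, this would typically call a physics simulator.
    value = state.get_value(battery_charge()).constant_value()
    return [Int(value - 10)]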


move.set_simulated_effect(SimulatedEffect([battery_charge()], fun))

problem = up.model.Problem("robot")
problem.add_fluent(robot_at, default_initial_value=False)
problem.add_fluent(connected, default_initial_value=False)
problem.add_action(move)

NLOC = 10
locations = [up.model.Object("l%s" % i, Location) for i in range(NLOC)]
problem.add_objects(locations)

problem.set_initial_value(robot_at(locations[0]), True)
for i in range(NLOC - 1):
    problem.set_initial_value(connected(locations[i], locations[i + 1]), True)
problem.set_initial_value(battery_charge(), 100)

problem.add_goal(robot_at(locations[-1]))

problem.add_quality_metric(up.model.metrics.MinimizeActionCosts({move: 1}))

Now we call scikit-decide's implementation of Iterated Width on this problem, following Unified Planning's standard engine invocation process. The parameters for `skdecide.hub.solver.IW`, in particular the state encoding required to compute the novelty measure, are passed in the `config` field of the `params` dictionary given to the `OneshotPlanner`.

In [None]:
get_environment().factory.add_engine("skdecide", "up_skdecide.engine", "EngineImpl")

with OneshotPlanner(
    problem_kind=problem.kind,
    name="skdecide",
    params={
        "solver": IW,
        "config": {"state_encoding": "vector", "state_features": lambda d, s: s},
    },
) as planner:
    result = planner.solve(problem)
    print("%s returned: %s" % (planner.name, result.plan))

We show below that solving the same Unified Planning problem with RLlib's DQN algorithm essentially only requires changing the `solver` entry of `params` (and its `config`).

<div class="alert alert-block alert-info"><b>Note: </b> Scikit-decide's implementation of `skdecide.hub.solver.RayRLlib` automatically manages action filtering in the deep value and policy networks passed to the underlying RLlib solver. This means that Unified Planning (PDDL) action preconditions are processed in the background by scikit-decide to automatically provide filtered actions to RLlib's deep networks, which is usually much more efficient than filtering those actions by means of high penalty costs on infeasible actions. This automatic action filtering is currently only available with `skdecide.hub.solver.ray_rllib.RayRLlib`, not yet with `skdecide.hub.solver.stable_baselines.StableBaseline`. </div>
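The general idea behind action filtering (often called action masking) can be sketched in a few lines of pure Python, assuming the network outputs one Q-value per action and the domain provides a boolean applicability vector derived from the preconditions. This is a hypothetical illustration of the principle, not RayRLlib's or RLlib's actual implementation.

```python
import math
from typing import Sequence


def masked_greedy_action(q_values: Sequence[float], applicable: Sequence[bool]) -> int:
    """Hypothetical sketch: index of the highest-valued action among applicable ones."""
    if not any(applicable):
        raise ValueError("no applicable action")
    # Infeasible actions get -inf, so they can never win the argmax,
    # whatever value the network predicts for them.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, applicable)]
    return max(range(len(masked)), key=masked.__getitem__)


q_values = [5.0, 1.0, 3.0]        # raw network outputs, one per action
applicable = [False, True, True]  # e.g. computed from the PDDL preconditions
print(masked_greedy_action(q_values, applicable))  # 2 (action 0 is masked out)
```

With a penalty-based approach instead, the network would first have to *learn* that action 0 is bad from costly experience, whereas masking rules it out by construction.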

In [None]:
with OneshotPlanner(
    problem_kind=problem.kind,
    name="skdecide",
    params={
        "solver": RayRLlib,
        "config": {
            "state_encoding": "vector",
            "action_encoding": "int",
            "algo_class": DQN,
            "train_iterations": 1,
        },
    },
) as planner:
    result = planner.solve(problem)
    print("%s returned: %s" % (planner.name, result.plan))

## Solving a flight planning problem modeled in numeric PDDL

Our final experiment with PDDL planning in scikit-decide consists of solving a simplified flight planning problem over a waypoint graph subject to wind drift.

The folium package, installed with the prerequisites, brings nice graph rendering over Earth maps.

We then import map plotting and cost computation functions from the flight planning utils script.

In [None]:
from flight_planning_utils import cost, plot_map

Computing the transition cost between 2 waypoints, which represents the distance flown in the air mass, requires some trigonometry in the Earth's spherical coordinate system and its projection onto the aircraft's local tangent plane, as depicted in the following image:

![Flight planning with wind](./images/flight_planning_with_wind.png)

It begins with the computation of the coordinates of the direction vector, i.e. the vector linking two successive waypoints, by using [trigonometric formulas](https://en.wikipedia.org/wiki/Local_tangent_plane_coordinates) on the Earth sphere.
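A minimal sketch of this first step, assuming a perfectly spherical Earth (radius 6371 km) and the standard ECEF-to-ENU (east, north, up) rotation described on the linked page. The `cost` function imported from `flight_planning_utils` may use a different Earth model; the function names and the approximate airport coordinates are illustrative only.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # assumption: perfectly spherical Earth


def to_ecef(lat_deg: float, lon_deg: float, radius: float = EARTH_RADIUS_M) -> Tuple[float, float, float]:
    """Earth-centered, Earth-fixed coordinates of a point on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        radius * math.cos(lat) * math.cos(lon),
        radius * math.cos(lat) * math.sin(lon),
        radius * math.sin(lat),
    )


def direction_enu(lat0: float, lon0: float, lat1: float, lon1: float) -> Tuple[float, float, float]:
    """Hypothetical helper: (east, north, up) of waypoint 1 in the tangent plane at waypoint 0."""
    x0, y0, z0 = to_ecef(lat0, lon0)
    x1, y1, z1 = to_ecef(lat1, lon1)
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    phi, lam = math.radians(lat0), math.radians(lon0)
    east = -math.sin(lam) * dx + math.cos(lam) * dy
    north = (
        -math.sin(phi) * math.cos(lam) * dx
        - math.sin(phi) * math.sin(lam) * dy
        + math.cos(phi) * dz
    )
    upward = (
        math.cos(phi) * math.cos(lam) * dx
        + math.cos(phi) * math.sin(lam) * dy
        + math.sin(phi) * dz
    )
    return east, north, upward


# Approximate coordinates of LFPG (Paris-CDG) and LFBO (Toulouse-Blagnac).
east, north, upward = direction_enu(49.01, 2.55, 43.63, 1.36)
print(east < 0, north < 0)  # True True: the direction points south, slightly west
```

The horizontal components `(east, north)` play the role of the direction vector $\mathbf{D}$ in the tangent plane; the `upward` component is negative because the straight chord between two points dips below the tangent plane.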

We use the following notation:
- $\mathbf{W}$ the wind speed vector
- $\mathbf{V}$ the true aircraft speed vector in the air
- $\mathbf{D}$ the direction vector (obtained with the trigonometric formulas above)
- $\mathbf{U}$ the projected speed of the aircraft on the direction vector
- $\mathbf{u}=\frac{\mathbf{U}}{\Vert \mathbf{U} \Vert} = \frac{\mathbf{D}}{\Vert \mathbf{D} \Vert}$ the unit direction vector

We know $\mathbf{D}$, $\mathbf{W}$ and $\Vert \mathbf{V} \Vert$, but we do not know $\mathbf{V}$.

We have: $\mathbf{V} = \mathbf{U} - \mathbf{W}$

Thus: $\Vert \mathbf{V} \Vert^2 = \Vert \mathbf{U} \Vert \; \mathbf{u} \cdot \mathbf{V} - \mathbf{W} \cdot \mathbf{V}$

But also: $\mathbf{V} \cdot \mathbf{u} = \Vert \mathbf{U} \Vert - \mathbf{W} \cdot \mathbf{u}$

As well as: $\mathbf{V} \cdot \mathbf{W} = \Vert \mathbf{U} \Vert \; \mathbf{u} \cdot \mathbf{W} - \Vert \mathbf{W} \Vert^2$

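Therefore, substituting the last two expressions into the first one: $\Vert \mathbf{U} \Vert^2 - 2 \; \mathbf{u} \cdot \mathbf{W} \; \Vert \mathbf{U} \Vert + \Vert \mathbf{W} \Vert^2 - \Vert \mathbf{V} \Vert^2 = 0$

Finally, keeping the larger (physically meaningful) root of this quadratic equation in $\Vert \mathbf{U} \Vert$: $\Vert \mathbf{U} \Vert = \mathbf{W} \cdot \mathbf{u} + \sqrt{(\mathbf{W} \cdot \mathbf{u})^2 + \Vert \mathbf{V} \Vert^2 - \Vert \mathbf{W} \Vert^2}$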
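The closed-form ground speed translates directly into code. The sketch below assumes 2D vectors expressed in the local tangent plane as (east, north) and consistent speed units (e.g. knots); the function name `ground_speed` is hypothetical and this is not necessarily how the imported `cost` function is written.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]  # (east, north) components in the local tangent plane


def ground_speed(wind: Vec, direction: Vec, airspeed: float) -> float:
    """Hypothetical sketch: ||U|| = W.u + sqrt((W.u)^2 + ||V||^2 - ||W||^2), u = D / ||D||."""
    norm_d = math.hypot(direction[0], direction[1])
    u = (direction[0] / norm_d, direction[1] / norm_d)
    w_dot_u = wind[0] * u[0] + wind[1] * u[1]
    discriminant = w_dot_u**2 + airspeed**2 - (wind[0] ** 2 + wind[1] ** 2)
    if discriminant < 0:
        raise ValueError("crosswind stronger than the airspeed: direction not flyable")
    speed = w_dot_u + math.sqrt(discriminant)  # larger root of the quadratic
    if speed <= 0:
        raise ValueError("headwind stronger than the airspeed: no progress possible")
    return speed


north = (0.0, 1.0)  # direction vector D pointing north (any length works)
print(ground_speed((0.0, 0.0), north, 250.0))    # 250.0 (no wind)
print(ground_speed((0.0, 50.0), north, 250.0))   # 300.0 (pure tailwind)
print(ground_speed((0.0, -50.0), north, 250.0))  # 200.0 (pure headwind)
```

The two error checks correspond to the cases where the quadratic has no real root (crosswind stronger than the airspeed) or where its larger root is not positive.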

Now, if we denote by $t$ the flying time between the 2 successive waypoints, we can compute the distance flown in the air, i.e. in the direction of $\mathbf{V}$, as: $\Vert \mathbf{V} \Vert \times t = \Vert \mathbf{V} \Vert \times \frac{\Vert \mathbf{D} \Vert}{\Vert \mathbf{U} \Vert} = \frac{\Vert \mathbf{V} \Vert}{\Vert \mathbf{U} \Vert} \Vert \mathbf{D} \Vert$

With a headwind, $\Vert \mathbf{U} \Vert < \Vert \mathbf{V} \Vert$, so the distance flown in the air is greater than the direct distance $\Vert \mathbf{D} \Vert$. With a tailwind along the route, it is the opposite.
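As a quick numerical check of these formulas, take hypothetical values: $\Vert \mathbf{V} \Vert = 250$ kt, $\Vert \mathbf{D} \Vert = 100$ NM, and a pure wind of 50 kt along the route, so that $\Vert \mathbf{W} \Vert^2 = 50^2$ and $\mathbf{W} \cdot \mathbf{u} = \mp 50$ (headwind / tailwind):

$$
\begin{aligned}
\text{headwind:}\quad \Vert \mathbf{U} \Vert &= -50 + \sqrt{(-50)^2 + 250^2 - 50^2} = 200 \text{ kt}, & \frac{250}{200} \times 100 &= 125 \text{ NM} > 100 \text{ NM},\\
\text{tailwind:}\quad \Vert \mathbf{U} \Vert &= 50 + \sqrt{50^2 + 250^2 - 50^2} = 300 \text{ kt}, & \frac{250}{300} \times 100 &\approx 83.3 \text{ NM} < 100 \text{ NM}.
\end{aligned}
$$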

This is exactly what the imported `cost` function computes.

We are now ready to model the flight planning numeric problem.
In this simplified version, it is a classical planning problem with floating-point action costs.
We could solve it with the ENHSP planner, but this would require installing Java. For simplicity, when building the problem instance later on, we instead round all floating-point costs to 3 decimal places and scale them by 1e3 so that they all become integers. The problem is then solvable by the `fast-downward-opt` Unified Planning engine, which is why we can define the type of the `Cost` fluent to be `IntType`.
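A minimal sketch of this float-to-integer cost conversion, using a hypothetical helper `to_int_cost`. It multiplies *before* rounding: with IEEE-754 doubles, `round(1.001, 3) * 1e3` is slightly below 1001, so truncating it with `int()` would give 1000 instead of 1001.

```python
def to_int_cost(cost: float, decimals: int = 3) -> int:
    """Hypothetical helper: round a float cost to `decimals` digits, scaled to an integer.

    Scaling first, then rounding to the nearest integer, avoids the float
    truncation error of int(round(cost, decimals) * 10**decimals).
    """
    return int(round(cost * 10**decimals))


print(to_int_cost(1.001))    # 1001
print(to_int_cost(12.3456))  # 12346
```

Note that Python's built-in `round()` uses round-half-to-even on exact ties, which is harmless here since the goal is only a consistent integer approximation of the costs.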

In [None]:
problem = Problem("flight_planning")

# Objects
waypoint = UserType("waypoint")

# Fluents
Cost = Fluent("COST", IntType(), l_from=waypoint, l_to=waypoint)
Connected = Fluent("CONNECTED", BoolType(), l_from=waypoint, l_to=waypoint)
at = Fluent("at", BoolType(), w=waypoint)

problem.add_fluent(Cost, default_initial_value=1000000)
problem.add_fluent(Connected, default_initial_value=False)
problem.add_fluent(at, default_initial_value=False)

# Actions
GoTo = InstantaneousAction("goto", fromwp=waypoint, towp=waypoint)
fromwp = GoTo.parameter("fromwp")
towp = GoTo.parameter("towp")
GoTo.add_precondition(Connected(fromwp, towp))
GoTo.add_precondition(at(fromwp))
GoTo.add_effect(at(towp), True)
GoTo.add_effect(at(fromwp), False)

problem.add_action(GoTo)

problem.add_quality_metric(
    up.model.metrics.MinimizeActionCosts({GoTo: Cost(fromwp, towp)})
)

To create the actual flight planning problem instance, we will leverage the `skdecide.hub.domain.flight_planning.FlightPlanningDomain`. This domain is much more realistic, but also far more complex, than our simplified PDDL domain: it uses an aircraft performance model to compute the real fuel consumption of the aircraft based on its speed, altitude and mass at each waypoint of the graph. Even though we won't solve this more realistic domain (this is a PDDL tutorial notebook, after all!), we will still use its capability to extract the waypoint graph and actual weather data (for the first day of January, May or September of the current year, as computed in the next cell).

In [None]:
origin = "LFPG"
destination = "LFBO"
aircraft = "A320"
today = datetime.date.today()
month = (today.month - 1) // 4 * 4 + 1  # january, may or september, never after today
year = today.year
day = 1
weather_date = WeatherDate(day=day, month=month, year=year)
heuristic = "lazy_fuel"
cost_function = "fuel"
aircraft_state = AircraftState(
    model_type="A320",  # only for OPENAP and POLL_SCHUMANN
    performance_model_type=PerformanceModelEnum.POLL_SCHUMANN,  # PerformanceModelEnum.OPENAP
    gw_kg=80_000,
    zp_ft=10_000,
    mach=cas2mach(250 * kts, h=10_000 * ft),
    phase=PhaseEnum.CLIMB,
    rating_level=RatingEnum.MCL,
    cg=0.3,
)

realistic_fp_domain = FlightPlanningDomain(
    origin=origin,
    destination=destination,
    aircraft_state=aircraft_state,
    weather_date=weather_date,
    heuristic_name=heuristic,
    objective=cost_function,
    fuel_loop=False,
    graph_width="large",
    nb_lateral_points=6,
    nb_forward_points=10,
    nb_climb_descent_steps=4,
)

Let us have a look at the generated waypoints graph.

In [None]:
plot_network_adapted(
    graph=realistic_fp_domain.network,
    p0=realistic_fp_domain.origin,
    p1=realistic_fp_domain.destination,
)

In [None]:
G = realistic_fp_domain.network

# actual starting point
origin_node = [(x, y, z) for (x, y, z) in G.nodes if x == 0][0]
# one of the 5 initial headings can be chosen: 5 starting nodes for the PDDL domain <-> 5 children of the starting point
start_nodes = [(x, y, z) for (x, y, z) in G.nodes if x == 1]
end_nodes = [
    (x, y, z) for (x, y, z) in G.nodes if x == realistic_fp_domain.nb_forward_points - 1
]
destination_node = [
    (x, y, z) for (x, y, z) in G.nodes if x == realistic_fp_domain.nb_forward_points
][0]

locations = {str(node): Object(str(node), waypoint) for node in G.nodes}
problem.add_objects(locations.values())


problem.set_initial_value(at(locations[str(origin_node)]), True)
problem.add_goal(at(locations[str(destination_node)]))

for (f, t) in G.edges:
    problem.set_initial_value(Connected(locations[str(f)], locations[str(t)]), True)
    c = cost(realistic_fp_domain, f, t)
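    # scale to integer thousandths *before* rounding, so that int() cannot
    # truncate a value like 1000.9999... down to 1000
    problem.set_initial_value(
        Cost(locations[str(f)], locations[str(t)]), int(round(c * 1e3))
    )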
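The cell above builds the instance through Unified Planning's Python API only. To make the correspondence with numeric PDDL concrete, here is a stdlib-only sketch printing roughly what such an instance looks like, assuming the domain's `goto` action increases `(total-cost)` by `(COST ?fromwp ?towp)`. Unified Planning's own PDDL writer would use its own naming and cost encoding; the naming scheme `wp_x_y_z`, the problem name and the toy graph are hypothetical.

```python
from typing import Dict, Tuple

Node = Tuple[int, int, int]


def pddl_name(node: Node) -> str:
    # Hypothetical naming scheme: "(1, 2, 0)" is not a valid PDDL name, so use "wp_1_2_0".
    return "wp_" + "_".join(str(c) for c in node)


def pddl_problem(edges: Dict[Tuple[Node, Node], int], origin: Node, destination: Node) -> str:
    """Hypothetical sketch: numeric PDDL problem text mirroring the CONNECTED / COST / at fluents."""
    nodes = sorted({n for edge in edges for n in edge} | {origin, destination})
    lines = [
        "(define (problem flight_planning_instance)",
        "  (:domain flight_planning)",
        "  (:objects " + " ".join(pddl_name(n) for n in nodes) + " - waypoint)",
        "  (:init",
        f"    (at {pddl_name(origin)})",
    ]
    for (f, t), cost in sorted(edges.items()):
        lines.append(f"    (CONNECTED {pddl_name(f)} {pddl_name(t)})")
        lines.append(f"    (= (COST {pddl_name(f)} {pddl_name(t)}) {cost})")
    lines += [
        "    (= (total-cost) 0))",
        f"  (:goal (at {pddl_name(destination)}))",
        "  (:metric minimize (total-cost)))",
    ]
    return "\n".join(lines)


# Hypothetical 3-node graph with integer (scaled) costs.
toy_edges = {((0, 0, 0), (1, 0, 0)): 1234, ((1, 0, 0), (2, 0, 0)): 567}
print(pddl_problem(toy_edges, origin=(0, 0, 0), destination=(2, 0, 0)))
```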

We can now solve the flight planning problem by defining the `UPDomain` embedding our flight planning Unified Planning problem, and calling the `fast-downward-opt` engine from the `UPSolver`.
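In this simplified model, a cost-optimal plan is just a minimum-cost path from the origin node to the destination node of the waypoint graph, so an optimal engine such as `fast-downward-opt` should return a plan whose total cost matches Dijkstra's algorithm on the same integer edge costs. Below is a minimal stdlib sketch that could serve as a sanity check; the function `dijkstra` and the toy graph are hypothetical.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical adjacency lists: node -> [(successor, integer cost), ...]
Graph = Dict[Hashable, List[Tuple[Hashable, int]]]


def dijkstra(graph: Graph, source: Hashable, target: Hashable) -> Optional[Tuple[int, List[Hashable]]]:
    """Hypothetical sketch: minimum total cost and node path, or None if unreachable."""
    best = {source: 0}
    parent: Dict[Hashable, Hashable] = {}
    heap: List[Tuple[int, int, Hashable]] = [(0, 0, source)]
    counter = 1  # tie-breaker so that heapq never has to compare nodes
    done = set()
    while heap:
        cost, _, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        for succ, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(succ, float("inf")):
                best[succ] = new_cost
                parent[succ] = node
                heapq.heappush(heap, (new_cost, counter, succ))
                counter += 1
    return None


toy_graph: Graph = {
    "O": [("A", 5), ("B", 2)],
    "B": [("A", 1), ("D", 10)],
    "A": [("D", 4)],
}
print(dijkstra(toy_graph, "O", "D"))  # (7, ['O', 'B', 'A', 'D'])
```

With ties between equally cheap paths, the planner and this sketch may return different waypoint sequences with the same total cost, so comparing costs is the more robust check.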

In [None]:
domain_factory = lambda: UPDomain(problem)
with UPSolver(
    domain_factory=domain_factory,
    operation_mode=OneshotPlanner,
    name="fast-downward-opt",
    engine_params={"output_stream": sys.stdout},
) as solver:
    print("Solving the problem...")
    solver.solve()
    print("Extracting plan...")
    plan = solver.get_plan()

In [None]:
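froms = []
tos = []
actions = []

# map the UP object names back to the graph nodes (safer than calling eval on them)
node_by_name = {str(node): node for node in G.nodes}

for ai in plan:
    # from->to
    fr = node_by_name[str(ai.up_parameters[0])]
    to = node_by_name[str(ai.up_parameters[1])]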
    # horizontal action
    y_diff = to[1] - fr[1]
    if y_diff == 0:
        a1 = H_Action.straight
    elif y_diff < 0:
        a1 = H_Action.left
    else:
        a1 = H_Action.right
    # vertical action
    z_diff = to[2] - fr[2]
    if z_diff == 0:
        a2 = V_Action.cruise
    elif z_diff < 0:
        a2 = V_Action.descent
    else:
        a2 = V_Action.climb
    # store
    froms.append(fr)
    tos.append(to)
    actions.append((a1, a2))

path = froms + [tos[-1]]
plot_map(path, G, realistic_fp_domain)

Finally, to know the real fuel consumption of the plan found by Fast Downward, we just have to execute the resulting plan in the realistic `skdecide.hub.domain.flight_planning.FlightPlanningDomain` provided with scikit-decide.

In [None]:
consumed_fuel = 0
realistic_fp_domain.reset()
for ai in actions:
    print(ai)
    outcome = realistic_fp_domain.step(ai)
    consumed_fuel += outcome.value.cost
    print(outcome.value.cost)
print(f"Consumed fuel: {consumed_fuel}")

Please note that more realistic flight plans are typically found by running the `skdecide.hub.solver.astar.Astar` solver on the `skdecide.hub.domain.flight_planning.FlightPlanningDomain`, together with advanced domain decoupling strategies and custom heuristic estimates.