classification

- static
- deterministic
- fully observable
- discrete
- single agent
- general

# Automated planning

Automated planning is a way of finding plans (sequences of actions) that lead from an initial state to a goal state.
Here we look at classical planning:
- General approach to finding solutions for state space search problems
- Classical = static, deterministic, fully observable
- Variants: probabilistic planning, planning under partial observability, online planning, etc.

Given
- A state space description in terms of suitable problem description language (planning formalism)

Required
- A plan, i.e. a solution for the described state space (sequence of actions from initial state to goal state)
- Or proof that no plan exists

Distinguish between
- Optimal planning: Guarantee that the returned plans are optimal, i.e., have minimal overall cost
- Suboptimal planning (satisficing): Suboptimal plans are allowed

# Four planning formalisms

A description language for state spaces (planning tasks) is called a planning formalism.

# STRIPS

- It is the simplest common planning formalism.
- State variables are binary (true or false)
- States $s$ (based on a given set of state variables $V$) can be represented in two equivalent ways
    - As assignments $s: V \rightarrow \{\mathbf{T}, \mathbf{F}\}$
    - As sets $s \subseteq V$ where $s$ encodes the set of state variables that are true in $s$.
- Goals and preconditions of actions are given as sets of variables that must be true (values of other variables do not matter)
- Effects of actions are given as sets of variables that are set to true and set to false, respectively

## Definition

A STRIPS planning task is a 4-tuple $\prod = \langle V, I, G, A \rangle$ with
- $V$: Finite set of state variables
- $I \subseteq V$: The initial state
- $G \subseteq V$: The set of goals
- $A$: Finite set of actions where for all actions $a \in A$, the following is defined:
    - $pre(a) \subseteq V$: The preconditions of $a$
    - $add(a) \subseteq V$: The add effects of $a$
    - $del(a) \subseteq V$: The delete effects of $a$
    - $cost(a) \in \mathbb{N}_0$: The cost of $a$

Let $\prod = \langle V, I, G, A \rangle$ be a STRIPS planning task. Then $\prod$ induces the state space $\mathcal{S}(\prod) = \langle S, A, cost, T, s_I, S_G \rangle$:
- Set of states $S = 2^V$
- Actions: actions $A$ as defined in $\prod$
- Action costs: $cost$ as defined in $\prod$
- Transitions: $s \xrightarrow[]{a} s'$ for states $s,s' \in S$ and action $a \in A$ iff
    - $pre(a) \subseteq s$ (precondition satisfied)
    - $s' = (s \setminus del(a)) \cup add(a)$ (effects are applied)
- Initial state: $s_I = I$
- Goal states: $s \in S_G$ for state $s$ iff $G \subseteq s$ (goals reached)
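The induced transitions can be illustrated with a minimal Python sketch that uses the set representation of states. The `Action` class, the helper names and the toy robot task are assumptions made for illustration only; they are not part of the formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset  # "del" is a reserved word in Python
    cost: int = 1


def applicable(state: frozenset, action: Action) -> bool:
    # pre(a) must be a subset of s
    return action.pre <= state


def apply(state: frozenset, action: Action) -> frozenset:
    # s' = (s minus del(a)) union add(a)
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable")
    return (state - action.delete) | action.add


def is_goal(state: frozenset, goal: frozenset) -> bool:
    # s is a goal state iff G is a subset of s
    return goal <= state


# Hypothetical toy task: move a robot from room a to room b.
move = Action("move_a_b", frozenset({"at_a"}), frozenset({"at_b"}), frozenset({"at_a"}))
initial = frozenset({"at_a"})
goal = frozenset({"at_b"})
successor = apply(initial, move)
```

Deletes are applied before adds, so a variable that appears in both $del(a)$ and $add(a)$ ends up true, exactly as in the transition definition.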

# ADL

Like STRIPS, ADL uses propositional variables (true / false) as state variables. Preconditions of actions and goals are arbitrary logical formulas (an action is applicable / the goal is reached in states that satisfy the formula). In addition to STRIPS effects, there are conditional effects: variable $v$ is only set to true / false if a given logical formula is true in the current state.

# SAS+

Very similar to STRIPS, but state variables are not necessarily binary; each has a given finite domain. States are assignments to these variables. Preconditions and goals are given as partial assignments. Effects are assignments to a subset of the variables.

# PDDL

PDDL is the standard language used in practice to describe planning tasks. Descriptions are given in predicate logic instead of propositional logic. It offers further features such as numeric variables and derived variables (axioms) for defining complex logical conditions (formulas that are automatically evaluated in every state and can, e.g., be used in preconditions). There are defined PDDL fragments for STRIPS and ADL; many planners only support the STRIPS fragment.

# Planning heuristics

A simple STRIPS heuristic is, for example, the number of goals not yet satisfied:

$$
h(s) = |G \setminus s|
$$

The drawback of this heuristic is that it is rather uninformed. If no action $a$ applicable in state $s$ leads to a state that satisfies strictly more (or fewer) goals, then all successor states have the same heuristic value as $s$. The heuristic ignores almost the whole task structure: its values do not depend on the actions at all.
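As a small illustration, the goal-count heuristic is a one-line set difference in Python. The variable names and the example state below are hypothetical.

```python
def goal_count(state: frozenset, goal: frozenset) -> int:
    """Number of goal variables that are not true in the state."""
    return len(goal - state)


goal = frozenset({"door_open", "at_exit"})
state = frozenset({"has_key"})
# Result of a hypothetical action that only adds "at_door":
successor = state | {"at_door"}

h_state = goal_count(state, goal)
h_successor = goal_count(successor, goal)
```

Both states get the same value here, although the successor is plausibly closer to the goal. This is the kind of plateau that makes the heuristic uninformed.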

## Delete Relaxation

In STRIPS planning tasks, good and bad effects are easy to distinguish. Add effects are useful and delete effects are always harmful.

The relaxation $a^+$ of STRIPS action $a$ is the action with
- $pre(a^+) = pre(a)$
- $add(a^+) = add(a)$
- $cost(a^+) = cost(a)$
- $del(a^+) = \emptyset$

The relaxation $\prod^+$ of the STRIPS planning task $\prod = \langle V, I, G, A \rangle$ is the task $\prod^+ = \langle V, I, G, \{a^+ | a \in A\} \rangle$.

STRIPS planning tasks without delete effects are called relaxed planning tasks or delete-free planning tasks. Plans for relaxed planning tasks are called relaxed plans. If $\prod$ is a STRIPS planning task and $\pi^+$ is a plan for $\prod^+$ then $\pi^+$ is called relaxed plan for $\prod$.
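The relaxation is a purely syntactic transformation, which the following sketch tries to make concrete. The `Action` tuple and the toy action are assumptions for illustration.

```python
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset
    cost: int


def relax(action: Action) -> Action:
    # a+ keeps pre, add and cost, and has an empty delete list
    return action._replace(delete=frozenset())


def relax_task(actions):
    # A+ = {a+ | a in A}
    return [relax(a) for a in actions]


def apply(state: frozenset, action: Action) -> frozenset:
    return (state - action.delete) | action.add


move = Action("move_a_b", frozenset({"at_a"}), frozenset({"at_b"}), frozenset({"at_a"}), 1)
state = frozenset({"at_a"})
relaxed_move = relax_task([move])[0]
original_result = apply(state, move)
relaxed_result = apply(state, relaxed_move)
```

In the relaxed task the robot is "in both rooms at once" after moving: states can only grow, which is what makes relaxed tasks easier to reason about.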

## Optimal relaxation heuristic $h^+$

Let $\prod$ be a STRIPS planning task with the relaxation $\prod^+ = \langle V, I, G, A^+ \rangle$. The optimal relaxation heuristic $h^+$ for $\prod$ maps each state $s$ to the cost of an optimal plan for the planning task $\langle V, s, G, A^+ \rangle$.

For general STRIPS planning tasks, $h^+$ is an admissible and consistent heuristic. Delete-free planning tasks are easy to solve suboptimally, but solving them optimally is NP-hard.
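To see why suboptimal relaxed planning is easy, consider the following sketch. It simply applies every relaxed action as soon as it becomes applicable, and each action at most once. The representation of actions as `(name, pre, add)` triples is an assumption for this example; the cost of the returned plan is only an upper bound on $h^+$, not $h^+$ itself.

```python
def greedy_relaxed_plan(state, goal, actions):
    """Return some relaxed plan (list of action names), or None if none exists.

    actions: list of (name, pre, add) triples; delete effects are ignored.
    """
    reached = set(state)
    plan = []
    remaining = list(actions)
    progress = True
    while not goal <= reached and progress:
        progress = False
        for act in list(remaining):
            name, pre, add = act
            if pre <= reached:
                reached |= add
                plan.append(name)
                remaining.remove(act)
                progress = True
                if goal <= reached:
                    return plan
    return plan if goal <= reached else None


# Hypothetical delete-free task.
actions = [
    ("a1", frozenset({"p"}), frozenset({"q"})),
    ("a2", frozenset({"q"}), frozenset({"g"})),
    ("a3", frozenset({"z"}), frozenset({"g"})),
]
plan = greedy_relaxed_plan(frozenset({"p"}), frozenset({"g"}), actions)
no_plan = greedy_relaxed_plan(frozenset({"p"}), frozenset({"w"}), actions)
```

Because reached variables are never lost, every action needs to be applied at most once, so the loop runs in polynomial time.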

# Relaxed planning graphs

Relaxed planning graphs represent which variables in $\prod^+$ can be reached and how. They are graphs with variable layers $V^i$ and action layers $A^i$:
- Variable layer $V^0$ contains the variable vertex $v^0$ for all $v \in I$.
- Action layer $A^{i+1}$ contains the action vertex $a^{i+1}$ for action $a$ if $V^i$ contains the vertex $v^i$ for all $v \in pre(a)$.
- Variable layer $V^{i+1}$ contains the variable vertex $v^{i+1}$ if previous variable layer contains $v^i$ or previous action layer contains $a^{i+1}$ with $v \in add(a)$
- The graph contains a goal vertex $g$ if $v^n \in V^n$ for all $v \in G$, where $n$ is the last layer
- The graph can be constructed for arbitrarily many layers but stabilizes after a bounded number of layers $\rightarrow V^{i+1} = V^i$ and $A^{i+1} = A^i$
- Directed edges
    - from $v^i$ to $a^{i+1}$ if $v \in pre(a)$
    - from $a^i$ to $v^{i}$ if $v \in add(a)$
    - from $v^i$ to $v^{i+1}$
    - from $v^n$ to $g$ if $v \in G$
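The layer construction can be sketched as follows. This version only tracks which variables and actions appear in each layer (not the edges) and stops as soon as the variable layers stabilize; the `(name, pre, add)` action format and the function name are assumptions for illustration.

```python
def build_rpg(state, goal, actions):
    """Build variable and action layers of a relaxed planning graph.

    actions: list of (name, pre, add) triples with frozenset pre/add.
    Returns (var_layers, act_layers, goal_layer), where act_layers[i] is A^{i+1}
    and goal_layer is the first layer index containing all goals (or None).
    """
    var_layers = [frozenset(state)]  # V^0
    act_layers = []
    while True:
        current = var_layers[-1]
        layer_actions = [a for a in actions if a[1] <= current]
        # V^{i+1}: everything in V^i plus all add effects of A^{i+1}
        next_vars = current.union(*(a[2] for a in layer_actions))
        if next_vars == current:  # stabilized, further layers are identical
            break
        act_layers.append([a[0] for a in layer_actions])
        var_layers.append(next_vars)
    goal_layer = next(
        (i for i, layer in enumerate(var_layers) if goal <= layer), None
    )
    return var_layers, act_layers, goal_layer


actions = [
    ("a1", frozenset({"p"}), frozenset({"q"})),
    ("a2", frozenset({"q"}), frozenset({"g"})),
]
var_layers, act_layers, goal_layer = build_rpg(
    frozenset({"p"}), frozenset({"g"}), actions
)
```

If `goal_layer` is `None` after stabilization, the goal is unreachable even in the relaxed task, and therefore also in the original task.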

## Heuristic Values from relaxed planning graph

- function generic-rpg-heuristic($\langle V, I, G, A\rangle, s$):
    - $\prod^+ = \langle V, s, G, A^+ \rangle$
    - for $k \in \{0,1,2,...\}$:
        - rpg = $RPG_k(\prod^+)$
        - if rpg contains a goal node
            - Annotate nodes of rpg
            - if termination criterion is true:
                - return heuristic value from annotations
        - else if graph is stabilized
            - return $\infty$

# Maximum and Additive heuristics

$h^{max}$ and $h^{add}$ are the simplest RPG heuristics. Vertex annotations are numerical values. The vertex values estimate the cost
- to make a given variable true
- to reach and apply a given action
- to reach the goal

**cost of variable vertices**
- 0 in layer 0
- otherwise minimum of the costs of predecessor vertices

**costs of action and goal vertices**
- maximum ($h^{max}$) or sum ($h^{add}$) of predecessor vertex costs; for action vertices $a^i$, also add $cost(a)$.

**termination criterion**
- Stability: terminate if $V^i = V^{i-1}$ and costs of all vertices in $V^i$ equal corresponding vertex costs in $V^{i-1}$

**heuristic value**
- value of goal vertex

**variable vertices**
- choose cheapest way of reaching the variable

**action/goal vertices**
- $h^{max}$ is optimistic: assumption: when reaching the most expensive precondition variable, we can reach the other precondition variables in parallel.
- $h^{add}$ is pessimistic: assumption: all precondition variables must be reached completely independently of each other (hence summation of costs).

**comparison**
- both are safe and goal-aware
- $h^{max}$ is admissible and consistent; $h^{add}$ is neither $\rightarrow h^{add}$ is not suited for optimal planning
- However, $h^{add}$ is usually more informative than $h^{max}$. Greedy best-first-search with $h^{add}$ is a decent algorithm
- Apart from not being admissible, $h^{add}$ often vastly overestimates the actual costs because positive synergies between subgoals are not recognized.
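A compact way to experiment with both heuristics is the sketch below. Instead of annotating the graph layer by layer, it iterates the same cost equations on a single table until nothing changes, which should reach the same values as the stabilized graph. The function name, the `(pre, add, cost)` action format and the toy task are assumptions for illustration.

```python
import math


def rpg_cost_heuristic(state, goal, actions, combine):
    """combine=max gives h^max, combine=sum gives h^add.

    actions: list of (pre, add, cost) triples with frozenset pre/add.
    """
    cost = {v: 0 for v in state}  # variables true in s cost 0
    changed = True
    while changed:
        changed = False
        for pre, add, c in actions:
            if all(v in cost for v in pre):
                # action vertex: cost(a) plus max/sum over preconditions
                action_cost = c + combine([cost[v] for v in pre] or [0])
                for v in add:
                    # variable vertex: minimum over all ways of reaching it
                    if action_cost < cost.get(v, math.inf):
                        cost[v] = action_cost
                        changed = True
    if any(v not in cost for v in goal):
        return math.inf
    # goal vertex: max/sum over goal variables
    return combine([cost[v] for v in goal] or [0])


# Hypothetical task: g needs both q (cost 1) and r (cost 2).
actions = [
    (frozenset({"p"}), frozenset({"q"}), 1),
    (frozenset({"p"}), frozenset({"r"}), 2),
    (frozenset({"q", "r"}), frozenset({"g"}), 1),
]
state, goal = frozenset({"p"}), frozenset({"g"})
h_max = rpg_cost_heuristic(state, goal, actions, max)  # 1 + max(1, 2) = 3
h_add = rpg_cost_heuristic(state, goal, actions, sum)  # 1 + (1 + 2) = 4
```

In this example the cheapest relaxed plan costs 4, so $h^{max}$ underestimates while $h^{add}$ happens to be exact; with shared subgoals $h^{add}$ would count the shared part several times.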

# FF Heuristic

Identical to $h^{add}$, but with additional steps at the end:
- Mark goal vertex
- Apply the following marking rules until nothing more to do:
    - Marked action or goal vertex?
        - mark all predecessors
    - Marked variable vertex $v^i$ in layer $i \geq 1$?
        - mark one predecessor with minimal $h^{add}$ value (tie breaking: prefer variable vertices; otherwise arbitrary)

**Heuristic value**
- The actions corresponding to the marked action vertices build a relaxed plan
- The cost of this plan is the heuristic value.

- Like $h^{add}$, $h^{FF}$ is safe and goal-aware.
- Approximation of $h^+$ which is always at least as good as $h^{add}$
- Usually significantly better
- Can be computed in almost linear time $(O(n \log n))$ in the size of the description of the planning task
- Computation of heuristic value depends on tie-breaking of marking rules ($h^{FF}$ is not well defined)
- one of the most successful planning heuristics
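The marking procedure can be approximated with a "best supporter" formulation: while computing $h^{add}$ costs, remember for each variable the action that achieved its cheapest cost, then walk backwards from the goals. This is a sketch under assumptions (dictionary of `name -> (pre, add, cost)`, ties broken by iteration order), not necessarily the exact procedure used by any particular planner.

```python
import math


def h_ff(state, goal, actions):
    """actions: dict name -> (pre, add, cost). Returns the relaxed plan cost."""
    cost = {v: 0 for v in state}
    supporter = {}  # variable -> cheapest achieving action (by h^add cost)
    changed = True
    while changed:
        changed = False
        for name, (pre, add, c) in actions.items():
            if all(v in cost for v in pre):
                total = c + sum(cost[v] for v in pre)
                for v in add:
                    if total < cost.get(v, math.inf):
                        cost[v] = total
                        supporter[v] = name
                        changed = True
    if any(v not in cost for v in goal):
        return math.inf
    # Backward marking: goals -> supporters -> their preconditions.
    marked_actions = set()
    open_vars = list(goal)
    visited = set()
    while open_vars:
        v = open_vars.pop()
        if v in visited:
            continue
        visited.add(v)
        if v in supporter:  # variables true in the state need no action
            name = supporter[v]
            if name not in marked_actions:
                marked_actions.add(name)
                open_vars.extend(actions[name][0])
    return sum(actions[name][2] for name in marked_actions)


# Hypothetical task where both goals share the subgoal q.
actions = {
    "a1": (frozenset({"p"}), frozenset({"q"}), 1),
    "a2": (frozenset({"q"}), frozenset({"r"}), 1),
    "a3": (frozenset({"q"}), frozenset({"t"}), 1),
}
value = h_ff(frozenset({"p"}), frozenset({"r", "t"}), actions)
```

Here $h^{add}$ would be 4 because it pays for `a1` twice, while the extracted relaxed plan contains `a1` only once and costs 3.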

Let $s$ be a state in the STRIPS planning task $\langle V, I, G, A \rangle$.

Then
- $h^{max}(s) \leq h^+(s) \leq h^*(s)$
- $h^{max}(s) \leq h^+(s) \leq h^{FF}(s) \leq h^{add}(s)$
- $h^*$ and $h^{FF}$ are incomparable
- $h^*$ and $h^{add}$ are incomparable

For non-admissible heuristics, it is generally neither good nor bad to compute higher values than another heuristic.
For relaxation heuristics, the objective is to approximate $h^+$ as closely as possible.

# SAS$^+$

The difference between STRIPS and SAS$^+$ is that the state variables $v$ are not binary, but with finite domain $dom(v)$. Accordingly, preconditions, effects and goals are specified as partial assignments. Everything else is equal to STRIPS.

A SAS$^+$ planning task is a 5-tuple $\prod = \langle V, dom, I, G, A \rangle$ with the following components
- $V$: finite set of state variables
- $dom$: domain; $dom(v)$ is finite and non-empty for all $v \in V$. States are total assignments for $V$ according to $dom$.
- $I$: The initial state (state = total assignment)
- $G$: Goals (partial assignment)
- $A$: finite set of actions $a$ with
    - $pre(a)$: Its preconditions (partial assignment)
    - $eff(a)$: Its effects (partial assignment)
    - $cost(a) \in \mathbb{N}_0$: Its cost


Let $\prod = \langle V, dom, I, G, A \rangle$ be a SAS$^+$ planning task. Then $\prod$ induces the state space $\mathcal{S}(\prod) = \langle S, A, cost, T, s_I, S_G \rangle$:
- Set of states: total assignments of $V$ according to $dom$
- Actions: actions $A$ defined in $\prod$
- Action costs: cost as defined in $\prod$
- Transitions: $s \xrightarrow[]{a} s'$ for states $s,s'$ and action $a$ iff
    - $pre(a)$ agrees with $s$ (precondition satisfied)
    - $s'$ agrees with $eff(a)$ for all variables mentioned in $eff$, agrees with $s$ for all other variables (effects are applied)
- initial state: $s_I = I$
- goal state: $s \in S_G$ for state $s$ iff $G$ agrees with $s$
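The SAS$^+$ semantics map naturally onto dictionaries, with states as total and conditions as partial assignments. The helper names and the small logistics example are assumptions for illustration.

```python
def agrees(partial: dict, state: dict) -> bool:
    """True iff the state assigns the same value to every variable in partial."""
    return all(state[v] == value for v, value in partial.items())


def apply(state: dict, pre: dict, eff: dict) -> dict:
    if not agrees(pre, state):
        raise ValueError("precondition not satisfied")
    successor = dict(state)  # unchanged for variables not mentioned in eff
    successor.update(eff)    # agrees with eff on all variables in eff
    return successor


# Hypothetical task with two finite-domain variables.
state = {"truck": "A", "package": "A"}
load_pre = {"truck": "A", "package": "A"}
load_eff = {"package": "in_truck"}
goal = {"package": "in_truck"}

successor = apply(state, load_pre, load_eff)
```

A STRIPS encoding of the same task would need one binary variable per value (for example `package_at_A`, `package_in_truck`) plus delete effects to keep them mutually exclusive.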

# Abstraction

State space abstractions drop distinctions between certain states, but preserve the state space behaviour as well as possible.
- An abstraction of a state space $\mathcal{S}$ is defined by an abstraction function $\alpha$ that determines which states can be distinguished in the abstraction.
- Based on $\mathcal{S}$ and $\alpha$, we compute the abstract state space $\mathcal{S}^{\alpha}$ which is "similar" to $\mathcal{S}$ but smaller.
- Use optimal solution cost in $\mathcal{S}^{\alpha}$ as heuristic.

Let $\mathcal{S} = \langle S, A, cost, T, s_I, S_G \rangle$ be a state space, and let $\alpha:S \rightarrow S'$ be a surjective function. The abstraction of $\mathcal{S}$ induced by $\alpha$, denoted as $\mathcal{S}^{\alpha}$, is the state space $\mathcal{S}^{\alpha} = \langle S', A, cost, T', s_I', S_G' \rangle$ with
- $T' = \{\langle \alpha(s), a, \alpha(t) \rangle | \langle  s, a, t \rangle \in T \}$
- $s_I' = \alpha(s_I)$
- $S_G' = \{\alpha(s) | s \in S_G\}$
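For an explicitly given state space, the induced abstraction can be computed directly from the definition. The sketch below assumes transitions are given as `(s, a, t)` triples; the chain example and the choice of $\alpha$ are hypothetical.

```python
def induce_abstraction(transitions, initial, goals, alpha):
    """Return (T', s_I', S_G') of the abstraction induced by alpha."""
    abstract_transitions = {(alpha(s), a, alpha(t)) for s, a, t in transitions}
    abstract_initial = alpha(initial)
    abstract_goals = {alpha(s) for s in goals}
    return abstract_transitions, abstract_initial, abstract_goals


# Hypothetical chain 0 -> 1 -> 2 -> 3; alpha merges {0, 1} and {2, 3}.
transitions = [(0, "inc", 1), (1, "inc", 2), (2, "inc", 3)]


def alpha(s):
    return s // 2


abs_T, abs_init, abs_goals = induce_abstraction(transitions, 0, {3}, alpha)
```

Transitions between merged states become self-loops in the abstraction, and every concrete path is still present as an abstract path, which is why abstract goal distances never exceed concrete ones.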

# Abstraction Heuristic

Given an abstraction function $\alpha$ for a state space $\mathcal{S}$, use abstract solution cost (solution cost of $\alpha(s)$ in $\mathcal{S}^{\alpha}$) as heuristic for concrete solution cost (solution cost of $s$ in $\mathcal{S}$).

The abstraction heuristic for abstraction $\alpha$ maps each state $s$ to its abstract solution cost

$$
h^{\alpha}(s) = h^*_{\mathcal{S}^{\alpha}}(\alpha(s))
$$

where $h^*_{\mathcal{S}^{\alpha}}$ is the perfect heuristic in $\mathcal{S}^{\alpha}$.

- Every abstraction heuristic is admissible and consistent.
- The choice of the abstraction function $\alpha$ is very important.
    - Every $\alpha$ yields an admissible and consistent heuristic.
    - But most $\alpha$ lead to poor heuristics
- An effective $\alpha$ must yield an informative heuristic as well as being efficiently computable
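One possible way to evaluate $h^{\alpha}$ is to precompute all abstract goal distances once with a backward Dijkstra search and then look up $\alpha(s)$. The sketch assumes an explicit set of abstract transitions `(s, a, t)` and a `cost` dictionary per action name; all names are illustrative.

```python
import heapq
import itertools
import math


def abstract_goal_distances(abstract_transitions, abstract_goals, cost):
    """Backward Dijkstra: cheapest cost from each abstract state to a goal."""
    incoming = {}
    for s, a, t in abstract_transitions:
        incoming.setdefault(t, []).append((s, cost[a]))
    dist = {g: 0 for g in abstract_goals}
    tie = itertools.count()  # tie-breaker so states are never compared
    queue = [(0, next(tie), g) for g in abstract_goals]
    heapq.heapify(queue)
    while queue:
        d, _, t = heapq.heappop(queue)
        if d > dist[t]:
            continue  # outdated queue entry
        for s, c in incoming.get(t, []):
            if d + c < dist.get(s, math.inf):
                dist[s] = d + c
                heapq.heappush(queue, (d + c, next(tie), s))
    return dist


def abstraction_heuristic(state, alpha, dist):
    # h^alpha(s) = h* of alpha(s) in the abstract state space
    return dist.get(alpha(state), math.inf)


# Hypothetical chain 0 -> 1 -> 2 -> 3 with goal 3; alpha merges pairs.
transitions = [(0, "inc", 1), (1, "inc", 2), (2, "inc", 3)]
alpha = lambda s: s // 2
abstract_transitions = {(alpha(s), a, alpha(t)) for s, a, t in transitions}
dist = abstract_goal_distances(abstract_transitions, {alpha(3)}, {"inc": 1})
values = [abstraction_heuristic(s, alpha, dist) for s in range(4)]
```

The true goal distances in this chain are 3, 2, 1, 0, while the heuristic yields 1, 1, 0, 0: admissible, but not very informative for such a coarse $\alpha$.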

# Pattern Databases

The most common abstraction heuristics are pattern database heuristics.

A PDB heuristic for a planning task is an abstraction heuristic where
- some aspects (= state variables) of the task are preserved with perfect precision while
- all other aspects are not preserved at all.

This is formalized as a projection $\pi_P$ onto a pattern $P \subseteq V$:

$$
\pi_P(s) = \{v \mapsto s(v) | v \in P\}
$$

Let $P$ be a subset of the variables of a planning task. The abstraction heuristic induced by the projection $\pi_P$ on $P$ is called the pattern database heuristic (PDB heuristic) with pattern $P$.
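A projection is easy to write down for states represented as dictionaries. The sketch below is an illustration with hypothetical variables; it returns a frozenset of pairs so that abstract states can serve as keys of a lookup table.

```python
def project(state: dict, pattern: frozenset) -> frozenset:
    """pi_P(s): keep only the assignments to variables in the pattern."""
    return frozenset((v, value) for v, value in state.items() if v in pattern)


pattern = frozenset({"package"})
s1 = {"truck": "A", "package": "A"}
s2 = {"truck": "B", "package": "A"}
s3 = {"truck": "B", "package": "in_truck"}

same_abstract_state = project(s1, pattern) == project(s2, pattern)
different_abstract_state = project(s2, pattern) != project(s3, pattern)
```

States that differ only in variables outside the pattern collapse into one abstract state. A pattern database would store one precomputed abstract goal distance per such projected state and look it up during search.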