README.org

Note to Basel people: this presentation is aimed at a general audience who may not be familiar with, or may never have heard of, AI planning.

Introduction

Deep Learning, RL, data mining, etc…

  • Do you identify yourself as a connectionist?

Introduction

SAT, CP, Logic, TCS, …

  • Have you tried any DL libraries?

Introduction

sgif:icapslogo

International Conference on Automated Planning and Scheduling

(AAAI sister conference, 33% avg. accept ratio, since 1990, ECP+AIPS→ICAPS)

We are working on /Automated Planners/

 

High-level plan for agents

plan = (action_1, action_2, … action_n)

Initial state → 🚶🚗🔨 → Goal state

 

1st-Order Logic

STRIPS/PDDL

modelling language

Closed-world assumption

Combinatorial Explosion

High-dimensional

implicit state space

PSPACE-Hard

Expensive, Critical Applications

Satellite Operation

Deep Space 1

Mars Exploration Rover

 

  • Soundness, Completeness, Optimality

    of the algorithms are important

    (compared to “planning” in other AI fields)

Introduction

Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA_2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

Overview

/Background/

Latplan Architecture

State AutoEncoder (SAE)

break

AMA_2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

Sliding tile puzzle (a.k.a 8-puzzle)

States

States as 1st-order logic formulas, described in the PDDL language.

png:8puzzle-standard Initial State

png:8puzzle-standard-goal Goal State

sgif:8puzzle

Transitions / Actions

Symbolic planners cannot solve an /Image-based/ 8-puzzle

  • A PDDL model of the 8-puzzle can be solved optimally in well under 1 sec

Knowledge-Acquisition Bottleneck (Cullen, 1988):

The cost of the human labor involved in converting real-world problems into inputs for domain-independent symbolic systems

 

When Empty(x, yold) ∧ … ;

Then ¬Empty(x,yold) ∧

  • A Long-Standing Problem
  • Now we propose a system which automates these processes…

Latent-Space Planner (/Latplan/)

png:latplanlogo

(embedding the lambda calculus inside a neural architecture)

Survey of Existing Action Model Acquisition Techniques

 

Limitations of Existing Systems

So far, ALL existing AMA systems require /symbolic inputs/

ARMS (Yang AIJ07) LOCM (ICAPS09) Argall (AIJ09) Mourao (UAI12)

All of them take symbolic inputs to find the action models

spng:locm

Framer (ICAPS17)

Near-Symbols : Parses natural language sentences with a clear grammatical structure.

Konidaris, Kaelbling (AAAI14, IJCAI15)

“Constructing Symbolic Representations for High-Level Planning” (AAAI14)

What it does
Converting a Semi-MDP Model to a PDDL Model by set-theoretic representation

i.e., a model-to-model conversion, not generating a model from scratch

Semi-MDP contains Action Symbols
move and interact (Playroom)
Sensor inputs are structured (Labels for “State Variable” are known)

x/y-distance, light level, whether a monkey cries

→ Each sensor has a distinct meaning (no overlap)

Learning from Video for Board Game (Bardu ICRA10; Kaiser AAAI12; Kirk 16)

Handles Images, but with strong assumptions (almost symbol) e.g.

Tic-Tac-Toe with Ellipse Detectors (Bardu 10)

→ Almost immediately provides propositions

→ Also, Domain-dependent (“3x3 grid” “Ellipse” are hard-coded)

Problem Setting

Problem Setting

of

Latent-Space Planner

Task: Solving an Image-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as “9 tiles”, “moving”

sjpg:puzzle

Task: Solving an Image-Based 8-puzzle /w/o Prior Explicit Knowledge/

/No Prior Knowledge/ : labels/symbols such as “9 tiles”, “moving”

sjpg:puzzle

Task: Solving /ANY/ Image-Based task w/o Prior Explicit Knowledge

/No Prior Knowledge/ : /Domain-independent Image-based planner/

Inputs

Input1: Training Input – Image Pairs

png:overview/1

Input1: Training Input – Image Pairs

They are randomly sampled image transitions

  • No state descriptions (unlike AlphaGo (Silver et al. ‘16))
  • No expert traces (unlike AlphaGo)
  • No rewards (unlike DRL systems)
  • No access to the simulator (unlike DQN (Mnih et al. ‘15) for Atari)

    → cannot ask for more examples

Input2: Planning Input – Initial Image & Goal Image

png:overview/input2

Task: Solving /ANY/ Image-Based task w/o Prior Explicit Knowledge

Overview

Background

/Latplan Architecture/

State AutoEncoder (SAE)

break

AMA_2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

Latent-Space Planner (/LatPlan/) architecture

png:overview/planning1

Step 1: Propositional Symbol Grounding

png:overview/planning2

Step 1: State Autoencoder

png:train-state-ae

Trained SAE provides two functions:

  • $b = Encode(r)$ maps a raw datum $r\;$ to a bit vector $b\;$
  • $\tilde{r} = Decode(b)$ maps a bit vector $b\;$ to a raw datum $\tilde{r}$

Step 2: Action Model Acquisition (AMA)

png:overview/planning3

Step 3: Solve the Symbolic Planning Problem

png:overview/planning4

Step 4: Executing the Symbolic Plan

png:overview/planning5

Step 5: Obtaining the Visualization

png:overview/planning6

Summary

Overview

Background

Latplan Architecture

/State AutoEncoder (SAE)/

break

AMA_2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

State AutoEncoder (SAE)

SAE is a neural network which provides two functions:

  • $b = Encode(r)$ : maps a raw datum $r\;$ to a bit vector $b\;$
  • $\tilde{r} = Decode(b)$ : maps a bit vector $b\;$ to a raw datum $\tilde{r}$ (usage sketched below)
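A minimal usage sketch, not Latplan's actual code; `sae.encode` and `sae.decode` are hypothetical handles to a trained network:

    import numpy as np

    def latent_state(sae, image):
        """b = Encode(r): map a raw image to a binary propositional state."""
        return np.asarray(sae.encode(image)).round().astype(int)

    def visualize(sae, bits):
        """r~ = Decode(b): map a bit vector back to an image for human inspection."""
        return sae.decode(np.asarray(bits))

    # Expected round trip after training: visualize(sae, latent_state(sae, r)) ≈ r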

Neural Network 101

png:deeplearning/1

Neural Network 101

png:deeplearning/2

Neural Network 101

png:deeplearning/3

Stochastic Gradient Descent + GPU

spng:gradient-descent

Plus misc techniques e.g. Batchnorm, Dropout
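For illustration only, a minimal SGD loop on a toy least-squares problem (not Latplan code):

    import numpy as np

    # Fit w so that X @ w ≈ y, using random minibatches of size 32.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

    w, lr = np.zeros(3), 0.1
    for step in range(2000):
        i = rng.integers(0, len(X), size=32)           # sample a random minibatch
        grad = 2 * X[i].T @ (X[i] @ w - y[i]) / 32     # gradient of the mean squared error
        w -= lr * grad                                 # one SGD update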

NN for Standard Classification Tasks (Supervised)

| Task                 | Input x | Output y                          |
|----------------------+---------+-----------------------------------|
| Image classification | Image   | Label (1=car, 2=cat, 3=monkey …) |

Unsupervised Learning for NN: AutoEncoder (AE)

Variational AutoEncoder (VAE, Kingma 2014)

An AutoEncoder that enforces a certain distribution on $Z ⊂ \mathbb{R}^n$ over the dataset $X$

You have $X=$ { 10k images of apples }. If you train a Gaussian VAE on $X$, then $Z = Encode(X) ≈ N(μ,σ)$ for some $μ,σ ∈ \mathbb{R}^n$.

VAE needs the reparametrization trick because sampling from a random distribution is not differentiable.

Reparametrization for $N(μ,σ)$: $μ + σ N(0,1)$

$μ$ and $σ$ are differentiable (trainable) vectors, $N(0,1)$ is not.
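A minimal sketch of the trick (numpy, illustration only):

    import numpy as np

    def sample_z(mu, sigma):
        """Reparametrized sample from N(mu, sigma): the noise carries no trainable parameters."""
        eps = np.random.normal(size=mu.shape)   # N(0,1): random, parameter-free
        return mu + sigma * eps                 # differentiable w.r.t. mu and sigma

    z = sample_z(np.zeros(16), np.ones(16))     # one 16-dimensional latent sample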

Gumbel-Softmax VAE (Jang, Gu, ICLR2017)

Additional optimization penalty which enforces $Z ∼ \textbf{Categorical}$:

   → $z\;$ converges to a 1-hot vector e.g. $\langle 0,0,1,0 \rangle$ .

Example: Represent an MNIST image with 30 variables of 8 categories.
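A minimal sketch of Gumbel-Softmax sampling (numpy, illustration only; the temperature tau is annealed during training):

    import numpy as np

    def gumbel_softmax(logits, tau=1.0):
        """Differentiable, approximately one-hot sample per row of `logits`."""
        u = np.random.uniform(1e-10, 1.0, size=logits.shape)
        g = -np.log(-np.log(u))                        # Gumbel(0,1) noise
        y = (logits + g) / tau                         # smaller tau -> closer to one-hot
        e = np.exp(y - y.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    z = gumbel_softmax(np.random.randn(30, 8), tau=0.5)   # 30 variables of 8 categories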

(Network diagram: fully-connected ReLU layers of 256, 512, 512, and 256 units.)
  • Key idea: These categorical variables are /directly/ usable

    as the source of propositional models

    In particular, 2 categories → propositional variables (true/false)

State Autoencoder (/before training/)

png:sae/state-ae-before

State Autoencoder (/after training/)

png:sae/state-ae

Gumbel-Softmax: Differentiable Approximation of Gumbel-Max

Verifying the feasibility of the Latplan system and the SAE

Q: Does the SAE (neural network) produce

/sound propositions/ for /reasoning?/

AMA_1 : a /trivial, oracular/ AMA method w/o generalization.

  • Encode the entire raw state transitions in the environment
  • For each transition, perform the following conversion:
     00110101  ;; encoded bit vectors of a transition
    
       ↓       ;; one action per transition
    (:action       action-0011-0101
                   ;; before-state is embedded directly
     :precondition (and (b0-false) (b1-false) (b2-true) (b3-true))
                   ;; effect = state diff
     :effect       (and (not (b1-false)) (b1-true)
                        (not (b2-true))  (b2-false)))
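A minimal sketch of this conversion (a hypothetical helper, mirroring the example above):

    def ama1_action(before, after):
        """AMA_1: turn one encoded transition (two bit vectors) into one PDDL action."""
        bit = lambda b: "true" if b else "false"
        name = "action-%s-%s" % ("".join(map(str, before)), "".join(map(str, after)))
        pre  = ["(b%d-%s)" % (i, bit(b)) for i, b in enumerate(before)]
        eff  = []
        for i, (b, a) in enumerate(zip(before, after)):
            if b != a:                                # effect = state diff
                eff += ["(not (b%d-%s))" % (i, bit(b)), "(b%d-%s)" % (i, bit(a))]
        return ("(:action %s\n :precondition (and %s)\n :effect (and %s))"
                % (name, " ".join(pre), " ".join(eff)))

    # ama1_action([0,0,1,1], [0,1,0,1]) reproduces the action-0011-0101 example above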
        

Step 3: Solve the Symbolic Planning Problem

png:overview/planning4

Step 4: Executing the Symbolic Plan

png:overview/planning5

Step 5: Obtaining the Visualization

png:overview/planning6

AMA_1 Experiments

SAE: trained with 20k images (note: > 360k entire states in 8-puzzle)

  • /SAE is generalizing/

AMA_1: requires the entire set of transitions (note: > 1M transitions in the 8-puzzle)

  • /AMA_1 is NOT generalizing, being an oracle/

Planner: Fast Downward (Helmert, 08), a state-of-the-art planner

  • $A^*$ (optimal search) : It must find an optimal solution
  • Runtime: ~3sec (instances are too small for symbolic systems)

8-puzzle Results with MNIST tiles (MNIST 8-puzzle)

8-puzzle using digits from MNIST database

png:results/mnist-plan

Results with photographic, unseparated tiles (Mandrill 8-puzzle)

The MNIST 8-puzzle has cleanly separated objects → this domain does not.

png:results/mandrill-intro

Results with photographic, unseparated tiles (Mandrill 8-puzzle)

png:results/mandrill-plan

Results with photographic, unseparated tiles (Spider 8-puzzle)

png:results/spider-plan-new

Notice that Latplan has no idea that this is an 8-puzzle; thus MNIST, Mandrill, and Spider are entirely different domains to it (the state encoding also differs)

Tower of Hanoi (3 disks, 4 disks)

Completely different puzzle problems can be solved by the same system

png:results/hanoi3

png:results/hanoi4

Lights Out

Does not assume “objects” are static (objects may disappear)

png:results/lightsout_new4x4

Twisted Lights Out

Does not assume grid-like structures

png:results/lightsout_twisted_new4x4

Handling the Noisy Input

SAE implementation uses a denoising layer (Denoising AE, Vincent 08)

png:results/noise-new

Why bother with an off-the-shelf planner? Wouldn't blind search do?

Domain-independent lower bounds work, /SURPRISINGLY!/

Why is Gumbel-Softmax necessary?

  • Alternative 1: Use a normal NN and round the encoder output?
    • ✘ The decoder is never trained on 0/1 values, so it cannot produce meaningful output
  • Alternative 2: Include the rounding operation in the NN?
    • ✘ Rounding is non-differentiable, so backpropagation is impossible

State Autoencoder Conclusion

SAE can learn from small examples

20k training images → learn to map 360k unseen images

SAE-based propositions are sound

Given the oracular AMA_1, planners can reason over the propositions

Latplan maintains the theoretical guarantees of the search algorithm

Given the complete state space graph,

  • Optimising algorithm (A*) returns an optimal solution
  • Completeness is guaranteed

*** Break ***

Background

Latplan Architecture

State AutoEncoder (SAE)

/ Break /

AMA_2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

/AMA_2 Overview/

Action AutoEncoder (AAE)

Action Discriminator (AD)

SAE Feasible! Now what?

The Task of AMA_2 : The /Real/ AMA method

Input: Propositional transitions $\{ (s,t) \ldots \}$

  • Action Symbol Grounding
    (:action slide-up-tile7-at-x0-y1 ...
        
  • Action Preconditon Learning
    :precondition (and (empty x0 y0) ...)
        
  • Action Effect Learning
    :effects      (and (empty x0 y1) (not ...)))
        

Action Symbol Grounding

Identifying the type of each transition (clustering)

Action Precondition Learning

Learns when that transition is allowed

Action Effect Learning

Learns what happens after the transition

Why are /action symbols/ necessary?

Planners typically perform a forward search (Dijkstra, $A^*$ )

Without Action Symbols, successor generation becomes challenging

  • SAE with $|z|=36 \text{bit}$

    → Generate $2^{36}$ potential successors, then filter the invalid ones

  • /With action symbols/ (e.g. up, down, left, right),
  • → we can enumerate the candidates in /constant time/.

Challenge: Training Input lacks action symbols

We should identify the number of action schemas from what is and is not affected

png:ama/action-symbol

Challenge: Precondition/Effect learning is not trivial

  • We do not need an action label; it is linear anyway
  • Then, AMA ≡ prediction task ($≈$ scene prediction in video)
    • We can train a neural network $a(s) = t\;$ by minimizing $|t-a(s)|$
    • Most DL literature follows this path

png:ama/linear
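A minimal sketch of that naive path, shown only for contrast with AMA_2 below (PyTorch; the 36-bit state and 400 hidden units are assumptions):

    import torch
    import torch.nn as nn

    # Direct prediction a(s) = t: a single network from state bits to successor bits.
    naive_ama = nn.Sequential(nn.Linear(36, 400), nn.ReLU(),
                              nn.Linear(400, 36), nn.Sigmoid())

    s = torch.rand(64, 36).round()                   # dummy minibatch of current states
    t = torch.rand(64, 36).round()                   # their observed successors
    loss = torch.mean(torch.abs(t - naive_ama(s)))   # minimize |t - a(s)|
    loss.backward()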

Challenge: Precondition/Effect learning is not trivial

AMA_2 Overview

png:ama/overview0

AMA_2 Overview

png:ama/overview1

AMA_2 Overview

png:ama/overview1-1

AMA_2 Overview

png:ama/overview2

AMA_2 Overview

png:ama/overview3

Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA_2 Overview

/Action AutoEncoder (AAE)/

Action Discriminator (AD)

Action AutoEncoder

What is the right function to learn?

  • $a(s) = t$ ?

    We cannot train this

  • $a$ is a variable!

    $apply(a,s) = t$

  • Transition is a mapping

    action $a → $ successor $t$

    /conditioned by/

    the current state $s$

  • The right function to learn!

Implementing $(a → t) | s$

png:aae/aae-0

Implementing $(a → t) | s$

png:aae/aae-1

Implementing $(a → t) | s$

png:aae/aae-2

Implementing $(a → t) | s$

png:aae/aae-3
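A minimal sketch of such an AAE (PyTorch; layer sizes and the 128-action bound are assumptions, not the paper's exact architecture):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N, A = 36, 128   # latent bits per state, maximum number of action labels

    class ActionAutoEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            # "encoder": (s, t) -> a 128-way categorical action code
            self.enc = nn.Sequential(nn.Linear(2 * N, 400), nn.ReLU(), nn.Linear(400, A))
            # "decoder": apply(a, s) -> t, i.e. the mapping (a -> t) conditioned on s
            self.dec = nn.Sequential(nn.Linear(A + N, 400), nn.ReLU(), nn.Linear(400, N))

        def forward(self, s, t, tau=1.0):
            logits = self.enc(torch.cat([s, t], dim=-1))
            a = F.gumbel_softmax(logits, tau=tau, hard=True)   # one-hot action symbol
            t_rec = torch.sigmoid(self.dec(torch.cat([a, s], dim=-1)))
            return a, t_rec                                    # train by minimizing |t - t_rec|

        def apply(self, a, s):
            """Successor prediction used at planning time, given an action symbol a."""
            return torch.sigmoid(self.dec(torch.cat([a, s], dim=-1))).round()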

Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA_2 Overview

Action AutoEncoder (AAE)

/Action Discriminator (AD)/

Action Discriminator learns preconditions

png:ama/overview2

Action Discriminator learns Preconditions = Binary Classifier

Binary classifier telling which transitions are valid

Negative dataset

  • /Invalid transitions/: all transitions that are not valid.
    • We lack a definition of them; they cannot be generated.
  • Can we collect invalid transition data from observation?
    • No: teleportation, for example, violates the laws of physics (at least at a macro scale).

Solution: PU-Learning (Elkan & Noto, KDD 08’)

Train a positive/negative classifier from positive and /unlabelled/ datasets (sketched below the list).

  • Positive: Training Input. Observed data are valid, guaranteed
  • Unlabelled: Successor candidates generated by the AAE
    • Some are valid, some are invalid
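A minimal sketch of the Elkan & Noto recipe (a scikit-learn stand-in; the actual AD is a neural network, and the calibration constant should be estimated on held-out positives):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pu_classifier(X_pos, X_unl):
        """Learn P(valid | x) from positive and unlabelled transition data only."""
        X = np.vstack([X_pos, X_unl])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
        g = LogisticRegression(max_iter=1000).fit(X, y)   # models P(labelled | x)
        c = g.predict_proba(X_pos)[:, 1].mean()           # estimates P(labelled | valid)
        return lambda Z: g.predict_proba(np.atleast_2d(Z))[:, 1] / c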

Planning using AMA_2

The AAE enumerates the candidates; the AD/SD filter out the invalid ones.

\begin{align*}
Succ(s) = \{\, t = apply(a,s) \;|\; a ∈ \{0 \ldots 127\} \;∧\; AD(s,t) \geq 0.5 \;∧\; SD(t) \geq 0.5 \,\}
\end{align*}

We have action symbols and propositional states!

Now we can run $A^*$ with goal-count heuristics
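A minimal sketch of this successor generator and the goal-count heuristic (`apply_fn`, `ad`, `sd` stand for the trained AAE/AD/SD, here hypothetical callables):

    import numpy as np

    def successors(s, apply_fn, ad, sd, num_actions=128):
        """Succ(s): enumerate the AAE's candidates, keep those both discriminators accept."""
        result = []
        for a in range(num_actions):            # a is an action index (symbol)
            t = apply_fn(a, s)                  # candidate successor bit vector
            if ad(s, t) >= 0.5 and sd(t) >= 0.5:  # valid transition and valid state
                result.append((a, t))
        return result

    def goal_count(t, goal):
        """Goal-count heuristic: number of propositions still differing from the goal."""
        return int(np.sum(np.asarray(t) != np.asarray(goal)))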

AMA_2 Experiments 1

Is it feasible to do planning with the AAE and AD?

  • 100 instances for each domain
    • self-avoiding random walks from the goal state
    • (benchmark A) 7-step
    • (benchmark B) 14-step
  • 180 sec. time limit
  • Domain-specific plan validators.

Results

Noise is applied to the planning inputs (init/goal images)

G: Gaussian noise, s/p: salt/pepper noise

  • Easy instances: Majority of instances are solved
  • Harder instances: Still many instances are solved
Number of instances solved (out of 100):

| domain     | 7-step, std | 7-step, G | 7-step, s/p | 14-step, std | 14-step, G | 14-step, s/p |
|------------+-------------+-----------+-------------+--------------+------------+--------------|
| MNIST      |          72 |        64 |          64 |            6 |          4 |            3 |
| Mandrill   |         100 |       100 |         100 |            9 |         14 |           14 |
| Spider     |          94 |        99 |          98 |           29 |         36 |           38 |
| LightsOut  |         100 |        99 |         100 |           59 |         60 |           51 |
| Twisted LO |          96 |        65 |          98 |           75 |         68 |           72 |
| Hanoi      |          37 |        44 |          39 |           15 |         18 |           17 |

AMA_2 Experiments 2

How accurate are Action Discriminators and State Discriminators?

AMA_2 Experiments 2

Conclusion

  • Latplan: The /first system/ which completely automatically generates a classical, symbolic planning model from scratch
  • State AutoEncoder (SAE) : /real-world states/ → /propositional states/

    Action AutoEncoder (AAE) : /transitions/ → /action labels/, /effects/

    Action Discriminator (AD) : /transition/ → /bool/ with PU-Learning

    Two of the major grounding problems were solved!

    | Types of symbols       |                              |
    |------------------------+------------------------------|
    | Propositional symbols  | Solved!                      |
    | Action symbols         | Solved!                      |
    | Object symbols         | Computer Vision techniques?  |
    | Predicate symbols      | ???                          |

    Future work: improved accuracy, runtime efficiency.

    This opens many avenues for future research.

Future Work (Input format)

LatPlan is an architecture: any system with an SAE is an implementation

Different SAE → reason about different types of raw data

  • Autoencoders for text, audio [Li 2015, Deng 2010]
  • Example:
    • Transition rule: “This is an apple, this is a pen → oh, ApplePen!”
  • When this happens, /Latplan/ brings AI to a new level e.g.

    1000 steps of natural language reasoning with /logic/, not by reflex

    • Fundamentally impossible for short-sighted / greedy agents

Future Work (Extended planning formalism)

  • Latplan assumes nothing about the environment machinery (grids, movable tiles…)
  • Latplan assumes fully-observable, deterministic domains
  • Next step: Extending Latplan to MDP, POMDP
    • Gumbel-Softmax layer → just a Softmax layer? (probability)
    • $AO^*$, $LAO^*$ algorithms for optimal planning under uncertainty

Future Work (Propositional → First-order)

SAE can generate propositional symbols (state $s = \{q,r\ldots\}$)

  • 1st-order logic (predicate $p(a,b)$ )
  • We need object recognition from images (parameters $a,b$)
  • SAE with networks for object recognition (e.g. R-CNN) should achieve this

Future Work (Knowledge extraction)

AMA_2 (AAE/AD) is a neural network, thus preconditions/effects are hidden in a black box

Hinders deriving heuristic functions

Knowledge extraction by Konidaris et al.(‘14,’15), δ-ILP (Deepmind) ?

Appendix

Using the Remaining Time (Discussion)

Konidaris et al. (2014, 2015):

Structured input (e.g. light switch, x/y-distance) vs. unstructured images

Action Symbols (move, interact)

It just converts a Semi-MDP model to a PDDL model.

Could be used for extracting PDDL from AAE/AD

NNs for solving combinatorial tasks:

TSP (Hopfield and Tank 1985)

Neurosolver for ToH (Bieszczad and Kuchar 2015)

The input/output are symbolic.

Other Work Combining Symbolic Search and NNs

Embedded NNs inside a search to provide the search control knowledge

(i.e. node evaluation function)

Sliding-tile puzzle and Rubik’s Cube (Arfaee et al. 2011)

Classical planning (Satzger and Kramer 2013)

The game of Go (AlphaGo, Silver et al. 2016)

Deep Reinforcement Learning (DRL) (Mnih et al. 2015, DQN)

DQN assumes predetermined action symbols (↑↓←→+ buttons).

DQN relies on simulators. Latplan reverse-engineers a simulator.

DQN does not work when it does not know what actions are even possible!

Other Interesting Systems

SCAN system (deepmind)

  • Maps between a continuous latent vector and a human-provided symbolic vector

δ-ILP (Inductive Logic Programming)

  • ILP robust to noise
    • Extracting rules from AAE/AD to form a PDDL?

Why not individual pixels? Why DL?

Systems based on individual pixels lack generalization

  • Noise / variations can make the data entirely different

    png:results/noise

  • We must acquire generalized features
  • = a nonlinear function that recognizes the entanglement between multiple pixels

Learning vs Planning

Main differences: Purposes and the abstraction layer

  • AlphaGo = Subsymbolic (DLNN eval. function) + Symbolic (MCTS)

Human-Competitive Systems

AlphaGo = Subsymbolic (NN eval. func) + Symbolic (MCTS)

  • However, it is domain-specific: specialized in Go, where “grids” and “stones” are known
  • Huge expert trace DB: not applicable when data are scarce (e.g. space exploration)
  • /Is supervised learning necessary for humans?/

    True intelligence should search / collect data by itself

DQN = Subsymbolic (DLNN) + Reinforcement Learning (DLNN)

Domain-independent Atari game solver (Space Invaders, Pac-Man, …), however:

  • RL Acting: Greedily follow the learned policy → no deliberation!
  • You can survive most Atari games by reflex

Latplan Advantages

— Robust systems augmented by the latest DL tech

Better Theoretical Guarantee than Reinforcement Learning

Completeness (Finds solution whenever possible), /Solution Optimality/

Decision Making Independent from Learning

/Unsupervised/ (no labeled data required), Explainable (search by logic)

When does Latplan return a wrong solution?

Machine learning may contain errors (convergence is guaranteed only as $t → ∞$, not in finite time)

  • Images → flawed symbols/model/graph
  • Optimal path on a flawed graph, or a disconnected graph

    A* keeps its completeness, soundness, and optimality (with admissible heuristics)

  • Flawed visualized plan (noisy) / no plan found

LatPlan may make /wrong observations/ but no /wrong decisions/

BTW, “correctness” is defined by error-prone human observations anyway…

Reinforcement Learning

Both perception and decision making depend on training

  • /When the learned policy is wrong/,
    • the solution could be suboptimal
    • /It could even fail to find solutions (incomplete agent)/

… AlphaGo was unprepared for Lee Sedol’s Move 78 because it didn’t think that a human would ever play it.

Cade Metz. “In Two Moves, AlphaGo and Lee Sedol Redefined the Future.” Wired, 2016

RL may make wrong decisions.

Why symbols?

Symbols are strong abstraction mechanisms because

Meanings do not matter
No understanding necessary: Does a symbol $X$ mean an apple or a car?

Logical reasoning by mechanical application of rules

  • Domain-independent planning : mystery vs nomystery

    Logistics domains where symbol names are mangled (truck → shark)

Composable
A latent vector is a conjunction (and)

Heuristic functions use modus ponens to derive guidance

This is the start of neural-symbolic AI!

png:latplanlogo-in-sicp

(yeah I really like a catchy phrase)