# Mountain Car Example

Let's see how to make this API work with Mountain Car! This reinforcement learning API requires three things to be defined before we can run any algorithm:

+ BlackBoxModel: defines the problem (see below for an example)
+ Policy: where your domain knowledge comes in; define the action space and the feature functions
+ Solver: where the API takes over; you just specify which algorithm and options to use

In [None]:
#setup: imports, shared type aliases and abstract types, and the API source files
import PyPlot: plot, xlabel, ylabel, subplot, suptitle #for solver.grandiloquent
import StatsBase: sample, WeightVec #for policy.SoftmaxPolicy
using HypothesisTests #for utils.test...

typealias RealVector Union{Array{Float64,1},Array{Int,1},SparseMatrixCSC{Float64,Int},SparseMatrixCSC{Int,Int}}
typealias RealMatrix Union{Array{Float64,2},Array{Int,2},SparseMatrixCSC{Float64,Int},SparseMatrixCSC{Int,Int}}
dot(x::Array,y::SparseMatrixCSC) = (x'*y)[1]
dot(x::SparseMatrixCSC,y::Array) = dot(y,x)

import Base.assert
function assert(expr,val,fn::Function= ==,varname::AbstractString="")
    if !fn(expr,val)
        error("Assertion failed: $varname : expected $val, got $expr")
    end
end

abstract AnnealerParam
abstract ExperienceReplayer
abstract Minibatcher
abstract UpdaterParam
abstract ActionSpace
abstract Policy
abstract Model
include(joinpath("..","src","BlackBoxModel.jl"))
include(joinpath("..","src","policy.jl"))
include(joinpath("..","src","simulator.jl"))
include(joinpath("..","src","learners.jl"))
include(joinpath("..","src","solvers","__solvers.jl"))

include(joinpath("..","src","solve.jl"))
include(joinpath("..","src","utils.jl"))

## Define Black Box Model Functions

The BlackBoxModel type requires the following to be defined:
+ `model`: a type that holds all the parameters of a specific instance of your problem
+ `init(model,rng)`: generate an initial state
+ `observe(model,rng,state,action=nothing)`: return an observation of the state (the role of the optional action argument is not fully settled yet)
+ `next_state(model,rng,state,action)`: generate the next state given the current state, the action, and the problem parameters
+ `reward(model,rng,state,action)`: return the reward for taking the action in the state, given the problem parameters
+ `isterminal(model,rng,state,action)`: return a boolean indicating whether the state (and action) is terminal
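
To make this contract concrete, here is a minimal sketch (Julia 1.x syntax) of how a simulator might drive these functions for one episode. `ToyModel`, `rollout`, and the order of the calls inside the loop are illustrative assumptions, not the internals of `BlackBoxModel`.

```julia
using Random

# Hypothetical toy problem used only to illustrate the required functions:
# the state is a 1-element vector that moves by the action each step.
struct ToyModel
    cost::Float64
end

init(m::ToyModel, rng::AbstractRNG) = [0.0]
observe(m::ToyModel, rng::AbstractRNG, s::Vector{Float64}, a=nothing) = copy(s)  # identity observation
next_state(m::ToyModel, rng::AbstractRNG, s::Vector{Float64}, a::Float64) = s .+ a
reward(m::ToyModel, rng::AbstractRNG, s::Vector{Float64}, a::Float64) = m.cost
isterminal(m::ToyModel, rng::AbstractRNG, s::Vector{Float64}, a::Float64) = s[1] >= 3.0

# Hypothetical driver: runs one episode and returns (total reward, number of steps).
# Assumed call order: reward on the current state, terminal check on the next state.
function rollout(m, rng::AbstractRNG, policy; max_steps::Int=100)
    s = init(m, rng)
    total = 0.0
    for t in 1:max_steps
        a = policy(observe(m, rng, s))   # the policy only sees the observation
        total += reward(m, rng, s, a)
        s = next_state(m, rng, s, a)
        if isterminal(m, rng, s, a)
            return total, t
        end
    end
    return total, max_steps
end

total, steps = rollout(ToyModel(-1.0), MersenneTwister(1), o -> 1.0)
```

With a constant action of `1.0`, the toy state reaches 3 after three steps, so the episode ends with a total reward of `-3.0`.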

In [None]:
type MtnCarModel <: Model
    cost::Float64
end
MtnCarModel() = MtnCarModel(-1.)

In [None]:
init(m::MtnCarModel,rng::AbstractRNG) = [-0.5;0.]

In [None]:
function next_state(model::MtnCarModel,rng::AbstractRNG,s::Array{Float64,1},a::Float64)
    x,v = s
    #velocity update: engine force minus gravity, clipped to [-0.07,0.07]
    v_ = v + 0.001*a - 0.0025*cos(3*x)
    v_ = max(min(0.07,v_),-0.07)
    x_ = x+v_
    #inelastic boundary: the car stops dead at the left wall
    if x_ < -1.2
        x_ = -1.2
        v_ = 0.
    end
    return [x_;v_]
end
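
As a sanity check on these dynamics, the sketch below (Julia 1.x syntax) copies the update into a standalone `mc_step` and compares two hand-written policies. `mc_step`, `steps_to_goal`, `full_throttle`, and `pump_energy` are hypothetical helpers, not part of the API. Full throttle alone cannot climb the hill from rest, while pushing in the direction of motion builds up enough momentum, which is what makes Mountain Car a hard exploration problem.

```julia
# Standalone copy of the next_state update above (Julia 1.x syntax).
function mc_step(x::Float64, v::Float64, a::Float64)
    v_ = clamp(v + 0.001*a - 0.0025*cos(3x), -0.07, 0.07)   # engine force minus gravity
    x_ = x + v_
    if x_ < -1.2                      # inelastic left wall: the car stops dead
        x_, v_ = -1.2, 0.0
    end
    return x_, v_
end

# Hypothetical helper: number of steps from (x, v) until x >= 0.5, or nothing if
# the goal is not reached within max_steps.
function steps_to_goal(policy; x=-0.5, v=0.0, max_steps=1000)
    for t in 1:max_steps
        x, v = mc_step(x, v, policy(x, v))
        if x >= 0.5
            return t
        end
    end
    return nothing
end

full_throttle = (x, v) -> 1.0                    # always push right
pump_energy   = (x, v) -> v >= 0 ? 1.0 : -1.0    # push in the direction of motion

steps_to_goal(full_throttle)   # nothing: the engine is too weak to climb straight up
steps_to_goal(pump_energy)     # an integer: rocking back and forth builds momentum
```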

In [None]:
reward(m::MtnCarModel,rng::AbstractRNG,s::Array{Float64,1},a::Float64) = m.cost

In [None]:
isterminal(m::MtnCarModel,rng::AbstractRNG,s::Array{Float64,1},a::Float64) = s[1] >= 0.5

We now construct the BlackBoxModel. We do not pass an observation function to the constructor, so it falls back to the default identity observation model (the observation is the state itself).

In [None]:
bbm = BlackBoxModel(MtnCarModel(),init,next_state,reward,isterminal) 

## Setting Up the Policy

In general, a policy needs an ActionSpace, which must be either the true action space or a subset of it, and a feature function, which maps a state-action pair to a vector of numbers.
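
A minimal sketch (Julia 1.x syntax) of such a feature function: hand-picked state features are copied into the block that belongs to the chosen action, so a single weight vector `w` gives `Q(s,a) = dot(w, phi(s,a))` for every action. The names `ACTIONS`, `state_features`, and `sa_features` and the choice of features are assumptions for illustration only.

```julia
using LinearAlgebra

const ACTIONS = [-1.0, 0.0, 1.0]   # same discrete action set as in this notebook

# Hypothetical state features for Mountain Car: bias, position, velocity, and their product.
state_features(s::Vector{Float64}) = [1.0, s[1], s[2], s[1]*s[2]]

# Copy the state features into the block for action a; all other blocks stay zero.
function sa_features(s::Vector{Float64}, a::Float64)
    k = length(state_features(s))
    i = findfirst(==(a), ACTIONS)
    i === nothing && error("action $a is not in the action space")
    phi = zeros(k * length(ACTIONS))
    phi[(i-1)*k+1 : i*k] = state_features(s)
    return phi
end

w = randn(12)                                              # one weight vector shared by all actions
q = [dot(w, sa_features([-0.5, 0.0], a)) for a in ACTIONS]  # Q(s, a) for each action
```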

In [None]:
A = DiscreteActionSpace([-1.;0.;1.]) #push left, coast, push right

Tile coding is provided as a quick and dirty function approximator for continuous domains (its API still needs work). For concreteness and generality, we also define `cast_mc_state`, which in the most general case converts whatever state representation you have into an array of numbers; here the state is already an array, so it is simply the identity.
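
For intuition about what a tile coder produces, here is a minimal 2-D version (Julia 1.x syntax). `make_tilecoder` is a hypothetical stand-in, not `generate_tilecoder`; in particular, reading its arguments as (number of tilings, tiles per dimension, actions, lower bounds, upper bounds) is an assumption. Each tiling is shifted by a fraction of a tile width, and exactly one tile per tiling is active, inside the block of the chosen action.

```julia
# Hypothetical 2-D tile coder (Julia 1.x). Not the library's generate_tilecoder.
function make_tilecoder(n_tilings::Int, n_tiles::Int, actions::Vector{Float64},
                        lo::Vector{Float64}, hi::Vector{Float64})
    width = (hi .- lo) ./ n_tiles          # tile width in each state dimension
    per_tiling = (n_tiles + 1)^2           # one extra row and column absorb the offsets
    per_action = n_tilings * per_tiling
    function features(s::Vector{Float64}, a::Float64)
        ai = findfirst(==(a), actions)
        ai === nothing && error("action $a is not in the action space")
        phi = zeros(per_action * length(actions))
        for t in 0:n_tilings-1
            offset = width .* (t / n_tilings)                            # shift tiling t
            idx = floor.(Int, (clamp.(s, lo, hi) .- lo .+ offset) ./ width)
            idx = clamp.(idx, 0, n_tiles)                                # guard rounding at the edges
            k = (ai - 1)*per_action + t*per_tiling + idx[1]*(n_tiles + 1) + idx[2] + 1
            phi[k] = 1.0                                                 # one active tile per tiling
        end
        return phi
    end
    return features
end

actions = [-1.0, 0.0, 1.0]
tc = make_tilecoder(10, 10, actions, [-1.2, -0.07], [0.6, 0.07])
f = tc([-0.5, 0.0], 1.0)
```

With 10 tilings of 10 tiles per dimension (plus one extra row and column for the offsets), each action gets `10 * 11^2 = 1210` features, so `length(f)` is 3630 and `sum(f)` is 10, all within the block for action `1.0`.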

In [None]:
#converts a state into an array of numbers; Mountain Car states are already arrays, so this is the identity
cast_mc_state(x)=x
__feature_function_ = generate_tilecoder(10,10,A,[-1.2;-0.07],[0.6;0.07])
feature_function(s,a)=__feature_function_(cast_mc_state(s),a)

In [None]:
policy = EpsilonGreedyPolicy(feature_function,A,rng=MersenneTwister(3234),eps=0.1)
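
The selection rule behind an epsilon-greedy policy can be sketched as follows (Julia 1.x syntax): with probability `eps`, pick an action uniformly at random; otherwise pick the action with the highest linear Q-value. `epsgreedy_action` and `toy_phi` are hypothetical names, and this is not the implementation of `EpsilonGreedyPolicy`.

```julia
using Random, LinearAlgebra

# Hypothetical epsilon-greedy selection over a linear Q-function Q(s,a) = dot(w, phi(s,a)).
function epsgreedy_action(rng::AbstractRNG, w::Vector{Float64}, phi, s,
                          actions::Vector{Float64}, eps::Float64)
    if rand(rng) < eps
        return rand(rng, actions)                # explore: uniform random action
    end
    q = [dot(w, phi(s, a)) for a in actions]     # exploit: evaluate every action
    return actions[argmax(q)]
end

rng = MersenneTwister(3234)
toy_phi(s, a) = [a]      # hypothetical one-feature representation
w = [2.0]                # makes a = 1.0 the greedy action
epsgreedy_action(rng, w, toy_phi, [-0.5, 0.0], [-1.0, 0.0, 1.0], 0.1)
```

Because exploration samples from all actions, the greedy action is chosen with probability `1 - eps + eps/length(actions)`, slightly more than `1 - eps`.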

## Choose and Set Up Your Solver

The following solvers are planned; their current status is given in parentheses:
+ Forgetful LSTD(λ) / LS-SARSA (untested)
+ SARSA(λ) (untested)
+ Q(λ) (unimplemented)
+ GQ(λ) (unimplemented)
+ Double Q-learning (untested)
+ Deterministic Policy Gradient (unimplemented)
+ (Natural) Actor-Critic (unimplemented)
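
For reference, a single SARSA(λ) update with linear function approximation and replacing traces looks roughly like this (Julia 1.x syntax), following the textbook form of the algorithm. `sarsa_lambda_step!` and its arguments are hypothetical and do not mirror `SARSAParam`; binary features, such as tile-coding output, are assumed.

```julia
using LinearAlgebra

# One SARSA(lambda) step with linear function approximation and replacing traces.
# Hypothetical names; assumes binary (0/1) features such as tile-coding output.
function sarsa_lambda_step!(w::Vector{Float64}, e::Vector{Float64},
                            phi::Vector{Float64}, r::Float64,
                            phi_next::Vector{Float64}, terminal::Bool;
                            alpha=0.1, gamma=0.99, lambda=0.9)
    q = dot(w, phi)                                   # Q(s, a)
    q_next = terminal ? 0.0 : dot(w, phi_next)        # Q(s', a'); zero past a terminal state
    delta = r + gamma*q_next - q                      # TD error
    e .*= gamma*lambda                                # decay every trace
    e[phi .!= 0] .= 1.0                               # replacing traces: active features reset to 1
    w .+= alpha*delta .* e                            # move weights along the traces
    return delta
end

w = zeros(4); e = zeros(4)
d = sarsa_lambda_step!(w, e, [1.0, 0.0, 1.0, 0.0], -1.0, [0.0, 1.0, 0.0, 1.0], false)
```

In a full learner, the trace vector `e` would be reset to zeros at the start of each episode.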

We only ask that you know the length of your feature vectors a priori, which makes initialization easy.

In [None]:
#compute the feature-vector length by evaluating the feature function once (this could eventually move into a constructor)
nb_features = length(policy.feature_function(bbm.state,domain(A)[1]))
updater = SARSAParam(nb_features,lambda=0.9,init_method="unif_rand",trace_type="replacing")

## Set Up the Full Solver

Beyond the updater, the solver supports:
+ minibatching
+ experience replay
+ adaptive learning rates, e.g.:
    * momentum
    * Nesterov momentum
    * RMSprop
    * AdaGrad
    * AdaDelta
    * Adam
+ simulated annealing (support for this may be dropped)

The example below turns annealing, minibatching, and experience replay off by passing their `Null` variants.
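
To illustrate one of the adaptive learning rates listed above, here is a minimal RMSprop step (Julia 1.x syntax): each parameter's step is divided by a running root-mean-square of its recent gradients. The `RMSprop` struct and `step!` are hypothetical names, not the solver's updater types, and the defaults (`rho=0.9`, `eps=1e-8`) are common choices rather than values taken from this API.

```julia
# Hypothetical RMSprop state: step size, decay rate, stabilizer, and running mean of g^2.
struct RMSprop
    lr::Float64
    rho::Float64
    eps::Float64
    cache::Vector{Float64}
end
RMSprop(n::Int; lr=0.01, rho=0.9, eps=1e-8) = RMSprop(lr, rho, eps, zeros(n))

# One in-place RMSprop update of the weights w given the gradient g.
function step!(opt::RMSprop, w::Vector{Float64}, g::Vector{Float64})
    rho, lr, eps, c = opt.rho, opt.lr, opt.eps, opt.cache
    @. c = rho * c + (1 - rho) * g^2        # running mean of squared gradients
    @. w -= lr * g / (sqrt(c) + eps)        # per-parameter scaled gradient step
    return w
end

# Usage: minimize 0.5 * sum(w.^2), whose gradient is w itself.
opt = RMSprop(2; lr=0.01)
w = [1.0, -2.0]
for _ in 1:100
    step!(opt, w, copy(w))
end
```

On the first step both coordinates move by about `lr / sqrt(1 - rho)`, regardless of how large their gradients are; that scale invariance is the point of the normalization.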


In [None]:
solver = Solver(updater,
                lr=0.1,
                nb_episodes=50,
                nb_timesteps=6000,
                discount=0.99,
                annealer=NullAnnealer(),
                mb=NullMinibatcher(),
                er=NullExperienceReplayer(),
                display_interval=3)

In [None]:
trained_policy = solve(solver,bbm,policy)

## Evaluate Policy
Evaluation simply runs a batch of simulations with the trained policy; the Simulator API takes a subset of the options you see in Solver.
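
A plain Monte Carlo evaluation loop of this kind can be sketched as follows (Julia 1.x syntax). `evaluate` borrows the `Simulator` keyword names but is not the library's `simulate`; the start state and the constant per-step cost mirror `MtnCarModel` above, and the two hand-written policies (`pump`, `noisy`) are hypothetical stand-ins for `trained_policy`.

```julia
using Random, Statistics

# Standalone copy of the Mountain Car update used in this notebook.
function mc_step(x::Float64, v::Float64, a::Float64)
    v = clamp(v + 0.001*a - 0.0025*cos(3x), -0.07, 0.07)
    x += v
    if x < -1.2                 # inelastic left wall
        return -1.2, 0.0
    end
    return x, v
end

# Hypothetical Monte Carlo evaluation: mean discounted return over nb_sim rollouts.
function evaluate(policy, rng::AbstractRNG; nb_sim=100, nb_timesteps=2500,
                  discount=1.0, cost=-1.0)
    returns = zeros(nb_sim)
    for i in 1:nb_sim
        x, v, g = -0.5, 0.0, 1.0             # same start state as init() above
        for t in 1:nb_timesteps
            a = policy(rng, x, v)
            returns[i] += g * cost           # constant cost on every step
            g *= discount
            x, v = mc_step(x, v, a)
            x >= 0.5 && break                # terminal: the car reached the goal
        end
    end
    return mean(returns)
end

pump  = (rng, x, v) -> v >= 0 ? 1.0 : -1.0                                 # push with the motion
noisy = (rng, x, v) -> rand(rng) < 0.1 ? rand(rng, [-1.0, 0.0, 1.0]) : pump(rng, x, v)
R_avg = evaluate(noisy, MersenneTwister(3234))
```

With `discount=1` and a cost of -1 per step, each rollout's return is minus the number of steps it took, so the average return is bounded below by `-nb_timesteps`.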

In [None]:
sim = Simulator(discount=1.,nb_sim=100,nb_timesteps=2500) #undiscounted, 100 simulations of at most 2500 steps each

In [None]:
#currently returns the average reward over all simulations
R_avg = simulate(sim,bbm,trained_policy)

In [None]:
#sanity check: sum of the updater's eligibility traces
sum(updater.e)

In [None]:
#estimated Q-value of each action at the model's current state
[weights(updater)'*feature_function(bbm.state,a) for a in domain(A)]