# Reinforcement Learning, An Introduction - *Sutton & Barto '18*

[![GitHubBadge]][GitHubLink] [![ColabBadge]][ColabLink]

## Chapter 2 - Multi-armed Bandits

### Exercise 2.5 - Non-Stationary Problem


 
Design and conduct an experiment to demonstrate the difficulties that sample-average methods have for non-stationary problems. Use a modified version of the 10-armed testbed in which all the $q_*(a)$ start out equal and then take independent random walks (say by adding a normally distributed increment with mean zero and standard deviation 0.01 to all the $q_*(a)$ on each step). Prepare plots like Figure 2.2 for an action-value method using sample averages, incrementally computed, and another action-value method using a constant step-size parameter, $\alpha = 0.1$. Use $\epsilon = 0.1$ and longer runs, say of 10,000 steps.
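The experiment described above could be sketched in plain Swift as follows. This is a minimal, self-contained sketch and not the implementation from the `ReinforcementLearning` package: the names `SplitMix64`, `gaussian`, `StepSize` and `experiment` are hypothetical, the seed is arbitrary, and the 20 runs in the usage lines are chosen only to keep the example fast (Figure 2.2 in the book averages over 2000 runs). Rewards are drawn with unit variance around the current $q_*(a)$, as in the 10-armed testbed.

```swift
import Foundation

// Small seedable generator (SplitMix64) so that runs are reproducible.
// Hypothetical helper, not part of the ReinforcementLearning package.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Standard normal sample via the Box-Muller transform.
func gaussian(using rng: inout SplitMix64) -> Double {
    let u1 = 1 - Double.random(in: 0..<1, using: &rng)  // in (0, 1], keeps log finite
    let u2 = Double.random(in: 0..<1, using: &rng)
    return (-2 * log(u1)).squareRoot() * cos(2 * Double.pi * u2)
}

enum StepSize {
    case sampleAverage       // alpha = 1 / N(a)
    case constant(Double)    // fixed alpha, e.g. 0.1
}

// Returns the per-step average reward and the per-step fraction of runs
// in which the chosen action had the highest true value.
func experiment(stepSize: StepSize, runs: Int, steps: Int, epsilon: Double = 0.1,
                armCount: Int = 10, seed: UInt64 = 42)
    -> (averageReward: [Double], optimalFraction: [Double]) {
    var rng = SplitMix64(state: seed)
    var rewardSum = [Double](repeating: 0, count: steps)
    var optimalCount = [Double](repeating: 0, count: steps)

    for _ in 0..<runs {
        var qTrue = [Double](repeating: 0, count: armCount)      // all q*(a) start equal
        var qEstimate = [Double](repeating: 0, count: armCount)
        var pulls = [Int](repeating: 0, count: armCount)

        for step in 0..<steps {
            // Epsilon-greedy action selection.
            let action: Int
            if Double.random(in: 0..<1, using: &rng) < epsilon {
                action = Int.random(in: 0..<armCount, using: &rng)
            } else {
                action = qEstimate.indices.max(by: { qEstimate[$0] < qEstimate[$1] })!
            }

            // Reward with unit variance around the current true value.
            let reward = qTrue[action] + gaussian(using: &rng)
            rewardSum[step] += reward
            if qTrue[action] == qTrue.max()! { optimalCount[step] += 1 }

            // Incremental update: Q <- Q + alpha * (R - Q).
            pulls[action] += 1
            let alpha: Double
            switch stepSize {
            case .sampleAverage: alpha = 1 / Double(pulls[action])
            case .constant(let value): alpha = value
            }
            qEstimate[action] += alpha * (reward - qEstimate[action])

            // Independent random walk of every true action value.
            for arm in 0..<armCount { qTrue[arm] += 0.01 * gaussian(using: &rng) }
        }
    }
    return (rewardSum.map { $0 / Double(runs) }, optimalCount.map { $0 / Double(runs) })
}

// Usage: compare both methods over the last 1,000 of 10,000 steps.
func tailMean(_ values: [Double], count: Int) -> Double {
    values.suffix(count).reduce(0, +) / Double(count)
}
let sampleAverageResult = experiment(stepSize: .sampleAverage, runs: 20, steps: 10_000)
let constantStepResult = experiment(stepSize: .constant(0.1), runs: 20, steps: 10_000)
print("sample average, optimal action:", tailMean(sampleAverageResult.optimalFraction, count: 1_000))
print("constant alpha, optimal action:", tailMean(constantStepResult.optimalFraction, count: 1_000))
```

The only difference between the two methods is the step size: $1/N(a)$ weights all past rewards equally, while a constant $\alpha$ weights recent rewards more heavily, which is what lets the estimate follow a drifting $q_*(a)$.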



[GitHubBadge]: https://img.shields.io/badge/|-Edit_on_GitHub-green.svg?logo=github "Edit notebook's source code on GitHub"
[GitHubLink]: https://github.com/vojtamolda/reinforcement-learning-an-introduction/blob/swift/Chapter%202/Exercise%202.5.ipynb

[ColabBadge]: https://colab.research.google.com/assets/colab-badge.svg "Run notebook in Google Colab"
[ColabLink]: https://colab.research.google.com/github/vojtamolda/reinforcement-learning-an-introduction/blob/swift/Chapter%202/Exercise%202.5.ipynb


In [1]:
// Install Packages
//%install-swiftpm-flags -c release
%install-location /swift/packages

%install '.package(url: "https://github.com/vojtamolda/reinforcement-learning-an-introduction", .branch("swift"))' ReinforcementLearning
//%install '.package(url: "https://github.com/vojtamolda/Plotly.swift", from: "0.3.1")' Plotly

// Clear Output
//print("\u{001B}[2J")

Installing packages:
	.package(url: "https://github.com/vojtamolda/reinforcement-learning-an-introduction", .branch("swift"))
		Reinforcement Learning - An Introduction
With SwiftPM flags: []
Working in: /tmp/tmpoyx0_ssb/swift-install
'jupyterInstalledPackages' /swift/packages/package: error: product 'Reinforcement Learning - An Introduction' not found. It is required by target 'jupyterInstalledPackages'.


: ignored

In [None]:
import ReinforcementLearning

In [None]:
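// Create the multi-armed bandit and print it.
let z = MultiArmedBandit()
print(z)

// Create an epsilon-greedy gambler for the bandit and print it.
let a = EpsilonGreedyGambler(z)
print(a)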

## Differentiable PDE Solver

The next cell imports several implementations of a differentiable shallow water PDE solver from a cloned git repository. The full source code and a readme file can be found in [this GitHub gist](https://gist.github.com/bd85033cf62877e6f8ada68b8bbb32a0.git).


In [None]:
// MARK: - Visualization

extension MultiArmedBandit.State {
    
    var armRewardPlot: Figure {
        let numSamples = 1_000
        var armIndices = [String]()
        var armRewardSamples = [Double]()

        for arm in 0..<game.armCount {
            let indices = Array(repeating: String(arm), count: numSamples)
            let rewardSamples = (0..<numSamples).map { _ in
                applying(arm).utility(for: .player(0))
            }

            armIndices.append(contentsOf: indices)
            armRewardSamples.append(contentsOf: rewardSamples)
        }
        
        let armRewardDistributions = Violin(
            y: armRewardSamples,
            x: armIndices,
            points: .off,
            meanLine: .init(visible: true)
        )
        
        return Figure(data: [armRewardDistributions])
    }
}

extension ClosedRange: Plotable where Bound: Encodable {
    public func encode(toPlotly encoder: Encoder) throws {
        try self.encode(to: encoder)
    }
}

## Benchmarks

The following code simulates the behavior of a water surface in a rectangular bathtub. There's an initial "splash" at the beginning. The splash generates surface gravity waves that propagate away from the center and reflect off the domain walls. There are four versions, one for each implementation of the solver.

Implementations that use the `Tensor` type for numerical values also accept the `device` argument. This allows them to run with XLA acceleration.
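To make the splash-and-reflect behavior concrete, here is a minimal sketch of one explicit time step of the linear 2D wave equation on a square grid. It is an illustration under simplifying assumptions, not the solver from the gist: the function `waveStep`, the parameter `c2` (squared wave speed in grid units) and the 8 x 8 demo grid are all hypothetical. Neighbors that would fall outside the grid are replaced by the edge cell itself, which is one simple way to get waves that reflect off the walls.

```swift
// One explicit time step of the 2D wave equation:
//   next = 2 * current - previous + c2 * laplacian(current)
// Hypothetical sketch, not the gist's shallow water solver.
func waveStep(previous: [[Float]], current: [[Float]], c2: Float = 0.25) -> [[Float]] {
    let size = current.count
    var next = current
    for i in 0..<size {
        for j in 0..<size {
            // Clamp indices at the walls (zero-gradient boundary).
            let up = current[max(i - 1, 0)][j]
            let down = current[min(i + 1, size - 1)][j]
            let left = current[i][max(j - 1, 0)]
            let right = current[i][min(j + 1, size - 1)]
            let laplacian = up + down + left + right - 4 * current[i][j]
            next[i][j] = 2 * current[i][j] - previous[i][j] + c2 * laplacian
        }
    }
    return next
}

// Usage: a splash in the middle of a small grid that starts at rest.
let demoSize = 8
var demoLevel = [[Float]](repeating: [Float](repeating: 0, count: demoSize), count: demoSize)
demoLevel[demoSize / 2][demoSize / 2] = 100
let demoNext = waveStep(previous: demoLevel, current: demoLevel)
print(demoNext[demoSize / 2][demoSize / 2], demoNext[demoSize / 2 - 1][demoSize / 2])
```

The default `c2` of 0.25 stays below the usual stability limit of 0.5 for this explicit scheme in two dimensions; larger values make the simulation blow up.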

In [None]:
let n = 256
let duration = 512

#### A - `ArrayLoopSolution`

In [None]:
func splashArrayLoop() {
    var initialWaterLevel = [[Float]](repeating: [Float](repeating: 0.0, count: n), count: n)
    initialWaterLevel[n / 2][n / 2] = 100

    let initialSolution = ArrayLoopSolution(waterLevel: initialWaterLevel)
    _ = [ArrayLoopSolution](evolve: initialSolution, for: duration)
}

#### B - `TensorLoopSolution`

In [None]:
func splashTensorLoop(on device: Device) {
    var initialWaterLevel = Tensor<Float>(zeros: [n, n], on: device)
    initialWaterLevel[n / 2][n / 2] = Tensor<Float>(100, on: device)

    let initialSolution = TensorLoopSolution(waterLevel: initialWaterLevel)
    _ = [TensorLoopSolution](evolve: initialSolution, for: duration)
}

#### C - `TensorSliceSolution`

In [None]:
func splashTensorSlice(on device: Device) {
    var initialWaterLevel = Tensor<Float>(zeros: [n, n], on: device)
    initialWaterLevel[n / 2][n / 2] = Tensor<Float>(100, on: device)

    let initialSolution = TensorSliceSolution(waterLevel: initialWaterLevel)
    _ = [TensorSliceSolution](evolve: initialSolution, for: duration)
}

#### D - `TensorConvSolution`

In [None]:
func splashTensorConv(on device: Device) {
    var initialWaterLevel = Tensor<Float>(zeros: [n, n], on: device)
    initialWaterLevel[n / 2][n / 2] = Tensor<Float>(100, on: device)

    let initialSolution = TensorConvSolution(waterLevel: initialWaterLevel)
    _ = [TensorConvSolution](evolve: initialSolution, for: duration)
}

## Results

Not yet conclusive...

In [None]:
let splashBenchmarks = BenchmarkSuite(name: "Shallow Water PDE Solver",
                                      settings: Iterations(10), WarmupIterations(2)) { suite in
    suite.benchmark("Array Loop") { splashArrayLoop() }

    // This is at least 1000x slower. One can easily grow old while running the benchmark :(
    //suite.benchmark("Tensor Loop") { splashTensorLoop(on: Device.default) }
    //suite.benchmark("Tensor Loop (XLA)") { splashTensorLoop(on: Device.defaultXLA) }

    suite.benchmark("Tensor Slice") { splashTensorSlice(on: Device.default) }
    suite.benchmark("Tensor Slice (XLA)") { splashTensorSlice(on: Device.defaultXLA) }

    suite.benchmark("Tensor Conv") { splashTensorConv(on: Device.default) }
    suite.benchmark("Tensor Conv (XLA)") { splashTensorConv(on: Device.defaultXLA) }
}

In [None]:
Benchmark.main([splashBenchmarks])