# Continuous Deep Reinforcement Learning on Slot Car Racing

In this notebook we give a quick introduction to the Deep Deterministic Policy Gradient (DDPG) algorithm presented in the 2016 paper "Continuous Control with Deep Reinforcement Learning" by Lillicrap et al. We demonstrate the algorithm in a slot car racing environment (also known by the brand name Carrera).

The following consists of multiple parts:

  - The custom `Carrera` environment: a Python class that lets agents create and reset a slot car racing track and perform "steps" on it. It also provides helpers for visualizing the algorithm's performance.
  - An `Agent` class which interacts with a provided environment.
  - The DDPG algorithm implemented in TensorFlow.
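
As a rough illustration of how these parts fit together, the sketch below shows a generic interaction loop. It assumes an environment exposing `reset()` and `step(acceleration)` in the same shape as the `Carrera` class in this notebook; the names `Environment`, `Observation` and `run_episode` are hypothetical and only serve this example. The policy is a plain callable standing in for the agent.

```python
from typing import Callable, Protocol, Tuple

Observation = Tuple[float, float]  # (position, velocity)


class Environment(Protocol):
    """Hypothetical interface: anything with reset() and step() fits."""

    def reset(self) -> Observation: ...

    def step(self, acceleration: float) -> Tuple[Observation, float, bool]: ...


def run_episode(
    env: Environment,
    policy: Callable[[Observation], float],
    max_steps: int = 100,
) -> float:
    """Run one episode and return the sum of all rewards received."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # The policy maps the current observation to a continuous action.
        action = policy(observation)
        observation, reward, terminal = env.step(action)
        total_reward += reward
        if terminal:
            break
    return total_reward
```

A learning agent would additionally store each transition and update its networks inside this loop; that part is left out here.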

## Environment

In [None]:
import math
from typing import Tuple

import numpy as np
import matplotlib.pyplot as plt

class Carrera:
    """A simple carrera track, modeled by a maximum velocity function."""

    def __init__(self):
        """Create new carrera track environment."""
        self._track_len = 2 * math.pi
        self._position = 0
        self._velocity = 0
        self._terminal = False
        self._episode_reward = 0
    
    def _max_velocity(self, position: float):
        """Returns the maximum velocity for any position on the track."""
        return (math.sin(position) + 1) / 2
    
    def reset(self):
        """Reset environment."""
        self._position = 0
        self._velocity = 0
        self._terminal = False
        self._episode_reward = 0
        return (self._position, self._velocity)
        
    def step(self, acceleration: float) -> Tuple[Tuple[float, float], float, bool]:
        """Perform a step in the environment.

        Returns the new observation tuple (position, velocity), the reward,
        and a flag indicating whether the episode has terminated.
        """
        self._velocity = max(0.8 * self._velocity, min(1, acceleration))
        self._position = (self._position + (self._velocity/self._track_len)) % self._track_len
        max_velocity = self._max_velocity(self._position)
        if self._velocity > max_velocity:  # Car flew off the track
            self._terminal = True
            return (self._position, self._velocity), -1, self._terminal
        reward = (self._velocity - max_velocity) + 1
        self._episode_reward += reward
        return (self._position, self._velocity), reward, self._terminal
    
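    def render(self, fig=None) -> plt.Figure:
        """Plot the maximum velocity over the track and return the figure."""
        if fig is None:
            fig = plt.figure()
        x = np.linspace(0, self._track_len, 1000)
        y = np.vectorize(self._max_velocity)(x)
        # Draw on the given figure rather than on whichever figure is current.
        fig.gca().plot(x, y)
        fig.canvas.draw()
        return fig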
        
        
    @property
    def episode_reward(self) -> float:
        """Get the cumulative reward for the whole episode."""
        return self._episode_reward

In [None]:
env = Carrera()
env.render()
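
To get a feel for the dynamics, the following minimal sketch replays the update rule from `step` as standalone functions and rolls out a constant-acceleration policy. The helpers `max_velocity` and `simulate_constant_policy` are hypothetical names introduced for this example only. The velocity, position and reward formulas are copied from the `Carrera` class above, including the fact that the crash penalty is not added to the episode total.

```python
import math
from typing import Tuple

TRACK_LEN = 2 * math.pi


def max_velocity(position: float) -> float:
    """Maximum safe velocity at a track position (same formula as Carrera)."""
    return (math.sin(position) + 1) / 2


def simulate_constant_policy(
    acceleration: float, max_steps: int = 200
) -> Tuple[float, int, bool]:
    """Apply the same acceleration every step.

    Returns (accumulated reward, number of steps taken, crashed flag).
    """
    position, velocity = 0.0, 0.0
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        # Velocity decays by at most 20% per step; the action is capped at 1.
        velocity = max(0.8 * velocity, min(1.0, acceleration))
        position = (position + velocity / TRACK_LEN) % TRACK_LEN
        limit = max_velocity(position)
        if velocity > limit:
            # Too fast for this part of the track: the episode ends here.
            return total_reward, step, True
        total_reward += (velocity - limit) + 1
    return total_reward, max_steps, False


for a in (0.0, 0.4, 1.0):
    reward, steps, crashed = simulate_constant_policy(a)
    print(f"acceleration={a}: reward={reward:.2f}, steps={steps}, crashed={crashed}")
```

Under these dynamics a constant acceleration of 0.0 never leaves the track but also never moves, while 1.0 exceeds the limit on the very first step. A good policy therefore has to adapt its acceleration to the position on the track.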