# Deep dive into the reward function

__Definition of the Reward function:__<br>
A reward function describes immediate feedback (as a score for reward or penalty) when the vehicle takes an action to move from a given position on the track to a new position. Its purpose is to encourage the vehicle to make moves along the track to reach its destination quickly. The model training process will attempt to find a policy which maximizes the average total reward the vehicle experiences.

__The reward function must be written in Python. If you would like to learn Python first, we recommend Colaberry's dedicated course, [Python for Data Scientists](https://refactored.ai/course/python-for-data-scientists/).__

## 1. Reward function Parameters/Inputs

The AWS DeepRacer reward function takes a dictionary object as its input.<br>
The reward function has the following general form:

In [None]:
def reward_function(params):
    
    reward = ...

    return float(reward)

Now let us take a look at the parameters (inputs) the reward function can take.<br>
The params dictionary object contains the following key-value pairs:

In [None]:
{
    "all_wheels_on_track": Boolean,        # flag to indicate if the agent is on the track
    "x": float,                            # agent's x-coordinate in meters
    "y": float,                            # agent's y-coordinate in meters
    "closest_objects": [int, int],         # zero-based indices of the two objects closest to the agent's current position (x, y)
    "closest_waypoints": [int, int],       # indices of the two nearest waypoints
    "distance_from_center": float,         # distance in meters from the track center
    "is_crashed": Boolean,                 # flag to indicate whether the agent has crashed
    "is_left_of_center": Boolean,          # flag to indicate whether the agent is on the left side of the track center
    "is_offtrack": Boolean,                # flag to indicate whether the agent has gone off track
    "is_reversed": Boolean,                # flag to indicate whether the agent is driving clockwise (True) or counterclockwise (False)
    "heading": float,                      # agent's yaw in degrees
    "objects_distance": [float, ],         # list of the objects' distances in meters, between 0 and track_length, from the starting line
    "objects_heading": [float, ],          # list of the objects' headings in degrees, between -180 and 180
    "objects_left_of_center": [Boolean, ], # list of Boolean flags indicating whether each object is left of the center (True) or not (False)
    "objects_location": [(float, float),], # list of object locations [(x, y), ...]
    "objects_speed": [float, ],            # list of the objects' speeds in meters per second
    "progress": float,                     # percentage of track completed
    "speed": float,                        # agent's speed in meters per second (m/s)
    "steering_angle": float,               # agent's steering angle in degrees
    "steps": int,                          # number of steps completed
    "track_length": float,                 # track length in meters
    "track_width": float,                  # width of the track in meters
    "waypoints": [(float, float), ]        # list of (x, y) milestones along the track center
}
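
To try a reward function outside the DeepRacer console, you can call it with a hand-built dictionary. The sketch below is only an illustration: the values in `sample_params` are made up, only a handful of keys are filled in, and the simple `reward_function` shown here is a placeholder rather than a recommended design.

```python
def reward_function(params):
    """Placeholder reward: favor staying on track and near the center line."""
    if not params["all_wheels_on_track"]:
        return 1e-3

    # Fraction of the half-width still available before reaching the border
    half_width = 0.5 * params["track_width"]
    closeness = max(0.0, 1.0 - params["distance_from_center"] / half_width)
    return float(closeness)


# Hypothetical values, chosen only to exercise the function locally
sample_params = {
    "all_wheels_on_track": True,
    "distance_from_center": 0.15,
    "track_width": 0.6,
    "speed": 1.5,
    "progress": 12.0,
    "steps": 40,
}

reward = reward_function(sample_params)
print(reward)
```

Calling the function this way is a quick check for syntax errors and missing keys before you start a training job.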

__A more detailed technical reference of the input parameters is as follows__

## 1.1 Parameters for the Reward function

Input Parameter:

__all_wheels_on_track__

__Type__: Boolean

__Range__: (True:False)

A Boolean flag to indicate whether the agent is on-track or off-track. It's off-track (False) if any of its wheels are outside of the track borders. It's on-track (True) if all of the wheels are inside the two track borders. The following illustration shows that the agent is on-track.

<img src="images/deepracer-reward-function-input-all_wheels_on_track-true.png">

The following illustration shows that the agent is off-track.


<img src="images/deepracer-reward-function-input-all_wheels_on_track-false.png">

Example: A reward function using the all_wheels_on_track parameter

In [None]:
def reward_function(params):
    #############################################################################
    '''
    Example of using all_wheels_on_track and speed
    '''

    # Read input variables
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']

    # Set the speed threshold based on your action space
    SPEED_THRESHOLD = 1.0

    if not all_wheels_on_track:
        # Penalize if the car goes off track
        reward = 1e-3
    elif speed < SPEED_THRESHOLD:
        # Penalize if the car goes too slow
        reward = 0.5
    else:
        # High reward if the car stays on track and goes fast
        reward = 1.0

    return float(reward)

<hr style="border:1px solid gray"> </hr>

Input Parameter:
    
__waypoints__

__Type__: list of [float, float]

__Range__: [[$x_{w,0}$,$y_{w,0}$] … [$x_{w,Max-1}$, $y_{w,Max-1}$]]

An ordered list of track-dependent `Max` milestones along the track center. Each milestone is described by a coordinate of ($x_{w,i}$, $y_{w,i}$). For a looped track, the first and last waypoints are the same. For a straight or other non-looped track, the first and last waypoints are different.

<img src="images/deepracer-reward-function-input-waypoints.png">

Example: A reward function using the waypoints parameter is shown in the __closest_waypoints__ example below
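
Because the waypoints form an ordered list of center-line coordinates, simple track geometry can be derived from them. The following is a minimal sketch on a made-up square track; the helper names `is_looped` and `centerline_length` are hypothetical, and the summed segment length is only an approximation that is not guaranteed to match `track_length`.

```python
import math


def is_looped(waypoints):
    """A looped track repeats its first waypoint as its last one."""
    return tuple(waypoints[0]) == tuple(waypoints[-1])


def centerline_length(waypoints):
    """Sum the straight-line distances between consecutive waypoints."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total


# Hypothetical 2 m x 2 m square track, listed as a closed loop
square_track = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]

print(is_looped(square_track))
print(centerline_length(square_track))
```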

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__closest_waypoints__

__Type__: [int, int]

__Range__: [(0:Max-1),(1:Max-1)]

The zero-based indices of the two neighboring waypoints closest to the agent's current position of (x, y). The distance is measured as the Euclidean distance from the center of the agent. The first element refers to the closest waypoint behind the agent, and the second element refers to the closest waypoint in front of the agent. Max is the length of the waypoints list. In the illustration shown in waypoints, the closest_waypoints would be [16, 17].



<img src="images/deepracer-reward-function-input-waypoints.png">

__Example__: A reward function using the closest_waypoints parameter.

The following example reward function demonstrates how to use waypoints and closest_waypoints as well as heading to calculate immediate rewards.

In [2]:
def reward_function(params):
    ###############################################################################
    '''
    Example of using waypoints and heading to keep the car pointed in the right direction
    '''

    import math

    # Read input variables
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Initialize the reward with typical value 
    reward = 1.0

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]

    # Calculate the direction in radians with atan2(dy, dx); the result is in the range (-pi, pi)
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0]) 
    # Convert to degree
    track_direction = math.degrees(track_direction)

    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0
    if direction_diff > DIRECTION_THRESHOLD:
        reward *= 0.5

    return reward


<hr style="border:1px solid gray"> </hr>

Input Parameter:
    
__heading__

__Type__: float

__Range__: -180:+180

Heading direction, in degrees, of the agent with respect to the x-axis of the coordinate system.

<img src="images/deepracer-reward-function-input-heading.png">

Example: A reward function using the heading parameter can be seen in the __closest_waypoints__ example above
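
Since `heading` is an angle measured from the x-axis, it can be turned into a direction vector with basic trigonometry. The following is a minimal sketch; the helper name `heading_to_vector` is hypothetical and not part of the DeepRacer interface.

```python
import math


def heading_to_vector(heading_degrees):
    """Return the unit vector (dx, dy) that points along the given heading."""
    heading_radians = math.radians(heading_degrees)
    return math.cos(heading_radians), math.sin(heading_radians)


# A heading of 0 points along the positive x-axis,
# and a heading of 90 points along the positive y-axis.
print(heading_to_vector(0.0))
print(heading_to_vector(90.0))
```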

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__closest_objects__

__Type__: [int, int]

__Range__: [(0:len(objects_location)-1), (0:len(objects_location)-1)]

The zero-based indices of the two closest objects to the agent's current position of (x, y). The first index refers to the closest object behind the agent, and the second index refers to the closest object in front of the agent. If there is only one object, both indices are 0.

__Note__: This parameter is primarily used for the object avoidance race in AWS DeepRacer. For a time trial race, it can be ignored.
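
As an illustration of how the second index might be used, the sketch below looks up the closest object in front of the agent in `objects_location` and reduces the reward when that object is near. The `SAFE_DISTANCE` value and the 0.5 penalty factor are assumptions made for this example, not recommended settings.

```python
import math

# Hypothetical threshold in meters; tune it for your own track and speed
SAFE_DISTANCE = 0.5


def reward_function(params):
    """Sketch: lower the reward when the next object ahead is too close."""
    x, y = params["x"], params["y"]
    objects_location = params["objects_location"]
    _, next_index = params["closest_objects"]

    reward = 1.0

    # Only apply the check when there is at least one object on the track
    if objects_location:
        next_x, next_y = objects_location[next_index]
        distance_to_next = math.hypot(next_x - x, next_y - y)
        if distance_to_next < SAFE_DISTANCE:
            reward *= 0.5

    return float(reward)
```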

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__distance_from_center__

__Type__: float

__Range__: 0:~track_width/2

Displacement, in meters, between the agent center and the track center. The observable maximum displacement occurs when any of the agent's wheels are outside a track border and, depending on the width of the track border, can be slightly smaller or larger than half the track_width.

<img src="images/deepracer-reward-function-input-distance_from_center.png">

Example: A reward function using the distance_from_center parameter

In [None]:
def reward_function(params):
    #################################################################################
    '''
    Example of using distance from the center 
    ''' 

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Penalize if the car is too far away from the center
    marker_1 = 0.1 * track_width
    marker_2 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return reward

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__is_crashed__

__Type__: Boolean

__Range__: (True:False)

A Boolean flag to indicate whether the agent has crashed into another object (True) or not (False) as a termination status.

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__is_left_of_center__

__Type__: Boolean

__Range__: (True : False)

A Boolean flag to indicate whether the agent is on the left side of the track center (True) or on the right side (False).
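
Combined with `distance_from_center`, this flag gives a signed lateral offset. A minimal sketch follows; the helper name `signed_offset` and the sign convention (positive to the left, negative to the right) are assumptions made for illustration.

```python
def signed_offset(params):
    """Return the offset from the track center: positive if left, negative if right."""
    distance = params["distance_from_center"]
    return distance if params["is_left_of_center"] else -distance


# Hypothetical readings: 0.2 m to the left, then 0.2 m to the right
print(signed_offset({"distance_from_center": 0.2, "is_left_of_center": True}))
print(signed_offset({"distance_from_center": 0.2, "is_left_of_center": False}))
```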

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__is_offtrack__

__Type__: Boolean

__Range__: (True:False)

A Boolean flag to indicate whether the agent has gone off track (True) or not (False) as a termination status.
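
One possible use of the termination flags `is_offtrack` and `is_crashed` is to return a near-zero reward as soon as either is set. This is a sketch under that assumption; the values 1e-3 and 1.0 are illustrative only.

```python
def reward_function(params):
    """Sketch: give a minimal reward when the episode ends badly."""
    # Use .get so the function also works if a flag is absent in a local test
    if params.get("is_offtrack", False) or params.get("is_crashed", False):
        return 1e-3

    # Otherwise return a typical baseline reward
    return 1.0
```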

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__is_reversed__

__Type__: Boolean

__Range__: (True:False)

A Boolean flag to indicate whether the agent is driving clockwise (True) or counterclockwise (False).

It's used when you enable direction change for each episode. An episode is a period in which the vehicle starts from a given starting point and ends up completing the track or going off the track.

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__objects_distance__

__Type__: [float, … ]

__Range__: [(0:track_length), … ]

A list of the objects' distances from the starting line. The $i^{th}$ element measures the distance in meters between the $i^{th}$ object and the starting line along the track center line.

To index the distance between a single object and the agent, use:

`abs(params["objects_distance"][index] - (params["progress"]/100.0)*params["track_length"])`

<img src="images/objects-distance-diagram.png">

__Note__

* The expression above is `abs(var1 - var2)`, which tells you how close the car is to an object, where `var1 = params["objects_distance"][index]` and `var2 = (params["progress"]/100.0)*params["track_length"]`.
* To get the indices of the closest object in front of the vehicle and the closest object behind the vehicle, use the `closest_objects` parameter.
* This parameter is primarily used for the object avoidance race in AWS DeepRacer. For a time trial race, it can be ignored.
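
The distance expression for `objects_distance` can be wrapped in small helpers so that it is easier to reuse. The sketch below follows the same formula; the helper names `agent_distance_along_track` and `distance_to_object` are hypothetical, and the sample values are made up.

```python
def agent_distance_along_track(params):
    """Convert progress (a percentage) into meters traveled along the track."""
    return (params["progress"] / 100.0) * params["track_length"]


def distance_to_object(params, index):
    """Distance in meters, along the center line, between the agent and one object."""
    return abs(params["objects_distance"][index] - agent_distance_along_track(params))


# Hypothetical values: a 20 m track, 25% completed, with two objects
sample_params = {
    "progress": 25.0,
    "track_length": 20.0,
    "objects_distance": [8.0, 3.0],
    "closest_objects": [1, 0],  # [behind, in front]
}

_, next_index = sample_params["closest_objects"]
print(distance_to_object(sample_params, next_index))
```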

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__objects_heading__

__Type__: [float, … ]

__Range__: [(-180:180), … ]

List of the headings of objects in degrees. The $i^{th}$ element measures the heading of the $i^{th}$ object. For stationary objects, the heading is 0. For a bot vehicle, the corresponding element's value is the vehicle's heading angle.

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__objects_left_of_center__

__Type__: [Boolean, … ]

__Range__: [True|False, … ]

List of Boolean flags. The $i^{th}$ element value indicates whether the $i^{th}$ object is to the left (True) or right (False) side of the track center.
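
These flags can be compared with the agent's own `is_left_of_center` value to tell whether the next object is on the same side of the track. The following sketch assumes you want to penalize sharing a side with the object ahead; the 0.5 factor is an illustrative choice.

```python
def reward_function(params):
    """Sketch: lower the reward when the next object is on the agent's side."""
    objects_left_of_center = params["objects_left_of_center"]
    is_left_of_center = params["is_left_of_center"]
    _, next_index = params["closest_objects"]

    reward = 1.0

    # Skip the check when there are no objects on the track
    if objects_left_of_center:
        same_side = objects_left_of_center[next_index] == is_left_of_center
        if same_side:
            reward *= 0.5

    return float(reward)
```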

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__objects_location__

__Type__: [(x,y), ...]

__Range__: [(0:N,0:N), ...]

List of all object locations. Each location is a tuple of (x, y).

The size of the list equals the number of objects on the track. Note that the objects can be stationary obstacles or moving bot vehicles.

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__objects_speed__

__Type__: [float, … ]

__Range__: [(0:12.0), … ]

List of speeds (meters per second) for the objects on the track. For stationary objects, their speeds are 0. For a bot vehicle, the value is the speed you set in training.

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__progress__

__Type__: float

__Range__: 0:100

Percentage of track completed.

Example: A reward function using the progress parameter is shared below in the _steps_ example

<hr style="border:1px solid gray"> </hr>

Input Parameter:
    
__speed__

__Type__: float

__Range__: 0.0:5.0

The observed speed of the agent, in meters per second (m/s)

<img src="images/deepracer-reward-function-input-speed.png">

Example: A reward function using the speed parameter is shown in the __all_wheels_on_track__ example above

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__steering_angle__

__Type__: float

__Range__: -30:30

Steering angle, in degrees, of the front wheels from the center line of the agent. The negative sign (-) means steering to the right and the positive (+) sign means steering to the left. The agent center line is not necessarily parallel with the track center line as is shown in the following illustration.

<img src="images/deepracer-reward-function-steering.png">

Example: A reward function using the steering_angle parameter

In [None]:
def reward_function(params):
    '''
    Example of using steering angle
    '''

    # Read input variable
    steering = abs(params['steering_angle']) # We don't care whether it is left or right steering

    # Initialize the reward with typical value 
    reward = 1.0

    # Penalize if the car steers too much, to prevent zigzagging
    ABS_STEERING_THRESHOLD = 20.0
    if steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return reward

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__steps__

__Type__: int

__Range__: 0:Nstep

Number of steps completed. A step corresponds to an action taken by the agent following the current policy.

Example: A reward function using the steps parameter

In [None]:
def reward_function(params):
    #############################################################################
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']
    
    # Total number of steps in which we want the car to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value 
    reward = 1.0

    # Give an additional reward if, at every 100th step, the car has progressed faster than expected
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100 :
        reward += 10.0

    return reward


<hr style="border:1px solid gray"> </hr>

Input Parameter:
    
__track_length__

__Type__: float

__Range__: [0:Lmax]

The track length in meters. Lmax is track-dependent.

<hr style="border:1px solid gray"> </hr>

Input Parameter:

__track_width__

__Type__: float

__Range__: 0:Dtrack

Track width in meters.

<img src="images/deepracer-reward-function-input-track_width.png">

Example: A reward function using the track_width parameter

In [None]:
def reward_function(params):
    #############################################################################
    '''
    Example of using track width
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate the distance from each border
    distance_from_border = 0.5 * track_width - distance_from_center

    # Reward higher if the car stays inside the track borders
    if distance_from_border >= 0.05:
        reward = 1.0
    else:
        reward = 1e-3 # Low reward if too close to the border or goes off the track

    return reward

<hr style="border:1px solid gray"> </hr>

Input Parameters:

__x, y__

__Type__: float

__Range__: 0:N

Location, in meters, of the agent center along the x and y axes, of the simulated environment containing the track. The origin is at the lower-left corner of the simulated environment.

<img src="images/deepracer-reward-function-input-x-y.png">
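
The agent's `x` and `y` coordinates can be combined with `waypoints` for simple distance calculations. The sketch below finds the distance to the nearest waypoint; the helper name `nearest_waypoint_distance` is hypothetical, and the sample coordinates are made up.

```python
import math


def nearest_waypoint_distance(params):
    """Return the Euclidean distance from the agent to its nearest waypoint."""
    x, y = params["x"], params["y"]
    return min(math.hypot(wx - x, wy - y) for wx, wy in params["waypoints"])


# Hypothetical straight track along the x-axis, with the agent 0.4 m off the line
sample_params = {
    "x": 1.0,
    "y": 0.4,
    "waypoints": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}

print(nearest_waypoint_distance(sample_params))
```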

## A quick recap of AWS DeepRacer and how it works (basics, parameters, and terminology) is available as interactive visualizations at this [link](https://d2k9g1efyej86q.cloudfront.net/)