## Combining two reward functions

In this session, we discuss how to combine two reward functions.

When training a DeepRacer model, we often find two useful reward functions and want them to work together as one function.

Say that we want to combine the following two reward functions:

1. follow the center line

2. reward for faster progress

The original code for these two functions is listed below, with the function names changed to `reward_function_1` and `reward_function_2`.

In [4]:
# reward function 1:

def reward_function_1(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return reward

In [5]:
# reward function 2:
    
def reward_function_2(params):
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']

    # Total number of steps in which we want the car to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Every 100 steps, give an additional reward if the car is ahead of the expected progress
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)
               

Now we can test the two reward functions separately:

In [9]:
testing_params = dict()
testing_params['track_width'] = 1
testing_params['distance_from_center'] = 0.1
testing_params['steps'] = 100
testing_params['progress'] = 40

In [10]:
reward_function_1(testing_params)

1

In [11]:
reward_function_2(testing_params)

11.0
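
The value 11.0 follows from the threshold inside `reward_function_2`: with `TOTAL_NUM_STEPS = 300`, the expected progress at step 100 is about 33.3, and the test progress of 40 exceeds it, so the bonus of 10.0 is added to the base reward of 1.0. The arithmetic can be sketched as follows; `expected_progress` is a hypothetical helper name introduced only for this illustration and is not part of the original functions.

```python
TOTAL_NUM_STEPS = 300  # same target as in reward_function_2


def expected_progress(steps, total_steps=TOTAL_NUM_STEPS):
    """Progress (in percent) the car is expected to reach after `steps` steps.

    Hypothetical helper: it mirrors the right-hand side of the comparison
    in reward_function_2.
    """
    return (steps / total_steps) * 100


# The bonus is checked every 100 steps and applies only when the actual
# progress is strictly greater than the expected progress.
for steps in (100, 200, 300):
    print(steps, round(expected_progress(steps), 2))
```

Assuming `progress` is a percentage that never exceeds 100, the check at step 300 can never succeed, so with this target the bonus is only reachable at steps 100 and 200.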

To combine the two reward values, we define the actual `reward_function`, which calls the two reward functions above:

In [12]:
def reward_function(params):
    reward_1 = reward_function_1(params)
    reward_2 = reward_function_2(params)
    
    reward = reward_1 + reward_2
    
    return float(reward)

In [13]:
reward_function(testing_params)

12.0
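
A plain sum lets the larger term dominate: for these test parameters `reward_function_2` contributes 11.0 while `reward_function_1` contributes at most 1. One possible variation is a weighted sum, sketched below. The helper `combine_rewards`, the weight values, and the two stand-in reward functions are all assumptions made for illustration; the stand-ins simply return the values computed above so that the block runs on its own.

```python
def combine_rewards(params, weighted_functions):
    """Return the weighted sum of several reward functions.

    `weighted_functions` is a list of (function, weight) pairs.
    Hypothetical helper, shown only to illustrate the idea.
    """
    total = 0.0
    for function, weight in weighted_functions:
        total += weight * function(params)
    return float(total)


# Stand-ins returning the values obtained above for testing_params.
def center_line_reward(params):
    return 1


def progress_reward(params):
    return 11.0


# Assumed example weights: scale the progress term down so that both
# terms have a comparable influence on the total.
reward = combine_rewards({}, [(center_line_reward, 1.0), (progress_reward, 0.1)])
print(reward)
```

Suitable weights depend on the track and on the behavior you want to encourage, so treat the values here as a starting point for experiments rather than a recommendation.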

The complete code, put together, is as follows:

In [14]:
# reward function 1:

def reward_function_1(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return reward


# reward function 2:
    
def reward_function_2(params):
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']

    # Total number of steps in which we want the car to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Every 100 steps, give an additional reward if the car is ahead of the expected progress
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)
          
def reward_function(params):
    reward_1 = reward_function_1(params)
    reward_2 = reward_function_2(params)
    
    reward = reward_1 + reward_2
    
    return float(reward)