## Combining two reward functions

In this section, we discuss how to combine two reward functions.

When training a DeepRacer model, we often find two useful reward functions and want them to work together as one function.

Say that we want to combine the following two reward functions:

1. follow the center line

2. reward for faster progress

The original code of these two functions is listed below, with the functions renamed to `reward_function_1` and `reward_function_2`.

We can then create a new `reward_function` that calls these two functions and combines their return values.

In [1]:
# reward function 1:

def reward_function_1(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    return reward


# reward function 2:
    
def reward_function_2(params):
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']

    # Total number of steps in which we want the car to finish the lap; this varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Give an additional reward if, at every 100th step, the car has progressed further than expected
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)
          
def reward_function(params):
    reward_1 = reward_function_1(params)
    reward_2 = reward_function_2(params)
    
    reward = reward_1 + reward_2
    
    return reward

Now we can test the two reward functions separately:

In [2]:
testing_params = dict()
testing_params['track_width'] = 1
testing_params['distance_from_center'] = 0.1
testing_params['steps'] = 100
testing_params['progress'] = 40

In [3]:
reward_function_1(testing_params)

1

In [4]:
reward_function_2(testing_params)

11.0

Calling `reward_function` gives us the sum of the two reward values above.

In [13]:
reward_function(testing_params)

12.0

Let's review `reward_function` in detail. It is listed again below:

In [14]:
def reward_function(params):
    reward_1 = reward_function_1(params)
    reward_2 = reward_function_2(params)
    
    reward = reward_1 + reward_2
    
    return reward

In `reward_function`, we call `reward_function_1` and `reward_function_2`, then add the two reward values together. When adding two rewards, we should try to keep them on the same scale, for example by keeping both between 0 and 1, so that both rewards have a similar impact on the overall result.

In our example, `reward_function_2` returns 11.0 (the base reward of 1.0 plus a bonus of 10.0) while `reward_function_1` returns 1. The combination still makes sense, because the bonus is granted only at every 100th step, and only when the car is ahead of the expected progress. On all other steps, `reward_function_2` returns 1.0.
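If you do want both terms on the same scale before adding them, one option is to rescale each reward to the range 0 to 1 and then take a weighted sum. The following is only a sketch: `scale_to_unit`, `combine_scaled`, and the two weights are hypothetical names introduced here, and the ranges are read off the two example functions above rather than taken from any DeepRacer API.

```python
def scale_to_unit(value, low, high):
    """Linearly map value from [low, high] to [0, 1], clamping at both ends."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (value - low) / (high - low)
    return max(0.0, min(1.0, scaled))


def combine_scaled(reward_1, reward_2, weight_1=0.5, weight_2=0.5):
    """Weighted sum of the two rewards after rescaling each to [0, 1].

    Assumed ranges (from the example functions in this section):
    reward_function_1 returns 1e-3 .. 1, reward_function_2 returns 1.0 .. 11.0.
    """
    scaled_1 = scale_to_unit(reward_1, 1e-3, 1.0)
    scaled_2 = scale_to_unit(reward_2, 1.0, 11.0)
    return weight_1 * scaled_1 + weight_2 * scaled_2


# Centered car that also earned the progress bonus
print(combine_scaled(1, 11.0))
# Centered car on an ordinary step (no bonus)
print(combine_scaled(1, 1.0))
```

With equal weights, neither term can dominate the other. Changing `weight_1` and `weight_2` is then an explicit way to say which behavior matters more.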

Adding two rewards is one way to combine them. In some cases, multiplying them is a better option.

Generally speaking, if the two reward functions can work independently, we should add them together. On the other hand, if one reward depends on the other, we can consider multiplying them.

Say that we have a reward function that returns a higher reward for higher speed. This reward function doesn't work independently, because running faster is not always a good thing. First of all, we want the car to stay on the track.
Only then, provided that it stays on the track, do we want it to run faster.

The following new `reward_function` multiplies two reward values:

In [6]:
def reward_speed(params):
    reward = params['speed']
    return reward

In [7]:
def reward_function(params):
    reward_1 = reward_function_1(params)
    speed_reward = reward_speed(params)
    
    reward = reward_1 * speed_reward
    
    return reward

In [11]:
testing_params['speed'] = 1.8
reward_function(testing_params)

1.8
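To see why multiplication suits a dependent reward, it can help to compare the sum and the product at a fixed speed while the car drifts away from the center line. The sketch below is self-contained: `center_reward` repeats the tiers of `reward_function_1`, while `added` and `multiplied` are hypothetical helper names used only for this comparison.

```python
def center_reward(track_width, distance_from_center):
    # Same tiers as reward_function_1 above
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3  # likely crashed / close to off track


def added(params):
    center = center_reward(params['track_width'], params['distance_from_center'])
    return center + params['speed']


def multiplied(params):
    center = center_reward(params['track_width'], params['distance_from_center'])
    return center * params['speed']


# Same speed, increasing distance from the center line
for distance in (0.05, 0.2, 0.4, 0.6):
    params = {'track_width': 1.0, 'distance_from_center': distance, 'speed': 1.8}
    print(distance, round(added(params), 4), round(multiplied(params), 4))
```

With addition, a fast car that is nearly off the track still collects almost the full speed reward. With multiplication, the speed reward shrinks together with the center-line reward, so speed only pays off while the car stays near the center.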

In the code above, the total reward depends on the speed reward and the center-line reward at the same time. One could argue that the center-line reward should not depend on other rewards, because staying near the center line is always right, no matter whether the car is running faster or slower.

To address this, we can use the adding approach and the multiplying approach at the same time.


In [12]:
def reward_function(params):
    reward_1 = reward_function_1(params)
    speed_reward = reward_speed(params)
    
    reward = reward_1 + reward_1 * speed_reward
    
    return reward
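A quick way to check what the mixed form changes is to evaluate it next to the pure product for a few hand-picked inputs. In this sketch the center-line reward is passed in directly as a number, and `multiplied` and `mixed` are hypothetical names for the two formulas.

```python
def multiplied(center_reward, speed):
    # Pure product: the center-line reward is scaled by the speed
    return center_reward * speed


def mixed(center_reward, speed):
    # center_reward + center_reward * speed, as in the cell above
    return center_reward + center_reward * speed


# (center_reward, speed) pairs: slow and centered, fast and centered, fast near the edge
for center_reward, speed in [(1.0, 0.5), (1.0, 1.8), (0.1, 1.8)]:
    print(center_reward, speed,
          round(multiplied(center_reward, speed), 4),
          round(mixed(center_reward, speed), 4))
```

In the mixed form, a slow but centered car still earns its full center-line reward, while the speed bonus remains tied to staying near the center.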

These suggestions about adding or multiplying rewards are not iron rules, however. You can make your own decision based on your understanding of the problem.