## Self motivation, increasing progress per step

This reward function calculates the reward by comparing the car's current progress with its expected progress.

It gives more reward when current progress is larger than expected progress.

The expected progress is derived from a target lap time.

On the re:Invent 2018 track, a world-class model can finish one lap in 7 seconds. The simulation runs 15 steps per second, so a car that performs as well as the world-class model finishes the lap in 7 * 15 = 105 steps. Progress is reported as a percentage from 0 to 100, so the expected progress per step is 100/105. Multiplying the expected progress per step by the `steps` parameter from the input `params` gives the expected progress (`expected_progress_per_step * steps`), which we then compare with the current progress.
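The arithmetic above can be sketched as a small standalone function. This is only an illustration: the names `STEPS_PER_SECOND`, `TARGET_LAP_SECONDS`, and `expected_progress_at` are hypothetical and do not come from the DeepRacer API; the 7-second lap and 15 steps per second are the figures quoted in the text.

```python
STEPS_PER_SECOND = 15    # step rate quoted in the text
TARGET_LAP_SECONDS = 7   # world-class lap time on re:Invent 2018, from the text


def expected_progress_at(steps, lap_seconds=TARGET_LAP_SECONDS):
    """Progress (in percent) a car on the target pace would have after `steps`."""
    expected_steps = lap_seconds * STEPS_PER_SECOND  # 7 * 15 = 105
    expected_progress_per_step = 100 / expected_steps
    return expected_progress_per_step * steps


# After one second (15 steps) the target pace is one seventh of the lap;
# after 105 steps it is the full lap.
for steps in (15, 60, 105):
    print(steps, round(expected_progress_at(steps), 2))
```

Changing `lap_seconds` shifts the whole pace line, which is the only tuning knob this reward has.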


In [7]:
def reward_function(params):

    # Read all input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    x = params['x']
    y = params['y']
    distance_from_center = params['distance_from_center']
    is_left_of_center = params['is_left_of_center']
    heading = params['heading']
    progress = params['progress']
    steps = params['steps']
    speed = params['speed']
    steering_angle = params['steering_angle']
    track_width = params['track_width']
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    is_offtrack = params['is_offtrack']
    
    
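    reward = 0.0

    # re:Invent 2018: one lap in 7 seconds at 15 steps per second = 105 steps
    expected_steps = 105

    # Progress is a percentage, so a car on the target pace gains 100/105 per step
    expected_progress_per_step = 100 / expected_steps

    # Progress a car on the target pace would have reached by this step
    expected_progress = expected_progress_per_step * steps

    # A ratio above 1 means the car is ahead of pace; guard against steps == 0
    progress_reward = progress / expected_progress if expected_progress > 0 else 0.0

    reward += progress_reward

    return float(reward)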

Testing it with sample params

In [8]:
from race_utils import SampleGenerator
generator = SampleGenerator()
params = generator.random_sample()

In [10]:
reward_function(params)

0.6154348133910134
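A value below 1, like the one above, means the sampled car is behind the target pace. Because the random sample is hard to reason about, a deterministic check can help. The sketch below uses a hypothetical helper, `progress_ratio`, that restates only the ratio from `reward_function` so it can be called without building a full `params` dictionary; it is not part of `race_utils` or the DeepRacer API.

```python
def progress_ratio(progress, steps, expected_steps=105):
    """Hypothetical helper: the same ratio reward_function computes."""
    expected_progress = 100 / expected_steps * steps
    return progress / expected_progress


# At step 21 a car on the target pace has covered 20% of the lap.
on_pace = progress_ratio(progress=20.0, steps=21)
ahead = progress_ratio(progress=30.0, steps=21)
behind = progress_ratio(progress=10.0, steps=21)

print(round(on_pace, 3), round(ahead, 3), round(behind, 3))
```

The ratio is 1 when the car is exactly on pace, grows when it is ahead, and shrinks when it falls behind.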

## Configs and result


The following chart shows the training metrics of the `progress per step` model with a higher speed range. The speed is set from 1.4 to 4 m/s.



<img src="./images/08_result.png" width = "300"  alt="result"  />



And the evaluation results are listed below.

All three evaluations reach 100% progress, each taking 11 seconds.


<img src="./images/08_evaluation.png" width = "300"  alt="evaluation"  />
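To relate the 11-second laps to the reward, the final ratio for such a lap can be worked out by hand. This is a rough sketch that assumes the lap takes exactly 11 seconds at exactly 15 steps per second; the reported evaluation times may be rounded, so treat the result as approximate.

```python
STEPS_PER_SECOND = 15

expected_steps = 7 * STEPS_PER_SECOND   # target pace: 105 steps
actual_steps = 11 * STEPS_PER_SECOND    # assumed: an 11-second lap, 165 steps

# At the finish line progress is 100, while the target pace "expects" more
# than 100 because the car needed more steps than the target allows.
expected_progress = 100 / expected_steps * actual_steps
final_ratio = 100 / expected_progress

print(actual_steps, round(final_ratio, 3))
```

Under these assumptions the final ratio equals 105/165, so the trained model still has room to move toward a ratio of 1.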


The action space and other hyperparameters are listed below:

<img src="./images/08_action_space.png" width = "300"  alt="action space"  />

<img src="./images/08_hyper_parameters.png" width = "300"  alt="hyper parameters"  />

