## Default sample: Prevent Zig-Zag in Time Trials

This is the third default sample reward function in DeepRacer.

The car is penalized if it steers too much.

At the same time, it is rewarded for staying close to the center line:


In [7]:
def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behavior
    '''
    
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed or close to off track

    # Steering penalty threshold; change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)

To test the reward function, you need to set up a dictionary of test parameters.

Besides `track_width` and `distance_from_center`, the function also reads `steering_angle`, so all three are set here:

In [8]:
testing_params = dict()
testing_params['track_width'] = 1
testing_params['distance_from_center'] = 0.1
testing_params['steering_angle'] = -18


Then, let's try the reward function:

In [9]:
reward_function(testing_params)

0.8

You can modify `testing_params` to test different cases:

In [10]:
testing_params['steering_angle'] = -12
reward_function(testing_params)

1.0
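
To see how the distance bands and the steering penalty combine, several cases can be checked in one pass. The following is a minimal sketch: it repeats the reward function in condensed form so that the cell runs on its own, and `cases` is a hypothetical list of inputs chosen to hit each branch, assuming a track width of 1.

```python
def reward_function(params):
    """Condensed copy of the sample reward function, repeated so this cell is self-contained."""
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle'])

    # Distance bands: closer to the center line earns a higher reward
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3

    # Steering penalty: scale the reward down when steering past 15 degrees
    if abs_steering > 15:
        reward *= 0.8
    return float(reward)


# Hypothetical test inputs: (distance_from_center, steering_angle)
cases = [(0.05, 0), (0.05, -20), (0.2, 10), (0.2, 25), (0.4, 0), (0.6, 30)]

results = []
for distance, angle in cases:
    params = {'track_width': 1, 'distance_from_center': distance, 'steering_angle': angle}
    results.append(reward_function(params))
    print(distance, angle, results[-1])
```

For example, the case at distance 0.2 with a steering angle of 25 falls in the second band (0.5) and exceeds the threshold, so its reward is 0.5 * 0.8 = 0.4.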

Now you can try modifying `reward_function` with your own settings.

The following cell is a copy of the default reward function from the first cell, so you can edit the code without worrying about breaking anything. If something goes wrong, delete the code you wrote and copy the code from the first cell again:

In [11]:
def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behavior
    '''
    
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed or close to off track

    # Steering penalty threshold; change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)

After you modify the reward function, you can test it with `testing_params` again:

In [12]:
testing_params['steering_angle'] = -12
reward_function(testing_params)

1.0
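
One possible customization, shown only as a sketch, is to make the penalty grow with the steering angle instead of applying a single 0.8 factor. The helper name `steering_factor`, the `max_angle` of 30 degrees, the `min_factor` of 0.5, and the linear ramp are all assumptions for illustration; they are not part of the DeepRacer sample.

```python
def steering_factor(steering_angle, threshold=15.0, max_angle=30.0, min_factor=0.5):
    """Hypothetical helper: 1.0 up to the threshold, then a linear drop to min_factor at max_angle."""
    abs_steering = abs(steering_angle)
    if abs_steering <= threshold:
        return 1.0
    # Fraction of the way from threshold to max_angle, capped at 1.0
    excess = min((abs_steering - threshold) / (max_angle - threshold), 1.0)
    return 1.0 - excess * (1.0 - min_factor)


print(steering_factor(-12))    # below the threshold, no penalty
print(steering_factor(-22.5))  # halfway between threshold and max_angle
print(steering_factor(30))     # at max_angle, the full penalty applies
```

Inside `reward_function`, a line such as `reward *= steering_factor(params['steering_angle'])` could then take the place of the fixed `reward *= 0.8` step.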

The following cells cover a few pieces of Python syntax used in the reward function.

They are listed here for your reference.

## Python Tips: *= operator

Python, like many programming languages, has augmented assignment operators such as `+=`, `-=`, `*=`, and `/=`.

Each one applies an operation to a variable and stores the result back in that same variable.

The following code takes the value of `a`, increases it by 1, and stores the result back in `a`:

    a += 1

In [14]:
a = 3
a += 1
a

4

`b -= 3` takes the value of `b`, decreases it by 3, and stores the result back in `b`:

In [15]:
b = 10
b -= 3
b

7

`*=` and `/=` behave similarly:

In [1]:
c = 7
c *= 3
c

21

In [2]:
d = 35
d /= 7
d

5.0
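
Note that `d /= 7` produced `5.0` rather than `5`: in Python 3 the `/` operator always performs true division and returns a float, even when both operands are integers. The short sketch below contrasts it with `//=` (floor division) and shows the `*=` step used in the reward function; the variable `e` is only illustrative.

```python
d = 35
d /= 7      # true division: the result is a float

e = 35
e //= 7     # floor division: the result stays an int

reward = 1.0
reward *= 0.8   # same as: reward = reward * 0.8

print(d, type(d).__name__)
print(e, type(e).__name__)
print(reward)
```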

## Python Tips: abs() function

The `abs()` function returns the absolute value of a number:

In [3]:
abs(-8)

8

In [4]:
abs(8)

8

In [20]:
abs(-1.333)

1.333
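
This is why the reward function can compare the steering angle against a single threshold: left and right turns are reported with opposite signs, and `abs()` maps a turn of the same size in either direction to the same value. A small sketch, using example angles chosen for illustration and the threshold of 15 from the sample:

```python
ABS_STEERING_THRESHOLD = 15

# Example angles: the same two turn sizes in both directions
angles = [-18, 18, -12, 12]

# True where the turn is sharper than the threshold, regardless of direction
flags = [abs(angle) > ABS_STEERING_THRESHOLD for angle in angles]

for angle, flag in zip(angles, flags):
    print(angle, abs(angle), flag)
```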

## Configs and result

To train a good model, we can tune the hyperparameters, including the action space and other training parameters such as learning rate and batch size.

If you are training your first model, the default settings are recommended.

The following chart shows the training metrics of the `follow the center line` model with default settings.

The green line is the average reward, the blue line is the finish rate during training, and the red line is the finish rate during evaluation:

<img src="./images/03_result.png" width = "300"  alt="result"  />



The evaluation results are listed below.

<img src="./images/03_evaluation.png" width = "300"  alt="evaluation"  />


The action space and other hyperparameters are listed below:

<img src="./images/03_action_space.png" width = "300"  alt="action space"  />

<img src="./images/03_hyper_parameters.png" width = "300"  alt="hyper parameters"  />

