Collect environment and action metrics during reinforcement learning #182

ricklindstrom · 2019-11-10T18:48:03Z

Creates a metrics CSV file so that progress during train or enjoy runs can be easily explored, graphed, and debugged.

For example, here is a scatter plot of x and y showing that the default training settings fail to train the bot and instead induce either wiggling or spinning in place:

The above was generated with:

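import pandas as pd
import seaborn as sns
data = pd.read_csv("metrics-training.csv")  # metrics file written during training
sns.scatterplot(x="x", y="y", hue="step", data=data)  # color points by step number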

A sample excerpt from the metrics csv file for training:

datetime,step,x,y,angle,speed,steering,center_dist,center_angle,reward,total_reward
2019-11-10 13:24:51.595017,1,2.643872463995367,2.9012396542240886,4.8690985603846855,0.3757399320602417,0.46001503,-0.13055107592213924,0.035870341709401815,3.19559684058306,3.19559684058306
2019-11-10 13:24:51.624950,2,2.6461244240452295,2.9163726845417557,4.8511330472675125,0.405402946472168,0.3595909,-0.12986506835069456,0.0538358548265737,3.1596744597449034,6.3552713003279635
2019-11-10 13:24:51.643253,3,2.646838734203294,2.9244335382732203,4.750412590849644,0.33081514835357667,0.073977984,-0.12902339261159254,0.15455631124444458,2.9496454310411275,9.30491673136909
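
In the three rows above, total_reward looks like the running sum of reward. Here is a minimal sketch that checks this using only the standard library. It assumes the excerpt is representative; the rows are copied from above and trimmed to the columns used, and the helper name running_total_matches is made up for illustration and is not part of the PR.

```python
import csv
import io
import math

# Three rows copied from the excerpt above, trimmed to the columns used here.
EXCERPT = """step,reward,total_reward
1,3.19559684058306,3.19559684058306
2,3.1596744597449034,6.3552713003279635
3,2.9496454310411275,9.30491673136909
"""


def running_total_matches(rows, tol=1e-9):
    """Return True if total_reward equals the cumulative sum of reward."""
    total = 0.0
    for row in rows:
        total += float(row["reward"])
        if not math.isclose(total, float(row["total_reward"]), abs_tol=tol):
            return False
    return True


rows = list(csv.DictReader(io.StringIO(EXCERPT)))
print(running_total_matches(rows))
```

The same check could be run against a full metrics file by passing an open file to csv.DictReader instead of the in-memory string.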

Wrappers that modify observations or rewards should sit below the MetricsWrapper, and wrappers that modify actions should sit above it. The ActionWrapper was therefore moved above the MetricsWrapper so that the MetricsWrapper sees the modified actions.
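
To illustrate the ordering rule, here is a rough sketch with stand-in classes. It does not use gym or the PR's actual wrappers: BaseEnv, Wrapper, ScaleReward, HalveSpeed and MetricsRecorder are all hypothetical names. It also assumes that "below" means closer to the base environment (wrapped earlier) and "above" means further out.

```python
class BaseEnv:
    """Stand-in environment: the reward is simply the first action component."""

    def step(self, action):
        return None, action[0], False, {}


class Wrapper:
    """Minimal pass-through wrapper, loosely modeled on the gym wrapper idea."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return self.env.step(action)


class ScaleReward(Wrapper):
    """Hypothetical reward-modifying wrapper (sits below the recorder)."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * 10.0, done, info


class MetricsRecorder(Wrapper):
    """Hypothetical recorder: logs the action it receives and the reward it returns."""

    def __init__(self, env):
        super().__init__(env)
        self.rows = []

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rows.append({"action": list(action), "reward": reward})
        return obs, reward, done, info


class HalveSpeed(Wrapper):
    """Hypothetical action-modifying wrapper (sits above the recorder)."""

    def step(self, action):
        return self.env.step([action[0] * 0.5, action[1]])


# Reward wrapper below the recorder, action wrapper above it.
metrics = MetricsRecorder(ScaleReward(BaseEnv()))
env = HalveSpeed(metrics)

env.step([0.8, 0.2])
print(metrics.rows)
```

With this ordering the recorder logs the halved speed and the scaled reward. Swapping HalveSpeed and MetricsRecorder would make the recorder log the agent's raw action instead.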

I'm new to Python, Duckietown, and open source pull requests, so your guidance and feedback are much appreciated.

liampaull · 2019-11-11T13:07:39Z

@bhairavmehta95 this looks cool. Can you take a look?

ricklindstrom · 2019-11-13T02:46:33Z

Some other examples. Here is a plot of reward as a function of 'center_angle' that shows the reward is well tuned for 'center_angle':
sns.scatterplot(x="center_angle", y="reward", hue='step', data=data)

Here is a plot of reward as a function of 'center_dist' that shows my reward was NOT well tuned for 'center_dist':
sns.scatterplot(x="center_dist", y="reward", hue='step', data=data)

Here is an example of more successful driving:

ricklindstrom · 2019-11-13T02:50:27Z

Oh. I notice that because my pull request was from master and not a branch, other things I am committing to my master are polluting the PR. Sorry. Let me know if or how you need me to fix this.

bhairavmehta95 · 2019-11-13T19:00:46Z

Sorry, just saw after LP tagged me.

@ricklindstrom thank you for this contribution, this looks fantastic. It honestly seems basically ready to go, except for:

Can you move the notebooks from the main directory to a new directory inside of learning/ called notebooks/?

2349857 · Collect environment and action metrics during reinforcement learning

d41106d · update reward function

liampaull requested a review from bhairavmehta95 · November 13, 2019 13:03

3ff5958 · adding (oddly behaving) trained model
