# Cooperative Inverse Reinforcement Learning

Really cool paper on the value alignment problem: how does a robot learn values from a human when the teaching is imperfect? Humans may not act optimally in their demonstrations, and they are bad at explicitly stating their value functions, so what we really need is interactive human-robot teaching.

Their formalism naturally gives rise to active teaching and active learning: the human teacher has to choose the demonstrations that best teach the robot learner, not just the ones that score well on the task.
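
For reference, the underlying formalism is a two-player Markov game with identical payoffs in which only the human H observes the reward parameter theta and the robot R has to infer it from H's behavior. A minimal sketch of the game's components (my own Python naming, not the paper's notation):

```python
# Sketch of the ingredients of a CIRL game, assuming my reading of the paper:
# a cooperative two-player game where H and R share one reward function, but
# only H knows which theta parameterizes it.
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class CIRLGame:
    states: Sequence[Any]         # S: world states
    human_actions: Sequence[Any]  # A^H: actions available to the human
    robot_actions: Sequence[Any]  # A^R: actions available to the robot
    transition: Callable          # T(s' | s, a_H, a_R): joint-action dynamics
    thetas: Sequence[Any]         # Theta: space of reward parameters
    reward: Callable              # R(s, a_H, a_R; theta): shared by H and R
    initial_dist: Callable        # P_0(s_0, theta); theta is revealed only to H
    gamma: float                  # discount factor in (0, 1]
```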

From the paper:

> The key assumption IRL makes is that the observed behavior is optimal in the sense that the observed trajectory maximizes the sum of rewards. We call this the demonstration-by-expert (DBE) assumption. One of our contributions is to prove that this may be suboptimal behavior in a CIRL game, as H may choose to accept less reward on a particular action in order to convey more information to R.
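
A toy numerical sketch of this point (my own construction, not an example from the paper): H can sacrifice a little demonstration reward to disambiguate theta for R, and the pair earns it back once R has to act on its own. All numbers, feature vectors, and the two-phase setup below are illustrative assumptions.

```python
# Why demonstration-by-expert (DBE) can be suboptimal: a "pedagogic" H that
# gives up 0.1 of demo reward to reveal theta beats the expert H overall.
import numpy as np

# Two candidate reward parameters theta = (weight of feature 1, weight of feature 2).
THETAS = [np.array([1.0, 0.0]),   # theta_A: feature 2 is worthless
          np.array([1.0, 0.9])]   # theta_B: feature 2 is almost as good as feature 1
PRIOR = np.array([0.5, 0.5])      # R's prior over the two candidates

# Demonstration phase: H picks one trajectory, described by its feature counts.
DEMOS = {"only_f1": np.array([2.0, 0.0]),
         "only_f2": np.array([0.0, 2.0]),
         "both":    np.array([1.0, 1.0])}

# Deployment phase: feature 1 is unavailable; R chooses whether to pay a
# movement cost of 0.5 to collect one unit of feature 2, or do nothing.
def deployment_reward(theta, collect_f2):
    return theta[1] - 0.5 if collect_f2 else 0.0

def demo_reward(theta, demo_name):
    return float(theta @ DEMOS[demo_name])

# Two possible strategies for H, each a map from theta index to a demo.
STRATEGIES = {
    # DBE / expert: always show the single highest-reward trajectory.
    # Under both thetas that is "only_f1", so the demo reveals nothing.
    "expert":    {0: "only_f1", 1: "only_f1"},
    # Pedagogic: under theta_B, sacrifice 0.1 of demo reward to signal that
    # feature 2 matters by visiting both feature types.
    "pedagogic": {0: "only_f1", 1: "both"},
}

def robot_posterior(strategy, observed_demo):
    """R inverts H's (commonly known) strategy with Bayes' rule."""
    likelihood = np.array([1.0 if strategy[i] == observed_demo else 0.0
                           for i in range(len(THETAS))])
    post = PRIOR * likelihood
    return post / post.sum()

def total_reward(strategy_name, true_theta_idx):
    strategy = STRATEGIES[strategy_name]
    theta = THETAS[true_theta_idx]
    demo = strategy[true_theta_idx]
    post = robot_posterior(strategy, demo)
    # R collects feature 2 only if it looks worthwhile under its posterior.
    expected_gain = sum(p * deployment_reward(t, True) for p, t in zip(post, THETAS))
    collect = expected_gain > 0.0
    return demo_reward(theta, demo) + deployment_reward(theta, collect)

for name in STRATEGIES:
    value = sum(PRIOR[i] * total_reward(name, i) for i in range(len(THETAS)))
    print(f"{name:9s} H strategy -> expected demo + deployment reward: {value:.2f}")
# expert    -> 2.00 (max demo reward, but R stays unsure and skips feature 2)
# pedagogic -> 2.15 (under theta_B: demo 1.9, then R correctly collects +0.4)
```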

See also the BAIR Blog post on this.