The main difference is that ExperienceSource produces every trace of the given length, while ExperienceSourceFirstLast returns only the first and last states of each trace, together with the discounted reward accumulated between them. An example illustrates this.
Suppose we have a single episode with states 0 -> 1 -> 2 -> 3 -> 4, and the episode terminates at the last state.
Suppose we have ExperienceSource(steps_count=3), then it will produce the following data on iteration:
But ExperienceSourceFirstLast(steps_count=3) will return the following:
ExperienceFirstLast(state=0, last_state=2)
ExperienceFirstLast(state=1, last_state=3)
ExperienceFirstLast(state=2, last_state=None)
ExperienceFirstLast(state=3, last_state=None)
The reward returned by ExperienceSourceFirstLast is aggregated using the gamma passed to the constructor.
Most of the time, ExperienceSourceFirstLast is more convenient, since we don't normally need the intermediate states. But sometimes we need more control, and then ExperienceSource can be handy.
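To make the aggregation concrete, here is a minimal sketch of a discounted sum over the rewards of one trace. The helper name `discounted_reward` is hypothetical and not part of the library; it only shows the standard formula r0 + gamma*r1 + gamma^2*r2 + ...

```python
def discounted_reward(rewards, gamma):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for a list of per-step rewards.

    Hypothetical helper for illustration; not taken from the library.
    """
    total = 0.0
    # Walk backwards so each earlier reward picks up one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three steps with reward 1.0 each and gamma=0.5: 1 + 0.5 + 0.25
example_total = discounted_reward([1.0, 1.0, 1.0], gamma=0.5)
```

With gamma=1.0 this reduces to a plain sum of the rewards in the trace.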
In terms of implementation, ExperienceSourceFirstLast is a wrapper around ExperienceSource.
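The wrapper idea can be sketched in plain Python without the library. Everything below is a toy model: `Step`, `FirstLast`, `windows` and `first_last` are hypothetical names, the per-step reward of 1.0 is made up, and the choice of which rewards go into the sum is an assumption, picked so that the states match the example above.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration; these are not the library's classes.
Step = namedtuple("Step", ["state", "reward", "done"])
FirstLast = namedtuple("FirstLast", ["state", "reward", "last_state"])


def windows(steps, steps_count):
    """Yield every trace of up to steps_count consecutive steps."""
    for start in range(len(steps)):
        yield tuple(steps[start:start + steps_count])


def first_last(steps, steps_count, gamma):
    """Collapse each trace into (first state, discounted reward, last state)."""
    for trace in windows(steps, steps_count):
        if len(trace) == steps_count:
            # Full trace: the final step only supplies last_state (assumption).
            last_state = trace[-1].state
            rewarded = trace[:-1]
        else:
            # Trace cut short by the end of the episode: no last state.
            last_state = None
            rewarded = trace
        total = 0.0
        for step in reversed(rewarded):
            total = step.reward + gamma * total
        yield FirstLast(trace[0].state, total, last_state)


# Steps taken in states 0..3; the step from state 3 ends the episode.
episode = [Step(0, 1.0, False), Step(1, 1.0, False),
           Step(2, 1.0, False), Step(3, 1.0, True)]
result = list(first_last(episode, steps_count=3, gamma=0.5))
```

In this toy model the intermediate states are visible only inside `windows`; each `FirstLast` item keeps just the two endpoint states and one number for the reward.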
Can someone explain the main difference between ExperienceSourceFirstLast and ExperienceSource? Are we still storing every incoming state?