I ran into a problem using a reinforcement-learning dueling double deep Q-network on SageMaker. The endpoint predictions almost always return class 1, while evaluating the same dataset against the same model checkpoint gives predictions spread evenly across the classes. For the output node I used main_level/agent/main/online/network_0/dueling_q_values_head_0/output, which is the last operation in the dueling Q-network.
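For context, here is a minimal sketch of how I understand the prediction path, pulling Q-values from that output node in a frozen TF1 graph and taking the argmax as the predicted class. The graph path, input placeholder name, and observation shape below are illustrative placeholders; only the output node name is taken from my actual setup.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

GRAPH_PB = "frozen_graph.pb"  # hypothetical path to the exported graph
OUTPUT_OP = "main_level/agent/main/online/network_0/dueling_q_values_head_0/output"
INPUT_OP = "main_level/agent/main/online/network_0/observation/observation"  # assumed placeholder name
OBS_SHAPE = (1, 4)  # hypothetical observation shape; replace with the real one

# Load the exported graph definition
with tf.gfile.GFile(GRAPH_PB, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
    obs_tensor = graph.get_tensor_by_name(INPUT_OP + ":0")
    q_values = graph.get_tensor_by_name(OUTPUT_OP + ":0")

with tf.Session(graph=graph) as sess:
    # One dummy observation; in practice this is a row from my dataset
    obs = np.zeros(OBS_SHAPE, dtype=np.float32)
    qs = sess.run(q_values, feed_dict={obs_tensor: obs})
    action = int(np.argmax(qs, axis=1)[0])
    print(qs, action)
```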
Is this a bug in the checkpoint deployment, or is my output operation wrong? Any help would be appreciated.