Question about performance: very low success rate when reproducing

Thanks for your great work.
I reproduced the algorithm following the README, but I found that the success rate is very low. The model is almost unable to make use of the variations in block position and color, and the generated actions are largely uncorrelated with the visual observations. It only achieves some success when the red block is positioned in the center of the box.