### Feature or Model Request

The agentic GRPO learner should use continuous batching.

### Additional Context

_No response_