
Commit 5a60fa0

Update training details (#389)
1 parent d517a3a commit 5a60fa0


1 file changed (+1 -1 lines)


docs/tutorials/open-deep-research.mdx

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ The following steps execute when a training run on a new cluster begins:
 - **Download the model checkpoint.**
 - Usually takes a few minutes depending on the model size.
 - **Train the model for a specified number of steps.**
-- Each RL step involves running the research agent on benchmark questions, evaluating the results, and updating the model based on the rewards. Training time depends on the number of steps and the complexity of each research task.
+- Each RL step involves running the research agent on a subset of benchmark questions and updating the model based on the rewards. Every 10 steps, we evaluate model progress on a separate held-out subset of test-set questions that we never train on. Training time depends on the number of steps and the complexity of each research task.
 - **Upload the final model checkpoint.**
 - This usually takes a few minutes.
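As a rough illustration of the schedule the updated bullet describes (train on one subset of benchmark questions, and every 10 steps evaluate on a disjoint held-out subset that never feeds an update), here is a minimal Python sketch. Only the 10-step evaluation interval comes from the change itself. Everything else is an assumption: `split_questions`, `train`, the 20% held-out fraction, the batch size, and the `rl_step` / `evaluate` callbacks are hypothetical stand-ins, not the tutorial's actual training code.

```python
import random

EVAL_EVERY = 10  # from the commit: evaluate model progress every 10 steps


def split_questions(questions, eval_fraction=0.2, seed=0):
    """Hypothetical helper: partition benchmark questions into a training
    subset and a disjoint held-out subset (the 20% fraction is assumed)."""
    shuffled = list(questions)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def train(num_steps, train_qs, eval_qs, rl_step, evaluate, batch_size=4, seed=0):
    """Hypothetical loop: rl_step(batch) stands in for running the research
    agent and updating the model from rewards; evaluate(questions) only
    measures progress and never updates the model."""
    rng = random.Random(seed)
    eval_steps = []
    for step in range(1, num_steps + 1):
        # Sample training questions only; held-out questions are never used here.
        batch = rng.sample(train_qs, min(batch_size, len(train_qs)))
        rl_step(batch)
        if step % EVAL_EVERY == 0:
            evaluate(eval_qs)  # periodic check on the held-out subset
            eval_steps.append(step)
    return eval_steps
```

For example, a 25-step run evaluates at steps 10 and 20. Keeping the split fixed with a seed means every evaluation uses the same held-out questions, so scores stay comparable across steps.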
