docs/tutorials/open-deep-research.mdx (+1 -1: 1 addition, 1 deletion)
@@ -83,7 +83,7 @@ The following steps execute when a training run on a new cluster begins:
 - **Download the model checkpoint.**
   - Usually takes a few minutes depending on the model size.
 - **Train the model for a specified number of steps.**
-  - Each RL step involves running the research agent on benchmark questions, evaluating the results, and updating the model based on the rewards. Training time depends on the number of steps and the complexity of each research task.
+  - Each RL step involves running the research agent on a subset of benchmark questions and updating the model based on the rewards. We hold out a separate subset of test-set questions, which we never train on, to evaluate model progress every 10 steps. Training time depends on the number of steps and the complexity of each research task.
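The revised step separates training questions from a held-out evaluation set. A minimal sketch of that loop follows, assuming a single fixed split made before training starts. It is an illustration only, not the tutorial's actual training code: `split_questions`, `train`, `rl_step`, `evaluate`, and the 20% held-out fraction are hypothetical names and values, and the stub callbacks stand in for the research agent's rollouts, reward-based update, and scoring.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_questions(
    questions: Sequence[str], holdout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle once (hypothetical helper) and split into disjoint train and held-out subsets."""
    shuffled = list(questions)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def train(
    num_steps: int,
    train_questions: List[str],
    holdout_questions: List[str],
    rl_step: Callable[[List[str]], None],
    evaluate: Callable[[List[str]], float],
    eval_every: int = 10,
) -> List[Tuple[int, float]]:
    """Update only on training questions; score held-out questions every `eval_every` steps."""
    history: List[Tuple[int, float]] = []
    for step in range(1, num_steps + 1):
        # Hypothetical callback: run the agent and update the model from rewards.
        rl_step(train_questions)
        if step % eval_every == 0:
            # Evaluation only: held-out questions are never passed to rl_step.
            history.append((step, evaluate(holdout_questions)))
    return history


# Usage with stub callbacks that record which questions reach the update step.
seen: set = set()
train_qs, holdout_qs = split_questions([f"q{i}" for i in range(50)], holdout_fraction=0.2)
history = train(
    num_steps=35,
    train_questions=train_qs,
    holdout_questions=holdout_qs,
    rl_step=lambda qs: seen.update(qs),
    evaluate=lambda qs: 0.0,
)
print([step for step, _ in history])  # [10, 20, 30]
```

Fixing the split once, with a seeded shuffle, keeps the held-out questions identical at every evaluation, so scores at steps 10, 20, and 30 are comparable and never reflect questions the model was updated on.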