
@chenyushuo
Collaborator

@chenyushuo chenyushuo commented Feb 2, 2026

Description

  1. Add a new sync style, TRAINER_DRIVEN.
  2. [Breaking Change] Rename DYNAMIC_BY_EXPLORER to EXPLORER_DRIVEN, maintaining forward compatibility.
  3. [Breaking Change] Refactor the explorer and fix inconsistent behavior between the NCCL and checkpoint modes under EXPLORER_DRIVEN.
  4. Bug fixes and refactoring in checkpoint loading.
  5. Fix the case where the trainer uses a single GPU with the memory sync method.
  6. Fix chat_template in the vLLM OpenAI API server.
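Assuming that "compatibility" in item 2 means configs written with the old name are still accepted, one common way to rename an enum member is to map the legacy string onto its replacement. The sketch below is hypothetical: the `SyncStyle` class, its `FIXED` member, and all string values are illustrative assumptions, not code taken from this PR.

```python
from enum import Enum


class SyncStyle(str, Enum):
    """Hypothetical sync-style enum; member names and values are illustrative."""

    FIXED = "fixed"
    TRAINER_DRIVEN = "trainer_driven"
    EXPLORER_DRIVEN = "explorer_driven"

    @classmethod
    def _missing_(cls, value):
        # Called only when `value` is not an exact member value.
        if not isinstance(value, str):
            return None
        key = value.lower()
        # Legacy spelling kept so that older configs still resolve.
        if key == "dynamic_by_explorer":
            return cls.EXPLORER_DRIVEN
        # Accept the new names case-insensitively, e.g. "TRAINER_DRIVEN".
        for member in cls:
            if member.value == key:
                return member
        # Returning None makes Enum raise ValueError for unknown names.
        return None
```

With this approach, `SyncStyle("dynamic_by_explorer")` resolves to `SyncStyle.EXPLORER_DRIVEN`, while an unknown string still raises `ValueError`.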

Checklist

Please check the following items before the code is ready for review.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

2. Refactor explorer and fix the issue of inconsistent behavior between NCCL and checkpoint modes in the case of `DYNAMIC_BY_EXPLORER`.
@gemini-code-assist
Contributor

Summary of Changes

Hello @chenyushuo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the system's distributed training capabilities by introducing a new synchronization style, DYNAMIC_BY_TRAINER, which empowers the trainer to initiate model synchronization. Concurrently, it addresses and rectifies existing inconsistencies in the explorer's synchronization behavior, particularly between NCCL and checkpoint-based methods under the DYNAMIC_BY_EXPLORER style. These changes aim to provide more robust and flexible control over model weight synchronization across distributed components.

Highlights

  • New Synchronization Style: Introduced a new synchronization style, DYNAMIC_BY_TRAINER, allowing the trainer to drive the synchronization process.
  • Explorer Refactoring and Consistency Fixes: Refactored the explorer's synchronization logic to resolve inconsistencies between NCCL and checkpoint modes when using the DYNAMIC_BY_EXPLORER style.
  • Synchronizer Status Management: Consolidated the RunningStatus.WAITING_SYNC state into RunningStatus.REQUIRE_SYNC within the synchronizer, simplifying status tracking.
  • Test Suite Enhancement: Performed a significant refactoring of the synchronization test suite, introducing a base class and helper methods for improved modularity and readability.
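To illustrate what folding WAITING_SYNC into REQUIRE_SYNC and centralizing transitions in one place can look like, here is a toy state machine. It is a sketch under stated assumptions: the `RunningStatus` member set shown, the `Synchronizer` methods, and the allowed-transition table are all hypothetical and not the project's actual implementation.

```python
import threading
from enum import Enum


class RunningStatus(Enum):
    # Hypothetical member set: WAITING_SYNC has been folded into REQUIRE_SYNC.
    RUNNING = "running"
    REQUIRE_SYNC = "require_sync"
    STOPPED = "stopped"


class Synchronizer:
    """Toy synchronizer that owns every status transition (illustrative only)."""

    # Allowed transitions; anything not listed is rejected.
    _TRANSITIONS = {
        RunningStatus.RUNNING: {RunningStatus.REQUIRE_SYNC, RunningStatus.STOPPED},
        RunningStatus.REQUIRE_SYNC: {RunningStatus.RUNNING, RunningStatus.STOPPED},
        RunningStatus.STOPPED: set(),
    }

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}  # component name -> RunningStatus

    def register(self, name):
        with self._lock:
            self._status[name] = RunningStatus.RUNNING

    def set_status(self, name, new):
        # All state changes go through this single, lock-protected method.
        with self._lock:
            old = self._status[name]
            if new not in self._TRANSITIONS[old]:
                raise ValueError(f"{name}: illegal transition {old.name} -> {new.name}")
            self._status[name] = new

    def get_status(self, name):
        with self._lock:
            return self._status[name]
```

Keeping the transition table in one class means an illegal sequence (for example, restarting a stopped component) fails loudly instead of leaving the explorer and trainer with inconsistent views of the state.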
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new synchronization style DYNAMIC_BY_TRAINER and performs a significant refactoring of the synchronization logic between the explorer and trainer. The changes simplify the state machine by removing the WAITING_SYNC status and centralizing state transitions within the Synchronizer. A crucial fix is included for DYNAMIC_BY_EXPLORER mode to prevent the explorer from hanging when the trainer has no new model to sync. The tests have also been substantially refactored for better readability and maintainability, which is a great improvement.
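The hang fix described above can be pictured as the difference between a blocking and a non-blocking lookup. In the toy model below, the `ModelVersionBoard` class and its methods are hypothetical, not the project's API: the trainer publishes integer model versions, and an explorer-driven sync keeps its current model when nothing newer exists instead of waiting for one.

```python
import threading


class ModelVersionBoard:
    """Hypothetical board where the trainer publishes model versions."""

    def __init__(self):
        self._cond = threading.Condition()
        self._latest = 0

    def publish(self, version):
        # Trainer side: record a newer model version and wake any waiters.
        with self._cond:
            if version > self._latest:
                self._latest = version
                self._cond.notify_all()

    def explorer_driven_sync(self, current_version):
        # Explorer side: take whatever is newest right now; never block.
        with self._cond:
            if self._latest <= current_version:
                return current_version  # nothing new: keep the current model
            return self._latest

    def wait_for_version(self, target, timeout=None):
        # Blocking variant: waits until the trainer reaches `target`.
        # With timeout=None and a trainer that never publishes `target`,
        # this is the kind of hang the non-blocking path avoids.
        with self._cond:
            ok = self._cond.wait_for(lambda: self._latest >= target, timeout=timeout)
            return self._latest if ok else None
```

For example, if the trainer has published version 3, an explorer already on version 3 gets 3 back immediately rather than blocking on a version 4 that may never arrive.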

My main feedback points are:

  1. A potential semantic change in the DYNAMIC_BY_EXPLORER sync style that makes it behave like a fixed-step schedule.
  2. A suggestion to simplify the logic in a test monkey patch to improve clarity and remove a redundant call.
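The concern in point 1 can be made concrete with a toy rule for what the explorer waits on at a sync point. This is a hypothetical model, not the project's logic: the function name, the style strings, and the assumption that the trainer's model version tracks the explorer step are all illustrative. Under a fixed schedule the explorer blocks on a specific trainer version, while an explorer-driven schedule has no target and takes whatever is already available; if the explorer-driven branch started returning a concrete target, it would collapse into the fixed branch.

```python
from typing import Optional


def required_trainer_version(style: str, explorer_step: int, sync_interval: int) -> Optional[int]:
    """Return the version the explorer must wait for, or None if it must not wait.

    Hypothetical semantics for illustration only.
    """
    if sync_interval <= 0:
        raise ValueError("sync_interval must be positive")
    if explorer_step % sync_interval != 0:
        return None  # not a sync point under either style
    if style == "fixed":
        # Lock-step: block until the trainer reaches the matching version.
        return explorer_step
    if style == "explorer_driven":
        # No target: sync to whatever the trainer has already published.
        return None
    raise ValueError(f"unknown style: {style}")
```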

Overall, this is a solid refactoring with valuable improvements.

@chenyushuo
Collaborator Author

/unittest-all

2. Bug fix and refactor in checkpoint load.
3. Fix the case where the trainer uses a single GPU with the memory sync method.
@chenyushuo
Collaborator Author

/unittest-all

@github-actions

github-actions bot commented Feb 3, 2026

Summary

Tests 📝: 253 | Passed ✅: 243 | Failed ❌: 3 | Skipped ⏭️: 7 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: 1h 21m

Failed Tests

Failed Tests ❌ Fail Message
❌ tests/common/vllm_test.py::TestLogprobs::test_logprobs_api The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer The test failed in the call phase due to an assertion error

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class skipped ⏭️
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo 5.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage 3.6s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo 5.2s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias 2.1s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std 1.7s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold 2.4s
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn 1.9s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback 961ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss 1.0s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy 882ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob 840ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn 831ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn 838ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn 876ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn 820ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes 893ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn 863ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 2.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 3.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking 1.3s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss 1.9s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1.0s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 3h 5m
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 1h 47m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 42m 36s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 1h 13m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 1h 19m
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 6m 19s
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 28m 34s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 8m 36s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 7m 53s
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 13m 58s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 17m 8s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 12m 10s
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 3m 45s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 1h 46m
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 35m 13s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 1h 7m
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 51m 19s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 54m 32s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 1h 2m
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 13m 23s
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 6.7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 34m 18s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 32m 40s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 32m 32s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 29m 30s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 1h 13m
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 37m 43s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 33m 13s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 33m 37s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 33m 24s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 1h 3m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 1h 36m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 38m 24s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 46m 19s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 5m 8s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 4m 56s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 5m 38s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 5m 25s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 5m 26s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 5m 26s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 5m 43s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 5m 4s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 6m 9s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 49m 56s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 40.4s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 44m 37s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 41.2s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 56m 19s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 11h 3m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 42m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 27m 6s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 5m 21s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 4h 1m
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 8h 49m
tests/common/config_test.py::TestConfig::test_chat_template_path 4m 55s
tests/common/config_test.py::TestConfig::test_config_flatten 31.6s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 6m 16s
tests/common/config_test.py::TestConfig::test_default_workflow 4m 53s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 19m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 4m 56s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 14m 8s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 6m 12s
tests/common/experience_test.py::TestEID::test_eid_properties 497ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 540ms
tests/common/experience_test.py::TestExperience::test_assertions 319ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 368ms
tests/common/experience_test.py::TestExperience::test_gather 809ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 572ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14.5s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 355ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.0s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 353ms
tests/common/experience_test.py::TestExperience::test_to_dict 336ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 680ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 552ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 796ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 491ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 587ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 791ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 838ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 752ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 253ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 237ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 219ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 242ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 265ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 277ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 215ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 229ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 39m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 10h 53m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 10h 52m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 8h 50m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 40m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 35m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 36m
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 7h 41m
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 7h 44m
tests/common/vllm_test.py::TestAPIServer::test_api 7h 42m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 28m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 7h 39m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 592ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 3m 58s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 58s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 43m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 7h 43m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 19h 24m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 30m
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 14h 39m
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 13h 42m
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 15h 34m
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 50h 16m
tests/explorer/explorer_test.py::ServeTest::test_serve 15h 15m
tests/explorer/proxy_test.py::RecorderTest::test_recorder 1m 22s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 1h 23m
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 1h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 3h 34m
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 5h 39m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 1h 27m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 1h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 1h 27m
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 1h 24m
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 2h 30m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 4h 13m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 2h 36m
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 2h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 7h 2m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 2h 14m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 3h 50m
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 2h 55m
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 1.8s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 10m 2s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 16m 42s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 889ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 16m 43s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 12.8s
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 20.3s
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 2m 42s
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 4.8s
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 12.0s
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 7.9s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 911ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 1m 41s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1.2s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 3m 21s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 6h 42m
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 6h 36m
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 1h 6m
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 12m 59s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 14.0s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 2m 18s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 2h 14m
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 7h 16m
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 9h 29m
tests/manager/synchronizer_test.py::TestSynchronizerExit_0::test_synchronizer 30h 33m
tests/manager/synchronizer_test.py::TestSynchronizerExit_1::test_synchronizer 29h 50m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer 21h 54m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer 21h 2m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer 21h 17m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer 30h 10m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_4::test_synchronizer 26h 2m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_5::test_synchronizer 29h 43m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer 19h 44m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer 19h 15m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_2::test_synchronizer 19h 41m
tests/service/data_juicer_test.py::TestDataJuicer::test_config 18m 15s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start 5h 57m
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators 5h 44m
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline 4h 12m
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 42h 18m
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 48h 12m
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 21h 18m
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 15h 36m
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 14h 32m
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 15h 54m
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 17h 44m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 33h 26m
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 9h 49m
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 8h 34m
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 9h 3m
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 26h 14m
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 25h 40m
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 38h 16m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 37h 23m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 167h 30m
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 27h 29m
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 30h 24m
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer ⏭️ 19m 3s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer ⏭️ 20m 51s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 38h 36m
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 13h 18m
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 13h 9m
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 681ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 348ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner ⏭️ 387ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent 9.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth 1.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string 337ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution 1.9s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed 340ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent 1.1s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer 5.0s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer 1m 2s
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv 5.6s
tests/utils/log_test.py::LogTest::test_actor_log 35m 47s
tests/utils/log_test.py::LogTest::test_group_by_node 38m 48s
tests/utils/log_test.py::LogTest::test_no_actor_log 18m 26s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins 5m 2s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins 5m 3s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins 2h 27m
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins 2h 26m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins 1h 30m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins 1h 26m
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import 39m 53s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping 8.8s
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping 3.1s
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping 55.8s
tests/utils/registry_test.py::TestRegistry::test_register_module 492ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping 671ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke ⏭️ 408ms

Github Test Reporter by CTRF 💚

@chenyushuo
Collaborator Author

/unittest-all

@github-actions

github-actions bot commented Feb 3, 2026

Summary

Tests 📝: 253 | Passed ✅: 246 | Failed ❌: 0 | Skipped ⏭️: 7 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: 1h 17m

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class skipped ⏭️
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo 5.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo 5.3s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage 3.6s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias 2.1s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std 1.7s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold 2.4s
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn 1.8s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback 938ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss 1.0s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy 880ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob 842ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn 846ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn 805ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn 872ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn 813ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes 875ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn 859ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 2.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 3.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking 1.3s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 991ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 2h 59m
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 1h 42m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 45m 22s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 1h 15m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 1h 18m
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 6m 35s
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 28m 11s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 8m 57s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 8m 15s
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 14m 24s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 17m 44s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 12m 33s
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 3m 54s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 1h 47m
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 34m 54s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 1h 10m
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 51m 23s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 54m 54s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 1h 2m
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 13m 46s
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 6.4s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 34m 7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 33m 16s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 32m 48s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 30m 3s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 1h 14m
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 36m 2s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 33m 7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 33m 20s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 32m 57s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 1h 4m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 1h 36m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 42m 8s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 49m 4s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 5m 22s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 4m 56s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 5m 20s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 5m 21s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 5m 19s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 5m 26s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 5m 55s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 4m 45s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 6m 28s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 50m 25s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 41.0s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 47m 40s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 41.7s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 52m 55s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 11h 8m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 40m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 26m 58s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 5m 2s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 4h
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 8h 50m
tests/common/config_test.py::TestConfig::test_chat_template_path 4m 54s
tests/common/config_test.py::TestConfig::test_config_flatten 32.0s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 6m 19s
tests/common/config_test.py::TestConfig::test_default_workflow 4m 53s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 41m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 5m 2s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 14m 15s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 11m 41s
tests/common/experience_test.py::TestEID::test_eid_properties 491ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 548ms
tests/common/experience_test.py::TestExperience::test_assertions 312ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 401ms
tests/common/experience_test.py::TestExperience::test_gather 812ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 578ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 15.6s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 442ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.1s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 355ms
tests/common/experience_test.py::TestExperience::test_to_dict 351ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 696ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 550ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 807ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 504ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 597ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 713ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 757ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 705ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 263ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 250ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 232ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 246ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 274ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 269ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 222ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 240ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 48m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h 4m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 13h
tests/common/vllm_test.py::TestModelLen_0::test_model_len 8h 43m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 40m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 37m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 57m
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 7h 1m
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 7h 29m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 12m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 21m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 12m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 755ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 3m 56s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 48s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 54m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 7h 47m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 19h 23m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 30m
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 13h 46m
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 12h 28m
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 16h 31m
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 50h 14m
tests/explorer/explorer_test.py::ServeTest::test_serve 17h 42m
tests/explorer/proxy_test.py::RecorderTest::test_recorder 1m 24s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 1h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 3h 34m
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 5h 37m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 1h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 1h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 1h 29m
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 1h 21m
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 2h 27m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 4h 8m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 2h 35m
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 2h 15m
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 7h 1m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 2h 13m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 3h 46m
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 2h 50m
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 1.7s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 10m 2s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1.9s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 16m 43s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1.3s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 16m 46s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 13.7s
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 18.3s
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 11m 25s
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 4.4s
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 11.6s
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 7.9s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 767ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 1m 41s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 768ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 3m 21s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 6h 28m
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 6h 36m
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 1h 6m
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 12m 53s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 13.8s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 2m 18s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 2h 14m
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 7h 19m
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 9h 20m
tests/manager/synchronizer_test.py::TestSynchronizerExit_0::test_synchronizer 30h 5m
tests/manager/synchronizer_test.py::TestSynchronizerExit_1::test_synchronizer 28h 26m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer 22h 45m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer 20h 43m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer 21h 44m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer 29h 49m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_4::test_synchronizer 26h 15m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_5::test_synchronizer 29h 46m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer 19h 56m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer 18h 59m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_2::test_synchronizer 19h 53m
tests/service/data_juicer_test.py::TestDataJuicer::test_config 18m 47s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start 5h 57m
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators 5h 38m
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline 4h 12m
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 39h 34m
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 64h 10m
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 19h 48m
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 15h 59m
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 15h 9m
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 15h 28m
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 18h 53m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 34h 33m
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 9h 16m
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 8h 21m
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 8h 26m
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 25h 16m
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 25h 13m
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 38h 37m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 34h 28m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 89h 51m
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 27h 49m
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 29h 22m
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer ⏭️ 16m 18s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer ⏭️ 20m 40s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 37h 38m
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 13h 54m
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 13h 6m
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 734ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 294ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner ⏭️ 308ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent 10.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth 1.7s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string 284ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution 1.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed 267ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent 1.1s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer 4.4s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer 1m 3s
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv 5.2s
tests/utils/log_test.py::LogTest::test_actor_log 36m 10s
tests/utils/log_test.py::LogTest::test_group_by_node 35m 50s
tests/utils/log_test.py::LogTest::test_no_actor_log 14m 30s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins 5m 3s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins 4m 58s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins 2h 25m
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins 2h 26m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins 1h 24m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins 1h 22m
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import 38m 21s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping 8.9s
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping 3.3s
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping 57.4s
tests/utils/registry_test.py::TestRegistry::test_register_module 576ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping 707ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke ⏭️ 412ms

Github Test Reporter by CTRF 💚

@chenyushuo chenyushuo changed the title from "Add new sync style DYNAMIC_BY_TRAINER." to "Add new sync style TRAINER_DRIVEN." Feb 3, 2026
@chenyushuo
Collaborator Author

/unittest-all

@github-actions

github-actions bot commented Feb 3, 2026

Summary

Tests 📝: 253 · Passed ✅: 245 · Failed ❌: 1 · Skipped ⏭️: 7 · Other ❓: 0 · Flaky 🍂: 0 · Duration ⏱️: 1h 17m

Failed Tests

❌ tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer: the test failed in the call phase due to an assertion error.

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class skipped ⏭️
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo 5.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage 3.6s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo 5.3s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias 2.1s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std 1.6s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold 2.4s
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn 1.8s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback 963ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss 1.0s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy 881ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob 843ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn 858ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn 798ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn 873ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn 819ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes 907ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn 888ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 2.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 3.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking 1.3s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 997ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 3h 6m
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 1h 41m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 46m 11s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 1h 11m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 1h 21m
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 6m 20s
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 28m 10s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 8m 31s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 8m 12s
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 14m 28s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 16m 51s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 12m 16s
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 3m 54s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 1h 46m
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 34m 45s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 1h 7m
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 51m 28s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 50m 54s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 1h 2m
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 13m 21s
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 6.4s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 37m 15s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 29m 54s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 32m 48s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 33m 4s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 1h 17m
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 35m 27s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 29m 24s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 30m 9s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 32m 49s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 1h 2m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 1h 36m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 37m 48s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 46m 1s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 5m 1s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 4m 45s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 5m 20s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 5m 21s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 5m 39s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 5m 32s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 5m 35s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 4m 35s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 6m 49s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 44m 40s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 40.3s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 50m 8s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 41.3s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 53m 22s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 10h 58m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 40m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 27m 46s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 5m 10s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 4h 2m
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 8h 53m
tests/common/config_test.py::TestConfig::test_chat_template_path 4m 58s
tests/common/config_test.py::TestConfig::test_config_flatten 31.9s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 6m 20s
tests/common/config_test.py::TestConfig::test_default_workflow 4m 56s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 41m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 5m 28s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 5m 6s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 6m 29s
tests/common/experience_test.py::TestEID::test_eid_properties 530ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 529ms
tests/common/experience_test.py::TestExperience::test_assertions 334ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 424ms
tests/common/experience_test.py::TestExperience::test_gather 841ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 610ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14.8s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 371ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.1s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 374ms
tests/common/experience_test.py::TestExperience::test_to_dict 345ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 728ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 585ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 806ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 513ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 617ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 1.1s
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 899ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 953ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 267ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 236ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 247ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 233ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 298ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 265ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 247ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 230ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 38m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h 4m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 10h 38m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 7h 41m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 30m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 44m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 35m
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 7h 27m
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 8h 33m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 12m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 28m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 13m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 748ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 4m 15s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 51s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 48m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 7h 43m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 19h 56m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 29m
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 14h 23m
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 12h 21m
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 16h 14m
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 50h 14m
tests/explorer/explorer_test.py::ServeTest::test_serve 15h 26m
tests/explorer/proxy_test.py::RecorderTest::test_recorder 1m 25s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 1h 24m
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 1h 25m
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 3h 33m
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 5h 37m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 1h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 1h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 1h 29m
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 1h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 2h 25m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 4h 3m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 2h 37m
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 2h 15m
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 6h 58m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 2h 12m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 3h 48m
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 2h 49m
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 2.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 10m 2s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1.1s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 16m 42s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 967ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 16m 43s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 13.6s
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 19.1s
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 12m 1s
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 8.5s
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 12.3s
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 7.9s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 987ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 1m 41s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1.1s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 3m 21s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 6h 26m
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 6h 35m
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 1h 6m
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 13m 22s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 15.2s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 2m 18s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 2h 14m
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 7h 7m
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 9h 22m
tests/manager/synchronizer_test.py::TestSynchronizerExit_0::test_synchronizer 28h
tests/manager/synchronizer_test.py::TestSynchronizerExit_1::test_synchronizer 28h 40m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer 23h 3m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer 20h 20m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer 21h 25m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer 29h 10m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_4::test_synchronizer 27h 31m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_5::test_synchronizer 29h 19m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer 19h 52m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer 19h 5m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_2::test_synchronizer 19h 42m
tests/service/data_juicer_test.py::TestDataJuicer::test_config 20m 7s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start 5h 57m
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators 5h 37m
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline 5h 6m
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 38h 45m
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 66h 13m
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 22h 14m
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 15h 7m
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 14h 35m
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 15h 57m
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 17h 38m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 33h 52m
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 9h 33m
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 8h 38m
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 8h 34m
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 25h 10m
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 25h 11m
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 38h 26m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 40h 19m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 89h 48m
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 28h 15m
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 28h 45m
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer ⏭️ 37m 30s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer ⏭️ 20m 31s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 38h 46m
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 14h 8m
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 13h 25m
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 631ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 295ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner ⏭️ 297ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent 9.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth 1.7s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string 292ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution 1.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed 278ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent 1.1s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer 4.4s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer 59.8s
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv 5.2s
tests/utils/log_test.py::LogTest::test_actor_log 35m 50s
tests/utils/log_test.py::LogTest::test_group_by_node 34m 37s
tests/utils/log_test.py::LogTest::test_no_actor_log 14m 24s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins 5m 5s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins 5m 4s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins 2h 27m
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins 2h 25m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins 1h 28m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins 1h 22m
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import 40m
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping 8.5s
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping 3.2s
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping 56.0s
tests/utils/registry_test.py::TestRegistry::test_register_module 474ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping 666ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke ⏭️ 385ms

Github Test Reporter by CTRF 💚

@chenyushuo
Collaborator Author

/unittest-all

@chenyushuo
Collaborator Author

/unittest-all

@github-actions

github-actions bot commented Feb 4, 2026

Summary

Tests 📝: 254 | Passed ✅: 246 | Failed ❌: 1 | Skipped ⏭️: 7 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: 1h 34m

Failed Tests

❌ tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode: The test failed in the call phase due to an exception.

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class skipped ⏭️
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo 5.3s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo 5.2s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std 1.7s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold 2.4s
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn 1.7s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback 922ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss 948ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy 859ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob 848ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn 814ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn 799ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn 888ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn 795ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes 865ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn 857ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 2.1s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 3.4s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking 1.3s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss 1.9s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1.0s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 3h 4m
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 1h 43m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 43m 27s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 1h 14m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 1h 19m
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 6m 41s
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 28m 8s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 8m 33s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 7m 55s
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 13m 46s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 17m 9s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 12m 4s
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 3m 44s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 1h 50m
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 37m 28s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 1h 10m
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 51m 24s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 54m 15s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 1h 1m
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 13m 43s
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 6.3s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 34m 35s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 29m 53s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 29m 39s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 29m 51s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 1h 18m
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 37m 54s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 30m 34s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 29m 54s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 32m 41s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 1h 2m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 1h 37m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 38m 1s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 45m 27s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 5m 17s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 4m 53s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 5m 17s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 5m 20s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 5m 18s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 5m 21s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 5m 33s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 4m 36s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 5m 58s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 50m 27s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 40.7s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 44m 48s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 41.5s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 55m 49s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 15h 22m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 40m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 26m 57s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 5m 1s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 3h 59m
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 8h 46m
tests/common/config_test.py::TestConfig::test_chat_template_path 5m 22s
tests/common/config_test.py::TestConfig::test_config_flatten 32.3s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 6m 14s
tests/common/config_test.py::TestConfig::test_default_workflow 4m 52s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 21m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 13m 47s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 5m 27s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 6m 11s
tests/common/experience_test.py::TestEID::test_eid_properties 565ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 513ms
tests/common/experience_test.py::TestExperience::test_assertions 342ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 394ms
tests/common/experience_test.py::TestExperience::test_gather 802ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 575ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14.4s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 350ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.0s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 346ms
tests/common/experience_test.py::TestExperience::test_to_dict 316ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 668ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 547ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 806ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 516ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 601ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 783ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 708ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 593ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 263ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 228ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 242ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 227ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 286ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 265ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 244ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 236ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 27m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h 19m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 10h 38m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 8h 49m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 35m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 22m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 36m
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 7h 35m
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 7h 26m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 1m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 23m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 6m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 644ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 3m 51s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 44s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 9h
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 8h 45m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 25h 56m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 33m
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 27h 57m
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 20h 26m
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 14h 10m
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 50h 44m
tests/explorer/explorer_test.py::ServeTest::test_serve 15h 28m
tests/explorer/proxy_test.py::RecorderTest::test_recorder 59.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 1h 26m
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 3h 32m
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 5h 39m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 1h 21m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 1h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 1h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 1h 30m
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 2h 27m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 4h 5m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 2h 29m
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 2h 12m
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 6h 58m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 2h 14m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 3h 47m
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 2h 54m
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 1.9s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 10m 2s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1.1s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 16m 42s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1.3s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 16m 43s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 12m 9s
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 16.6s
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 2m 9s
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 4.4s
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 11.5s
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 7.7s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 795ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 1m 41s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 788ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 3m 21s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 6h 29m
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 6h 27m
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 1h 6m
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 12m 36s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 13.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 2m 17s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 2h 14m
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 7h 25m
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 11h 6m
tests/manager/synchronizer_test.py::TestSynchronizerExit_0::test_synchronizer 41h 18m
tests/manager/synchronizer_test.py::TestSynchronizerExit_1::test_synchronizer 41h 48m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer 35h 25m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer 27h 27m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer 33h 40m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer 44h 56m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_4::test_synchronizer 37h 46m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_5::test_synchronizer 44h 29m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer 19h 34m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer 18h 11m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_2::test_synchronizer 18h 3m
tests/service/data_juicer_test.py::TestDataJuicer::test_config 18m 12s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start 5h 57m
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators 5h 39m
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline 4h 31m
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 61h 34m
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 86h 11m
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 27h 51m
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 18h 33m
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 18h 27m
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 19h 7m
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 21h 57m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 39h 39m
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 9h 51m
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 8h 23m
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 8h 58m
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 28h 2m
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 28h 2m
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 39h 49m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 48h 10m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 95h 20m
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 34h 52m
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 29h 54m
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer ⏭️ 20m 29s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer ⏭️ 16m
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 50h 25m
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 16h 22m
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 13h 9m
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 558ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 285ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner ⏭️ 294ms
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer 33h 50m
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent 9.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth 1.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string 285ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution 1.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed 277ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent 1.1s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer 4.2s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer 1m 3s
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv 5.2s
tests/utils/log_test.py::LogTest::test_actor_log 37m 15s
tests/utils/log_test.py::LogTest::test_group_by_node 35m 32s
tests/utils/log_test.py::LogTest::test_no_actor_log 14m 17s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins 5m 6s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins 5m 3s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins 2h 27m
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins 2h 27m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins 1h 25m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins 1h 24m
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import 40m 19s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping 8.1s
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping 2.9s
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping 52.1s
tests/utils/registry_test.py::TestRegistry::test_register_module 458ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping 660ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke ⏭️ 365ms


@chenyushuo
Collaborator Author

/unittest-module-cli

@github-actions

github-actions bot commented Feb 4, 2026

Summary

Tests 📝: 5 | Passed ✅: 4 | Failed ❌: 1 | Skipped ⏭️: 0 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: 1m 22s

Failed Tests

❌ tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode: The test failed in the call phase due to an exception.

Tests

Test Name Status Flaky Duration
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 16h 8m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 46m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 27m 6s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 4m 57s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 3h 55m


@chenyushuo
Collaborator Author

/unittest-module-cli

@github-actions

github-actions bot commented Feb 5, 2026

Summary

Tests 📝: 5 | Passed ✅: 5 | Failed ❌: 0 | Skipped ⏭️: 0 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: 1m 10s

Tests

Test Name Status Flaky Duration
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 12h 30m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 49m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 27m 7s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 5m 3s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 3h 58m


pan-x-c merged commit ad80580 into agentscope-ai:main on Feb 5, 2026
2 checks passed