Conversation

@KE7
Owner

@KE7 KE7 commented Jul 9, 2025

No description provided.

@KE7 KE7 force-pushed the improve-qa-qual branch from 4bbdcdd to b6ae2ac Compare July 9, 2025 01:03
KE7 and others added 7 commits July 24, 2025 16:31
…checkpointing

🚀 Major improvements to dataset generation pipeline:

✅ QA Parallel Processing (qa_workers):
- Implement ThreadPoolExecutor.map() for order-preserving parallelization
- Thread-safe image indexing with unique IDs
- Automatic fallback to sequential when qa_workers=1
- 2-4x speedup for ground truth scenarios, no overhead for GPU inference
- Comprehensive testing: verified identical output between parallel/sequential
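The order-preserving behaviour described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `generate_qa` and `run_qa` are hypothetical names, and the per-image work is a placeholder. It relies only on the documented property that `ThreadPoolExecutor.map()` yields results in input order, regardless of which thread finishes first.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_qa(item):
    """Hypothetical per-image QA step; stands in for the real question generation."""
    image_id, image = item
    return image_id, f"question about {image}"


def run_qa(images, qa_workers=1):
    # Assign a unique ID to every image up front, so worker threads never
    # share a mutable counter.
    items = list(enumerate(images))

    # Sequential fallback: no thread pool is created when qa_workers is 1.
    if qa_workers <= 1:
        return [generate_qa(item) for item in items]

    # Executor.map() returns results in the order of the inputs,
    # so question-image correspondence is preserved.
    with ThreadPoolExecutor(max_workers=qa_workers) as pool:
        return list(pool.map(generate_qa, items))


sequential = run_qa(["a.jpg", "b.jpg", "c.jpg"], qa_workers=1)
parallel = run_qa(["a.jpg", "b.jpg", "c.jpg"], qa_workers=4)
```

Comparing `sequential` and `parallel` for equality is one simple way to check the "identical output" claim on a small sample.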

✅ Simplified Logging System:
- Single GRAID_DEBUG_VERBOSE env var controls console debug output
- Debug messages always go to log files (for troubleshooting)
- Timestamped log files: graid_YYYYMMDD_HHMM.log
- Cleaned up complex logging logic
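One way such a logging setup might look, using only the standard `logging` module. This is a sketch under assumptions: the function name `setup_logging`, the logger name, the `logs` directory, and the set of values that count as "enabled" for `GRAID_DEBUG_VERBOSE` are all hypothetical; only the env var name and the `graid_YYYYMMDD_HHMM.log` pattern come from the description above.

```python
import logging
import os
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir="logs", now=None):
    """Hypothetical helper: DEBUG always goes to a file, console is gated by an env var."""
    now = now or datetime.now()
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    # Timestamped file name following the graid_YYYYMMDD_HHMM.log pattern.
    log_path = Path(log_dir) / f"graid_{now:%Y%m%d_%H%M}.log"

    logger = logging.getLogger("graid_sketch")  # assumed logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated setup

    # The file handler always records debug messages for troubleshooting.
    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    # The console shows debug output only when GRAID_DEBUG_VERBOSE is set.
    # Which values count as "on" is an assumption of this sketch.
    verbose = os.environ.get("GRAID_DEBUG_VERBOSE", "").lower() in {"1", "true", "yes"}
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.addHandler(console_handler)

    return logger, log_path
```

Keeping the logger itself at `DEBUG` and filtering per handler is what lets the file and the console differ in verbosity.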

✅ Robust Checkpointing:
- Save/resume functionality via save_steps parameter
- Automatic checkpoint cleanup on successful completion
- Force restart capability (force parameter)
- Crash recovery for large dataset generation
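The save/resume cycle above could be sketched like this. It is an illustration only: the JSON checkpoint format, the function name, and the `fail_at` argument (used here to simulate a crash) are hypothetical; `save_steps` and `force` mirror the parameter names in the description.

```python
import json
from pathlib import Path


def process_with_checkpoints(items, checkpoint_path, save_steps=2,
                             force=False, fail_at=None):
    """Hypothetical loop that checkpoints every `save_steps` items and can resume."""
    ckpt = Path(checkpoint_path)

    # force=True discards any previous progress and restarts from scratch.
    if force and ckpt.exists():
        ckpt.unlink()

    # Resume from the saved results if a checkpoint is present.
    results = json.loads(ckpt.read_text()) if ckpt.exists() else []

    for i in range(len(results), len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")  # test hook, not real behaviour
        results.append(items[i] * 2)  # placeholder for the real per-item work
        if (i + 1) % save_steps == 0:
            ckpt.write_text(json.dumps(results))

    # Successful completion: remove the checkpoint automatically.
    ckpt.unlink(missing_ok=True)
    return results
```

After a crash, calling the function again with the same `checkpoint_path` redoes only the items processed since the last save. A production version would likely write to a temporary file and rename it, so that a crash during the write cannot corrupt the checkpoint.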

✅ Enhanced Configuration:
- Added force, save_steps, use_original_filenames, filename_prefix parameters
- CLI arguments now properly override config file values
- Maintains backward compatibility
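A common way to make CLI arguments override config file values is to give every argparse option a default of `None` and merge only the options the user actually passed. The sketch below assumes this approach; the flag spellings, the default values, and the `resolve_config` helper are hypothetical, and only the parameter names come from the list above.

```python
import argparse

# Assumed defaults for illustration; the real defaults are not stated in this PR.
DEFAULTS = {
    "force": False,
    "save_steps": 50,
    "use_original_filenames": True,
    "filename_prefix": "img",
}


def resolve_config(file_config, argv):
    """Hypothetical merge with precedence: defaults < config file < CLI."""
    parser = argparse.ArgumentParser()
    # default=None marks "not given on the command line".
    parser.add_argument("--force", action="store_true", default=None)
    parser.add_argument("--save-steps", type=int, default=None)
    parser.add_argument("--filename-prefix", default=None)
    args = vars(parser.parse_args(argv))

    # Keep only options the user actually supplied.
    cli_overrides = {k: v for k, v in args.items() if v is not None}
    return {**DEFAULTS, **file_config, **cli_overrides}


config = resolve_config(
    {"save_steps": 100, "filename_prefix": "bdd"},  # values from a config file
    ["--save-steps", "10"],                         # values from the CLI
)
```

Here `save_steps` comes from the CLI, `filename_prefix` from the config file, and the remaining keys from the defaults, so existing config files keep working unchanged.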

🧪 Verified Features:
- Parallel QA generates results identical to sequential (100% match)
- Order preservation maintained across all scenarios
- Question-image correspondence preserved
- Profiling and timing aggregation works across threads
- Debug logging works correctly (both console and file)

All changes maintain full backward compatibility and existing functionality.
@KE7 KE7 merged commit 873d560 into main Aug 19, 2025
0 of 2 checks passed
2 participants