Required prerequisites
Motivation
To improve training data diversity, we need a robust synthetic data generation pipeline. The goal is to use the Self-Instruct methodology to automatically create new, high-quality datapoints from a combination of human-provided seed examples and machine-generated content.
We propose introducing a SelfInstructGenerator that supports:
- Few-shot prompting for novel question generation
- Code-based rationale generation
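
To make the request concrete, here is a rough sketch of what such a generator might look like. Everything in it is an assumption for discussion rather than an existing API: the field names (seed_questions, complete, num_human, num_machine), the method names, and the prompt wording are all hypothetical, and `complete` stands in for whatever model backend would actually be used.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SelfInstructGenerator:
    """Hypothetical sketch of the proposed generator, not an existing class."""

    seed_questions: List[str]        # human-provided seed examples
    complete: Callable[[str], str]   # assumed: any prompt -> text callable
    num_human: int = 2               # human examples per few-shot prompt
    num_machine: int = 1             # machine examples per few-shot prompt
    machine_questions: List[str] = field(default_factory=list)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def build_prompt(self) -> str:
        """Mix human and machine examples into one few-shot prompt."""
        human = self.rng.sample(
            self.seed_questions, min(self.num_human, len(self.seed_questions))
        )
        machine = self.rng.sample(
            self.machine_questions,
            min(self.num_machine, len(self.machine_questions)),
        )
        shots = human + machine
        lines = ["Write one new question in the style of the examples."]
        lines += [f"Question {i}: {q}" for i, q in enumerate(shots, 1)]
        lines.append(f"Question {len(shots) + 1}:")
        return "\n".join(lines)

    def generate_question(self) -> str:
        """Generate a new question and add it to the machine-generated pool."""
        question = self.complete(self.build_prompt()).strip()
        self.machine_questions.append(question)
        return question

    def generate_rationale(self, question: str) -> str:
        """Ask the model for code that solves the question (code-based rationale)."""
        prompt = (
            "Write Python code that solves the problem and prints the answer.\n"
            f"Problem: {question}\nCode:"
        )
        return self.complete(prompt).strip()
```

In this sketch, each generated question is fed back into the pool so that later prompts can mix human and machine examples, which is the core loop of Self-Instruct. Filtering near-duplicates and executing the generated code to validate rationales are left out here and would need to be decided separately.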
Solution
No response
Alternatives
No response
Additional context
No response