Autodata

An open-source, hybrid implementation of Meta AI's Autodata. Generates high-quality synthetic training data using Agentic Self-Instruct and Weak-to-Strong gaps.

Mini-Autodata: Agentic Self-Instruct Framework for High-Quality Synthetic Data

Mini-Autodata is a localized, hybrid implementation inspired by Meta AI's Autodata framework. It acts as an autonomous "data scientist," iteratively generating, evaluating, and refining training and benchmark data using an Agentic Self-Instruct pipeline.

Traditional synthetic data often fails to push the boundaries of model capabilities. Mini-Autodata addresses this with a multi-agent architecture (Challenger, Weak Solver, Strong Solver, and Judge). The system curates data strictly by a "Quality Gap": a question is accepted only if a Strong Model (e.g., DeepSeek API) succeeds while a Weak Model (e.g., local llama3.2:1b) fails. This filter favors questions that require deep, context-grounded reasoning rather than general knowledge alone.
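The Quality Gap rule can be illustrated with a minimal Python sketch. This is not the repository's actual code: `Candidate`, `passes_quality_gap`, `exact_match_judge`, and the stub solvers are hypothetical names introduced here, and the string-matching judge is only a stand-in for the LLM-based Judge agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """A generated question with its source context and reference answer (hypothetical type)."""
    context: str
    question: str
    reference_answer: str


# Assumed signatures: a solver maps (context, question) to an answer string,
# and a judge maps (candidate, answer) to True when the answer is correct.
Solver = Callable[[str, str], str]
Judge = Callable[[Candidate, str], bool]


def passes_quality_gap(candidate: Candidate, weak_solver: Solver,
                       strong_solver: Solver, judge: Judge) -> bool:
    """Accept a question only if the strong model succeeds and the weak model fails."""
    weak_ok = judge(candidate, weak_solver(candidate.context, candidate.question))
    strong_ok = judge(candidate, strong_solver(candidate.context, candidate.question))
    return strong_ok and not weak_ok


def exact_match_judge(candidate: Candidate, answer: str) -> bool:
    """Stand-in judge: case-insensitive exact match against the reference answer."""
    return answer.strip().lower() == candidate.reference_answer.strip().lower()


# Stub solvers that replace real model calls for illustration only.
def weak_stub(context: str, question: str) -> str:
    return "I don't know"


def strong_stub(context: str, question: str) -> str:
    return "42 ms"


candidate = Candidate(
    context="The report lists a median latency of 42 ms.",
    question="What median latency does the report list?",
    reference_answer="42 ms",
)
accepted = passes_quality_gap(candidate, weak_stub, strong_stub, exact_match_judge)
print("accepted" if accepted else "rejected")
```

In a real pipeline the two stubs would be replaced by calls to the local model and the remote API, and the judge would be an LLM prompt rather than a string comparison.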

🧠 Weak-vs-Strong Gap Mechanism: Guarantees dataset difficulty by filtering out questions that small models can answer correctly via general knowledge.

⚡ Hybrid Backend Architecture: Prevents OOM crashes and model-swapping latency by routing Strong agent roles to the DS2API, while dedicating local RAM entirely to the Weak Solver (llama3.2:1b).

🛡️ Anti-Cheating Prompt Engineering: Features tightly controlled layer-separated prompts to strictly prevent "Context Leakage" (giving away the answer in the prompt) and "Semantic Cheating".

🔁 Automated Feedback Loop: The Evaluator provides structured, historical feedback to the Challenger agent to autonomously improve question quality across multiple rounds.

💾 Fault Tolerance (Checkpointing): Built-in state-saving prevents data loss during API timeouts or system interruptions.
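The feedback loop and checkpointing features above can be sketched together as follows. This is an assumption-laden illustration, not the project's implementation: the function names, the JSON state layout (`round`, `accepted`, `feedback`), and the `generate`/`evaluate` callables are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text(encoding="utf-8"))
    return {"round": 0, "accepted": [], "feedback": []}


def save_checkpoint(path, state):
    """Write the state to a temp file, then atomically swap it into place."""
    p = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=p.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_name, p)


def run_rounds(path, max_rounds, generate, evaluate):
    """Run generate/evaluate rounds, resuming from the checkpoint if one exists.

    generate(feedback_history) -> question            (hypothetical Challenger call)
    evaluate(question) -> (accepted, feedback_note)   (hypothetical Evaluator call)
    """
    state = load_checkpoint(path)
    while state["round"] < max_rounds:
        # The Challenger sees all earlier feedback when writing the next question.
        question = generate(state["feedback"])
        accepted, note = evaluate(question)
        if accepted:
            state["accepted"].append(question)
        state["feedback"].append(note)
        state["round"] += 1
        # Persist after every round so an interruption loses at most one round.
        save_checkpoint(path, state)
    return state
```

Calling `run_rounds` again with the same checkpoint path continues from the last completed round rather than starting over, which is the behavior the checkpointing feature describes.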

Original blog post (linked from the repository metadata): https://facebookresearch.github.io/RAM/blogs/autodata/

Folders and files:

autodata
scripts
.gitignore
README.md
main.py
sample.md
