[New Skill]: Synthetic Data Generator (High-Entropy) #22

### Skill Name

data_engineering/synthetic_generator

### What should this skill do?

**The Problem**: The supply of high-quality, human-written internet text available for training frontier models is running short, making data scarcity a near-term bottleneck. Naively LLM-generated text is a poor substitute: it tends to be low in entropy (repetitive and narrow in coverage), and models trained on it can suffer "model collapse".
**The Solution**: A specialized agent skill that generates high-entropy, well-structured synthetic data designed for fine-tuning other models. In effect, the agent acts as an automated synthetic data pipeline for ML engineers.

**Documentation Requirement**: 
When submitting a Pull Request for this skill, the contributor must provide:
1. A reference card at `docs/skills/synthetic_generator.md` detailing the entropy logic.
2. Updates to `docs/skills/README.md` introducing the `data_engineering` category.
3. Example usage in the `examples/` directory showing an agent looping this skill to build a `.jsonl` fine-tuning dataset.
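
As a rough illustration of item 3, the loop could look something like the sketch below. It assumes the skill is exposed as a plain callable that takes the input object and returns the output object from "Ideal Inputs & Outputs"; `stub_skill` and `build_dataset` are hypothetical names used only for this example, and the stub returns placeholder text rather than real generations.

```python
import itertools
import json
from pathlib import Path
from typing import Callable

_counter = itertools.count()  # only used to make the stub's samples distinct


def stub_skill(request: dict) -> dict:
    """Hypothetical stand-in for the real skill; returns placeholder samples."""
    samples = [
        {
            "instruction": f"{request['domain']} task {next(_counter)}",
            "input": "placeholder input",
            "output": "placeholder output",
        }
        for _ in range(request["num_samples"])
    ]
    return {"samples": samples, "entropy_score": 0.88, "status": "success"}


def build_dataset(
    skill: Callable[[dict], dict],
    request: dict,
    target_size: int,
    path: str,
    max_calls: int = 100,
) -> int:
    """Call the skill repeatedly and write unique samples to a .jsonl file."""
    seen = set()
    written = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for _ in range(max_calls):  # hard cap so a failing skill cannot loop forever
            if written >= target_size:
                break
            result = skill(request)
            if result.get("status") != "success":
                continue
            for sample in result.get("samples", []):
                # Serialize with sorted keys so identical samples compare equal.
                key = json.dumps(sample, sort_keys=True, ensure_ascii=False)
                if key in seen or written >= target_size:
                    continue
                seen.add(key)
                handle.write(key + "\n")  # one JSON object per line
                written += 1
    return written


if __name__ == "__main__":
    request = {
        "domain": "medical_coding_disputes",
        "num_samples": 5,
        "entropy_temperature": 0.9,
        "diversity_prompt": "Ensure edge-case scenarios involving dual-insurance coverage.",
    }
    print(build_dataset(stub_skill, request, target_size=12, path="dataset.jsonl"))
```

Exact-match deduplication is the simplest possible filter; a real pipeline would likely also want near-duplicate detection.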


### Ideal Inputs & Outputs

Input: 
{
  "domain": "medical_coding_disputes",
  "num_samples": 5,
  "entropy_temperature": 0.9,
  "diversity_prompt": "Ensure edge-case scenarios involving dual-insurance coverage."
}
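
One way the skill might validate this input before generating anything is sketched below. The field names come from the example above; the class name `GenerationRequest` and the accepted ranges (at least one sample, temperature between 0.0 and 2.0) are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical typed view of the skill's input object."""

    domain: str
    num_samples: int
    entropy_temperature: float
    diversity_prompt: str = ""

    def __post_init__(self) -> None:
        if not self.domain.strip():
            raise ValueError("domain must be a non-empty string")
        if self.num_samples < 1:
            raise ValueError("num_samples must be at least 1")
        # Assumed range: the proposal does not say which values are valid.
        if not 0.0 <= self.entropy_temperature <= 2.0:
            raise ValueError("entropy_temperature must be between 0.0 and 2.0")


payload = {
    "domain": "medical_coding_disputes",
    "num_samples": 5,
    "entropy_temperature": 0.9,
    "diversity_prompt": "Ensure edge-case scenarios involving dual-insurance coverage.",
}
request = GenerationRequest(**payload)  # unknown keys raise TypeError
```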

Output: 
{
  "samples": [
    {"instruction": "...", "input": "...", "output": "..."},
    {"instruction": "...", "input": "...", "output": "..."}
  ],
  "entropy_score": 0.88,
  "status": "success"
}
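
The proposal leaves the definition of `entropy_score` to the reference card. As one possible reading, the sketch below computes the Shannon entropy of the whitespace-token distribution across all samples and normalizes it by the maximum entropy for the observed vocabulary, which yields a value between 0 and 1. Both the tokenization and the normalization are assumptions, and the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_score(samples: list) -> float:
    """Normalized Shannon entropy of the token distribution, in [0, 1]."""
    tokens = [
        token.lower()
        for sample in samples
        for field in ("instruction", "input", "output")
        for token in sample.get(field, "").split()
    ]
    counts = Counter(tokens)
    if len(counts) < 2:
        return 0.0  # empty or single-token text carries no diversity
    total = len(tokens)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Divide by log2(vocabulary size), the entropy of a uniform distribution.
    return entropy / math.log2(len(counts))
```

This is a crude proxy: it rewards an even spread over the tokens that appear, not a large vocabulary or semantic variety, so an embedding-based or n-gram-based diversity measure may be a better fit in practice.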


### Targeted Models (if applicable)

Model-agnostic (all)
