
[New Skill]: Novelty Extractor (Data Distillation) #24

@rosspeili

Description


Skill Name

optimization/novelty_extractor

What should this skill do?

The Problem: "Information slop" makes model training needlessly expensive and slow. Throwing massive, unfiltered datasets at a model wastes tokens on redundant, low-value information.
The Solution: An AI curation skill that scans a large dataset, compares its semantic vectors against a baseline corpus, and extracts only the small fraction (on the order of 1%) that carries novel, high-learning-value content. It acts as data-distillation middleware, ensuring models train on pure signal rather than noise.
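The core idea can be sketched as a similarity filter: embed each chunk, drop anything too close to the baseline (or to chunks already kept). The sketch below is a hypothetical illustration only; it uses a toy bag-of-words vectorizer in place of a real sentence-embedding model, and the function names (`embed`, `extract_novel`) are not part of any existing skill.

```python
# Toy sketch of novelty extraction via similarity thresholding.
# A real implementation would swap `embed` for a sentence-embedding model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for a semantic embedding: lowercase word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_novel(chunks, baseline, novelty_threshold=0.85):
    """Keep chunks whose similarity to the baseline (and to chunks
    already kept) stays below the threshold; count the rest as dropped."""
    kept, kept_vecs = [], [embed(b) for b in baseline]
    dropped = 0
    for chunk in chunks:
        v = embed(chunk)
        if any(cosine(v, kv) >= novelty_threshold for kv in kept_vecs):
            dropped += 1          # redundant: too similar to known content
        else:
            kept.append(chunk)    # novel: below the similarity threshold
            kept_vecs.append(v)
    return kept, dropped
```

Note the design choice: keeping each accepted chunk in the comparison set also deduplicates the input against itself, not just against the baseline.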

Documentation Requirement:
When submitting a Pull Request for this skill, the contributor must provide:

  1. A reference card at docs/skills/novelty_extractor.md detailing the threshold heuristics.
  2. Updates to docs/skills/README.md listing this skill under the optimization category.
  3. Example usage in examples/ showing how to pipe a large text corpus through this skill.

Ideal Inputs & Outputs

Input:
{
  "dataset_chunk": "[10,000 words of raw forum data]",
  "novelty_threshold": 0.85
}

Output:
{
  "distilled_content": "[150 words of unique, high-value assertions]",
  "compression_ratio": "98.5%",
  "redundant_chunks_dropped": 42
}
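A wrapper honoring this I/O contract might look like the sketch below. It is hypothetical: the field names follow the example payloads above, the chunking rule (split on blank lines) is an assumption, and the distillation step is stubbed out as exact-duplicate removal (the threshold is accepted but unused in the stub).

```python
def run_skill(payload: dict) -> dict:
    """Hypothetical wrapper matching the skill's input/output contract."""
    chunks = payload["dataset_chunk"].split("\n\n")   # naive chunking assumption
    threshold = payload.get("novelty_threshold", 0.85)  # unused in this stub
    # Stub distillation: keep only the first occurrence of each chunk.
    seen, kept, dropped = set(), [], 0
    for c in chunks:
        if c in seen:
            dropped += 1
        else:
            seen.add(c)
            kept.append(c)
    distilled = "\n\n".join(kept)
    in_words = len(payload["dataset_chunk"].split())
    out_words = len(distilled.split())
    ratio = 100.0 * (1 - out_words / in_words) if in_words else 0.0
    return {
        "distilled_content": distilled,
        "compression_ratio": f"{ratio:.1f}%",
        "redundant_chunks_dropped": dropped,
    }
```

Reporting the compression ratio as a percentage of words removed keeps the output consistent with the example above, where a 10,000-word input distilled to 150 words yields 98.5%.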

Targeted Models (if applicable)

Model Agnostic (All)

Metadata

Labels

enhancement (New feature or request), skill request (Request for a new capability to be added)
