# Steerling-8B: Inference Quickstart

This notebook demonstrates how to load Steerling-8B and use it for:

- Text generation
- Concept attribution
- Embedding extraction

Requirements: a GPU with ≥18 GB of VRAM (for example A100, A6000, or RTX 4090).
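
The VRAM requirement above can be checked before downloading anything. The following is a minimal sketch, not part of the steerling package: `has_enough_vram` and `check_gpu` are hypothetical helper names, the check treats "18 GB" as 18 GiB, and it looks at total rather than currently free memory.

```python
REQUIRED_GIB = 18  # taken from the requirement stated above (GB treated as GiB, an assumption)


def has_enough_vram(total_bytes, required_gib=REQUIRED_GIB):
    """Return True if a device with total_bytes of memory meets the requirement."""
    return total_bytes >= required_gib * 1024**3


def check_gpu():
    """Describe the first CUDA device, or explain why it cannot be inspected."""
    try:
        import torch  # imported lazily so the helper also runs without torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    props = torch.cuda.get_device_properties(0)
    verdict = "OK" if has_enough_vram(props.total_memory) else "below requirement"
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GiB ({verdict})"


print(check_gpu())
```

Total memory is only an upper bound; other processes sharing the GPU reduce what is actually available.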

## Setup

Install steerling if you haven't already:

```bash
pip install steerling
```


```python
import torch
from steerling import SteerlingGenerator, GenerationConfig
```

## Load Model

The first run downloads ~17 GB from Hugging Face; subsequent runs load from the local cache.

```python
generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b", device="cuda")
print(generator)
```
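
Because the first load downloads roughly 17 GB, it can help to confirm where the weights would be cached and how much disk space is free. The sketch below assumes that steerling stores weights in the standard `huggingface_hub` cache layout, which the source does not confirm. The helper names are hypothetical; the environment variables (`HF_HUB_CACHE`, `HF_HOME`, `XDG_CACHE_HOME`) and the `models--org--name` folder naming follow `huggingface_hub` defaults.

```python
import os
import shutil
from pathlib import Path


def hf_hub_cache_dir():
    """Resolve the default huggingface_hub cache directory from the environment."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME")
    if hf_home is None:
        cache_root = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        hf_home = os.path.join(cache_root, "huggingface")
    return Path(hf_home).expanduser() / "hub"


def repo_cache_path(repo_id):
    """Folder huggingface_hub would use for a model repo (models--org--name)."""
    return hf_hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


def free_gib(path):
    """Free space in GiB on the filesystem holding the nearest existing parent of path."""
    probe = Path(path)
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1024**3


path = repo_cache_path("guidelabs/steerling-8b")
print(f"cache folder: {path}")
print(f"already cached: {path.exists()}, free space: {free_gib(path):.0f} GiB")
```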

## Text Generation
Steerling is a causal diffusion model, so it iteratively unmasks output tokens in any order based on model confidence.
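
To make the idea of confidence-ordered unmasking concrete, here is a toy sketch. It is not Steerling's implementation: `toy_predict` is a hypothetical stand-in that returns a random confidence per position, and `unmask_by_confidence` only illustrates the control flow of committing the most confident masked positions at each step.

```python
import random

MASK = None  # placeholder for a position that has not been filled in yet


def toy_predict(position, rng):
    """Stand-in for a model: propose a token and a confidence for one position."""
    return f"tok{position}", rng.random()


def unmask_by_confidence(length, tokens_per_step=1, seed=0):
    """Fill a fully masked sequence, committing the most confident positions first."""
    rng = random.Random(seed)
    sequence = [MASK] * length
    order = []  # positions in the order they were unmasked
    while any(tok is MASK for tok in sequence):
        masked = [i for i, tok in enumerate(sequence) if tok is MASK]
        # Re-score every still-masked position on each step.
        proposals = {i: toy_predict(i, rng) for i in masked}
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            sequence[i] = proposals[i][0]
            order.append(i)
    return sequence, order


sequence, order = unmask_by_confidence(length=6, tokens_per_step=2, seed=42)
print("unmask order:", order)
print("final sequence:", sequence)
```

The printed order is generally not left to right, which is the point of the illustration. A larger `tokens_per_step` finishes in fewer steps at the cost of committing several positions from the same round of predictions.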

```python
# Basic generation
text = generator.generate(
    "The key to understanding",
    GenerationConfig(max_new_tokens=50, seed=42),
)
print(text)
```

```python
# Generation with custom parameters
config = GenerationConfig(
    max_new_tokens=100,
    seed=123,
    top_p=0.9,
    repetition_penalty=1.2,
    use_entropy_sampling=True,  # adaptive temperature based on model uncertainty
)
text = generator.generate("Artificial intelligence will", config)
print(text)
```
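
For readers unfamiliar with `top_p`, the sketch below shows nucleus filtering as it is commonly defined: keep the smallest set of highest-probability tokens whose cumulative probability reaches the threshold, then renormalize. `top_p_filter` is a hypothetical helper written for illustration; how Steerling applies the threshold internally is not shown in this notebook.

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, top_p=0.9))
```

With these example probabilities, the lowest-probability token is dropped and the remaining three are rescaled. Lower thresholds make sampling more conservative.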

## Full Generation Output

Use `generate_full` to get a structured output containing the generated text, the token sequence, and the prompt and generated token counts.

```python
output = generator.generate_full(
    "The future of renewable energy",
    GenerationConfig(max_new_tokens=50, seed=42),
)
print(f"Text: {output.text}")
print(f"Prompt tokens: {output.prompt_tokens}")
print(f"Generated tokens: {output.generated_tokens}")
print(f"Total tokens: {len(output.tokens)}")
```

## Generation Parameters Reference

| Parameter              | Default | Description                                                 |
|------------------------|---------|-------------------------------------------------------------|
| `max_new_tokens`       | 100     | Maximum tokens to generate                                  |
| `seed`                 | None    | Random seed for reproducibility                             |
| `temperature`          | 1.0     | Sampling temperature (overridden by entropy sampling)       |
| `top_p`                | 0.9     | Nucleus sampling threshold                                  |
| `top_k`                | None    | Top-k filtering                                             |
| `tokens_per_step`      | 1       | Tokens to unmask per step                                   |
| `use_entropy_sampling` | True    | Adaptive temperature (0.3–0.7) based on model uncertainty   |
| `repetition_penalty`   | 1.2     | Penalty for repeated tokens                                 |
| `steer_known`          | None    | Dict of `{concept_id: weight}` for known concept steering   |
| `steer_unknown`        | None    | Dict of `{concept_id: weight}` for unknown concept steering |
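
As an illustration of what a `repetition_penalty` of 1.2 can mean in practice, the sketch below uses the formulation found in several generation libraries: logits of already-seen tokens are divided by the penalty when positive and multiplied by it when negative. Whether Steerling uses exactly this rule is an assumption, and `apply_repetition_penalty` is a hypothetical helper, not a steerling function.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Make previously seen tokens less likely by shrinking their logits."""
    adjusted = list(logits)  # copy so the caller's list is left untouched
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive logit or multiplying a negative one both lower it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.4, -1.0, 0.5]  # one score per vocabulary entry (toy values)
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=1.2))
```

A penalty of 1.0 leaves the logits unchanged; larger values discourage repetition more strongly.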