## [Open in Colab][colab]

[colab]: https://colab.research.google.com/drive/1pjFGMsk_25DQBDQ1budvLsNQx_Dbk73B?usp=sharing

# **What are Model Settings**

* They are configuration options that control how the LLM (large language model) behaves when the Agent uses it. ([OpenAI GitHub][1])
* Not every model supports all settings — capabilities vary. ([OpenAI GitHub][1])
* They may affect things like randomness, response length, verbosity, use of tools, etc. ([OpenAI GitHub][1])

<br>

---

<br>


## **Key Settings and What They Do**

Here are the important fields in **ModelSettings** and what each one means:

| Setting                                                       | Type / Possible Values                | What It Controls                                                                                                                                                         | Typical Use Cases / Effects |
| ------------------------------------------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------- |
| **temperature**                                               | float                                 | How "random" or "creative" responses are. Lower → more deterministic, higher → more variation. ([OpenAI GitHub][1])                                                      |                             |
| **top\_p**                                                    | float                                 | Nucleus sampling: restricts sampling to the smallest set of top tokens whose cumulative probability reaches top\_p. Another way to control randomness. ([OpenAI GitHub][1]) |                             |
| **frequency\_penalty**                                        | float                                 | Penalizes repeating the same content (words / phrases) in the output. Helps reduce repetition. ([OpenAI GitHub][1])                                                      |                             |
| **presence\_penalty**                                         | float                                 | Encourages introducing new topics (less about repeating topics already used). ([OpenAI GitHub][1])                                                                       |                             |
| **tool\_choice**                                              | enum / specialized type               | Which strategy or policy the agent uses for choosing tools. (e.g. whether to use a tool for certain tasks). ([OpenAI GitHub][1])                                         |                             |
| **parallel\_tool\_calls**                                     | bool                                  | Whether the agent can invoke multiple tools in parallel during a single turn, or is restricted to one at a time. ([OpenAI GitHub][1])                                    |                             |
| **truncation**                                                | `"auto"` or `"disabled"`              | What happens when the input is too long for the context window: `"auto"` lets the API drop conversation items to fit, `"disabled"` makes the request fail instead. ([OpenAI GitHub][1]) |                             |
| **max\_tokens**                                               | int                                   | Maximum number of output tokens to generate. Caps how long agent responses can be. ([OpenAI GitHub][1])                                                                  |                             |
| **reasoning**                                                 | object / enum                         | Settings specific for reasoning models (for example how much reasoning effort to apply). ([OpenAI GitHub][2])                                                            |                             |
| **verbosity**                                                 | literal `"low"`, `"medium"`, `"high"` | How detailed / wordy the responses should be. ([OpenAI GitHub][1])                                                                                                       |                             |
| **metadata**                                                  | dict\[str, str]                       | Extra info you attach to the request/response (for logging, tracking, etc.). Doesn’t change content but helpful operationally. ([OpenAI GitHub][1])                      |                             |
| **store**                                                     | bool                                  | Whether to store the generated output for later retrieval (e.g. for analytics, retracing). ([OpenAI GitHub][1])                                                          |                             |
| **include\_usage**                                            | bool                                  | Whether to return usage information (like token usage) in the response. Useful for cost monitoring. ([OpenAI GitHub][1])                                                 |                             |
| **response\_include**                                         | list                                  | Other fields to include in the response (beyond output\_text), such as log probabilities. ([OpenAI GitHub][1])                                                           |                             |
| **top\_logprobs**                                             | int                                   | Number of top probable tokens to return with their log probabilities. Good for analyzing uncertainties. ([OpenAI GitHub][1])                                             |                             |
| **extra\_query / extra\_body / extra\_headers / extra\_args** | Various                               | These are more advanced: custom fields or headers sent to the LLM API. They allow tweaking behaviour not covered by standard fields. Use with care. ([OpenAI GitHub][1]) |                             |

<br>

---

<br>


## **How They Interact / How to Pick Them**

Here are some guidelines for choosing settings:

* **Determinism vs Creativity**
  If you want predictable, safe responses → low temperature, maybe lower top\_p. If you want imaginative or varied content → higher temperature or looser constraints.

* **Length Control**
  Use `max_tokens` to limit response size. If inputs (context or prompts) are long, you may also need `truncation` to ensure things don’t break.

* **Tool Use Behavior**
  If your agent needs to call tools (e.g. search, API calls), `parallel_tool_calls` matters: enabling parallel calls can speed things up, but risks overlapping or conflicting tool outputs. `tool_choice` controls whether a tool must be used and, optionally, *which* one.

* **Reasoning Models**
  For models optimized for reasoning (e.g. in GPT-5 families), there are extra settings like `reasoning.effort` or preset verbosity defaults. These models may also have special behaviour, so defaults are tuned accordingly. ([OpenAI GitHub][2])

* **Verbosity**
  If responses are too long / too much detail, reduce verbosity. If you want thorough explanations, set verbosity higher.

* **Monitoring and Debugging**
  Use `metadata`, `include_usage`, `top_logprobs`, etc., to get more visibility into what the model is doing (cost, confidence, where it's uncertain).

<br>

---

<br>

## **Example Configurations**

Here are two sample setups to illustrate:

1. **Customer Support Bot (Safe & Consistent)**

   * temperature = 0.2
   * top\_p = 0.9
   * max\_tokens = 200
   * presence\_penalty = 0.0
   * frequency\_penalty = 0.5
   * verbosity = “low”

   *Result*: The agent gives polite, consistent replies, avoids repeating itself, keeps answers concise.

2. **Creative Storytelling Agent**

   * temperature = 0.9
   * top\_p = 0.95
   * presence\_penalty = 0.2
   * frequency\_penalty = 0.0
   * max\_tokens = 800
   * verbosity = “high”

   *Result*: More imaginative output, more variety, longer responses, less concern for precision / terseness.
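The two setups above can be written down as plain dictionaries and sanity-checked before use. This is a minimal sketch: the names `SUPPORT_BOT`, `STORYTELLER`, `RANGES`, and `check_preset` are hypothetical, and the ranges are the ones quoted in these notes. The keys mirror the `ModelSettings` field names, so, assuming the Agents SDK is installed, each dict could be unpacked as `ModelSettings(**preset)`.

```python
# Plain-dict versions of the two sample setups; keys mirror ModelSettings fields.
SUPPORT_BOT = {
    "temperature": 0.2,
    "top_p": 0.9,
    "max_tokens": 200,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.5,
    "verbosity": "low",
}

STORYTELLER = {
    "temperature": 0.9,
    "top_p": 0.95,
    "presence_penalty": 0.2,
    "frequency_penalty": 0.0,
    "max_tokens": 800,
    "verbosity": "high",
}

# Inclusive (low, high) ranges as quoted in these notes.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_preset(preset: dict) -> list:
    """Return a list of problems; an empty list means the preset looks sane."""
    problems = []
    for key, (low, high) in RANGES.items():
        value = preset.get(key)
        if value is not None and not (low <= value <= high):
            problems.append(f"{key}={value} is outside [{low}, {high}]")
    if preset.get("max_tokens") is not None and preset["max_tokens"] <= 0:
        problems.append("max_tokens must be positive")
    if preset.get("verbosity") not in (None, "low", "medium", "high"):
        problems.append("verbosity must be 'low', 'medium', or 'high'")
    return problems


print(check_preset(SUPPORT_BOT))
print(check_preset(STORYTELLER))
```

Checking presets up front catches typos such as `temperature=20` before they reach the API.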


[1]: https://openai.github.io/openai-agents-python/ref/model_settings/ "Model settings - OpenAI Agents SDK"
[2]: https://openai.github.io/openai-agents-js/guides/models/ "Models | OpenAI Agents SDK - GitHub Pages"


# **What is temperature?**

* **Definition**: A setting that controls how **random** or **creative** a model’s responses are.
* **Range**: `0.0` to `2.0` (most useful range is `0.0`–`1.5`).
* **Default**: Usually around `1.0`.

<br>

---

<br>

## **How it Works**

* At each step, the model predicts the next token (word or piece of text).
* The temperature adjusts how “flat” or “sharp” the probability distribution is:

  * **Low temperature (close to 0)** → the model picks the *most likely* words. Very deterministic, focused.
  * **High temperature (closer to 1–2)** → the model allows *less likely* words to be chosen. More variety, creativity, and sometimes randomness.
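The flattening and sharpening described above is usually implemented by dividing the model's raw scores (logits) by the temperature before the softmax. The sketch below shows that standard technique; the function name and the three example logits are made up for illustration and do not come from any particular model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the low temperature almost all probability lands on the top-scoring token; at the high temperature the three options move closer together.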

<br>

---

<br>

## **Behavior Examples**

| Temperature | Behavior                          | Example Prompt: “Write a sentence about the moon.”     |
| ----------- | --------------------------------- | ------------------------------------------------------ |
| **0.0**     | Deterministic, factual, stable    | “The moon is Earth’s natural satellite.”               |
| **0.3**     | Slightly varied but still safe    | “The moon orbits the Earth and reflects sunlight.”     |
| **0.7**     | Creative, more diverse outputs    | “Under the moon’s silver glow, the night feels alive.” |
| **1.2**     | Very imaginative, sometimes messy | “The moon whispers secret dreams to the ocean tides.”  |
| **2.0**     | Chaotic, often incoherent         | “Moon sparkles jellyfish grammar running sideways.”    |

<br>

---

<br>

## **When to Use What**

* **Low temperature (0–0.3)**

  * Use for: Customer support, coding, factual Q\&A, legal/medical contexts.
  * Goal: Consistency, accuracy, reliability.

* **Medium temperature (0.5–0.8)**

  * Use for: Balanced outputs, like explanations, brainstorming, or tutoring.
  * Goal: Some creativity but still grounded.

* **High temperature (0.9–1.5+)**

  * Use for: Poetry, storytelling, creative writing, marketing copy.
  * Goal: Variety, originality, risk-taking.

---

**In short:**

* **Low = predictable & safe**
* **High = diverse & creative**


## **What is top-k?**

* **Definition**: The model only considers the **top K most likely next words** (tokens), and ignores all others.
* Then it picks among those K words, weighted by their probabilities.
* **Example**:

  * Suppose the model predicts next word probabilities:

    * "cat" → 40%
    * "dog" → 30%
    * "bird" → 20%
    * "car" → 5%
    * "house" → 5%
  * If **k = 2**, the model only keeps *"cat"* and *"dog"*, ignores the rest.

👉 **Smaller k = safer, more repetitive**.
👉 **Larger k = more diverse, more creative**.
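A minimal sketch of top-k filtering on the example distribution above. The helper name `top_k_filter` is hypothetical; it keeps the k most likely tokens and renormalizes them so they can be sampled from.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep the k most likely tokens and renormalize so they sum to 1."""
    if k <= 0:
        raise ValueError("k must be at least 1")
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"cat": 0.40, "dog": 0.30, "bird": 0.20, "car": 0.05, "house": 0.05}
filtered = top_k_filter(probs, k=2)
print(filtered)  # only "cat" and "dog" remain, rescaled to sum to 1
```

After filtering, "cat" and "dog" share all of the probability in a 4:3 ratio, which is why a small k makes outputs safer but more repetitive.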

<br>

---

<br>


## **What is top-p (a.k.a. nucleus sampling)?**

* **Definition**: Instead of a fixed number of words, the model chooses from the **smallest set of words whose total probability ≥ p**.
* **p** is between `0.0` and `1.0`.
* **Example**:

  * Same probabilities: cat (40%), dog (30%), bird (20%), car (5%), house (5%).
  * If **p = 0.8**, the model includes words until cumulative probability ≥ 80%.

    * That gives: *cat + dog* (40% + 30% = 70%) → still less than 0.8
    * Add *bird* (20%) → total = 90% ≥ 0.8 ✅
    * So possible choices: *cat, dog, bird*.
  * If **p = 0.95**, it also includes *car* (90% + 5% = 95% ≥ 0.95).

👉 **Lower p (e.g. 0.5)** = only top words allowed, very focused.
👉 **Higher p (e.g. 0.95)** = more variety.
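The worked example above can be reproduced with a short nucleus-sampling filter. This is a sketch, not any library's implementation: `top_p_filter` is a hypothetical helper, and the small tolerance is only there to absorb floating-point rounding in the running sum.

```python
def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    kept = {}
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p - 1e-12:  # tolerance for floating-point sums
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


probs = {"cat": 0.40, "dog": 0.30, "bird": 0.20, "car": 0.05, "house": 0.05}
print(list(top_p_filter(probs, 0.8)))   # ['cat', 'dog', 'bird']
print(list(top_p_filter(probs, 0.95)))  # ['cat', 'dog', 'bird', 'car']
```

Unlike top-k, the number of surviving tokens depends on how the probability mass is spread, so a confident distribution keeps fewer candidates.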

<br>

---

<br>

## **Side-by-Side Comparison**

| Aspect      | **Top-k**                         | **Top-p**                                        |
| ----------- | --------------------------------- | ------------------------------------------------ |
| Limit style | Fixed number of choices           | Dynamic set based on probability mass            |
| Parameter   | k (integer, e.g. 10, 50)          | p (float 0–1, e.g. 0.9, 0.95)                    |
| Behavior    | Always keeps *exactly k* options  | Keeps a *probability-based cutoff*               |
| Use case    | Easy to understand, simple cutoff | Smarter, adapts to context probabilities         |
| Example     | k=3 → always 3 words considered   | p=0.9 → may be 2 words in one case, 5 in another |

---

## **How They’re Used in Practice**

* **Top-k** = often used in earlier models, like GPT-2. Easy but rigid. Note that the `ModelSettings` fields listed earlier include `top_p` but no `top_k`.
* **Top-p** = more common in modern LLMs (like GPT-3/4/5). Gives more natural variation.
* They can be **combined**:

  * e.g., `top_k = 50`, `top_p = 0.9` → filter to the top 50, then further restrict to probability mass 90%.

<br>

---

<br>

**Quick analogy:**

* **Top-k** = “I’ll only choose from the top 5 restaurants, no matter what.”
* **Top-p** = “I’ll choose from enough restaurants to cover 90% of the best options — sometimes that’s 2, sometimes 10.”


## **Frequency Penalty**

* **What it does**: Penalizes words that the model has already used **a lot** in the response.
* **Effect**: Reduces **repetition** of the same word/phrase.
* **Range**: `-2.0` to `+2.0` (default = `0`).

  * Higher values = stronger penalty against repeated words.

**Example** (Prompt: *“Write a short poem about rain.”*):

* **Penalty = 0.0** →
  “The rain falls softly, rain falls slow, rain falls gently on the meadow.”
  *(lots of repetition)*
* **Penalty = 1.0** →
  “Softly it drifts, a silver veil across the meadow’s quiet trail.”
  *(less repetition, more variety)*

👉 Use when you want cleaner text without the model looping or repeating phrases.

<br>

---

<br>

## **Presence Penalty**

* **What it does**: Penalizes words that have already been mentioned **at least once**.
* **Effect**: Encourages the model to **introduce new topics** instead of staying on the same subject.
* **Range**: `-2.0` to `+2.0` (default = `0`).

  * Higher values = stronger push to bring in new words.

**Example** (Prompt: *“Tell me something about animals.”*):

* **Penalty = 0.0** →
  “Dogs are loyal animals. Cats are also popular animals. Many animals live in the wild.”
  *(keeps repeating “animals”)*
* **Penalty = 1.0** →
  “Dogs are loyal companions. Cats are independent. Elephants roam the savannah, while dolphins play in the sea.”
  *(introduces more variety, avoids reusing “animals”)*

👉 Use when you want broader exploration or brainstorming.


## Side-by-Side

| Feature        | **Frequency Penalty**               | **Presence Penalty**                              |
| -------------- | ----------------------------------- | ------------------------------------------------- |
| Focus          | *How many times* a word is repeated | *Whether* a word has already appeared             |
| Goal           | Reduce overuse of the same words    | Push model to explore new topics                  |
| Example effect | Avoids loops like “rain rain rain”  | Avoids repeating concepts like “animals, animals” |
| Good for       | Cleaner text, less redundancy       | Brainstorming, topic diversity                    |
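The difference between the two penalties is easiest to see in the logit adjustment. The sketch below follows the formula OpenAI has documented for its API: a token's logit is reduced by `count * frequency_penalty`, plus `presence_penalty` once if the token has appeared at all. The function name and the example scores are made up for illustration.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logit of every token that has already been generated."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        frequency_term = count * frequency_penalty            # grows with each repeat
        presence_term = presence_penalty if count > 0 else 0.0  # flat, applied once
        adjusted[token] = logit - frequency_term - presence_term
    return adjusted


logits = {"rain": 3.0, "falls": 2.0, "meadow": 1.0}  # hypothetical next-token scores
history = ["rain", "falls", "rain", "rain"]
print(apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.25))
```

Here "rain" (used three times) loses 1.75, "falls" (used once) loses 0.75, and the unused "meadow" is untouched, so fresh words become relatively more likely.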

<br>

---

<br>

## When to Use

* **Use Frequency Penalty**:

  * If model keeps repeating words/phrases (e.g., in long outputs).
* **Use Presence Penalty**:

  * If model is stuck talking about the *same subject* instead of moving to new ideas.

---

👉 In short:

* **Frequency penalty = don’t say the same word too many times.**
* **Presence penalty = don’t keep talking about the same thing.**


# **Max Tokens (max\_tokens)**

* **What it is**: A limit on **how many tokens** the model can generate in its response.
* **Tokens** = chunks of text (≈ 1 token = \~4 characters in English, \~¾ of a word).

<br>

---

<br>

## **Why it matters**

* Prevents **too long** outputs.
* Helps **control cost** (since usage is billed per token).
* Ensures responses **fit inside context window** (prompt + output).
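A rough sketch of how the prompt, `max_tokens`, and the context window fit together. It uses the ~4 characters per token rule of thumb from above, which is only an estimate (a real tokenizer gives exact counts). The helper names and the context window sizes are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, -(-len(text) // 4))  # ceiling division, at least 1


def output_budget(prompt: str, context_window: int, max_tokens: int) -> int:
    """Tokens actually available for the answer once the prompt is accounted for."""
    remaining = context_window - estimate_tokens(prompt)
    return max(0, min(max_tokens, remaining))


prompt = "Write a story about a dragon."
print(estimate_tokens(prompt))
print(output_budget(prompt, context_window=8192, max_tokens=200))  # prompt is small: full 200
print(output_budget("x" * 400, context_window=150, max_tokens=200))  # window is the limit
```

When the prompt is long relative to the context window, the window rather than `max_tokens` becomes the effective cap on the answer.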

<br>

---

<br>

## **Example**

Prompt: *“Write a story about a dragon.”*

* **max\_tokens = 20** →
  “Once there was a dragon who lived in a cave near the mountains.” *(short; generation stops at the limit, so a longer story would be cut off mid-sentence)*

* **max\_tokens = 200** →
  Room for a longer, more detailed story with setting, characters, and events.

<br>

---

<br>

👉 In short:
**max\_tokens = maximum length of the model’s answer.**



# **Reasoning in ModelSettings**

The **`Reasoning`** class is a special set of options that apply **only to reasoning models** (like the `o1` family, sometimes called **o-series**).


These models are designed to handle **multi-step reasoning**, problem-solving, and structured thought more explicitly than regular models.

Here’s the breakdown of its fields:

<br>

---

<br>

## 1. `effort`

```python
effort: Optional[ReasoningEffort] = None
```

* **What it is**: Controls how much “reasoning effort” the model applies before giving an answer.
* **Values**:

  * `"low"` → shorter reasoning, faster responses, fewer tokens used in hidden reasoning steps.
  * `"medium"` → balanced (default-like behavior).
  * `"high"` → deeper reasoning, longer “thinking”, potentially more accurate on complex tasks, but also slower & more expensive.

**Analogy**:

* Low effort = giving a “quick guess”
* Medium effort = giving a “careful explanation”
* High effort = “writing out a detailed solution like a math teacher”

**When to use**:

* **low** → quick checks, simple lookups, latency-sensitive tasks.
* **medium** → general-purpose reasoning (most tasks).
* **high** → math, logic puzzles, code correctness, deep analysis.

<br>

---

<br>

## 2. `generate_summary` (deprecated)

```python
generate_summary: Optional[Literal["auto", "concise", "detailed"]] = None
```

* **Deprecated**: use `summary` instead.
* This was an older way to tell the model whether to output a summary of its reasoning process.

---

## 3. `summary`

```python
summary: Optional[Literal["auto", "concise", "detailed"]] = None
```

* **What it is**: Lets you request a **summary of the reasoning process** that the model used internally.
* **Values**:

  * `"auto"` → model decides how much reasoning summary to show.
  * `"concise"` → short summary of reasoning.
  * `"detailed"` → longer, more step-by-step summary.

**Why useful**:

* Debugging: you can see *how* the model arrived at an answer.
* Transparency: helps users trust the response.
* Teaching: useful in tutoring apps, code explanation, or logic-heavy problems.

<br>

---

<br>

### **Side-by-Side**

| Field              | Purpose                              | Values                              | When to Use                         |
| ------------------ | ------------------------------------ | ----------------------------------- | ----------------------------------- |
| `effort`           | Controls depth of reasoning          | `"low"`, `"medium"`, `"high"`       | Trade-off between speed vs accuracy |
| `generate_summary` | (Old) request reasoning summary      | `"auto"`, `"concise"`, `"detailed"` | Use `summary` instead               |
| `summary`          | New way to request reasoning summary | `"auto"`, `"concise"`, `"detailed"` | Debugging, teaching, transparency   |

---

# **Example Usage**

```python
from agents import Agent, ModelSettings    # OpenAI Agents SDK (pip install openai-agents)
from openai.types.shared import Reasoning

settings = ModelSettings(
    reasoning=Reasoning(
        effort="high",        # ask the model to reason more deeply
        summary="concise",    # get a short explanation of reasoning
    )
)

agent = Agent(
    name="Reasoning agent",  # Agent requires a name
    model="gpt-5",           # example name; use any reasoning-capable model
    model_settings=settings,
)
```

**Effect**:

* The agent will try to spend more effort reasoning through tasks.
* It will also provide a short summary of *how* it reached its answer.

---

👉 In short:

* **`effort` = how hard the model thinks**
* **`summary` = how much of that thinking you want to see**





# **Metadata**

#### **What it is**

* `metadata` is a **dictionary (key–value pairs)** that you can attach to a model request.
* It **does not affect the model’s answer** — it’s purely for your **own tracking, logging, or analytics**.

<br>

---

<br>

## **Why it’s Useful**

* Tag requests with **extra info** (e.g., user IDs, experiment names, session IDs).
* Helps with **debugging** (know which request produced which output).
* Useful for **analytics** (e.g., cost tracking, A/B testing, auditing).

<br>

---

<br>

## **Example**

```python
from agents import ModelSettings  # OpenAI Agents SDK (pip install openai-agents)

settings = ModelSettings(
    metadata={
        "user_id": "12345",
        "session": "abc-xyz",
        "experiment": "prompt_v2"
    }
)
```

* Here, every response will carry this metadata, so you know which **user/session/experiment** the response belongs to.
* The **model ignores this info** — it’s just returned back to you.

<br>

---

<br>

## **Key Points**

* Type: `dict[str, str]` (string → string).
* Optional (you can leave it empty).
* Best practice: keep values **short and descriptive**.
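Because `metadata` must be a string-to-string mapping, it can help to validate it before attaching it to a request. The sketch below is a hypothetical helper, not part of the SDK. The size limits (16 pairs, 64-character keys, 512-character values) are the ones OpenAI has documented for request metadata; treat them as assumptions and check the current API reference.

```python
MAX_PAIRS = 16          # assumed limits, taken from OpenAI's API documentation
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def validate_metadata(metadata: dict) -> dict:
    """Raise if metadata is not a small str -> str mapping; otherwise return it."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError(f"at most {MAX_PAIRS} key-value pairs are allowed")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"{key!r}: keys and values must both be strings")
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"key {key!r} is longer than {MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value for {key!r} is longer than {MAX_VALUE_LENGTH} characters")
    return metadata


tags = validate_metadata({"user_id": "12345", "session": "abc-xyz", "experiment": "prompt_v2"})
print(tags)
```

A common mistake this catches is passing a numeric ID such as `{"user_id": 12345}` instead of the string `"12345"`.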

<br>

---

<br>

👉 In short:
**`metadata` = your custom labels for tracking requests.**
It’s like attaching sticky notes to a request so you can recognize it later.

<br>

---

<br>

# **Store**

### **What it is**

* Tells the system whether the **response should be stored** in OpenAI’s systems for later retrieval (e.g., to view history, debugging, or analytics).

### **Why use it**

* Helps if you want a **persistent record** of responses.
* Useful for replaying or auditing outputs later.

### **Note**

* If `store = False`, responses are **not stored** (only returned immediately).
* Some orgs disable storage for privacy/security reasons.

<br>

---

<br>

# **include_usage**

### **What it is**

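* If `True`, the API response will include **token usage details**. In the Agents SDK this setting controls the usage chunk of streamed responses and applies to the Chat Completions API.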

### **What it gives**

Example usage info returned:

```json
"usage": {
  "prompt_tokens": 120,
  "completion_tokens": 200,
  "total_tokens": 320
}
```

### **Why use it**

* Track **costs** (billing is per token).
* Monitor **efficiency** (see if prompts are too long).
* Debug performance issues.
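Once usage numbers are available, estimating cost is simple arithmetic. The sketch below uses the usage fields shown above; the function name and the per-million-token prices are placeholders, not real pricing.

```python
def estimate_cost(usage: dict, input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Estimate request cost from token counts and per-million-token prices."""
    input_cost = usage["prompt_tokens"] * input_price_per_1m
    output_cost = usage["completion_tokens"] * output_price_per_1m
    return (input_cost + output_cost) / 1_000_000


usage = {"prompt_tokens": 120, "completion_tokens": 200, "total_tokens": 320}

# Placeholder prices: 1.00 per 1M input tokens, 4.00 per 1M output tokens.
cost = estimate_cost(usage, input_price_per_1m=1.00, output_price_per_1m=4.00)
print(f"{cost:.6f}")  # 0.000920
```

Input and output tokens are typically priced differently, which is why the two counts are kept separate rather than using `total_tokens`.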

<br>

---

<br>

# **response_include**

```python
response_include: list[ResponseIncludable] | None = None
```

### **What it is**

* A list of **extra fields** you want included in the response.
* Beyond the default `output_text`, you can ask for additional info.

### **Common values**

* `"message.output_text.logprobs"` → returns log probabilities of output tokens (confidence levels).
* `"file_search_call.results"` → includes the results of file search tool calls.
* `"reasoning.encrypted_content"` → includes an encrypted copy of the reasoning content, if available.

### **Why use it**

* Debugging (see why the model chose certain tokens).
* Research (inspect confidence & uncertainty).
* Custom analytics (track tool results or reasoning outputs).

---

# **Side-by-Side**

| Setting            | Type       | Purpose                                    | Example Use Case                 |
| ------------------ | ---------- | ------------------------------------------ | -------------------------------- |
| `store`            | bool       | Save response in system                    | Keep logs of all agent replies   |
| `include_usage`    | bool       | Add token usage info in response           | Track costs per request          |
| `response_include` | list\[ResponseIncludable] | Add extra fields (like logprobs, tool results) | Debug or analyze model decisions |

---

# **Example**

```python
from agents import ModelSettings  # OpenAI Agents SDK (pip install openai-agents)

settings = ModelSettings(
    store=True,                                        # Save response for history
    include_usage=True,                                # Return token usage
    response_include=["message.output_text.logprobs"]  # Also include token log probabilities
)
```

**Effect**:

* Response is stored in history.
* You’ll see usage numbers (`prompt_tokens`, `completion_tokens`).
* You’ll also get detailed `logprobs` for each token.

---

👉 In short:

* **`store` = save or not save responses**
* **`include_usage` = return token counts (for billing/monitoring)**
* **`response_include` = what extra fields you want in the response**






# **What is `response_include`?**

```python
response_include: list[ResponseIncludable] | None = None
```

* It’s a **list of extra fields** you can ask the API to include in its response.
* By default, you only get the **output text**, but with `response_include`, you can also request **additional structured data** (e.g., logprobs, image URLs, reasoning traces).
* Each value comes from `ResponseIncludable`.

<br>

---

<br>

## **The Possible Values of `ResponseIncludable`**

Here’s what each option means:

<br>

---

<br>

## 1. `"code_interpreter_call.outputs"`

* Returns the **outputs from a code interpreter tool call** (like Python sandbox).
* Example: If the agent executed Python code, you’d see the actual outputs (numbers, printed text, plots).

👉 Useful for:

* Debugging code execution in tool-using agents.
* Displaying raw results of Python code.

<br>

---

<br>

## 2. `"computer_call_output.output.image_url"`

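* Returns the **image URLs from computer call outputs**, i.e. the screenshots produced when the agent uses the computer-use tool. It is not related to image generation.
* Lets you access the screenshot link instead of just the text response.

👉 Useful for:

* Agents that operate a computer or browser.
* Displaying or auditing the screenshots the agent acted on.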

<br>

---

<br>

## 3. `"file_search_call.results"`

* Returns the **results of a file search tool call**.
* Example: If the agent searched through uploaded documents, you’d get the matched passages/snippets.

👉 Useful for:

* Retrieval-augmented generation (RAG).
* Document QA or enterprise knowledge bots.

<br>

---

<br>

## 4. `"message.input_image.image_url"`

* Returns the **image URLs of any images passed into the request**.
* Normally, the API just consumes them silently, but this includes them back in the response.

👉 Useful for:

* Debugging multimodal inputs.
* Showing which input images were actually processed.

<br>

---

<br>

## 5. `"message.output_text.logprobs"`

* Returns the **log probabilities** of generated tokens.
* Logprobs = model’s confidence in each word choice.

👉 Useful for:

* Uncertainty estimation (detecting “low confidence” outputs).
* Research/analysis of model behavior.
* Advanced prompt evaluation.
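A log probability converts to an ordinary probability with `exp(logprob)`, which makes it easy to flag tokens the model was unsure about. The sketch below assumes the logprobs have already been extracted into `(token, logprob)` pairs; the helper name, the sample values, and the 0.5 threshold are all made up for illustration.

```python
import math


def low_confidence_tokens(token_logprobs, threshold=0.5):
    """Return (token, probability) pairs whose probability falls below threshold."""
    flagged = []
    for token, logprob in token_logprobs:
        probability = math.exp(logprob)  # logprob is the natural log of the probability
        if probability < threshold:
            flagged.append((token, round(probability, 3)))
    return flagged


sample = [("The", -0.01), ("moon", -0.05), ("whispers", -2.3)]  # hypothetical values
print(low_confidence_tokens(sample))  # [('whispers', 0.1)]
```

A logprob near 0 means the model was almost certain; strongly negative values mark the places where other tokens were serious contenders.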

<br>

---

<br>

## 6. `"reasoning.encrypted_content"`

* Returns an **encrypted version of the model’s reasoning tokens** in the reasoning output items.
* You can’t read the chain of thought, but you can pass the encrypted item back in later requests so the model keeps its reasoning across turns when responses are not stored (for example with `store=False`).

👉 Useful for:

* Multi-turn use of reasoning models without server-side storage.
* Privacy-sensitive setups, such as zero data retention.

<br>

---

<br>

## **Example Usage**

```python
from agents import ModelSettings  # OpenAI Agents SDK (pip install openai-agents)

settings = ModelSettings(
    response_include=[
        "message.output_text.logprobs",       # get token logprobs
        "code_interpreter_call.outputs",      # get code execution outputs
        "computer_call_output.output.image_url"  # get screenshot URLs from computer calls
    ]
)
```

**Effect**:

* Along with the normal text output, you’ll also see:

  * Token confidence scores,
  * Results from any code execution,
  * Screenshot URLs from any computer-use calls.

<br>

---

<br>

## **Summary**

* **`response_include`** = “What extra stuff should I see in the response?”
* Options (`ResponseIncludable`):

  * `code_interpreter_call.outputs` → code execution results
  * `computer_call_output.output.image_url` → screenshot URLs from computer calls
  * `file_search_call.results` → file search results
  * `message.input_image.image_url` → input image URLs
  * `message.output_text.logprobs` → log probabilities of output tokens
  * `reasoning.encrypted_content` → encrypted reasoning trace

<br>

---

<br>

👉 In short:
**Default response = only text.**
**`response_include` lets you peek under the hood (tools, images, logprobs, reasoning).**


This section walks through the implementation of `resolve` and `to_json_dict` inside `ModelSettings`, step by step and in plain English.

---

# ⚙️ `resolve`

```python
def resolve(self, override: ModelSettings | None) -> ModelSettings:
    """Produce a new ModelSettings by overlaying any non-None values from the
    override on top of this instance."""
```

### 🎯 What it does

* Creates a **new `ModelSettings` object** by combining:

  1. The current settings (`self`)
  2. Any **non-None values** from another `ModelSettings` (`override`)

So basically:
👉 **Take my current settings, replace only the fields that override has set.**

---

### 🔍 Step by Step

```python
if override is None:
    return self
```

* If no override is given → just return the current object.

```python
changes = {
    field.name: getattr(override, field.name)
    for field in fields(self)
    if getattr(override, field.name) is not None
}
```

* Build a dictionary of **all fields in override that are not None**.
* Example: if override only sets `max_tokens=100`, then `changes = {"max_tokens": 100}`.

```python
# Handle extra_args merging specially - merge dictionaries instead of replacing
if self.extra_args is not None or override.extra_args is not None:
    merged_args = {}
    if self.extra_args:
        merged_args.update(self.extra_args)
    if override.extra_args:
        merged_args.update(override.extra_args)
    changes["extra_args"] = merged_args if merged_args else None
```

* Special handling for `extra_args` (since it’s a dictionary).
* Instead of replacing it entirely, it **merges both dictionaries** (self’s + override’s).
* If keys overlap → `override` wins.

```python
return replace(self, **changes)
```

* Uses `dataclasses.replace` to return a **new `ModelSettings` object** with the updated fields.
* The original object is never modified. Note that when `override` is `None`, the method returns `self` itself rather than a copy.

---

## ✅ Example

```python
from agents import ModelSettings  # OpenAI Agents SDK (pip install openai-agents)

base = ModelSettings(max_tokens=50, temperature=0.7, extra_args={"a": 1})
override = ModelSettings(max_tokens=100, extra_args={"b": 2})

resolved = base.resolve(override)
print(resolved)
```

**Result**:

* `max_tokens` becomes **100** (from override)
* `temperature` stays **0.7** (from base, since override didn’t set it)
* `extra_args` becomes `{"a": 1, "b": 2}` (merged dictionaries)
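To try the overlay logic without installing the SDK, the same steps can be reproduced on a small stand-in dataclass. `MiniSettings` is a hypothetical three-field substitute for `ModelSettings`; its `resolve` mirrors the walkthrough above, with the `extra_args` merge written as a dict unpacking that has the same effect.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Optional


@dataclass
class MiniSettings:
    """Hypothetical stand-in for ModelSettings with only three fields."""

    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    extra_args: Optional[dict] = None

    def resolve(self, override: "Optional[MiniSettings]") -> "MiniSettings":
        if override is None:
            return self
        # Collect every field the override actually sets.
        changes: dict = {
            f.name: getattr(override, f.name)
            for f in fields(self)
            if getattr(override, f.name) is not None
        }
        # Merge extra_args instead of replacing; override wins on key clashes.
        if self.extra_args is not None or override.extra_args is not None:
            merged = {**(self.extra_args or {}), **(override.extra_args or {})}
            changes["extra_args"] = merged or None
        return replace(self, **changes)


base = MiniSettings(max_tokens=50, temperature=0.7, extra_args={"a": 1, "shared": "base"})
override = MiniSettings(max_tokens=100, extra_args={"b": 2, "shared": "override"})
resolved = base.resolve(override)
print(resolved)
```

The `shared` key shows the tie-break rule: both sides set it, and the value from `override` is the one that survives.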

---

# ⚙️ `to_json_dict`

```python
def to_json_dict(self) -> dict[str, Any]:
    dataclass_dict = dataclasses.asdict(self)

    json_dict: dict[str, Any] = {}

    for field_name, value in dataclass_dict.items():
        if isinstance(value, BaseModel):
            json_dict[field_name] = value.model_dump(mode="json")
        else:
            json_dict[field_name] = value

    return json_dict
```

### 🎯 What it does

* Converts the `ModelSettings` dataclass into a **JSON-safe dictionary**.

---

### 🔍 Step by Step

1. `dataclasses.asdict(self)` → turn the dataclass into a dict.
   Example:

   ```python
   {"max_tokens": 100, "reasoning": Reasoning(...), "temperature": 0.7}
   ```

2. Loop through each field:

   * If the value is a **`BaseModel`** (like `Reasoning`), call `value.model_dump(mode="json")` to get a JSON-compatible dict.
   * Otherwise, just copy the value.

3. Return the final dictionary, which is safe to send to the API.

---

## ✅ Example

```python
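from agents import ModelSettings  # OpenAI Agents SDK (pip install openai-agents)
from openai.types.shared import Reasoning

settings = ModelSettings(
    max_tokens=100,
    reasoning=Reasoning(effort="high", summary="concise")
)

print(settings.to_json_dict())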
```

**Output**:

```python
{
  "max_tokens": 100,
  "reasoning": {"effort": "high", "generate_summary": None, "summary": "concise"},
  "temperature": None,
  ...
}
```

---

# 📌 Summary

* **`resolve(override)`**
  → Returns a new `ModelSettings` where values from `override` overwrite `self`, with special merging for `extra_args`.

* **`to_json_dict()`**
  → Converts the dataclass into a JSON-safe dictionary, handling nested `BaseModel` objects properly.

---

👉 In short:

* **`resolve` = overlay + merge with defaults**
* **`to_json_dict` = prepare for API call**

