# AG2 + Gemini Thinking Config Variants

This notebook shows how to adjust Gemini thinking features in AG2:
- `thinking_budget` (token budget for thinking)
- `thinking_level` ("High" vs "Low")
- `include_thoughts` (whether to return thought summaries)

Reference: [Gemini Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)


In [None]:
import os

from dotenv import load_dotenv

from autogen import ConversableAgent, LLMConfig

load_dotenv()

api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GEMINI_API_KEY is not set. Please set it in your environment or .env file.")

prompt = "Solve 12 * 17 + 45, then explain your steps briefly."

## AG2 now supports `ThinkingConfig` on Gemini
`ThinkingConfig` has three settings:
- `thinking_budget`: the thinking budget in tokens. `0` disables thinking and `-1` selects an automatic budget. Default values and allowed ranges are model dependent.
- `thinking_level`: the level of thought tokens the model should generate.
- `include_thoughts`: whether to include thoughts in the response. If true, thoughts are returned only if the model supports them and thoughts are available.
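
As a minimal sketch of how these three settings fit into a config entry, the helper below builds the dictionary used in the cells of this notebook and leaves out any option that was not set. The name `gemini_thinking_entry` is hypothetical and not part of AG2; only the dictionary keys come from this notebook.

```python
from typing import Any, Optional


def gemini_thinking_entry(
    model: str,
    api_key: str,
    thinking_budget: Optional[int] = None,
    thinking_level: Optional[str] = None,
    include_thoughts: Optional[bool] = None,
) -> dict[str, Any]:
    """Build one Gemini config entry, skipping thinking options left as None.

    Hypothetical helper for illustration; it only assembles a plain dict.
    """
    entry: dict[str, Any] = {"model": model, "api_type": "gemini", "api_key": api_key}
    optional = {
        "thinking_budget": thinking_budget,
        "thinking_level": thinking_level,
        "include_thoughts": include_thoughts,
    }
    # Keep explicit values such as 0 or False; drop only the unset (None) ones.
    entry.update({key: value for key, value in optional.items() if value is not None})
    return entry


# "YOUR_API_KEY" is a placeholder, not a real key.
example = gemini_thinking_entry(
    "gemini-2.5-flash", "YOUR_API_KEY", thinking_budget=0, include_thoughts=False
)
print(example)
```

The resulting dictionary has the same shape as the `config_list` values passed to `LLMConfig` in the cells of this notebook.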

In [None]:
llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-pro-preview",
        "api_type": "gemini",
        "api_key": api_key,
        "thinking_budget": 2048,
        "thinking_level": "High",
        "include_thoughts": True,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2).process()

## Vary thinking_budget
The budget can be `0` (thinking disabled), `-1` (automatic), or a custom token count such as `4096`; the allowed range depends on the model.
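
The special values are easy to mix up, so here is a small sketch that turns a budget into a readable label. `describe_thinking_budget` is a hypothetical helper, not an AG2 or Gemini function, and it does not check model-specific ranges because those vary by model.

```python
def describe_thinking_budget(budget: int) -> str:
    """Return a readable label for a thinking_budget value.

    Hypothetical helper: 0 and -1 are the special values described above;
    model-specific upper limits are intentionally not checked.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(budget, bool) or not isinstance(budget, int):
        raise TypeError("thinking_budget must be an integer")
    if budget == 0:
        return "disabled"
    if budget == -1:
        return "automatic"
    if budget > 0:
        return f"up to {budget} tokens"
    raise ValueError("thinking_budget must be 0, -1, or a positive integer")


for value in (0, -1, 4096):
    print(value, "->", describe_thinking_budget(value))
```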


In [None]:
budget = 4096
llm_config = LLMConfig(
    config_list={
        "model": "gemini-2.5-flash",
        "api_type": "gemini",
        "api_key": api_key,
        "thinking_budget": budget,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2).process()

## Vary thinking_level
You can set `thinking_level` to "low" or "high" (the default for `gemini-3-pro-preview`). This tells the model how much thinking it is allowed to do. Because the thinking process stays dynamic, `high` does not mean the model will always use many tokens in its thinking phase, only that it is allowed to.

In [None]:
level = "High"
llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-pro-preview",
        "api_type": "gemini",
        "api_key": api_key,
        "thinking_level": level,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2).process()

When the thinking level is low:

In [None]:
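level = "Low"
llm_config = LLMConfig(
    config_list={
        # Same model as the "High" cell above, so only the level differs.
        "model": "gemini-3-pro-preview",
        "api_type": "gemini",
        "api_key": api_key,
        # Pass the level; without this key the variable above has no effect.
        "thinking_level": level,
    }
)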

agent = ConversableAgent(name="agent-thoughts", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2).process()

## include_thoughts
- Set `include_thoughts` to `True` to receive thought summaries, or to `False` to omit them.

In [None]:
llm_config = LLMConfig(
    config_list={
        "model": "gemini-2.5-flash",
        "api_type": "gemini",
        "api_key": api_key,
        "thinking_level": "High",
        "include_thoughts": True,
    }
)

agent = ConversableAgent(name="agent-thoughts", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2).process()

## Tips
- For long/complex tasks, use a higher `thinking_budget`.
- `thinking_level` can be lowered for lighter reasoning.
- Set `include_thoughts=True` when you want thought summaries; turn it off to reduce output.
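
To compare these settings side by side, one option is to keep the variants in a single table and merge each with the shared fields. This is a sketch only: the variant labels and the `variant_entries` helper are hypothetical, and the models and values are the ones used in the cells above.

```python
from typing import Any

# Fields shared by every variant; "YOUR_API_KEY" is a placeholder.
BASE_ENTRY: dict[str, Any] = {"api_type": "gemini", "api_key": "YOUR_API_KEY"}

# Labels are illustrative; models and values mirror the cells in this notebook.
THINKING_VARIANTS: dict[str, dict[str, Any]] = {
    "budget-off": {"model": "gemini-2.5-flash", "thinking_budget": 0},
    "budget-auto": {"model": "gemini-2.5-flash", "thinking_budget": -1},
    "budget-4096": {"model": "gemini-2.5-flash", "thinking_budget": 4096},
    "level-low": {"model": "gemini-3-pro-preview", "thinking_level": "Low"},
    "level-high": {"model": "gemini-3-pro-preview", "thinking_level": "High"},
}


def variant_entries(
    base: dict[str, Any], variants: dict[str, dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Merge the shared fields into each variant without mutating either input."""
    return {label: {**base, **overrides} for label, overrides in variants.items()}


entries = variant_entries(BASE_ENTRY, THINKING_VARIANTS)
for label, entry in entries.items():
    # Hide the key when printing.
    shown = {key: value for key, value in entry.items() if key != "api_key"}
    print(label, shown)
```

Each merged entry has the same shape as the `config_list` dictionaries passed to `LLMConfig` earlier in this notebook, so the same prompt can be run once per variant.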

Reference: [Gemini Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)
