# 🤖 LLM Providers

Mangaba AI supports five LLM providers behind a unified interface, with function calling, streaming, caching, and retry.

---

## Supported Providers

| Provider | Alias(es) | Default Model | Function Calling | Streaming |
|---|---|---|---|---|
| **Google Gemini** | `google`, `gemini` | `gemini-2.5-flash` | ✅ Native | ✅ |
| **OpenAI** | `openai` | `gpt-4o-mini` | ✅ Native | ✅ |
| **Anthropic** | `anthropic`, `claude` | `claude-sonnet-4-20250514` | ✅ Tool Use | ✅ |
| **HuggingFace** | `huggingface`, `hf` | `meta-llama/Llama-3.1-8B-Instruct` | ⚠️ Prompt-based | ✅ |
| **OpenRouter** | `openrouter` | Varies | ✅ Via proxy | ✅ |

---

## Factory: `create_llm_client`

```python
from mangaba.core.types import LLMConfig
from mangaba.core.llm import create_llm_client

# Configuration via LLMConfig
llm_config = LLMConfig(
    provider="google",
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
    temperature=0.7,
    max_tokens=4096,
)

# Note: LLMConfig.max_tokens maps to the factory's max_output_tokens argument
llm = create_llm_client(
    provider=llm_config.provider,
    api_key=llm_config.api_key,
    model=llm_config.model,
    temperature=llm_config.temperature,
    max_output_tokens=llm_config.max_tokens,
)
```

### LLMConfig

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider` | `str` | `"google"` | Provider name (aliases are normalized automatically) |
| `model` | `str \| List[str]` | `"gemini-2.5-flash"` | Model name, or a list of models (OpenRouter) |
| `api_key` | `str` | `None` | API key |
| `temperature` | `float` | `0.7` | Sampling temperature, 0-2 (higher values give more varied output) |
| `max_tokens` | `int` | `1024` | Maximum number of output tokens |
| `top_p` | `float` | `1.0` | Top-p sampling (0-1) |
| `stop_sequences` | `List[str]` | `None` | Stop sequences |
| `timeout` | `int` | `60` | Timeout in seconds |
| `base_url` | `str` | `None` | Custom base URL |

### OpenRouterConfig

```python
from mangaba.core.types import OpenRouterConfig

config = OpenRouterConfig(
    api_key="OPENROUTER_API_KEY",
    # A list of models enables fallback between them
    model=["google/gemini-2.5-flash", "anthropic/claude-3.5-sonnet"],
    site_name="My App",
    site_url="https://myapp.com",
    route="fallback",
)
```

### Parameters (inherited from LLMConfig)

### Listing providers

```python
from mangaba.core.llm import get_supported_providers

print(get_supported_providers())
# ('anthropic', 'claude', 'gemini', 'google', 'hf', 'huggingface', 'openai', 'openrouter')
```

---

## LLMClient

`LLMClient` is the unified interface to all providers:

```python
from mangaba.core.llm import LLMClient

client = LLMClient(
    provider="google",
    api_key="KEY",
    model="gemini-2.5-flash",
)

# Simple generation
response = client.generate("What is AI?")
print(response.text)

# With tools (calculator_tool is assumed to be a tool defined elsewhere)
response = client.generate_with_tools(
    messages=[{"role": "user", "content": "Calculate 15% of 250"}],
    tools=[calculator_tool],
)

# Streaming
for chunk in client.stream("Write a poem"):
    print(chunk, end="")
```

### LLMResponse

```python
response.text            # Response text
response.tool_calls      # List[ToolCall]: tool calls requested by the model
response.has_tool_calls  # bool: whether the response contains tool calls
response.finish_reason   # FinishReason enum
response.usage           # TokenUsage: token counts
response.model           # Name of the model used
```

---

## Google Gemini

```python
from mangaba.core.llm import create_llm_client

llm = create_llm_client(
    provider="google",
    api_key="GOOGLE_API_KEY",
    model="gemini-2.5-flash",
    temperature=0.7,
    max_output_tokens=8192,
)
```

### Available Models

| Model | Context | Use |
|---|---|---|
| `gemini-2.5-flash` | 1M tokens | General purpose, fast |
| `gemini-2.5-pro` | 1M tokens | Complex reasoning |
| `gemini-1.5-flash` | 1M tokens | Cost-efficient |

---

## OpenAI

```python
from mangaba.core.llm import create_llm_client

llm = create_llm_client(
    provider="openai",
    api_key="OPENAI_API_KEY",
    model="gpt-4o-mini",
    temperature=0.7,
    max_output_tokens=4096,
)
```

### Available Models

| Model | Context | Use |
|---|---|---|
| `gpt-4o-mini` | 128K tokens | General purpose, fast |
| `gpt-4o` | 128K tokens | Advanced reasoning |
| `gpt-4-turbo` | 128K tokens | High quality |

---

## Anthropic Claude

```python
from mangaba.core.llm import create_llm_client

llm = create_llm_client(
    provider="anthropic",
    api_key="ANTHROPIC_API_KEY",
    model="claude-sonnet-4-20250514",
    temperature=0.7,
    max_output_tokens=8192,
)
```

### Available Models

| Model | Context | Use |
|---|---|---|
| `claude-sonnet-4-20250514` | 200K tokens | Cost/performance balance |
| `claude-opus-4-20250514` | 200K tokens | Highest quality |
| `claude-haiku-3-5` | 200K tokens | Fast, efficient |

---

## HuggingFace

```python
from mangaba.core.llm import create_llm_client

llm = create_llm_client(
    provider="huggingface",
    api_key="HF_TOKEN",
    model="meta-llama/Llama-3.1-8B-Instruct",
)
```

> ⚠️ **Note:** Function calling on HuggingFace is emulated through prompt engineering; it is not native.

### Listing HuggingFace models

```python
from mangaba.core.llm import list_huggingface_models, hf_model_supports_tools

models = list_huggingface_models()
print(f"Model supports tools: {hf_model_supports_tools('meta-llama/Llama-3.1-8B-Instruct')}")
```

---

## OpenRouter

OpenRouter gives access to multiple models through a single API:

```python
from mangaba.core.llm import create_llm_client

llm = create_llm_client(
    provider="openrouter",
    api_key="OPENROUTER_API_KEY",
    model="openai/gpt-4o-mini",
    site_name="My App",
    site_url="https://myapp.com",
    route="fallback",  # Fall back between providers
)
```

---

## Caching

LLM responses can be cached to avoid redundant calls:

```python
from mangaba.core.llm import LLMClient, LLMCache, InMemoryCache, DiskCache

# In-memory cache
cache = InMemoryCache(ttl=3600)  # 1 hour

# Or an on-disk cache (this assignment replaces the in-memory cache above)
cache = DiskCache(cache_dir="./llm_cache", ttl=86400)  # 24 hours

# Use with a client
client = LLMClient(
    provider="google",
    api_key="KEY",
    model="gemini-2.5-flash",
    cache=cache,
)
```

---

## Retry

Automatic retry with exponential backoff:

```python
from mangaba.core.llm import with_retry

# Apply retry to a function (client is an LLMClient created as shown earlier)
@with_retry(max_retries=3, backoff_factor=2)
def my_llm_call():
    return client.generate("Query")
```

---

## Token Tracking

```python
from mangaba.core.llm import LLMClient, TokenCounter, UsageTracker

tracker = UsageTracker()

client = LLMClient(
    provider="google",
    api_key="KEY",
    model="gemini-2.5-flash",
    usage_tracker=tracker,
)

# After making calls
print(f"Total tokens: {tracker.total_tokens}")
print(f"Total cost: ${tracker.estimated_cost():.4f}")
print(f"Calls: {len(tracker.history)}")
```

---

## Prompt Templates

```python
from mangaba.core.llm import PromptTemplate, ChatPromptTemplate, SystemPromptBuilder

# Simple template
template = PromptTemplate(
    template="Answer this question about {topic}: {question}",
    input_variables=["topic", "question"],
)
prompt = template.format(topic="AI", question="What is machine learning?")

# Chat template
chat = ChatPromptTemplate()
chat.add_system("You are a helpful assistant")
chat.add_user("Explain {concept}")
messages = chat.format_messages(concept="RAG")

# System prompt builder
builder = SystemPromptBuilder()
builder.set_role("Expert Data Scientist")
builder.set_goal("Analyze data and provide insights")
builder.add_instruction("Always cite sources")
system_prompt = builder.build()
```
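
The Retry section describes automatic retry with exponential backoff but does not show what that schedule looks like. The sketch below is a standalone illustration of the general technique, not Mangaba's `with_retry` implementation: the name `retry_with_backoff` and the `base_delay` and `sleep` parameters are hypothetical, and only `max_retries` and `backoff_factor` mirror the arguments shown in the Retry section. It assumes the wait before retry `n` is `base_delay * backoff_factor ** n`.

```python
import functools
import time


def retry_with_backoff(max_retries=3, backoff_factor=2, base_delay=1.0, sleep=time.sleep):
    """Hypothetical retry decorator; illustrates exponential backoff only."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Production code would usually catch only transient errors.
                    if attempt == max_retries:
                        raise  # Out of retries: surface the last error.
                    # Delay grows geometrically: 1s, 2s, 4s, ... with the defaults.
                    sleep(base_delay * backoff_factor ** attempt)

        return wrapper

    return decorator


# Demo: record the delays instead of actually sleeping.
delays = []
calls = {"count": 0}


@retry_with_backoff(max_retries=3, backoff_factor=2, sleep=delays.append)
def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky_call()
# result is "ok" after three attempts; delays is [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and makes the schedule easy to inspect. Whether Mangaba's decorator uses the same base delay, adds jitter, or filters exception types is not stated in this document.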