A Claude Code skill that calls free LLMs with your keys.
No server. No Docker. Just text in, text out.
```
npx skills add bradagi/fi
```

That's it. The skill installs into your agent's skills directory — SKILL.md, the CLI, and the provider catalog all arrive together.
Talk to your agent. The skill wires up the common workflows:
```
you> add my gemini key AIza…
claude> [stores as GEMINI_API_KEY_1]

you> what free models do I have?
claude> [runs ./fi models — shows your aliases across providers]

you> ask gemini to summarize this pdf
claude> [pipes the pdf text into ./fi call gemini-2.5-flash]

you> use a reasoning model to solve this problem
claude> [runs ./fi call deepseek/deepseek-r1:free --prompt "…"]

you> add my openrouter key sk-or-v1-… and refresh
claude> [./fi keys add + ./fi sync — discovers 30+ new models]
```
Under the hood, the skill drives a tiny Python CLI (fi) that makes direct HTTPS calls to each provider's OpenAI-compatible /v1/chat/completions endpoint. No localhost proxy, no Docker, no daemon.
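In stdlib Python that direct call is just a `POST` with a bearer token; a minimal sketch (the function names and error handling are illustrative, not fi's actual internals — the endpoint shape follows the OpenAI-compatible `/v1/chat/completions` contract described above):

```python
import json
import urllib.request

def build_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble the POST for an OpenAI-compatible chat completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",  # base_url already ends in /v1
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def chat(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send the request and return the first choice's message text."""
    req = build_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Nothing here needs a third-party package, which is why the skill can ship as a single file plus a catalog.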
If you want to call it yourself instead of through an agent:
```
git clone https://github.com/bradAGI/fi
cd fi
./fi keys add gemini AIza...
./fi call gemini-2.5-flash --prompt "Hello"
./fi stream gemini-2.5-pro --prompt "Tell me a story"
echo "summarize this" | ./fi call llama-3.3-70b-groq
```

| Command | What it does |
|---|---|
| `./fi call <alias> --prompt "..."` | Make a call, print the response |
| `./fi call <alias> --prompt "..." --system "..." --max-tokens 256 --temperature 0.2` | Same call, with a system prompt and sampling options |
| `./fi stream <alias> --prompt "..."` | Stream tokens to stdout |
| `./fi keys add <provider> <key>` | Store a key (numbered slots auto-rotate) |
| `./fi keys list` | Masked list of configured keys |
| `./fi keys remove <provider> [--index N]` | Remove a stored key |
| `./fi providers` | Active vs inactive providers |
| `./fi models [-g GROUP]` | Callable model aliases (filter by group) |
| `./fi sync` | Refresh auto-discovered catalogs |
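The "numbered slots auto-rotate" behavior can be sketched as follows — the env-var naming follows the `GEMINI_API_KEY_1` example in the transcript above; how fi actually persists keys is an assumption here:

```python
def next_slot(provider: str, env: dict) -> str:
    """Return the first unused numbered env-var name for a provider."""
    prefix = f"{provider.upper()}_API_KEY_"
    n = 1
    while f"{prefix}{n}" in env:
        n += 1  # slot taken; try the next number
    return f"{prefix}{n}"

def add_key(provider: str, key: str, env: dict) -> str:
    """Store a key in the next free slot and return the slot name."""
    slot = next_slot(provider, env)
    env[slot] = key
    return slot
```

Adding a second Gemini key therefore lands in `GEMINI_API_KEY_2` without disturbing the first.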
Both call and stream accept the prompt via --prompt or stdin:
```
cat prompt.txt | ./fi call gemini-2.5-pro
./fi stream llama-3.3-70b-groq --prompt "Write a haiku"
```

Requirements:

- Python 3.11+ (stdlib only — no pip install)
- Internet access (calls go directly to upstream providers)

That's it. No Docker. No server. No background process.
Auto-discovered (catalogs refresh live from upstream, cached 24h):
| Provider | Source | Size |
|---|---|---|
| OpenRouter | openrouter.ai/api/v1/models → pricing.prompt == 0 | ~30 free models |
| NVIDIA NIM | integrate.api.nvidia.com/v1/models | ~135 models |
| Hugging Face | router.huggingface.co/v1/models (live providers, $0.10/mo credits) | ~120 models |
| Pollinations | gen.pollinations.ai/v1/models (keyless) | ~10 chat models |
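The OpenRouter discovery rule amounts to filtering the `/api/v1/models` response on its pricing field; a sketch, assuming the published response shape where `pricing.prompt` is a string price per token:

```python
def free_openrouter_models(catalog: dict) -> list[str]:
    """Keep ids of models whose prompt price is zero (pricing.prompt == 0)."""
    return [
        m["id"]
        for m in catalog.get("data", [])
        if float(m.get("pricing", {}).get("prompt", "1")) == 0
    ]
```

Run against a cached copy of the catalog, this is what turns ~300 listed models into the ~30 free ones in the table.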
Static catalog (add models manually to providers.toml):
Gemini, Groq, Cerebras, Mistral, GitHub Models, Cohere, Together, Scaleway, Tencent Hunyuan, Chutes, LLM7, Ollama Cloud.
Filter models by capability with ./fi models -g <group>:
| Group | Matches |
|---|---|
| `fast` | Low-latency small models |
| `smart` | Flagship large models |
| `code` | Coder-tuned models |
| `reasoning` | Thinking models (R1, QwQ, o-series) |
| `vision` | Multimodal text+image |
| `embed` | Embedding models |
| `rerank` | Rerankers |
| `free` | Models with at least one zero-priced provider |
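Group filtering is just a membership test over each alias's tags; a sketch, with the alias→tags mapping shape assumed rather than taken from fi's catalog format:

```python
def models_in_group(catalog: dict[str, list[str]], group: str) -> list[str]:
    """catalog maps model alias -> group tags; return aliases carrying the tag."""
    return sorted(alias for alias, tags in catalog.items() if group in tags)
```

So `./fi models -g reasoning` would surface something like `deepseek-r1` while hiding vision-only entries.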
Append one TOML block, then ./fi sync (if auto-discovered) or you're done (if static):
```toml
[[provider]]
name = "your-provider"
key_env = "YOURPROVIDER_API_KEY"
base_url = "https://api.yours.example/v1"  # OpenAI-compatible endpoint
optional_key = false                       # true for keyless providers

[[provider.model]]
alias = "yours-flagship"
upstream = "their/model-id"
groups = ["smart", "vision"]
```

Then `./fi keys add your-provider <key>` (unless `optional_key = true`) and `./fi call yours-flagship --prompt "…"`.
Early versions of this repo were a local LiteLLM-backed gateway at localhost:4000 that spoke both OpenAI and Anthropic shapes to any client — useful if you're building an app that needs a persistent endpoint.
For agent use (Claude calling a model with your keys on your behalf), the gateway is overkill — the agent can just shell out directly. This repo is the simplified direct-call version.
If you do want the full gateway, it lives at bradAGI/fi-gateway.
- Providers that aren't OpenAI-compatible for chat (Voyage, Jina, Mixedbread, Nomic — mostly embedding-focused) aren't in this catalog yet. Add them to `fi-gateway` if you need translation.
- Hugging Face models are "credit-gated free" — callable with HF's $0.10/mo free credit, not zero-cost per call. Use `-g free` to filter to models with at least one zero-priced provider.
- `./fi stream` uses SSE and requires the upstream to support `stream: true` on chat completions (almost all do).
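The SSE dependency is light: each `data:` line carries one chat-completion chunk, and the stream ends at `data: [DONE]`. A minimal parser sketch over already-decoded lines (the chunk shape follows the standard OpenAI streaming format):

```python
import json

def sse_deltas(lines):
    """Yield content deltas from an SSE chat-completion stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank keep-alives and comments carry no chunk
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # upstream signals end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta  # print as they arrive for token-by-token output
```

Joining the deltas reproduces the full response, which is all `./fi stream` needs to echo tokens to stdout.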
MIT
