FreeInference

Free LLM inference for coding agents and AI-powered IDEs.

Overview

FreeInference provides free access to state-of-the-art language models for coding agents such as Cursor, Codex, Roo Code, and other AI-powered development tools.

Documentation

Visit our documentation at: https://harvardsys.github.io/free_inference/

Supported IDEs & Coding Agents

  • Cursor - AI-powered code editor
  • Codex - Terminal-based coding assistant
  • Roo Code - VS Code & JetBrains extension
  • Kilo Code - AI coding assistant
  • And any tool that supports OpenAI-compatible APIs
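Because the endpoint speaks the OpenAI chat-completions protocol, any HTTP client can talk to it directly. As an illustrative sketch (the model ID and placeholder key are examples, not guaranteed API identifiers), here is a chat request built with Python's standard library; the request is constructed but not sent, so you can inspect it before supplying a real key:

```python
import json
import urllib.request

BASE_URL = "https://freeinference.org/v1"  # FreeInference's OpenAI-compatible endpoint
API_KEY = "your-api-key-here"              # placeholder -- substitute your real key

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("glm-4.7", "Write a hello-world in Go.")
# Uncomment to actually send the request once API_KEY is set:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same base URL and Bearer-token header are what the IDE integrations below configure for you.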

Quick Start

Cursor Setup

  1. Open Settings (Cmd + , or Ctrl + ,)
  2. Go to API Keys section
  3. Enter your FreeInference API key
  4. Click Override OpenAI Base URL
  5. Enter: https://freeinference.org/v1
  6. Enable the toggle and start coding!

Codex Setup

  1. Create ~/.codex/config.toml:

```toml
model = "glm-4.7"
model_provider = "free_inference"

[model_providers.free_inference]
name = "FreeInference"
base_url = "https://freeinference.org/v1"
wire_api = "chat"
env_http_headers = { "X-Session-ID" = "CODEX_SESSION_ID", "Authorization" = "FREEINFERENCE_API_KEY" }
```

  2. Add to ~/.zshrc or ~/.bashrc:

```shell
export CODEX_SESSION_ID="$(date +%Y%m%d-%H%M%S)-$(uuidgen)"
export FREEINFERENCE_API_KEY="Bearer your-api-key-here"
```

  3. Reload: source ~/.zshrc
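The session ID is just a timestamp plus a UUID, which gives each Codex session a unique X-Session-ID header. A minimal sketch of generating one (the python3 fallback is our addition for systems without uuidgen, not part of the official setup):

```shell
# Build a session ID in the same shape as the export above:
# YYYYmmdd-HHMMSS-<uuid>. Falls back to python3 if uuidgen is missing.
SESSION_ID="$(date +%Y%m%d-%H%M%S)-$(uuidgen 2>/dev/null || python3 -c 'import uuid; print(uuid.uuid4())')"
echo "$SESSION_ID"
```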

Roo Code / Kilo Code Setup

  1. Install the extension in your IDE
  2. Open settings
  3. Select OpenAI Compatible as provider
  4. Configure:
    • Base URL: https://freeinference.org/v1
    • API Key: your-api-key-here
  5. Select your preferred model

Available Models

  • GLM-4.7 - 200K context, best for long context and bilingual support
  • GLM-4.7-Flash - 200K context, fast and cost-effective
  • MiniMax M2 - 196K context, best for very large codebases
  • Qwen3 Coder 30B - 32K context, specialized for code generation
  • Llama 3.3 70B - 131K context, general coding (limited capacity)
  • Llama 4 Scout - 128K context, optimized for speed (limited capacity)
  • Llama 4 Maverick - 128K context, multimodal support (limited capacity)

See the Models documentation for the complete list.
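When choosing a model, the deciding factor is often whether your prompt fits the context window. As a small illustration (the lowercase model IDs here are hypothetical, not confirmed API identifiers; the token counts come from the list above), a helper that filters models by prompt size:

```python
# Context windows (tokens) from the model list above; model IDs are
# illustrative guesses, not necessarily the exact API identifiers.
CONTEXT_WINDOWS = {
    "glm-4.7": 200_000,
    "glm-4.7-flash": 200_000,
    "minimax-m2": 196_000,
    "qwen3-coder-30b": 32_000,
    "llama-3.3-70b": 131_000,
    "llama-4-scout": 128_000,
    "llama-4-maverick": 128_000,
}

def models_fitting(prompt_tokens: int) -> list[str]:
    """Return model IDs whose context window can hold the prompt."""
    return [m for m, ctx in CONTEXT_WINDOWS.items() if ctx >= prompt_tokens]

print(models_fitting(150_000))
```

For a 150K-token codebase dump, only the 196K+ models qualify; shorter prompts leave the whole list open.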

Get API Key

  1. Visit https://freeinference.org
  2. Register for a free account
  3. Log in and create your API key
  4. Start using FreeInference with your favorite IDE!
