local_llm is a lightweight Ruby gem that lets you talk to locally installed LLMs via Ollama — with zero cloud dependency, full developer control, and configurable defaults, including real-time streaming support.
It supports:
- 100% local inference: no cloud calls, no API keys, and no data ever leaves your machine
- Any locally installed Ollama model (LLaMA, Mistral, CodeLLaMA, Qwen, Phi, Gemma, and anything else Ollama supports)
- Developer-configurable default models that can be changed at runtime
- Developer-configurable Ollama API endpoint
- Real-time streaming or full-text responses, configurable globally or per call
- One-shot Q&A and multi-turn chat
- Works in plain Ruby and Rails
- Suitable for HIPAA, SOC 2, and other regulated workflows where data privacy is a concern, since everything runs completely offline
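Assuming the gem is published to RubyGems under the name `local_llm`, installation follows the usual Bundler pattern: add it to your Gemfile and run `bundle install` (or install it directly with `gem install local_llm`).

```ruby
# Gemfile
gem "local_llm"
```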
local_llm requires a locally running Ollama server. Download Ollama from the official site (https://ollama.com), then start it:

```bash
ollama serve
```

Pull the models you plan to use:

```bash
ollama pull llama2:13b
ollama pull mistral:7b-instruct
ollama pull codellama:13b-instruct
ollama pull qwen2:7b
```

You can list the installed models at any time:

```bash
ollama list
```
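Before wiring up the gem, you can sanity-check that the server is reachable from Ruby. A minimal sketch using only the standard library and Ollama's `/api/tags` endpoint, which returns the locally installed models as JSON:

```ruby
require "net/http"
require "json"

# Ollama's model-listing endpoint; adjust the host/port if you changed them.
uri      = URI("http://localhost:11434/api/tags")
response = Net::HTTP.get(uri)
models   = JSON.parse(response).fetch("models", []).map { |m| m["name"] }

puts "Ollama is reachable. Installed models: #{models.join(', ')}"
```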
Next, configure the gem's defaults:

```ruby
LocalLlm.configure do |c|
  c.base_url              = "http://localhost:11434"
  c.default_general_model = "llama2:13b"
  c.default_fast_model    = "mistral:7b-instruct"
  c.default_code_model    = "codellama:13b-instruct"
  c.default_stream        = false # true = stream by default, false = return full text
end
```
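In a Rails app, this configuration would typically live in an initializer so it runs once at boot; the path below is a conventional choice, not something the gem mandates:

```ruby
# config/initializers/local_llm.rb (conventional location, assumed)
LocalLlm.configure do |c|
  c.default_general_model = "llama2:13b"
  c.default_stream        = false
end
```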
LocalLlm.ask("llama2:13b", "What is HIPAA?")
LocalLlm.ask("qwen2:7b", "Explain transformers in simple terms.")
LocalLlm.general("What is a Denial of Service attack?")
LocalLlm.fast("Summarize this paragraph in 3 bullet points.")
LocalLlm.code("Write a Ruby method that returns factorial of n.")
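When streaming is disabled, these calls return the model's full reply, so the result can be captured and reused like any other String. A small sketch; the file name is just an example:

```ruby
summary = LocalLlm.fast("Summarize this changelog in 3 bullet points.")
snippet = LocalLlm.code("Write a Ruby method that reverses a string without using reverse.")

puts summary
File.write("generated_snippet.rb", snippet) # persist the generated code for review
```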
For convenience and readability, LocalLLM is provided as a direct alias of LocalLlm, so both constants work identically:

```ruby
LocalLlm.fast("Tell me about Bangladesh.")
LocalLLM.fast("Explain HIPAA in simple terms.") # alias of LocalLlm
```
To stream responses by default, enable it in the configuration; chunks are then yielded to the block as they arrive:

```ruby
LocalLlm.configure do |c|
  c.default_stream = true
end

LocalLlm.fast("Explain HIPAA in very simple words.") do |chunk|
  print chunk
end
```

You can also override the default per call:

```ruby
# Stream this call regardless of the configured default
LocalLlm.fast("Explain LLMs in one paragraph.", stream: true) do |chunk|
  print chunk
end

# Force a full (non-streamed) response and work with the returned text
full_text = LocalLlm.fast("Explain DoS attacks briefly.", stream: false)
puts full_text
```
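If you want live output and the complete text, one option is to accumulate the chunks as they arrive. A small sketch, assuming each chunk is a String fragment of the reply (as in the examples above):

```ruby
buffer = +""  # unfrozen String to collect the full reply

LocalLlm.fast("Explain LLMs in one paragraph.", stream: true) do |chunk|
  print chunk      # show output as it arrives
  buffer << chunk  # keep a copy for later use
end

puts "\n\nReceived #{buffer.length} characters in total."
```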
Multi-turn chat takes an array of role/content messages:

```ruby
LocalLlm.chat("llama2:13b", [
  { "role" => "system", "content" => "You are a helpful assistant." },
  { "role" => "user", "content" => "Explain LSTM." }
])
```
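To carry a conversation across turns, you can manage the messages array yourself and append each reply before the next call. A sketch that assumes `LocalLlm.chat` returns the assistant's reply as a String:

```ruby
messages = [
  { "role" => "system", "content" => "You are a helpful assistant." },
  { "role" => "user",   "content" => "Explain LSTM." }
]

reply = LocalLlm.chat("llama2:13b", messages)

# Feed the reply back in so the follow-up question has context.
messages << { "role" => "assistant", "content" => reply }
messages << { "role" => "user",      "content" => "How does that differ from a GRU?" }

puts LocalLlm.chat("llama2:13b", messages)
```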
List the models available locally:

```ruby
LocalLlm.models
```
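One practical use is guarding against a model that has not been pulled yet. A sketch assuming `LocalLlm.models` returns the installed model names as an array of Strings:

```ruby
wanted = "mistral:7b-instruct"

if LocalLlm.models.include?(wanted)
  puts LocalLlm.ask(wanted, "Summarize the benefits of local inference.")
else
  warn "#{wanted} is not installed. Run: ollama pull #{wanted}"
end
```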
To use a different model, pull it with Ollama and reference it by name:

```bash
ollama pull qwen2:7b
```

```ruby
LocalLlm.ask("qwen2:7b", "Explain HIPAA in simple terms.")
```

Or make it the default for the general helper:

```ruby
LocalLlm.configure do |c|
  c.default_general_model = "qwen2:7b"
end

LocalLlm.general("Explain transformers.")
```
To talk to an Ollama server on another machine, point the base URL at it:

```ruby
LocalLlm.configure do |c|
  c.base_url = "http://192.168.1.100:11434"
end
```
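If the endpoint differs between development and production, one option is to read it from an environment variable; the variable name here is just an example:

```ruby
LocalLlm.configure do |c|
  # Fall back to the local default when OLLAMA_URL is not set.
  c.base_url = ENV.fetch("OLLAMA_URL", "http://localhost:11434")
end
```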
If requests fail with a connection error, make sure the Ollama server is actually running:

```bash
ollama serve
```