@cpetersen (Owner) commented Sep 8, 2025

What this does

This PR adds support for the Red Candle provider, enabling local LLM execution using quantized GGUF models directly in Ruby without requiring external API calls.

Key Implementation Details

Red Candle is fundamentally different from other providers: while all other RubyLLM providers communicate via HTTP APIs, Red Candle runs models locally using the Candle Rust crate. This brings true local inference to Ruby, with no network latency or API costs.
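
To make the difference concrete, here is an illustrative sketch of the provider's shape. These internals are assumptions, not the actual implementation: the point is only that where other providers serialize messages into an HTTP request, Red Candle invokes the model object in-process.

```ruby
# Illustrative sketch only — these internals are assumed, not the real code.
module RubyLLM
  module Providers
    module RedCandle
      module_function

      # Where other providers would POST to an API endpoint, Red Candle
      # calls the locally loaded GGUF model via the Candle bindings.
      def complete(prompt, model:)
        model.generate(prompt) # in-process inference: no network round trip
      end
    end
  end
end
```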

Dependency Management

Since Red Candle requires a Rust toolchain at build time, we've made it optional at two levels:

  • For end users: `red-candle` is NOT a gemspec dependency. Users must explicitly add `gem 'red-candle'` to their Gemfile to use this provider.
  • For contributors: We've added an optional Bundler group so developers can work on RubyLLM without installing Rust. Enable it with `bundle config set --local with red_candle` (see the guard sketch just below).
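
The conditional loading boils down to a guard like the following. This is a simplified sketch of the pattern, not verbatim code from `ruby_llm.rb`:

```ruby
# Simplified sketch of the optional-dependency guard (require name assumed):
begin
  require 'red-candle'
  require 'ruby_llm/providers/red_candle' # hypothetical load path
rescue LoadError
  # red-candle isn't in the bundle; the provider simply isn't registered.
end
```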

Testing Strategy

We implemented a comprehensive mocking system to keep tests fast:

  • Stubbed mode (default): uses `MockCandleModel` to simulate responses without actual inference
  • Real inference mode: set `RED_CANDLE_REAL_INFERENCE=true` to run actual model inference (downloads models on first run, ~4.5 GB)
  • Not installed mode: tests skip gracefully when Red Candle isn't available
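
Roughly, the spec setup picks between these three modes like so. Constant and check names here are illustrative, not the helper's actual code:

```ruby
# Sketch of mode selection in the spec setup (names are assumptions):
RED_CANDLE_MODE =
  if !Gem.loaded_specs.key?('red-candle')
    :not_installed # Red Candle examples are skipped
  elsif ENV['RED_CANDLE_REAL_INFERENCE'] == 'true'
    :real          # actual inference; downloads models on first run
  else
    :stubbed       # MockCandleModel simulates responses
  end
```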

Changes Made

  • Added `RubyLLM::Providers::RedCandle` with full chat support including streaming
  • Implemented model management with automatic GGUF file downloads from Hugging Face
  • Created comprehensive test mocks in `red_candle_test_helper.rb`
  • Added conditional loading in `ruby_llm.rb` and `spec_helper.rb` to handle the optional dependency
  • Updated `models_to_test.rb` to conditionally include Red Candle models
  • Added documentation in `CONTRIBUTING.md` for managing the optional dependency
  • Implemented proper `Content` object handling for structured responses

How to Test

# Test without Red Candle (default for new contributors)
bundle install
bundle exec rspec  # Red Candle tests will be skipped

# Test with Red Candle stubbed (fast)
bundle config set --local with red_candle
bundle install
bundle exec rspec  # Uses mocked responses

# Test with real inference (slow, downloads models)
bundle config set --local with red_candle
bundle install
huggingface-cli login # Make sure to accept the Mistral model terms
RED_CANDLE_REAL_INFERENCE=true bundle exec rspec

Once red-candle is enabled, turn it back off with:

bundle config unset with

and re-enable it with:

bundle config set --local with red_candle

Try it out

bundle exec irb
require 'ruby_llm'

chat = RubyLLM.chat(
  provider: :red_candle,
  model: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF' # 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF' is another option
)
response = chat.ask("What are the benefits of functional programming?")
puts response.content
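
Since the provider supports streaming, you can also use RubyLLM's standard block form to print tokens as the local model generates them:

```ruby
chat.ask("Summarize those benefits in one sentence") do |chunk|
  print chunk.content # chunks arrive incrementally from the local model
end
```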

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

Fixes crmne#394

crmne and others added 11 commits September 9, 2025 20:41
Major improvements to Rails integration:
- New acts_as API using association names instead of class names
- Rails-like generator syntax: `rails g ruby_llm:install chat:ChatName message:MessageName`
- Model registry always included (removed skip_model_registry option)
- Clear upgrade path with use_new_acts_as configuration option

Breaking changes managed through configuration:
- Legacy mode (default) maintains backward compatibility
- New mode enabled via `config.use_new_acts_as = true`
- Legacy mode will be deprecated in v2.0

Key improvements:
- More intuitive Rails-like DSL
- Better association naming conventions
- Simplified generator interface
- Cleaner configuration approach
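A hedged sketch of the opt-in described above; the configuration flag is quoted from this changelog, while the model macro's exact options are assumptions:

```ruby
# config/initializers/ruby_llm.rb — opt in to the new acts_as API
RubyLLM.configure do |config|
  config.use_new_acts_as = true # legacy mode stays the default until v2.0
end

# app/models/chat.rb — the new DSL references associations by name
class Chat < ApplicationRecord
  acts_as_chat # exact options omitted; see the generator output
end
```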
Replace options[:*_model_name] with instance method calls in all
generator templates. This fixes the undefined method 'pluralize'
error when running rails g ruby_llm:install.

Also removes unused legacy migration templates and updates tests
to match the new template format.
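Illustratively, the fix swaps direct options access in templates for generator instance methods. The names below are hypothetical, not the actual diff:

```ruby
# Before: templates read the raw option, e.g.
# options[:message_model_name].pluralize, which raised
# "undefined method 'pluralize'". After: templates call an instance method
# on the generator that does the processing:
def message_table_name
  options[:message_model_name].tableize # e.g. "Message" => "messages"
end
```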
Creates a complete chat interface with:
- Chat and message controllers following Rails conventions
- Simple HTML views for chat list, creation, and messaging
- Model selector in new chat form
- Models index page showing available AI models
- Background job for streaming AI responses
- Turbo Stream integration for real-time message updates
- Automatic broadcasting from Message model
- Clean, simple controller methods

The generator creates a working chat UI that can be customized
while maintaining Rails best practices and simplicity.
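The broadcasting piece reduces to a callback along these lines — a simplified sketch; the generated code may differ:

```ruby
# Sketch: push each newly created message onto the chat's Turbo Stream
class Message < ApplicationRecord
  belongs_to :chat

  after_create_commit -> { broadcast_append_to chat }
end
```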
@orangewolf (Collaborator) left a comment:
this looks great!

crmne and others added 30 commits September 14, 2025 11:14
## What this does

Adds gpt-5, gpt-5-mini, and gpt-5-nano capabilities.

I tried to run `overcommit`, but it updated more files than I expected, so I'm not sure it's still used on every commit. I did run RuboCop and the tests.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [ ] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

Co-authored-by: Carmine Paolino <carmine@paolino.me>
Implemented efficient streaming for the chat UI generator that appends chunks without re-transmitting entire messages. The solution uses `broadcast_append_chunk` to append individual chunks to message content, reducing bandwidth usage. Only one Turbo Stream subscription is maintained at the chat level, avoiding multiple subscriptions per message.
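A sketch of what `broadcast_append_chunk` might look like, built on turbo-rails' `broadcast_append_to`. The method body and target naming are assumptions based on the description above:

```ruby
class Message < ApplicationRecord
  belongs_to :chat

  # Append a single streamed chunk to this message's content element,
  # rather than re-rendering (and re-transmitting) the whole message.
  def broadcast_append_chunk(chunk_content)
    broadcast_append_to chat,                      # one chat-level stream
                        target: "message_#{id}_content",
                        html: chunk_content
  end
end
```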
## Summary
- Added visualization of tool calls in the chat UI message partial
- Tool calls are displayed with function name and arguments in JSON
format
- Styled with monospace font and gray background for better readability
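
The partial addition is roughly this shape — a hypothetical sketch, where attribute names like `name` and `arguments` are assumptions:

```erb
<%# Hypothetical sketch of the message partial addition %>
<% if message.tool_calls.any? %>
  <% message.tool_calls.each do |tool_call| %>
    <pre class="tool-call"><%= tool_call.name %>
<%= JSON.pretty_generate(tool_call.arguments) %></pre>
  <% end %>
<% end %>
```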

## Test plan
- [ ] Generate a chat UI with the updated template
- [ ] Verify tool calls are displayed correctly in messages
- [ ] Check that messages without tool calls render normally

![CleanShot 2025-09-21 at 22 21 23@2x](https://github.com/user-attachments/assets/058c0923-4081-4399-96c0-4a4e025f7244)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
…#429)

## What this does
It updates the Faraday middleware to explicitly use the `:net_http` adapter instead of whatever adapter the environment happens to default to.
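
Concretely, the connection setup pins the adapter rather than relying on `Faraday.default_adapter`. A simplified sketch, where `api_base` and the middleware stack are assumptions:

```ruby
connection = Faraday.new(url: api_base) do |f|
  f.request :json
  f.response :json
  f.adapter :net_http # pinned explicitly, regardless of the environment default
end
```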

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
  - [ ] For provider changes: Re-recorded VCR cassettes with `bundle exec rake vcr:record[provider_name]`
  - [x] All tests pass: `bundle exec rspec`
- [ ] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues
Fixes crmne#428
Easier than trying to force it