Feature/llm providers #60
Reviewer's Guide

This PR introduces a standardized `LLMProvider` abstraction with built-in token counting and chunking utilities, implements a concrete `OllamaLLMProvider` adapter for synchronous and streaming calls (including message construction and metadata parsing), and adds an example notebook alongside unit tests to validate both the base and Ollama provider behaviors.

Sequence diagram for OllamaLLMProvider.generate() interaction with Ollama API:

sequenceDiagram
participant User
participant OllamaLLMProvider
participant OllamaAPI
User->>OllamaLLMProvider: generate(prompt)
OllamaLLMProvider->>OllamaLLMProvider: _build_messages(prompt)
OllamaLLMProvider->>OllamaAPI: chat(model, messages, keep_alive, ...)
OllamaAPI-->>OllamaLLMProvider: response
OllamaLLMProvider->>User: LLMResponse(text, metadata, raw)
Sequence diagram for OllamaLLMProvider.stream() interaction with Ollama API:

sequenceDiagram
participant User
participant OllamaLLMProvider
participant OllamaAPI
User->>OllamaLLMProvider: stream(prompt)
OllamaLLMProvider->>OllamaLLMProvider: _build_messages(prompt)
OllamaLLMProvider->>OllamaAPI: chat(model, messages, keep_alive, stream=True)
loop For each chunk
OllamaAPI-->>OllamaLLMProvider: chunk
OllamaLLMProvider->>User: LLMResponse(text, metadata, raw)
end
Class diagram for the new LLMProvider abstraction and OllamaLLMProvider implementation:

classDiagram
class LLMProvider {
<<abstract>>
+model_name: str
+_tokenizer: TokenizerType | None
+max_context_tokens: int | None
+chunk_overlap: int
+__init__(model, tokenizer, max_context_tokens, chunk_overlap)
+generate(prompt, **kwargs) LLMResponse
+stream(prompt, **kwargs)
+count_tokens(text) int
+chunk_text(text, max_tokens, overlap) list[DocumentChunk]
+_tokenize(text) Sequence[Any]
+_overlap_tail(words, overlap_tokens) list[str]
}
class OllamaLLMProvider {
+system_prompt: str | None
+keep_alive: int | str | None
+__init__(model, system_prompt, keep_alive, **kwargs)
+generate(prompt, messages, **kwargs) LLMResponse
+stream(prompt, messages, **kwargs) Iterator[LLMResponse]
+_build_messages(prompt, messages) list[dict[str, str]]
}
class LLMResponse {
+text: str
+metadata: dict[str, Any]
+raw: Any | None
}
class DocumentChunk {
+text: str
+token_count: int
+index: int
}
LLMProvider <|-- OllamaLLMProvider
LLMProvider o-- LLMResponse
LLMProvider o-- DocumentChunk
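For orientation, here is a rough usage sketch of the interface described above; the import path and constructor arguments are inferred from this PR's tests and diagrams, so treat them as assumptions rather than the exact API:

```python
from aymurai.llm_providers.ollama_provider import OllamaLLMProvider

# Hypothetical usage based on the class diagram; exact argument names may differ.
provider = OllamaLLMProvider(model="llama3", system_prompt="You are a helpful assistant.")

# Split a long document into chunks that fit the model's context window.
chunks = provider.chunk_text("some long text ...")

# Synchronous generation returns an LLMResponse with text, metadata, and the raw payload.
response = provider.generate("Summarize the document in one sentence.")
print(response.text, response.metadata)

# Streaming yields partial LLMResponse objects as they arrive.
for piece in provider.stream("Summarize the document in one sentence."):
    print(piece.text, end="", flush=True)
```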
Hey there - I've reviewed your changes - here's some feedback:
- Consider moving the Jupyter notebook out of the core package (e.g., into an examples or docs folder) so it isn’t shipped with the production library.
- It may be better to warn or require an explicit tokenizer instead of silently falling back to whitespace splitting, as that can lead to inaccurate token counts for many models (see the sketch after this list).
- Add a unit test to verify that `_build_messages` raises a ValueError when both `prompt` and `messages` are None to cover the error path.
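A minimal sketch of the "warn on fallback" option, assuming `count_tokens` simply measures the output of `_tokenize` (that body is not shown in this review, so treat the snippet as illustrative only):

```python
import warnings


def count_tokens(self, text: str) -> int:
    # Method body sketch for LLMProvider.count_tokens: warn when no tokenizer
    # is configured instead of silently falling back to whitespace splitting.
    if self._tokenizer is None:
        warnings.warn(
            f"No tokenizer configured for model {self.model_name!r}; "
            "falling back to whitespace splitting, so token counts may be inaccurate.",
            UserWarning,
            stacklevel=2,
        )
    return len(self._tokenize(text))
```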
## Individual Comments
### Comment 1
<location> `test/llm_providers/test_providers.py:39-46` </location>
<code_context>
+        return self._tokenizer
+
+
+class BaseProviderTests(unittest.TestCase):
+    def test_chunk_text_respects_token_limit(self):
+        provider = DummyProvider(max_context_tokens=4, chunk_overlap=1)
+        text = "uno dos tres cuatro cinco seis"
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for edge cases in chunk_text (empty string, no words, overlap > token limit).
Please add tests for chunk_text covering empty input, no words, overlap exceeding token limit, and max_context_tokens set to None to ensure all edge cases are handled.
```suggestion
class BaseProviderTests(unittest.TestCase):
    def test_chunk_text_respects_token_limit(self):
        provider = DummyProvider(max_context_tokens=4, chunk_overlap=1)
        text = "uno dos tres cuatro cinco seis"
        chunks = provider.chunk_text(text)
        self.assertGreater(len(chunks), 1)
        for chunk in chunks:
            self.assertLessEqual(chunk.token_count, 4)

    def test_chunk_text_empty_string(self):
        provider = DummyProvider(max_context_tokens=4, chunk_overlap=1)
        text = ""
        chunks = provider.chunk_text(text)
        self.assertEqual(chunks, [])

    def test_chunk_text_whitespace_only(self):
        provider = DummyProvider(max_context_tokens=4, chunk_overlap=1)
        text = " "
        chunks = provider.chunk_text(text)
        self.assertEqual(chunks, [])

    def test_chunk_text_overlap_exceeds_token_limit(self):
        provider = DummyProvider(max_context_tokens=2, chunk_overlap=3)
        text = "one two three"
        chunks = provider.chunk_text(text)
        # Should not fail, and should chunk correctly
        for chunk in chunks:
            self.assertLessEqual(chunk.token_count, 2)

    def test_chunk_text_max_context_tokens_none(self):
        provider = DummyProvider(max_context_tokens=None, chunk_overlap=1)
        text = "one two three four"
        chunks = provider.chunk_text(text)
        # Should return a single chunk with all tokens
        self.assertEqual(len(chunks), 1)
        self.assertEqual(chunks[0].token_count, 4)
```
</issue_to_address>
### Comment 2
<location> `test/llm_providers/test_providers.py:49-50` </location>
<code_context>
+            self.assertLessEqual(chunk.token_count, 4)
+
+
+class OllamaProviderTests(unittest.TestCase):
+    def test_generate_builds_messages(self):
+        provider = OllamaLLMProvider(model="llama3", system_prompt="Sistema")
+        fake_response = {"message": {"content": "respuesta"}, "eval_count": 10}
</code_context>
<issue_to_address>
**suggestion (testing):** No test for error handling in OllamaLLMProvider._build_messages.
Add a test to confirm that OllamaLLMProvider._build_messages raises ValueError when both prompt and messages are missing.
</issue_to_address>
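A minimal sketch of such a test (it assumes `_build_messages` accepts `prompt` and `messages` keyword arguments, as suggested by the class diagram above):

```python
import unittest

from aymurai.llm_providers.ollama_provider import OllamaLLMProvider


class BuildMessagesErrorTests(unittest.TestCase):
    def test_build_messages_requires_prompt_or_messages(self):
        provider = OllamaLLMProvider(model="llama3")
        # Both prompt and messages omitted: the error path this comment asks to cover.
        with self.assertRaises(ValueError):
            provider._build_messages(prompt=None, messages=None)


if __name__ == "__main__":
    unittest.main()
```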
### Comment 3
<location> `test/llm_providers/test_providers.py:62-76` </location>
<code_context>
+        self.assertEqual(kwargs["model"], "llama3")
+        self.assertEqual(kwargs["messages"][0]["role"], "system")
+
+    def test_stream_yields_chunks(self):
+        provider = OllamaLLMProvider(model="llama3")
+        fake_chunks = iter(
+            [
+                {"message": {"content": "Hola"}, "done": False},
+                {"message": {"content": " Mundo"}, "done": True},
+            ]
+        )
+        with patch("aymurai.llm_providers.ollama_provider.ollama.chat") as mock_chat:
+            mock_chat.return_value = fake_chunks
+            pieces = list(provider.stream("Hola"))
+
+        self.assertEqual([piece.text for piece in pieces], ["Hola", " Mundo"])
+        _, kwargs = mock_chat.call_args
+        self.assertTrue(kwargs["stream"])
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** No test for OllamaLLMProvider with system_prompt in stream method.
Please add a test to verify that when system_prompt is set, the stream method includes the system message in the payload, similar to the generate method.
```suggestion
    def test_stream_yields_chunks(self):
        provider = OllamaLLMProvider(model="llama3")
        fake_chunks = iter(
            [
                {"message": {"content": "Hola"}, "done": False},
                {"message": {"content": " Mundo"}, "done": True},
            ]
        )
        with patch("aymurai.llm_providers.ollama_provider.ollama.chat") as mock_chat:
            mock_chat.return_value = fake_chunks
            pieces = list(provider.stream("Hola"))

        self.assertEqual([piece.text for piece in pieces], ["Hola", " Mundo"])
        _, kwargs = mock_chat.call_args
        self.assertTrue(kwargs["stream"])

    def test_stream_includes_system_prompt(self):
        system_prompt = "You are a helpful assistant."
        provider = OllamaLLMProvider(model="llama3", system_prompt=system_prompt)
        fake_chunks = iter(
            [
                {"message": {"content": "Hola"}, "done": False},
                {"message": {"content": " Mundo"}, "done": True},
            ]
        )
        with patch("aymurai.llm_providers.ollama_provider.ollama.chat") as mock_chat:
            mock_chat.return_value = fake_chunks
            pieces = list(provider.stream("Hola"))

        # Check that the system message is included in the payload
        _, kwargs = mock_chat.call_args
        self.assertEqual(kwargs["messages"][0]["role"], "system")
        self.assertEqual(kwargs["messages"][0]["content"], system_prompt)
```
</issue_to_address>
### Comment 4
<location> `test/llm_providers/test_providers.py:45-46` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
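As a concrete illustration, the flagged loop could be collapsed into a single assertion while staying within `unittest` (a sketch of the test method only; `DummyProvider` is the helper defined in the test module under review):

```python
    def test_chunk_text_respects_token_limit(self):
        provider = DummyProvider(max_context_tokens=4, chunk_overlap=1)
        chunks = provider.chunk_text("uno dos tres cuatro cinco seis")
        self.assertGreater(len(chunks), 1)
        # One assertion over all chunks instead of a loop of per-chunk assertions.
        self.assertLessEqual(max(chunk.token_count for chunk in chunks), 4)
```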
### Comment 5
<location> `aymurai/llm_providers/provider.py:198-201` </location>
<code_context>
    def _tokenize(self, text: str) -> Sequence[Any]:
        """
        Tokenize text using the configured tokenizer or fallback to whitespace.

        Args:
            text (str): The input text to tokenize.

        Returns:
            Sequence[Any]: The sequence of tokens.
        """
        if self._tokenizer is None:
            return text.split()
        tokenizer = self._tokenizer
        if hasattr(tokenizer, "encode"):
            return tokenizer.encode(text, add_special_tokens=False)
        if callable(tokenizer):
            tokens = tokenizer(text)
            if isinstance(tokens, dict):
                return tokens.get("input_ids", [])
            return tokens
        return text.split()
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
            return tokens.get("input_ids", []) if isinstance(tokens, dict) else tokens
```
</issue_to_address>
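For reference, `_tokenize` with the suggested change applied would read roughly as follows (the code_context above with the one-line replacement folded in; docstring elided):

```python
    def _tokenize(self, text: str) -> Sequence[Any]:
        if self._tokenizer is None:
            return text.split()
        tokenizer = self._tokenizer
        if hasattr(tokenizer, "encode"):
            return tokenizer.encode(text, add_special_tokens=False)
        if callable(tokenizer):
            tokens = tokenizer(text)
            # Suggested simplification: single expression instead of the if/return pair.
            return tokens.get("input_ids", []) if isinstance(tokens, dict) else tokens
        return text.split()
```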
Pull Request Overview
This PR introduces a flexible abstraction layer for integrating multiple Large Language Model (LLM) providers into the aymurai codebase. The implementation provides a standardized interface through a base LLMProvider class with concrete support for Ollama, along with utilities for text chunking, token counting, and streaming responses.
Key changes:
- New `LLMProvider` abstract base class defining a standard interface for LLM interactions
- `OllamaLLMProvider` implementation supporting both standard and streaming generation
- Comprehensive unit tests with mocking for both base and Ollama functionality
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `aymurai/llm_providers/provider.py` | Base provider abstraction with token counting, chunking logic, and response models |
| `aymurai/llm_providers/ollama_provider.py` | Ollama-specific implementation with message building and streaming support |
| `aymurai/llm_providers/__init__.py` | Module exports for public API |
| `test/llm_providers/test_providers.py` | Unit tests for base provider and Ollama implementation |
| `test/llm_providers/__init__.py` | Empty init file for test module |
| `notebooks/experiments/llm-providers/01-ollama-provider.ipynb` | Usage examples demonstrating chunking, generation, and streaming |
| """ | ||
| pass | ||
|
|
||
| @abc.abstractmethod |
There was a problem hiding this comment.
The `stream` method is decorated with `@abc.abstractmethod` but provides a default implementation that raises `NotImplementedError`. This is inconsistent: abstract methods should not have implementations. Either remove the decorator to make it a concrete method with optional override, or remove the default implementation entirely and require subclasses to implement it.
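A minimal sketch of the second option (keep the decorator, drop the default body); the signature follows the class diagram above and is an assumption, not the file's exact code:

```python
import abc
from collections.abc import Iterator
from typing import Any


class LLMProvider(abc.ABC):
    @abc.abstractmethod
    def stream(self, prompt: str, **kwargs: Any) -> Iterator["LLMResponse"]:
        """Yield partial LLMResponse objects for the given prompt."""
        ...
```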
Missing return statement in the `tokenizer` property. The method should return `self._tokenizer` but currently returns `None` implicitly. This would cause any tests using `FakePipeline.tokenizer` to fail.
… for consistency and improved readability
Pull Request Overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
…amline client instantiation
This pull request introduces a new abstraction layer for Large Language Model (LLM) providers, adds an implementation for the Ollama provider, and provides both documentation and tests for the new functionality. The changes establish a standardized interface for integrating different LLM backends, making it easier to extend and maintain the codebase.
LLM Provider Abstraction
- New abstract base class `LLMProvider` in `provider.py` that defines a standard interface for LLM providers, including methods for text generation, streaming, token counting, and chunking text. Also defines the `LLMResponse` and `DocumentChunk` data models.
- Updated `__init__.py` to expose the new provider classes and types in the public API.

Ollama Provider Implementation

- New `OllamaLLMProvider` class in `ollama_provider.py`, which implements the `LLMProvider` interface, allowing interaction with Ollama's chat API for both standard and streaming responses. Handles system prompts, message formatting, and response parsing.

Documentation and Usage Example

- Example notebook (`01-ollama-provider.ipynb`) demonstrating how to use the new `OllamaLLMProvider`, including chunking, generation, and streaming features.

Testing

- Unit tests in `test_providers.py` covering both the base provider functionality and the Ollama provider, using dummy/fake classes and mocks to ensure correct behavior.

Summary by Sourcery
Introduce a standardized LLMProvider interface, implement an Ollama-based provider, and include accompanying documentation and tests
New Features:
Enhancements:
Documentation:
Tests: