Support for custom LLM clients #1839
Conversation
The idea is to remove `function_call` and `tool_calls` and replace them by asking the model to return JSON-formatted answers? I guess this is necessary to allow any model to work, but how does it impact the reliability of the current scan?
Also, I know it might increase complexity, but couldn't we have something like this instead, so we keep the feature (not necessary if the switch won't impact scan reliability much):
```python
class Tool:
    tool_spec: Dict
    examples: Optional[List[Dict]]

    @property
    def to_format_instruction(self) -> str:
        ...

    def use_tool(self, chat_response) -> bool:
        ...

    def parse_response(self, chat_response) -> List[Dict]:  # List because the tool might be called more than once
        ...


class LLMClient(ABC):
    @abstractmethod
    def complete(self, ..., tools: Optional[List[Tool]] = None, tool_choice: Optional[Tool] = None):
        ...


class MistralClient(LLMClient):
    def complete(self, ..., tools: Optional[List[Tool]] = None, tool_choice: Optional[Tool] = None):
        # This logic should be moved to an adapter so we can reuse it for other LLMClients (and have cleaner code)
        if tools:
            tools_instruction = "\n".join(tool.to_format_instruction for tool in tools)
            if tool_choice is not None:
                tools_instruction += f"\nYou must call {tool_choice.tool_spec['name']}"
            # TODO: inject `tools_instruction` into `messages`
        ...


# OpenAIClient uses native tool_calls (if the model in use supports it)
```
I agree that we should keep the LLMClient as simple as possible. But I do believe the client should be the one responsible for serialising the message into the API format, rather than the feature having to adapt itself to every LLM. (Ideally the scan/evaluator/... should contain something easy to understand, and the client should do the translation based on the LLM's features and needs.) Feel free to disregard this comment if it's deemed not worth implementing.
@kevinmessiaen please check the test details
Yes, basically to be as flexible as possible in the support. By now, most of the models support some kind of JSON format output (through
Yeah, it would be nice to abstract tools, but then we would end up reimplementing langchain. I think it's wise to keep our code to a minimum for what is not strictly required. I realized that my first implementation with function calls was over-engineered. I feel it's worth waiting for now; maybe at some point there will be some light open-source library replacing langchain ;)
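To illustrate the JSON-output approach discussed above: instead of relying on native tool calls, the model can be prompted to emit a JSON object, which is then extracted from the reply. This is a minimal sketch; `extract_json` and the sample reply are illustrative and not part of the PR's code.

```python
import json
import re


def extract_json(text: str) -> dict:
    """Pull the first JSON object out of a model reply, tolerating
    surrounding prose (illustrative helper, not the PR's implementation)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))


# A model without native tool calls can still be asked for JSON output:
reply = 'Sure! Here is my evaluation:\n{"passed_test": true, "reason": "output is coherent"}'
result = extract_json(reply)
```

The trade-off mentioned in the thread applies: this works with any model, but is less strict than native `tool_calls`, so malformed replies need to be handled.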
```python
if seed is not None:
    extra_params["random_seed"] = seed

if format not in (None, "json", "json_object") and "large" not in self.model:
```
`format` could be of `Literal` type; also pydantic's `validate_call` could do the validation job.
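The reviewer's suggestion could look roughly like this. The sketch below uses a stdlib-only runtime guard via `typing.get_args`; pydantic's `@validate_call` decorator (if adopted) would enforce the annotation automatically instead. The function body and return values are illustrative.

```python
from typing import Literal, Optional, get_args

FormatLiteral = Literal["json", "json_object"]


def complete(format: Optional[FormatLiteral] = None) -> str:
    # Runtime guard mirroring the annotation; pydantic's @validate_call
    # would perform this check automatically if applied to the function.
    if format is not None and format not in get_args(FormatLiteral):
        raise ValueError(f"Unsupported format: {format!r}")
    return "json mode" if format else "text mode"
```

With the `Literal` annotation, static checkers also flag invalid values like `complete("xml")` before runtime.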
giskard/llm/embeddings/__init__.py
Outdated
```python
    return _default_embedding

# Try with OpenAI/AzureOpenAI if available
try:
```
nitpick: the two try blocks could be extracted into dedicated methods
giskard/llm/embeddings/base.py
Outdated
```python
    ...


def batched(iterable, batch_size):
```
should be moved to utils.py
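For reference, a `batched` helper like the one under review is commonly written with `itertools.islice`; this is a sketch rather than the PR's actual implementation (on Python 3.12+, `itertools.batched` covers this natively, though it yields tuples).

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of up to `batch_size` items from `iterable`
    (sketch of the helper discussed above)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least one")
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch
```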
giskard/llm/evaluators/base.py
Outdated
```python
    success_examples: Sequence[dict]
    errors: Sequence[dict]
    details: Optional[TestResultDetails] = None
    results: Sequence[EvaluationResultExample] = field(default_factory=list)
```
Since we're calling `.append` below, this should rather be a `List`, not a `Sequence`.
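To illustrate the point: `typing.Sequence` is a read-only protocol with no `append` method, so static checkers reject mutation through it, while `List` advertises mutability. The class names below are illustrative, not the PR's.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ReadOnlyResult:
    # mypy/pyright reject `self.examples.append(...)` here:
    # Sequence has no `append` attribute in its type interface.
    examples: Sequence[dict] = field(default_factory=list)


@dataclass
class MutableResult:
    # List advertises mutability, so `.append` type-checks fine.
    examples: List[dict] = field(default_factory=list)


result = MutableResult()
result.examples.append({"passed": True})
```

At runtime both work (the default is a real `list` either way); the distinction only matters to the type checker, which is exactly why the annotation should match the usage.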
giskard/llm/evaluators/base.py
Outdated
```python
        return result

    def _evaluate_sample(self, model: BaseModel, sample: Dict) -> Tuple[bool, str, Dict]:
```
return type should be Tuple[bool, str]
giskard/llm/evaluators/coherency.py
Outdated
```python
        if len(out.tool_calls) != 1 or "passed_test" not in out.tool_calls[0].function.arguments:
            raise LLMGenerationError("Invalid function call arguments received")

    def _format_messages(self, model: BaseModel, sample: Dict) -> Sequence[ChatMessage]:
```
Missing `meta` argument.
giskard/llm/evaluators/coherency.py
Outdated
```diff
@@ -62,50 +68,56 @@ def evaluate(self, model: BaseModel, dataset_1: Dataset, dataset_2: Optional[Dat
 inputs_1 = dataset_1.df.to_dict("records")
 inputs_2 = dataset_2.df.loc[dataset_1.df.index].to_dict("records")
```
dataset_2 can be None
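A guard for the `None` case could look like the sketch below. It uses plain lists of dicts for self-containment; the actual code indexes pandas DataFrames, and the fallback behaviour shown (reusing the first input set) is an assumption, not the PR's fix.

```python
from typing import Dict, List, Optional


def paired_inputs(
    inputs_1: List[Dict], inputs_2: Optional[List[Dict]] = None
) -> List[Dict]:
    # When no second dataset is provided, fall back to the first one
    # instead of dereferencing None (illustrative guard only).
    return inputs_1 if inputs_2 is None else inputs_2
```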
```diff
@@ -84,7 +80,7 @@ def test_llm_ground_truth_similarity(
 def test_llm_as_a_judge_ground_truth_similarity(
     model: BaseModel, dataset: Dataset, prefix: str = "The requirement should be similar to: ", rng_seed: int = 1729
```
`prefix` isn't used.
# Conflicts:
#   pdm.lock
#   tests/utils.py
Generally looks good to me except for the minor comments, the failing tests, and a bunch of typing issues (we should add mypy or something similar).
I tested the new `test_llm_ground_truth` with the Hub, it worked fine.
Should be good now, I'm just not sure about the first feedback regarding
Support for custom LLMs (Mistral, OpenAI, Bedrock, etc.) in Giskard: