Support for custom LLM clients #1839

Merged
merged 70 commits into main from feature/llm-support on Apr 10, 2024

Conversation

Member

@mattbit mattbit commented Mar 11, 2024

Support for custom LLMs (Mistral, OpenAI, Bedrock, etc.) in Giskard:

  • simplified LLMClient, removing tools and function calls
  • simplified evaluator prompts using one-shot chat mode
  • added MistralClient
  • refactored the OpenAI client so that it can also support local models through Ollama, since it no longer needs function tools
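For reference, a minimal sketch of what the simplified client interface could look like (illustrative only; the exact Giskard signatures may differ, though `seed`, `format`, and `ChatMessage` appear in the diff excerpts below):

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class ChatMessage:
    role: str  # e.g. "system", "user", "assistant"
    content: str

class LLMClient(ABC):
    @abstractmethod
    def complete(
        self,
        messages: Sequence[ChatMessage],
        temperature: float = 1.0,
        max_tokens: Optional[int] = None,
        seed: Optional[int] = None,
        format: Optional[str] = None,  # e.g. "json" to request JSON-formatted output
    ) -> ChatMessage:
        ...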

@mattbit mattbit changed the title from "Big refactoring: break everything, and rebuild simpler" to "Support for custom LLMs" on Mar 11, 2024
@mattbit mattbit changed the title from "Support for custom LLMs" to "Support for custom LLM clients" on Mar 11, 2024
Member

@kevinmessiaen kevinmessiaen left a comment

The idea is to remove function_call and tool_calls and instead ask the model to return JSON-formatted answers?

I guess this is necessary to allow any model to work, but how does it impact the reliability of the current scan?

Also, I know it might increase complexity, but couldn't we have something like this instead, so we keep the feature (not necessary if the switch doesn't hurt scan reliability too much):

from abc import ABC, abstractmethod
from typing import Dict, List, Optional

class Tool:
  tool_spec: Dict
  examples: Optional[List[Dict]]

  @property
  def to_format_instruction(self) -> str:
    ...

  def use_tool(self, chat_response) -> bool:
    ...

  def parse_response(self, chat_response) -> List[Dict]:  # List, because the tool might be called more than once
    ...

class LLMClient(ABC):
  @abstractmethod
  def complete(self, messages, tools: Optional[List[Tool]] = None, tool_choice: Optional[Tool] = None):
    ...

class MistralClient(LLMClient):
  def complete(self, messages, tools: Optional[List[Tool]] = None, tool_choice: Optional[Tool] = None):
    # This logic should be moved into an adapter so we can reuse it for other LLMClients (and keep the code cleaner)
    if tools:
      tools_instruction = "\n".join(tool.to_format_instruction for tool in tools)
      if tool_choice is not None:
        tools_instruction += f"\nYou must call {tool_choice.tool_spec['name']}"
      # TODO: inject tools_instruction into `messages`
    ...

# OpenAIClient would use native tool_calls (if the model in use supports them)

I agree that we should keep the LLMClient as simple as possible. But I do believe that the client should be the one responsible for serialising the messages to the API format, rather than each feature having to be adapted to work with all LLMs (ideally we should have something easy to understand in the scan/evaluator/..., and the client should do the translation based on the LLM's features/needs). Feel free to disregard this comment if it's deemed not worth implementing.

@mattbit
Member Author

mattbit commented Mar 18, 2024

The idea is to remove function_call and tool_calls and instead ask the model to return JSON-formatted answers?
I guess this is necessary to allow any model to work, but how does it impact the reliability of the current scan?

Yes, basically to be as flexible as possible in what we support. By now, most models support some kind of JSON-formatted output (through response_format="json_object"), and in our tests few-shot prompting in chat mode was just as effective.
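A rough illustration of that approach, assuming a client instance `llm_client` implementing the interface sketched in the PR description above (the prompt text is an assumption, not the PR's actual evaluator prompt):

import json

messages = [
    ChatMessage(role="system", content='Evaluate the answer. Reply in JSON: {"passed": bool, "reason": str}'),
    ChatMessage(role="user", content="<model answer to evaluate>"),
]
out = llm_client.complete(messages=messages, format="json")  # JSON format hint, where supported
verdict = json.loads(out.content)  # e.g. {"passed": true, "reason": "..."}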

Also, I know it might increase complexity, but couldn't we have something like this instead, so we keep the feature (not necessary if the switch doesn't hurt scan reliability too much):
[...]
I agree that we should keep the LLMClient as simple as possible. But I do believe that the client should be the one responsible for serialising the messages to the API format, rather than each feature having to be adapted to work with all LLMs (ideally we should have something easy to understand in the scan/evaluator/..., and the client should do the translation based on the LLM's features/needs). Feel free to disregard this comment if it's deemed not worth implementing.

Yeah, it would be nice to abstract tools, but then we would end up reimplementing langchain. I think it's wise to keep our code to a minimum for what is not strictly required. I realized that my first implementation with function calls was over-engineered. I feel it's worth waiting for now; maybe at some point there will be some light open-source library replacing langchain ;)

@mattbit mattbit marked this pull request as ready for review April 5, 2024 14:02
@mattbit mattbit removed their assignment Apr 5, 2024
@luca-martial luca-martial removed the request for review from Hartorn April 5, 2024 14:25
if seed is not None:
    extra_params["random_seed"] = seed

if format not in (None, "json", "json_object") and "large" not in self.model:
Member

@andreybavt andreybavt Apr 8, 2024

format could be of Literal type

also pydantic's validate_call could do the validation job
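A minimal sketch of that suggestion, assuming pydantic v2's validate_call (shown on a standalone function for illustration):

from typing import Literal, Optional

from pydantic import validate_call

@validate_call
def complete(prompt: str, format: Optional[Literal["json", "json_object"]] = None) -> str:
    # pydantic raises a ValidationError for any `format` value outside the Literal
    ...

complete("hello", format="json")   # ok
# complete("hello", format="xml")  # would raise pydantic.ValidationError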

return _default_embedding

# Try with OpenAI/AzureOpenAI if available
try:
Member

nitpick: the two try blocks could be extracted into dedicated methods
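A sketch of that refactor (the method name below is hypothetical, not taken from the PR):

def _try_openai_embedding():
    # One extracted function per provider probe; returns None if the provider isn't available.
    try:
        import openai  # noqa: F401
        ...  # build and return the OpenAI/AzureOpenAI-based embedding model
    except ImportError:
        return None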

...


def batched(iterable, batch_size):
Member

should be moved to utils.py
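For reference, a typical implementation of such a helper (Python 3.12 ships an equivalent itertools.batched; the PR's actual version may differ):

from itertools import islice

def batched(iterable, batch_size):
    # Yield successive lists of up to `batch_size` items from `iterable`.
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch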

success_examples: Sequence[dict]
errors: Sequence[dict]
details: Optional[TestResultDetails] = None
results: Sequence[EvaluationResultExample] = field(default_factory=list)
Member

since we're calling .append below this should rather be a List, not a Sequence
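For reference, the distinction the comment points at: typing.Sequence is a read-only protocol with no append, so a field that gets mutated should be typed as List (the class below is illustrative, not the PR's actual dataclass):

from dataclasses import dataclass, field
from typing import List

@dataclass
class EvaluationResult:
    results: List[dict] = field(default_factory=list)  # List, because we append to it

result = EvaluationResult()
result.results.append({"sample": 1, "passed": True})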


return result

def _evaluate_sample(self, model: BaseModel, sample: Dict) -> Tuple[bool, str, Dict]:
Member

return type should be Tuple[bool, str]


if len(out.tool_calls) != 1 or "passed_test" not in out.tool_calls[0].function.arguments:
    raise LLMGenerationError("Invalid function call arguments received")
def _format_messages(self, model: BaseModel, sample: Dict) -> Sequence[ChatMessage]:
Member

missing meta argument

@@ -62,50 +68,56 @@ def evaluate(self, model: BaseModel, dataset_1: Dataset, dataset_2: Optional[Dat
inputs_1 = dataset_1.df.to_dict("records")
inputs_2 = dataset_2.df.loc[dataset_1.df.index].to_dict("records")
Member

dataset_2 can be None

@@ -84,7 +80,7 @@ def test_llm_ground_truth_similarity(
def test_llm_as_a_judge_ground_truth_similarity(
model: BaseModel, dataset: Dataset, prefix: str = "The requirement should be similar to: ", rng_seed: int = 1729
Member

prefix isn't used

@andreybavt andreybavt added the Lockfile Temporary label to update pdm.lock label Apr 9, 2024
Member

@andreybavt andreybavt left a comment

Generally looks good to me except for minor comments, failing tests and a bunch of typing issues (we should add mypy or something similar)

I tested the new test_llm_ground_truth with the Hub, it worked fine

@github-actions github-actions bot removed the Lockfile Temporary label to update pdm.lock label Apr 9, 2024
@kevinmessiaen
Member

Generally looks good to me except for minor comments, failing tests and a bunch of typing issues (we should add mypy or something similar)

I tested the new test_llm_ground_truth with the Hub, it worked fine

Should be good now. I'm just not sure about the first feedback regarding Literal.

@andreybavt andreybavt merged commit ad84e47 into main Apr 10, 2024
16 checks passed
@andreybavt andreybavt deleted the feature/llm-support branch April 10, 2024 12:12