Merge pull request #1903 from Giskard-AI/doc/upgrade-openai
Upgrade openai version in documentation notebooks
luca-martial committed May 13, 2024
2 parents 3cceb5a + 0e2393e commit 88e3cd4
Showing 11 changed files with 4,983 additions and 831 deletions.
# 📚 LLM scan

The Giskard python library provides an automatic scan functionality designed to
detect [potential vulnerabilities](https://docs.giskard.ai/en/stable/knowledge/llm_vulnerabilities/index.html) affecting
your LLMs.

Differently from other techniques that focus on benchmarking a foundation LLM, Giskard's scan allows you to perform
**in-depth assessments on domain-specific models.** This includes chatbots, **question answering systems, and
retrieval-augmented generation (RAG) models.**

You can find detailed information about the inner workings of the LLM scan
{doc}`here </knowledge/llm_vulnerabilities/index>`.

### What data is sent to language model providers?

In order to perform tasks with LLM-assisted detectors, we send the following information to the selected language model
provider (e.g., OpenAI, Azure OpenAI, Ollama, Mistral):

- Data provided in your Dataset
- Text generated by your model

Note that this does not apply if you select a self-hosted model.

### Will the scan work in any language?

Most of the detectors run by the scan should work with any language; however, the effectiveness of **LLM-assisted
detectors** largely depends on the language capabilities of the specific language model in use. While many LLMs have
broad multilingual capacities, performance and accuracy may vary based on the model and the specific language being
processed.

## Before starting

In the following example, we illustrate the procedure using **OpenAI** and **Azure OpenAI**; however, please note that
our platform supports a variety of language models. For details on configuring different models, visit
our [🤖 Setting up the LLM Client page](../../open_source/setting_up/index.md).

Before starting, make sure you have installed the LLM flavor of Giskard:

```sh
pip install "giskard[llm]" -U
```

::::::{tab-item} OpenAI

```python
import os
import giskard
from giskard.llm.client.openai import OpenAIClient

os.environ["OPENAI_API_KEY"] = "sk-..."

giskard.llm.set_llm_api("openai")
oc = OpenAIClient(model="gpt-4-turbo-preview")
giskard.llm.set_default_client(oc)
```

::::::
::::::{tab-item} Azure OpenAI

```python
import os
from giskard.llm import set_llm_model

os.environ['AZURE_OPENAI_API_KEY'] = '...'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://xxx.openai.azure.com'
os.environ['OPENAI_API_VERSION'] = '2023-07-01-preview'


# You'll need to provide the name of the model that you've deployed
# Beware, the model provided must be capable of using function calls
set_llm_model('my-gpt-4-model')
```

::::::
::::::{tab-item} Mistral

```python
import os
import giskard
from giskard.llm.client.mistral import MistralClient

os.environ["MISTRAL_API_KEY"] = "..."

mc = MistralClient()
giskard.llm.set_default_client(mc)
```

::::::
::::::{tab-item} Ollama

```python
import giskard
from openai import OpenAI
from giskard.llm.client.openai import OpenAIClient

# Point an OpenAI-compatible client at the local Ollama server
_client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
oc = OpenAIClient(model="gemma:2b", client=_client)
giskard.llm.set_default_client(oc)
```

::::::
::::::{tab-item} Claude 3

```python
import os
import boto3
import giskard
from giskard.llm.client.bedrock import ClaudeBedrockClient

bedrock_runtime = boto3.client("bedrock-runtime", region_name=os.environ["AWS_DEFAULT_REGION"])
claude_client = ClaudeBedrockClient(bedrock_runtime, model="anthropic.claude-3-sonnet-20240229-v1:0")
giskard.llm.set_default_client(claude_client)
```

::::::
::::::{tab-item} Custom Client

```python
import giskard
from typing import Sequence, Optional
from giskard.llm.client import set_default_client
from giskard.llm.client.base import LLMClient, ChatMessage



class MyLLMClient(LLMClient):
    def __init__(self, my_client):
        self._client = my_client
    def complete(
        self,
        messages: Sequence[ChatMessage],
        temperature: float = 1,
        max_tokens: Optional[int] = None,
        caller_id: Optional[str] = None,
        seed: Optional[int] = None,
        format=None,
    ) -> ChatMessage:
        # Call your own client here; `data` is assumed to be a dict-like response
        data = self._client.complete(
            [{"role": m.role, "content": m.content} for m in messages],
            temperature=temperature,
            max_tokens=max_tokens,
        )

        return ChatMessage(role="assistant", content=data["completion"])


# Pass your own underlying client instance to the wrapper
set_default_client(MyLLMClient(my_client))

```
::::::

We are now ready to start.

(model-wrapping)=

## Step 1: Wrap your model

Start by **wrapping your model**. This step is necessary to ensure a common format for your model and its metadata.

::::::{tab-item} Wrap a stand-alone LLM

```python
import giskard
import pandas as pd


def model_predict(df: pd.DataFrame):
    """Wraps the LLM call in a simple Python function.

    The function takes a pandas.DataFrame containing the input variables needed
    by your model, and returns a list of outputs (one for each row).
    """
    # `llm_api` is your own function that calls your LLM or its API
    return [llm_api(question) for question in df["question"].values]


# Create a giskard.Model object. Don’t forget to fill the `name` and `description`
# parameters: they will be used by our scan to generate domain-specific tests.
giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
)
```

::::::
::::::{tab-item} Wrap a LangChain object

```python
import giskard
from langchain import OpenAI, LLMChain, PromptTemplate

# Example chain
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
prompt = PromptTemplate(template="You are a generic helpful assistant. Please answer this question: {question}",
                        input_variables=["question"])
chain = LLMChain(llm=llm, prompt=prompt)

# Create a giskard.Model object. Don’t forget to fill the `name` and `description`
# parameters: they will be used by our scan to generate domain-specific tests.
giskard_model = giskard.Model(
    model=chain,
    model_type="text_generation",
    name="My Generic Assistant",
    description="A generic assistant that kindly answers questions.",
    feature_names=["question"],
)
```

::::::
::::::{tab-item} Wrap a custom RAG

Wrap your RAG-based LLM app in an extension of the `giskard.Model` class. In this example, we use a custom model with a
langchain chain and an OpenAI model. Extending the `giskard.Model` class allows for the upload to the
Giskard Hub of complex models which cannot be automatically serialized with `pickle`.

You will have to implement just three methods:

- `model_predict`: This method takes a `pandas.DataFrame` with columns corresponding to the input variables of your
  model and returns a sequence of outputs (one for each record in the dataframe).
- `save_model`: This method handles the serialization of your model. You can use it to save your model's state,
  including the information retriever or any other element your model needs to work.
- `load_model`: This class method handles the deserialization of your model. You can use it to load your model's state,
  including the information retriever or any other element your model needs to work.

```python
from langchain import OpenAI, PromptTemplate, RetrievalQA
import giskard
import pandas as pd

# YOUR_PROMPT_TEMPLATE and get_context_storage() are placeholders you define elsewhere.
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
prompt = PromptTemplate(template=YOUR_PROMPT_TEMPLATE, input_variables=["question", "context"])
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=get_context_storage().as_retriever(), prompt=prompt)


# Define a custom Giskard model wrapper for the serialization.
class FAISSRAGModel(giskard.Model):
    def model_predict(self, df: pd.DataFrame):
        return df["question"].apply(lambda x: self.model.run({"query": x}))

    def save_model(self, path: str, *args, **kwargs):
        # Serialize your model's state (e.g. the chain and its FAISS index) to `path`.
        ...

    @classmethod
    def load_model(cls, path: str, *args, **kwargs):
        # Deserialize your model's state from `path`.
        ...
```

::::::

For further examples, check out the {doc}`LLM tutorials section </tutorials/llm_tutorials/index>`.

<details>

* <mark style="color:red;">**`Mandatory parameters`**</mark>
    * `model`: A prediction function that takes a `pandas.DataFrame` as input and returns a string.
    * `model_type`: The type of model, either `regression`, `classification` or `text_generation`. For LLMs, this is
      always `text_generation`.
    * `name`: A descriptive name for the wrapped model to identify it in metadata. E.g. "Climate Change Question
      Answering".
    * `description`: A detailed description of what the model does; this is used to generate prompts to test during
      the scan.
    * `feature_names`: A list of the column names of your features. By default, `feature_names` are all the columns
      in your dataset. Make sure these features are all present and in the same order as they are in your training
      dataset.

</details>
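
Before scanning, it is worth sanity-checking that the wrapped model runs end to end. A minimal sketch, assuming the
stand-alone wrapper from above (the two sample questions are purely illustrative):

```python
import pandas as pd
import giskard

# A tiny dataset with the same feature names as the wrapped model
test_df = pd.DataFrame({"question": [
    "What is the main cause of global warming?",
    "What are the impacts of sea level rise?",
]})
test_dataset = giskard.Dataset(test_df, target=None)

# Run the wrapped model on the dataset and print its raw outputs
print(giskard_model.predict(test_dataset).prediction)
```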

## Step 2: Scan your model
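
Once your model is wrapped, you can launch the scan. A minimal sketch, assuming the `giskard_model` defined in Step 1
(`display` assumes a notebook environment):

```python
import giskard

# Run the automatic vulnerability scan on the wrapped model
scan_results = giskard.scan(giskard_model)

# Display the results inline (in a notebook) or export them as a standalone HTML report
display(scan_results)
scan_results.to_html("scan_report.html")
```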
8 changes: 8 additions & 0 deletions docs/open_source/scan/scan_llm/index.rst
.. include:: first.md
   :parser: myst_parser.sphinx_

.. raw:: html
   :file: scan_result_iframe.html

.. include:: second.md
   :parser: myst_parser.sphinx_
