Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/gsk 2334 talk to my model mvp #1831

Merged
merged 191 commits into from
Apr 11, 2024
Merged
Show file tree
Hide file tree
Changes from 181 commits
Commits
Show all changes
191 commits
Select commit Hold shift + click to select a range
bfc87e5
Initial commit for the MVP of "Talk to my model" functionality.
AbSsEnT Dec 12, 2023
e8a3b6e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 12, 2023
3316bd4
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 14, 2023
4452fd2
Defined the basic pipeline of the 'talk' function.
AbSsEnT Dec 14, 2023
847e8b8
Defined the Tool interface and the boilerplate for the first tool, wh…
AbSsEnT Dec 18, 2023
7ca2416
small addition
AbSsEnT Dec 18, 2023
b44386b
Added method to initialise tools objects, each time the method 'talk'…
AbSsEnT Dec 18, 2023
b727b77
Initial implementation of the "__call__" method.
AbSsEnT Dec 18, 2023
84318d1
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 18, 2023
71a08b5
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Dec 18, 2023
549c916
Bug fixes. Adapted flow to currently use legacy 'functions' instead o…
AbSsEnT Dec 19, 2023
66673a3
Debugged "predict from dataset" tool workflow. Debugged the tool work…
AbSsEnT Dec 20, 2023
4deb3fc
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 20, 2023
0e98572
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 20, 2023
e2781aa
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Dec 20, 2023
66f7429
Merge pull request #1687 from Giskard-AI/feature/gsk-2335-query-predi…
AbSsEnT Dec 20, 2023
b46fd57
Initial implementation of the 'SHAPExplanationTool'.
AbSsEnT Dec 21, 2023
a836a1e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 21, 2023
4dcd490
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Dec 21, 2023
777011a
Added handling an errors, while calling tools.
AbSsEnT Dec 21, 2023
19488bd
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Dec 22, 2023
cd2bdf3
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Dec 22, 2023
0a9315a
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Dec 22, 2023
4db4482
Merge pull request #1696 from Giskard-AI/feature/gsk-2336-query-shap-…
AbSsEnT Dec 22, 2023
6b86d48
Moved more attributes and properties to the BaseTool, since they are …
AbSsEnT Dec 22, 2023
bbcf347
Changed PredictFromDataset tool's specification.
AbSsEnT Dec 22, 2023
e248aab
Adapting model.py to the use of tools API.
AbSsEnT Dec 22, 2023
394b2bb
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Dec 22, 2023
dc155ee
Fully changed 'talk' method workflow to use tools API.
AbSsEnT Dec 22, 2023
9a70228
Added multiple toll calling for the SHAP explanation tool
AbSsEnT Dec 22, 2023
30f9184
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Dec 22, 2023
10929e4
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Dec 22, 2023
3c6363a
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Dec 22, 2023
6a1dbf8
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Dec 22, 2023
110c804
Merge pull request #1702 from Giskard-AI/feature/gsk-2419-adapt-workf…
AbSsEnT Dec 22, 2023
cafedd6
Code refactoring.
AbSsEnT Dec 26, 2023
b4983d0
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Jan 4, 2024
7e3fe0d
Fixed conflicts and merged latest changes from the main branch.
AbSsEnT Jan 4, 2024
126d19d
Initial implementation of the IssuesScannerTool, which gives user an …
AbSsEnT Jan 5, 2024
f2a8a0f
Merged and fixed conflicts of branch gsk-2367-migrate-openai-api-call…
AbSsEnT Jan 5, 2024
5c6ce77
Refactoring.
AbSsEnT Jan 5, 2024
6c378f7
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Jan 5, 2024
69dadaa
Merged and fixed conflicts from the feature/gsk-2367-migrate-openai-a…
AbSsEnT Jan 5, 2024
b25a5d2
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Jan 5, 2024
cf5c7d1
Merge pull request #1717 from Giskard-AI/feature/gsk-2338-query-a-mod…
AbSsEnT Jan 5, 2024
4c1956a
Removed __futures__ import.
AbSsEnT Jan 5, 2024
a954aa4
Started implementing prediction from user input tool.
AbSsEnT Jan 5, 2024
4767a12
Implemented the final PredictUserInputTool.
AbSsEnT Jan 8, 2024
47eb3a0
Merge pull request #1722 from Giskard-AI/feature/gsk-2337-query-predi…
AbSsEnT Jan 8, 2024
41ccaa7
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Jan 8, 2024
175fde1
Put the shap explanations calculation logic into separate module.
AbSsEnT Jan 8, 2024
976d347
Explicitly set target to the 'None', when creating Dataset, to omit w…
AbSsEnT Jan 8, 2024
ebbcd3c
Distributed the tools across separate dedicated modules for easier ma…
AbSsEnT Jan 8, 2024
e44ff13
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Jan 9, 2024
3d8fc91
Implemented history (context) persistence to enable dialogue regime b…
AbSsEnT Jan 9, 2024
583f323
Small refactoring.
AbSsEnT Jan 10, 2024
0de2adc
Merge pull request #1729 from Giskard-AI/feature/gsk-2421-history-per…
AbSsEnT Jan 10, 2024
b299208
Executed pre-commit hooks on all files.
AbSsEnT Jan 25, 2024
a892e69
Merged with gsk-2367
AbSsEnT Jan 25, 2024
a9ef45e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Jan 25, 2024
c0ac0b8
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Jan 26, 2024
120a272
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Jan 31, 2024
69c6645
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Jan 31, 2024
76418a7
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Jan 31, 2024
1e16e07
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Jan 31, 2024
71ef243
Update regarding new LLMClient API.
AbSsEnT Jan 31, 2024
27925e8
Updated `pdm.lock`
AbSsEnT Jan 31, 2024
4d0d41b
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Jan 31, 2024
d9d18a6
Finalised adaptation to the new LLMClient API for the 'talk' function…
AbSsEnT Jan 31, 2024
52ddba5
Removed "_form_tool_calls" method.
AbSsEnT Jan 31, 2024
e80e1a9
Small fixes.
AbSsEnT Feb 4, 2024
8ef5a1a
Pulled changes from main.
AbSsEnT Feb 4, 2024
bee5037
Merge branch 'feature/gsk-2367-migrate-openai-api-call-from-functions…
AbSsEnT Feb 4, 2024
ea92292
Updated pdm.
AbSsEnT Feb 4, 2024
9fd4ab7
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 6, 2024
59873e7
Updated pdm.
AbSsEnT Feb 6, 2024
ea0b7d6
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 12, 2024
061b982
United the PredictDatasetInput and PredictUserInput tools into single…
AbSsEnT Feb 12, 2024
e08bda7
If we already see, that filtered dataset is of length 0, stop further…
AbSsEnT Feb 12, 2024
8bb46c3
Merge pull request #1799 from Giskard-AI/feature/gsk-2811-shap-calcul…
AbSsEnT Feb 12, 2024
a0888a2
Merge pull request #1800 from Giskard-AI/feature/gsk-2813-fuzzy-strin…
AbSsEnT Feb 12, 2024
bff9c98
Merged with the main branch.
AbSsEnT Feb 14, 2024
367d40e
Merge branch 'GSK-2754' of github.com:Giskard-AI/giskard into feature…
AbSsEnT Feb 14, 2024
40dd4fa
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 18, 2024
27b3f75
Created the new tool to calculate model's performance metrics.
AbSsEnT Feb 18, 2024
e9d1c8a
Talk architecture polishing.
AbSsEnT Feb 19, 2024
dd4de98
Merge pull request #1807 from Giskard-AI/feature/gsk-2808-metrics-cal…
AbSsEnT Feb 19, 2024
e964762
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 19, 2024
296658e
Improved the system prompt to:
AbSsEnT Feb 19, 2024
13ac1f6
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 21, 2024
7f5efeb
System prompt improvement.
AbSsEnT Feb 21, 2024
77276eb
Merge pull request #1811 from Giskard-AI/feature/gsk-2423-prompts-imp…
AbSsEnT Feb 21, 2024
a366563
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 22, 2024
f4d3587
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 29, 2024
64191e2
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Feb 29, 2024
7302741
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Feb 29, 2024
cf7862b
1) Updated to the latest gpt-4-turbo version;
AbSsEnT Feb 29, 2024
5cac021
Merge pull request #1825 from Giskard-AI/feature/gsk-2915-bug-fix-met…
AbSsEnT Feb 29, 2024
ad8f0ac
Bug fix.
AbSsEnT Feb 29, 2024
d0b7a30
Added better spacing to the instruct prompt.
AbSsEnT Feb 29, 2024
a32b7ec
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 4, 2024
abc8ea8
Improved instruction to not provide generic answers.
AbSsEnT Mar 4, 2024
ffe9788
Added docstrings.
AbSsEnT Mar 4, 2024
ee84da6
Added docstrings.
AbSsEnT Mar 4, 2024
69631c5
Added docstrings.
AbSsEnT Mar 4, 2024
04ecfd9
Added docstrings.
AbSsEnT Mar 4, 2024
08c5888
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 4, 2024
d186fe7
Added docstrings.
AbSsEnT Mar 4, 2024
497ce21
Added docstrings.
AbSsEnT Mar 4, 2024
f8d44a0
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
Hartorn Mar 5, 2024
dbf13fd
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 10, 2024
c488815
Updated typing with the respect to not using the __futures__.
AbSsEnT Mar 10, 2024
961b016
Replaced thefuzz.ratio() by the native difflib.SequenceMatcher().ratio()
AbSsEnT Mar 10, 2024
12d9ada
Removed optional list casting.
AbSsEnT Mar 10, 2024
f1dff00
Refactored the dataset filtering logic. Added comments.
AbSsEnT Mar 10, 2024
d8e987e
Removed useless casting to list.
AbSsEnT Mar 10, 2024
35011d7
Simplified assignment expression.
AbSsEnT Mar 10, 2024
44f29d1
Small fix.
AbSsEnT Mar 10, 2024
5e36b7a
Replaced by the object's method call.
AbSsEnT Mar 10, 2024
53b014c
Replaced the __str__ by the __repr__
AbSsEnT Mar 10, 2024
5141733
Moved fuzzy similarity threshold to the config.
AbSsEnT Mar 10, 2024
b7a639b
Small fix.
AbSsEnT Mar 10, 2024
ac8f157
Removed import BaseOpenAIClient from model.py
AbSsEnT Mar 10, 2024
bbe997d
1) The 'dataset' argument of the 'talk' is mandatory now.
AbSsEnT Mar 10, 2024
eaa6b74
Added clarifying comments, on why to use non-top-level imports, as we…
AbSsEnT Mar 11, 2024
0d29bd1
Added the possibility of configuring Talk LLM model through the env v…
AbSsEnT Mar 11, 2024
6f10a3d
Returned the from __future__ import annotations, since we accept such…
AbSsEnT Mar 11, 2024
2172428
Documented the reason, why to import functions not from the top-level.
AbSsEnT Mar 11, 2024
111bc35
Improved typing and docstrings.
AbSsEnT Mar 11, 2024
8cf7c92
[RESTORING] dataset is not mandatory parameter.
AbSsEnT Mar 11, 2024
d1a792e
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 15, 2024
89e1ea4
Created the new group 'talk' for the 'talk-to-my-ml' feature dependen…
AbSsEnT Mar 15, 2024
ce1a6b1
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 15, 2024
5d57757
Regenerating pdm.lock
Mar 15, 2024
1e19325
- Fixed ambiguity in calling for 'model performance'. Now, the metric…
AbSsEnT Mar 17, 2024
747a6a8
Merge remote-tracking branch 'origin/feature/gsk-2334-talk-to-my-mode…
AbSsEnT Mar 17, 2024
dc543cc
Regenerating pdm.lock
Mar 17, 2024
4685007
Created unit-tests for the 'talk' feature.
AbSsEnT Mar 17, 2024
ee5768f
Small fix.
AbSsEnT Mar 17, 2024
5883293
Regenerating pdm.lock
Mar 17, 2024
c1db839
Committing missing pytest file with unit-tests for the 'talk' feature.
AbSsEnT Mar 19, 2024
53b08d4
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 22, 2024
bccbf5e
Update giskard/llm/talk/config.py
AbSsEnT Mar 22, 2024
e61f49b
Update giskard/llm/talk/config.py
AbSsEnT Mar 22, 2024
de23f72
Update giskard/llm/talk/config.py
AbSsEnT Mar 22, 2024
20cb2a6
Update giskard/llm/talk/config.py
AbSsEnT Mar 22, 2024
4473468
Update giskard/llm/talk/config.py
AbSsEnT Mar 22, 2024
2637058
Update giskard/llm/talk/config.py
AbSsEnT Mar 22, 2024
9ce2ec4
Fixed typos with GPT.
AbSsEnT Mar 22, 2024
48a2d96
Better exception raising logic.
AbSsEnT Mar 22, 2024
3d54d90
1) Specified, that model and dataset are mandatory parameters of tools.
AbSsEnT Mar 22, 2024
1599c3e
Update giskard/llm/talk/tools/metric.py
AbSsEnT Mar 22, 2024
b452853
Removed comments.
AbSsEnT Mar 22, 2024
701abea
Made features_json_type as a property.
AbSsEnT Mar 22, 2024
f40269f
Added `features_dict` validation logic.
AbSsEnT Mar 22, 2024
0fcb048
Replaced metrics calculation functions from sklearn to giskard
AbSsEnT Mar 22, 2024
16b7dd1
Fixed unit-tests by escaping regex-sensitive characters.
AbSsEnT Mar 24, 2024
9c264dd
Re-made unit-tests. Mocked LLM responses to avoid dependence on OpenA…
AbSsEnT Mar 24, 2024
757ee44
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Mar 26, 2024
e18f5fc
Fixed CI/CD errors:
AbSsEnT Mar 26, 2024
75e66c1
Regenerating pdm.lock
Mar 26, 2024
46cbe7f
Fixed CI/CD errors:
AbSsEnT Mar 26, 2024
ad57b31
Delete pdm.lock
rabah-khalek Apr 2, 2024
e4318dc
Regenerating pdm.lock
Apr 2, 2024
6ab7155
Created the docs page for the AI Quality Copilot.
AbSsEnT Apr 7, 2024
b4bcebb
Merged with main.
AbSsEnT Apr 7, 2024
307c352
Merge remote-tracking branch 'origin/feature/gsk-2334-talk-to-my-mode…
AbSsEnT Apr 7, 2024
5bc5e01
Regenerating pdm.lock
Apr 7, 2024
c54d0a7
Regenerating pdm.lock
Apr 7, 2024
663680f
Small docs fix.
AbSsEnT Apr 9, 2024
0171321
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Apr 9, 2024
ab8cfad
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' of github.com:Gi…
AbSsEnT Apr 9, 2024
436395c
Removed instruction because of redundancy.
AbSsEnT Apr 9, 2024
8666f32
Rewrote the initialization of all tools. Now only mandatory tool para…
AbSsEnT Apr 9, 2024
e1d816b
Introduced PredictionMixin class to abstract away common prediction n…
AbSsEnT Apr 9, 2024
cdf8229
Small docstring fix.
AbSsEnT Apr 9, 2024
f7e58ec
Added doc page for the AI Quality Copilot.
AbSsEnT Apr 9, 2024
b2c64f1
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Apr 9, 2024
f0d2818
Returned old page.
AbSsEnT Apr 9, 2024
66764e4
Returned old page.
AbSsEnT Apr 9, 2024
ad2be7f
Once again, I added the doc page for the AI Quality Copilot.
AbSsEnT Apr 9, 2024
5cf2beb
Delete pdm.lock
rabah-khalek Apr 10, 2024
f003572
Regenerating pdm.lock
Apr 10, 2024
8d2af8b
Merge branch 'main' into feature/gsk-2334-talk-to-my-model-mvp
rabah-khalek Apr 10, 2024
c746770
Delete pdm.lock
rabah-khalek Apr 10, 2024
05d2c82
Regenerating pdm.lock
Apr 10, 2024
b0d4e7a
Update talk_result.py
rabah-khalek Apr 10, 2024
479d91d
Returned the logic of tool calling to the LLMClient.
AbSsEnT Apr 10, 2024
d195484
Merge branch 'feature/gsk-2334-talk-to-my-model-mvp' into openai-clie…
AbSsEnT Apr 10, 2024
2982c02
Modified the LLMClient to support tool calling functionality as it wa…
AbSsEnT Apr 11, 2024
fe2f650
Merge branch 'main' of github.com:Giskard-AI/giskard into feature/gsk…
AbSsEnT Apr 11, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
223 changes: 223 additions & 0 deletions docs/open_source/ai_quality_copilot/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
# 🗣🤖💡AI Quality Copilot
> ⚠️ **AI Quality Copilot is currently in early version and is subject to change**. Feel free to reach out on our [Discord server](https://discord.gg/fkv7CAr3FE) if you have any trouble or to provide feedback.

Obtaining information about the ML model requires some coding effort. It may be time-consuming and create friction when
one seeks prediction results, explanations, or the model's performance. The reality is that the code simply translates
the user's intent, which can be described in natural language. For instance, the phrase 'What are the predictions for
women across the dataset?' could be converted to the `model.predict(df[df.sex == 'female'])` snippet. However, in many
cases, such transformation is not as straightforward and may require some effort compared to forming the query in
natural language.

To address this issue, we introduce the **AI Quality Copilot** - an LLM agent that facilitates accessing essential
information about the model's predictions, explanations, performance metrics, and issues through a natural language
interface. Users can prompt the Copilot, and it will provide the necessary information about the specified ML model. In
essence, instead of writing code, one can simply "talk" to the model.

## How it works?
The AI Quality Copilot is an LLM agent that, based on a user's input query, determines which function to call along
with its arguments. The output of these functions will then be used to provide an answer. To utilize it, all you need to
do is call the talk method of the Giskard model and provide a question about the model.

We implemented this feature using the OpenAI [function calling API](https://platform.openai.com/docs/guides/function-calling).
This approach expands the standard capabilities of LLM agents by enabling them to utilize a predefined set of python
functions or tools designed for various tasks. The concept is to delegate to the agent the decision on which tool to use
in order to provide a more effective response to the user's question.

AI Quality Copilot is capable of providing the following information about the ML model:
1. `Prediction` on records from the given dataset or the user's input.
2. `Prediction explanation` using the SHAP framework.
3. `Performance metrics` for classification and regression tasks.
4. `Performance issues` detected using the Giskard Scan.

## Before starting
First, ensure that you have installed the `talk` flavor of Giskard:
```bash
pip install "giskard[talk]"
```

To utilize the AI Quality Copilot, you'll need an OpenAI API key. You can set it in your notebook like this:
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-…"
```

## Prepare the necessary artifacts
First, set up the Giskard dataset:
```python
giskard_dataset = giskard.Dataset(df, target=TARGET_COLUMN, name="Titanic dataset", cat_columns=CATEGORICAL_COLUMNS)
```

Next, it is mandatory to set up the Giskard model, which we will interact with. It's important to provide a detailed
description as well as the name of the model to enhance the responses of the AI Quality Copilot.
```python
giskard_model = giskard.Model(
model=prediction_function,
model_type="classification", # Currently, the Quality Copilot supports either classification or regression.
classification_labels=CLASSIFICATION_LABELS,
feature_names=FEATURE_NAMES,
# Important for the Quality Copilot.
name="Titanic binary classification model",
description="The binary classification model, which predicts, whether the passenger survived or not in the Titanic incident. \n"
"The model outputs yes, if the person survived, and no - if he died."
)
```

Lastly, generate the Giskard scan report. This dependency is optional, and if you don't need information about the
model's performance issues, you can omit this step.
```python
scan_result = giskard.scan(giskard_model, giskard_dataset)
```

## AI Quality Copilot
Let's finally try the AI Quality Copilot. The primary and only method to interact with it is through the `talk` method
of the Giskard model. Below is the API for the method:
```python
def talk(self, question: str, dataset: Dataset, scan_report: ScanReport = None, context: str = "") -> TalkResult:
"""Perform the 'talk' to the model.

Given `question`, allows to ask the model about prediction result, explanation, model performance, issues, etc.

Parameters
----------
question : str
User input query.
dataset : Dataset
Giskard Dataset to be analysed by the 'talk'.
context : str
Context of the previous 'talk' results. Necessary to keep context between sequential 'talk' calls.
scan_report : ScanReport
Giskard Scan Report to be analysed by the 'talk'.
"""
```

We'll start with a simple example. We'll ask the AI Quality Copilot what it can do:
```python
giskard_model.talk(question="What can you do?", dataset=giskard_dataset)
```

The agent's response is as follows:
```markdown
I can assist you with various tasks related to a Titanic binary classification model, which predicts whether a passenger survived or not in the Titanic incident. Here's what I can do:

1. **Predict Survival**: I can predict whether a passenger survived or not based on their details such as class, sex, age, number of siblings/spouses aboard, number of parents/children aboard, fare, and port of embarkation.

2. **Model Performance Metrics**: I can estimate model performance metrics such as accuracy, F1 score, precision, recall, R2 score, explained variance, mean squared error (MSE), and mean absolute error (MAE) for the Titanic survival prediction model.

3. **SHAP Explanations**: I can provide SHAP explanations for predictions, which help understand the impact of each feature on the model's prediction.

4. **Model Vulnerabilities Scan**: I can give you a summary of the model's vulnerabilities, such as unrobustness, underconfidence, unethical behavior, data leakage, performance bias, and more.

Please let me know how I can assist you!
```
In this example, the agent informs about the tasks it can perform without actually executing them, as there is no need
to do so.

### Prediction
Now, let's ask the agent to provide information about the model's prediction on a specific record from the dataset:
```python
giskard_model.talk(question="Have 'Minahan miss Daisy E' survived in the Titanic incident?", dataset=giskard_dataset)
```

Response:
```markdown
Yes, Minahan Miss Daisy E survived in the Titanic incident.
```
This example demonstrates how the AI Quality Copilot allows us to avoid writing code to obtain a prediction on a dataset
record. Instead, we can simply use natural language.

### Prediction explanation
The next example demonstrates how we can obtain the prediction explanation in a similar manner:
```python
giskard_model.talk(question="What was important for the survival result of Minahan, Miss. Daisy E?", dataset=giskard_dataset)
```
```markdown
For Minahan, Miss. Daisy E, the most important factors for the survival prediction (yes) were:
- Sex ('female') with a SHAP value of 0.493
- Pclass (1) with a SHAP value of 0.316
- Embarked ('Q') with a SHAP value of 0.068

These values indicate that being female, traveling in the first class, and embarking from 'Q' were significant contributors to the model's prediction of survival.
```
Once again, no manual coding is needed from us to obtain this result! Everything is managed and executed by the AI
Quality Copilot.

### Performance metrics
Now let's calculate the model's accuracy:
```python
giskard_model.talk(question="Calculate accuracy of the model.", dataset=giskard_dataset)
```
```markdown
The accuracy of the model is 0.79.
```

### Performance issues
Finally, we can ask an LLM agent about the model's performance issues. It's mandatory to provide the generated scan
report in this case.
```python
giskard_model.talk(question="Does the model has any performance issues or biases?", dataset=giskard_dataset, scan_report=result)
```

```markdown
The model has been identified to have several performance issues and biases, including:

1. **Robustness issues**: The model's predictions change when the "Name" feature is transformed to title case in 6.67% of the cases, which indicates a medium level of robustness issue.
2. **Overconfidence issues**: The model shows a significantly higher number of overconfident wrong predictions in several data slices, including when the "Name" contains "mr", when the text length of "Name" is less than 28.5, when "Fare" is less than 14.85, when "Sex" is "male", when "Parch" is 0, and when "SibSp" is 0. These issues are mostly major, indicating a high level of overconfidence in wrong predictions.
3. **Spurious Correlation issues**: There are minor issues related to spurious correlations, particularly with the "Sex" feature being highly associated with the survival prediction. For example, "female" is highly associated with "Survived" = "yes", and "male" is highly associated with "Survived" = "no".
4. **Performance issues**: The model has major performance issues in several data slices, including lower recall for records where "Name" contains "mr" and "Sex" is "male", lower precision for records where "Pclass" is 3, and various accuracy and precision issues in other specific conditions.

These findings suggest that the model may not perform equally well across different groups of passengers, indicating potential biases and vulnerabilities in its predictions.
```
As you can see, the LLM agent was able to represent all the performance issues the model has.

### Multiple questions in one call
Thanks to the ability of the function calling API to call multiple tools within a single OpenAI API request, we can
benefit from it by prompting multiple questions to the model within a single `talk` call. For example:
```python
giskard_model.talk(question="Calculate accuracy, f1, precision ans recall scores of the model. Summarise the result in a table", dataset=giskard_dataset)
```
```markdown
Here are the model performance metrics summarized in a table:

| Metric | Score |
|-----------|-------|
| Accuracy | 0.79 |
| F1 | 0.7 |
| Precision | 0.75 |
| Recall | 0.66 |
```
In this example, to calculate each metric, an LLM agent used a dedicated tool four times with different parameters
(metric type). And in doing so, we called the `talk` method only once instead of making four distinct calls. This further
reduces the need for writing repetitive code.

### Dialogue mode
By default, the `talk` calls are standalone, meaning they do not preserve the history. However, we can enable a
so-called 'dialogue' mode by passing the summary of the current `talk` call to the subsequent call as context. For
example, let's make two subsequent `talk` calls, where the latter question cannot be answered without having a summary
of the first call:
```python
talk_result = giskard_model.talk(question="Have 'Webber, miss. Susan' survived in the Titanic incident?", dataset=giskard_dataset)
giskard_model.talk(question="Can you explain me, why did she survive?", context=talk_result.summary, dataset=giskard_dataset)
```

```markdown
The model predicted that 'Webber, Miss. Susan' survived the Titanic incident primarily due to her sex being female, which had the highest SHAP value, indicating it was the most influential factor in the prediction. Other contributing factors include her traveling in 2nd class (Pclass) and her name, which might have been considered due to encoding specific information relevant to survival. Age and fare paid for the ticket also played minor roles in the prediction. However, the number of siblings/spouses aboard (SibSp), the number of parents/children aboard (Parch), and the port of embarkation (Embarked) did not significantly influence the prediction.
```

Without passing the `talk_result.summary` to the context of the second call, the response would be useless:
```markdown
To provide an explanation for why a specific individual survived, I would need more details about the person in question, such as their name, ticket class, age, or any other information that could help identify them in the dataset. Could you please provide more details?
```

## Frequently Asked Questions

> #### ℹ️ What data are being sent to OpenAI
>
> In order to perform the question generation, we will be sending the following information to OpenAI:
>
> - Data provided in your dataset
> - Model name and description

## Troubleshooting
If you encounter any issues, join our [Discord community](https://discord.gg/fkv7CAr3FE) and ask questions in
our #support channel.
5 changes: 5 additions & 0 deletions docs/open_source/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,11 @@ integrate_tests/index
:link: testset_generation/index.html
::::

::::{grid-item-card} <br/><h3>🤖 AI Quality Copilot</h3>
:text-align: center
:link: ai_quality_copilot/index.html
::::

::::{grid-item-card} <br/><h3>🧪 Customize your tests</h3>
:text-align: center
:link: customize_tests/index.html
Expand Down
Loading
Loading