![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use AutoAI RAG and Chroma to create a pattern and get information from `ibm-watsonx-ai` SDK documentation

#### Disclaimers

- Use only Projects and Spaces that are available in the watsonx context.


## Notebook content

This notebook contains the steps and code to demonstrate the usage of IBM AutoAI RAG. The AutoAI RAG experiment conducted in this notebook uses data scraped from the `ibm-watsonx-ai` SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.11.


## Learning goal

The learning goal of this notebook is:

- Create an AutoAI RAG job that will find the best RAG pattern based on provided data


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [RAG Optimizer definition](#definition)
- [RAG Experiment run](#run)
- [RAG Patterns comparison and testing](#comparison)
- [Historical runs](#runs)
- [Clean up](#cleanup)
- [Summary and next steps](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watsonxai-runtime" target="_blank" rel="noopener no referrer">watsonx.ai Runtime Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener no referrer">here</a>).

### Install and import the required modules and dependencies

In [None]:
!pip install -U 'ibm-watsonx-ai[rag]>=1.2.4' | tail -n 1
!pip install -U "langchain_community>=0.3,<0.4" | tail -n 1

### Defining the watsonx.ai credentials
This cell defines the credentials required to work with the watsonx.ai Runtime service.

**Action:** Provide the IBM Cloud user API key. For details, see <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

In [None]:
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

### Defining the project id
The foundation model requires a project ID that provides the context for the call. The next cell tries to read the ID from the `PROJECT_ID` environment variable of the project in which this notebook runs. If that fails, you are prompted to enter the project ID.

In [2]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

Create an instance of APIClient with authentication details.

In [2]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, project_id=project_id)

<a id="definition"></a>

## RAG Optimizer definition

### Defining a connection to training data

Upload training data to a COS bucket and then define a connection to this file. This example uses the `Base` page from the [`ibm_watsonx_ai`](https://ibm.github.io/watsonx-ai-python-sdk/base.html) documentation.

The code in the next cell downloads the page, saves it as `base.html`, and uploads it as a data asset.

In [None]:
import os
import requests

url = "https://ibm.github.io/watsonx-ai-python-sdk/base.html"
document_filename = "base.html"

# Download the page only if a local copy does not exist yet.
if not os.path.isfile(document_filename):
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(document_filename, "w", encoding="utf-8") as file:
        file.write(response.text)

# Upload the file as a data asset in the project.
document_asset_details = client.data_assets.create(name=document_filename, file_path=document_filename)

document_asset_id = client.data_assets.get_id(document_asset_details)
document_asset_id

Creating data asset...
SUCCESS


'4f76e9c4-724e-45a2-8099-2d93f2746db3'

Define a connection to training data.

In [4]:
from ibm_watsonx_ai.helpers import DataConnection

input_data_references = [DataConnection(data_asset_id=document_asset_id)]

### Defining a connection to test data

Upload a JSON file with benchmarking questions to COS and then define a connection to this file. This example uses content from the [`ibm_watsonx_ai`](https://ibm.github.io/watsonx-ai-python-sdk/index.html) SDK documentation.

In [5]:
benchmarking_data_IBM_page_content = [
    {
        "question": "How can you set or refresh user request headers using the APIClient class?",
        "correct_answer": "client.set_headers({'Authorization': 'Bearer <token>'})",
        "correct_answer_document_ids": [
            "base.html"
        ]
    },
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": [
            "base.html"
        ]
    }
]
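Before uploading, it can help to sanity-check the benchmark records. The following is a minimal sketch, assuming each record needs exactly the three keys used in the cell above; `validate_benchmark_records` is a hypothetical helper, not part of the `ibm-watsonx-ai` SDK.

```python
REQUIRED_KEYS = {"question", "correct_answer", "correct_answer_document_ids"}


def validate_benchmark_records(records):
    """Return a list of human-readable problems; an empty list means the records look fine."""
    problems = []
    for position, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f"record {position}: missing keys {sorted(missing)}")
            continue
        if not str(record["question"]).strip():
            problems.append(f"record {position}: empty question")
        doc_ids = record["correct_answer_document_ids"]
        if not isinstance(doc_ids, list) or not doc_ids:
            problems.append(f"record {position}: document ids must be a non-empty list")
    return problems


# Illustrative records: the first is well formed, the second is deliberately incomplete.
sample_records = [
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url='...', api_key='...')",
        "correct_answer_document_ids": ["base.html"],
    },
    {"question": "A record without an answer"},
]

for problem in validate_benchmark_records(sample_records):
    print(problem)
```

Running the same check on `benchmarking_data_IBM_page_content` should return an empty list if every record carries all three keys.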

The code in the next cell uploads testing data to the bucket as a `json` file.

In [6]:
import json

test_filename = "benchmarking_data_Base.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w", encoding="utf-8") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(name=test_filename, file_path=test_filename)

test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id

Creating data asset...
SUCCESS


'84b59630-65a4-466d-b174-400928fb9634'

Define a connection to the test data.

In [7]:
test_data_references = [DataConnection(data_asset_id=test_asset_id)]

### RAG Optimizer configuration

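Provide the input information for the AutoAI RAG optimizer:
- `name` - experiment name
- `description` - experiment description
- `foundation_models` - foundation models to consider for answer generation
- `embedding_models` - embedding models to consider
- `retrieval_methods` - retrieval methods to consider
- `chunking` - chunking configurations to consider (`chunk_size`, `chunk_overlap`, `method`)
- `max_number_of_rag_patterns` - maximum number of RAG patterns to create
- `optimization_metrics` - target optimization metrics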

In [None]:
from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG run - Base documentation',
    description="AutoAI RAG Optimizer on ibm_watsonx_ai Base documentation",
    foundation_models=["ibm/granite-13b-chat-v2"],
    embedding_models=["ibm/slate-125m-english-rtrvr"],
    retrieval_methods=["simple"],
    chunking=[
        {
            "chunk_size": 512,
            "chunk_overlap": 64,
            "method": "recursive"
        }
    ],
    max_number_of_rag_patterns=4,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
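To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It rests on assumptions: it splits on raw character offsets, whereas the `recursive` method configured above may split differently, and `sliding_window_chunks` is a hypothetical helper rather than an SDK function.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=64):
    """Split text into chunks of at most chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "x" * 1000
chunks = sliding_window_chunks(text)
print([len(chunk) for chunk in chunks])  # [512, 512, 104]
```

Each chunk starts 448 characters after the previous one, so consecutive chunks share 64 characters of context.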

Configuration parameters can be retrieved via `get_params()`.

In [9]:
rag_optimizer.get_params()

{'name': 'AutoAI RAG run - ModelInference documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
 'chunking': [{'chunk_size': 512, 'chunk_overlap': 64, 'method': 'recursive'}],
 'embedding_models': ['ibm/slate-125m-english-rtrvr'],
 'retrieval_methods': ['simple'],
 'foundation_models': ['ibm/granite-13b-chat-v2'],
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}

<a id="run"></a>
## RAG Experiment run

Call the `run()` method to trigger the AutoAI RAG experiment. You can use either interactive mode (synchronous job, `background_mode=False`) or background mode (asynchronous job, `background_mode=True`).

In [10]:
run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False
)



##############################################

Running 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69'

##############################################


pending.................
running....
completed
Training of 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69' finished successfully.


You can use the `get_run_status()` method to monitor AutoAI RAG jobs in background mode.

In [11]:
rag_optimizer.get_run_status()

'completed'
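When a run is started with `background_mode=True`, a polling loop along the following lines can wait for it to finish. This is a sketch under stated assumptions: `wait_for_run` is a hypothetical helper, and the set of terminal states is assumed (only `'completed'` appears in this notebook's output).

```python
import time


def wait_for_run(get_status, poll_interval=30.0, timeout=3600.0,
                 terminal_states=("completed", "failed", "canceled")):
    """Call get_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for rag_optimizer.get_run_status, so the sketch runs without a live job.
fake_statuses = iter(["pending", "running", "completed"])
final_state = wait_for_run(lambda: next(fake_statuses), poll_interval=0.01)
print(final_state)
```

With a real job, `rag_optimizer.get_run_status` could be passed as `get_status`.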

<a id="comparison"></a>
## Comparison and testing of RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the `summary()` method. You can use the DataFrame to compare all discovered patterns and select the one you like for further testing.

In [12]:
summary = rag_optimizer.summary()
summary

| Pattern_Name | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.method | chunking.chunk_size | chunking.chunk_overlap | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pattern4 | 0.7083 | 0.2317 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 3 | ibm/granite-13b-chat-v2 |
| Pattern1 | 0.5833 | 0.2045 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern2 | 0.5833 | 0.2372 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.5833 | 0.2117 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 3 | ibm/granite-13b-chat-v2 |


Additionally, you can pass the `scoring` parameter to the `summary()` method to order the RAG patterns by the chosen metric, starting with the best.

```python
summary = rag_optimizer.summary(scoring="faithfulness")
```
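Three patterns in the table above tie on `mean_answer_correctness`, so a secondary metric may help to order them. The sketch below copies the scores from that table into plain Python and sorts by answer correctness, then faithfulness. The tie-breaking rule is an illustrative assumption, not a description of how AutoAI RAG ranks patterns.

```python
# (mean_answer_correctness, mean_faithfulness) copied from the summary table.
scores = {
    "Pattern4": (0.7083, 0.2317),
    "Pattern1": (0.5833, 0.2045),
    "Pattern2": (0.5833, 0.2372),
    "Pattern3": (0.5833, 0.2117),
}

# Tuples compare element by element, so faithfulness only matters on ties.
ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
print(ranked)  # ['Pattern4', 'Pattern2', 'Pattern3', 'Pattern1']
```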

In [13]:
rag_optimizer.get_run_details()

{'entity': {'completed_at': '2025-01-10T10:15:30.808Z',
  'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'},
  'input_data_references': [{'location': {'href': '/v2/assets/4f76e9c4-724e-45a2-8099-2d93f2746db3?project_id=b9156b62-8f2a-4a40-8570-990fdd5d67cb',
     'id': '4f76e9c4-724e-45a2-8099-2d93f2746db3'},
    'type': 'data_asset'}],
  'message': {'level': 'info', 'text': 'AAR019I: AutoAI execution completed.'},
  'parameters': {'constraints': {'chunking': [{'chunk_overlap': 64,
      'chunk_size': 512,
      'method': 'recursive'}],
    'embedding_models': ['ibm/slate-125m-english-rtrvr'],
    'foundation_models': ['ibm/granite-13b-chat-v2'],
    'max_number_of_rag_patterns': 4,
    'retrieval_methods': ['simple']},
   'optimization': {'metrics': ['answer_correctness']},
   'output_logs': True},
  'results': [{'context': {'iteration': 1,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_s

### Get selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

In [None]:
best_pattern_name = summary.index.values[0]
print('Best pattern is:', best_pattern_name)

best_pattern = rag_optimizer.get_pattern(pattern_name=best_pattern_name)
best_pattern

The pattern details can be retrieved by calling the `get_pattern_details` method:

```python
rag_optimizer.get_pattern_details(pattern_name='Pattern2')
```

### Create the index/collection

Build a solution on the best pattern by indexing additional documents.

You can check which `index_name` you are working on:

```python
best_pattern.vector_store._index_name
```

In [15]:
from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_text_extraction.html"
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [16]:
best_pattern.indexing_function(doc_splits)

['26e3b6934e2d26b5016e48a72f8066e6e6c46f842921dc3850a9d7e90db422b7',
 '2a41a4ffaa8030ef315178951f17656100e491e07cbc27e3fd5c246e47297470',
 'df6e5765dcfdfa245e80b5f526c4b6ae9bf661b94f46263f6d10fc235670a649',
 '3cafdf9aa72070f81f85138d6548290f69ce1daf06c2ff77833acb109c54daf1',
 '800812f60df6a099a6540f30db77eada51455f15d781f1ecc59a830206e9ee4a',
 '5c177d22821e150afc95ea18a34a94bd4e4cc18fc906fa9f48067a73e4fd9a14',
 'd83f23a1d666979ab04abca6229d981af044ce09fbef23a0d180eb5756c50bc8',
 '987e2db87207c400067195d32d74c46002c6dbea3d63cbcf2bf7739375075424',
 '75ad5d3f99270fe00cfea3f68f9825fabd66206746ab7b62fe7d7533cb249c78',
 '03222e428bf11a47ab5b4622437bb34aca042c6e889fbca7817b1d80500e954a',
 '9d83427cd6341b0d3be46ae5ecef1dafb4ddcafb63540ec12a005c9d1deb92bf',
 'da6f13c46cf0bd9e4fc79ef9f80cb89a42fcd260bc9c9397bfa349333fc312eb',
 '710988569ce081d6edc51cc23903d248a2017451afc6b75c36e69a34ee02c468',
 '531720786df6b24f44b66afb5e99e5d16f51d78f3a399df9f24e929e0c6e37ae',
 '9673f7cc44e2c6b59af016a5f97efe77

Query the RAGPattern locally, to test it.

In [17]:
questions = ["How to add Task Credentials?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.token
        }
    ]
}

best_pattern.inference_function()(payload)

{'predictions': [{'fields': ['answer', 'reference_documents'],
   'values': [["\nTo add Task Credentials, you can use the `client.task_credentials.store()` method. This method requires no parameters and will create new task credentials if they do not already exist. If the list of task credentials is empty, this method will automatically add them.\n\nHere's an example of how to add Task Credentials:\n\n```python\nfrom ibm_watsonx_ai import APIClient\n\n# Initialize the APIClient object if needed\nfrom ibm_watsonx_ai import APIClient\nclient = APIClient(credentials)\n\n# Add Task Credentials\nclient.task_credentials.store()\n```\n\nNote: If you are using a custom foundation model, you will need to add Task Credentials before deploying the model. Failure to do so will result in token expiration issues.",
     [{'page_content': 'With task credentials, you can deploy a custom foundation model and avoid token expiration issues.\nFor more details, see Adding task credentials.\nTo list availab
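The response nests each answer and its supporting documents inside `predictions[0]['values']`. A small helper can flatten it. This sketch assumes the shape visible in the truncated output above (one `[answer, reference_documents]` pair per question), and `extract_answers` is a hypothetical name.

```python
def extract_answers(response):
    """Return one dict per question, keyed by the field names in the response."""
    prediction = response["predictions"][0]
    fields = prediction["fields"]  # e.g. ['answer', 'reference_documents']
    return [dict(zip(fields, row)) for row in prediction["values"]]


# Hand-written stand-in with the same layout as the real response.
sample_response = {
    "predictions": [
        {
            "fields": ["answer", "reference_documents"],
            "values": [
                ["Use client.task_credentials.store().",
                 [{"page_content": "With task credentials, you can deploy ..."}]],
            ],
        }
    ]
}

for item in extract_answers(sample_response):
    print(item["answer"])
    print(len(item["reference_documents"]), "reference document(s)")
```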

<a id="runs"></a>
## Historical runs

In this section you learn to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the `list()` method and provide the `'rag_optimizer'` filter.

In [24]:
experiment.runs(filter='rag_optimizer').list()

|  | timestamp | run_id | state | auto_pipeline_optimizer name |
|---|---|---|---|---|
| 0 | 2025-01-10T10:15:30.971Z | efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69 | completed | AutoAI RAG run - ModelInference documentation |
| 1 | 2025-01-09T15:13:26.515Z | 555cb99c-925b-4f71-9e09-83533ed22fd3 | completed | AutoAI RAG run - ModelInference documentation |
| 2 | 2025-01-09T12:58:25.539Z | e0b4281c-8908-433b-a762-b68c9a7e3b09 | completed | AutoAI RAG run - ModelInference documentation |
| 3 | 2025-01-09T09:49:10.264Z | 71d650bb-c357-468a-87cb-e461242c68b3 | completed | AutoAI RAG run - ModelInference documentation |


In [None]:
run_id = run_details['metadata']['id']
run_id

'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69'
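If `run_details` is no longer in scope, the run id can also be recovered from the listing, for example by taking the most recent completed run. The sketch below works on plain dictionaries that mirror two rows of the listing above. Whether `list()` returns a DataFrame with exactly these column names is an assumption, and both helper functions are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse timestamps such as '2025-01-10T10:15:30.971Z' into aware datetimes."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def latest_completed_run(runs):
    """Return the run_id of the newest run whose state is 'completed', or None."""
    completed = [run for run in runs if run["state"] == "completed"]
    if not completed:
        return None
    return max(completed, key=lambda run: parse_timestamp(run["timestamp"]))["run_id"]


runs = [
    {"timestamp": "2025-01-09T15:13:26.515Z",
     "run_id": "555cb99c-925b-4f71-9e09-83533ed22fd3", "state": "completed"},
    {"timestamp": "2025-01-10T10:15:30.971Z",
     "run_id": "efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69", "state": "completed"},
]
print(latest_completed_run(runs))
```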

### Get executed optimizer's configuration parameters

In [26]:
experiment.runs.get_rag_params(run_id=run_id)

{'name': 'AutoAI RAG run - ModelInference documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
 'chunking': [{'chunk_overlap': 64, 'chunk_size': 512, 'method': 'recursive'}],
 'embedding_models': ['ibm/slate-125m-english-rtrvr'],
 'retrieval_methods': ['simple'],
 'foundation_models': ['ibm/granite-13b-chat-v2'],
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}

### Get historical rag_optimizer instance and training details

In [27]:
historical_opt = experiment.runs.get_rag_optimizer(run_id)

### List trained patterns for selected optimizer

In [28]:
historical_opt.summary()

| Pattern_Name | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.method | chunking.chunk_size | chunking.chunk_overlap | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pattern4 | 0.7083 | 0.2317 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 3 | ibm/granite-13b-chat-v2 |
| Pattern1 | 0.5833 | 0.2045 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern2 | 0.5833 | 0.2372 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.5833 | 0.2117 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 3 | ibm/granite-13b-chat-v2 |


<a id="cleanup"></a>
## Clean up

To delete the current experiment, call the `cancel_run` method with `hard_delete=True`.

**Warning:** Once you delete an experiment, you can no longer refer to it.

In [29]:
rag_optimizer.cancel_run(hard_delete=True)

'SUCCESS'

If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

please follow this sample [notebook](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to use `ibm-watsonx-ai` to run AutoAI RAG experiments.

Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts.

### Authors

**Mateusz Szewczyk**, Software Engineer watsonx.ai

Copyright © 2024-2025 IBM. This notebook and its source code are released under the terms of the MIT License.