![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use AutoAI RAG and Milvus database to work with `ibm-watsonx-ai` SDK documentation.

#### Disclaimers

- Use only Spaces that are available in watsonx context.


## Notebook content

This notebook contains the steps and code to demonstrate the usage of IBM AutoAI RAG. The AutoAI RAG experiment conducted in this notebook uses data scraped from the `ibm-watsonx-ai` SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.11.


## Learning goal

The learning goal of this notebook is:

- Create an AutoAI RAG job that will find the best RAG pattern based on provided data


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [RAG Optimizer definition](#definition)
- [RAG Experiment run](#run)
- [RAG Patterns comparison and testing](#comparison)
- [Historical runs](#runs)
- [Clean up](#cleanup)
- [Summary and next steps](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered; information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener no referrer">here</a>).

### Install and import the required modules and dependencies

In [None]:
!pip install -U wget | tail -n 1
!pip install -U 'ibm-watsonx-ai[rag]>=1.1.11' | tail -n 1

### Defining the WML credentials
This cell defines the credentials required to work with the Watson Machine Learning service.

**Action:** Provide the IBM Cloud user API key. For details, see <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

In [None]:
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your WML api key (hit enter): "),
)

### Working with spaces

You need to create a space that will be used for your work. If you do not have a space, you can use [Deployment Spaces Dashboard](https://dataplatform.cloud.ibm.com/ml-runtime/spaces?context=wx) to create one.

- Click **New Deployment Space**
- Create an empty space
- Select Cloud Object Storage
- Select Watson Machine Learning instance and press **Create**
- Go to **Manage** tab
- Copy `Space GUID` and paste it below

**Tip**: You can also use SDK to prepare the space for your work. More information can be found [here](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Space%20management.ipynb).

**Action**: assign space ID below

In [None]:
space_id = 'PASTE YOUR SPACE GUID HERE'
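
Before creating the client, it can help to confirm that the placeholder above was actually replaced. The following is a minimal sketch, assuming space GUIDs follow the standard UUID format; `looks_like_guid` is a hypothetical helper and not part of the `ibm-watsonx-ai` SDK.

```python
import uuid


def looks_like_guid(value):
    """Return True if `value` parses as a UUID (the assumed format of a space GUID)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


# The unedited placeholder is rejected; a UUID-shaped string is accepted.
print(looks_like_guid("PASTE YOUR SPACE GUID HERE"))
print(looks_like_guid("e42e9165-e6d2-49db-9aaa-109a706ae7db"))
```

A check like this only validates the shape of the value. It does not confirm that the space exists or that your API key can access it.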

Create an instance of APIClient with authentication details.

In [7]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, space_id=space_id)

<a id="definition"></a>

## RAG Optimizer definition

### Training data connection

Define connection information to access the COS bucket and the file that contains training data. This example uses [`ibm_watsonx_ai`](https://ibm.github.io/watsonx-ai-python-sdk/index.html) SDK documentation content.

This cell downloads the `ibm_watsonx_ai` Python SDK zip file from GitHub (if not already present), and extracts its contents to a specified folder.

In [4]:
import wget, zipfile, os

filename = "watsonx-ai-python-sdk"
filename_zip = "watsonx-ai-python-sdk.zip"

if not os.path.isfile(filename_zip): 
    wget.download("https://github.com/IBM/watsonx-ai-python-sdk/archive/refs/heads/gh-pages.zip", out=filename_zip)

with zipfile.ZipFile(filename_zip, "r") as zip_ref:
    zip_ref.extractall(filename)

In [5]:
from ibm_watsonx_ai.helpers import DataConnection, ContainerLocation

container_location_path = "ibm_watsonx_ai_sdk_docs"

container_data_connection = DataConnection(ContainerLocation(path=f"{container_location_path}/"))
container_data_connection.set_client(client)

Documents with the `.html` extension are filtered first and then written to the `ContainerLocation`.

In [6]:
html_docs_files = []
for root, dirs, files in os.walk(filename):
    if root == os.path.join(filename, "watsonx-ai-python-sdk-gh-pages"):
        for file in files:
            if file.endswith('.html'):
                file_path = os.path.join(root, file)
                html_docs_files.append(file_path)
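
The same filtering step can also be written with `pathlib`. This is a sketch under the assumption that only the `.html` files directly inside the extracted `watsonx-ai-python-sdk-gh-pages` folder are wanted, as in the cell above; `list_top_level_html` is a hypothetical helper name.

```python
from pathlib import Path


def list_top_level_html(docs_dir):
    """Return sorted paths of the .html files directly inside `docs_dir`."""
    return sorted(str(p) for p in Path(docs_dir).glob("*.html") if p.is_file())


# Directory layout assumed from the extraction step above.
docs_dir = Path("watsonx-ai-python-sdk") / "watsonx-ai-python-sdk-gh-pages"
html_paths = list_top_level_html(docs_dir)
print(f"Found {len(html_paths)} HTML files")
```

Because `glob("*.html")` is not recursive, files in subfolders are skipped, which matches the `root == ...` check in the original loop.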

Writing all SDK documents may take around 3 minutes.

In [7]:
import sys

for ind, html_docs_file in enumerate(html_docs_files):
    container_data_connection.write(html_docs_file, remote_name=os.path.basename(html_docs_file))
    
    sys.stdout.write(f"\rProgress: {'✓' * (ind+1)}{'.' * (len(html_docs_files)-ind-1)}\r")
    sys.stdout.flush()

Progress: ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓

### Create a connection to COS
You can skip this section if you already have a connection asset that connects to **IBM Cloud Object Storage**.

In [8]:
datasource_name = 'bluemixcloudobjectstorage'
bucketname = container_data_connection.location.bucket

In [9]:
cos_credentials = client.spaces.get_details(space_id=space_id)['entity']['storage']['properties']

In [10]:
conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to Database - {datasource_name}",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
    client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection to external Database",
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': bucketname,
        'access_key': cos_credentials['credentials']['editor']['access_key_id'],
        'secret_key': cos_credentials['credentials']['editor']['secret_access_key'],
        'iam_url': 'https://iam.cloud.ibm.com/identity/token',
        'url': cos_credentials['endpoint_url']
    }
}

conn_details = client.connections.create(meta_props=conn_meta_props)
connection_id = client.connections.get_id(conn_details)

Creating connections...
SUCCESS


Finally, we can create a Data Connection that represents input data references.

In [11]:
from ibm_watsonx_ai.helpers import DataConnection, S3Location

input_data_references = [DataConnection(
    connection_asset_id=connection_id,
    location=S3Location(bucket=bucketname,
                        path=container_location_path))]
input_data_references[0].set_client(client)

### Defining a connection to test data

Upload a `json` file that will be used for benchmarking to COS and then define a connection to this file. This example uses content from the [`ibm_watsonx_ai`](https://ibm.github.io/watsonx-ai-python-sdk/index.html) SDK documentation.

In [12]:
benchmarking_data_IBM_page_content = [
    {
        "question": "How to install ibm-watsonx-ai library?",
        "correct_answer": "pip install ibm-watsonx-ai",
        "correct_answer_document_ids": [
            "install.html"
        ]
    },
    {
        "question": "What are the Credentials class parameters?",
        "correct_answer": "url, api_key, name, iam_serviceid_crn, token, projects_token, username, password, instance_id, version, bedrock_url, proxies, verify",
        "correct_answer_document_ids": [
            "base.html"
        ]
    },
    {
        "question": "How to get AutoAI pipeline with number 3?",
        "correct_answer": "get_pipeline(pipeline_name='Pipeline_3')",
        "correct_answer_document_ids": [
            "autoai_working_with_class_and_optimizer.html"
        ]
    },
    {
        "question": "How to get list of Embedding Models?",
        "correct_answer": "client.foundation_models.EmbeddingModels",
        "correct_answer_document_ids": [
            "fm_embeddings.html"
        ]
    },
    {
        "question": "How to retrieve the list of model lifecycle data?",
        "correct_answer": "get_model_lifecycle(url='https://us-south.ml.cloud.ibm.com', model_id='ibm/granite-13b-instruct-v2')",
        "correct_answer_document_ids": [
            "fm_helpers.html"
        ]
    },
    {
        "question": "What is path to ModelInference class?",
        "correct_answer": "ibm_watsonx_ai.foundation_models.inference.ModelInference",
        "correct_answer_document_ids": [
            "fm_model_inference.html"
        ]
    },
    {
        "question": "What is the method to get model inference details?",
        "correct_answer": "get_details()",
        "correct_answer_document_ids": [
            "fm_model_inference.html"
        ]
    }
]
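
Since a malformed record may only surface once the experiment is running, a quick local check of the benchmark data can be useful. The sketch below assumes that the three keys used in the records above are the required ones; `validate_benchmark` is a hypothetical helper, and the service may enforce further rules that are not covered here.

```python
REQUIRED_KEYS = {"question", "correct_answer", "correct_answer_document_ids"}


def validate_benchmark(records):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
            continue
        if not record["question"].strip():
            problems.append(f"record {i}: empty question")
        if not record["correct_answer_document_ids"]:
            problems.append(f"record {i}: no document ids")
    return problems


# One well-formed record and one deliberately incomplete record.
sample = [
    {
        "question": "How to install ibm-watsonx-ai library?",
        "correct_answer": "pip install ibm-watsonx-ai",
        "correct_answer_document_ids": ["install.html"],
    },
    {"question": "Incomplete record"},
]
for problem in validate_benchmark(sample):
    print(problem)
```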

The code in the next cell uploads testing data to the bucket as a `json` file.

In [13]:
import json

test_filename = "benchmarking_data_ibm_watson_ai.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(name=test_filename, file_path=test_filename)

test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id

Creating data asset...
SUCCESS


'e42e9165-e6d2-49db-9aaa-109a706ae7db'

Define connection information to testing data.

In [14]:
from ibm_watsonx_ai.helpers import DataConnection

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

### Set up connectivity information to Milvus

<b>This notebook focuses on a self-managed Milvus cluster using <a href="https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service" target="_blank" rel="noopener no referrer">IBM watsonx.data.</a></b>

The following cell retrieves the Milvus username, password, host, and port from the environment (if available) and prompts you to provide them manually in case of failure.

You can provide a connection asset ID to read all required connection data from it. Before doing so, make sure that a connection asset was created in your space.

In [16]:
import os
import getpass

milvus_connection_id = input("Provide connection asset ID in your space. Skip this, if you wish to type credentials by hand and hit enter: ") or None

if milvus_connection_id is None:
    try:
        username = os.environ["USERNAME"]
    except KeyError:
        username = input("Please enter your Milvus user name and hit enter: ")
    try:
        password = os.environ["PASSWORD"]
    except KeyError:
        password = getpass.getpass("Please enter your Milvus password and hit enter: ")
    try:
        host = os.environ["HOST"]
    except KeyError:
        host = input("Please enter your Milvus hostname and hit enter: ")
    try:
        port = os.environ["PORT"]
    except KeyError:
        port = input("Please enter your Milvus port number and hit enter: ")
    try:
        ssl = os.environ["SSL"]
    except KeyError:
        ssl = bool(input("Please enter ('y'/anything) if your Milvus instance has SSL enabled. Skip if it is not: "))

    # Create connection
    milvus_data_source_type_id = client.connections.get_datasource_type_id_by_name(
        "milvus"
    )
    details = client.connections.create(
        {
            client.connections.ConfigurationMetaNames.NAME: "Milvus Connection",
            client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection created by the sample notebook",
            client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: milvus_data_source_type_id,
            client.connections.ConfigurationMetaNames.PROPERTIES: {
                "host": host,
                "port": port,
                "username": username,
                "password": password,
                "ssl": ssl,
            },
        }
    )

    milvus_connection_id = client.connections.get_id(details)
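
The repeated "environment first, prompt second" lookups above can be folded into one helper. This is a sketch, not the notebook's actual implementation: `env_or_prompt` and `parse_flag` are hypothetical names, the host and port values are placeholders, and `parse_flag` interprets the SSL answer more strictly than the cell above, where any non-empty answer counts as enabled.

```python
import os


def env_or_prompt(name, prompt, ask=input, env=None):
    """Return env[name] if it is set and non-empty, otherwise ask the user."""
    env = os.environ if env is None else env
    value = env.get(name)
    return value if value else ask(prompt)


def parse_flag(text):
    """Interpret common 'yes' spellings as True; everything else is False."""
    return str(text).strip().lower() in {"y", "yes", "true", "1"}


# Example with an injected environment and a stubbed prompt, so no typing is needed.
fake_env = {"HOST": "milvus.example.com", "SSL": "true"}
host = env_or_prompt("HOST", "Milvus hostname: ", env=fake_env)
port = env_or_prompt("PORT", "Milvus port: ", ask=lambda _: "19530", env=fake_env)
ssl = parse_flag(env_or_prompt("SSL", "SSL enabled? (y/n): ", env=fake_env))
print(host, port, ssl)
```

For the password, passing `ask=getpass.getpass` keeps the typed value hidden, as in the original cell.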

Define connection information to vector store references.

In [17]:
vector_store_references = [DataConnection(connection_asset_id=milvus_connection_id)]

### RAG Optimizer configuration

Provide the input information for AutoAI RAG optimizer:
- `name` - experiment name
- `description` - experiment description
- `max_number_of_rag_patterns` - maximum number of RAG patterns to create
- `optimization_metrics` - target optimization metrics

In [2]:
from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, space_id=space_id)

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiemnt trainded on ibm-watsonx-ai SDK documentataion",
    max_number_of_rag_patterns=6,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)

Configuration parameters can be retrieved via `get_params()`.

In [19]:
rag_optimizer.get_params()

{'name': 'AutoAI RAG ibm-watsonx-ai SDK documentation',
 'description': 'AutoAI RAG experiemnt trainded on ibm-watsonx-ai SDK documentataion',
 'max_number_of_rag_patterns': 6,
 'optimization_metrics': ['answer_correctness']}

<a id="run"></a>
## RAG Experiment run

Call the `run()` method to trigger the AutoAI RAG experiment. The job runs either in interactive mode (synchronous, `background_mode=False`, as in the cell below) or in background mode (asynchronous, `background_mode=True`).

In [20]:
run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    vector_store_references=vector_store_references,
    background_mode=False
)



##############################################

Running '40770d34-62d4-4135-b828-ac58d763cea0'

##############################################


pending...
running.......................................................................................................................................................................
completed
Training of '40770d34-62d4-4135-b828-ac58d763cea0' finished successfully.


You can use the `get_run_status()` method to monitor AutoAI RAG jobs in background mode.

In [21]:
rag_optimizer.get_run_status()

'completed'
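
In background mode, the status has to be polled until the job finishes. The loop below is a minimal sketch of that idea: `wait_for_terminal_status` is a hypothetical helper, and the set of terminal states is an assumption (only `pending`, `running`, and `completed` appear in this notebook's output). A stand-in callable replaces `rag_optimizer.get_run_status` so the example runs without a service.

```python
import time


def wait_for_terminal_status(get_status, terminal=("completed", "failed", "canceled"),
                             interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal state or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout} seconds")
        sleep(interval)


# Stand-in for rag_optimizer.get_run_status (assumed usage, not executed against a service).
fake_states = iter(["pending", "running", "running", "completed"])
final = wait_for_terminal_status(lambda: next(fake_states), interval=0.0)
print(final)
```

With a real optimizer, the call would presumably look like `wait_for_terminal_status(rag_optimizer.get_run_status)`, with an interval long enough to avoid unnecessary requests.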

<a id="comparison"></a>
## Comparison and testing of RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the `summary()` method. You can use the DataFrame to compare all discovered patterns and select the one you like for further testing.

In [4]:
summary = rag_optimizer.summary()
summary

| Pattern_Name | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|
| Pattern4 | 0.785714 | 0.655429 | 0.785714 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern1 | 0.642857 | 0.539929 | 0.654757 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern5 | 0.642857 | 0.616429 | 0.928571 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | mistralai/mixtral-8x7b-instruct-v01 |
| Pattern6 | 0.642857 | 0.641614 | 0.785714 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 3 | meta-llama/llama-3-70b-instruct |
| Pattern2 | 0.274729 | 0.719814 | 0.785714 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.142857 | 0.5531 | 0.785714 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |


Additionally, you can pass the `scoring` parameter to the `summary()` method to order the RAG patterns by the given metric, starting with the best.

```python
summary = rag_optimizer.summary(scoring="faithfulness")
```
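
To see how the choice of metric changes the ranking, the scores from the summary table above can be re-sorted locally. This is an illustrative sketch in plain Python with the values copied by hand; `rank_patterns` is a hypothetical helper, and it is not claimed to reproduce how the SDK orders its results internally.

```python
# Scores copied from the summary table above, in the order:
# (answer_correctness, faithfulness, context_correctness)
METRICS = ("answer_correctness", "faithfulness", "context_correctness")
scores = {
    "Pattern1": (0.642857, 0.539929, 0.654757),
    "Pattern2": (0.274729, 0.719814, 0.785714),
    "Pattern3": (0.142857, 0.5531, 0.785714),
    "Pattern4": (0.785714, 0.655429, 0.785714),
    "Pattern5": (0.642857, 0.616429, 0.928571),
    "Pattern6": (0.642857, 0.641614, 0.785714),
}


def rank_patterns(scores, metric):
    """Return pattern names ordered from best to worst for the given metric."""
    idx = METRICS.index(metric)
    return sorted(scores, key=lambda name: scores[name][idx], reverse=True)


print(rank_patterns(scores, "faithfulness"))
print(rank_patterns(scores, "answer_correctness")[0])
```

Ranked by faithfulness, `Pattern2` moves to the top even though it has one of the lowest answer correctness scores, so the optimization metric should match what matters for your use case.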

### Get selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

In [5]:
best_pattern_name = summary.index.values[0]
print('Best pattern is:', best_pattern_name)

best_pattern = rag_optimizer.get_pattern()

Best pattern is: Pattern4


The pattern details can be retrieved by calling the `get_pattern_details` method:

```python
rag_optimizer.get_pattern_details(pattern_name='Pattern2')
```

Query the RAGPattern locally to test it.

In [9]:
questions = ["How to use new approach of providing credentials to APIClient?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

resp = best_pattern.inference_function()(payload)

In [10]:
print(resp["predictions"][0]["values"][0][0])



According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:

```
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
                   url = "https://us-south.ml.cloud.ibm.com",
                   token = "***********",
                  )

client = APIClient(credentials)
```

This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.


### Deploy RAGPattern

Deployment is done by storing the defined RAG function and then by creating a deployed asset.

In [26]:
deployment_details = best_pattern.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=space_id
)



######################################################################################

Synchronous deployment creation for id: '6858bbdc-0823-45ad-9f77-360714e49558' started

######################################################################################


initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
.............
ready


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='c15cef2d-56b7-4a89-83a1-61b033334bdc'
-----------------------------------------------------------------------------------------------




### Test the deployed function

The RAG service is now deployed in the space. To test it, run the cell below. Questions must be provided in the payload, which has the same format as in the local test above.

In [27]:
deployment_id = client.deployments.get_id(deployment_details)

score_response = client.deployments.score(deployment_id, payload)
score_response

{'predictions': [{'fields': ['answer', 'reference_documents'],
   'values': [['\n\nAccording to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here\'s an example:\n\n```\nfrom ibm_watsonx_ai import APIClient\nfrom ibm_watsonx_ai import Credentials\n\ncredentials = Credentials(\n                   url = "https://us-south.ml.cloud.ibm.com",\n                   token = "***********",\n                  )\n\nclient = APIClient(credentials)\n```\n\nThis replaces the old approach of passing a dictionary with credentials to the APIClient constructor.',
     [{'page_content': 'bedrock_url (str, optional) – Bedrock URL, applicable for ICP only\nproxies (dict, optional) – dictionary of proxies, containing protocol and URL mapping (example: { “https”: “https://example.url.com” })\nverify (bool, optional) – certificate verification flag static from_dict(credentials, _verify=None)[source]¶\nCreate a Credentials object from dictionary.\n\nParame

In [28]:
print(score_response["predictions"][0]["values"][0][0])



According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:

```
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
                   url = "https://us-south.ml.cloud.ibm.com",
                   token = "***********",
                  )

client = APIClient(credentials)
```

This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
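
Indexing with `["predictions"][0]["values"][0][0]` works, but pairing each value with its field name is easier to read. The sketch below assumes the response shape shown in the output above (`fields` plus one row of `values` per question); `extract_answers` is a hypothetical helper, and the sample response is an abbreviated stand-in rather than real service output.

```python
def extract_answers(response):
    """Turn a scoring response into a list of {field_name: value} dictionaries."""
    results = []
    for prediction in response.get("predictions", []):
        fields = prediction["fields"]
        for row in prediction["values"]:
            results.append(dict(zip(fields, row)))
    return results


# Abbreviated stand-in shaped like the response shown above.
sample_response = {
    "predictions": [
        {
            "fields": ["answer", "reference_documents"],
            "values": [["Use the Credentials class.", [{"page_content": "..."}]]],
        }
    ]
}
for item in extract_answers(sample_response):
    print(item["answer"])
    print(len(item["reference_documents"]), "reference document(s)")
```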


<a id="runs"></a>
## Historical runs

In this section you learn to work with historical RAG Optimizer jobs (runs).

To list historical runs use the `list()` method and provide the `'rag_optimizer'` filter.

In [44]:
experiment.runs(filter='rag_optimizer').list()

|   | timestamp | run_id | state | auto_pipeline_optimizer name |
|---|---|---|---|---|
| 0 | 2024-09-10T09:52:10.467Z | 40770d34-62d4-4135-b828-ac58d763cea0 | completed | AutoAI RAG ibm-watsonx-ai SDK documentation |
| 1 | 2024-09-06T13:48:31.686Z | 8907ee45-d838-4569-a57e-ed28b9d7533c | completed | AutoAI RAG ibm-watsonx-ai all docs |
| 2 | 2024-09-06T11:44:41.837Z | 888fa91d-01c2-4c0a-b8e2-f8956aad99bb | completed | AutoAI RAG ibm-watsonx-ai all docs |


In [45]:
run_id = run_details['metadata']['id']
run_id

'40770d34-62d4-4135-b828-ac58d763cea0'

### Get executed optimizer's configuration parameters

In [31]:
experiment.runs.get_rag_params(run_id=run_id)

{'name': 'AutoAI RAG ibm-watsonx-ai SDK documentation',
 'description': 'AutoAI RAG experiemnt trainded on ibm-watsonx-ai SDK documentataion',
 'max_number_of_rag_patterns': 6,
 'optimization_metrics': ['answer_correctness']}

### Get historical rag_optimizer instance and training details

In [32]:
historical_opt = experiment.runs.get_rag_optimizer(run_id)

### List trained patterns for selected optimizer

In [33]:
historical_opt.summary()

| Pattern_Name | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|
| Pattern4 | 0.785714 | 0.655429 | 0.785714 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern1 | 0.642857 | 0.539929 | 0.654757 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern5 | 0.642857 | 0.616429 | 0.928571 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | mistralai/mixtral-8x7b-instruct-v01 |
| Pattern6 | 0.642857 | 0.641614 | 0.785714 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 3 | meta-llama/llama-3-70b-instruct |
| Pattern2 | 0.274729 | 0.719814 | 0.785714 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.142857 | 0.5531 | 0.785714 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |


<a id="cleanup"></a>
## Clean up

To delete the current experiment, use the `cancel_run` method.

**Warning:** Be careful: once you delete an experiment, you will no longer be able to refer to it.

In [70]:
rag_optimizer.cancel_run()

'SUCCESS'

To delete the deployment, use the `delete` method. 

**Warning:** Keeping the deployment active may lead to unnecessary consumption of Compute Unit Hours (CUHs).

In [None]:
client.deployments.delete(deployment_id)

If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

please follow this sample [notebook](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to use `ibm-watsonx-ai` to run AutoAI RAG experiments. 

 Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors

**Mateusz Szewczyk**, Software Engineer at Watson Machine Learning

Copyright © 2024 IBM. This notebook and its source code are released under the terms of the MIT License.