![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# AutoAI RAG experiment with a custom foundation model

#### Disclaimers

- Use only Projects and Spaces that are available in the watsonx context.


## Notebook content

This notebook demonstrates how to deploy a custom foundation model and use it in an AutoAI RAG experiment.
The data used in this notebook is from the [Granite Code Models paper](https://arxiv.org/pdf/2405.04324).

Some familiarity with Python is helpful. This notebook uses Python 3.11.


## Learning goal

The learning goals of this notebook are:

- Deploy your own foundation model downloaded from the Hugging Face Hub
- Create an AutoAI RAG job that finds the best RAG pattern for the custom foundation model used in the experiment


## Contents

This notebook contains the following parts:
- [Set up the environment](#Set-up-the-environment)
- [Prerequisites](#Prerequisites)
- [Create API Client instance.](#Create-API-Client-instance.)
- [Download custom model from hugging face](#Download-custom-model-from-hugging-face)
- [Deploy the model](#Deploy-the-model)
- [Prepare the data for the AutoAI RAG experiment](#Prepare-the-data-for-the-AutoAI-RAG-experiment)
- [Run the AutoAI RAG experiment](#Run-the-AutoAI-RAG-experiment)
- [Query generated pattern locally](#Query-generated-pattern-locally)
- [Summary](#Summary)

## Set up the environment

In [None]:
%pip install -U wget | tail -n 1
%pip install -U 'ibm-watsonx-ai[rag]>=1.3.12' | tail -n 1
%pip install -U 'huggingface-hub==0.30.2' | tail -n 1

<a id="prerequisites"></a>

## Prerequisites
Fill in the values below before moving forward:
- API_KEY - your IBM Cloud API key. For more information about API keys, see this [documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).
- WML_ENDPOINT - endpoint URL associated with your API key. For the list of available endpoints, refer to this [documentation](https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-endpoints).
- PROJECT_ID - ID of the project associated with your API key and endpoint. To find your project ID, refer to this [documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-project-id.html?context=wx).
- DATASOURCE_CONNECTION_ASSET_ID - ID of the connection asset for the data source that will store the custom foundation model files. To learn how to create this kind of asset, refer to this [documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/manage-data/create-conn.html?context=cpdaas). The example below uses a connection to `S3 Cloud Object Storage`.
- BUCKET_NAME - bucket that will store your custom foundation model files.

In [None]:
API_KEY = "PUT YOUR API KEY HERE" # API key to your IBM Cloud or Cloud Pak for Data instance
WML_ENDPOINT = "PUT YOUR WML ENDPOINT HERE" # endpoint associated with your API key
PROJECT_ID = "PUT YOUR PROJECT ID HERE" # project ID associated with your API key and endpoint

DATASOURCE_CONNECTION_ASSET_ID = "PUT YOUR COS CONNECTION ID HERE" # datasource connection inside your project
BUCKET_NAME = "PUT YOUR BUCKET NAME HERE" # bucket name in your Cloud Object Storage
BUCKET_MODEL_DIR_NAME = "PLLuM12BInstruct" # dir name inside the bucket which will store your custom model files

HUGGING_FACE_MODEL_REPOSITORY = "CYFRAGOVPL/PLLuM-12B-instruct" # HuggingFace model repository
BUCKET_BENCHMARK_JSON_FILE_PATH = "benchmark.json" # path inside bucket where your benchmark.json file is stored
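
Before moving on, it can help to confirm that none of the placeholder values above was left unchanged. The following is a minimal sketch and not part of the watsonx.ai SDK: the helper name `find_unfilled_settings` and the `"PUT YOUR"` prefix check are assumptions based on the placeholder strings used in this notebook.

```python
def find_unfilled_settings(settings: dict[str, str]) -> list[str]:
    """Return the names of settings that are empty or still hold a placeholder."""
    unfilled = []
    for name, value in settings.items():
        cleaned = value.strip()
        # Assumption: every placeholder in this notebook starts with "PUT YOUR".
        if not cleaned or cleaned.upper().startswith("PUT YOUR"):
            unfilled.append(name)
    return unfilled


# Made-up example values: one filled in, one untouched placeholder.
example_settings = {
    "PROJECT_ID": "00000000-0000-0000-0000-000000000000",
    "BUCKET_NAME": "PUT YOUR BUCKET NAME HERE",
}
print(find_unfilled_settings(example_settings))
```

In the notebook, you could pass a dictionary built from `API_KEY`, `WML_ENDPOINT`, `PROJECT_ID`, `DATASOURCE_CONNECTION_ASSET_ID`, and `BUCKET_NAME`, and stop if the returned list is not empty.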

## Create API Client instance.
This client allows us to connect to IBM services.

In [None]:

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    api_key=API_KEY,
    url=WML_ENDPOINT,
)

client = APIClient(credentials=credentials, project_id=PROJECT_ID)

## Download custom model from hugging face

`byom_cache_dir` - path to the local directory where the foundation model files will be downloaded.

In [None]:
from pathlib import Path
from huggingface_hub import snapshot_download

byom_cache_dir = Path("your", "model", "cache", "dir")

# A missing path is a "not found" condition, so raise FileNotFoundError.
if not byom_cache_dir.exists():
    raise FileNotFoundError("Please use a path that exists.")

if byom_cache_dir.is_file():
    raise NotADirectoryError("Please use a path that points to a directory.")

snapshot_download(HUGGING_FACE_MODEL_REPOSITORY, cache_dir=byom_cache_dir)

### Create S3Location instance to connect to your COS bucket

In [None]:
from ibm_watsonx_ai.helpers.connections import DataConnection, S3Location

location = S3Location(bucket=BUCKET_NAME, path=BUCKET_MODEL_DIR_NAME)
data_connection = DataConnection(location=location, connection_asset_id=DATASOURCE_CONNECTION_ASSET_ID)
data_connection.set_client(api_client=client)

### Upload foundation model files to your bucket

`model_files` - path to the local directory that holds the downloaded model files. Inside the download directory, locate the `snapshots` directory and copy the name of its subdirectory, which should look similar to this example: `61aafd9fccad1606b5c462196cd111e734b60781`.

In [None]:
model_files = byom_cache_dir / "model_dir_name" / "snapshots" / "snapshot_id"

# A missing path is a "not found" condition, so raise FileNotFoundError.
if not model_files.exists():
    raise FileNotFoundError("Please use a snapshot path that exists.")

if model_files.is_file():
    raise NotADirectoryError("Please use a snapshot path that points to a directory.")

for model_file in model_files.iterdir():
    # Skip hidden files to avoid uploading unnecessary content.
    if model_file.name.startswith("."):
        continue

    data_connection.write(data=str(model_file), remote_name=model_file.name)
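
Instead of copying the snapshot ID by hand, the snapshot directory can be looked up on disk. The sketch below assumes the standard Hugging Face Hub cache layout, `<cache_dir>/models--<org>--<name>/snapshots/<revision>/`. The helper names `find_snapshot_dirs` and `list_uploadable_files` are hypothetical and not part of any library. Another option is to keep the value returned by `snapshot_download`, which is the local path of the downloaded snapshot folder.

```python
from pathlib import Path


def find_snapshot_dirs(cache_dir: Path) -> list[Path]:
    """List snapshot directories found under a Hugging Face cache directory."""
    # Assumed layout: <cache_dir>/models--<org>--<name>/snapshots/<revision>/
    candidates = cache_dir.glob("models--*/snapshots/*")
    return sorted(path for path in candidates if path.is_dir())


def list_uploadable_files(snapshot_dir: Path) -> list[Path]:
    """Return the files in a snapshot directory, skipping hidden entries and subdirectories."""
    return sorted(
        path
        for path in snapshot_dir.iterdir()
        if path.is_file() and not path.name.startswith(".")
    )
```

If `find_snapshot_dirs(byom_cache_dir)` returns exactly one directory, it can be used as `model_files`. If it returns several, more than one model or revision is cached and the choice has to be made explicitly.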

## Deploy the model
To avoid problems during model deployment, check the [documentation](https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html).

### Create custom model repository

In [None]:
software_spec = client.software_specifications.get_id_by_name('watsonx-cfm-caikit-1.1')

In [None]:
metadata = {
    client.repository.ModelMetaNames.NAME: "PLLuM deployment",
    client.repository.ModelMetaNames.SOFTWARE_SPEC_ID: software_spec,
    client.repository.ModelMetaNames.TYPE: client.repository.ModelAssetTypes.CUSTOM_FOUNDATION_MODEL_1_0,
    client.repository.ModelMetaNames.MODEL_LOCATION: {
        "file_path": BUCKET_MODEL_DIR_NAME,
        "bucket": BUCKET_NAME,
        "connection_id": DATASOURCE_CONNECTION_ASSET_ID,
    },
}

In [None]:
stored_model_details = client.repository.store_model(model=BUCKET_MODEL_DIR_NAME, meta_props=metadata)
stored_model_asset_id = client.repository.get_model_id(stored_model_details)

In [None]:
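# Filter by the same model type that was used when storing the model above.
client.repository.list(framework_filter=client.repository.ModelAssetTypes.CUSTOM_FOUNDATION_MODEL_1_0)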

### Store client task credentials

In [None]:
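try:
    client.task_credentials.store()
except Exception as exc:
    # The most likely cause is that task credentials already exist; show the error in case it is something else.
    print(f"Task credentials were not stored (they may already exist): {exc}")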

### Perform custom model deployment

In [None]:
MAX_SEQUENCE_LENGTH = 32_000
MAX_NEW_TOKENS = 1000
MIN_NEW_TOKENS = 1
MAX_BATCH_SIZE = 1024

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "PLLuM deployment",
    client.deployments.ConfigurationMetaNames.DESCRIPTION: "PLLuM deployment",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.HARDWARE_REQUEST: {
        'size': client.deployments.HardwareRequestSizes.Small,
        'num_nodes': 1
    },
    # optionally overwrite model parameters here
    client.deployments.ConfigurationMetaNames.FOUNDATION_MODEL: {
        "max_sequence_length": MAX_SEQUENCE_LENGTH,
        "max_new_tokens": MAX_NEW_TOKENS,
        "max_batch_size": MAX_BATCH_SIZE,
    },
    client.deployments.ConfigurationMetaNames.SERVING_NAME: "pllum_12b_instruct" # must be unique
}
deployment_details = client.deployments.create(stored_model_asset_id, meta_props)
deployment_id = deployment_details["metadata"]["id"]
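
The token limits passed to the deployment can be sanity-checked before the request is sent. This is a minimal sketch: the helper `check_generation_limits` is hypothetical, and the rules it applies (for example, that `max_new_tokens` should be smaller than `max_sequence_length`) are general assumptions about token budgets, not constraints documented for the service.

```python
def check_generation_limits(
    max_sequence_length: int, max_new_tokens: int, min_new_tokens: int = 0
) -> list[str]:
    """Return a list of problems found; an empty list means none was found."""
    problems = []
    if max_sequence_length <= 0:
        problems.append("max_sequence_length must be positive")
    if min_new_tokens < 0:
        problems.append("min_new_tokens must not be negative")
    if min_new_tokens > max_new_tokens:
        problems.append("min_new_tokens must not exceed max_new_tokens")
    if max_new_tokens >= max_sequence_length:
        problems.append("max_new_tokens must be smaller than max_sequence_length")
    return problems


# Values used in this notebook.
print(check_generation_limits(max_sequence_length=32_000, max_new_tokens=1000, min_new_tokens=1))
```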

## Prepare the data for the AutoAI RAG experiment

### Download `granite_code_models.pdf` document

In [None]:
import wget

data_url = "https://arxiv.org/pdf/2405.04324"
byom_input_filename = "granite_code_models.pdf"
wget.download(data_url, byom_input_filename)
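
A download can succeed and still return something other than the document, for example an HTML error page. A quick check that the file starts with the PDF signature (`%PDF-`) catches this before the file is uploaded. The helper name `looks_like_pdf` is hypothetical, and the check is only a sketch: it does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Return True if the file starts with the PDF signature bytes."""
    with path.open("rb") as fp:
        return fp.read(5) == b"%PDF-"
```

In the notebook, this could be called as `looks_like_pdf(Path(byom_input_filename))` right after the download.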

### Save document in your desired COS bucket

In [None]:
documents_dir_location = S3Location(bucket=BUCKET_NAME, path=byom_input_filename)
documents_dir_data_connection = DataConnection(location=documents_dir_location, connection_asset_id=DATASOURCE_CONNECTION_ASSET_ID)
documents_dir_data_connection.set_client(api_client=client)
documents_dir_data_connection.write(data=byom_input_filename, remote_name=byom_input_filename)

### Create your own benchmark.json file to ask the questions related to the document

In [None]:
import json 

local_benchmark_json_filename = "benchmark.json"

benchmarking_data = [
     {
        "question": "What are the two main variants of Granite Code models?",
        "correct_answer": "The two main variants are Granite Code Base and Granite Code Instruct.",
        "correct_answer_document_ids": [byom_input_filename]
     },
     {
        "question": "What is the purpose of Granite Code Instruct models?",
        "correct_answer": "Granite Code Instruct models are finetuned for instruction-following tasks using datasets like CommitPack, OASST, HelpSteer, and synthetic code instruction datasets, aiming to improve reasoning and instruction-following capabilities.",
        "correct_answer_document_ids": [byom_input_filename]
     },
     {
        "question": "What is the licensing model for Granite Code models?",
        "correct_answer": "Granite Code models are released under the Apache 2.0 license, ensuring permissive and enterprise-friendly usage.",
        "correct_answer_document_ids": [byom_input_filename]
     },
]

with open(local_benchmark_json_filename, mode="w", encoding="utf-8") as fp:
    json.dump(benchmarking_data, fp, indent=4)
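
Before uploading the benchmark file, the records can be checked for the three keys used above. This is a sketch under stated assumptions: the helper `validate_benchmark` is hypothetical, and the rules (all three keys present, a non-empty question, a non-empty list of document IDs) are inferred from the records in this notebook rather than taken from a published schema.

```python
REQUIRED_KEYS = {"question", "correct_answer", "correct_answer_document_ids"}


def validate_benchmark(records: list[dict]) -> list[str]:
    """Return a list of problems found in the benchmark records."""
    problems = []
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append(f"record {index}: missing keys {sorted(missing)}")
            continue
        if not str(record["question"]).strip():
            problems.append(f"record {index}: empty question")
        doc_ids = record["correct_answer_document_ids"]
        if not isinstance(doc_ids, list) or not doc_ids:
            problems.append(
                f"record {index}: correct_answer_document_ids must be a non-empty list"
            )
    return problems
```

Calling `validate_benchmark(benchmarking_data)` on the records defined above should return an empty list.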

### Save your benchmark.json file in your COS bucket

In [None]:
benchmark_file_location = S3Location(bucket=BUCKET_NAME, path=BUCKET_BENCHMARK_JSON_FILE_PATH)
benchmark_file_data_connection = DataConnection(location=benchmark_file_location, connection_asset_id=DATASOURCE_CONNECTION_ASSET_ID)
benchmark_file_data_connection.set_client(api_client=client)
benchmark_file_data_connection.write(data=local_benchmark_json_filename)

## Run the AutoAI RAG experiment

Provide the input information for the AutoAI RAG optimizer:
- `custom_prompt_template_text` - prompt template used to query your foundation model; it contains the `{question}` and `{reference_documents}` placeholders
- `custom_context_template_text` - context template used to format each retrieved document; it contains the `{document}` placeholder
- `name` - experiment name
- `description` - experiment description
- `max_number_of_rag_patterns` - maximum number of RAG patterns to create
- `optimization_metrics` - target optimization metrics

In [None]:
from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.helpers.connections import ContainerLocation
from ibm_watsonx_ai.foundation_models.schema import (
        AutoAIRAGCustomModelConfig,
        AutoAIRAGModelParams
)

experiment = AutoAI(credentials, project_id=PROJECT_ID)

custom_prompt_template_text = "Answer my question {question} related to these documents {reference_documents}."
custom_context_template_text = "My document {document}"

parameters = AutoAIRAGModelParams(max_sequence_length=32_000)
pllum_foundation_model_config = AutoAIRAGCustomModelConfig(
    deployment_id=deployment_id, 
    project_id=PROJECT_ID, 
    prompt_template_text=custom_prompt_template_text, 
    context_template_text=custom_context_template_text, 
    parameters=parameters
)

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG - Custom foundation model experiment",
    description="AutoAI RAG experiment using custom foundation model.",
    max_number_of_rag_patterns=4,
    optimization_metrics=["faithfulness"],
    foundation_models=[pllum_foundation_model_config],
)

container_data_location = DataConnection(
    type="container",
    location=ContainerLocation(path="autorag/results"),
)

container_data_location.set_client(api_client=client)

rag_optimizer.run(
    test_data_references=[benchmark_file_data_connection],
    input_data_references=[documents_dir_data_connection],
    results_reference=container_data_location,
)
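
To see how the two templates fit together, the placeholders can be filled in locally with plain string formatting. This is an illustrative sketch only: the substitution during the experiment is done by AutoAI RAG and may differ, for example in how multiple documents are joined. The helper `template_fields`, the newline join, and the sample chunk text are assumptions made for this example.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the named placeholders used in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


prompt_template = "Answer my question {question} related to these documents {reference_documents}."
context_template = "My document {document}"

# Made-up retrieved chunk; in the experiment these come from the indexed documents.
chunks = ["Granite Code models come in Base and Instruct variants"]

# Format each chunk with the context template, then fill the prompt template.
reference_documents = "\n".join(
    context_template.format(document=chunk) for chunk in chunks
)
prompt = prompt_template.format(
    question="What are the two main variants?",
    reference_documents=reference_documents,
)
print(prompt)
```

`template_fields` can also be used to confirm that a custom template still contains the placeholder names it is expected to have after editing.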

In [None]:
rag_optimizer.get_run_details()

In [None]:
summary = rag_optimizer.summary()
summary

In [None]:
best_pattern_name = summary.index.values[0]
print('Best pattern is:', best_pattern_name)

best_pattern = rag_optimizer.get_pattern()

In [None]:
rag_optimizer.get_pattern_details(pattern_name=best_pattern_name)

## Query generated pattern locally

In [None]:
questions = ["What training objectives are used for the models?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

resp = best_pattern.inference_function()(payload)

## Summary

You successfully completed this notebook!

You learned how to use AutoAI RAG with your own foundation model.

Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener noreferrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts.

### Author:
 **Michał Steczko**, Software Engineer at watsonx.ai.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.