# Simple introduction to retrieval-augmented generation with watsonx.ai and Discovery

## Notebook content

This notebook contains the steps and code to demonstrate the retrieval-augmented generation pattern in IBM watsonx.ai, using IBM Watson Discovery as the Search component.

Some familiarity with Python is helpful. This notebook uses Python 3.10.


## Learning goal

The goal of this notebook is to demonstrate how to apply the retrieval-augmented generation pattern to a question-answering use case in watsonx.ai.


## Scenario
The website for an online seed catalog has many articles to help customers plan their garden and ultimately select which seeds to purchase. A new widget is being added to the website to answer customer questions based on the contents of the article the customer is viewing. Given a question related to a given article, answer the question based on the article.


## Contents

This notebook contains the following parts:

- [Overview of retrieval-augmented generation](#overview)
- [Step 1: Set up prerequisites](#setup)
- [Step 2: Create a knowledge base in Watson Discovery](#discovery)
- [Step 3: Define a search function that calls Discovery](#search)
- [Step 4: Craft prompt text](#prompt)
- [Step 5: Generate output using the foundation models Python library](#generate)
- [Step 6: Pull everything together to perform retrieval-augmented generation](#rag)
- [Summary and next steps](#summary)

<a id="overview"></a>
## Overview of retrieval-augmented generation

The retrieval-augmented generation pattern involves three basic steps:
1. Search for relevant content in your knowledge base
2. Pull the most relevant content into your prompt as context
3. Send the combined prompt text to a foundation model to generate output

The term _retrieval-augmented generation_ (RAG) was introduced in this paper: <a href="https://arxiv.org/abs/2005.11401" target="_blank" rel="noopener noreferrer">Retrieval-augmented generation for knowledge-intensive NLP tasks</a>

> "We build RAG models where the parametric memory is a pre-trained seq2seq transformer, and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever."

In that paper, the term "RAG models" refers to a specific implementation of a _retriever_ (a specific query encoder and vector-based document search index) and a _generator_ (a specific pre-trained, generative language model).

However, the basic search-and-generate approach can be generalized to use different retriever components and foundation models.

In this notebook:
- The **knowledge base** is a list of two articles
- The **retrieval component** consists of a Watson Discovery collection
- The **generation component** uses the foundation models Python library in watsonx.ai
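
The three steps of the pattern can be sketched end to end without any external service. The following is a minimal illustration only, not the implementation used later in this notebook. All names are hypothetical: `toy_search` ranks articles by word overlap as a stand-in for Discovery, and `toy_generate` simply echoes the first sentence of the context instead of calling a foundation model.

```python
import re

# Toy knowledge base: two short stand-in articles (not the notebook's full text)
TOY_ARTICLES = {
    "Growing tomatoes": "Tomato plants do best in full sun. Water tomatoes deeply and often.",
    "Cucumbers for beginners": "There are two types of cucumbers: slicing and pickling. "
                               "Cucumbers need constant watering.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_search(question, articles):
    """Step 1: return the (title, text) pair that shares the most words with the question."""
    q_words = tokenize(question)
    best = max(articles.items(), key=lambda item: len(q_words & tokenize(item[1])))
    if not q_words & tokenize(best[1]):
        return None  # nothing in the knowledge base overlaps with the question
    return best

def toy_augment(context, question):
    """Step 2: place the retrieved text and the question into one prompt."""
    return f"Article:\n{context}\n\nQuestion: {question}\nAnswer: "

def toy_generate(prompt):
    """Step 3 placeholder: a real system would send the prompt to a foundation model."""
    context = prompt.split("Article:\n", 1)[1].split("\n\nQuestion:", 1)[0]
    return context.split(". ")[0].rstrip(".") + "."

def toy_rag(question, articles=TOY_ARTICLES):
    """Run search, augment, and generate in sequence; return (source title, answer)."""
    match = toy_search(question, articles)
    if match is None:
        return None, "I don't know"
    title, text = match
    return title, toy_generate(toy_augment(text, question))

print(toy_rag("what are the types of cucumbers?"))
```

Swapping `toy_search` for a real search service and `toy_generate` for a model call leaves the overall flow unchanged, which is the point of the pattern.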


<a id="setup"></a>
## Step 1: Set up prerequisites

Before you use the sample code in this notebook, you must perform setup tasks.

### 1.1 Associate an instance of the Watson Machine Learning service with the current project

The _current project_ is the project in which you are running this notebook.

If an instance of Watson Machine Learning is not already associated with the current project, follow the instructions in this topic to do so: <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html?context=wx&audience=wdp" target="_blank" rel="noopener noreferrer">Adding associated services to a project</a>.

### 1.2 Set up credentials

In [None]:
!pip install "python-dotenv==1.0.0" | tail -n 1
!pip install "ibm-watsonx-ai" | tail -n 1

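import os
import getpass
from dotenv import load_dotenv

# Load variables from a local .env file, if one exists
load_dotenv()

# os.getenv returns None (it does not raise) when the variable is missing,
# so fall back to an interactive prompt in that case
api_key = os.getenv("API_KEY") or getpass.getpass("Please enter your API key (hit enter): ")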
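
Because `os.getenv` returns `None` for a missing variable rather than raising an exception, a fallback prompt is easiest to express as an explicit check. The sketch below shows one way to wrap that logic. The helper name `get_secret` and its `env` and `ask` parameters are hypothetical and exist only so that the behavior can be exercised without a real environment or keyboard input.

```python
import os
import getpass

def get_secret(env_name, prompt_text, env=None, ask=getpass.getpass):
    """Return the value of env_name if it is set and non-empty, otherwise prompt for it."""
    env = os.environ if env is None else env
    value = env.get(env_name, "").strip()
    if value:
        return value
    # Variable missing or blank: ask the user without echoing the input
    return ask(prompt_text)

# Example with an in-memory environment instead of the real one
print(get_secret("API_KEY", "API key: ", env={"API_KEY": "abc123"}))
```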

### 1.3 Define a credentials object with the IBM Cloud API key

In [None]:
credentials = { 
    "url"    : "https://us-south.ml.cloud.ibm.com", 
    "apikey" : api_key
}

<a id="discovery"></a>
## Step 2: Create a knowledge base in Watson Discovery

You must perform the following setup tasks to use Watson Discovery as your Search component:

- 2.1 Create a Discovery service instance
- 2.2 Get the API key and URL for your service instance
- 2.3 Authenticate
- 2.4 Create a Project in Discovery
- 2.5 Create a Collection in Discovery
- 2.6 Create local article files
- 2.7 Upload articles and metadata to Discovery

### 2.2 Get the API key and URL for your Discovery service instance

Enter the API key and URL for your Discovery service instance.

**Note**: You can retrieve the Discovery API key and URL from the **Manage** page of your Discovery service instance in IBM Cloud.

In [None]:
discovery_apikey = getpass.getpass("Please enter your Discovery api key (hit enter): ")
discovery_url = input("Please enter your Discovery URL (hit enter): ")

### 2.3 Authenticate

See: [Discovery authentication for IBM Cloud](https://cloud.ibm.com/apidocs/discovery-data?code=python#authentication-cloud)

In [None]:
!pip install ibm_watson
!pip install urllib3==1.26.16

In [None]:
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator( discovery_apikey )
g_discovery = DiscoveryV2( version= "2023-03-31", authenticator=authenticator )

g_discovery.set_service_url( discovery_url )

### 2.4 Create a Project in Discovery

In Discovery, you organize your work in "Projects".

Create a Project in Discovery called "RAG with Discovery - " followed by your initials. If a Project with that name already exists, choose another name.

**Note**: You can also create a Project [using the Discovery interface](https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-projects) instead.

In [None]:
initials = input("Please provide your initials (hit enter):")
project_name = f"RAG with Discovery - {initials}"

project_creation_result = g_discovery.create_project( name=project_name, type="document_retrieval" ).get_result()
g_discovery_project_id = project_creation_result["project_id"]

print( "Discovery project ID:\n" + g_discovery_project_id )

### 2.5 Create a Collection in Discovery

In Discovery, you assemble documents to search in "Collections".

Create a Collection in Discovery called "Gardening Articles Collection".

**Note**: You can also create a Collection [using the Discovery interface](https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-collections) instead.

In [None]:
collection_name = "Gardening Articles Collection"

collection_creation_result = g_discovery.create_collection(project_id=g_discovery_project_id, name=collection_name, language="en" ).get_result()
g_collection_id = collection_creation_result["collection_id"]

print( "Discovery collection ID:\n" + g_collection_id )

<a id="knowledgebase"></a>
### 2.6 Create local article files

In this notebook, the knowledge base is a collection of two articles.  

(These articles were written as samples for watsonx.ai; they are not real articles published anywhere else.  The authors and publication dates are fictional.)

In [None]:
article_01 = \
"Tomatoes are one of the most popular plants for vegetable gardens.  Tip for success: If you select " \
"varieties that are resistant to disease and pests, growing tomatoes can be quite easy.  For "        \
"experienced gardeners looking for a challenge, there are endless heirloom and specialty varieties "  \
"to cultivate.  Tomato plants come in a range of sizes.  There are varieties that stay very small, "  \
"less than 12 inches, and grow well in a pot or hanging basket on a balcony or patio.  Some grow "    \
"into bushes that are a few feet high and wide, and can be grown in larger containers.  Other "       \
"varieties grow into huge bushes that are several feet wide and high in a planter or garden bed.  "   \
"Still other varieties grow as long vines, six feet or more, and love to climb trellises.  Tomato "   \
"plants do best in full sun.  You need to water tomatoes deeply and often.  Using mulch prevents "    \
"soil-borne disease from splashing up onto the fruit when you water.  Pruning suckers and even "      \
"pinching the tips will encourage the plant to put all its energy into producing fruit."

In [None]:
article_02 = \
"Cucumbers are fun to grow for beginning gardeners and advanced gardeners alike.  There are two "     \
"types of cucumbers: slicing and pickling.  Pickling cucumbers are smaller than slicing cucumbers.  " \
"Cucumber plants come in two types: vining cucumbers, which are more common, and bush cucumbers.  "   \
"Vining cucumbers, which can grow to more than 5 feet tall, grow fast, yield lots of fruit, and you " \
"can train them up a trellis.  Growing cucumbers up a trellis or fence can maximize garden space, "   \
"keep fruit clean, and make it easier to harvest the fruit.  Tropical plants, cucumbers are very "    \
"sensitive to frost or cold weather. Cucumbers prefer full sun for 6 to 8 hours per day.  Cucumbers " \
"need constant watering.  Cucumbers can grow quickly and ripen in just 6 weeks.  Harvest cucumbers "  \
"every day or two because the more you harvest, the more the plant will produce.  If any cucumber "   \
"is left on the vine to fully mature, the plant will stop producing more cucumbers.  You can extend " \
"the harvest season by planting cucumbers in batches, 2 weeks apart."

Create text files in the notebook working directory for each article:

In [None]:
with open ( "article_01.txt", "w" ) as file:  
    file.write( article_01 )

In [None]:
with open ( "article_02.txt", "w" ) as file:  
    file.write( article_02 )

### 2.7 Upload articles and metadata to Discovery

See: [Add a document](https://cloud.ibm.com/apidocs/discovery-data?code=python#adddocument) in the Discovery V2 API.

In [None]:
knowledge_base = [ 
    { 
        "file_name" : "article_01.txt",
        "metadata"  : { "title"     : "Growing tomatoes", 
                        "author"    : "A. Rossi",
                        "published" : "2010" }
    }, 
    {
        "file_name" : "article_02.txt",
        "metadata"  : { "title"     : "Cucumbers for beginners",
                        "author"    : "B. Melnyk",
                        "published" : "2018" }
    }
]

In [None]:
import json

for article in knowledge_base:
    file_name = article["file_name"]
    metadata  = article["metadata"]
    with open( file_name, "rb" ) as f:
        response_json = g_discovery.add_document( project_id=g_discovery_project_id, 
                                                  collection_id=g_collection_id, 
                                                  file=f, 
                                                  filename=file_name, 
                                                  metadata=json.dumps( metadata ),
                                                  file_content_type="text/plain" ).get_result()
        print( file_name + "\n" + json.dumps( response_json, indent=3 ) + "\n" )

It takes Discovery a few minutes to process the uploaded files before you can perform searches.

In the Discovery graphical interface, you can monitor the status of the file processing to see when the uploaded documents are ready for search.
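
If you prefer to wait in code rather than watch the interface, a generic polling loop is one option. The sketch below is deliberately independent of the Discovery SDK: `get_status` is a hypothetical callable that you would supply yourself (for example, a function that looks up a document's status through the Discovery API), and the status strings `"available"` and `"failed"` are assumptions that you should check against the API reference.

```python
import time

def wait_until_ready(get_status, ready="available", failed="failed",
                     timeout_s=300, interval_s=10, sleep=time.sleep):
    """Poll get_status() until it returns `ready`.

    Returns True when the ready status is seen, and False when the failed
    status is seen or the timeout elapses first.
    """
    waited = 0
    while True:
        status = get_status()
        if status == ready:
            return True
        if status == failed or waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s

# Example with canned statuses and a no-op sleep, so it finishes immediately
canned = iter(["pending", "processing", "available"])
print(wait_until_ready(lambda: next(canned), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.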

<a id="search"></a>
## Step 3: Define a search function that calls Discovery

Many articles that discuss retrieval-augmented generation assume the retrieval component uses a vector database.  

However, to perform the general retrieval-augmented generation pattern, any search-and-retrieve method that can reliably return relevant content from the knowledge base will do.

In this notebook, the search component is a Watson Discovery search that returns one or the other of the two articles in the knowledge base, based on a natural language query match.

Define the natural language search in Discovery:

In [None]:
def search( question ):
    response_json = g_discovery.query( project_id=g_discovery_project_id, 
                                       natural_language_query=question
                                     ).get_result()
    #print( json.dumps( response_json, indent=3 ) )
    results_arr = response_json["results"]
    if( len( results_arr ) < 1 ):
        return None
    top_result = results_arr[0]
    top_asset = { "title"     : top_result["metadata"]["title"],
                  "author"    : top_result["metadata"]["author"],
                  "published" : top_result["metadata"]["published"],
                  "text"      : top_result["text"][0] }
    print(top_asset["text"])
    return top_asset

In [None]:
search( "what are the types of cucumbers?")
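
The `search` function above assumes that every result carries `title`, `author`, and `published` metadata and that `text` is a list. A more defensive conversion is sketched below. The helper name `to_asset`, the fallback strings, and the hand-built sample result are all hypothetical. The sample is modelled on the fields that `search` reads and is not captured Discovery output.

```python
def to_asset(result):
    """Convert one query result into the asset dict used in this notebook, tolerating gaps."""
    metadata = result.get("metadata") or {}
    text = result.get("text", "")
    # Accept either a list of passages or a plain string
    if isinstance(text, (list, tuple)):
        text = text[0] if text else ""
    return {
        "title": metadata.get("title", "Unknown title"),
        "author": metadata.get("author", "Unknown author"),
        "published": metadata.get("published", "n.d."),
        "text": text,
    }

# Hand-built sample result with one metadata field missing
sample = {
    "metadata": {"title": "Cucumbers for beginners", "author": "B. Melnyk"},
    "text": ["There are two types of cucumbers: slicing and pickling."],
}
print(to_asset(sample))
```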

<a id="prompt"></a>
## Step 4: Craft prompt text

In this notebook, the task to be performed is a question-answering task.

There is no single best prompt for any given task.  However, models that have been instruction-tuned, such as `bigscience/mt0-xxl-13b`, `google/flan-t5-xxl-11b`, or `google/flan-ul2-20b`, can generally perform this task with the sample prompt below.  Conservative decoding methods, such as greedy decoding, tend to produce succinct answers.

In the prompt below, notice two string placeholders (marked with `%s`) that will be replaced at generation time:
- The first placeholder will be replaced with the text of the relevant article from the knowledge base
- The second placeholder will be replaced with the question to be answered
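
The substitution described above uses Python's `%` string formatting, which fills placeholders strictly in order. The short sketch below uses a hypothetical one-line template (not the notebook's prompt) to show the ordering, and also shows that a literal percent sign in the template itself must be written as `%%`.

```python
mini_template = "Article: %s\nQuestion: %s\nAnswer: "

# Values are substituted in order: context first, question second
filled = mini_template % ("Cucumbers need constant watering.", "How often should I water?")
print(filled)

# A literal percent sign in the template must be doubled
percent_template = "Keep soil about 50%% moist.\nArticle: %s\nQuestion: %s"
filled_percent = percent_template % ("some article", "some question")
print(filled_percent)

# Percent signs inside the substituted values need no escaping
print("Article: %s" % ("Plants need 100% sun.",))
```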

In [None]:
prompt_template = """
Article:
###
%s
###

Answer the following question using only information from the article. 
Answer in a complete sentence, with proper capitalization and punctuation. 
If there is no good answer in the article, say "I don't know".

Question: %s
Answer: 
"""

def augment( template_in, context_in, nlquery_in ):
    return template_in % ( context_in,  nlquery_in )


In [None]:
question = "what are the types of cucumbers?"

article_txt = article_02

augmented_prompt = augment( prompt_template, article_txt, question )

print( augmented_prompt )

<a id="generate"></a>
## Step 5: Generate output using the foundation models Python library

You can prompt foundation models in watsonx.ai programmatically using the Python library.

See:
- <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-python-lib.html?context=wx&audience=wdp" target="_blank" rel="noopener noreferrer">Introduction to the foundation models Python library</a>
- <a href="https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html" target="_blank" rel="noopener noreferrer">Foundation models Python library reference</a>


In [None]:
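import os
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

model_id = "google/flan-t5-xxl"

# Use the library's parameter-name constants, which resolve to the
# lowercase parameter names used by the text generation API
gen_parms = { 
    GenParams.DECODING_METHOD : "greedy", 
    GenParams.MIN_NEW_TOKENS : 1, 
    GenParams.MAX_NEW_TOKENS : 50 
}
g_wml_project_id = os.getenv("PROJECT_ID")

model = Model( model_id=model_id, credentials=credentials, params=gen_parms, project_id=g_wml_project_id )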

In [None]:
def generate( model_in, augmented_prompt_in ):
    
    generated_response = model_in.generate( augmented_prompt_in )

    if ( "results" in generated_response ) \
       and ( len( generated_response["results"] ) > 0 ) \
       and ( "generated_text" in generated_response["results"][0] ):
        return generated_response["results"][0]["generated_text"]
    else:
        print( "The model failed to generate an answer" )
        print( "\nDebug info:\n" + json.dumps( generated_response, indent=3 ) )
        return ""

In [None]:
output = generate( model, augmented_prompt )
print( output )
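
To try the response handling without credentials or network access, you can substitute a stand-in object for the model. The sketch below is an assumption-laden illustration: `StubModel` and `extract_generated_text` are hypothetical names, and the stub mimics only the `results[0]["generated_text"]` shape that `generate` checks for, not the full response returned by watsonx.ai.

```python
def extract_generated_text(response):
    """Return the first generated_text in a response dict, or "" if it is absent."""
    results = response.get("results") or []
    if results and "generated_text" in results[0]:
        return results[0]["generated_text"]
    return ""

class StubModel:
    """Hypothetical stand-in that returns a canned answer in the expected shape."""

    def __init__(self, canned_text):
        self.canned_text = canned_text

    def generate(self, prompt):
        # The prompt is ignored; only the response structure matters here
        return {"results": [{"generated_text": self.canned_text}]}

stub = StubModel("There are two types of cucumbers: slicing and pickling.")
print(extract_generated_text(stub.generate("any prompt")))
```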

<a id="rag"></a>
## Step 6: Pull everything together to perform retrieval-augmented generation

If you are using VS Code:
1. Click the Extensions icon in the Activity Bar and type "Code Runner" in the search bar
2. Make sure that the Code Runner extension is installed
3. In the extension settings, scroll down and select **Code-runner: Run In Terminal**

In [None]:
def searchAndAnswer( model ):
    
    question = input( "Type your question:\n")
    if not question.strip():
        print( "No question")
        return
        
    # Retrieve the relevant content
    top_match = search( question )
    if top_match is None:
        print( "No good answer was found in the knowledge base" )
        return
    article_text = top_match["text"]
    
    # Augment a prompt with context
    augmented_prompt = augment( prompt_template, article_text, question )
    
    # Generate output; stop here if the model returned nothing usable
    output = generate( model, augmented_prompt )
    if not output.strip():
        print( "The model failed to generate an answer")
        return
    print( "\nAnswer:\n" + output )
    print( "\nSource: \"" + top_match["title"] + "\", " + top_match["author"] + " (" + top_match["published"] + ")"  )

Test the solution by running the following cell multiple times.  You will be prompted to enter a question each time.

In [None]:
searchAndAnswer( model )
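
Because `searchAndAnswer` reads the question from `input()` and prints its results, it is awkward to test automatically. One possible refactoring, shown here only as a sketch, passes the three steps in as functions and returns a dictionary instead of printing. The name `answer_question` and the stub functions are hypothetical. In the notebook you would pass wrappers around `search`, `augment`, and `generate` instead.

```python
def answer_question(question, search_fn, augment_fn, generate_fn):
    """Run search, augment, and generate; return the answer, its source, and any error."""
    if not question.strip():
        return {"answer": None, "source": None, "error": "No question"}

    match = search_fn(question)
    if match is None:
        return {"answer": None, "source": None,
                "error": "No good answer was found in the knowledge base"}

    output = generate_fn(augment_fn(match["text"], question))
    if not output.strip():
        return {"answer": None, "source": None,
                "error": "The model failed to generate an answer"}

    source = '"{}", {} ({})'.format(match["title"], match["author"], match["published"])
    return {"answer": output, "source": source, "error": None}

# Hypothetical stubs standing in for Discovery and the foundation model
def stub_search(question):
    if "cucumber" not in question.lower():
        return None
    return {"title": "Cucumbers for beginners", "author": "B. Melnyk",
            "published": "2018", "text": "There are two types of cucumbers: slicing and pickling."}

def stub_augment(context, question):
    return "Article:\n" + context + "\n\nQuestion: " + question + "\nAnswer: "

def stub_generate(prompt):
    return "Slicing and pickling."

print(answer_question("what are the types of cucumbers?", stub_search, stub_augment, stub_generate))
```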

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!
 
You learned how to apply the general retrieval-augmented generation pattern with a Watson Discovery search component and a small knowledge base using watsonx.ai.
 
Check out our <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/welcome-main.html?context=wx&audience=wdp" target="_blank" rel="noopener noreferrer">Documentation</a> for more samples, tutorials, and how-tos.

### Authors

**Kevin MacDonald**, Content Design - IBM Data and AI.

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.