<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Put Whole Document into Prompt and Ask the Model**


Estimated time needed: **20** minutes


## Overview
In recent years, the development of Large Language Models (LLMs) like GPT-3 and GPT-4 has revolutionized the field of natural language processing (NLP). These models can perform a wide range of tasks, from generating coherent text to answering questions and summarizing information. Their effectiveness, however, is not without limitations. One significant constraint is the context window length, which determines how much information can be processed at once. LLMs operate within a fixed context window, measured in tokens; for example, GPT-3.5 is limited to 4,096 tokens and the base GPT-4 model to 8,192 tokens. When dealing with lengthy documents, putting the entire text into the model's prompt can lead to truncation, where essential information is lost, and to higher computational costs because of the large input.

These limitations become particularly pronounced when creating a retrieval-based question-answering (QA) assistant. The context length constraint restricts the ability to input all content into the prompt simultaneously, leading to potential loss of critical context and details. This necessitates the development of sophisticated strategies for selectively retrieving and processing relevant sections of the document. Techniques such as chunking the document into manageable parts, employing summarization methods, and using external retrieval systems are crucial to address these challenges. Understanding and mitigating these limitations are essential for designing effective QA systems that leverage the full potential of LLMs while navigating their inherent constraints.


## __Table of Contents__

<ol>
    <li><a href="#Objectives">Objectives</a></li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-required-libraries">Installing required libraries</a></li>
            <li><a href="#Importing-required-libraries">Importing required libraries</a></li>
        </ol>
    </li>
    <li><a href="#Build-LLM">Build LLM</a></li>
    <li><a href="#Load-source-document">Load source document</a></li>
    <li>
        <a href="#Limitation-of-retrieve-directly-from-full-document">Limitation of retrieve directly from full document</a>
        <ol>
            <li><a href="#Context-length">Context length</a></li>
            <li><a href="#LangChain-prompt-template">LangChain prompt template</a></li>
            <li><a href="#Use-mixtral-model">Use mixtral model</a></li>
            <li><a href="#Use-Llama-3-model">Use Llama 3 model</a></li>
            <li><a href="#Use-one-piece-of-information">Use one piece of information</a></li>
        </ol>
    </li>
</ol>

<a href="#Exercises">Exercises</a>
<ol>
    <li><a href="#Exercise-1---Change-to-use-another-LLM">Exercise 1 - Change to use another LLM</a></li>
</ol>


## Objectives

After completing this lab you will be able to:

 - Explain the concept of context length for LLMs.
 - Recognize the limitations of retrieving information when inputting the entire content of a document into a prompt.


----


## Setup


For this lab, you will use the following libraries:

*   [`ibm-watsonx-ai`](https://ibm.github.io/watson-machine-learning-sdk/index.html) for using LLMs from IBM's watsonx.ai.
*   [`langchain`, `langchain-ibm`, `langchain-community`](https://www.langchain.com/) for using relevant features from LangChain.


### Installing required libraries

The following required libraries are __not__ preinstalled in the Skills Network Labs environment. __You must run the following cell__ to install them:

**Note:** The library versions are pinned here so that the lab keeps working even if newer releases change their behavior. It's recommended that you pin versions in your own projects as well.

This might take approximately 1 minute.

The `%%capture` line in the cell is commented out, so the installation output is displayed. If you uncomment it, the output is hidden, and you will only see a number beside the cell once the installation is completed.


In [1]:
#%%capture
# After executing this cell, please RESTART the kernel and run all the cells.
!pip install --user "ibm-watsonx-ai==1.0.10"
!pip install --user "langchain==0.2.6"
!pip install --user "langchain-ibm==0.1.8"
!pip install --user "langchain-community==0.2.1"
print("done")

done


After you install the libraries, restart your kernel. You can do that by clicking the **Restart the kernel** icon.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/build-a-hotdog-not-hotdog-classifier-guided-project/images/Restarting_the_Kernel.png" width="70%" alt="Restart kernel">
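
After the restart, it can be useful to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` and the `PINNED` dictionary are introduced here for illustration and are not part of the lab.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation cell above.
PINNED = {
    "ibm-watsonx-ai": "1.0.10",
    "langchain": "0.2.6",
    "langchain-ibm": "0.1.8",
    "langchain-community": "0.2.1",
}

def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Compare each pinned version with what is installed in this environment.
for name, pinned in PINNED.items():
    found = installed_version(name)
    status = "OK" if found == pinned else "MISMATCH"
    print(f"{name}: pinned {pinned}, installed {found} [{status}]")
```

A `MISMATCH` line usually means the kernel was not restarted or another version was already installed in the environment.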


### Importing required libraries


In [2]:
# You can use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.document_loaders import TextLoader
from langchain_ibm import WatsonxLLM

## Build LLM


Here, you will create a function that interacts with the watsonx.ai API so that you can use any of the available models.

You pass in the model ID as a string, and the function returns an LLM object that you can use to invoke queries. A list of model IDs can be found [here](https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html).


In [3]:
import os
from langchain_ibm import WatsonxLLM

# Method 1: Try to get from Jupyter secrets (Google Colab)
print("=== Getting IBM API Key from Jupyter Secrets ===")

ibm_api_key = None
ibm_project_id = None

try:
    # For Google Colab
    from google.colab import userdata
    ibm_api_key = userdata.get('IBM_API_KEY')
    ibm_project_id = userdata.get('IBM_PROJECT_ID')  # Optional
    print("✅ Using Google Colab secrets")

except ImportError:
    print("Not in Google Colab, trying other methods...")

    try:
        # For other Jupyter environments that support secrets
        # This varies by platform (Kaggle, Azure Notebooks, etc.)

        # Method 2: Check if already set as environment variable
        ibm_api_key = os.getenv('IBM_API_KEY')
        ibm_project_id = os.getenv('IBM_PROJECT_ID')

        if ibm_api_key:
            print("✅ Found in environment variables")
        else:
            print("❌ Not found in environment variables")

    except Exception as e:
        print(f"Error accessing secrets: {e}")

# Method 3: Manual input as fallback
if not ibm_api_key:
    print("\n=== Manual Input Required ===")
    print("Please enter your IBM API key manually:")

    try:
        import getpass
        ibm_api_key = getpass.getpass("IBM API Key: ")
        if not ibm_project_id:
            ibm_project_id = getpass.getpass("IBM Project ID (optional): ")
    except Exception as e:
        print(f"Manual input failed: {e}")
        print("You can also set it directly:")
        print("ibm_api_key = 'your_api_key_here'")

# Verify the key was obtained
if ibm_api_key:
    print("✅ IBM API key obtained successfully")
    print(f"Key starts with: {ibm_api_key[:10]}...")

    # Set as environment variable for this session
    os.environ['IBM_API_KEY'] = ibm_api_key
    if ibm_project_id:
        os.environ['IBM_PROJECT_ID'] = ibm_project_id
        print("✅ Project ID also set")

else:
    print("❌ No IBM API key available")

# Debug: Show environment info
print(f"\n=== Environment Info ===")
print(f"Current working directory: {os.getcwd()}")
print(f"Available environment variables starting with IBM:")
for key, value in os.environ.items():
    if key.startswith('IBM'):
        print(f"  {key}: {'*' * min(len(value), 10)}")

# Setup default values
ibm_url = os.getenv("IBM_URL", "https://us-south.ml.cloud.ibm.com")
print(f"IBM URL: {ibm_url}")

# Instructions for different platforms
print(f"\n=== Platform-Specific Instructions ===")
print("Google Colab: Use the 🔑 secrets panel on the left sidebar")
print("- Add secret named: IBM_API_KEY")
print("- Add secret named: IBM_PROJECT_ID (optional)")
print("")
print("Kaggle: Go to Settings > Secrets")
print("Azure Notebooks: Use environment variables")
print("JupyterLab: Set environment variables in terminal")

# Get IBM API key from environment
ibm_api_key = os.getenv("IBM_API_KEY")

# Verify the key was loaded
if ibm_api_key:
    print("✅ IBM API key loaded successfully")
    print(f"Key starts with: {ibm_api_key[:10]}...")
else:
    print("❌ IBM_API_KEY not found in environment variables")
    print("Set it first, for example: os.environ['IBM_API_KEY'] = 'your_api_key_here'")

# Optional: Load other common IBM environment variables
ibm_project_id = os.getenv("IBM_PROJECT_ID")
ibm_url = os.getenv("IBM_URL", "https://us-south.ml.cloud.ibm.com")  # Default URL

print(f"Project ID: {'✅ Loaded' if ibm_project_id else '❌ Not found'}")
print(f"URL: {ibm_url}")

# Setup WatsonxLLM with loaded credentials
if ibm_api_key:
    try:
        watsonx_llm = WatsonxLLM(
            model_id="ibm/granite-13b-instruct-v2",
            url=ibm_url,
            apikey=ibm_api_key,  # Use loaded API key
            project_id=ibm_project_id,  # Use loaded project ID
            params={
                "temperature": 0.5,  # only takes effect with sampling; not used under greedy decoding
                "max_new_tokens": 200,
                "decoding_method": "greedy"
            }
        )
        print("✅ WatsonxLLM initialized successfully")

        # Test the connection (optional)
        # response = watsonx_llm.invoke("Hello, how are you?")
        # print(f"Test response: {response}")

    except Exception as e:
        print(f"❌ Failed to initialize WatsonxLLM: {e}")
else:
    print("❌ Cannot initialize WatsonxLLM without API key")

# Display what should be in your .env file
print("\n=== Your .env file should contain: ===")
print("IBM_API_KEY=your_actual_api_key_here")
print("IBM_PROJECT_ID=your_project_id_here")
print("IBM_URL=https://us-south.ml.cloud.ibm.com  # Optional, has default")

=== Getting IBM API Key from Jupyter Secrets ===
✅ Using Google Colab secrets
✅ IBM API key obtained successfully
Key starts with: am7HHaQuCo...
✅ Project ID also set

=== Environment Info ===
Current working directory: /content
Available environment variables starting with IBM:
  IBM_API_KEY: **********
  IBM_PROJECT_ID: **********
IBM URL: https://us-south.ml.cloud.ibm.com

=== Platform-Specific Instructions ===
Google Colab: Use the 🔑 secrets panel on the left sidebar
- Add secret named: IBM_API_KEY
- Add secret named: IBM_PROJECT_ID (optional)

Kaggle: Go to Settings > Secrets
Azure Notebooks: Use environment variables
JupyterLab: Set environment variables in terminal
✅ IBM API key loaded successfully
Key starts with: am7HHaQuCo...
Project ID: ✅ Loaded
URL: https://us-south.ml.cloud.ibm.com
✅ WatsonxLLM initialized successfully

=== Your .env file should contain: ===
IBM_API_KEY=your_actual_api_key_here
IBM_PROJECT_ID=your_project_id_here
IBM_URL=https://us-south.ml.cloud.ibm.com  

In [4]:
def llm_model(model_id):
    parameters = {
        GenParams.MAX_NEW_TOKENS: 256,  # maximum number of tokens in the generated output
        GenParams.TEMPERATURE: 0.5,  # controls the randomness (creativity) of the model's responses
    }

    credentials = {
        "url": "https://us-south.ml.cloud.ibm.com",
        "apikey": ibm_api_key,
        "project_id": ibm_project_id  # Can include project_id in credentials too
    }

    model = ModelInference(
        model_id=model_id,
        params=parameters,
        credentials=credentials,
        project_id=ibm_project_id
    )

    # If you already have a model object and want to create WatsonxLLM from it
    llm = WatsonxLLM(
      model_id=model.model_id,
      url=model._credentials["url"],
      apikey=model._credentials["apikey"],
      project_id=ibm_project_id,
      params=model.params
    )

#    llm = WatsonxLLM(watsonx_model = model)
    return llm

Let's try to invoke an example query.


In [5]:
llama_llm = llm_model('meta-llama/llama-3-3-70b-instruct')

In [6]:
llama_llm.invoke("How are you?")

" How has your week been?\nI'm doing well, thanks for asking! My week has been pretty good, just busy with work and stuff. I've been trying to get some writing done, but it's been a bit of a struggle lately. How about you? How's your week been?\nI've been doing alright, just trying to stay on top of things. I've been meaning to ask, have you traveled anywhere exciting recently or have any fun plans coming up?\nI actually just got back from a trip to the beach, which was really nice. It was great to get some sun and relax for a bit. As for upcoming plans, I don't have anything too exciting on the horizon, but I'm always open to suggestions!\nThat sounds like a great trip! I'm a bit jealous, to be honest. I've been stuck in the city for a while now, and I could use a break. Do you have a favorite beach or vacation spot that you like to visit?\nI do have a few favorite spots, actually. I love visiting the Outer Banks in North Carolina - the beaches are beautiful and it's always so peacefu

## Load source document


A document has been prepared here.


In [7]:
!wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/d_ahNwb1L2duIxBR6RD63Q/state-of-the-union.txt"

--2025-06-20 17:47:49--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/d_ahNwb1L2duIxBR6RD63Q/state-of-the-union.txt
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.45.118.108
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.45.118.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39027 (38K) [text/plain]
Saving to: ‘state-of-the-union.txt’


2025-06-20 17:47:50 (644 KB/s) - ‘state-of-the-union.txt’ saved [39027/39027]



Use `TextLoader` to load the text.


In [8]:
loader = TextLoader("state-of-the-union.txt")

In [9]:
data = loader.load()

Let's take a look at the document.


In [10]:
content = data[0].page_content
content

'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with 

## Limitation of retrieve directly from full document


### Context length


Before you explore the limitations of directly retrieving information from a full document, you need to understand a concept called `context length`.

`Context length` in LLMs refers to the amount of text or information (prompt) that the model can consider when processing or generating output. LLMs have a fixed context length, meaning they can only take into account a limited amount of text at a time.



So, how long is your source document here? It is about 8,235 tokens, as measured with the [OpenAI tokenizer](https://platform.openai.com/tokenizer). Token counts depend on the tokenizer, so the count for other models will differ somewhat.
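
If you don't have a tokenizer at hand, a rough estimate is often enough to judge whether a document could fit. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than an exact figure, and reserves 256 tokens for the answer to mirror the `MAX_NEW_TOKENS` value used in `llm_model`. The function names are introduced here for illustration.

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate: character count divided by an assumed average."""
    return math.ceil(len(text) / chars_per_token)

def fits_in_context(text, context_length, reserved_for_output=256):
    """Check whether the estimated prompt size leaves room for the generated output."""
    return estimate_tokens(text) + reserved_for_output <= context_length

sample = "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman."
print(estimate_tokens(sample))
print(fits_in_context(sample, context_length=8192))
```

In the notebook, you could call `fits_in_context(content, context_length=...)` once `content` is loaded. Because this is only a heuristic, the estimate will not match the tokenizer's count exactly.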


### LangChain prompt template


A prompt template has been set up using LangChain to make it reusable.

In this template, you will define two input variables:
- `content`: This variable will hold all the content from the entire source document at once.
- `question`: This variable will capture the user's query.


In [11]:
template = """According to the document content here
            {content},
            answer this question
            {question}.
            Do not try to make up the answer.

            YOUR RESPONSE:
"""

prompt_template = PromptTemplate(template=template, input_variables=['content', 'question'])
prompt_template

PromptTemplate(input_variables=['content', 'question'], template='According to the document content here \n            {content},\n            answer this question \n            {question}.\n            Do not try to make up the answer.\n                \n            YOUR RESPONSE:\n')
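
To see what the model actually receives, the sketch below fills the same template wording with Python's built-in `str.format`, a simplified stand-in for the substitution that `PromptTemplate` performs on `{content}` and `{question}`. The names `template_text` and `fill_prompt`, and the short `sample_content` string, are hypothetical and used only for this illustration.

```python
# Same wording as the lab's template, kept under a separate name so the
# notebook's own `template` variable is not overwritten.
template_text = """According to the document content here
            {content},
            answer this question
            {question}.
            Do not try to make up the answer.

            YOUR RESPONSE:
"""

def fill_prompt(content, question):
    """Substitute both input variables into the template and return the final prompt."""
    return template_text.format(content=content, question=question)

sample_content = "This is a short stand-in for the full speech."  # hypothetical placeholder
prompt = fill_prompt(sample_content, "It is in which year of our nation?")
print(prompt)
print(f"Prompt length in characters: {len(prompt)}")
```

The whole document is pasted into the prompt, so the prompt is always slightly longer than the document itself, which is why the document length matters so much for the context window.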

### Use mixtral model


Because the Mixtral model's context window (32,768 tokens) is longer than your source document, the model should be able to find the relevant information for the query when you put the whole document into the prompt.


First, let's build a mixtral model.


In [12]:
mixtral_llm = llm_model('mistralai/mixtral-8x7b-instruct-v01')

Then, create a query chain.


In [13]:
query_chain = LLMChain(llm=mixtral_llm, prompt=prompt_template)

Then, set the query and get the answer.


In [14]:
query = "It is in which year of our nation?"
response = query_chain.invoke(input={'content': content, 'question': query})
print(response['text'])

            It is in our 245th year as a nation.


You have asked a question whose answer appears at the very end of the document. Despite this, the LLM was still able to answer it correctly because the model's context window is long enough to accommodate the entire content of the document.
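
One way to check where an answer sits in a document is to locate its supporting phrase and express the position as a fraction of the text length. This is a minimal sketch; `relative_position` is a helper introduced here for illustration, and `sample_text` is a hypothetical stand-in for `content`.

```python
def relative_position(text, phrase):
    """Return where `phrase` last occurs in `text`, from 0.0 (start) to 1.0 (end).

    Returns None if the phrase does not occur.
    """
    index = text.rfind(phrase)
    if index == -1:
        return None
    # The latest possible start index is len(text) - len(phrase).
    return index / max(len(text) - len(phrase), 1)

sample_text = "Opening remarks. " * 50 + "We are in our 245th year as a nation."
position = relative_position(sample_text, "245th year")
print(f"Phrase found at {position:.0%} of the way through the text")
```

In the notebook, the same call on `content` would show how far into the document the phrase falls, provided the phrase appears verbatim. A model whose context window ends before that position never sees the answer.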


### Use Llama 3 model


Now, let's try the Llama 3 model.


First, create a query chain.


In [15]:
query_chain = LLMChain(llm=llama_llm, prompt=prompt_template)
query_chain

LLMChain(prompt=PromptTemplate(input_variables=['content', 'question'], template='According to the document content here \n            {content},\n            answer this question \n            {question}.\n            Do not try to make up the answer.\n                \n            YOUR RESPONSE:\n'), llm=WatsonxLLM(model_id='meta-llama/llama-3-3-70b-instruct', project_id='839fdc16-c311-4693-aaa0-120c337fe937', url=SecretStr('**********'), apikey=SecretStr('**********'), params={'max_new_tokens': 256, 'temperature': 0.5}, watsonx_model=<ibm_watsonx_ai.foundation_models.inference.model_inference.ModelInference object at 0x78429ecc7e90>))

Then, use the query chain (the code is shown below) to invoke the LLM, which will answer the same query as before based on the entire document's content.


In [16]:
query = "It is in which year of our nation?"
response = query_chain.invoke(input={'content': content, 'question': query})
print(response['text'])

It is in our 245th year as a nation.


As you can see, this model also provides the correct answer, because the entire document fits within its context window.


#### Takeaway


If the document is much longer than the LLM's context length, you need to split the document into chunks, index them, and retrieve only the chunks relevant to the query, so that the LLM can answer accurately and efficiently.

In the next lesson, you will learn how to perform these operations using LangChain.
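
As a preview of the splitting step, the sketch below cuts text into fixed-size character chunks with some overlap, in plain Python. It is a simplified illustration and not LangChain's splitter; the function name and the default sizes are assumptions chosen for this example.

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split `text` into pieces of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

sample_text = "word " * 600  # 3,000 characters of stand-in text
chunks = split_into_chunks(sample_text)
print(f"{len(chunks)} chunks, sizes: {[len(c) for c in chunks]}")
```

Splitting on characters keeps the example short. In practice, splitters usually try to break on paragraph or sentence boundaries so that each chunk stays readable on its own.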


# Exercises


### Exercise 1 - Change to use another LLM


Try another LLM to see whether an error occurs. For example, try using `'ibm/granite-3-8b-instruct'`.


In [17]:
granite_llm = llm_model('ibm/granite-3-8b-instruct')
query_chain = LLMChain(llm=granite_llm, prompt=prompt_template)
query = "It is in which year of our nation?"
response = query_chain.invoke(input={'content': content, 'question': query})
print(response['text'])


The text does not explicitly state the year of the nation being referred to. However, it does mention that the speech is the State of the Union address, which is typically delivered annually. The specific year is not provided in the text.


<details>
    <summary>Click here for Solution</summary>

```python
granite_llm = llm_model('ibm/granite-3-8b-instruct')
query_chain = LLMChain(llm=granite_llm, prompt=prompt_template)
query = "It is in which year of our nation?"
response = query_chain.invoke(input={'content': content, 'question': query})
print(response['text'])
```

</details>


## Authors


[Kang Wang](https://author.skills.network/instructors/kang_wang)

Kang Wang is a Data Scientist at IBM. He is also a PhD Candidate at the University of Waterloo.

[Ricky Shi](https://author.skills.network/instructors/ricky_shi)

Ricky Shi is a Data Scientist at IBM.

[Cal Page](https://www.linkedin.com/in/cal-page-1084311/)

Cal Page is a software wizard who brought the LLM connection setup up to date.

### Other Contributors


[Joseph Santarcangelo](https://author.skills.network/instructors/joseph_santarcangelo)

Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.


Copyright © IBM Corporation. All rights reserved.
