<a target="_blank" href="https://colab.research.google.com/github/UpstageAI/cookbook/blob/main/Solar-Fullstack-LLM-101/09_1_Smart_RAG.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# 09. Smart RAG

## Overview  
In this exercise, we will explore Smart Retrieval-Augmented Generation (Smart RAG) using Upstage's Solar LLM. Smart RAG extends the standard RAG approach with a decision step. Before answering, the model checks whether the available context actually contains the answer. If it does not, the system retrieves fresh context from an external search service. This notebook guides you through implementing Smart RAG and shows how this check helps produce more accurate, better-grounded responses.

## Purpose of the Exercise
The purpose of this exercise is to demonstrate an advanced application of Retrieval-Augmented Generation with Solar. By the end of this tutorial, you will understand how to let the model judge whether its context is sufficient and how to fall back to external search when it is not. This improves the relevance and accuracy of the language model’s outputs.



## Explanation of the Code: Smart Retrieval Augmented Generation (RAG)
![smartRAG](https://github.com/UpstageAI/cookbook/blob/main/Solar-Fullstack-LLM-101/figures/a_in.png?raw=1)

### High-Level Overview

The code demonstrates a smart Retrieval Augmented Generation (RAG) system that combines local context with external search. The goal is to supply relevant context for answering a user's question. The system first checks whether the local context contains the answer. In this notebook, that local context is a hard-coded summary of the SOLAR 10.7B paper, which stands in for documents retrieved from a vector database. If the answer is not there, the system falls back to an external search service (Tavily).


The code defines two main functions:

  1. `is_in`: determines whether the answer to a given question can be found in the provided context.
  1. `smart_rag`: answers a question from the provided context. If `is_in` judges that context insufficient, it first replaces it with results from an external search.

The code uses the LangChain library (`langchain_core` and `langchain_upstage`) to build prompts and invoke the Solar model, and the Tavily API for external search.
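Without the LLM and API calls, the routing logic reduces to a single conditional. The following sketch uses hypothetical names (`route_and_answer`, `judge`, `search`, `answer`). It takes plain Python callables in place of `is_in`, `TavilyClient.search`, and the answer chain, so you can test the control flow without API keys. It illustrates the idea and is not the notebook's own implementation.

```python
from typing import Callable, Tuple


def route_and_answer(
    question: str,
    local_context: str,
    judge: Callable[[str, str], bool],
    search: Callable[[str], str],
    answer: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (source, answer), where source is "local" or "search"."""
    # Step 1: ask the judge (is_in in the notebook) whether the local context suffices.
    if judge(question, local_context):
        source, context = "local", local_context
    else:
        # Step 2: only when the judge says "no", fetch context from external search.
        source, context = "search", search(question)
    # Step 3: generate the answer from whichever context was selected.
    return source, answer(question, context)
```

With `judge=lambda q, c: False`, every question is routed to `search`. With a judge that always returns `True`, `search` is never called. Injecting the callables keeps the decision logic separate from any particular model or search provider.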


### Detailed Explanation

1. The code starts by defining the `is_in` function. It takes a question and a context and determines whether the answer to the question can be found within the context.
    * It defines a prompt that asks the language model to check whether the answer is in the context and to reply only "yes" or "no".
    * The prompt string is turned into a `PromptTemplate` object with `PromptTemplate.from_template`.
    * A chain of operations is constructed using the `|` operator:
      * The `PromptTemplate` is piped into a `ChatUpstage` model (`solar-pro`).
      * The model's output is parsed into a plain string by `StrOutputParser`.
    * The chain is invoked with the question and context. The result is stored in the `response` variable and printed.
    * The function returns `True` if the response starts with "yes" (case-insensitive), meaning the answer is in the context. Otherwise it returns `False`.

1. The code then demonstrates `is_in` on two example questions, both checked against the `solar_summary` context. The context can answer one of them ("What is DUS?") but not the other ("How to get to Seoul from SF").

1. Next, the code defines the `smart_rag` function. It takes a question and a context and generates an answer.
    * It first calls `is_in` to check whether the provided context contains the answer.
    * If it does not, the function prints "Searching in tavily" and queries the Tavily API with the question. It then uses the search results as the new context.
    * A chain of operations is constructed using the `|` operator:
      * The same `prompt_template` defined earlier is used as the prompt.
      * The prompt is passed to the `llm` language model.
      * The model's output is parsed using `StrOutputParser`.
    * The chain is invoked with the context (the original one or the Tavily results) and the question, and the generated answer is returned.

1. Finally, the code demonstrates `smart_rag` with two example questions:
    * "What is DUS?": the answer is found in the local context.
    * "How to get to Seoul from SF?": the answer is not in the local context, so the function falls back to searching with Tavily.

This code shows how LangChain can be used to build a smart RAG system that combines local context with external search. The system checks the local context first and searches externally only when needed. This way it aims to supply relevant context for accurate answers to user questions.

In [1]:
! pip3 install -qU  markdownify  langchain-upstage rank_bm25 python-dotenv tavily-python

In [2]:
# @title set API key
from pprint import pprint
import os

import warnings

warnings.filterwarnings("ignore")

if "google.colab" in str(get_ipython()):
    # Running in Google Colab. Please set the UPSTAGE_API_KEY in the Colab Secrets
    from google.colab import userdata

    os.environ["UPSTAGE_API_KEY"] = userdata.get("UPSTAGE_API_KEY")
else:
    # Running locally. Please set the UPSTAGE_API_KEY in the .env file
    from dotenv import load_dotenv

    load_dotenv()

assert (
    "UPSTAGE_API_KEY" in os.environ
), "Please set the UPSTAGE_API_KEY environment variable"

In [3]:
solar_summary = """
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters,
demonstrating superior performance in various natural language processing (NLP) tasks.
Inspired by recent efforts to efficiently up-scale LLMs,
we present a method for scaling LLMs called depth up-scaling (DUS),
which encompasses depthwise scaling and continued pretraining.
In contrast to other LLM up-scaling methods that use mixture-of-experts,
DUS does not require complex changes to train and inference efficiently.
We show experimentally that DUS is simple yet effective
in scaling up high-performance LLMs from small ones.
Building on the DUS model, we additionally present SOLAR 10.7B-Instruct,
a variant fine-tuned for instruction-following capabilities,
surpassing Mixtral-8x7B-Instruct.
SOLAR 10.7B is publicly available under the Apache 2.0 license,
promoting broad access and application in the LLM field.
"""

In [4]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_upstage import ChatUpstage

llm = ChatUpstage(model="solar-pro")


prompt_template = PromptTemplate.from_template(
    """
    Please provide an answer based on the following context.
    If the answer is not present in the context, please write "The information is not present in the context."

    ---
    Question: {question}
    ---
    Context: {context}
    """
)
chain = prompt_template | llm | StrOutputParser()

In [13]:
chain.invoke({"question": "What is DUS?", "context": solar_summary})

'DUS stands for Depth Up-Scaling.'

In [12]:
chain.invoke({"question": "How to get to Seoul from SF", "context": solar_summary})

'The information is not present in the context.'

In [36]:
# RAG or Search?
def is_in(question, context):
    """Ask the LLM whether `context` contains the answer to `question`."""
    is_in_context = """As a helpful assistant,
please use your best judgment to determine if the answer to the question is within the given context.
If the answer is present in the context, please respond with "yes".
If not, please respond with "no".
Only provide "yes" or "no" and avoid including any additional information.
Please do your best. Here is the question and the context:
---
CONTEXT: {context}
---
QUESTION: {question}
---
OUTPUT (yes or no):"""

    is_in_prompt = PromptTemplate.from_template(is_in_context)
    chain = is_in_prompt | ChatUpstage(model="solar-pro") | StrOutputParser()

    response = chain.invoke({"context": context, "question": question})
    print(response)
    # Strip surrounding whitespace so replies like " Yes" are still recognized
    return response.strip().lower().startswith("yes")

In [37]:
is_in("How to get to Seoul from SF", solar_summary)

no


False

In [38]:
is_in("What is DUS?", solar_summary)

yes


True
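A bare `startswith("yes")` check is brittle. It rejects a reply that opens with a quote or markdown bold, such as `**Yes**`, and it accepts words like "yesterday". Below is a slightly more tolerant parser. It is a minimal sketch: `parse_yes_no` is a hypothetical helper, not part of LangChain or the Upstage SDK. You could call it in place of the final `return` line of `is_in`.

```python
import re


def parse_yes_no(response: str) -> bool:
    """Map a free-form model reply to True ("yes") or False (anything else)."""
    # Lowercase and drop leading whitespace, quotes, and markdown emphasis
    # so replies like ' "Yes."' or '**yes**' are still recognized.
    normalized = response.strip().lower().lstrip("\"'*` ")
    # Require "yes" as a whole word so that e.g. "yesterday" does not count.
    return re.match(r"yes\b", normalized) is not None
```

Any reply that does not start with the word "yes" is treated as "no". This keeps the fallback to search as the default when the model's output is ambiguous.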

In [43]:
# Smart RAG, Self-Improving RAG
import os
from tavily import TavilyClient

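def smart_rag(question, context):
    """Answer `question` from `context`, falling back to Tavily web search if needed."""
    if not is_in(question, context):
        # The local context was judged insufficient, so search the web instead
        print("Searching in tavily")
        tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
        # search() returns a dict (query, results, ...) that is passed to the prompt as-is
        context = tavily.search(query=question)

    # Reuse the answer-from-context prompt defined earlier
    chain = prompt_template | llm | StrOutputParser()
    return chain.invoke({"context": context, "question": question})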

In [44]:
smart_rag("What is DUS?", solar_summary)

yes


'DUS stands for Depth Up-Scaling.'

In [45]:
smart_rag("How to get to Seoul from SF?", solar_summary)

no
Searching in tavily


'There are several ways to get from San Francisco to Seoul, including flying, taking a bus, using the subway, or riding a train. The most common method is to fly, with airlines such as AIR PREMIA, Asiana Airlines, Korean Air, and United Airlines offering nonstop flights from San Francisco to Seoul. The cheapest flight can be found with AIR PREMIA for $698 one-way, while the best prices for this route can be found on AIR PREMIA. The average price for a round trip flight from San Francisco to Seoul is around $682.'
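In `smart_rag`, the dictionary returned by `tavily.search` goes into the prompt unchanged. The model therefore sees Python dict syntax and metadata mixed in with the text snippets. One option is to flatten the results into plain text first. The helper below is a hypothetical sketch. It assumes a Tavily-style response shaped like `{"results": [{"title": ..., "url": ..., "content": ...}]}`, which is our understanding of the format; please confirm it against the Tavily documentation.

```python
def format_search_results(response: dict, max_results: int = 3) -> str:
    """Turn a Tavily-style search response into a plain-text context block."""
    # Assumed shape: {"results": [{"title": ..., "url": ..., "content": ...}, ...]}
    chunks = []
    for item in response.get("results", [])[:max_results]:
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        content = item.get("content", "").strip()
        # Keep the source URL next to each snippet so the answer can cite it.
        chunks.append(f"[{title}]({url})\n{content}")
    return "\n\n".join(chunks)
```

Inside `smart_rag`, this would replace the search line with `context = format_search_results(tavily.search(query=question))`. Limiting the number of results also keeps the prompt short.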

## Exercise

The `is_in` function sometimes judges correctly and sometimes does not. Try improving it by giving a more detailed task description and adding two or three examples (few-shot prompting) to the prompt. Then compare how it performs on the same questions.
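As a starting point, here is one possible shape for a few-shot judging prompt. The helper `build_is_in_prompt` and the examples in `FEW_SHOT_EXAMPLES` are hypothetical, and you should replace them with cases from your own documents. The helper builds a plain string rather than a `PromptTemplate`, which avoids escaping literal braces that might appear in examples. The result could be sent with `llm.invoke(prompt).content` and checked for "yes".

```python
# Hypothetical examples; replace them with cases drawn from your own data.
FEW_SHOT_EXAMPLES = [
    {"context": "The Eiffel Tower is located in Paris.",
     "question": "Where is the Eiffel Tower?", "answer": "yes"},
    {"context": "The Eiffel Tower is located in Paris.",
     "question": "How tall is the Eiffel Tower?", "answer": "no"},
    {"context": "Python 3.12 was released in October 2023.",
     "question": "When was Python 3.12 released?", "answer": "yes"},
]


def build_is_in_prompt(question: str, context: str, examples=None) -> str:
    """Build a yes/no judging prompt with worked examples before the real query."""
    examples = FEW_SHOT_EXAMPLES if examples is None else examples
    header = (
        "You are a strict judge. Answer \"yes\" only if the CONTEXT contains enough "
        "information to answer the QUESTION directly; otherwise answer \"no\".\n"
        "Reply with a single word: yes or no.\n"
    )
    # Each worked example shows the exact format we expect the model to follow.
    shots = "".join(
        f"---\nCONTEXT: {ex['context']}\nQUESTION: {ex['question']}\nOUTPUT: {ex['answer']}\n"
        for ex in examples
    )
    # The real query ends at "OUTPUT:" so the model's next token is the verdict.
    query = f"---\nCONTEXT: {context}\nQUESTION: {question}\nOUTPUT:"
    return header + shots + query
```

Including at least one "no" example where the context is on-topic but incomplete, like the height question, teaches the model to separate related context from context that actually answers the question.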