# RAG

Retrieval-Augmented Generation (RAG) is a technique that lets an LLM draw on relevant data,
including private data, that wasn't available during model training.
It also enables more personalized responses.
RAG mitigates the problem of a model's knowledge becoming outdated,
and it is more cost-effective than retraining the model.

## How RAG Works
1. First, you convert your documents, knowledge base, or other content into vector embeddings: numeric vectors that capture the meaning of the text (think of them as digital fingerprints of information).
2. When a question comes in, it is embedded the same way, and RAG searches these fingerprints for the most relevant pieces of information.
3. The system adds this relevant information to the AI's prompt.
4. The AI creates a response using both its training and this fresh, specific information.
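Steps 1 and 2 can be illustrated in plain Kotlin without calling any model. This is a toy sketch under stated assumptions: the hypothetical `embed` function builds a bag-of-words count vector instead of using a real embedding model, and the helper names `tokenize`, `embed`, `cosine`, and `retrieve` are made up for this example. Documents are ranked by cosine similarity to the question.

```kotlin
import kotlin.math.sqrt

// Toy "embedding" (hypothetical): word counts over a fixed vocabulary.
// A real embedding model returns dense vectors that capture meaning.
fun tokenize(text: String): List<String> =
    text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() }

fun embed(text: String, vocabulary: List<String>): DoubleArray {
    val counts = tokenize(text).groupingBy { it }.eachCount()
    return DoubleArray(vocabulary.size) { i -> (counts[vocabulary[i]] ?: 0).toDouble() }
}

// Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (sqrt(normA) * sqrt(normB))
}

val documents = listOf(
    "Kotlin is a statically typed programming language developed by JetBrains.",
    "Coroutines in Kotlin simplify asynchronous code.",
    "Spring AI provides abstractions for chat and embedding models."
)

// Step 1: embed every document once, ahead of time.
val vocabulary = documents.flatMap { tokenize(it) }.distinct()
val index = documents.map { it to embed(it, vocabulary) }

// Step 2: embed the question the same way and rank documents by similarity.
fun retrieve(question: String, topK: Int): List<String> {
    val query = embed(question, vocabulary)
    return index.sortedByDescending { (_, vector) -> cosine(query, vector) }
        .take(topK)
        .map { (text, _) -> text }
}

println(retrieve("How do Kotlin coroutines work?", topK = 1))
```

With this toy embedding the coroutines sentence ranks first because it shares two words with the question; a real embedding model would also match paraphrases that share no words at all.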

As in the previous notebooks, let's start with some initial setup.

```kotlin
%useLatestDescriptors
%use spring-ai-openai
// Adds the module that provides QuestionAnswerAdvisor (used below)
USE { dependencies { implementation("org.springframework.ai:spring-ai-advisors-vector-store:1.0.0-M8") } }
```

```kotlin
// Read the API key from the environment; the placeholder is only a fallback
val apiKey = System.getenv("OPENAI_API_KEY") ?: "YOUR_OPENAI_API_KEY"

val openAiApi = OpenAiApi.builder().apiKey(apiKey).build()
val openAiOptions = OpenAiChatOptions.builder()
    .model(OpenAiApi.ChatModel.GPT_4_O_MINI)
    .temperature(0.7)
    .build()
```

As mentioned above, documents need to be converted into vectors.

For this, we'll need an `EmbeddingModel`.

```kotlin
// Converts text into vectors via the OpenAI embeddings API
val embeddingModel = OpenAiEmbeddingModel(openAiApi)
```

We now have an `EmbeddingModel`,
but we need somewhere to store the vector representations of documents.
This is what vector stores are designed for.
In our example, we'll use `SimpleVectorStore`, a simple in-memory implementation of a vector store.

```kotlin
import org.springframework.ai.vectorstore.SimpleVectorStore

// In-memory vector store; it uses embeddingModel to embed the documents we add
val vectoreStore = SimpleVectorStore.builder(embeddingModel).build()
```
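To make the idea of a vector store concrete, here is a rough sketch of what an in-memory store does; it is not the source of `SimpleVectorStore`. The hypothetical `InMemoryVectorStore` keeps each text next to its vector and answers a query by computing cosine similarity against every stored vector, dropping results below a similarity threshold and returning the best `topK`. The 3-dimensional vectors are hand-written stand-ins for real embeddings.

```kotlin
import kotlin.math.sqrt

// Result of a search: the stored text and its similarity to the query.
data class Hit(val id: String, val text: String, val score: Double)

// Hypothetical minimal in-memory vector store using brute-force cosine search.
class InMemoryVectorStore {
    private class Entry(val id: String, val text: String, val vector: DoubleArray)

    private val entries = mutableListOf<Entry>()

    fun add(id: String, text: String, vector: DoubleArray) {
        entries += Entry(id, text, vector)
    }

    // Scores every entry, keeps those >= threshold, and returns the best topK.
    fun similaritySearch(query: DoubleArray, topK: Int, threshold: Double = 0.0): List<Hit> =
        entries
            .map { Hit(it.id, it.text, cosine(query, it.vector)) }
            .filter { it.score >= threshold }
            .sortedByDescending { it.score }
            .take(topK)

    private fun cosine(a: DoubleArray, b: DoubleArray): Double {
        require(a.size == b.size) { "Vectors must have the same dimension" }
        var dot = 0.0
        var normA = 0.0
        var normB = 0.0
        for (i in a.indices) {
            dot += a[i] * b[i]
            normA += a[i] * a[i]
            normB += b[i] * b[i]
        }
        return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (sqrt(normA) * sqrt(normB))
    }
}

// Hand-written 3-dimensional vectors stand in for real embeddings.
val store = InMemoryVectorStore()
store.add("faq-1", "Kotlin release history", doubleArrayOf(0.9, 0.1, 0.0))
store.add("faq-2", "Kotlin coroutines", doubleArrayOf(0.1, 0.9, 0.1))
store.add("faq-3", "Kotlin Multiplatform", doubleArrayOf(0.0, 0.2, 0.9))

// Only faq-1 clears the 0.5 threshold, even though topK allows two results.
store.similaritySearch(doubleArrayOf(1.0, 0.0, 0.0), topK = 2, threshold = 0.5)
    .forEach { println("${it.id} ${it.score}") }
```

Brute-force search is linear in the number of stored vectors, which is fine for a small notebook example; dedicated vector databases typically use approximate nearest-neighbor indexes for large collections.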

Now we just need to add a document to our store.

Let's use a Kotlin FAQ stored in `data/kotlinFAQ.md`.

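```kotlin
import java.io.File
import org.springframework.ai.document.Document

// Load the whole FAQ as a single document; add() embeds it and stores the vector
val doc = Document(File("data/kotlinFAQ.md").readText())
vectoreStore.add(listOf(doc))
```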
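The cell above stores the entire FAQ as one `Document`, so the store holds a single embedding for the whole file. For longer sources it is common to split the text into smaller, overlapping chunks first, so that each embedding covers a narrower topic and retrieval returns only the relevant part; Spring AI ships a `TokenTextSplitter` for this. The hypothetical `chunkByWords` helper below sketches the idea in plain Kotlin, using words as a rough stand-in for tokens.

```kotlin
// Hypothetical helper: splits text into chunks of at most `chunkSize` words.
// The last `overlap` words of one chunk are repeated at the start of the next.
// Whitespace is normalized to single spaces in the output.
fun chunkByWords(text: String, chunkSize: Int, overlap: Int): List<String> {
    require(chunkSize > 0) { "chunkSize must be positive" }
    require(overlap in 0 until chunkSize) { "overlap must be in [0, chunkSize)" }
    val words = text.split(Regex("\\s+")).filter { it.isNotEmpty() }
    val step = chunkSize - overlap
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < words.size) {
        val end = minOf(start + chunkSize, words.size)
        chunks += words.subList(start, end).joinToString(" ")
        if (end == words.size) break // this chunk already reaches the end of the text
        start += step
    }
    return chunks
}

val sample = "one two three four five six seven eight nine ten"
// Prints "one two three four", "four five six seven", "seven eight nine ten"
chunkByWords(sample, chunkSize = 4, overlap = 1).forEach { println(it) }
```

In this notebook, each chunk could then be wrapped in its own `Document` and passed to `vectoreStore.add(...)`.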

Now that everything is prepared,
let's use `QuestionAnswerAdvisor`, which implements RAG in Spring AI.

Here's what will happen:
1. We send a query.
2. The query is converted into a vector with the same `EmbeddingModel`.
3. The vector store is searched for the documents whose vectors are closest to the query vector.
4. The closest results are added to the original query as additional context.
5. The original query, together with this context, is sent to the LLM.
6. We receive an answer.
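Steps 4 and 5 amount to rewriting the user message before it is sent. The sketch below shows the idea with a hypothetical `augmentPrompt` function and template wording of our own; `QuestionAnswerAdvisor` uses its own default template, which this does not reproduce, and the retrieved chunks here are placeholders rather than real search results.

```kotlin
// Hypothetical prompt template (our own wording, not QuestionAnswerAdvisor's).
fun augmentPrompt(question: String, retrieved: List<String>): String = buildString {
    appendLine("Answer the question using only the context below.")
    appendLine("If the context does not contain the answer, say that you don't know.")
    appendLine()
    appendLine("Context:")
    // Number each retrieved chunk so the model can tell them apart.
    retrieved.forEachIndexed { i, chunk -> appendLine("[${i + 1}] $chunk") }
    appendLine()
    append("Question: ")
    append(question)
}

// Placeholder chunks stand in for real vector-store search results.
val augmented = augmentPrompt(
    question = "current version of Kotlin?",
    retrieved = listOf("First retrieved FAQ chunk.", "Second retrieved FAQ chunk.")
)
println(augmented)
```

Telling the model to rely on the context and to admit when the answer is missing is a common way to discourage answers that ignore the retrieved text, though it does not guarantee it.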

```kotlin
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor

ChatClient.create(
    OpenAiChatModel.builder()
        .openAiApi(openAiApi)
        .defaultOptions(openAiOptions)
        .build()
)
    .prompt()
    // Retrieves similar documents from the store and adds them to the user message
    .advisors(QuestionAnswerAdvisor(vectoreStore))
    .user("current version of Kotlin?")
    .call()
    .content()
```

The current version of Kotlin is 2.1.20, which was published on March 20, 2025. For more information, you can visit the [Kotlin GitHub page](https://github.com/jetbrains/kotlin).