# Introduction

Hello everyone!

I am Harsh Singhal, and I am excited to share with you this notebook on Retrieval Augmented Generation (RAG). This notebook is part of a linked series where we dive into the fascinating world of RAG and explore its applications.

## About Me

I am a passionate AI professional and I love exploring cutting-edge technologies and finding innovative solutions to real-world problems. NLP and its various applications have been a particular area of interest for me, and I am thrilled to share my insights and discoveries with you through this series.

## Connect with Me

If you would like to connect with me or follow my work, you can find me on LinkedIn:

[Harsh Singhal LinkedIn Profile](https://www.linkedin.com/in/harshsinghal)

## Retrieval Augmented Generation (RAG)

The field of NLP has witnessed significant advancements in recent years, and RAG is one such exciting development. In this series of notebooks, we will cover the fundamentals of RAG, its architecture, and practical implementations. We will also work on some hands-on examples to grasp the concepts better.

Let us embark on this journey together and explore the powerful capabilities of RAG!

Happy learning! 


In this Jupyter notebook, we will use the llama_index and langchain libraries to index a set of documents and answer questions over them, with GPT-3.5-turbo generating the answers. The purpose of this notebook is to demonstrate how to set up the environment, load documents, create an index for efficient document retrieval, and query that index.
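To make "indexing" and "retrieval" concrete before bringing in the libraries, here is a minimal, library-free sketch. It is a toy under stated assumptions, not how llama_index works internally: plain word-count vectors stand in for learned embeddings, and the function names (`build_index`, `retrieve`, `build_prompt`) and the three sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    """'Index' each document by pairing it with its word-count vector."""
    return [(doc, Counter(tokenize(doc))) for doc in docs]


def retrieve(index, question, top_k=1):
    """Return the top_k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(index, key=lambda entry: cosine(q_vec, entry[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(question, contexts):
    """Combine the retrieved text and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Illustrative sample documents (assumptions, not taken from the dataset).
docs = [
    "Boilers heat water to produce steam.",
    "Vector indexes store embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
question = "How does a vector index support similarity search?"

index = build_index(docs)
top = retrieve(index, question, top_k=1)
print(build_prompt(question, top))
```

In a real pipeline, the assembled prompt is what gets sent to the language model, which is the "generation" half of RAG.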

Before we proceed, please note that we will be using the OpenAI API to leverage the capabilities of the GPT-3.5-turbo model. As a security measure, remember never to reveal your API keys directly in code. Instead, use environment variables or other secure means to store sensitive information.
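One way to follow that advice is to read the key from the environment and fail early if it is missing. The sketch below uses only the standard library; the helper names `read_api_key` and `mask`, and the placeholder key, are hypothetical and exist only for illustration.

```python
import os


def read_api_key(var_name="OPENAI_API_KEY", env=None):
    """Return the key stored in an environment variable, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting the notebook.")
    return key


def mask(key, visible=4):
    """Hide all but the last few characters so the key can be logged safely."""
    return "*" * max(len(key) - visible, 0) + key[-visible:]


# Demonstration with a stand-in environment and a placeholder value (not a real key).
demo_env = {"OPENAI_API_KEY": "sk-demo-1234"}
print(mask(read_api_key(env=demo_env)))
```

In the notebook itself you would call `read_api_key()` with no arguments, so the value comes from the real environment and never appears in a cell.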

We will follow these steps:

- Import necessary classes and functions from the llama_index and langchain libraries.
- Set the OpenAI API key, both as an environment variable and directly on the `openai` module. The direct assignment is for demonstration only; please avoid it in production code.
- Load data from the specified directory, where we assume the documents are stored. You may adjust the path according to your data location.
- Initialize the LLMPredictor with the desired GPT-3.5-turbo model and temperature setting.
- Create a ServiceContext using the initialized predictor.
- Index the loaded documents using the created service context to enable efficient document retrieval.

Now that you have an overview of the tasks we'll be performing, let's proceed with the document loading and indexing process. Happy coding! 🚀

In [None]:
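# Install llama_index, a popular middleware used in many GenAI applications.
# Note: the imports in the next cell use the top-level package layout from before
# llama_index 0.10, so a recent release may require pinning an older version.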
!pip install llama_index

In [None]:
#          ___
#         / ()\
#       _|_____|_
#      | | === | |
#      |_|  O  |_|
#       ||  O  ||
#       ||__*__||
#      |~ \___/ ~|
#      /=\ /=\ /=\
#______[_]_[_]_[_]_______


# Import necessary classes and functions from the llama_index and langchain libraries
from llama_index import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
    LLMPredictor,
    load_index_from_storage,
)
from langchain.chat_models import ChatOpenAI

# Import the openai library and os module to set the API key
import openai
import os

# SECURITY ALERT: Never reveal your API keys directly in code. Use environment variables or other secure means.
# Here, we're setting the OpenAI API key both using an environment variable and directly (demonstration purposes only)
os.environ['OPENAI_API_KEY'] = 'YOUR_API_KEY'
openai.api_key = 'YOUR_API_KEY'

# Notify the user that the document loading process has begun
print("Started the document loading process...")

# Read the data from the specified directory. Change the path below to point at your own documents.
documents = SimpleDirectoryReader('/kaggle/input/aws-case-studies-and-blogs/').load_data()

# Initialize the LLMPredictor with the desired GPT-3.5-turbo model and temperature setting
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))

# Create a ServiceContext using the initialized predictor
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Notify the user that the indexing process has begun
print("started the indexing process...")

# Create an index using the loaded documents and the created service context
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)



In [None]:
# Store the created index to the disk at the specified location
print("storing the index to disk")
index.storage_context.persist(persist_dir="/kaggle/working/aws_case_documents_index")
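The import cell brings in `StorageContext` and `load_index_from_storage`, which the notebook does not use anywhere else. A plausible use, sketched below, is reloading the persisted index in a later session instead of rebuilding it. This assumes the same llama_index version as the import cell; the helper name `reload_index` is hypothetical.

```python
def reload_index(persist_dir, service_context=None):
    """Load a previously persisted index from disk (sketch, see assumptions above)."""
    # Imported inside the function so the helper can be defined even before
    # llama_index is installed in the current environment.
    from llama_index import StorageContext, load_index_from_storage

    # Point a storage context at the directory written by persist().
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)

    # Pass the service context along only if the caller supplied one.
    if service_context is None:
        return load_index_from_storage(storage_context)
    return load_index_from_storage(storage_context, service_context=service_context)


# Example call (commented out because it needs the persisted files and an API key):
# index = reload_index("/kaggle/working/aws_case_documents_index", service_context)
```

Reloading avoids re-reading and re-indexing every document each time the notebook restarts.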


In [None]:
# Notify the user that we are querying the index
print("Querying the index...")

# Query the index for the provided question and store the response
response = index.as_query_engine().query("Write a detailed summary of AWS Personalize. In your response, provide an executive summary followed by succinct bullet points. Remember to add formatting elements so that the output is easy to read and well-formatted. Use line breaks to improve formatting.")

# Print the received response
print(response)

In [None]:
# Notify the user that we are querying the index
print("Querying the index...")

# Query the index for the provided question and store the response
response = index.as_query_engine().query("Write a detailed summary of the use of Machine Learning by AWS customers in the Life Science industry. In your response, provide an executive summary followed by succinct bullet points. Remember to add formatting elements so that the output is easy to read and well-formatted. Use line breaks to improve formatting.")

# Print the received response
print(response)

In [None]:
# Notify the user that we are querying the index
print("Querying the index...")

# Query the index for the provided question and store the response
response = index.as_query_engine().query("Write a detailed summary of the companies using AWS IoT services. In your response, provide an executive summary followed by succinct bullet points. Remember to add formatting elements so that the output is easy to read and well-formatted. Use line breaks to improve formatting.")

# Print the received response
print(response)

In [None]:
# Notify the user that we are querying the index
print("Querying the index...")

# Query the index for the provided question and store the response
response = index.as_query_engine().query("Write a detailed summary of the competitive advantage of AWS services over other cloud vendors in the AI space. In your response, provide an executive summary followed by succinct bullet points. Remember to add formatting elements so that the output is easy to read and well-formatted. Use line breaks to improve formatting.")

# Print the received response
print(response)

In [None]:
# Notify the user that we are querying the index
print("Querying the index...")

# Query the index for the provided question and store the response
response = index.as_query_engine().query("Write a detailed summary of how GenAI applications can be used in the Fintech industry. In your response, provide an executive summary followed by succinct bullet points. Remember to add formatting elements so that the output is easy to read and well-formatted. Use line breaks to improve formatting.")

# Print the received response
print(response)
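The query cells above share one prompt shape and differ only in the topic. A small helper can build the prompt from a topic so the wording stays consistent across queries. This is a sketch with hypothetical names (`SUMMARY_TEMPLATE`, `build_summary_prompt`), not part of llama_index.

```python
# Shared instructions for every summary request; only the topic changes.
SUMMARY_TEMPLATE = (
    "Write a detailed summary of {topic}. "
    "In your response, provide an executive summary followed by succinct bullet points. "
    "Add formatting elements and line breaks so that the output is easy to read."
)


def build_summary_prompt(topic):
    """Fill the shared template with a topic, rejecting empty input."""
    topic = topic.strip().rstrip(".")
    if not topic:
        raise ValueError("topic must not be empty")
    return SUMMARY_TEMPLATE.format(topic=topic)


# Two of the topics queried in the cells above.
topics = ["AWS Personalize", "the companies using AWS IoT services"]
for topic in topics:
    print(build_summary_prompt(topic))
```

Each generated prompt can then be passed to `index.as_query_engine().query(...)` exactly as in the cells above.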

You can inspect the response object to see which source passages were retrieved to produce the answer.

In [None]:
# List the attributes and methods available on the response object
dir(response)

In [None]:
# Show the first source node that was retrieved for the last query
response.source_nodes[0]
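To look at every retrieved source rather than only the first, a small loop helps. The sketch below assumes each entry in `response.source_nodes` carries a `score` and wraps a `node` that holds the text; attribute names can differ between llama_index versions, so the hypothetical helper `summarize_sources` falls back gracefully. It is demonstrated here on stand-in objects, not on real query results.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes, max_chars=80):
    """Return (rank, score, snippet) tuples for a list of retrieved sources."""
    rows = []
    for rank, item in enumerate(source_nodes, start=1):
        # Assumption: the entry wraps the actual node; otherwise use the entry itself.
        node = getattr(item, "node", item)
        if hasattr(node, "get_text"):
            text = node.get_text()
        else:
            text = str(getattr(node, "text", node))
        score = getattr(item, "score", None)
        rows.append((rank, score, text[:max_chars]))
    return rows


# Stand-in objects that mimic the assumed shape (placeholder text and scores).
fake_sources = [
    SimpleNamespace(score=0.5, node=SimpleNamespace(text="first retrieved chunk")),
    SimpleNamespace(score=0.25, node=SimpleNamespace(text="second retrieved chunk")),
]
for rank, score, snippet in summarize_sources(fake_sources):
    print(f"{rank}. score={score} | {snippet}")
```

With a real response, you would call `summarize_sources(response.source_nodes)` instead of passing the stand-ins.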