# Lesson 1: Advanced RAG Pipeline

In [1]:
import os
import openai
import torch

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

In [12]:
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=["../data/amazon-seller-guide.pdf"]).load_data()

In [13]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

29 

<class 'llama_index.schema.Document'>
Doc ID: 40b9d840-0b0d-4e75-841b-cddbc15910e0
Text: ARTICLE 1: Amazon Keyword Tips: 7 Mistakes New FBA Sellers Make
Amazon keyword research is the bread and butter of any successful
product-selling campaign. Get it right, and you put your product in
front of the right audience with the proper phases you’ve optimized.
However, getting it wrong is a fast track to a frustrating Amazon
journey, even ...


## Basic RAG pipeline

In [14]:
from llama_index import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

In [15]:
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_documents([document], service_context=service_context)

In [16]:
query_engine = index.as_query_engine()

In [18]:
response = query_engine.query("What are some actions that Amazon seller can take to improve their performance?")
print(str(response))

Amazon sellers can take several actions to improve their performance. One action is to actively request reviews from customers using Amazon's approved methods, such as the "Request a Review" button on the manage orders page or automated review request tools like Helium 10's Follow-Up. By prompting customers to leave reviews, sellers can increase their review count and enhance their visibility on the competitive Amazon marketplace.

Another action is to prioritize exceptional customer experiences. By providing a seamless post-purchase journey and consistently delivering products that meet or exceed customer expectations, sellers can incentivize buyers to share their positive experiences through reviews. This can help build a reputable brand and increase customer satisfaction.

Additionally, sellers should effectively manage negative seller feedback. While negative reviews on product listings are difficult to remove, negative seller feedback can be removed by following the steps outlined

## Evaluation setup using TruLens

In [22]:
eval_questions = []
with open('../data/eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline and any surrounding whitespace
        item = line.strip()
        print(item)
        eval_questions.append(item)

What are the primary phases of the Amazon SEO audit process, as outlined in the article, and why are they important?

How does Amazon's A9 algorithm interpret search queries, and what factors influence the search results it provides?

What is the significance of crawling in the SEO audit process, and which tool is recommended for keyword research and analysis in this phase?

In the indexing phase, how does Amazon A9 determine which keywords are associated with a product, and how can you ensure that your product gets indexed effectively?

What factors influence the ranking of products on Amazon, and why is it crucial for sellers to understand these factors?

How can you monitor keyword rankings and sales performance on Amazon using Helium 10 tools, and why is this monitoring important for sellers?

What are the differences between organic (SEO) sales and advertising-driven sales, and how can you optimize your listings to reduce advertising costs?

What is the post-audit stage mentioned 

In [23]:
# You can try your own question:
new_question = "What are the top priorities in SEO strategy?"
eval_questions.append(new_question)

In [24]:
print(eval_questions)

['What are the primary phases of the Amazon SEO audit process, as outlined in the article, and why are they important?', '', "How does Amazon's A9 algorithm interpret search queries, and what factors influence the search results it provides?", '', 'What is the significance of crawling in the SEO audit process, and which tool is recommended for keyword research and analysis in this phase?', '', 'In the indexing phase, how does Amazon A9 determine which keywords are associated with a product, and how can you ensure that your product gets indexed effectively?', '', 'What factors influence the ranking of products on Amazon, and why is it crucial for sellers to understand these factors?', '', 'How can you monitor keyword rankings and sales performance on Amazon using Helium 10 tools, and why is this monitoring important for sellers?', '', 'What are the differences between organic (SEO) sales and advertising-driven sales, and how can you optimize your listings to reduce advertising costs?', 
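
The printed list contains empty strings because the question file separates questions with blank lines, so each blank line is appended and would later be sent to the query engine as an empty query. A minimal sketch of a loader that skips them is below; the function name `load_eval_questions` is hypothetical and not part of the notebook or its `utils` module.

```python
def load_eval_questions(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Hypothetical helper for illustration; not part of the notebook's utils.
    """
    questions = []
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            item = line.strip()
            if item:  # skip blank separator lines
                questions.append(item)
    return questions
```

It could be used as `eval_questions = load_eval_questions('../data/eval_questions.txt')` in place of the loop above.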

In [None]:
from trulens_eval import Tru
tru = Tru()

tru.reset_database()

In [None]:
from utils import get_prebuilt_trulens_recorder

tru_recorder = get_prebuilt_trulens_recorder(query_engine,
                                             app_id="Direct Query Engine")

In [None]:
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)

In [None]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [None]:
records.head()

In [None]:
# launches on http://localhost:8501/
tru.run_dashboard()

## Advanced RAG pipeline

### 1. Sentence Window retrieval
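
The idea behind sentence window retrieval is to match the query against single sentences, then hand the LLM the matched sentence together with a few neighbouring sentences as context. The sketch below illustrates only that idea in plain Python. It is an assumption-laden toy: the word-overlap score stands in for embedding similarity, the sentence splitter is naive, and all function names are hypothetical. It is not the implementation behind `build_sentence_window_index`, which is not shown in this notebook.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word set used by the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, sentence):
    """Toy stand-in for embedding similarity: count of shared distinct words."""
    return len(tokenize(query) & tokenize(sentence))


def sentence_window_retrieve(text, query, window_size=1):
    """Return (best_sentence, window) where window adds neighbours on each side."""
    sentences = split_sentences(text)
    if not sentences:
        return "", ""
    # Retrieval step: score each sentence on its own.
    best = max(range(len(sentences)), key=lambda i: overlap_score(query, sentences[i]))
    # Expansion step: replace the hit with its surrounding window.
    start = max(0, best - window_size)
    end = min(len(sentences), best + window_size + 1)
    return sentences[best], " ".join(sentences[start:end])


text = (
    "Keyword research comes first. Request reviews with the approved button. "
    "Reviews raise visibility. Pricing is a separate topic."
)
best_sentence, window = sentence_window_retrieve(text, "How do I request reviews?")
print(best_sentence)
print(window)
```

Scoring small units keeps the match precise, while the wider window gives the LLM enough surrounding text to answer from.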

In [None]:
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

In [None]:
from utils import build_sentence_window_index

sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

In [None]:
from utils import get_sentence_window_query_engine

sentence_window_engine = get_sentence_window_query_engine(sentence_index)

In [None]:
window_response = sentence_window_engine.query(
    "What factors influence the ranking of products on Amazon?"
)
print(str(window_response))

In [None]:
tru.reset_database()

tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
    sentence_window_engine,
    app_id = "Sentence Window Query Engine"
)

In [None]:
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))

In [None]:
tru.get_leaderboard(app_ids=[])

In [None]:
# launches on http://localhost:8501/
tru.run_dashboard()

### 2. Auto-merging retrieval
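
Auto-merging retrieval organises the text as a hierarchy of larger parent chunks and smaller leaf chunks. Retrieval runs over the leaves, and when enough leaves of the same parent are retrieved, they are replaced by the parent so the LLM sees one coherent passage instead of scattered fragments. The sketch below shows that merge rule for a single parent/leaf level in plain Python. The function name, the dictionaries, and the 0.5 threshold are assumptions made for illustration; it is not the implementation behind `build_automerging_index`, which is not shown in this notebook.

```python
def auto_merge(retrieved_leaf_ids, parent_of, children_of, threshold=0.5):
    """Replace retrieved leaves with their parent when enough siblings were hit.

    retrieved_leaf_ids: leaf chunk ids in retrieval order
    parent_of: maps leaf id -> parent id
    children_of: maps parent id -> list of all its leaf ids
    threshold: merge when the retrieved share of a parent's leaves exceeds this
    """
    retrieved = set(retrieved_leaf_ids)
    merged, seen = [], set()
    for leaf in retrieved_leaf_ids:
        parent = parent_of[leaf]
        siblings = children_of[parent]
        hit_ratio = len(retrieved & set(siblings)) / len(siblings)
        result = parent if hit_ratio > threshold else leaf
        if result not in seen:  # keep order, avoid repeating a merged parent
            seen.add(result)
            merged.append(result)
    return merged


# Hypothetical hierarchy: two parents with four leaves each.
children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}

# Three of P1's four leaves were retrieved, but only one of P2's.
print(auto_merge(["a", "e", "b", "c"], parent_of, children_of))  # ['P1', 'e']
```

A lower threshold merges more aggressively and returns larger passages; a higher one keeps more of the small, precise leaf chunks.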

In [None]:
from utils import build_automerging_index

automerging_index = build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index"
)

In [None]:
from utils import get_automerging_query_engine

automerging_query_engine = get_automerging_query_engine(
    automerging_index,
)

In [None]:
auto_merging_response = automerging_query_engine.query(
    "How can sellers monitor keyword rankings and sales performance on Amazon?"
)
print(str(auto_merging_response))

In [None]:
tru.reset_database()

tru_recorder_automerging = get_prebuilt_trulens_recorder(automerging_query_engine,
                                                         app_id="Automerging Query Engine")

In [None]:
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)

In [None]:
tru.get_leaderboard(app_ids=[])

In [None]:
# launches on http://localhost:8501/
tru.run_dashboard()