# High-performance RAG (and Evaluation) with LlamaIndex

In the following Notebook we will be exploring two of the most powerful techniques to take your single-domain RAG pipelines to the next level. We'll also be discussing methods that you can use to evaluate your RAG pipeline to get insight into how its performance improves over time!

- Fine-tuning Embeddings Model
- Expanding Context Window from Retrieved Node

But before any of that, we need to grab some dependencies, and set up some boilerplate!

## Dependencies and Boilerplate

We'll set up our `nest_asyncio` so we can leverage async loops in our Notebook.

We'll also install the required libraries we'll be using today, and set up our OpenAI API key!

This notebook will require the use of GPT-4, and the final evaluation piece might exceed the standard rate-limit. You will need to modify the evaluation pipeline to ensure you aren't faced with a rate limit!

### Nest Asyncio

In [1]:
import nest_asyncio

nest_asyncio.apply()

### Install Dependencies

In [3]:
!pip install llama_index pypdf -q -U

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/284.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━[0m [32m276.5/284.0 kB[0m [31m8.1 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m284.0/284.0 kB[0m [31m6.5 MB/s[0m eta [36m0:00:00[0m
[?25h

### Provide OpenAI API Key

In [4]:
import os
import getpass

#os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key: ")

## Loading Data

The data can be found in [this GitHub repo](https://github.com/AI-Maker-Space/DataRepository/tree/main/high-performance-rag).

It is a collection of Academic Papers related to Camelids!

In [5]:
#!git clone https://github.com/AI-Maker-Space/DataRepository.git

In [6]:
#%cd DataRepository/high-performance-rag

In [7]:
#!unzip "Camel Papers Test.zip"

In [8]:
#!unzip "Camel Papers Train.zip"

Now we can begin building our simple index for each of the training directories, and the validation directories.

We will use LlamaIndex's `SimpleNodeParser` to achieve this!

In [9]:
from llama_index.node_parser import SimpleNodeParser
from llama_index.schema import MetadataMode

#TRAIN_FILES = "Camel Papers Train"
#VAL_FILES = "Camel Papers Test"

In [10]:
urls_train = [
    'https://www.hsph.harvard.edu/nutritionsource/kids-healthy-eating-plate/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-eating-plate/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/whole-grains/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/protein/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/vegetables-and-fruits/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/types-of-fat/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/cholesterol/',
    'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/dietary-fat-and-disease/',
    'https://www.hsph.harvard.edu/nutritionsource/vitamins/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-drinks/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-drinks/other-healthy-beverage-options/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-drinks/drinks-to-consume-in-moderation/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-drinks/sugary-drinks/',
    'https://www.hsph.harvard.edu/nutritionsource/sports-drinks/',
    'https://www.hsph.harvard.edu/nutritionsource/energy-drinks/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-drinks/beverages-public-health-concerns/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-drinks/artificial-sweeteners/',
    'https://www.hsph.harvard.edu/nutritionsource/salt-and-sodium/',
    'https://www.hsph.harvard.edu/nutritionsource/salt-and-sodium/take-action-on-salt/',
    'https://www.hsph.harvard.edu/nutritionsource/salt-and-sodium/sodium-public-health-concerns/',
    'https://www.hsph.harvard.edu/nutritionsource/carbohydrates/',
    'https://www.hsph.harvard.edu/nutritionsource/carbohydrates/carbohydrates-and-blood-sugar/',
    'https://www.hsph.harvard.edu/nutritionsource/carbohydrates/fiber/',
    'https://www.hsph.harvard.edu/nutritionsource/carbohydrates/added-sugar-in-the-diet/',
    'https://www.hsph.harvard.edu/nutritionsource/sustainability/',
    'https://www.hsph.harvard.edu/nutritionsource/sustainability/plate-and-planet/',
    'https://www.hsph.harvard.edu/nutritionsource/sustainability/food-waste/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-weight/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-weight/measuring-fat/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-weight/best-diet-quality-counts/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-weight/healthy-dietary-styles/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-weight/diet-reviews/',
    'https://www.hsph.harvard.edu/nutritionsource/staying-active/',
    'https://www.hsph.harvard.edu/nutritionsource/staying-active/active-communities/',
    'https://www.hsph.harvard.edu/nutritionsource/stress-and-health/',
    'https://www.hsph.harvard.edu/nutritionsource/sleep/',
    'https://www.hsph.harvard.edu/nutritionsource/healthy-longevity/',
    'https://www.hsph.harvard.edu/nutritionsource/disease-prevention/',
    'https://www.hsph.harvard.edu/nutritionsource/disease-prevention/cardiovascular-disease/',
    'https://www.hsph.harvard.edu/nutritionsource/disease-prevention/cardiovascular-disease/preventing-cvd/',
    'https://www.hsph.harvard.edu/nutritionsource/disease-prevention/diabetes-prevention/',
    'https://www.hsph.harvard.edu/nutritionsource/disease-prevention/diabetes-prevention/preventing-diabetes-full-story/',
    'https://www.hsph.harvard.edu/nutritionsource/cancer/',
    'https://www.hsph.harvard.edu/nutritionsource/cancer/preventing-cancer/',
    'https://www.hsph.harvard.edu/nutritionsource/oral-health/',
    'https://www.hsph.harvard.edu/nutritionsource/precision-nutrition/',
    'https://www.hsph.harvard.edu/nutritionsource/nutrition-and-immunity/',
    'https://www.hsph.harvard.edu/nutritionsource/recipes-2/',
    'https://www.hsph.harvard.edu/nutritionsource/asparagus-with-warm-tarragon-pecan-vinaigrette/',
    'https://www.hsph.harvard.edu/nutritionsource/asparagus-spears-with-mandarin-orange/',
    'https://www.hsph.harvard.edu/nutritionsource/baby-arugula-and-shaved-fennel-with-lemon-vinaigrette/',
    'https://www.hsph.harvard.edu/nutritionsource/braised-cabbage-with-leeks-and-sesame-seeds/',
    'https://www.hsph.harvard.edu/nutritionsource/braised-oyster-mushrooms-coconut-macadamia/',
    'https://www.hsph.harvard.edu/nutritionsource/butternut-squash-soup-recipe/',
    'https://www.hsph.harvard.edu/nutritionsource/caesar-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/cardamom-roasted-cauliflower/',
    'https://www.hsph.harvard.edu/nutritionsource/carrot-and-coriander-soup/',
    'https://www.hsph.harvard.edu/nutritionsource/cauliflower-tomato-soup/',
    'https://www.hsph.harvard.edu/nutritionsource/cauliflower-walnut-soup/',
    'https://www.hsph.harvard.edu/nutritionsource/endive-salad-with-citrus-walnut-dressing/',
    'https://www.hsph.harvard.edu/nutritionsource/customizable-stuffed-peppers/',
    'https://www.hsph.harvard.edu/nutritionsource/fresh-spinach-with-sesame-seeds/',
    'https://www.hsph.harvard.edu/nutritionsource/garlic-braised-greens/',
    'https://www.hsph.harvard.edu/nutritionsource/green-beans-with-dried-cherries/',
    'https://www.hsph.harvard.edu/nutritionsource/green-beans-with-chili-garlic-sauce/',
    'https://www.hsph.harvard.edu/nutritionsource/green-chutney/',
    'https://www.hsph.harvard.edu/nutritionsource/grilled-eggplant-cutlets/',
    'https://www.hsph.harvard.edu/nutritionsource/kale-with-caramelized-onions/',
    'https://www.hsph.harvard.edu/nutritionsource/marinated-shiitake-mushroom-and-cucumber-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/mashed-cauliflower/',
    'https://www.hsph.harvard.edu/nutritionsource/mushroom-stroganoff/',
    'https://www.hsph.harvard.edu/nutritionsource/pan-roasted-wild-mushrooms-with-coffee-and-hazelnuts/',
    'https://www.hsph.harvard.edu/nutritionsource/portabella-steak-sandwich/',
    'https://www.hsph.harvard.edu/nutritionsource/provencal-vegetables/',
    'https://www.hsph.harvard.edu/nutritionsource/vegetable-stock/',
    'https://www.hsph.harvard.edu/nutritionsource/roasted-brussels-sprouts/',
    'https://www.hsph.harvard.edu/nutritionsource/brussels-sprouts-with-shallots/',
    'https://www.hsph.harvard.edu/nutritionsource/roasted-beets-with-balsamic-vinegar/',
    'https://www.hsph.harvard.edu/nutritionsource/roasted-balsamic-vegetables/',
    'https://www.hsph.harvard.edu/nutritionsource/roasted-squash-with-pomegranate/',
    'https://www.hsph.harvard.edu/nutritionsource/sweet-potatoes-with-pecans/',
    'https://www.hsph.harvard.edu/nutritionsource/ruby-chard/',
    'https://www.hsph.harvard.edu/nutritionsource/sauted-rainbow-swiss-chard/',
    'https://www.hsph.harvard.edu/nutritionsource/simple-celery-date-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/southwestern-corn-hash/',
    'https://www.hsph.harvard.edu/nutritionsource/spicy-broccolini/',
    'https://www.hsph.harvard.edu/nutritionsource/spicy-indian-slaw/',
    'https://www.hsph.harvard.edu/nutritionsource/stir-fried-vegetables-tomato-curry/',
    'https://www.hsph.harvard.edu/nutritionsource/sugar-snap-peas-with-fresh-mint/',
    'https://www.hsph.harvard.edu/nutritionsource/tarragon-succotash/',
    'https://www.hsph.harvard.edu/nutritionsource/tunisian-carrot-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/vegetable-stock-recipe/',
    'https://www.hsph.harvard.edu/nutritionsource/vegetarian-shepherds-pie-recipe/',
    'https://www.hsph.harvard.edu/nutritionsource/wild-mushroom-soup-with-soba/',
    'https://www.hsph.harvard.edu/nutritionsource/yellow-squash-with-sage/',
    'https://www.hsph.harvard.edu/nutritionsource/arugula-watermelon-feta-and-mint-salad-with-balsamic-vinaigrette/',
    'https://www.hsph.harvard.edu/nutritionsource/citrus-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/almond-coconut-macaroons/',
    'https://www.hsph.harvard.edu/nutritionsource/dried-fruit-and-nuts/',
    'https://www.hsph.harvard.edu/nutritionsource/watermelon-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/fruit-compote-spiced-nuts/',
    'https://www.hsph.harvard.edu/nutritionsource/strawberry-rhubarb-crisp/',
    'https://www.hsph.harvard.edu/nutritionsource/barley-roasted-portobello-and-fennel-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/blueberry-muffins/',
    'https://www.hsph.harvard.edu/nutritionsource/brown-rice-pancakes/',
    'https://www.hsph.harvard.edu/nutritionsource/bulgur-pilaf/',
    'https://www.hsph.harvard.edu/nutritionsource/couscous-minted-with-pine-nuts/',
    'https://www.hsph.harvard.edu/nutritionsource/couscous-quinoa-tabouli/',
    'https://www.hsph.harvard.edu/nutritionsource/cranberry-orange-muffin/',
    'https://www.hsph.harvard.edu/nutritionsource/fantastic-bulgur-dish/',
    'https://www.hsph.harvard.edu/nutritionsource/farro-risotto-walnut-pesto/',
    'https://www.hsph.harvard.edu/nutritionsource/farro-roasted-confetti-vegetables/',
    'https://www.hsph.harvard.edu/nutritionsource/hearty-whole-grain-bread/',
    'https://www.hsph.harvard.edu/nutritionsource/irish-brown-bread/',
    'https://www.hsph.harvard.edu/nutritionsource/jalapeno-cheddar-corn-muffins/',
    'https://www.hsph.harvard.edu/nutritionsource/lemon-chickpea-breakfast-muffins/',
    'https://www.hsph.harvard.edu/nutritionsource/mediterranean-rice/',
    'https://www.hsph.harvard.edu/nutritionsource/mixed-up-grains/',
    'https://www.hsph.harvard.edu/nutritionsource/mushroom-barley-risotto/',
    'https://www.hsph.harvard.edu/nutritionsource/oatmeal-roti/',
    'https://www.hsph.harvard.edu/nutritionsource/pasta-in-zemino/',
    'https://www.hsph.harvard.edu/nutritionsource/rigatoni-fresh-basil-pesto-corn-zucchini/',
    'https://www.hsph.harvard.edu/nutritionsource/quinoa-chia-edamame-veggie-burger/',
    'https://www.hsph.harvard.edu/nutritionsource/quinoa-enchilada-casserole/',
    'https://www.hsph.harvard.edu/nutritionsource/spicy-coconut-rice-with-limes/',
    'https://www.hsph.harvard.edu/nutritionsource/three-green-wheat-berry-salad-with-mushroom-bacon-recipe/',
    'https://www.hsph.harvard.edu/nutritionsource/wheatberries-and-chives/',
    'https://www.hsph.harvard.edu/nutritionsource/whole-wheat-banana-nut-muffins/',
    'https://www.hsph.harvard.edu/nutritionsource/whole-wheat-penne-with-pistachio-pesto-and-cherry-tomatoes/',
    'https://www.hsph.harvard.edu/nutritionsource/wild-rice-with-cranberries/',
    'https://www.hsph.harvard.edu/nutritionsource/greek-skordalia/',
    'https://www.hsph.harvard.edu/nutritionsource/green-lentil-hummus-herbs-olives/',
    'https://www.hsph.harvard.edu/nutritionsource/guacamole/',
    'https://www.hsph.harvard.edu/nutritionsource/hot-pepper-vinaigrette/',
    'https://www.hsph.harvard.edu/nutritionsource/hummus/',
    'https://www.hsph.harvard.edu/nutritionsource/italian-pesto-alla-trapanese/',
    'https://www.hsph.harvard.edu/nutritionsource/mint-vinaigrette/',
    'https://www.hsph.harvard.edu/nutritionsource/oregano-garlic-vinaigrette/',
    'https://www.hsph.harvard.edu/nutritionsource/spanish-romesco-sauce/',
    'https://www.hsph.harvard.edu/nutritionsource/turkish-muhammara/',
    'https://www.hsph.harvard.edu/nutritionsource/turkish-tarator/',
    'https://www.hsph.harvard.edu/nutritionsource/walnut-pesto/',
    'https://www.hsph.harvard.edu/nutritionsource/white-bean-and-kale-hummus/',
    'https://www.hsph.harvard.edu/nutritionsource/asian-trail-mix/',
    'https://www.hsph.harvard.edu/nutritionsource/cozy-red-lentil-mash/',
    'https://www.hsph.harvard.edu/nutritionsource/crunchy-roasted-chickpeas/',
    'https://www.hsph.harvard.edu/nutritionsource/curried-red-lentil-soup/',
    'https://www.hsph.harvard.edu/nutritionsource/dukkah/',
    'https://www.hsph.harvard.edu/nutritionsource/french-style-lentils/',
    'https://www.hsph.harvard.edu/nutritionsource/garbanzo-beans-with-spinach-and-tomatoes/',
    'https://www.hsph.harvard.edu/nutritionsource/green-beans-with-tofu-and-crushed-peanuts/',
    'https://www.hsph.harvard.edu/nutritionsource/mushroom-tofu-veggie-burger/',
    'https://www.hsph.harvard.edu/nutritionsource/spicy-lemongrass-tofu-with-asian-basil/',
    'https://www.hsph.harvard.edu/nutritionsource/sprouted-lentil-cabbage-celery-slaw/',
    'https://www.hsph.harvard.edu/nutritionsource/thai-eggplant-salad-with-coconut-tofu-strips/',
    'https://www.hsph.harvard.edu/nutritionsource/tomato-and-white-bean-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/whole-wheat-penne-with-pistachio-pesto-and-cherry-tomatoes/',
    'https://www.hsph.harvard.edu/nutritionsource/white-beans-wild-rice-and-mushrooms/',
    'https://www.hsph.harvard.edu/nutritionsource/vegetarian-refried-beans/',
    'https://www.hsph.harvard.edu/nutritionsource/cod-and-littleneck-clams/',
    'https://www.hsph.harvard.edu/nutritionsource/crawfish-touffe/',
    'https://www.hsph.harvard.edu/nutritionsource/crispy-pan-seared-white-fish-walnut-romesco-pea-shoot-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/fish-creole/',
    'https://www.hsph.harvard.edu/nutritionsource/miso-marinated-salmon-grilled-alder-wood/',
    'https://www.hsph.harvard.edu/nutritionsource/pan-roasted-salmon-with-dill-olive-oil-capers/',
    'https://www.hsph.harvard.edu/nutritionsource/pan-roasted-salmon/',
]


urls_validation = [
    'https://www.hsph.harvard.edu/nutritionsource/shaved-fennel-salad-coriander-crusted-hamachi/',
    'https://www.hsph.harvard.edu/nutritionsource/shrimp-and-chicken-gumbo/',
    'https://www.hsph.harvard.edu/nutritionsource/shrimp-red-curry-crispy-sprouted-lentils/',
    'https://www.hsph.harvard.edu/nutritionsource/wild-salmon-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/fish-tacos-with-cilantro-slaw/',
    'https://www.hsph.harvard.edu/nutritionsource/chicken-shrimp-and-fruit-salad/',
    'https://www.hsph.harvard.edu/nutritionsource/lemongrass-marinated-chicken-breast/',
    'https://www.hsph.harvard.edu/nutritionsource/olive-oil-dressing-with-chicken-walnuts-recipe/',
    'https://www.hsph.harvard.edu/nutritionsource/rosemary-and-lemon-grilled-chicken-breast/',
    'https://www.hsph.harvard.edu/nutritionsource/spicy-chicken-kebabs-with-moorish-flavors/',
    'https://www.hsph.harvard.edu/nutritionsource/stir-fried-chicken/',
    'https://www.hsph.harvard.edu/nutritionsource/moroccan-chicken-stew-with-apricots/',
    'https://www.hsph.harvard.edu/nutritionsource/stir-fried-chicken/',
    'https://www.hsph.harvard.edu/nutritionsource/baked-ricotta/',
    'https://www.hsph.harvard.edu/nutritionsource/roasted-tomatoes-stuffed-goat-cheese-garlic-basil/',
    'https://www.hsph.harvard.edu/nutritionsource/fruit-cooler/',
    'https://www.hsph.harvard.edu/nutritionsource/iced-tea-with-lemon-and-mint/'
]

In [11]:
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
from llama_index.schema import MetadataMode
from llama_index import download_loader



def load_corpus(URL, verbose=False):
    if verbose:
        print(f"Loading files in {URL}")

    BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")
    loader = BeautifulSoupWebReader()
    docs = loader.load_data(urls=URL)
    if verbose:
        print(f"Loaded {len(docs)} docs")

    parser = SimpleNodeParser.from_defaults()
    nodes = parser.get_nodes_from_documents(docs, show_progress=verbose)

    if verbose:
        print(f"Parsed {len(nodes)} nodes")

    return nodes

In [12]:
train_nodes = load_corpus(urls_train, verbose=True)
val_nodes = load_corpus(urls_validation, verbose=True)

Loading files in ['https://www.hsph.harvard.edu/nutritionsource/kids-healthy-eating-plate/', 'https://www.hsph.harvard.edu/nutritionsource/healthy-eating-plate/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/whole-grains/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/protein/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/vegetables-and-fruits/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/types-of-fat/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/cholesterol/', 'https://www.hsph.harvard.edu/nutritionsource/what-should-you-eat/fats-and-cholesterol/dietary-fat-and-disease/', 'https://www.hsph.harvard.edu/nutritionsource/vitamins/', 'https://www.hsph.harvard.edu/nutritionsource/healt

Parsing nodes:   0%|          | 0/168 [00:00<?, ?it/s]

Parsed 543 nodes
Loading files in ['https://www.hsph.harvard.edu/nutritionsource/shaved-fennel-salad-coriander-crusted-hamachi/', 'https://www.hsph.harvard.edu/nutritionsource/shrimp-and-chicken-gumbo/', 'https://www.hsph.harvard.edu/nutritionsource/shrimp-red-curry-crispy-sprouted-lentils/', 'https://www.hsph.harvard.edu/nutritionsource/wild-salmon-salad/', 'https://www.hsph.harvard.edu/nutritionsource/fish-tacos-with-cilantro-slaw/', 'https://www.hsph.harvard.edu/nutritionsource/chicken-shrimp-and-fruit-salad/', 'https://www.hsph.harvard.edu/nutritionsource/lemongrass-marinated-chicken-breast/', 'https://www.hsph.harvard.edu/nutritionsource/olive-oil-dressing-with-chicken-walnuts-recipe/', 'https://www.hsph.harvard.edu/nutritionsource/rosemary-and-lemon-grilled-chicken-breast/', 'https://www.hsph.harvard.edu/nutritionsource/spicy-chicken-kebabs-with-moorish-flavors/', 'https://www.hsph.harvard.edu/nutritionsource/stir-fried-chicken/', 'https://www.hsph.harvard.edu/nutritionsource/mor

Parsing nodes:   0%|          | 0/17 [00:00<?, ?it/s]

Parsed 37 nodes


Now that we've split our source documents into a number of nodes, we can move on to constructing a fine-tuning dataset.

## Constructing a Fine-tuning Dataset

Using the nodes we created above, we can finally start constructing a fine-tuning dataset utilizing OpenAI's `gpt-3.5-turbo`.

We'll start by using LlamaIndex's `generate_qa_embedding_pairs` and storing it in a `EmbeddingQAFinetuneDataset`.

The basic idea here is straightforward enough:

1. We look at a node
2. We generate a question that could be answered by that node

This gives us a number of question/context pairs that we can use to fine-tune our Embeddings model.

In [13]:
from llama_index.finetuning import (
    generate_qa_embedding_pairs,
    EmbeddingQAFinetuneDataset,
)

In [14]:
from llama_index.llms import Gemini


llm = Gemini(api_key='AIzaSyA903hLQGcsRu0IrCVKJeoxV8JPwMzPCXk',model='gemini-pro')


In [15]:
train_dataset = generate_qa_embedding_pairs(train_nodes, llm=llm)
train_dataset.save_json("train_dataset.json")

100%|██████████| 543/543 [38:58<00:00,  4.31s/it]


In [16]:
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")

In [17]:
val_dataset = generate_qa_embedding_pairs(val_nodes, llm=llm)
val_dataset.save_json("val_dataset.json")

100%|██████████| 37/37 [02:16<00:00,  3.69s/it]


In [18]:
val_dataset = EmbeddingQAFinetuneDataset.from_json("val_dataset.json")

## Fine-tuning `BAAI/bge-small-en-v1.5`

Now that we have a dataset, let's grab a `sentence-transformers` Embeddings model!

We'll be using BAAI's [`bge-small-en-v1.5`](https://huggingface.co/BAAI/bge-small-en-v1.5) as a base embeddings model.

It is a well performing embeddings model by itself, but there's a lot of very specific domain terms and vocabulary in our courpus - so lets fine-tune it and see what that can do for us!

In [19]:
!pip install sentence_transformers -q -U

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/132.8 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━[0m [32m122.9/132.8 kB[0m [31m4.2 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m132.8/132.8 kB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[?25h

We'll be leveraging LlamaIndex's `SentenceTransformersFinetuneEngine` to make fine-tuning our embeddings model a breeze.

In [20]:
from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset, # Dataset to be trained on
    model_id="BAAI/bge-small-en-v1.5", # HuggingFace reference to base embeddings model
    model_output_path="llama_model_v1", # Output directory for fine-tuned embeddings model
    val_dataset=val_dataset, # Dataset to validate on
    epochs=2 # Number of Epochs to train for
)

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/90.3k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

All that's left to do now is call `.finetune()`!

In [21]:
finetune_engine.finetune()

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Iteration:   0%|          | 0/109 [00:00<?, ?it/s]

Iteration:   0%|          | 0/109 [00:00<?, ?it/s]

Now that we've fine-tuned our embeddings model, lets grab the model out of the engine so we can use it later!

In [22]:
finetuned_embedding_model = finetune_engine.get_finetuned_model()

In [23]:
finetuned_embedding_model.to_json()

'{"model_name": "llama_model_v1", "embed_batch_size": 10, "tokenizer_name": "llama_model_v1", "max_length": 512, "pooling": "cls", "normalize": true, "query_instruction": null, "text_instruction": null, "cache_folder": null, "class_name": "HuggingFaceEmbedding"}'

In [24]:
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="llama_model_v1")

In [25]:
!zip -r /content/file.zip llama_model_v1

  adding: llama_model_v1/ (stored 0%)
  adding: llama_model_v1/config.json (deflated 48%)
  adding: llama_model_v1/1_Pooling/ (stored 0%)
  adding: llama_model_v1/1_Pooling/config.json (deflated 57%)
  adding: llama_model_v1/tokenizer_config.json (deflated 75%)
  adding: llama_model_v1/model.safetensors (deflated 14%)
  adding: llama_model_v1/vocab.txt (deflated 53%)
  adding: llama_model_v1/sentence_bert_config.json (deflated 4%)
  adding: llama_model_v1/README.md (deflated 56%)
  adding: llama_model_v1/modules.json (deflated 62%)
  adding: llama_model_v1/tokenizer.json (deflated 71%)
  adding: llama_model_v1/special_tokens_map.json (deflated 42%)
  adding: llama_model_v1/2_Normalize/ (stored 0%)
  adding: llama_model_v1/config_sentence_transformers.json (deflated 26%)
  adding: llama_model_v1/eval/ (stored 0%)
  adding: llama_model_v1/eval/Information-Retrieval_evaluation_results.csv (deflated 81%)


In [26]:
from google.colab import files
files.download("/content/file.zip")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

## Evaluating Embeddings Model

We're going to be evaluating our newly fine-tuned model against the base model using the evaluation pipeline provided by the `sentence_transformers` library.

You can find out all about the `InformationRetrievalEvaluator` [here](https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/InformationRetrievalEvaluator.py).

The score we'll be looking at by default is `Mean Average Precision @ K` or `MAP@K`. Though more results can be found in the `/results` directory.

In [27]:
from sentence_transformers.evaluation import InformationRetrievalEvaluator
from sentence_transformers import SentenceTransformer
from pathlib import Path

def evaluate_st(
    dataset,
    model_id,
    name,
):
    corpus = dataset.corpus
    queries = dataset.queries
    relevant_docs = dataset.relevant_docs

    evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name=name)
    model = SentenceTransformer(model_id)
    output_path = "results/"
    Path(output_path).mkdir(exist_ok=True, parents=True)
    return evaluator(model, output_path=output_path)

In [28]:
evaluate_st(val_dataset, "BAAI/bge-small-en-v1.5", name="bge")

0.5238951452947395

In [29]:
evaluate_st(val_dataset, "llama_model_v1", name="finetuned")

0.6540377724201254

## Advanced Retrieval Method: Sentence Window Retrieval

Fine-tuning our embeddings is a powerful way to ensure we're better at retrieving the correct context - but we can go a step further and improve the way we actually look at context as well.

In this demonstration, we'll be leveraging the idea of a SentenceWindowNodeParser and metadata replacement to take our retrieval to the next level.

At a high level, what we're doing is straightforward:

1. We parse our document into sentence-wise nodes.
2. We find the most relevant sentence-wise nodes to our query.
3. We add additional context based on a "window" around that base sentence-wise node.
4. We use that enhanced context as context for our LLM!


Let's look at this with a visual example:

In [None]:
block_1 = """
I went to Tosche Station. I bought a Power Converter. I live on a planet with 2 Moons. My name is Luke Skywalker.
"""

sentences = block_1.split(".")
print(sentences)

chunks = [block_1[:50], block_1[50:100], block_1[100:]]
print(chunks)

['\nI went to Tosche Station', ' I bought a Power Converter', ' I live on a planet with 2 Moons', ' My name is Luke Skywalker', '\n']
['\nI went to Tosche Station. I bought a Power Conver', 'ter. I live on a planet with 2 Moons. My name is L', 'uke Skywalker.\n']


In [30]:
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding
from llama_index.node_parser import SentenceWindowNodeParser, SimpleNodeParser

# window node parser
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=6,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# simple node parser
simple_node_parser = SimpleNodeParser.from_defaults()

# base Query Engine LLM
llm = Gemini(api_key='AIzaSyA903hLQGcsRu0IrCVKJeoxV8JPwMzPCXk',model='gemini-pro')

# fine-tuned Embeddings model
embed_model = HuggingFaceEmbedding(
    model_name="llama_model_v1"
)

# base Embeddings model
embed_model_base = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en"
)

# fine-tuned ServiceContext
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

# base ServiceContext
ctx_base = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model_base
)

config.json:   0%|          | 0.00/684 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

Let's create nodes using our `node_parser` and `simple_node_parser` after loading our documents found in the `TRAIN_FILES` directory.

In [32]:
BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")
loader = BeautifulSoupWebReader()
docs = loader.load_data(urls=urls_train)

In [33]:
nodes = node_parser.get_nodes_from_documents(docs)

In [34]:
base_nodes = simple_node_parser.get_nodes_from_documents(docs)

Now we can create their respecitve `VectorStoreIndex`s for each set of nodes.

In [35]:
from llama_index import VectorStoreIndex

sentence_index = VectorStoreIndex(nodes, service_context=ctx)

In [36]:
sentence_index.storage_context.persist(persist_dir="sentence_index")

In [37]:
!zip -r /content/sentence_index.zip sentence_index

  adding: sentence_index/ (stored 0%)
  adding: sentence_index/graph_store.json (stored 0%)
  adding: sentence_index/image__vector_store.json (deflated 19%)
  adding: sentence_index/default__vector_store.json (deflated 65%)
  adding: sentence_index/index_store.json (deflated 68%)
  adding: sentence_index/docstore.json (deflated 95%)


In [38]:
files.download("/content/sentence_index.zip")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [72]:
!pip install chromadb

Collecting chromadb
  Downloading chromadb-0.4.22-py3-none-any.whl (509 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m509.0/509.0 kB[0m [31m5.8 MB/s[0m eta [36m0:00:00[0m
Collecting chroma-hnswlib==0.7.3 (from chromadb)
  Downloading chroma_hnswlib-0.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.4/2.4 MB[0m [31m14.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting fastapi>=0.95.2 (from chromadb)
  Downloading fastapi-0.109.0-py3-none-any.whl (92 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m92.0/92.0 kB[0m [31m12.7 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting uvicorn[standard]>=0.18.3 (from chromadb)
  Downloading uvicorn-0.27.0.post1-py3-none-any.whl (60 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.7/60.7 kB[0m [31m8.4 MB/s[0m eta [36m0:00:00[0m
Collecting posthog>=2.4.0 (from chromadb)
  Downloading posthog-3.3

In [75]:
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
import chromadb


In [76]:
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")


vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [78]:
!zip -r /content/chroma_db.zip chroma_db

  adding: chroma_db/ (stored 0%)
  adding: chroma_db/4ca07ca4-6651-4b08-b292-ca14a5103519/ (stored 0%)
  adding: chroma_db/4ca07ca4-6651-4b08-b292-ca14a5103519/data_level0.bin (deflated 7%)
  adding: chroma_db/4ca07ca4-6651-4b08-b292-ca14a5103519/length.bin (deflated 7%)
  adding: chroma_db/4ca07ca4-6651-4b08-b292-ca14a5103519/link_lists.bin (stored 0%)
  adding: chroma_db/4ca07ca4-6651-4b08-b292-ca14a5103519/header.bin (deflated 61%)
  adding: chroma_db/chroma.sqlite3 (deflated 63%)


In [79]:
files.download("/content/chroma_db.zip")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [77]:
base_index = VectorStoreIndex(base_nodes, service_context=ctx,storage_context=storage_context)

In [40]:
base_index.storage_context.persist(persist_dir="base_index")

In [41]:
!zip -r /content/base_index.zip base_index

  adding: base_index/ (stored 0%)
  adding: base_index/graph_store.json (stored 0%)
  adding: base_index/image__vector_store.json (deflated 19%)
  adding: base_index/default__vector_store.json (deflated 58%)
  adding: base_index/index_store.json (deflated 68%)
  adding: base_index/docstore.json (deflated 81%)


In [42]:
files.download("/content/base_index.zip")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In the following step, we'll set up our `MetadataReplacementPostProcessor` which is what will replace our sentences (`original_text`) with our expanded contexts (`window`).

Remember, we're retrieving the `top_k` (3, in this case) sentences - and then converting them to their surrounding context.

In [43]:
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)

Let's look at a sample response!

In [81]:
window_response = query_engine.query("provide recipe for spinach with recipe name, ingredients, preparation and nutrition facts")

In [82]:
window_response.response

'Recipe Name: Fresh Spinach with Sesame Seeds\n\nIngredients:\n- 1 pound baby spinach\n- 1 clove garlic, minced\n- 1 tablespoon canola oil or peanut oil\n- 2 teaspoons sesame oil\n- 1 tablespoon toasted sesame seeds\n\nPreparation:\n1. Heat the canola or peanut oil in a wok or a large sauté pan over medium heat, and sauté the garlic for 20 seconds; do not let it get brown.\n2. Add spinach and toss lightly with tongs so that all pieces cook evenly.\n3. When spinach is lightly wilted, remove from heat, drizzle with sesame oil, and toss. Add sesame seeds and toss again. Serve hot, warm, or at room temperature.\n\nNutrition facts:\nPer 1/3 of recipe\n- 121 calories\n- 5 grams of protein\n- 7 grams of carbohydrates\n- 4 grams of fiber\n- 120 milligrams of sodium\n- 6 grams of fat\n  - 1 grams of saturated fat\n  - 3 grams of monounsaturated fat\n  - 2 grams of polyunsaturated fat'

In [83]:
display(Markdown(f"<b>{window_response}</b>"))

<b>Recipe Name: Fresh Spinach with Sesame Seeds

Ingredients:
- 1 pound baby spinach
- 1 clove garlic, minced
- 1 tablespoon canola oil or peanut oil
- 2 teaspoons sesame oil
- 1 tablespoon toasted sesame seeds

Preparation:
1. Heat the canola or peanut oil in a wok or a large sauté pan over medium heat, and sauté the garlic for 20 seconds; do not let it get brown.
2. Add spinach and toss lightly with tongs so that all pieces cook evenly.
3. When spinach is lightly wilted, remove from heat, drizzle with sesame oil, and toss. Add sesame seeds and toss again. Serve hot, warm, or at room temperature.

Nutrition facts:
Per 1/3 of recipe
- 121 calories
- 5 grams of protein
- 7 grams of carbohydrates
- 4 grams of fiber
- 120 milligrams of sodium
- 6 grams of fat
  - 1 grams of saturated fat
  - 3 grams of monounsaturated fat
  - 2 grams of polyunsaturated fat</b>

We can also look at the visual representation of what happened, with our original sentence - and then our expanded context window.



In [46]:
window = window_response.source_nodes[0].node.metadata["window"]
sentence = window_response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

Window: 














Fresh Spinach with Sesame Seeds | The Nutrition Source | Harvard T.H.  Chan School of Public Health
























































 


























Menu
Close Menu

Skip to content



Information For:

Prospective Students
Current Students
Alumni
Faculty & Staff
Friends & Supporters
 





Search for:


 






Harvard T.H.  Chan School of Public Health



Email
People
Departments
Calendar
Careers
my.harvard
Giving
 





About
Faculty & Research
Admissions & Aid
Academics
Executive/Continuing Ed
News
 






						The Nutrition Source					



Home > The Nutrition Source > Fresh Spinach with Sesame Seeds				



The Nutrition Source
Menu





Search for:


 




Home
Nutrition News
What Should I Eat?

 Healthy Eating Plate & Pyramid

Healthy Eating Plate Translations
Kid’s Healthy Eating Plate


Whole Grains
Protein
Vegetables and Fruits
Fats and Cholesterol

Types of Fat
Cholesterol
Dietary Fat and Disease


Vitamins and Minera

Let's compare to the same query using the simple nodes.

In [69]:
query_engine = base_index.as_query_engine(similarity_top_k=2)
vector_response = query_engine.query("provide recipe for spinach with recipe name, ingredients, preparation and nutrition facts")

In [70]:
from IPython.display import Markdown, display


In [71]:
display(Markdown(f"<b>{vector_response}</b>"))

<b>Recipe Name: Fresh Spinach with Sesame Seeds

Ingredients:
- 1 pound baby spinach
- 1 clove garlic, minced
- 1 tablespoon canola oil or peanut oil
- 2 teaspoons sesame oil
- 1 tablespoon toasted sesame seeds

Preparation:
1. Heat the canola or peanut oil in a wok or a large sauté pan over medium heat, and sauté the garlic for 20 seconds; do not let it get brown.
2. Add spinach and toss lightly with tongs so that all pieces cook evenly.
3. When spinach is lightly wilted, remove from heat, drizzle with sesame oil, and toss. Add sesame seeds and toss again. Serve hot, warm, or at room temperature.

Nutrition facts:
Per 1/3 of recipe
- 121 calories
- 5 grams of protein
- 7 grams of carbohydrates
- 4 grams of fiber
- 120 milligrams of sodium
- 6 grams of fat
  - 1 grams of saturated fat
  - 3 grams of monounsaturated fat
  - 2 grams of polyunsaturated fat</b>

In [50]:
vector_response.response

'**Fresh Spinach with Sesame Seeds**\n\n**Ingredients:**\n\n* 1 pound baby spinach\n* 1 clove garlic, minced\n* 1 tablespoon canola oil or peanut oil\n* 2 teaspoons sesame oil\n* 1 tablespoon toasted sesame seeds\n\n**Preparation:**\n\n1. Heat the canola or peanut oil in a wok or a large sauté pan over medium heat, and sauté the garlic for 20 seconds; do not let it get brown.\n2. Add spinach and toss lightly with tongs so that all pieces cook evenly.\n3. When spinach is lightly wilted, remove from heat, drizzle with sesame oil, and toss. Add sesame seeds and toss again. Serve hot, warm, or at room temperature.\n\n**Nutrition facts:**\n\nPer 1/3 of recipe:\n\n* 121 calories\n* 5 grams of protein\n* 7 grams of carbohydrates\n* 4 grams of fiber\n* 120 milligrams of sodium\n* 6 grams of fat\n    * 1 grams of saturated fat\n    * 3 grams of monounsaturated fat\n    * 2 grams of polyunsaturated fat'

## Evaluating our Pipeline

We'll be leveraging LlamaIndex's evaluation tools to evaluate our pipeline today.

We'll be relying on the [`DatasetGenerator`](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/dataset_generation.py) to create our `QueryResponseDataset` leveraging `GPT-4`.

The dataset generated will be similar to before - which is a Question/Context dataset.

> NOTE: GPT-4 powered evaluation can be expensive and fairly time-consuming. Ensure you've scoped out cost before proceeding with evaluation.

In [None]:
import random
from llama_index.evaluation import (
    DatasetGenerator,
    QueryResponseDataset,
)

# the number of nodes to evaluate
num_nodes_eval = 10

# selecting a random sample of nodes
sample_eval_nodes = random.sample(base_nodes, num_nodes_eval)

# setting up our GPT-4 powered evaluation context
eval_service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo"))

# creating our dataset generator
dataset_generator = DatasetGenerator(
    sample_eval_nodes,
    service_context=eval_service_context,
    show_progress=True,
    num_questions_per_chunk=2,
)

  dataset_generator = DatasetGenerator(


Now we can simply fire off our `dataset_generator` and wait!

In [None]:
eval_dataset = await dataset_generator.agenerate_dataset_from_nodes()


  0%|          | 0/10 [00:00<?, ?it/s][A
 10%|█         | 1/10 [00:03<00:28,  3.19s/it][A
 20%|██        | 2/10 [00:03<00:12,  1.52s/it][A
 30%|███       | 3/10 [00:03<00:06,  1.09it/s][A
 40%|████      | 4/10 [00:04<00:04,  1.42it/s][A
 60%|██████    | 6/10 [00:04<00:02,  1.82it/s][A
 70%|███████   | 7/10 [00:05<00:01,  1.77it/s][A
 80%|████████  | 8/10 [00:05<00:00,  2.23it/s][A
100%|██████████| 10/10 [00:05<00:00,  1.70it/s]

  0%|          | 0/2 [00:00<?, ?it/s][A
 50%|█████     | 1/2 [00:02<00:02,  2.33s/it][A
100%|██████████| 2/2 [00:02<00:00,  1.37s/it]

  0%|          | 0/2 [00:00<?, ?it/s][A
 50%|█████     | 1/2 [00:02<00:02,  2.19s/it][A
100%|██████████| 2/2 [00:04<00:00,  2.14s/it]

  0%|          | 0/2 [00:00<?, ?it/s][A
 50%|█████     | 1/2 [00:02<00:02,  2.44s/it][A
100%|██████████| 2/2 [00:03<00:00,  1.84s/it]

  0%|          | 0/2 [00:00<?, ?it/s][A
100%|██████████| 2/2 [00:08<00:00,  4.25s/it]

  0%|          | 0/2 [00:00<?, ?it/s][A
 50%|█████     | 1

In [None]:
eval_dataset.save_json("llama_eval_qr_dataset.json")

In [None]:
eval_dataset = QueryResponseDataset.from_json("llama_eval_qr_dataset.json")

  return cls(**data)


We'll be using the following standard evaluation metrics provided by LlamaIndex.

- CorrectnessEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/correctness.py)
- SemanticSimilarityEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/semantic_similarity.py)
- RelevancyEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/relevancy.py)
- FaithfulnessEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/faithfulness.py)

In [None]:
from llama_index.evaluation import (
    CorrectnessEvaluator,
    SemanticSimilarityEvaluator,
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

evaluator_c = CorrectnessEvaluator(service_context=eval_service_context)
evaluator_s = SemanticSimilarityEvaluator(service_context=eval_service_context)
evaluator_r = RelevancyEvaluator(service_context=eval_service_context)
evaluator_f = FaithfulnessEvaluator(service_context=eval_service_context)

Next, we'll set up additional evaluation tools, these tools will mostly be used to make evaluating and collecting our evaluations a bit simpler. Thanks, LlamaIndex!

In [None]:
from llama_index.evaluation.eval_utils import get_responses, get_results_df
from llama_index.evaluation import BatchEvalRunner

max_samples = 15

eval_qs = eval_dataset.questions
ref_response_strs = [r for (_, r) in eval_dataset.qr_pairs]

Next up, we'll set up `QueryEngine`s for our two pipelines we wish to evaluate and let them predict!

First up is our SentenceWindow-MetaDataReplacement pipeline powered by fine-tuned embeddings.

In [None]:
query_engine = sentence_index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
pred_responses_finetuned_embeds = get_responses(
    eval_qs[:max_samples], query_engine, show_progress=True
)


  0%|          | 0/15 [00:00<?, ?it/s][A
  7%|▋         | 1/15 [00:03<00:53,  3.85s/it][A
 13%|█▎        | 2/15 [00:04<00:24,  1.85s/it][A
 27%|██▋       | 4/15 [00:04<00:08,  1.23it/s][A
 33%|███▎      | 5/15 [00:05<00:07,  1.37it/s][A
 40%|████      | 6/15 [00:06<00:07,  1.14it/s][A
 47%|████▋     | 7/15 [00:09<00:12,  1.55s/it][A
 53%|█████▎    | 8/15 [00:10<00:10,  1.51s/it][A
 60%|██████    | 9/15 [00:11<00:07,  1.27s/it][A
 67%|██████▋   | 10/15 [00:11<00:04,  1.08it/s][A
 73%|███████▎  | 11/15 [00:12<00:03,  1.16it/s][A
 80%|████████  | 12/15 [00:12<00:02,  1.41it/s][A
 87%|████████▋ | 13/15 [00:13<00:01,  1.30it/s][A
 93%|█████████▎| 14/15 [00:21<00:02,  2.94s/it][A
100%|██████████| 15/15 [00:23<00:00,  1.60s/it]


Next is our Simple Retrieval Base Embeddings pipeline.

In [None]:
base_index_base_embeddings = VectorStoreIndex(base_nodes, service_context=ctx_base)
base_embeddings_base_query_engine = base_index_base_embeddings.as_query_engine(
  similarity_top_k=3
)
base_pred_responses_base_embedings = get_responses(
    eval_qs[:max_samples], base_embeddings_base_query_engine, show_progress=True
)


 50%|█████     | 1/2 [05:08<05:08, 308.61s/it]

  7%|▋         | 1/15 [00:02<00:34,  2.50s/it][A
 13%|█▎        | 2/15 [00:02<00:15,  1.19s/it][A
 27%|██▋       | 4/15 [00:03<00:07,  1.41it/s][A
 47%|████▋     | 7/15 [00:04<00:03,  2.14it/s][A
 53%|█████▎    | 8/15 [00:06<00:04,  1.41it/s][A
 60%|██████    | 9/15 [00:07<00:05,  1.20it/s][A
 67%|██████▋   | 10/15 [00:07<00:03,  1.52it/s][A
 73%|███████▎  | 11/15 [00:08<00:03,  1.20it/s][A
 80%|████████  | 12/15 [00:09<00:02,  1.38it/s][A
 87%|████████▋ | 13/15 [00:10<00:01,  1.22it/s][A
 93%|█████████▎| 14/15 [00:11<00:00,  1.11it/s][A
100%|██████████| 15/15 [00:18<00:00,  1.21s/it]


In [None]:
import numpy as np

pred_response_strs_finetuned_embeds = [str(p) for p in pred_responses_finetuned_embeds]
base_pred_response_strs_base_embeds = [str(p) for p in base_pred_responses_base_embedings]

We'll create our evaluator dict, which will help create the appropriate `pd.DataFrame` in the final step - and set up our `BatchEvalRunner` which will be used to evaluate our pipelines responses against using GPT-4!

In [None]:
evaluator_dict = {
    "correctness": evaluator_c,
    "faithfulness": evaluator_f,
    "relevancy": evaluator_r,
    "semantic_similarity": evaluator_s,
}

batch_runner = BatchEvalRunner(evaluator_dict, workers=2, show_progress=True)

In [None]:
base_eval_results_base_embeddings = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=base_pred_responses_base_embedings[:max_samples],
    reference=ref_response_strs[:max_samples],
)

100%|██████████| 60/60 [00:47<00:00,  1.27it/s]


In [None]:
eval_results_finetuned_embeddings = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=pred_responses_finetuned_embeds[:max_samples],
    reference=ref_response_strs[:max_samples],
)

100%|██████████| 60/60 [01:02<00:00,  1.03s/it]


Finally we can look at our results, which I'll let speak for themselves!

In [None]:
results_df = get_results_df(
    [
        base_eval_results_base_embeddings,
        eval_results_finetuned_embeddings],
    ["Base Retriever w Base Embeddings", "Sentence Window Retriever w FT Embeddings"],
    ["correctness", "relevancy", "faithfulness", "semantic_similarity"],
)

In [None]:
display(results_df.sort_values(by=['semantic_similarity'], ascending=False))

Unnamed: 0,names,correctness,relevancy,faithfulness,semantic_similarity
1,Sentence Window Retriever w FT Embeddings,4.133333,0.933333,0.666667,0.973979
0,Base Retriever w Base Embeddings,3.9,0.733333,0.266667,0.963818
