In [1]:
# import some necessary libraries 
from llama_index.llms import OpenAI
from llama_index.query_engine import CitationQueryEngine
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    ServiceContext,
)
import llama_index
from llama_index.vector_stores import MilvusVectorStore
from milvus import default_server


from dotenv import load_dotenv
import os

# Load OPENAI_API_KEY from a local .env file into the environment.
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

#### Scraping some test data

First, we scrape some test data from Wikipedia; it is the same data we used to build a multi-document query engine. The code below calls Wikipedia’s API for each page named in the wiki_titles list and saves each page’s plain-text extract to a local text file.

In [2]:
from pathlib import Path

import requests

wiki_titles = ["Toronto", "Seattle", "San Francisco", "Chicago", "Boston", "Washington, D.C.", "Cambridge, Massachusetts", "Houston"]

# Create the output directory once, before the download loop.
data_path = Path('data-wiki')
data_path.mkdir(exist_ok=True)

for title in wiki_titles:
    # Ask the MediaWiki API for the plain-text extract of the page.
    response = requests.get(
        'https://en.wikipedia.org/w/api.php',
        params={
            'action': 'query',
            'format': 'json',
            'titles': title,
            'prop': 'extracts',
            'explaintext': True,
        }
    ).json()
    # The "pages" mapping is keyed by page ID; we requested a single title.
    page = next(iter(response['query']['pages'].values()))
    wiki_text = page['extract']

    # Save each article as <title>.txt inside data-wiki/.
    with open(data_path / f"{title}.txt", 'w', encoding='utf-8') as fp:
        fp.write(wiki_text)
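
The loop above assumes every title resolves to a page that has an extract. For a title that does not exist, the MediaWiki API still returns a page entry, but it carries a "missing" key and no "extract" field, so page['extract'] would raise a KeyError. A minimal sketch of a more defensive lookup follows; the helper name extract_page_text and the sample dictionaries are hypothetical and only mimic the shape of the API response.

```python
from typing import Optional


def extract_page_text(api_response: dict) -> Optional[str]:
    """Return the plain-text extract from a MediaWiki query response, or None.

    Hypothetical helper for illustration; it is not part of the original notebook.
    """
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        # A title that does not exist comes back with a "missing" key
        # and without an "extract" field.
        if "missing" in page:
            continue
        text = page.get("extract")
        if text:
            return text
    return None


# Hand-written sample responses (not real API output).
found = {"query": {"pages": {"123": {"title": "Toronto", "extract": "Toronto is a city."}}}}
not_found = {"query": {"pages": {"-1": {"title": "No such page", "missing": ""}}}}

print(extract_page_text(found))
print(extract_page_text(not_found))
```

Inside the download loop, a None result could be used to skip the title instead of writing an empty file.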

#### Setting up your vector store in LlamaIndex

We use Milvus Lite so that Milvus runs directly in our notebook. We then use the MilvusVectorStore module from LlamaIndex to connect to that Milvus instance as our vector store.

In [13]:
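# Start the embedded Milvus Lite server, then point LlamaIndex at it.
default_server.start()
vector_store = MilvusVectorStore(
    collection_name="citations",
    host="127.0.0.1",
    port=default_server.listen_port,
    dim=1536,  # must match the embedding size; text-embedding-ada-002 returns 1536-dimensional vectors
    overwrite=True,  # replace the collection if it already exists
)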

In [None]:
# https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo.html

In [12]:
#default_server.stop()

In [14]:
vector_store

<llama_index.vector_stores.milvus.MilvusVectorStore at 0x1e8f2502410>

Next, we create the contexts for our index. The service context tells the index and retriever which services to use; in this case, it passes in GPT-3.5 Turbo as the LLM. We also create a storage context so the index knows where to store and query data; here, we pass in the Milvus vector store object we created above.

In [15]:
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0)
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [16]:
service_context

ServiceContext(llm_predictor=LLMPredictor(system_prompt=None, query_wrapper_prompt=None, pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'>), prompt_helper=PromptHelper(context_window=4096, num_output=256, chunk_overlap_ratio=0.1, chunk_size_limit=None, separator=' '), embed_model=OpenAIEmbedding(model_name='text-embedding-ada-002', embed_batch_size=100, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x000001E8FC763B10>, additional_kwargs={}, api_key='sk-<redacted>', api_base='https://api.openai.com/v1', api_version='', max_retries=10, timeout=60.0, default_headers=None, reuse_client=True), transformations=[SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x000001E8FC763B10>, id_func=<function default_id_func at 0x000001E8F1C3B1A0>, chunk_size=1024, chunk_overlap=200, separator=' ', paragraph_separator='\n\n\n', secondary_ch

In [17]:
storage_context

StorageContext(docstore=<llama_index.storage.docstore.simple_docstore.SimpleDocumentStore object at 0x000001E8F6D45910>, index_store=<llama_index.storage.index_store.simple_index_store.SimpleIndexStore object at 0x000001E8F8011810>, vector_stores={'default': <llama_index.vector_stores.milvus.MilvusVectorStore object at 0x000001E8F2502410>, 'image': <llama_index.vector_stores.simple.SimpleVectorStore object at 0x000001E8F6DFE350>}, graph_store=<llama_index.graph_stores.simple.SimpleGraphStore object at 0x000001E8F6E2BD50>)

With all of this set up, we can load the data that we scraped earlier and create a vector store index from those documents.

In [18]:
documents = SimpleDirectoryReader("./data-wiki/").load_data()


In [19]:
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

#### Querying with citations

Now we can create a Citation Query Engine. We give it the vector index we built earlier, along with parameters that control how many results to retrieve (similarity_top_k) and the chunk size of each citation source (citation_chunk_size). That’s all it takes to set up the citation engine; the next step is to query it.

In [20]:
query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    # here we can control how granular citation sources are, the default is 512
    citation_chunk_size=512,
)
response = query_engine.query("Does Seattle or Houston have a bigger airport?")
print(response)
for source in response.source_nodes:
    print(source.node.get_text())

Seattle has a bigger airport than Houston [1].
Source 1:
A secondary passenger airport, Paine Field, opened in 2019 and is located in Everett, 25 miles (40 km) north of Seattle. It is predominantly used by Boeing and their large assembly plant located nearby.The main mode of transportation, however, is the street system, which is laid out in a cardinal directions grid pattern, except in the central business district where early city leaders Arthur Denny and Carson Boren insisted on orienting the plats relative to the shoreline rather than to true North. Only two roads, Interstate 5 and State Route 99 (both limited-access highways) run uninterrupted through the city from north to south. From 1953 to 2019, State Route 99 ran through downtown Seattle on the Alaskan Way Viaduct, an elevated freeway on the waterfront. However, due to damage sustained during the 2001 Nisqually earthquake the viaduct was replaced by a tunnel. The 2-mile (3.2 km) Alaskan Way Viaduct replacement tunnel was orig
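
To make the "[1]" marker and the "Source 1:" label in the output above more concrete, here is a simplified sketch of the idea: retrieved text is cut into small numbered chunks, and the bracketed numbers in the answer are mapped back to those chunks. This is an illustration under stated assumptions, not LlamaIndex's implementation. The functions label_sources and cited_sources are hypothetical, and chunks are measured in whitespace-separated words rather than tokens.

```python
import re


def label_sources(texts, chunk_size=50):
    """Split each text into chunks of at most chunk_size words and number them.

    Hypothetical helper: counts words, whereas a real engine would count tokens.
    """
    sources = []
    for text in texts:
        words = text.split()
        for start in range(0, len(words), chunk_size):
            chunk = " ".join(words[start:start + chunk_size])
            sources.append(f"Source {len(sources) + 1}:\n{chunk}")
    return sources


def cited_sources(answer, sources):
    """Map each [n] marker in the answer to the n-th source, ignoring invalid numbers."""
    numbers = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n: sources[n - 1] for n in sorted(numbers) if 1 <= n <= len(sources)}


# Toy data standing in for retrieved passages and a generated answer.
sources = label_sources(["a b c d e", "f g"], chunk_size=2)
cited = cited_sources("First claim [1]. Second claim [4][9].", sources)
for number, text in cited.items():
    print(number, repr(text))
```

A smaller chunk size gives more, finer-grained sources, which is the trade-off the citation_chunk_size parameter exposes.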

In this tutorial, we learned how to do retrieval augmented generation (RAG) with citations (attributions). RAG is a type of LLM application that many enterprises want to build. In addition to retrieving information and presenting it in a digestible format, we also want to know where that information comes from.

We can build this type of RAG application using LlamaIndex as our data router and Milvus as our vector store. We started by scraping some data from Wikipedia to show how this works. Then we spun up an instance of Milvus and created a vector store instance in LlamaIndex. From there, we put our data into Milvus and used a citation query engine in LlamaIndex to keep track of the attributions and citations. Finally, we queried that engine and got a response that included the passages of source text the answer draws on.