References:

- [LlamaIndex overview & use cases | LangChain integration](https://www.youtube.com/watch?v=cNMYeW2mpBs) ([Colab notebook](https://colab.research.google.com/drive/19xBNmejiJUhWIy71bWFnlL1H-O-hjTbW?usp=sharing#scrollTo=z7U9dbyLTFOD)) ([GitHub](https://github.com/jerryjliu/llama_index/tree/main))

In [1]:
import openai
import environ
from llama_index import SimpleDirectoryReader
from llama_index import VectorStoreIndex
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine
from llama_hub.wikipedia.base import WikipediaReader
import wikipedia

import nest_asyncio
nest_asyncio.apply()

In [2]:
env = environ.Env()
environ.Env.read_env()
API_KEY = env('OPENAI_API_KEY')
openai.api_key = API_KEY



# Data Connectors (Llama Hub)

See [Llama Hub](https://llamahub.ai/).

In [3]:
list_wiki_pages = wikipedia.search("Migration to Germany")
list_wiki_pages

['Immigration to Germany',
 'British migration to Germany',
 'Migration from Ghana to Germany',
 'Migration Period',
 'Germany',
 'Foreign-born population of the United Kingdom',
 'Romani people in Germany',
 'Germans in the United Kingdom',
 'Islam in Germany',
 'Human migration']

In [4]:
def select_pages(list_wiki_pages: list) -> list:
    """
    TODO: select the subset of pages that contain information relevant to the topic.

    Args:
        list_wiki_pages (list): Titles of the Wikipedia pages returned by the search.

    Returns:
        list: The subset of Wikipedia page titles that are relevant to the topic.
    """
    # Placeholder: no filtering yet, so every page is kept.
    subset_list_wiki_pages = list_wiki_pages
    return subset_list_wiki_pages
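
One possible way to fill in the TODO above is a plain keyword filter on the page titles. This is only a sketch: the helper name `select_pages_by_terms` and the rule "a title must contain every required term" are assumptions made for illustration, not part of the original notebook, and a real selection step might instead inspect page summaries or ask an LLM to rank the titles.

```python
from typing import List


def select_pages_by_terms(titles: List[str], required_terms: List[str]) -> List[str]:
    """Keep only titles that contain every required term (case-insensitive).

    Hypothetical helper: the matching rule is an assumption, not the
    notebook's actual selection logic.
    """
    terms = [term.lower() for term in required_terms]
    return [
        title
        for title in titles
        if all(term in title.lower() for term in terms)
    ]


# A few of the titles returned by the Wikipedia search above.
example_titles = [
    "Immigration to Germany",
    "British migration to Germany",
    "Migration from Ghana to Germany",
    "Migration Period",
    "Germany",
    "Islam in Germany",
    "Human migration",
]

selected_titles = select_pages_by_terms(example_titles, ["migration", "germany"])
print(selected_titles)
```

Substring matching is deliberately loose here, so "Immigration" also satisfies the term "migration". Whether that is desirable depends on the topic.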

In [5]:
subset_list_wiki_pages = select_pages(list_wiki_pages)

In [6]:
loader = WikipediaReader()
# Load only the selected subset of pages (currently identical to the full list).
documents = loader.load_data(pages=subset_list_wiki_pages)

# Basic Query

In [7]:
# Build an index for the Document objects.
index = VectorStoreIndex.from_documents(documents)

In [8]:
# Query the index.
query_engine = index.as_query_engine()
response = query_engine.query("How many move to Germany each year?")

In [9]:
print(response)

Over 1 million people move to Germany each year since 2013.


# Query Documents

In [10]:
doc_1 = SimpleDirectoryReader(input_files=["docs/KS-09-23-223-EN-N.pdf"]).load_data()

In [11]:
doc_1

[Document(id_='7d97fcb1-e503-48f0-a791-4b0a76e9ab29', embedding=None, metadata={'page_label': 'I', 'file_name': 'KS-09-23-223-EN-N.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='339e51f0ad48abc021dffab32821312b91d7b3640a4c066958000f52c4226bd9', text='European Migration Net work \nAnnual Report \non Migration and \nAsylum 2022\nStatistical Annex\nCo-produced by Eurostat\nand the European Migration \nNetwork\nJune 2023', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='e8e46b34-7158-48da-abd4-c79453eefbf6', embedding=None, metadata={'page_label': 'II', 'file_name': 'KS-09-23-223-EN-N.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='fda3071c151b12799eb8646c317874ce5b4122f4c20793cd83e9fbf1a26d7c94', text='', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{conte

In [12]:
doc_1_index = VectorStoreIndex.from_documents(doc_1)

In [13]:
doc_1_engine = doc_1_index.as_query_engine(similarity_top_k=3)
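
The `similarity_top_k=3` argument asks the retriever to pass the three most similar chunks to the LLM. The idea can be sketched in plain Python, assuming cosine similarity as the scoring function and using made-up two-dimensional vectors in place of real embeddings. The names `cosine_similarity` and `top_k_indices` are illustrative and are not LlamaIndex functions.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_indices(query_vec: List[float], chunk_vecs: List[List[float]], k: int = 3) -> List[int]:
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" (assumed values, for illustration only).
toy_chunks = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.9, 0.1],
    [0.5, 0.5],
    [-1.0, 0.0],
]
toy_query = [1.0, 0.0]

top_indices = top_k_indices(toy_query, toy_chunks, k=3)
print(top_indices)
```

A larger `k` gives the LLM more context to answer from, at the cost of a longer prompt and possibly less relevant chunks.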

In [14]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=doc_1_engine,
        metadata=ToolMetadata(name='report_2022', description='Annual Report on Migration and Asylum 2022')
    ),
]

In [15]:
# Given a query, SubQuestionQueryEngine generates a "query plan" of sub-queries
# against the registered tools (sub-documents) before synthesizing the final answer.
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
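
The plan-then-synthesize flow described in the comment can be illustrated without any LLM calls. The sketch below is a conceptual stand-in, not the LlamaIndex implementation: `plan_sub_questions`, `answer_with_sub_questions`, and `fake_report_engine` are hypothetical names, the planning step simply emits one sub-question per tool, and the synthesis step just concatenates the partial answers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str
    question: str


def plan_sub_questions(query: str, tool_descriptions: Dict[str, str]) -> List[SubQuestion]:
    """Stand-in for the LLM planning step: one sub-question per tool."""
    return [
        SubQuestion(name, f"{query} (according to: {description})")
        for name, description in tool_descriptions.items()
    ]


def answer_with_sub_questions(
    query: str,
    tools: Dict[str, Callable[[str], str]],
    tool_descriptions: Dict[str, str],
) -> str:
    """Route each sub-question to its tool, then combine the partial answers."""
    plan = plan_sub_questions(query, tool_descriptions)
    partial_answers = [(sq, tools[sq.tool_name](sq.question)) for sq in plan]
    # Stand-in for LLM synthesis: concatenate the partial answers.
    return "\n".join(f"[{sq.tool_name}] {answer}" for sq, answer in partial_answers)


def fake_report_engine(question: str) -> str:
    """Hypothetical query engine that returns a canned string."""
    return "stub answer for: " + question


toy_tools = {"report_2022": fake_report_engine}
toy_descriptions = {"report_2022": "Annual Report on Migration and Asylum 2022"}

toy_answer = answer_with_sub_questions(
    "How many people received residence permits?", toy_tools, toy_descriptions
)
print(toy_answer)
```

With a single tool the plan contains a single sub-question, so the approach mainly pays off once several documents are registered as separate tools.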

In [16]:
response = s_engine.query("How many people received residence permits?")

Generated 1 sub questions.
[report_2022] Q: How many people received residence permits according to the Annual Report on Migration and Asylum 2022?
[report_2022] A: Based on the Annual Report on Migration and Asylum 2022, the number of people who received residence permits is not explicitly mentioned in the provided context information.

# Hypothetical document embeddings (HyDE)