# Part 1: Querying

If you want to use the OpenAI API during this tutorial, create a file `secret.py` containing `openai_api_key = "YOURKEYHERE"`.
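Because the key is optional, it can help to load it in a way that does not fail when `secret.py` is missing. The following is a minimal sketch, not part of the original tutorial: the helper name `load_openai_api_key` is hypothetical, and it assumes the key lives in a module-level variable called `openai_api_key`.

```python
import importlib


def load_openai_api_key(module_name="secret"):
    """Return `openai_api_key` from the given module, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # No such module on the import path: treat the key as not configured.
        return None
    # The module exists but may not define the variable.
    return getattr(module, "openai_api_key", None)


openai_api_key = load_openai_api_key()
```

With this approach, the local-only parts of the notebook can check `openai_api_key is None` and skip the OpenAI cells.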

## Setup

### Imports

In [None]:
import pprint
import os
import sys
from pathlib import Path

# logging for lazy people :)
from loguru import logger

# we're not importing specific methods or classes so it's clear when we actually call llama_index!
import llama_index

### Logging

In [None]:
logger.remove()
logger.add(sys.stdout, format="{time} - {level} - {message}", level="DEBUG")
logger.add("tutorial_part_1.log", level="DEBUG")

2

### Tell the notebook where files are stored

In [None]:
DATA_PATH = Path("data/pydata/schedule.json")
INDEX_PATH = Path("indices/pydata_schedule_index/")

### Tell `llama-index` to use a local embeddings model for retrieval

More information about this embeddings model: https://www.sbert.net/docs/pretrained_models.html#model-overview

Note that all-minilm-l6-v2 has a maximum input length of 256 tokens, so longer text is truncated before it is embedded.
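The 256-token limit is the reason the documents are cut into chunks of that size. As a rough illustration, the sketch below splits text into windows of at most 256 tokens. It is an assumption-laden stand-in: it counts whitespace-separated words rather than the model's real subword tokens, and `split_into_chunks` is a hypothetical helper, not the splitter `llama_index` uses.

```python
def split_into_chunks(text, max_tokens=256):
    """Split text into consecutive chunks of at most `max_tokens` whitespace tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


# 600 words are split into two full chunks and one remainder.
chunks = split_into_chunks("word " * 600)
print([len(chunk.split()) for chunk in chunks])  # [256, 256, 88]
```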

In [None]:
embed_model = "local:sentence-transformers/all-minilm-l6-v2"
llm = None

In [None]:
service_context = llama_index.ServiceContext.from_defaults(
    embed_model=embed_model, chunk_size=256, llm=llm
)

LLM is explicitly disabled. Using MockLLM.


### Load a vector index with the PyData Amsterdam 2023 schedule

Load the index from file.

In [None]:
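# Only attempt to load the index if the persist directory exists.
if not os.path.exists(INDEX_PATH):
    logger.error("Index file for part 1 does not exist on disk. :(")
else:
    try:
        # Rebuild the storage context from the persisted files, then load the index from it.
        storage_context = llama_index.StorageContext.from_defaults(persist_dir=INDEX_PATH)
        index = llama_index.load_index_from_storage(storage_context, service_context=service_context)
        logger.info("Loaded index from local storage")
    except Exception as e:
        # Log the failure instead of raising, so the notebook keeps running.
        logger.error(e)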

2023-09-04T10:54:01.995195+0200 - INFO - Loaded index from local storage


## Exercises

In [None]:
# Create a retriever from the vector index `index`
retriever = index.as_retriever()

In [None]:
# Retrieve chunks that mention llama-index
results = retriever.retrieve("llama_index")

In [None]:
for result in results:
    print(result)
    print()

node=TextNode(id_='e5163ca0-dbf3-455c-9577-905893dc33a1', embedding=None, metadata={'title': 'Building a personal search engine with llama-index', 'speakers': 'Judith van Stegeren, Yorick van Pelt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f71f1eae-c12e-5b4f-8902-b8d0b8a38d72', node_type=None, metadata={'title': 'Building a personal search engine with llama-index', 'speakers': 'Judith van Stegeren, Yorick van Pelt'}, hash='ad9a94ef0dbb13afaa9a4a03c12f20c93873fc7fab5cf57bc88176de147749a0'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='db5d421e-0e1d-4e6f-98b0-a98866ff9d61', node_type=None, metadata={'title': 'Building a personal search engine with llama-index', 'speakers': 'Judith van Stegeren, Yorick van Pelt'}, hash='2d9407d1e16cbae760a7599b871c7d5e22787d217f522a204e2c310b0d68a0cf')}, hash='e279497d370f982bc12a34c7f0eb19e888975d85b14b5a4c3218d733c2c7d55e', text='For the demo ap

In [None]:
# Retrieve chunks related to causal machine learning
results = retriever.retrieve("causal")
for result in results:
    print(result)
    print()

node=TextNode(id_='65b097e3-7cab-42d1-85b6-a850b1a4d33c', embedding=None, metadata={'title': 'The proof of the pudding is in the (way of) eating: quasi-experimental methods of causal inference and their practical pitfalls', 'speakers': 'Jakob Willisch'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e5be9c09-4c7c-5a3f-b1f8-2c7ecec74cd8', node_type=None, metadata={'title': 'The proof of the pudding is in the (way of) eating: quasi-experimental methods of causal inference and their practical pitfalls', 'speakers': 'Jakob Willisch'}, hash='7ea1436f06df89779e327b88d70a9064aeb40f842a522e502f74727cf3287d1a'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='a3b902b3-1144-44b8-9ae9-0b53e7029138', node_type=None, metadata={'title': 'The proof of the pudding is in the (way of) eating: quasi-experimental methods of causal inference and their practical pitfalls', 'speakers': 'Jakob Willisch'}, hash='bf

In [None]:
# Print the talk title and speaker for results related to causal machine learning
for result in results:
    print(result.node.metadata)
    print()

{'title': 'The proof of the pudding is in the (way of) eating: quasi-experimental methods of causal inference and their practical pitfalls', 'speakers': 'Jakob Willisch'}

{'title': "Causal Inference Libraries: What They Do, What I'd Like Them To Do", 'speakers': 'Kevin Klein'}

{'title': "Causal Inference Libraries: What They Do, What I'd Like Them To Do", 'speakers': 'Kevin Klein'}

{'title': 'The proof of the pudding is in the (way of) eating: quasi-experimental methods of causal inference and their practical pitfalls', 'speakers': 'Jakob Willisch'}

{'title': 'Personalization at Uber scale via causal-driven machine learning', 'speakers': 'Okke van der Wal'}

{'title': 'Staggered Difference-in-Differences in Practice: Causal Insights from the Music Industry', 'speakers': 'Nazli M. Alagoz'}

{'title': 'Staggered Difference-in-Differences in Practice: Causal Insights from the Music Industry', 'speakers': 'Nazli M. Alagoz'}

{'title': 'Personalization at Uber scale via causal-driven ma

In [None]:
# Create a new retriever that retrieves more than 2 results.
retriever = index.as_retriever(similarity_top_k=10)
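
Conceptually, `similarity_top_k` sets how many of the best-scoring chunks the retriever returns. The sketch below illustrates top-k selection by cosine similarity in plain Python. It is not `llama_index`'s implementation: the function names and the toy 2-D vectors are made up for illustration, whereas real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=2):
    """Return the k (chunk_id, score) pairs with the highest similarity, best first."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings: chunk_a points the same way as the query, chunk_c is orthogonal.
toy_vectors = {"chunk_a": [1.0, 0.0], "chunk_b": [0.8, 0.6], "chunk_c": [0.0, 1.0]}
print([chunk_id for chunk_id, _ in top_k([1.0, 0.0], toy_vectors, k=2)])  # ['chunk_a', 'chunk_b']
```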

In [None]:
# Find all talks about causal inference at PyData
results = retriever.retrieve('causal')
talks = {(r.node.metadata['title'], r.node.metadata['speakers']) for r in results}
pprint.pprint(talks)

{("Causal Inference Libraries: What They Do, What I'd Like Them To Do",
  'Kevin Klein'),
 ('Personalization at Uber scale via causal-driven machine learning',
  'Okke van der Wal'),
 ('Staggered Difference-in-Differences in Practice: Causal Insights from the '
  'Music Industry',
  'Nazli M. Alagoz'),
 ('The proof of the pudding is in the (way of) eating: quasi-experimental '
  'methods of causal inference and their practical pitfalls',
  'Jakob Willisch')}
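
A `set` removes the duplicate talks but also discards the order in which they were retrieved. If the ranking matters, one option is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each item in order. The sketch below uses stand-in data rather than real retrieval results, and `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(pairs):
    """Drop duplicate items while keeping the first occurrence of each, in order."""
    return list(dict.fromkeys(pairs))


# Stand-in for (title, speakers) tuples in ranked retrieval order.
ranked = [("Talk A", "Speaker 1"), ("Talk B", "Speaker 2"), ("Talk A", "Speaker 1")]
print(unique_in_order(ranked))  # [('Talk A', 'Speaker 1'), ('Talk B', 'Speaker 2')]
```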


## Querying the vector index with an external LLM

### Set OpenAI API key (optional)

In [None]:
from secret import openai_api_key

### Use OpenAI's gpt-3.5-turbo for querying

In [None]:
llm = llama_index.llms.OpenAI(model="gpt-3.5-turbo", api_key=openai_api_key)
service_context = llama_index.ServiceContext.from_defaults(
    embed_model=embed_model, chunk_size=256, llm=llm
)

NameError: name 'openai_api_key' is not defined

In [None]:
# Reload the index so that it uses the service context created above.
if not os.path.exists(INDEX_PATH):
    logger.error("Index file for part 1 does not exist on disk. :(")
else:
    try:
        storage_context = llama_index.StorageContext.from_defaults(persist_dir=INDEX_PATH)
        index = llama_index.load_index_from_storage(storage_context, service_context=service_context)
        logger.info("Loaded index from local storage")
    except Exception as e:
        # Log the failure instead of raising, so the notebook keeps running.
        logger.error(e)

2023-08-30T16:38:04.830107+0200 - INFO - Loaded index from local storage


In [None]:
# Create a query engine
query_engine = index.as_query_engine()

In [None]:
# Find talks that might be interesting for startup founders
response = query_engine.query("Which talks are probably interesting for startup founders?")

In [None]:
print(response.response)

Context information is below.
---------------------
title: Kickstart AI sponsored drinks [time & location TBD]

Kickstart AI sponsored drinks [time & location TBD]

Kickstart AI is a foundation powered by a coalition of iconic Dutch brands (Ahold Delhaize, ING, KLM and NS). Their mission is to accelerate AI adoption in the Netherlands, and improve society through the use of AI.

Lorem ipsum dolor

title: LLM Agents 101: How I Gave ChatGPT Access to My To-Do List

We'll showcase these solutions through amusing moments and challenges encountered during the development of my to-do list agent.- A summary of what works well and what still needs improvement

This talk is designed as an introduction to LLM agents.Throughout the presentation, I will aim to maintain a high-level perspective to ensure that even less technical audience members can grasp the concepts.To achieve this, I will share entertaining situations where my agent did something unexpected and how I resolved those issues.Howeve