# 04. Pinecone Query

Query the index created in notebook 03.

In [22]:
import os
import openai
import pinecone

from tqdm.auto import tqdm
from dotenv import load_dotenv
from IPython.display import Markdown, display

## Configure Environment

In [23]:
load_dotenv()

True

In [24]:
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_CHAT_MODEL = os.getenv('OPENAI_CHAT_MODEL', 'gpt-3.5-turbo')
OPENAI_EMBED_MODEL = os.getenv('OPENAI_EMBED_MODEL', 'text-embedding-ada-002')

PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
PINECONE_INDEX = os.getenv('PINECONE_INDEX', 'openai-dsm100-2022-oct-transcriptions')
PINECONE_ENV = os.getenv('PINECONE_ENV', 'us-east1-gcp')
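`os.getenv` returns `None` when a variable is unset. A missing key in `.env` therefore only surfaces later, as an authentication error from OpenAI or Pinecone. Below is a minimal sketch of failing fast instead. `require_env` is a hypothetical helper and not part of either client library.

```python
import os


def require_env(*names: str) -> dict:
    """Return the named environment variables, raising if any is unset or empty."""
    # hypothetical helper: collect every missing name so one error lists them all
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage (hypothetical): run before openai / pinecone are configured
# keys = require_env('OPENAI_API_KEY', 'PINECONE_API_KEY')
```

Only the two API keys need this treatment. The model, index, and environment names already fall back to defaults.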

In [25]:
# configure openai
openai.api_key = OPENAI_API_KEY

# configure pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENV
)

## Define Sample Questions

In [26]:
question01 = """
a. AI has a long history with ups and downs, over-optimistic expectations, and the
industrial period. At present, AI is in the era of:

Select ALL statements that apply.

[3 marks]
i. Big data.
ii. Deep learning.
iii. Robotics.
iv. General AI.
"""

In [27]:
question02 = """
f. The more background knowledge an intelligent agent can use – the more intelligent
behaviour it can exhibit. Background knowledge can include:
[3 marks]

Select ALL statements that apply. 

(i). Problem-specific knowledge
(ii). Declarative knowledge
(iii). Procedural knowledge
(iv). A dataset
"""

## Functions

In [28]:
def get_pinecone_index(index_name: str) -> pinecone.index.Index:
    # connect to index
    index = pinecone.Index(index_name)
    
    # view index stats
    print(index.describe_index_stats())  

    return index  

# test the function
index = get_pinecone_index(PINECONE_INDEX)

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 100}},
 'total_vector_count': 100}


In [29]:
def get_text_embeddings(text: str) -> list:
    # embed a single string with the configured OpenAI embedding model
    result = openai.Embedding.create(
        input=[text],
        engine=OPENAI_EMBED_MODEL
    )

    return result['data'][0]['embedding']

# test the function
#get_text_embeddings(question01)
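Pinecone ranks matches using the similarity metric the index was created with. Notebook 03 is not shown here, so cosine is an assumption. It is a common pairing with `text-embedding-ada-002`, whose vectors OpenAI documents as normalized to length 1. Here is a plain-Python sketch of cosine similarity. It also checks that the two vectors have the same dimension (the index above reports `dimension: 1536`).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        # e.g. a 1536-dim ada-002 embedding queried against an index of another size
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0 (same direction)
```

For unit-length vectors the denominator is 1, so cosine similarity reduces to the dot product.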

In [32]:
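def query_pinecone_index(index: pinecone.index.Index,
                         query: str,
                         k: int = 5,
                         include_metadata: bool = True):
    # embed the query, then fetch the k nearest stored vectors; each entry in
    # the response's 'matches' carries an id, a score and (optionally) metadata
    query_embedding = get_text_embeddings(query)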

    # query index
    return index.query(
        vector=query_embedding,
        top_k=k,
        include_metadata=include_metadata
    )

# test the function
#query_pinecone_index(index, question02)
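`index.query` returns the `top_k` stored vectors closest to the query embedding. Each match carries an `id` and a `score`, plus its `metadata` when `include_metadata=True`. That is why `create_prompt` reads `x['metadata']['text']` from `['matches']`. Below is an exact, in-memory sketch of that lookup, assuming a cosine metric. Pinecone itself uses approximate nearest-neighbour search. `brute_force_query` and the dict layout of `vectors` are hypothetical.

```python
import heapq
import math
from typing import Dict, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_query(vectors: Dict[str, dict], query: Sequence[float], top_k: int = 5) -> dict:
    """Score every stored vector against `query` and keep the top_k best.

    `vectors` maps an id to {'values': [...], 'metadata': {...}} (hypothetical layout).
    """
    scored = (
        (_cosine(item['values'], query), vec_id, item['metadata'])
        for vec_id, item in vectors.items()
    )
    # nlargest with a key never compares the metadata dicts, so ties are safe
    best = heapq.nlargest(top_k, scored, key=lambda t: t[0])
    return {
        'matches': [
            {'id': vec_id, 'score': score, 'metadata': metadata}
            for score, vec_id, metadata in best
        ]
    }


store = {
    'a': {'values': [1.0, 0.0], 'metadata': {'text': 'about search'}},
    'b': {'values': [0.0, 1.0], 'metadata': {'text': 'about knowledge'}},
    'c': {'values': [0.7, 0.7], 'metadata': {'text': 'about both'}},
}
result = brute_force_query(store, [0.0, 1.0], top_k=2)
print([m['id'] for m in result['matches']])  # ['b', 'c']
```

Exact search costs one similarity computation per stored vector. That is trivial for the 100 vectors in this index, but it is why managed indexes switch to approximate methods at scale.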

In [42]:
def create_prompt(index: pinecone.index.Index,
                  query: str, 
                  prompt_limit: int = 3750) -> str:
    prompt = ""
    
    # query pinecone for matching transcripts
    pinecone_results = query_pinecone_index(index, query)

    # get relevant contexts
    contexts = [
        x['metadata']['text'] for x in pinecone_results['matches']
    ]

    # build our prompt with the retrieved contexts included
    prompt_start = (
        "Answer the question based on the context below.\n\n"+
        "Context:\n"
    )
    
    prompt_end = (
        f"\n\nQuestion: {query}\nAnswer:"
    )

    # append contexts in ranked order until the joined text would reach
    # prompt_limit (measured in characters, not tokens)
    separator = "\n\n---\n\n"
    included = contexts
    for i in range(1, len(contexts) + 1):
        if len(separator.join(contexts[:i])) >= prompt_limit:
            # the i-th context crosses the limit, so keep only the first i-1
            included = contexts[:i - 1]
            break

    prompt = prompt_start + separator.join(included) + prompt_end

    return prompt

# test the function
prompt = create_prompt(index, question02)
display(Markdown(prompt))



Answer the question based on the context below.

Context:
[music]In this topic, we discussed how intelligent agents can find solutions and make decisions.  We can see that uninformed or blind search, informed or heuristic search, and we also discuss that domain-specific knowledge can improve decision-making substantially.  The more agent knows and remembers from the past, the more intelligent behavior it can exhibit.  We also talked about meta-learning or meta-reasoning.  When different search strategies are analyzed and compared, in what situation, what strategy would work best.  If you're interested in this topic, please check the recommended literature.  Of course, there are plenty of resources available in our online library and around. 

---

[music]-In this mini lecture, we will talk about knowledge-based agents.  We have already seen in the previous topics that knowledge makes an agent more intelligent, be it domain-specific knowledge, or knowledge of the past experience, what worked well and what did not.  The more an agent knows and remembers, the better decisions it can make.  Agents that are using knowledge for making decisions are called knowledge-based agents.  This component of a knowledge-based agent is called, unsurprisingly, a knowledge base.  Knowledge has to be encoded in a machine processable way, usually, using knowledge representation languages.  What is knowledge?Generally, there is no clear separation in the means between data, knowledge, and information.  It is often the case even in computer science.  It is, well, stating explicitly what we will consider as such in this module.  Data are facts.  Typically, data are encoded as a table.  For example, if a table stores various chemical element sand their freezing and melting points, then an example of a data item or a fact would be, melting point of helium is -272.  2 celsius.  Typically, knowledge is encoded as a set of rules.  For example, if the temperature is below the melting point, then the chemical substance is in a liquid state.  More advanced knowledge can be represented in the form of an executable model.  For example, a metabolic model captures our knowledge about the metabolic pathways, and molecular mechanisms of a particular organism.  Information is a more generic term that combines data and knowledge.  Problems-specific and domain specific knowledge is called background knowledge.  This can be available to a knowledge-based agent from the very beginning

Question: 
f. The more background knowledge an intelligent agent can use – the more intelligent
behaviour it can exhibit. Background knowledge can include:
[3 marks]

Select ALL statements that apply. 

(i). Problem-specific knowledge
(ii). Declarative knowledge
(iii). Procedural knowledge
(iv). A dataset

Answer:
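The packing rule in `create_prompt` is easier to test without Pinecone. Below is a sketch of it as pure functions. `pack_contexts` and `build_prompt` are hypothetical names. The sketch keeps whole contexts in ranked order for as long as the joined string stays under the limit. A single retrieved context is handled like any other. As in the notebook, the limit counts characters rather than tokens.

```python
from typing import List

SEPARATOR = "\n\n---\n\n"


def pack_contexts(contexts: List[str], limit: int, sep: str = SEPARATOR) -> str:
    """Join the longest prefix of `contexts` whose joined length stays below `limit`."""
    kept: List[str] = []
    for context in contexts:
        if len(sep.join(kept + [context])) >= limit:
            break  # adding this context would reach the limit; stop here
        kept.append(context)
    return sep.join(kept)


def build_prompt(contexts: List[str], query: str, limit: int = 3750) -> str:
    """Assemble the same prompt layout as create_prompt from already-retrieved contexts."""
    return (
        "Answer the question based on the context below.\n\n"
        "Context:\n"
        + pack_contexts(contexts, limit)
        + f"\n\nQuestion: {query}\nAnswer:"
    )


# "aaaa" + separator (7 chars) + "bbbb" is 15 characters long
print(repr(pack_contexts(["aaaa", "bbbb", "cccc"], limit=15)))  # 'aaaa'
print(repr(pack_contexts(["aaaa", "bbbb", "cccc"], limit=16)))  # 'aaaa\n\n---\n\nbbbb'
```

Stopping at the first context that does not fit, rather than skipping it and trying shorter ones, preserves Pinecone's relevance order in the prompt.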

In [47]:
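def ask_question(index: pinecone.index.Index,
                 query: str,
                 prompt_limit: int = 3750) -> str:

    # create prompt: retrieve matching transcripts and pack them as context
    prompt = create_prompt(index, query, prompt_limit)

    # send the prompt to the chat model as a single user message
    response = openai.ChatCompletion.create(
        model=OPENAI_CHAT_MODEL,
        messages=[
            {'role': 'user', 'content': prompt}
        ]
    )

    # return the text of the first (and, by default, only) completion choice
    return response['choices'][0]['message']['content']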

# test the function
answer = ask_question(index, question02)
display(Markdown('Question:' + question02))
display(Markdown('Answer: ' + answer))

Question:
f. The more background knowledge an intelligent agent can use – the more intelligent
behaviour it can exhibit. Background knowledge can include:
[3 marks]

Select ALL statements that apply. 

(i). Problem-specific knowledge
(ii). Declarative knowledge
(iii). Procedural knowledge
(iv). A dataset


Answer: (i). Problem-specific knowledge
(ii). Declarative knowledge
(iii). Procedural knowledge
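`ask_question` chains three remote steps. It embeds the question, queries Pinecone, and then sends the assembled prompt to the chat model. Below is a sketch of the same flow with each step passed in as a function, so the orchestration can be exercised offline with stand-ins. `answer_with_rag` and the `fake_*` callables are hypothetical. In the notebook, the real steps would be `get_text_embeddings`, `index.query`, the prompt assembly in `create_prompt`, and `openai.ChatCompletion.create`.

```python
from typing import Callable, List, Sequence


def answer_with_rag(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], List[str]],
    make_prompt: Callable[[List[str], str], str],
    complete: Callable[[List[dict]], str],
    k: int = 5,
) -> str:
    """Embed -> retrieve -> build prompt -> complete, with every step injected."""
    query_vector = embed(question)                     # notebook: get_text_embeddings
    contexts = search(query_vector, k)                 # notebook: texts from index.query matches
    prompt = make_prompt(contexts, question)           # notebook: create_prompt's assembly
    messages = [{'role': 'user', 'content': prompt}]   # one user message, as in ask_question
    return complete(messages)                          # notebook: ChatCompletion content


calls = []


def fake_embed(text):
    calls.append('embed')
    return [0.1, 0.2]


def fake_search(vector, k):
    calls.append('search')
    return ['context one', 'context two'][:k]


def fake_prompt(contexts, question):
    calls.append('prompt')
    return "Context:\n" + "\n---\n".join(contexts) + f"\nQuestion: {question}\nAnswer:"


def fake_complete(messages):
    calls.append('complete')
    # echo the "Question: ..." line back so the flow is visible
    return "echo: " + messages[0]['content'].splitlines()[-2]


rag_answer = answer_with_rag("Which apply?", fake_embed, fake_search, fake_prompt, fake_complete)
print(rag_answer)  # echo: Question: Which apply?
print(calls)       # ['embed', 'search', 'prompt', 'complete']
```

Injecting the steps keeps the network calls at the edges, so the retrieval and prompt logic can be checked without API keys or quota.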

In [49]:
answer = ask_question(index, question01)
display(Markdown('Question:' + question01))
display(Markdown('Answer: ' + answer))

Question:
a. AI has a long history with ups and downs, over-optimistic expectations, and the
industrial period. At present, AI is in the era of:

Select ALL statements that apply.

[3 marks]
i. Big data.
ii. Deep learning.
iii. Robotics.
iv. General AI.


Answer: i. Big data.
ii. Deep learning.

## Question Definitions

In [50]:
questions = {}