# Demo

## Instantiate database

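Connect to the database as shown below. By default, `distyll` starts an embedded Weaviate instance; you can pass in a client for a custom instance instead.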

In [1]:
import distyll
import os

# ===== OPTION 1: Default Weaviate (Embedded) =====
db = distyll.DBConnection()

# ===== OPTION 2: Custom Weaviate instance =====
# import weaviate
#
# client = weaviate.Client(
#     url=os.environ['JP_WCS_URL'],
#     auth_client_secret=weaviate.AuthApiKey(os.environ['JP_WCS_ADMIN_KEY']),
# )
# db = distyll.DBConnection(client=client)

# Set OpenAI API key
db.set_apikey(openai_key=os.environ["OPENAI_APIKEY"])

Started /Users/jphwang/.cache/weaviate-embedded: process ID 63681


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-09-13T14:50:49+01:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-09-13T14:50:49+01:00"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"chunk_36d7fKskIRov","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-09-13T14:50:49+01:00","took":1233542}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"datachunk_KXbH2V85PoHM","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-09-13T14:50:49+01:00","took":1392417}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"datasource_Fwu3NBT1aiZr","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-09-13
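
The cell above reads `OPENAI_APIKEY` (and, for the custom instance, `JP_WCS_URL` and `JP_WCS_ADMIN_KEY`) from the environment, so a missing variable surfaces as a `KeyError`. A minimal sketch of a pre-flight check follows; `missing_env_vars` is a hypothetical helper for illustration and is not part of `distyll` or `weaviate`.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    `environ` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Variable name taken from the notebook cell above (embedded option only).
REQUIRED = ["OPENAI_APIKEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

For the custom-instance option, add `JP_WCS_URL` and `JP_WCS_ADMIN_KEY` to `REQUIRED` before running the check.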

## arXiv example

In [2]:
for pdf_url in [
    # 'https://arxiv.org/pdf/1706.03762',  # Attention is all you need
    'https://arxiv.org/pdf/2305.15334',  # Gorilla
    # "https://arxiv.org/pdf/2201.11903",  # Chain of thought prompting paper
]:
    db.add_arxiv(pdf_url)
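
The loop above passes `/pdf/` URLs to `add_arxiv`. If you start from an abstract-page link, a small helper can rewrite it into the same form. This is only a sketch: `to_arxiv_pdf_url` is a hypothetical helper, not part of `distyll`, and whether `add_arxiv` accepts other URL forms on its own is not confirmed here. It handles new-style arXiv identifiers (such as `2305.15334`) only.

```python
import re

# Matches new-style arXiv IDs in /abs/ or /pdf/ URLs, with optional version and ".pdf" suffix.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?/?$")


def to_arxiv_pdf_url(url):
    """Rewrite an arXiv abstract or PDF URL into the /pdf/<id> form used above."""
    match = ARXIV_ID.search(url.strip())
    if match is None:
        raise ValueError(f"Not a recognised arXiv URL: {url!r}")
    paper_id = match.group(1)
    version = match.group(2) or ""  # keep an explicit version such as "v2" if present
    return f"https://arxiv.org/pdf/{paper_id}{version}"


print(to_arxiv_pdf_url("https://arxiv.org/abs/2305.15334"))
```

The result can then be passed to `db.add_arxiv(...)` in the same way as the hard-coded URLs.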

In [3]:
pdf_url = 'https://arxiv.org/pdf/2305.15334'

In [4]:
response = db.query_summary(
    prompt="In bullet points, tell me what this material describes",
    object_path=pdf_url
)
print(response.generated_text)

- The material describes a large language model called Gorilla that effectively uses tools through API calls.
- It introduces a comprehensive dataset called APIBench and evaluates Gorilla's performance in using tools and adapting to document changes.
- Gorilla outperforms other models in API functionality accuracy and reduces hallucination errors.
- The material addresses the challenges of integrating tools into language models and proposes a methodology that includes self-instruct fine-tuning and retrieval.
- It covers various topics related to language models and their interaction with APIs, including the performance of retrieval methods, benefits of fine-tuning with retrievers, issue of hallucination errors, challenge of API documentation changes, and the LLM's ability to understand constraints in API calls.
- The material discusses the limitations and social impacts of the research, including potential bias in ML APIs.
- In summary, the material provides valuable insights into the 

In [5]:
prompt = "how does gorilla work? explain in simple language"

response = db.query_chunks(
    prompt=prompt,
    search_query="gorilla algorithm",
    object_path=pdf_url,
)
print(response.generated_text)

Gorilla is a large language model (LLM) that is connected with massive APIs (Application Programming Interfaces). It has been developed to improve the capabilities of LLMs in tasks such as natural dialogue, mathematical reasoning, and program synthesis.

Gorilla's primary focus is on generating reliable API calls to machine learning models without making up false information. It can adapt to changes in API usage during testing and can satisfy constraints while selecting APIs.

The model used in Gorilla has been fine-tuned to surpass the performance of other LLMs, including GPT-4, in writing API calls. When combined with a document retriever, Gorilla can also adapt to changes in documents, allowing for flexible user updates or version changes.

One of the challenges in LLMs is generating accurate input arguments and avoiding incorrect usage of API calls. Gorilla addresses this challenge and improves accuracy while reducing false information.

Gorilla has been compared to other state-of-

## Notes

Optionally, you can connect to a specific Weaviate instance and pass the OpenAI API key through the client's request headers:

In [6]:
# import weaviate
# import os
# client = weaviate.Client(
#     url=os.environ['JP_WCS_URL'],
#     auth_client_secret=weaviate.AuthApiKey(os.environ['JP_WCS_ADMIN_KEY']),
#     additional_headers={
#         'X-OpenAI-Api-Key': os.environ['OPENAI_APIKEY']
#     }
# )
#
# import distyll
# db = distyll.DBConnection(client=client)