# Demo

## Instantiate database

Connect to the database by creating a `DBConnection`.

In [1]:
import distyll
db = distyll.DBConnection()

embedded weaviate is already listening on port 6666


## YouTube video example

### Add data

Let's add data from a YouTube video.

In [2]:
youtube_url = "https://youtu.be/sNw40lEhaIQ"

In [3]:
db.add_from_youtube(youtube_url)

52

### Query data

Now we can query the video by passing its URL as `object_path`.

In [4]:
response = db.query_summary(
    prompt="In bullet points, tell me what this material describes",
    object_path=youtube_url
)

In [5]:
print(response.generated_text)

- The material is part of a series discussing contextual representations, specifically focusing on the GPT transformer-based architecture.
- It covers topics such as autoregressive loss function, token representation, hidden representation, language modeling, transformer architecture, and masking.
- The concept of self-attention and its role in allowing the model to look back at previous positions in a sequence when making predictions is explained.
- The training process for a GPT-style model using "teacher forcing" is discussed, highlighting its significance.
- The material briefly touches on the process of sequence generation and different strategies for sampling tokens.
- It mentions that there are different versions of GPT and alternative models available.
- The information provided is based on the text and may vary or become outdated over time.


In [6]:
response = db.query_chunks(
    prompt="In bullet points, tell me what this material describes",
    search_query="open source models",
    object_path=youtube_url,
)

In [7]:
print(response.generated_text)

- The material describes contextual representations and the GPT (transformer-based architecture).
- It explains the process of training a GPT-style model with teacher forcing.
- It discusses how language modeling predicts scores over the entire vocabulary and makes a choice about which token to use.
- It mentions the generation process and the decision rule applied to the model's representations.
- It provides information about the structure and parameters of different GPT models, including GPT-1, GPT-2, and GPT-3.
- It mentions alternative open-source models, including the Bloom model.


## arXiv example

In [8]:
# Alternative paper ("Attention Is All You Need"):
# pdf_url = 'https://arxiv.org/pdf/1706.03762'
pdf_url = 'https://arxiv.org/pdf/2305.15334'  # the Gorilla paper queried below
db.add_pdf(pdf_url)

243

In [9]:
response = db.query_summary(
    prompt="In bullet points, tell me what this material describes",
    object_path=pdf_url
)
print(response.generated_text)

- The development of a model called Gorilla
- Comparison of Gorilla's performance to GPT-4
- Gorilla's adaptability to document changes and ability to mitigate hallucination issues
- Introduction of the APIBench dataset for evaluating LLMs' accuracy in using APIs
- Integration of a retrieval system with Gorilla to improve LLMs' accuracy in using tools and updated documentation
- Focus on enhancing the effectiveness and adaptability of LLMs in using APIs
- Training with constraints
- Different retrieval techniques
- Impact of using optimal retrievers
- Suggestion of using a better retriever for finetuning, but zero-shot finetuning as an alternative when a good retriever is not available
- Mention of program synthesis and neural networks in program synthesis
- Application of language models in various tasks.


In [10]:
response = db.query_chunks(
    prompt="how does gorilla work?",
    search_query="gorilla algorithm",
    object_path=pdf_url,
)
print(response.generated_text)

Gorilla is a finetuned LLaMA-based model that is designed to improve the performance of large language models (LLMs) such as GPT-4 when it comes to generating accurate input arguments and avoiding hallucination errors in API calls. It surpasses the performance of GPT-4 in writing API calls and can adapt to test-time document changes, allowing for flexible user updates or version changes. Gorilla integrates a retrieval system, which helps LLMs use tools more accurately and keep up with frequently updated documentation, enhancing the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu.

The process of how Gorilla works involves collecting an API dataset and generating instruction-answer pairs. It incorporates an information-retriever into the training and inference pipelines, enabling the model to adapt to changes in API documentation. Gorilla is a retrieve-aware finetuned LLaMA-7B model specifically desig

In [11]:
response = db.query_chunks(
    prompt="how does gorilla reduce hallucination in LLMs?",
    search_query="gorilla algorithm",
    object_path=pdf_url,
)
print(response.generated_text)

Gorilla reduces hallucination in LLMs by using a retrieval system and a finetuned LLaMA-based model. The retrieval system helps Gorilla adapt to test-time document changes, allowing for flexible user updates or version changes. This reduces the tendency of LLMs to generate inaccurate input arguments and hallucinate the wrong usage of an API call. Additionally, Gorilla's retrieval-aware training enables the model to understand and reason about constraints, further mitigating the issue of hallucination. Overall, Gorilla improves the accuracy of API functionality while reducing hallucination errors in LLMs.
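Both examples follow the same pattern: add a source, then call `query_summary` with that source as `object_path`. A minimal sketch of wrapping this pattern in a helper is shown here. `summarize_sources` is a hypothetical name and not part of distyll; it assumes only the `query_summary(prompt=..., object_path=...)` call and the `generated_text` attribute used in the cells above.

```python
from typing import Any, Dict, Iterable

# The prompt used in the summary cells above.
DEFAULT_PROMPT = "In bullet points, tell me what this material describes"


def summarize_sources(
    db: Any,
    object_paths: Iterable[str],
    prompt: str = DEFAULT_PROMPT,
) -> Dict[str, str]:
    """Hypothetical helper: run one summary prompt against several sources.

    `db` is assumed to expose `query_summary(prompt=..., object_path=...)`
    and to return an object with a `generated_text` attribute.
    """
    summaries: Dict[str, str] = {}
    for path in object_paths:
        response = db.query_summary(prompt=prompt, object_path=path)
        summaries[path] = response.generated_text
    return summaries
```

With the sources added earlier, this could be called as `summarize_sources(db, [youtube_url, pdf_url])`, which returns a dictionary mapping each source to its generated summary.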


## Notes

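Optionally, you can connect to a specific Weaviate instance by passing your own client to `DBConnection`. The cell below is commented out; it reads the instance URL and API keys from environment variables.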

In [12]:
# import weaviate
# import os
# client = weaviate.Client(
#     url=os.environ['JP_WCS_URL'],
#     auth_client_secret=weaviate.AuthApiKey(os.environ['JP_WCS_ADMIN_KEY']),
#     additional_headers={
#         'X-OpenAI-Api-Key': os.environ['OPENAI_APIKEY']
#     }
# )
#
# import distyll
# db = distyll.DBConnection(client=client)
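
The commented-out cell raises a `KeyError` if any of its environment variables is unset. As a rough sketch, you could check for them before building the client. `missing_env_vars` is a hypothetical helper, not part of distyll or Weaviate; the variable names are the ones used in the cell above.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the commented-out cell above.
REQUIRED_ENV_VARS = ("JP_WCS_URL", "JP_WCS_ADMIN_KEY", "OPENAI_APIKEY")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: list required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

If `missing_env_vars()` returns an empty list, all three variables are set and the client cell can be uncommented and run.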