# Demo

## Instantiate database

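Connect to the database as shown below. Called with no arguments, the connection uses an embedded Weaviate instance, as the cell output indicates.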

In [1]:
import distyll
db = distyll.DBConnection()

embedded weaviate is already listening on port 6666


Optionally, you can point the connection at a particular Weaviate instance by passing in your own client:

In [2]:
# import weaviate
# import os
# client = weaviate.Client(
#     url=os.environ['JP_WCS_URL'],
#     auth_client_secret=weaviate.AuthApiKey(os.environ['JP_WCS_ADMIN_KEY']),
#     additional_headers={
#         'X-OpenAI-Api-Key': os.environ['OPENAI_APIKEY']
#     }
# )
#
# import distyll
# db = distyll.DBConnection(client=client)
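
The commented-out cell above reads three environment variables with `os.environ[...]`, which raises `KeyError` if any of them is unset. As a minimal sketch, the check below reports missing variables before the client is created. The helper name `missing_env_vars` is hypothetical and not part of distyll or Weaviate; only the variable names are taken from the cell above.

```python
import os

# Variable names copied from the commented-out cell above.
REQUIRED_VARS = ("JP_WCS_URL", "JP_WCS_ADMIN_KEY", "OPENAI_APIKEY")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Set these before creating the client:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this check first turns a mid-cell `KeyError` into a single message listing everything that still needs to be set.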

## YouTube video example

### Add data

Let's add data from a YouTube video.

In [3]:
youtube_url = "https://youtu.be/sNw40lEhaIQ"

In [4]:
db.add_from_youtube(youtube_url)

52

### Query data

Now we can query it.

In [5]:
import query

In [6]:
response = query.generate_on_summary(
    db=db,
    prompt="In bullet points, tell me what this material describes",
    object_path=youtube_url
)

In [7]:
print(response.generated_text)

- The material is part of a series discussing contextual representations, specifically focusing on the GPT transformer-based architecture.
- It covers topics such as autoregressive loss function, token representation, hidden representation, language modeling, transformer architecture, and masking.
- The concept of self-attention and its role in allowing the model to look back at previous positions in a sequence when making predictions is explained.
- The training process for a GPT-style model using "teacher forcing" is discussed, highlighting its significance.
- The material briefly touches on the process of sequence generation and different strategies for sampling tokens.
- It mentions that there are different versions of GPT and alternative models available.
- The information provided is based on the text and may vary or become outdated over time.


In [8]:
response = query.generate_on_search(
    db=db,
    prompt="In bullet points, tell me what this material describes",
    search_query="open source models",
    object_path=youtube_url,
    limit=2
)

In [9]:
print(response.generated_text)

- The material describes a summary of open alternatives in the field of open source models.
- It mentions that the information provided may be outdated.
- It highlights the variety of models available in the open source community.
- It mentions the Bloom model, which has 176 billion parameters and is considered extremely large.
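
The `generate_on_search` call above takes a `search_query`, an `object_path`, and a `limit`. How distyll uses them internally is not shown in this demo. The toy sketch below illustrates one plausible reading: filter chunks to a single source, rank them against the search query, keep the top `limit`, and place them in the prompt as context. Everything here is an assumption made for illustration: the in-memory `CHUNKS` list, the function names, and the bag-of-words cosine similarity, which stands in for the vector search a Weaviate-backed store would perform.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory chunks; a real store would hold these in Weaviate.
CHUNKS = [
    {"object_path": "video-a", "text": "Open source models such as Bloom have 176 billion parameters."},
    {"object_path": "video-a", "text": "Teacher forcing is used to train GPT-style models."},
    {"object_path": "video-a", "text": "Masking lets self-attention look back at previous positions."},
    {"object_path": "paper-b", "text": "Open source retrievers can be swapped at test time."},
]


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, search_query, object_path, limit):
    """Return the `limit` chunks from one source that best match the query."""
    query_vec = Counter(tokenize(search_query))
    candidates = [c for c in chunks if c["object_path"] == object_path]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vec, Counter(tokenize(c["text"]))),
        reverse=True,
    )
    return ranked[:limit]


def build_prompt(prompt, hits):
    """Combine the user prompt with the retrieved chunks as context."""
    context = "\n".join(f"- {h['text']}" for h in hits)
    return f"{prompt}\n\nContext:\n{context}"


hits = search(CHUNKS, "open source models", object_path="video-a", limit=2)
final_prompt = build_prompt("In bullet points, tell me what this material describes", hits)
print(final_prompt)
```

Under this reading, a small `limit` keeps the generated answer focused on the best-matching passages, while a larger `limit` gives the model more context at the cost of a longer prompt.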


## arXiv example

In [10]:
# pdf_url = 'https://arxiv.org/pdf/1706.03762'
pdf_url = 'https://arxiv.org/pdf/2305.15334'
db.add_pdf(pdf_url)

243

In [11]:
import query
response = query.generate_on_summary(
    db=db,
    prompt="Tell me what this material describes",
    object_path=pdf_url
)
print(response.generated_text)

This material describes a research paper titled "Gorilla: Improving Accuracy of Large Language Models using APIs and Documentation" published on arXiv. The paper introduces a model called Gorilla, which aims to enhance the accuracy and reduce errors in large language models (LLMs) when utilizing APIs and their documentation. It compares Gorilla's performance to GPT-4 and demonstrates its adaptability to document changes and its ability to mitigate hallucination issues. The paper also introduces the APIBench dataset for evaluating LLMs' accuracy in using APIs. It discusses topics such as training with constraints, different retrieval techniques, and the impact of using optimal retrievers. The paper suggests that using a better retriever for finetuning is preferable, but zero-shot finetuning can be an alternative when a good retriever is not available. Additionally, the text briefly mentions other topics like program synthesis, neural networks in program synthesis, and the application of

In [12]:
response = query.generate_on_search(
    db=db,
    prompt="how does gorilla enhance accuracy and reduce errors in LLMs?",
    search_query="gorilla improvements",
    object_path=pdf_url,
    limit=5
)
print(response.generated_text)

Gorilla, a language model developed based on LLaMA (Language Learning for Machine Automation), enhances accuracy and reduces errors in LLMs (Large Language Models) in several ways:

1. Surpassing GPT-4 Performance: Gorilla's performance surpasses GPT-4, the state-of-the-art LLM, on multiple large datasets. It outperforms GPT-4 in generating reliable API calls to ML models without hallucination.

2. Adaptability to Test-Time Changes: Gorilla demonstrates a strong capability to adapt to test-time document changes. This means it can easily adjust to updates or version changes in the API documentation, enabling flexible user updates.

3. Mitigating Hallucination Issues: Gorilla substantially mitigates the issue of hallucination, which refers to generating incorrect or misleading information. It generates API calls without hallucination, ensuring reliable and accurate outputs.

4. Understanding and Reasoning about Constraints: Gorilla has the ability to understand and reason about constrain

In [19]:
response = query.generate_on_search(
    db=db,
    prompt="how does gorilla work?",
    search_query="gorilla algorithm",
    object_path=pdf_url,
    limit=10
)
print(response.generated_text)

Gorilla is a model that generates reliable API calls to machine learning (ML) models without hallucination. It has been designed to adapt to test-time API usage changes and can satisfy constraints while picking APIs. Gorilla's performance surpasses the state-of-the-art Language Model (LLM) GPT-4 in three massive datasets that were collected. 

Gorilla is a retrieve-aware finetuned LLaMA-7B model specifically for API calls. It can be combined with a document retriever to adapt to test-time document changes, allowing for flexible user updates or version changes. The model has been trained to understand and reason about constraints.

The integration of a retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and increase the reliability and applicability of their outputs.

More information about Gorilla can be found in the source: https://arxiv.org/pdf/2305.15334


In [18]:
response = query.generate_on_search(
    db=db,
    prompt="how does gorilla work and how does it reduce hallucination in LLMs",
    search_query="gorilla algorithm and hallucination",
    object_path=pdf_url,
    limit=10
)
print(response.generated_text)

Gorilla is a model that works by leveraging document retrieval to improve its performance. It reduces hallucination in Language Model (LLM) systems by incorporating a retrieval-aware training approach. 

In the zero-shot setting, Gorilla demonstrates the highest accuracy gain while maintaining good factual capability. It achieves this by effectively avoiding hallucination errors when prompted with different retrievers. The accuracy and hallucination reduction of Gorilla are compared to other models such as LLAMA, GPT-3.5, GPT-4, Claude, and HuggingFace.

The performance of Gorilla is evaluated on different configurations, and it consistently outperforms other models. Even when the oracle answer is given, Gorilla remains the best-performing model. It significantly outperforms GPT-4 in terms of API functionality accuracy and reducing hallucination errors.

Gorilla's retrieval-aware training enables the model to adapt to changes in the dataset and improve its performance over time. It red