# Demo

## Instantiate database

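Connect to your database with `distyll.DBConnection()`. Called without arguments, as here, it connects to an embedded Weaviate instance (see the output below).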

```python
import distyll
db = distyll.DBConnection()
```

```text
embedded weaviate is already listening on port 6666
```


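Optionally, you can pass in your own Weaviate client instead. For example, you can connect to a Weaviate Cloud Services (WCS) instance, with the URL and API keys read from environment variables: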

```python
# import weaviate
# import os
# client = weaviate.Client(
#     url=os.environ['JP_WCS_URL'],
#     auth_client_secret=weaviate.AuthApiKey(os.environ['JP_WCS_ADMIN_KEY']),
#     additional_headers={
#         'X-OpenAI-Api-Key': os.environ['OPENAI_APIKEY']
#     }
# )
#
# import distyll
# db = distyll.DBConnection(client=client)
```
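
In the commented-out cell above, a missing environment variable only surfaces as a bare `KeyError` partway through building the client. The following minimal sketch checks all three variables up front and reports every missing one at once. `read_wcs_settings` is a hypothetical helper, not part of `distyll` or the Weaviate client. The variable names are the ones used in the cell.

```python
import os

# Environment variables read by the optional WCS cell above.
REQUIRED_VARS = ("JP_WCS_URL", "JP_WCS_ADMIN_KEY", "OPENAI_APIKEY")


def read_wcs_settings(environ=os.environ):
    """Hypothetical helper: return the WCS URL, admin key, and OpenAI key.

    Raises RuntimeError naming every variable that is missing or empty,
    instead of failing on the first KeyError.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return {
        "url": environ["JP_WCS_URL"],
        "api_key": environ["JP_WCS_ADMIN_KEY"],
        "openai_key": environ["OPENAI_APIKEY"],
    }
```

With `settings = read_wcs_settings()`, you would then build the client as in the cell above. For example, pass `url=settings["url"]` and `weaviate.AuthApiKey(settings["api_key"])`.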

## YouTube video example

### Add data

Let's add data from a YouTube video:

```python
youtube_url = "https://youtu.be/sNw40lEhaIQ"
db.add_from_youtube(youtube_url)
```

```text
52
```

### Query data

Now we can query it. As its name suggests, `query.generate_on_summary` generates text from a summary of the source identified by `object_path`:

```python
import query

response = query.generate_on_summary(
    db=db,
    prompt="In bullet points, tell me what this material describes",
    object_path=youtube_url,
)
print(response.generated_text)
```

```text
- The material is part of a series discussing contextual representations, specifically focusing on the GPT transformer-based architecture.
- It covers topics such as autoregressive loss function, token representation, hidden representation, language modeling, transformer architecture, and masking.
- The concept of self-attention and its role in allowing the model to look back at previous positions in a sequence when making predictions is explained.
- The training process for a GPT-style model using "teacher forcing" is discussed, highlighting its significance.
- The material briefly touches on the process of sequence generation and different strategies for sampling tokens.
- It mentions that there are different versions of GPT and alternative models available.
- The information provided is based on the text and may vary or become outdated over time.
```


To focus on a specific topic, use `query.generate_on_search`, which also takes a `search_query` and a result `limit`:

```python
response = query.generate_on_search(
    db=db,
    prompt="In bullet points, tell me what this material describes",
    search_query="open source models",
    object_path=youtube_url,
    limit=2,
)
print(response.generated_text)
```

```text
- The material describes a summary of open alternatives in the field of open source.
- It mentions that the information provided may be outdated.
- It highlights the variety of models available in the open source community.
- It mentions the Bloom model, which has 176 billion parameters and is considered extremely large.
```

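## arXiv example

PDFs are added with `db.add_pdf`. Here we add the "Attention Is All You Need" paper from arXiv:

```python
pdf_url = 'https://arxiv.org/pdf/1706.03762'
db.add_pdf(pdf_url)
```

```text
159
```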
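
Because YouTube videos and PDFs go through different methods, loading several sources at once means picking the right method for each URL. The sketch below assumes that every non-YouTube URL points to a PDF. `add_sources` is a hypothetical helper that calls only `db.add_from_youtube` and `db.add_pdf` as used in this demo. It returns their results unchanged, since their meaning isn't documented here.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube; everything else is assumed to be a PDF URL.
YOUTUBE_HOSTS = {"youtu.be", "youtube.com", "www.youtube.com", "m.youtube.com"}


def add_sources(db, urls):
    """Hypothetical helper: add each URL with the matching demo method.

    Returns a dict mapping each URL to whatever the add method returned.
    """
    results = {}
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host in YOUTUBE_HOSTS:
            results[url] = db.add_from_youtube(url)
        else:
            results[url] = db.add_pdf(url)
    return results
```

For example, `add_sources(db, [youtube_url, pdf_url])` adds both sources used in this demo.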

```python
import query

response = query.generate_on_summary(
    db=db,
    prompt="Tell me what this material describes",
    object_path=pdf_url,
)
print(response.generated_text)
```

```text
The material describes a research paper titled "Attention Is All You Need" that introduces a new network architecture called the Transformer. The paper discusses the Transformer's use of attention mechanisms instead of recurrent or convolutional neural networks, its superior performance in tasks like machine translation and English constituency parsing, its requirement of less training time, and its high parallelizability. The paper covers various aspects of the Transformer model, including its architecture, comparison to existing models, performance in machine translation tasks, generalizability to other tasks, explanation of attention mechanisms, and researchers' contributions. It also discusses topics such as positional encoding, self-attention, and their comparison to recurrent and convolutional layers. The training process for models using the Transformer architecture is described, including the dataset used, encoding method, vocabulary size, batch formation, optimizer, regulariza
```

```python
response = query.generate_on_search(
    db=db,
    prompt="In bullet points, tell me how attention heads work",
    search_query="attention head",
    object_path=pdf_url,
    limit=5,
)
print(response.generated_text)
```

```text
- Attention heads are a component of the transformer model used in natural language processing tasks.
- They are responsible for focusing on different parts of the input sequence during the encoding process.
- Each attention head learns to attend to different aspects of the input, allowing the model to capture different types of information.
- The attention heads operate in parallel, allowing for multiple perspectives to be considered simultaneously.
- The attention heads compute a compatibility function between a query and a set of key-value pairs to determine the importance of each value.
- The weights assigned to each value are computed using a softmax function, resulting in a distribution of attention weights.
- The attention weights are then used to compute a weighted sum of the values, which is the output of the attention head.
- By having multiple attention heads, the model can capture different types of dependencies and relationships in the input sequence.
- The number of atten
```
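
The generated bullets describe the paper's scaled dot-product attention. Each head computes query-key compatibility scores, applies a softmax over them, and takes a weighted sum of the values. Multiple heads run in parallel on their own projections. The pure-Python sketch below is for illustration only. It handles a single query vector, uses plain lists, and has no batching or masking. The final output projection W^O from the paper is also omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query: d_k floats; keys: n vectors of d_k floats; values: n vectors of
    d_v floats. Returns (output vector, attention weights).
    """
    d_k = len(query)
    # Compatibility: dot product of the query with each key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    # Softmax turns the scores into weights that sum to 1.
    weights = softmax(scores)
    # Output: weighted sum of the value vectors, one component at a time.
    d_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return output, weights


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def multi_head(query, keys, values, heads):
    """Run each head on its own projections and concatenate the outputs.

    heads: list of (W_q, W_k, W_v) matrices, each a list of rows.
    """
    outputs = []
    for w_q, w_k, w_v in heads:
        q = matvec(w_q, query)
        ks = [matvec(w_k, k) for k in keys]
        vs = [matvec(w_v, v) for v in values]
        out, _ = attend(q, ks, vs)
        outputs.extend(out)
    return outputs
```

In a real Transformer the projection matrices are learned, so each head ends up weighting the inputs differently. With identical identity projections, as in a quick sanity check, every head returns the same output.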