# Demo

## Instantiate database

Connect to a Weaviate database using distyll, my personal demo project:

In [6]:
import distyll  # My personal demo project
db = distyll.DBConnection()

## YouTube video example

### Add data

Let's add data from three YouTube videos: two Stanford NLU lectures and a video game review.

In [7]:
youtube_urls = [
    "https://youtu.be/sNw40lEhaIQ",  # Stanford: NLU Contextual Word Representations: GPT Spring 2023
    "https://youtu.be/enRb6fp5_hw",  # Stanford: NLU Information Retrieval: Guiding Ideas Spring 2023
    "https://youtu.be/nMMNkfSQuiU",  # Starfield (video game) review - from a week ago
]
for youtube_url in youtube_urls:
    db.add_from_youtube(youtube_url)
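This notebook does not show how distyll stores each video, but the query_chunks calls further down suggest that transcripts are split into smaller chunks before they are indexed. As a rough illustration only, a hypothetical chunk_words helper (not part of distyll) could split a transcript into overlapping word windows. The default chunk size and overlap are arbitrary assumptions.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Hypothetical helper for illustration; not distyll's actual chunking code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end
            break
    return chunks


transcript = " ".join(f"w{i}" for i in range(10))
print(chunk_words(transcript, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.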

### Query data

Now we can ask for a short summary of each video:

In [8]:
for youtube_url in youtube_urls:
    response = db.query_summary(
        prompt="In short, 3-5 plain-language bullet points, tell me what this material describes",
        object_path=youtube_url
    )
    print(f'\n{youtube_url}')
    print(response.generated_text)


https://youtu.be/sNw40lEhaIQ
- The material describes the transformer-based architecture called GPT (Generative Pre-trained Transformer) and its use for language modeling.
- It explains the autoregressive loss function used for neural language modeling and provides illustrations to support the technical details.
- The attention mechanism used in GPT is discussed, along with the need for masking to prevent looking into the future.
- The training process of GPT models with teacher forcing is explained, including how the input sequence is processed and how the model's parameters are updated.
- The material also mentions the use of teacher forcing during generation and discusses fine-tuning GPT models, as well as different versions and variations of GPT models.

https://youtu.be/enRb6fp5_hw
- The material discusses the influence of natural language processing (NLP) on information retrieval, including the use of transformer models like BERT and large language models from OpenAI.
- It highl

In [9]:
query = "language models"
for youtube_url in youtube_urls:
    response = db.query_chunks(
        prompt=f"Does this text contain information about {query}? If so, summarize in 3-5 plain-language bullet points",
        object_path=youtube_url,
        search_query=query
    )
    print(f'\n{youtube_url}')
    print(response.generated_text)


https://youtu.be/sNw40lEhaIQ
Yes, this text contains information about language models. Here are the summarized bullet points:

1. The text discusses the autoregressive loss function used for neural language modeling in GPT, a famous transformer-based architecture.
2. It explains the process of language modeling using GPT, including the embedding representation and hidden representation of tokens.
3. The text mentions the use of transformer blocks in GPT and the addition of language modeling-specific parameters.
4. It highlights the need for masking in attention mechanisms to prevent looking into the future during language modeling.
5. The text describes the training process of GPT models with teacher forcing, where the actual token at the next time step is used.
6. It explains the generation process in GPT, where new tokens are created based on the scores predicted by the model.
7. The text mentions the possibility of fine-tuning GPT models and provides information about the sizes an
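query_chunks takes both a prompt and a search_query, which suggests a retrieve-then-generate pattern: find the chunks closest to the search query, then hand them to the language model together with the prompt. The sketch below mimics that flow under stated assumptions: a toy bag-of-words cosine similarity stands in for Weaviate's vector search, and top_chunks and build_prompt are hypothetical names whose prompt layout is not distyll's actual template.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(chunks: list[str], search_query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (toy stand-in for vector search)."""
    query_vec = Counter(search_query.lower().split())
    scored = [(cosine_similarity(Counter(c.lower().split()), query_vec), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the question and the retrieved chunks into one generation prompt."""
    context = "\n\n".join(context_chunks)
    return f"{question}\n\nText:\n{context}"


chunks = [
    "GPT is trained as an autoregressive language model",
    "the reviewer explores planets and spaceships",
    "masking stops attention from looking at future tokens",
]
print(top_chunks(chunks, "language model", k=1))
# ['GPT is trained as an autoregressive language model']
```

A real vector search compares embeddings rather than word counts, so it can match "language models" to a chunk about "GPT" even with no shared words; the overall retrieve, then prompt, then generate shape is the same.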

In [10]:
query = "a game"
for youtube_url in youtube_urls:
    response = db.query_chunks(
        prompt=f"Does this text contain information about {query}? If so, summarize in 3-5 plain-language bullet points",
        object_path=youtube_url,
        search_query=query
    )
    print(f'\n{youtube_url}')
    print(response.generated_text)


https://youtu.be/sNw40lEhaIQ
No, this text does not contain information about a game.

https://youtu.be/enRb6fp5_hw
No, this text does not contain information about a game.

https://youtu.be/nMMNkfSQuiU
- The text is a review of the game Starfield.
- The reviewer mentions their love for Bethesda's previous game, Fallout 4.
- They describe Starfield as a sci-fi universe with spaceships, lasers, and political intrigue.
- The game has a slow start and some technical issues, but eventually becomes enjoyable.
- There are various quests, side quests, and companions to interact with in the game.
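The two query_chunks cells above differ only in the query string, so the loop could be wrapped in a small function. ask_each_video is a hypothetical helper: it calls db.query_chunks with exactly the arguments used above and returns the answers keyed by URL instead of printing them.

```python
def ask_each_video(db, query: str, urls: list[str]) -> dict[str, str]:
    """Run the same chunk query against every video and collect the answers.

    Hypothetical convenience wrapper; it only uses db.query_chunks and
    response.generated_text as they appear in the cells above.
    """
    prompt = (
        f"Does this text contain information about {query}? "
        "If so, summarize in 3-5 plain-language bullet points"
    )
    answers = {}
    for url in urls:
        response = db.query_chunks(
            prompt=prompt,
            object_path=url,
            search_query=query,
        )
        answers[url] = response.generated_text
    return answers


# Usage with the db and youtube_urls defined earlier:
# for url, text in ask_each_video(db, "a game", youtube_urls).items():
#     print(f"\n{url}")
#     print(text)
```

Returning a dict rather than printing makes it easy to compare answers across queries or filter out the "No, this text does not contain..." responses.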


## Notes

Optionally, you can connect to a particular Weaviate instance by creating your own client and passing it to DBConnection:

In [None]:
# import weaviate
# import os
# client = weaviate.Client(
#     url=os.environ['JP_WCS_URL'],
#     auth_client_secret=weaviate.AuthApiKey(os.environ['JP_WCS_ADMIN_KEY']),
#     additional_headers={
#         'X-OpenAI-Api-Key': os.environ['OPENAI_APIKEY']
#     }
# )
#
# import distyll
# db = distyll.DBConnection(client=client)
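The commented-out cell reads its credentials with os.environ[...], which stops at the first missing variable with a KeyError. A hypothetical require_env helper (not part of distyll or the Weaviate client) could check every variable up front and report all missing or empty ones at once.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or fail listing every missing one.

    Hypothetical helper; treats an empty value the same as an unset variable.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Example, using the variable names from the cell above:
# env = require_env("JP_WCS_URL", "JP_WCS_ADMIN_KEY", "OPENAI_APIKEY")
# Then pass env["JP_WCS_URL"] and the keys to weaviate.Client as shown above.
```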