## Demo notebook

Try it out as shown here:

Import the functions, instantiate the database, and then connect to the data (the collection).

You'll be interacting with the data through the `collection`.

```python
import wkb
client = wkb.start_db()
collection = wkb.Collection(client, wkb.WV_CLASS)
```

```
Found class. Skipping class creation
```


### Load data

There are several built-in functions that make it easy to add data to your knowledge base.

To add a text file, simply specify its path:

```python
collection.add_text_file("./srcdata/kubernetes_concepts_overview.txt")
```

```
13
```
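The `13` returned here is plausibly the number of chunks the file was split into. The search results further down show each stored object carrying `body`, `chunk_number`, and `source_path` fields. This notebook doesn't show how `wkb` chooses chunk boundaries, so the following is only a minimal sketch that assumes fixed-size word chunks. The helper names, the `words_per_chunk` default, and the 0-based numbering are all assumptions:

```python
import re
from typing import Dict, List, Union


def chunk_text(text: str, words_per_chunk: int = 150) -> List[str]:
    """Split text into chunks of at most words_per_chunk words (hypothetical helper)."""
    if words_per_chunk <= 0:
        raise ValueError("words_per_chunk must be positive")
    # Each token is one word plus the whitespace after it, so line breaks survive inside a chunk
    tokens = re.findall(r"\S+\s*", text)
    return [
        "".join(tokens[i:i + words_per_chunk]).strip()
        for i in range(0, len(tokens), words_per_chunk)
    ]


def to_objects(
    text: str, source_path: str, words_per_chunk: int = 150
) -> List[Dict[str, Union[str, int]]]:
    """Wrap each chunk with the property names seen in this notebook's search results."""
    return [
        {"body": chunk, "chunk_number": n, "source_path": source_path}  # 0-based numbering assumed
        for n, chunk in enumerate(chunk_text(text, words_per_chunk))
    ]
```

Keeping each word's trailing whitespace, rather than splitting and re-joining on spaces, preserves paragraph breaks such as the `\n\n` visible inside the stored `body` text.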

To add a Wikipedia article, provide its title (currently only the article's summary is added):

```python
for wiki_title in [
    "Database",
    "Vector database",
    "Containerization (computing)",
    "Formula One",
]:
    collection.add_wiki_article(wiki_title)
```
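Only the summary is added for now. One lightweight way to fetch just that text is Wikipedia's REST summary endpoint, which returns the article's lead section in the `extract` field of its JSON response. This is an illustrative sketch and not necessarily how `add_wiki_article` works. The helper names and the User-Agent string are hypothetical:

```python
import json
import urllib.parse
import urllib.request

SUMMARY_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/page/summary/"


def summary_url(title: str) -> str:
    """Build the REST summary URL for an English Wikipedia article title."""
    # Page titles use underscores for spaces; other special characters are percent-encoded
    return SUMMARY_ENDPOINT + urllib.parse.quote(title.replace(" ", "_"))


def fetch_summary(title: str, timeout: float = 10.0) -> str:
    """Return the plain-text lead summary (the JSON "extract" field); needs network access."""
    request = urllib.request.Request(
        summary_url(title),
        headers={"User-Agent": "wkb-demo-sketch/0.1 (example)"},  # hypothetical identifier
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["extract"]
```

`summary_url` alone shows how a title such as `Containerization (computing)` is encoded. Calling `fetch_summary` performs a real HTTP request.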

To add a YouTube video, provide its URL:

```python
youtube_url = 'https://youtu.be/xk28RMhRy1U'  # Weaviate 1.20 release podcast
collection.add_from_youtube(youtube_url)
```

```
[youtube] Extracting URL: https://youtu.be/xk28RMhRy1U
[youtube] xk28RMhRy1U: Downloading webpage
...
[info] xk28RMhRy1U: Downloading 1 format(s): 251
[download] Destination: temp_audio.mp3
[download] 100% of   51.50MiB in 00:00:28 at 1.79MiB/s
Found Etienne Dilocker on Weaviate 1.20 - Weaviate Podcast #56! - downloading
...
Successfully Downloaded to temp_audio.mp3
Splitting audio to 5
Getting transcripts from 5 audio files...
Processing transcript 1 of 5...
Processing transcript 2 of 5...
Processing transcript 3 of 5...
Processing transcript 4 of 5...
Processing transcript 5 of 5...
...

120
```
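The log shows the steps behind `add_from_youtube`: download the audio, cut it into clips (`Splitting audio to 5`), then transcribe the clips one at a time. Below is a minimal sketch of the cutting and transcription loop. It assumes pydub for audio slicing and a hypothetical `transcribe` callable in place of the speech-to-text service, which this notebook doesn't name. The helper names and the `{i}_` clip-name prefix are illustrative. Every file handle is closed explicitly: pydub's `export` returns the open output file, and leaving that handle, or a manually opened clip, unclosed triggers Python `ResourceWarning`s.

```python
import os
from typing import BinaryIO, Callable, List, Tuple


def clip_bounds(total_ms: int, n_clips: int) -> List[Tuple[int, int]]:
    """Split [0, total_ms) into n_clips contiguous ranges of near-equal length."""
    if n_clips <= 0:
        raise ValueError("n_clips must be positive")
    edges = [total_ms * i // n_clips for i in range(n_clips + 1)]
    return list(zip(edges[:-1], edges[1:]))


def clip_path(audio_path: str, i: int) -> str:
    """Prefix the file name (not the directory) with the clip index, e.g. 0_temp_audio.mp3."""
    folder, name = os.path.split(audio_path)
    return os.path.join(folder, f"{i}_{name}")


def export_clips(audio_path: str, n_clips: int) -> List[str]:
    """Cut an audio file into n_clips MP3 files (assumes pydub and ffmpeg are installed)."""
    from pydub import AudioSegment  # lazy import: the other helpers work without pydub

    audio = AudioSegment.from_file(audio_path)
    paths = []
    for i, (start, end) in enumerate(clip_bounds(len(audio), n_clips)):
        out = clip_path(audio_path, i)
        # pydub slices in milliseconds; export returns the open output file, so close it
        audio[start:end].export(out, format="mp3").close()
        paths.append(out)
    return paths


def transcribe_clips(paths: List[str], transcribe: Callable[[BinaryIO], str]) -> List[str]:
    """Transcribe clips one by one; `transcribe` is a hypothetical speech-to-text callable."""
    texts = []
    for n, path in enumerate(paths, start=1):
        print(f"Processing transcript {n} of {len(paths)}...")
        with open(path, "rb") as audio_file:  # closed even if transcribe raises
            texts.append(transcribe(audio_file))
    return texts
```

Computing the clip edges proportionally (`total_ms * i // n_clips`) always yields exactly `n_clips` ranges that together cover the whole file. Ceiling-sized chunks can instead produce fewer clips than requested.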

### Make use of your knowledge base

You can use the data in your knowledge base in various ways.

You can summarize a particular entry, for example the YouTube video we just ingested:

```python
print(collection.summarize_entry(youtube_url))
```

```
In this podcast episode, Etienne Dilocker discusses various topics related to Weaviate 1.20. The discussions include optimizing memory allocation, the concept of shards in Weaviate, multi-tenancy, the challenges of implementing vector search in a multi-tenant environment, linear scaling, the benefits of a larger cluster, import throughput, testing phases, user feedback, sharding, memory allocation, and the performance of Weaviate in terms of disk reads, caching, and search results.

Dilocker also explains the concept of multi-tenancy and its benefits, such as reducing the number of nodes needed to serve queries, improving cost efficiency by deactivating inactive tenants, and scaling Weaviate to handle a large number of nodes. Additionally, the podcast covers topics like product quantization and its impact on reducing memory usage, re-ranking, query success monitoring, and the importance of observability and monitoring in system management.

Overall, listeners can learn about the techn
```
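Entries appear to be identified by their source: a URL here, a file path for the text file. This matches the `source_path` property shown in the search results later on. Suppose `summarize_entry` works roughly like Weaviate's generative search; this is an assumption, since `wkb`'s internals aren't shown. Then a sketch using the Weaviate v3 Python client would filter the class down to one entry's chunks and run a single grouped task over all of them. The default prompt, the `max_chunks` limit, and the function names are hypothetical. `valueText` assumes `source_path` is stored as a `text` property.

```python
from typing import Any, Dict, Optional


def entry_filter(source_path: str) -> Dict[str, Any]:
    """Where-filter (Weaviate v3 client format) matching all chunks of one entry."""
    return {"path": ["source_path"], "operator": "Equal", "valueText": source_path}


def summarize_entry_sketch(
    client: Any,
    class_name: str,
    source_path: str,
    prompt: Optional[str] = None,
    max_chunks: int = 100,
) -> str:
    """Run one grouped generative task over every chunk of an entry (hypothetical helper)."""
    task = prompt or "Summarize the following content."  # hypothetical default prompt
    response = (
        client.query.get(class_name, ["body", "chunk_number"])
        .with_where(entry_filter(source_path))
        .with_generate(grouped_task=task)
        .with_limit(max_chunks)
        .do()
    )
    # With a grouped task, the single combined result is attached to the first returned object
    return response["data"]["Get"][class_name][0]["_additional"]["generate"]["groupedResult"]
```

A grouped task sends all matched chunks to the language model in one request. That lets one call summarize a whole video, and it also explains how a custom prompt can replace the default summarization instruction.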

You can also ask it to suggest key ideas to learn about for a given topic:

```python
print(collection.suggest_topics_to_learn("weaviate"))
```

```
Related sub-topics about Weaviate:

1. Hybrid search capabilities, including BM25 and re-ranking
2. Module system for flexible customization and adding features
3. Improving search experience and accessibility
4. Experimental and evaluation phases of development
5. Incorporating user feedback and making improvements
6. Multi-tenancy and its benefits for data management and machine learning applications
```
6. Multi-tenancy and its benefits for data management and machine learning applications


You can also pass a custom prompt to the summarization task. For example, you can have Weaviate suggest things to learn about from a specific video:

```python
# No f-string is needed; the doubled backslashes keep the literal text \n\n in the prompt
topic_prompt = """
Extract a list of three to six related sub-topics
related to the main topic that the user might learn about.
Deliver the topics as a short list, each separated by two consecutive newlines like `\\n\\n`
"""

print(collection.summarize_entry(youtube_url, custom_prompt=topic_prompt))
```

```
The related sub-topics that you might learn about from the provided information are:

1. Optimizing memory allocation and reducing memory footprint in Weaviate
2. The concept of shards and their importance in organizing data in Weaviate
3. Multi-tenancy in Weaviate and its role in separating data for different users
4. Challenges and strategies for implementing vector search in a multi-tenant environment
5. Linear scaling and cluster size in Weaviate for improved performance
6. Testing and evaluation processes for multi-tenancy implementation
7. Sharding and its benefits in querying a small portion of data in isolation
8. Load testing and scalability challenges in Weaviate
9. Benefits and cost savings of multi-tenancy in Weaviate
10. Product quantization as a vector compression technique in Weaviate
11. In-memory and on-disk storage of vectors in Weaviate
12. The HNSW traversal algorithm and its impact on search results
13. Challenges and techniques for loading and updating vectors in 
```
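The model returned thirteen items, the last one cut off, and separated them with single line breaks, even though the prompt asked for three to six items separated by blank lines. A prompt alone doesn't guarantee the format. If you need the topics as data, it's safer to parse and cap them yourself. Here is a minimal sketch, assuming the model keeps numbering items as `1.` or `1)`. The helper name is hypothetical.

```python
import re
from typing import List, Optional

# Matches a line such as "3. Multi-tenancy in Weaviate" or "3) Multi-tenancy in Weaviate"
_NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s+(.+?)\s*$")


def parse_numbered_list(text: str, max_items: Optional[int] = None) -> List[str]:
    """Extract numbered items from model output, optionally keeping only the first max_items."""
    items = []
    for line in text.splitlines():
        match = _NUMBERED_ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items if max_items is None else items[:max_items]
```

Lines without a leading number, such as the introductory sentence, are skipped. A version string like `1.20` doesn't match either, because the pattern requires whitespace after the `.`.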

Or maybe I want to share details about what I learned from a video?

```python
youtube_url = 'https://youtu.be/zN4VCb0LbQI'  # Pydantic vs dataclasses vs attrs
collection.add_from_youtube(youtube_url)

tweet_prompt = (
    "Summarize their conclusions about Pydantic, dataclasses and Attrs in a tweet or two"
    " - use emojis and make it interesting."
)
print(collection.summarize_entry(youtube_url, custom_prompt=tweet_prompt))
```

```
🐍📦 Comparing Python packages:
- Dataclasses: basic features, no validation
- Atters: control over class defs + comparison, ext dependency
- Pydantic: powerful validation, strict data types (inheritance)
🤔 Choose based on needs! #Python #Packages
```


```python
youtube_url = 'https://youtu.be/wi63uvjs6Uc'  # God of War review
collection.add_from_youtube(youtube_url)
```

```
[youtube] Extracting URL: https://youtu.be/wi63uvjs6Uc
[youtube] wi63uvjs6Uc: Downloading webpage
...
[info] wi63uvjs6Uc: Downloading 1 format(s): 251
[download] Destination: temp_audio.mp3
[download] 100% of   15.92MiB in 00:00:00 at 18.17MiB/s
Found God of War Ragnarok Review - downloading
...
Successfully Downloaded to temp_audio.mp3
Splitting audio to 2
Getting transcripts from 2 audio files...
Processing transcript 1 of 2...
Processing transcript 2 of 2...
...
This game, God of War Ragnarok, seems to appeal to a wide range of gamers. It offers an enthralling spectacle with impeccable writing and pitch-perfect performances. The combat mechanics are praised, particularly the return of weapons and brutal finishes on stunned enemies. The game is described as a sprawling epic with grand designs and serves as a fitting conclusion to the Norse saga of Kratos. It explores themes such as destiny and the cycle of violence between parents and children. The expanded cast of characters, role of humor, and impressive continuous camer
```

You can also ask questions about a particular object:

```python
print(collection.ask_object(youtube_url, "What kind of gamers might this game appeal to?"))
```

```
This game, God of War Ragnarok, might appeal to gamers who enjoy action-adventure games with a strong emphasis on storytelling. It is likely to attract fans of the previous God of War game, as it is a sequel that builds upon the success of its predecessor. The game offers fast-paced combat, cinematic spectacle, and a variety of weapons and abilities to engage with. Gamers who appreciate Norse mythology and its interpretation in video games may also find this game appealing. Additionally, those who enjoy exploring richly detailed and visually stunning game worlds may be drawn to God of War Ragnarok.
```
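Answering a question about one object most likely means retrieving that object's most relevant chunks and passing them to the model along with the question. This notebook doesn't show how `ask_object` actually does it, so treat that as an assumption. The toy sketch below illustrates the retrieval step with bag-of-words vectors and cosine distance (1 minus cosine similarity). Weaviate's default `cosine` metric defines the `distance` value in search results the same way. A real setup would use the vector database's configured embedding model instead of word counts. All function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; returns 1.0 when either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks closest to the question, smallest distance first."""
    q = bag_of_words(question)
    return sorted((cosine_distance(q, bag_of_words(c)), c) for c in chunks)[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Combine the retrieved context and the question into a single prompt."""
    context = "\n\n".join(chunk for _, chunk in top_chunks(question, chunks, k))
    return f"Answer the question using only this context:\n\n{context}\n\nQuestion: {question}"
```

Restricting the context to the closest chunks keeps the prompt short while still grounding the answer in the object's own transcript.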


You can also run more basic operations, such as text similarity searches:

```python
collection.text_search("kubernetes", 2)
```

[{'_additional': {'distance': 0.12698501},
  'body': 'your system.\n\nKubernetes provides you with:\n\nService discovery and load balancing Kubernetes can expose a container using the DNS name or using their own IP address. If traffic to a container is high, Kubernetes is able to load balance and distribute the network traffic so that the deployment is stable.\nStorage orchestration Kubernetes allows you to automatically mount a storage system of your choice, such as local storages, public cloud providers, and more.\nAutomated rollouts and rollbacks You can describe the desired state for your deployed containers using Kubernetes, and it can change the actual state to the desired state at a controlled rate. For example, you can automate Kubernetes to create new containers for your deployment, remove existing containers and adopt all their resources to the new container.\nAutomatic',
  'chunk_number': 7,
  'source_path': './srcdata/kubernetes_concepts_overview.txt',
  'source_title': './