## Weaviate quickstart guide (as a notebook!)

This notebook will guide you through the basics of Weaviate. You can find the full documentation [on our site here](https://weaviate.io/developers/weaviate/quickstart).

<a target="_blank" href="https://colab.research.google.com/github/weaviate-tutorials/quickstart/blob/main/quickstart_end_to_end.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

You will need the Weaviate Python client. If you don't have it installed yet, you can install it with:

In [None]:
!pip install -U weaviate-client

### Weaviate instance

For this, you will need a working instance of Weaviate somewhere. We recommend either:
- Creating a free sandbox instance on [Weaviate Cloud (WCD)](https://console.weaviate.cloud/), or
- Using [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded).

Instantiate the client using **one** of the following code examples:

#### For using WCD

NOTE: Before you do this, you need to create the instance in WCD and get the credentials. Please refer to the [WCD Quickstart guide](https://weaviate.io/developers/wcs/quickstart).

In [None]:
# # For using WCD
# import weaviate
# from weaviate.classes.init import Auth
# import os

# # Best practice: store your credentials in environment variables
# wcd_url = os.environ["WCD_DEMO_URL"]
# wcd_api_key = os.environ["WCD_DEMO_RO_KEY"]
# openai_api_key = os.environ["OPENAI_APIKEY"]

# with weaviate.connect_to_weaviate_cloud(
#     cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
#     auth_credentials=Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
#     headers={
#         'X-OpenAI-Api-key': openai_api_key  # Replace with appropriate header key/value pair for the required API
#     }
# ) as client:  # Use this context manager to ensure the connection is closed
#     print(client.is_ready())
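
Reading credentials with `os.environ[...]` raises a bare `KeyError` when a variable is missing. As a minimal sketch of a friendlier check, the hypothetical helper `require_env` below (not part of the Weaviate client) fails with a readable message instead. The variable name `EXAMPLE_DEMO_KEY` is a stand-in so the sketch runs anywhere.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set or is empty.")
    return value


# Stand-in variable (an assumption for illustration, not a real credential)
os.environ["EXAMPLE_DEMO_KEY"] = "not-a-real-key"
print(require_env("EXAMPLE_DEMO_KEY"))
```

You could call the helper once per credential before connecting, so a missing key is reported before any network request is made.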

#### For using Embedded Weaviate

This will spin up a Weaviate instance in the background.

In [17]:
# For using embedded
import weaviate
import os

# Remember to close the connection at the end with client.close()!
client = weaviate.connect_to_embedded(
    version="1.27.1",
    headers = {
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

{"action":"startup","build_git_commit":"05de0dbea","build_go_version":"go1.23.2","build_image_tag":"HEAD","build_wv_version":"1.27.1","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-11-10T10:56:13-03:00"}
{"action":"startup","auto_schema_enabled":true,"build_git_commit":"05de0dbea","build_go_version":"go1.23.2","build_image_tag":"HEAD","build_wv_version":"1.27.1","level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-11-10T10:56:13-03:00"}
{"build_git_commit":"05de0dbea","build_go_version":"go1.23.2","build_image_tag":"HEAD","build_wv_version":"1.27.1","level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-11-10T10:56:13-03:00"}
{"build_git_commit":"05de0dbea","build_go_version":"go1.23.2","

In [27]:
# Let's check if our client is ready
print("Client is Ready?", client.is_ready())
# Let's check the client version
print("Client Version:", weaviate.__version__)
# Let's check the server version
print("Server Version:", client.get_meta().get("version"))

Client is Ready? True
Client Version: 4.9.3
Server Version: 1.27.1


### Create a collection

In [19]:
if client.collections.exists("Question"):
    client.collections.delete("Question")

In [None]:
from weaviate import classes as wvc
questions = client.collections.create(
    name="Question",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(model="text-embedding-3-large"),
    generative_config=wvc.config.Configure.Generative.openai()
)

{"action":"hnsw_prefill_cache_async","build_git_commit":"05de0dbea","build_go_version":"go1.23.2","build_image_tag":"HEAD","build_wv_version":"1.27.1","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-11-10T10:56:23-03:00","wait_for_cache_prefill":false}
{"build_git_commit":"05de0dbea","build_go_version":"go1.23.2","build_image_tag":"HEAD","build_wv_version":"1.27.1","level":"info","msg":"Created shard question_Ho4YWAbg0Vek in 3.797083ms","time":"2024-11-10T10:56:23-03:00"}
{"action":"hnsw_vector_cache_prefill","build_git_commit":"05de0dbea","build_go_version":"go1.23.2","build_image_tag":"HEAD","build_wv_version":"1.27.1","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-11-10T10:56:23-03:00","took":89584}


### Add objects

We'll add objects to our Weaviate instance using a batch import process.

We show you two options, where you can either:
- Have Weaviate create vectors, or
- Specify custom vectors.

#### Have Weaviate create vectors (with `text2vec-openai`)

In [None]:
# Load data
import requests
import json

url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)

with questions.batch.dynamic() as batch:
    for d in data:
        batch.add_object({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })
# Let's check if our batch had any errors
print(questions.batch.failed_objects)
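
Batch import groups many objects into fewer requests. As a rough picture of that idea only, the sketch below splits a list into fixed-size chunks. This is an illustration, not how the client works internally: `batch.dynamic()` chooses batch sizes itself, and the helper `chunked` and the toy objects are assumptions made for this example.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Toy objects (made up); each chunk stands for one import request
toy_objects = [{"answer": f"toy answer {i}"} for i in range(10)]
for request_number, chunk in enumerate(chunked(toy_objects, 4), start=1):
    print(f"request {request_number}: {len(chunk)} objects")
```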

#### Specify "custom" vectors (i.e. generated outside of Weaviate)

In [None]:
# # Load data
# import requests
# import json

# url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
# resp = requests.get(url)
# data = json.loads(resp.text)

# with questions.batch.dynamic() as batch:
#     for d in data:
#         batch.add_object(
#             {
#                 "answer": d["Answer"],
#                 "question": d["Question"],
#                 "category": d["Category"],
#             },
#             vector=d["vector"] # passing the vector here
#         )
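
When you supply your own vectors, a quick sanity check before importing can save a failed batch. The sketch below assumes each record carries its vector as a list of numbers under a `"vector"` key, as the commented cell above does. The helper `check_vectors` and the toy records are hypothetical and not part of the Weaviate client.

```python
from typing import Any


def check_vectors(records: list[dict[str, Any]], key: str = "vector") -> int:
    """Return the shared vector length, or raise ValueError on a bad record."""
    dim = None
    for i, rec in enumerate(records):
        vec = rec.get(key)
        if not isinstance(vec, list) or not vec:
            raise ValueError(f"record {i} has no usable {key!r} list")
        if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec):
            raise ValueError(f"record {i} has non-numeric vector values")
        if dim is None:
            dim = len(vec)
        elif len(vec) != dim:
            raise ValueError(f"record {i} has length {len(vec)}, expected {dim}")
    if dim is None:
        raise ValueError("no records given")
    return dim


# Toy records with made-up 3-dimensional vectors
toy_records = [
    {"Answer": "DNA", "vector": [0.1, 0.2, 0.3]},
    {"Answer": "species", "vector": [0.0, 0.5, 0.9]},
]
print(check_vectors(toy_records))
```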

### Queries

#### Semantic search

Let's try a similarity search. We'll use a `near_text` search to look for quiz objects most similar to biology.

In [22]:
response = questions.query.near_text(
    query="biology",
    limit=2
)

for obj in response.objects:
    print(json.dumps(obj.properties, indent=2))

{
  "answer": "species",
  "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
  "category": "SCIENCE"
}
{
  "answer": "DNA",
  "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance",
  "category": "SCIENCE"
}


The response includes the top 2 objects (because of the `limit` setting) whose vectors are most similar to the word biology.

Notice that even though the word biology does not appear anywhere, Weaviate returns biology-related entries.

This example shows why vector searches are powerful. Vectorized data objects allow for searches based on degrees of similarity, as shown here.
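
To make "degrees of similarity" concrete, here is a minimal sketch that ranks a few items by cosine similarity, one common way to compare vectors. The 3-dimensional vectors are made up for illustration; real embeddings come from the vectorizer and have far more dimensions, and the metric a given collection uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors; "stock market" is a hypothetical unrelated entry
toy_vectors = {
    "DNA": [0.9, 0.1, 0.0],
    "species": [0.7, 0.4, 0.1],
    "stock market": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # stands in for the vector of "biology"

ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(toy_vectors[name], toy_query),
    reverse=True,
)
print(ranked[:2])  # the two closest entries, like limit=2
```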

#### Semantic search with a filter
You can add a Boolean filter to the search. For example, let's run the same search, but only look at objects that have a "category" value of "ANIMALS".

In [23]:
response = questions.query.near_text(
    query="biology",
    limit=2,
    filters=(
        wvc.query.Filter.by_property("category").equal("ANIMALS")
    )
)

for obj in response.objects:
    print(json.dumps(obj.properties, indent=2))

{
  "answer": "the nose or snout",
  "question": "The gavial looks very much like a crocodile except for this bodily feature",
  "category": "ANIMALS"
}
{
  "answer": "Elephant",
  "question": "It's the only living mammal in the order Proboseidea",
  "category": "ANIMALS"
}


The response includes the top 2 objects (because of the `limit` setting) whose vectors are most similar to the word biology, but only from the "ANIMALS" category.

Using a Boolean filter allows you to combine the flexibility of vector search with the precision of where filters.
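
The effect of combining the two can be pictured as "keep only the matching objects, then rank what is left by closeness". The sketch below shows that effect on toy data. It says nothing about how Weaviate evaluates filters internally; the 2-dimensional vectors are made up, and Euclidean distance is used here only as a stand-in metric.

```python
import math


def filtered_search(objects, query_vector, category, limit):
    """Keep objects in `category`, then return the `limit` closest ones."""
    candidates = [o for o in objects if o["category"] == category]
    candidates.sort(key=lambda o: math.dist(o["vector"], query_vector))
    return candidates[:limit]


# Toy objects with made-up vectors; "toy animal entry" is hypothetical
toy_objects = [
    {"answer": "species", "category": "SCIENCE", "vector": [0.9, 0.1]},
    {"answer": "DNA", "category": "SCIENCE", "vector": [0.8, 0.2]},
    {"answer": "the nose or snout", "category": "ANIMALS", "vector": [0.7, 0.4]},
    {"answer": "Elephant", "category": "ANIMALS", "vector": [0.6, 0.5]},
    {"answer": "toy animal entry", "category": "ANIMALS", "vector": [0.1, 0.9]},
]
toy_query = [1.0, 0.0]  # stands in for the vector of "biology"

hits = filtered_search(toy_objects, toy_query, "ANIMALS", limit=2)
for hit in hits:
    print(hit["answer"])
```

Note that the two SCIENCE objects are closer to the query but are excluded by the filter, which mirrors the difference between the two search results above.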

#### Generative search (single prompt)

Next, let's try a generative search, where search results are processed with a large language model (LLM).

Here, we use a `single prompt` query, and ask the model to explain each answer in plain terms.

In [25]:
response = questions.generate.near_text(
    query="biology",
    limit=2,
    single_prompt="Explain {answer} as you might to a five-year-old."
)
for obj in response.objects:
    print("###")
    print("Properties:", obj.properties)
    print("Generated Content:", obj.generated)

###
Properties: {'answer': 'species', 'question': "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification", 'category': 'SCIENCE'}
Generated Content: A species is a group of animals or plants that are similar to each other in certain ways. They look alike and can have babies that also look like them. Each species has its own special characteristics that make them unique.
###
Properties: {'answer': 'DNA', 'question': 'In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance', 'category': 'SCIENCE'}
Generated Content: DNA is like a recipe book that tells our bodies how to grow and work. It is made up of tiny instructions called genes that tell our bodies what color our eyes will be, how tall we will grow, and lots of other things that make us who we are. Just like how a recipe book helps us make yummy food, DNA helps our bodies do all the amazing things they can do!


We see that Weaviate has retrieved the same results as before. But now it includes an additional, generated text with a plain-language explanation of each answer.
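
With a single prompt, the `{answer}` placeholder is filled in from each object's properties, so every retrieved object gets its own prompt. Weaviate does this substitution for you; the sketch below only imitates it with Python's `str.format` to show which text each object contributes. The helper `fill_prompt` is hypothetical.

```python
def fill_prompt(template, properties):
    """Fill {placeholders} in the template from one object's properties."""
    return template.format(**properties)


template = "Explain {answer} as you might to a five-year-old."
results = [
    {"answer": "species", "category": "SCIENCE"},
    {"answer": "DNA", "category": "SCIENCE"},
]

# One prompt per retrieved object
prompts = [fill_prompt(template, props) for props in results]
for prompt in prompts:
    print(prompt)
```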

#### Generative search (grouped task)

In the next example, we will use a grouped task prompt instead to combine all search results and send them to the LLM with a prompt. We ask the LLM to write a tweet about all of these search results.

In [26]:
response = questions.generate.near_text(
    query="biology",
    limit=2,
    grouped_task="Write a tweet with emojis about these facts."
)

print(response.generated)  # Inspect the generated text

🌿🐦 2000 news: the Gunnison sage grouse is a new species of sage grouse! 🌿🐦

🧬 In 1953 Watson & Crick discovered the structure of DNA, the gene-carrying substance! 🧬 #ScienceFacts #NewSpecies #DNADiscovery 🌿🧬


Generative search sends data retrieved from Weaviate to a large language model (LLM). This allows you to go beyond simple data retrieval and transform the data into a more useful form, without ever leaving Weaviate.
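
In contrast to a single prompt, a grouped task produces one prompt for the whole result set. The sketch below illustrates that difference conceptually. The exact prompt layout Weaviate sends to the LLM is not shown in this notebook, so the layout here (task first, then one JSON line per object) is purely an assumption, and `build_grouped_prompt` is a hypothetical helper.

```python
import json


def build_grouped_prompt(task, objects):
    """Combine every result into ONE prompt (the layout is an assumption)."""
    context = "\n".join(json.dumps(obj, sort_keys=True) for obj in objects)
    return f"{task}\n\n{context}"


results = [
    {"answer": "species", "category": "SCIENCE"},
    {"answer": "DNA", "category": "SCIENCE"},
]
prompt = build_grouped_prompt("Write a tweet with emojis about these facts.", results)
print(prompt)
```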

Well done! In just a few short minutes, you have:

- Created your own cloud-based vector database with Weaviate,
- Populated it with data objects,
    - Using an inference API, or
    - Using custom vectors,
- Performed searches, including:
    - Semantic search,
    - Semantic search with a filter, and
    - Generative search.

## Next

You can do much more with Weaviate. We suggest trying:

- Examples from our [search how-to](https://weaviate.io/developers/weaviate/search) guides for [keyword](https://weaviate.io/developers/weaviate/search/bm25), [similarity](https://weaviate.io/developers/weaviate/search/similarity), [hybrid](https://weaviate.io/developers/weaviate/search/hybrid), [generative](https://weaviate.io/developers/weaviate/search/generative) searches and [filters](https://weaviate.io/developers/weaviate/search/filters) or
- Learning [how to manage data](https://weaviate.io/developers/weaviate/manage-data), like [reading](https://weaviate.io/developers/weaviate/manage-data/read), [batch importing](https://weaviate.io/developers/weaviate/manage-data/import), [updating](https://weaviate.io/developers/weaviate/manage-data/update), [deleting](https://weaviate.io/developers/weaviate/manage-data/delete) objects or [bulk exporting](https://weaviate.io/developers/weaviate/manage-data/read-all-objects) data.

For more holistic learning, try [Weaviate Academy](https://weaviate.io/developers/academy). We have built free courses for you to learn about Weaviate and the world of vector search.

You can also try a larger, [1,000 row](https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json) version of the Jeopardy! dataset, or [this tiny set of 50 wine reviews](https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/winemag_tiny.csv).