Install the Weaviate Python client

In [3]:
!pip install -U weaviate-client

Defaulting to user installation because normal site-packages is not writeable
Collecting weaviate-client
  Obtaining dependency information for weaviate-client from https://files.pythonhosted.org/packages/35/fe/611cb830fa7680b0b3fca460ff2e32062d8f4bc2b6fb0f84e56ae2993ec4/weaviate_client-3.22.1-py3-none-any.whl.metadata
  Downloading weaviate_client-3.22.1-py3-none-any.whl.metadata (3.4 kB)
Collecting validators<=0.21.0,>=0.18.2 (from weaviate-client)
  Obtaining dependency information for validators<=0.21.0,>=0.18.2 from https://files.pythonhosted.org/packages/ad/50/18dbf2ac594234ee6249bfe3425fa424c18eeb96f29dcd47f199ed6c51bc/validators-0.21.0-py3-none-any.whl.metadata
  Downloading validators-0.21.0-py3-none-any.whl.metadata (2.6 kB)
Collecting authlib>=1.1.0 (from weaviate-client)
  Obtaining dependency information for authlib>=1.1.0 from https://files.pythonhosted.org/packages/81/6e/f4522542322c7f53783da5f65464a7dee137c687111624d2ac733e2a1b98/Authlib-1.2.1-py2.py3-none-any.whl.metad

## Preparation: Get the data

We'll use a subset of the Jeopardy! quiz library:

> https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json


Load the data from a local file if you have one (otherwise download it), then preview it.

In [4]:
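import json
import os

import requests

DATA_URL = 'https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json'
LOCAL_PATH = "jeopardy_1k.json"

if os.path.exists(LOCAL_PATH):
    # Load the data locally if the file is already present
    with open(LOCAL_PATH, "r") as f:
        raw_data = f.read()
else:
    # Otherwise download it from GitHub
    response = requests.get(DATA_URL)
    response.raise_for_status()  # fail early on a bad HTTP status
    raw_data = response.text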

# Parse the JSON and preview it
data = json.loads(raw_data)
print(type(data), len(data))
print(json.dumps(data[0], indent=2))

<class 'list'> 1000
{
  "Air Date": "2006-11-08",
  "Round": "Double Jeopardy!",
  "Value": 800,
  "Category": "AMERICAN HISTORY",
  "Question": "Abraham Lincoln died across the street from this theatre on April 15, 1865",
  "Answer": "Ford's Theatre (the Ford Theatre accepted)"
}
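Before importing, it can help to confirm that every record carries the fields used later in this notebook. The following is a minimal sketch that uses a small hand-written sample in place of the full `data` list; the `sample` list and the `missing_keys` helper are illustrative names, not part of the dataset or the Weaviate client.

```python
# Illustrative sample shaped like the record previewed above.
sample = [
    {
        "Question": "Abraham Lincoln died across the street from this theatre on April 15, 1865",
        "Answer": "Ford's Theatre (the Ford Theatre accepted)",
    },
    # "Answer" is deliberately missing here to show what the check reports.
    {"Question": "Any pigment on the wall so faded you can barely see it"},
]

REQUIRED_KEYS = ("Question", "Answer")


def missing_keys(records, required=REQUIRED_KEYS):
    """Return (index, missing key names) for each record lacking a required key."""
    problems = []
    for i, record in enumerate(records):
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((i, missing))
    return problems


print(missing_keys(sample))
```

With the real dataset, `missing_keys(data)` returning an empty list would indicate that every record can be mapped to a `question`/`answer` pair.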


## Step 1: Create a Weaviate instance (database)

In [5]:
import weaviate
from weaviate import EmbeddedOptions
import os

client = weaviate.Client(
    embedded_options=EmbeddedOptions(),
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]
    }
)

Binary /home/repl/.cache/weaviate-embedded did not exist. Downloading binary from https://github.com/weaviate/weaviate/releases/download/v1.19.12/weaviate-v1.19.12-Linux-amd64.tar.gz
Started /home/repl/.cache/weaviate-embedded: process ID 1784


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-08-14T16:56:25Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-08-14T16:56:25Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2023-08-14T16:56:25Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:6666","time":"2023-08-14T16:56:25Z"}


Create a helper function to pretty-print JSON, since we'll be dealing with JSON responses a lot.

In [6]:
def jprint(data_in):
    print(json.dumps(data_in, indent=2))

Retrieve Weaviate instance information to check our configuration.

In [7]:
jprint(client.get_meta())

{
  "hostname": "http://127.0.0.1:6666",
  "modules": {
    "generative-openai": {
      "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
      "name": "Generative Search - OpenAI"
    },
    "qna-openai": {
      "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
      "name": "OpenAI Question & Answering Module"
    },
    "ref2vec-centroid": {},
    "text2vec-cohere": {
      "documentationHref": "https://docs.cohere.ai/embedding-wiki/",
      "name": "Cohere Module"
    },
    "text2vec-huggingface": {
      "documentationHref": "https://huggingface.co/docs/api-inference/detailed_parameters#feature-extraction-task",
      "name": "Hugging Face Module"
    },
    "text2vec-openai": {
      "documentationHref": "https://beta.openai.com/docs/guides/embeddings/what-are-embeddings",
      "name": "OpenAI Module"
    }
  },
  "version": "1.19.12"
}


## Step 2: Add data to Weaviate

### Add class definition

In [8]:
if client.schema.exists("Question"):
    client.schema.delete_class("Question")

In [9]:
class_definition = {
    "class": "Question",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {
        "distance": "cosine",
    },
    "moduleConfig": {
        "generative-openai": {}
    },
    "properties": [
        {
            "name": "question",
            "dataType": ["text"]
        },
        {
            "name": "answer",
            "dataType": ["text"]
        },
    ],
}

client.schema.create_class(class_definition)

{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"question_4pkaAci4eAfx","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-08-14T16:56:25Z","took":26981}


In [10]:
jprint(client.schema.get("Question"))

{
  "class": "Question",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    }
  },
  "moduleConfig": {
    "generative-openai": {},
    "text2vec-openai": {
      "model": "ada",
      "modelVersion": "002",
      "type": "text",
      "vectorizeClassName": true
    }
  },
  "properties": [
    {
      "dataType": [
        "text"
      ],
      "indexFilterable": true,
      "indexSearchable": true,
      "moduleConfig": {
        "text2vec-openai": {
          "skip": false,
          "vectorizePropertyName": false
        }
      },
      "name": "question",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "indexFilterable": true,
      "indexSearchable": true,
      "moduleConfig": {
        "text2vec-openai": {
          "skip": false,
          "vectorizePropertyName": false
        

### Add data

In [11]:
for o in data[:2]:
    obj_body = {
        "question": o["Question"],
        "answer": o["Answer"],
    }
    print(obj_body)

{'question': 'Abraham Lincoln died across the street from this theatre on April 15, 1865', 'answer': "Ford's Theatre (the Ford Theatre accepted)"}
{'question': 'Any pigment on the wall so faded you can barely see it', 'answer': 'faint paint'}


In [12]:
with client.batch() as batch:
    for o in data:
        obj_body = {
            "question": o["Question"],
            "answer": o["Answer"],
        }
        batch.add_data_object(
            data_object=obj_body,
            class_name="Question"
        )

#### Confirm data load

In [13]:
jprint(client.query.aggregate("Question").with_meta_count().do())

{
  "data": {
    "Aggregate": {
      "Question": [
        {
          "meta": {
            "count": 1000
          }
        }
      ]
    }
  }
}
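Rather than reading the count by eye, the check can be made programmatic. This sketch works on a dictionary copied from the output above instead of a live query; `object_count` is a hypothetical helper name, and the expected value of 1000 comes from the dataset size printed earlier.

```python
# Sample response copied from the aggregate output above.
sample_response = {
    "data": {"Aggregate": {"Question": [{"meta": {"count": 1000}}]}}
}


def object_count(response, class_name):
    """Pull the meta count for `class_name` out of an Aggregate response."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]


count = object_count(sample_response, "Question")
# Fail loudly if the import was incomplete.
assert count == 1000, f"expected 1000 objects, found {count}"
print(count)
```

In the notebook, the same helper could be applied to the dictionary returned by the aggregate query in place of `sample_response`.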


Does the data look right?

Let's grab a few objects from Weaviate!

In [14]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_limit(2)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "the Sargasso Sea",
          "question": "This sea in the North Atlantic is delineated only by the plants that float on its surface"
        },
        {
          "answer": "Mona Lisa",
          "question": "For one of his works, Marcel Duchamp added a beard & mustache to a print of this da Vinci lady"
        }
      ]
    }
  }
}


![img](https://github.com/weaviate-tutorials/intro-workshop/blob/main/images/object_import_process_full.png?raw=1)

## Step 3: Work with the data

### Filtering (similar to WHERE filter in SQL)

In [15]:
where_filter = {
    "path": ["question"],
    "operator": "Like",
    "valueText": "*history*"
}

response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_where(where_filter)
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "the Field Museum",
          "question": "What was once the Chicago Natural History Museum is now called this, after its founder"
        },
        {
          "answer": "the draft",
          "question": "You're in the Army now--in 1940 FDR instituted the first peacetime one of these in U.S. history"
        },
        {
          "answer": "Oil",
          "question": "The Drake Well Museum in Titusville, Penn. is dedicated to the history of this industry"
        }
      ]
    }
  }
}


In [16]:
where_filter = {
    "operator": "Or",
    "operands": [
        {
            "path": ["question"],
            "operator": "Like",
            "valueText": "*history*"
        },
        {
            "path": ["answer"],
            "operator": "Like",
            "valueText": "*history*"
        },
    ]
}

response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_where(where_filter)
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "the Field Museum",
          "question": "What was once the Chicago Natural History Museum is now called this, after its founder"
        },
        {
          "answer": "the draft",
          "question": "You're in the Army now--in 1940 FDR instituted the first peacetime one of these in U.S. history"
        },
        {
          "answer": "\"A Brief History Of Time In A Bottle\"",
          "question": "Stephen Hawking's 1988 bio of the universe that was a No. 1 hit for Jim Croce"
        }
      ]
    }
  }
}
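Nested filter dictionaries become repetitive once several conditions are combined. One possible way to keep them readable is to build them with small helpers. The names `like_filter` and `any_of` below are hypothetical and only produce plain dictionaries in the same shape as the filters above; they are not functions from the Weaviate client.

```python
def like_filter(prop, pattern):
    """Build a single `Like` condition on a text property."""
    return {"path": [prop], "operator": "Like", "valueText": pattern}


def any_of(*conditions):
    """Combine conditions so that matching any one of them is enough."""
    return {"operator": "Or", "operands": list(conditions)}


# Same filter as the `Or` example above, built from the helpers.
where_filter = any_of(
    like_filter("question", "*history*"),
    like_filter("answer", "*history*"),
)
print(where_filter)
```

The resulting `where_filter` has the same structure as the hand-written one, so it could be passed to `.with_where(...)` in the same way.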


### Keyword search

In [17]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_bm25("history")
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "\"A Brief History Of Time In A Bottle\"",
          "question": "Stephen Hawking's 1988 bio of the universe that was a No. 1 hit for Jim Croce"
        },
        {
          "answer": "Oil",
          "question": "The Drake Well Museum in Titusville, Penn. is dedicated to the history of this industry"
        },
        {
          "answer": "the Field Museum",
          "question": "What was once the Chicago Natural History Museum is now called this, after its founder"
        }
      ]
    }
  }
}
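Keyword search ranks results with BM25, and the schema output earlier shows the parameters `k1: 1.2` and `b: 0.75`. To give a feel for what those parameters do, here is a toy scorer using the textbook BM25 formula. It is a sketch under simplifying assumptions (naive tokenization, a three-document corpus, one text field) and is not a description of Weaviate's internal implementation; the shortened `docs` strings are illustrative.

```python
import math
import re

K1, B = 1.2, 0.75  # same values as the bm25 settings in the schema output


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=K1, b=B):
    """Score each document against the query with the textbook BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query_terms:
            n_docs_with_term = sum(term in t for t in tokenized)
            if n_docs_with_term == 0:
                continue  # term appears nowhere, so it contributes nothing
            idf = math.log(
                1 + (len(docs) - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5)
            )
            tf = tokens.count(term)
            # Length normalization: longer-than-average texts are penalized.
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


docs = [
    "A Brief History Of Time In A Bottle",
    "The Drake Well Museum is dedicated to the history of this industry",
    "This sea in the North Atlantic is delineated only by floating plants",
]
for doc, score in zip(docs, bm25_scores("history", docs)):
    print(f"{score:.3f}  {doc}")
```

In this toy corpus the first two texts each contain "history" once, but the shorter one scores higher because of the length normalization controlled by `b`. The third text does not contain the term and scores zero.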


### Semantic search

In [18]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_near_text({"concepts": ["history"]})
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "Greyhound",
          "question": "A Hibbing, Minn. museum traces the history of this bus company founded there in 1914 using Hupmobiles"
        },
        {
          "answer": "The Rijksmuseum",
          "question": "This Dutch national art museum had its origins in one founded by Louis Bonaparte in 1808"
        },
        {
          "answer": "Shinto",
          "question": "Compiled in 712, the Kojiki, \"Records of Ancient Matters\", is one of this religion's oldest texts"
        }
      ]
    }
  }
}


In [19]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_near_text({"concepts": ["history"]})
    .with_additional("distance")
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "distance": 0.19912618
          },
          "answer": "Greyhound",
          "question": "A Hibbing, Minn. museum traces the history of this bus company founded there in 1914 using Hupmobiles"
        },
        {
          "_additional": {
            "distance": 0.20586908
          },
          "answer": "The Rijksmuseum",
          "question": "This Dutch national art museum had its origins in one founded by Louis Bonaparte in 1808"
        },
        {
          "_additional": {
            "distance": 0.20853132
          },
          "answer": "Shinto",
          "question": "Compiled in 712, the Kojiki, \"Records of Ancient Matters\", is one of this religion's oldest texts"
        }
      ]
    }
  }
}
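The `distance` values above are cosine distances, as set by `"distance": "cosine"` in the class definition. Cosine distance is one minus the cosine similarity of two vectors, so it ranges from 0 (same direction) to 2 (opposite direction), and smaller means more similar. A minimal pure-Python sketch with toy three-dimensional vectors (real embeddings have far more dimensions):

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors chosen so the results are easy to reason about.
query = [1.0, 0.0, 0.0]
print(cosine_distance(query, [1.0, 0.0, 0.0]))   # same direction
print(cosine_distance(query, [0.0, 1.0, 0.0]))   # orthogonal
print(cosine_distance(query, [-1.0, 0.0, 0.0]))  # opposite direction
```

Read against this scale, distances of roughly 0.2 in the results above mean the returned questions sit fairly close to the query concept in the embedding space.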


In [20]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_additional("vector")
    .with_limit(1)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "vector": [
              0.013181721,
              -0.00023911761,
              0.010781428,
              -0.030288784,
              -0.007300339,
              0.030819235,
              -0.031774048,
              -0.0022610498,
              -0.043470506,
              -0.015064823,
              0.019215608,
              0.010224453,
              0.0021367252,
              0.0067068967,
              0.013473469,
              0.014163056,
              0.020011285,
              0.013990659,
              0.0038457736,
              -0.018486237,
              -0.0110267615,
              -0.001100687,
              0.0072075105,
              0.0027019875,
              0.02160264,
              -0.01758447,
              0.0128302965,
              -0.032994088,
              -0.018194487,
              0.0042137746,
              -0.0065643378,
              -0.0048602624,
  

### Generative search

In [21]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_near_text({"concepts": ["history"]})
    .with_generate(single_prompt="Write a tweet about {question} as an interesting factoid.")
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "\"Did you know? \ud83d\ude8c A fascinating piece of history lies in Hibbing, Minn.! \ud83c\udfde\ufe0f The local museum takes you back to 1914, where a bus company was founded using Hupmobiles! \ud83d\ude8d Explore the rich heritage and evolution of transportation at this hidden gem. \ud83c\udf1f #Hibbing #Museum #TransportationHistory\""
            }
          },
          "answer": "Greyhound",
          "question": "A Hibbing, Minn. museum traces the history of this bus company founded there in 1914 using Hupmobiles"
        },
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "\"\ud83c\udfa8 Did you know? The Dutch national art museum, originally founded by Louis Bonaparte in 1808, holds a rich history of artistic treasures! From its humble be

In [22]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_near_text({"concepts": ["history"]})
    .with_generate(single_prompt="Translate {question} into French.")
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "Un mus\u00e9e \u00e0 Hibbing, dans le Minnesota, retrace l'histoire de cette compagnie de bus fond\u00e9e en 1914 en utilisant des Hupmobiles."
            }
          },
          "answer": "Greyhound",
          "question": "A Hibbing, Minn. museum traces the history of this bus company founded there in 1914 using Hupmobiles"
        },
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "Ce mus\u00e9e national d'art n\u00e9erlandais trouve ses origines dans celui fond\u00e9 par Louis Bonaparte en 1808."
            }
          },
          "answer": "The Rijksmuseum",
          "question": "This Dutch national art museum had its origins in one founded by Louis Bonaparte in 1808"
        },
        {
          "_additional": {
            "generate

In [23]:
response = (
    client.query
    .get("Question", ["question", "answer"])
    .with_near_text({"concepts": ["history"]})
    .with_generate(grouped_task="Write a poem about these facts")
    .with_limit(3)
    .do()
)

jprint(response)

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "generate": {
              "error": null,
              "groupedResult": "In the land of Hibbing, where history resides,\nA museum stands tall, where knowledge abides.\nTracing the footsteps of a company grand,\nFounded in 1914, by a visionary hand.\n\nGreyhound, the name that echoes through time,\nA bus company born, with a purpose sublime.\nUsing Hupmobiles, they embarked on a quest,\nConnecting people, from east to the west.\n\nIn Hibbing, the stories of old come alive,\nAs the wheels of progress continue to drive.\nThrough exhibits and artifacts, we can explore,\nThe legacy of Greyhound, forevermore.\n\nAcross the ocean, in a land far away,\nThe Rijksmuseum stands, where art holds sway.\nA Dutch national treasure, with a rich history,\nBorn from the vision of Louis Bonaparte's decree.\n\nIn 1808, the seeds were sown,\nA museum of art, where beauty was shown.\nThrough the ages, it grew a

In [24]:
print(response["data"]["Get"]["Question"][0]["_additional"]["generate"]["groupedResult"])

In the land of Hibbing, where history resides,
A museum stands tall, where knowledge abides.
Tracing the footsteps of a company grand,
Founded in 1914, by a visionary hand.

Greyhound, the name that echoes through time,
A bus company born, with a purpose sublime.
Using Hupmobiles, they embarked on a quest,
Connecting people, from east to the west.

In Hibbing, the stories of old come alive,
As the wheels of progress continue to drive.
Through exhibits and artifacts, we can explore,
The legacy of Greyhound, forevermore.

Across the ocean, in a land far away,
The Rijksmuseum stands, where art holds sway.
A Dutch national treasure, with a rich history,
Born from the vision of Louis Bonaparte's decree.

In 1808, the seeds were sown,
A museum of art, where beauty was shown.
Through the ages, it grew and evolved,
Preserving masterpieces, for all to behold.

From Rembrandt's strokes to Van Gogh's flair,
The Rijksmuseum's halls, a haven so rare.
A sanctuary of culture, where passions ignite,
G
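For `single_prompt` queries, each returned object carries its own `singleResult`, whereas a `grouped_task` query puts one `groupedResult` on the first object, as accessed above. The sketch below shows one way to collect the per-object results. It runs on a hand-built dictionary shaped like the responses in this section; the generated strings are placeholders rather than real model output, and `single_results` is a hypothetical helper name.

```python
# Sample shaped like the generative responses above; the generated strings
# are shortened placeholders, not real model output.
sample_response = {
    "data": {"Get": {"Question": [
        {
            "_additional": {"generate": {"error": None, "singleResult": "placeholder text 1"}},
            "answer": "Greyhound",
        },
        {
            "_additional": {"generate": {"error": None, "singleResult": "placeholder text 2"}},
            "answer": "The Rijksmuseum",
        },
    ]}}
}


def single_results(response, class_name):
    """Return (answer, generated text) pairs, skipping objects that report an error."""
    pairs = []
    for obj in response["data"]["Get"][class_name]:
        generated = obj["_additional"]["generate"]
        if generated.get("error") is None:
            pairs.append((obj["answer"], generated["singleResult"]))
    return pairs


for answer, text in single_results(sample_response, "Question"):
    print(f"{answer}: {text}")
```

Checking the `error` field before using the text is a defensive choice: the responses shown here all report `null`, but a failed generation would otherwise go unnoticed.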