# Demo: Weaviate Vector Database for AI Integration
### Prerequisites
- Kubernetes Cluster
- Helm 3.x
- Python 3.x
- Portworx Enterprise 2.11.x or 3.x

**Edit the values.yaml in this repo**

The *values.yaml* in this repo is configured to work with this demo, although you can change other settings. Just make sure the **storageClassName:** exists in your cluster. By default I am using **px-csi-replicated**, a repl=2 StorageClass that is included in default Portworx installs (as of testing on October 11, 2023).

```
storage:
  size: 32Gi
  storageClassName: "px-csi-replicated"
```



### Now run the steps below to install Weaviate and run a test query.

In [None]:
import weaviate
import json
import requests
import os

In [40]:
os.system(
'helm upgrade --install \
weaviate weaviate/weaviate \
--create-namespace \
--namespace weaviate \
--values ../values.yaml'
)

Release "weaviate" does not exist. Installing it now.
NAME: weaviate
LAST DEPLOYED: Wed Oct 18 20:08:36 2023
NAMESPACE: weaviate
STATUS: deployed
REVISION: 1
TEST SUITE: None


0

Run and re-run this cell until you see the Weaviate pod show *READY 1/1*

In [None]:
os.system('kubectl -n weaviate get pod,svc,pvc')
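
Instead of re-running the cell by hand, the readiness check could be automated. The following is a minimal sketch, assuming the default `kubectl get pod` column layout (`NAME READY STATUS ...`); the names `all_pods_ready` and `wait_for_pods` are hypothetical helpers, not part of this repo.

```python
import subprocess
import time


def all_pods_ready(kubectl_output: str) -> bool:
    """Return True if every pod row reports READY n/n with n > 0."""
    lines = kubectl_output.strip().splitlines()[1:]  # skip the header row
    rows = [line.split() for line in lines if len(line.split()) >= 2]
    if not rows:
        return False  # no pods listed yet
    for row in rows:
        ready, _, total = row[1].partition("/")  # READY column, e.g. "1/1"
        if not (ready.isdigit() and total.isdigit()):
            return False
        if ready != total or int(total) == 0:
            return False
    return True


def wait_for_pods(namespace: str = "weaviate", timeout_s: int = 300, interval_s: int = 5) -> bool:
    """Poll kubectl until all pods in the namespace are ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "-n", namespace, "get", "pod"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0 and all_pods_ready(result.stdout):
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_for_pods()` blocks until the pod shows *READY 1/1* or five minutes pass, whichever comes first.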

## Import ENV Vars
- To use the OpenAI integration you must have an API key that is able to make queries against the OpenAI API. Edit the `.env` file to include it.
- The Weaviate key is a generic key and matches the installation parameters in the `values.yaml`.
- Please remember that if you modify the `values.yaml` you must also modify your `.env`.
- The output of the cell above shows the running pod and the service's external IP. Place that IP into `.env` as well: `IP_WEAVIATE='Your IP'`

Once you have fully edited the `.env` you may run the following cell:

In [None]:


from dotenv import load_dotenv
load_dotenv()
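
Once the `.env` has been loaded, a quick check can confirm that the three variables this notebook reads (`WEAVIATE_KEY`, `OPENAI_KEY`, `IP_WEAVIATE`) are actually set. This is a small sketch; `missing_env_vars` is a hypothetical helper name, and it treats empty values as missing.

```python
import os

# Variable names read by the later cells in this notebook.
REQUIRED_VARS = ("WEAVIATE_KEY", "OPENAI_KEY", "IP_WEAVIATE")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing in .env:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Catching a missing value here is easier to debug than a connection or authentication error in a later cell.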

## Pull the keys for Weaviate and OpenAI, then create the `class_obj`


In [None]:
weaviate_key = os.getenv('WEAVIATE_KEY')  # Weaviate API key; must match values.yaml
open_ai_key = os.getenv('OPENAI_KEY')  # OpenAI API key, passed through as a request header
ip_weaviate = os.getenv('IP_WEAVIATE')  # External IP of the Weaviate service

client = weaviate.Client(
    url="http://" + ip_weaviate,  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key=weaviate_key),  # Replace w/ your Weaviate instance API key
    additional_headers={
        "X-OpenAI-Api-Key": open_ai_key  # Replace with your inference API key
    }
)


class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}

client.schema.create_class(class_obj)
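
Re-running the cell above against a cluster that already has the `Question` class will raise an error. One way around that is to check the existing schema first. A minimal sketch, assuming the v3 Python client used in this notebook, where `client.schema.get()` returns a dict with a `classes` list; `class_exists` and `create_class_if_missing` are hypothetical helper names.

```python
def class_exists(schema: dict, class_name: str) -> bool:
    """Check a schema dict (as returned by client.schema.get()) for a class name."""
    return any(c.get("class") == class_name for c in schema.get("classes", []))


def create_class_if_missing(client, class_obj: dict) -> bool:
    """Create the class only when it is absent; return True if it was created."""
    if class_exists(client.schema.get(), class_obj["class"]):
        return False
    client.schema.create_class(class_obj)
    return True
```

With this in place, `create_class_if_missing(client, class_obj)` could stand in for the direct `create_class` call when the notebook is run more than once.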

## Import data into Weaviate
- Pull the data sample of 10 Jeopardy Questions
- Insert it into Weaviate

In [None]:

resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
resp.raise_for_status()  # Stop here if the download failed
data = json.loads(resp.text)  # Load data


client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(data):  # Batch import data
        print(f"importing question: {i+1}")
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(
            data_object=properties,
            class_name="Question"
        )


In [None]:
# Confirm the load: re-create the client and list the stored objects
client = weaviate.Client(
    url = "http://" + ip_weaviate,  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key=weaviate_key),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": open_ai_key  # Replace with your inference API key
    }
)
some_objects = client.data_object.get()
print(json.dumps(some_objects))
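
Dumping every object works for ten records, but a count is often enough to confirm the import. The sketch below assumes the v3 client's aggregate query, `client.query.aggregate("Question").with_meta_count().do()`, and the response shape it normally returns; `object_count` is a hypothetical helper name.

```python
def object_count(aggregate_response: dict, class_name: str = "Question") -> int:
    """Extract the object count from an Aggregate query response."""
    groups = aggregate_response["data"]["Aggregate"][class_name]
    return groups[0]["meta"]["count"]


# Intended use against a live cluster (not executed here):
#   resp = client.query.aggregate("Question").with_meta_count().do()
#   print(object_count(resp))

# Illustrative response in the assumed shape, for the ten sample questions.
example = {"data": {"Aggregate": {"Question": [{"meta": {"count": 10}}]}}}
print(object_count(example))
```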

## Create Weaviate Client
- Create the client used for the search below (same settings as in the earlier cells)

In [None]:

client = weaviate.Client(
    url = "http://" + ip_weaviate,  # Replace with your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key=weaviate_key),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": open_ai_key  # Replace with your inference API key
    }
)


## Search
- Send the search
    - Query the sample data stored in the vector database
    - Return the answers whose concept is **near** "biology"
    - To also generate a response from OpenAI from these results, the query needs a `.with_generate(grouped_task=...)` step carrying the prompt; the cell below runs the search only
    

In [41]:

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))


{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "species",
                    "category": "SCIENCE",
                    "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification"
                }
            ]
        }
    }
}
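
The response is a nested dict, so a small helper can pull out just the fields of interest. This is a sketch based on the output shown above; `question_answer_pairs` is a hypothetical helper name, and the check for a top-level `errors` key assumes the usual GraphQL error shape.

```python
def question_answer_pairs(response: dict, class_name: str = "Question") -> list:
    """Return (question, answer) tuples from a Get query response."""
    if response.get("errors"):
        raise RuntimeError(f"Query failed: {response['errors']}")
    hits = response.get("data", {}).get("Get", {}).get(class_name) or []
    return [(hit["question"], hit["answer"]) for hit in hits]


# First hit copied from the output above.
sample = {
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance",
                }
            ]
        }
    }
}

for question, answer in question_answer_pairs(sample):
    print(f"{answer}: {question}")
```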


## Clean Up
- Use helm to uninstall the release and kubectl to delete the PVCs in the namespace.
- This leaves the cluster fresh for another demo.

In [None]:
os.system(
'helm uninstall -n weaviate weaviate && kubectl -n weaviate delete pvc --all'
)

You are now so awesome at Generative AI and vector DBs that you would probably tweet about it.