<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Write-to-db" data-toc-modified-id="Write-to-db-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Write to db</a></span></li><li><span><a href="#Get-all-db-Documents" data-toc-modified-id="Get-all-db-Documents-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Get all db Documents</a></span></li><li><span><a href="#Load-Document-from-db-given-id" data-toc-modified-id="Load-Document-from-db-given-id-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Load Document from db given id</a></span></li><li><span><a href="#Delete-Document-given-id" data-toc-modified-id="Delete-Document-given-id-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Delete Document given id</a></span></li><li><span><a href="#Search-Documents-given-query-Document" data-toc-modified-id="Search-Documents-given-query-Document-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Search Documents given query Document</a></span></li><li><span><a href="#Update-doc-given-id" data-toc-modified-id="Update-doc-given-id-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Update doc given id</a></span></li></ul></div>

In [1]:
%load_ext autoreload

%autoreload 2

In [2]:
import numpy as np
import weaviate
import utils

In [3]:
!curl -s http://localhost:8080/v1/meta

{"hostname":"http://[::]:8080","modules":{},"version":"1.9.0"}


In [4]:
client = weaviate.Client('http://localhost:8080')

In [5]:
from docarray import Document

doc_schema = {
    'class': 'Document',
    'properties': [
        {'dataType': ['blob'], 'name': 'serialized_doc'},
    ],
    'vectorizer': 'none',
}

Delete any existing schema so that we start from an empty db.

In [6]:
client.schema.delete_all()
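
Once the schema has been deleted, the `Document` class has to exist again before objects are written. The notebook does not show where that happens (presumably inside `utils`), so the helper below is only a sketch: `ensure_class` is a hypothetical name, and it assumes the v3 Python client, where `client.schema.get()` returns a dict with a `'classes'` list and `client.schema.create_class` accepts a class dict such as `doc_schema`.

```python
def ensure_class(client, schema_class):
    """Create schema_class unless a class with the same name already exists.

    Hypothetical helper, not part of utils. Returns True if the class was created.
    """
    existing = client.schema.get().get('classes') or []
    names = {c['class'] for c in existing}
    if schema_class['class'] in names:
        return False
    client.schema.create_class(schema_class)
    return True
```

Usage would be `ensure_class(client, doc_schema)` right after `client.schema.delete_all()`.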

## Write to db

In [7]:
d1 = Document(embedding=np.array([1,2,3]))
d2 = Document(embedding=np.array([0,0,0]))
d3 = Document(embedding=np.array([1,0,0]))
d4 = Document(embedding=np.array([0,1,0]))
d5 = Document(embedding=np.array([.5,0,0]))

docs = [d1,d2,d3,d4,d5]

In [8]:
for d in docs:
    utils.write_to_weaviate(client, d)
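
`utils.write_to_weaviate` is not shown here. Given the schema above (one `blob` property plus `vectorizer: 'none'`), a write presumably has to supply a base64-encoded string for `serialized_doc` and an explicit vector. The sketch below only builds that payload; `build_weaviate_payload` is a hypothetical name, and how the Document is turned into bytes is left to the caller.

```python
import base64

def build_weaviate_payload(serialized: bytes, embedding):
    """Return (properties, vector) for a Weaviate object of class 'Document'.

    Hypothetical helper: blob properties are sent as base64 text, and the
    vector is converted to a plain list of floats.
    """
    properties = {'serialized_doc': base64.b64encode(serialized).decode('utf-8')}
    vector = [float(x) for x in embedding]
    return properties, vector
```

With the v3 client, the result could then be passed on as `client.data_object.create(data_object=properties, class_name='Document', uuid=..., vector=vector)`.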

## Get all db Documents

In [9]:
utils.get_all_docs(client)

[<Document ('id', 'embedding') at a93fe2f6754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93fe788754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93fe9d6754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93fed50754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93ff02a754d11ec9fe0787b8ab3f5de>]

In [10]:
for d in utils.get_all_docs(client):
    print(d.embedding)

[1 2 3]
[0 0 0]
[1 0 0]
[0 1 0]
[0.5 0.  0. ]


## Load Document from db given id

In [11]:
doc_id = docs[0].id
utils.get_doc_by_id(client, doc_id)

 <Document ('id', 'embedding') at a93fe2f6754d11ec9fe0787b8ab3f5de>
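
The Document ids printed above are 32-character hex strings, while Weaviate reports ids in the canonical dashed UUID form. Converting between the two needs only the standard library; the function names below are hypothetical and not taken from `utils`.

```python
import uuid

def to_weaviate_uuid(doc_id: str) -> str:
    """Turn a 32-character hex id into the dashed UUID form."""
    return str(uuid.UUID(doc_id))

def to_doc_id(weaviate_uuid: str) -> str:
    """Turn a dashed UUID back into the 32-character hex form."""
    return uuid.UUID(weaviate_uuid).hex
```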


## Delete Document given id

In [12]:
utils.delete_given_id(client, d5.id)

In [13]:
for d in utils.get_all_docs(client):
    print(d.embedding)

[1 2 3]
[0 0 0]
[1 0 0]
[0 1 0]


## Search Documents given query Document


- <span style='color:red'> What is the certainty that is returned, and why is it None for one result? </span>

- <span style='color:red'> What distance is used? </span>

- <span style='color:red'> How can we get the distance measure returned? </span>

- <span style='color:red'> How can we change the distance measure used (if possible)? </span>



In [14]:
# What does certainty mean?
query_embedding = {'vector':np.array([0.9,0,0])}
client.query.get('Document', ['_additional {certainty}','_additional {id}']).with_near_vector(query_embedding).do()

{'data': {'Get': {'Document': [{'_additional': {'certainty': None,
      'id': 'a93fe788-754d-11ec-9fe0-787b8ab3f5de'}},
    {'_additional': {'certainty': 1,
      'id': 'a93fe9d6-754d-11ec-9fe0-787b8ab3f5de'}},
    {'_additional': {'certainty': 0.5,
      'id': 'a93fed50-754d-11ec-9fe0-787b8ab3f5de'}},
    {'_additional': {'certainty': 0.63363063,
      'id': 'a93fe2f6-754d-11ec-9fe0-787b8ab3f5de'}}]}}}
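
One observation on the questions above: the returned values are consistent with `certainty = (1 + cosine_similarity) / 2`. This is inferred from the numbers in the output, not confirmed anywhere in this notebook. Under that assumption the `None` would belong to the all-zero embedding, whose cosine similarity is undefined. The sketch below recomputes the values in plain Python; `cosine_certainty` is a hypothetical helper.

```python
import math

def cosine_certainty(query, vector):
    """Return (1 + cosine similarity) / 2, or None if either vector has zero norm."""
    dot = sum(q * v for q, v in zip(query, vector))
    norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        return None
    return (1 + dot / norm) / 2

query = [0.9, 0, 0]
for emb in ([0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 2, 3]):
    print(emb, cosine_certainty(query, emb))
```

Up to floating-point rounding, this reproduces `None`, `1`, `0.5` and `0.6336...` for the four stored embeddings.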

In [15]:
q =  Document(embedding=np.array([0.9,0,0]))

In [16]:
utils.search_near_docs(client, q)

[<Document ('id', 'embedding') at a93fe788754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93fe9d6754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93fed50754d11ec9fe0787b8ab3f5de>,
 <Document ('id', 'embedding') at a93fe2f6754d11ec9fe0787b8ab3f5de>]

In [17]:
# If Euclidean distance were used, the doc with embedding [1,0,0] should be retrieved first
for d in utils.search_near_docs(client, q):
    print(d.embedding)

[0 0 0]
[1 0 0]
[0 1 0]
[1 2 3]
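
For comparison, the ordering that Euclidean distance would give can be computed locally, without Weaviate. This is only a reference calculation for the expectation stated in the cell above; `rank_by_euclidean` is a hypothetical helper and `math.dist` requires Python 3.8 or later.

```python
import math

def rank_by_euclidean(query, vectors):
    """Return the vectors sorted by increasing Euclidean distance to query."""
    return sorted(vectors, key=lambda v: math.dist(query, v))

stored = [[1, 2, 3], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
for emb in rank_by_euclidean([0.9, 0, 0], stored):
    print(emb)
```

Under Euclidean distance `[1, 0, 0]` comes first and `[0, 0, 0]` second, which differs from the order returned by the search above.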


## Update doc given id

Internally we can use `data_object.replace` to replace a document stored in the db with another document.

- <span style='color:red'> Why do we need to pass a vector? If I don't, I get the following error: </span>


```
UnexpectedStatusCodeException: Replace object! Unexpected status code: 500, with response body: {'error': [{'message': "update object: this class is configured to use vectorizer 'none' thus a vector must be present when importing, got: field 'vector' is empty or contains a zero-length vector"}]}
```
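
The error message says that a class configured with vectorizer `'none'` needs a vector on every import, and a replace appears to count as one. A minimal sketch of a replace that always sends the vector is shown below. `replace_with_vector` is a hypothetical wrapper, and the keyword names `data_object`, `class_name`, `uuid` and `vector` are assumed to match `client.data_object.replace` in the v3 Python client.

```python
def replace_with_vector(client, class_name, uuid, properties, embedding):
    """Replace an object and always send its vector alongside the properties.

    Hypothetical wrapper around client.data_object.replace.
    """
    vector = [float(x) for x in embedding]
    if not vector:
        # Fail early instead of waiting for the 500 response from the server.
        raise ValueError('a non-empty vector is required when vectorizer is "none"')
    client.data_object.replace(
        data_object=properties,
        class_name=class_name,
        uuid=uuid,
        vector=vector,
    )
```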

In [18]:
wdocs = utils.get_all_docs(client)
for d in wdocs:
    print(d.embedding)

[1 2 3]
[0 0 0]
[1 0 0]
[0 1 0]


In [20]:
d = wdocs[0]
d.embedding = [6,6,6]

utils.update_doc_given_id(client, d)

In [21]:
wdocs = utils.get_all_docs(client)
for d in wdocs:
    print(d.embedding)

[6, 6, 6]
[0 0 0]
[1 0 0]
[0 1 0]
