# Kiri Core Example: Semantic Search

This notebook walks through some of the core functionality of Kiri's semantic search.

# Search

We've got two flavors to look at: in-memory and Elasticsearch-based search.

The in-memory version is meant for local testing and experimenting -- dev stuff.

Using an Elastic backend is more suitable for production.

We'll start with memory.

In [1]:
# If you've got an API key, set it here.
api_key = None

In [2]:
from kiri import Kiri, InMemoryDocStore


# If a DocStore isn't provided when Kiri is instantiated, it'll default to an InMemoryDocStore.
# One is passed explicitly here for clarity.
m_doc_store = InMemoryDocStore()

if api_key:
    kiri = Kiri(m_doc_store, api_key=api_key)
else:
    kiri = Kiri(m_doc_store, local=True)

### Documents

Documents are the base object on which much of Kiri operates.

At their simplest, they're a wrapper for a string of content.

When a document is uploaded to a DocStore, it's processed -- the model set by the
`vectorise_model` parameter of the Kiri Core encodes the entire document
as a vector, which is then used for search.

There are variants of Documents that will be covered later.
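
To make the "whole document becomes one vector" idea concrete, here's a toy sketch. It is not Kiri's encoder: `toy_vectorise` is a hypothetical stand-in that hashes words into buckets, whereas the real `vectorise_model` would be a trained language model. The point is only the shape of the operation, text in and one fixed-length vector out.

```python
import math
import zlib


def toy_vectorise(text, dims=16):
    """Hypothetical stand-in for a real model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dims
    for token in text.lower().split():
        # crc32 is stable across runs (the built-in hash() is salted per process)
        bucket = zlib.crc32(token.encode("utf-8")) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Leave an all-zero vector alone so empty text doesn't divide by zero
    return [v / norm for v in vector] if norm else vector


doc_vector = toy_vectorise("The Kiri Natural Language Engine is a high-level library for NLP tasks.")
print(f"One document -> one vector of length {len(doc_vector)}")
```

However long the document is, the vector stays the same length, which is what makes comparing it against a query vector cheap.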

In [3]:
from kiri import Document


# The only required argument for a Document is some text content
content = "The Kiri Natural Language Engine is a high-level library for NLP tasks: a sweet suite of modules."

# Doc attributes are arbitrary -- put in whatever metadata you might need for your use case.
attributes = {"title": "Kiri Search Demo", "url": "https://kiri.ai"}

doc = Document(content=content, attributes=attributes)

kiri.upload([doc])

##### Brief Note on Uploading Content
Kiri's upload process simply passes the supplied documents along to the upload function of whichever DocStore you provided on instantiation.

This allows you to customize the upload process with your own preprocessing steps by extending an existing DocStore class.
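
As a rough sketch of that extension pattern, assuming nothing about Kiri's real class signatures: `ToyDocStore` below is a hypothetical stand-in (it stores plain strings rather than Documents), and `WhitespaceCleaningDocStore` overrides `upload` to add a preprocessing step before handing off to the parent.

```python
class ToyDocStore:
    """Hypothetical stand-in for a DocStore; not Kiri's actual class."""

    def __init__(self):
        self.documents = []

    def upload(self, documents):
        self.documents.extend(documents)


class WhitespaceCleaningDocStore(ToyDocStore):
    """Collapses runs of whitespace in each document before storing it."""

    def upload(self, documents):
        cleaned = [" ".join(text.split()) for text in documents]
        # Delegate to the parent so its normal upload behaviour is kept
        super().upload(cleaned)


store = WhitespaceCleaningDocStore()
store.upload(["  Kiri   works with\nElasticsearch. "])
print(store.documents)
```

With a real DocStore, the same idea would apply to whatever its upload method accepts; check the class you're extending for the exact signature.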

In a similar vein, Kiri allows you to supply your own Document vectorization function to the DocStore upload -- however, that will be covered in a separate notebook with some other 'advanced' features.

### Searching

Once you've uploaded content to the DocStore, you're ready to search. It's as simple as that.

For an in-memory search, you can provide a list of IDs to narrow the search down.
However, all you *need* to provide is a query.
Docs are retrieved from the DocStore and sent to the result processing function along with
the query vector.

When processing results, comparisons are performed between the query and document vectors to calculate
each document's relevancy score.
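
Here's a minimal sketch of that scoring step. It assumes cosine similarity as the comparison; the function names (`cosine_similarity`, `rank`) and the dict-of-vectors layout are made up for illustration and aren't Kiri's internals. It also shows how an ID filter, `max_results`, and `min_score` could fit together.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vector, doc_vectors, ids=None, max_results=10, min_score=0.0):
    """doc_vectors maps a document ID to its vector (hypothetical layout)."""
    if ids is None:
        candidates = doc_vectors
    else:
        # Narrow the search to the requested IDs, ignoring unknown ones
        candidates = {i: doc_vectors[i] for i in ids if i in doc_vectors}
    scored = [(doc_id, cosine_similarity(query_vector, vec))
              for doc_id, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_results]


doc_vectors = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
for doc_id, score in rank([1.0, 0.0], doc_vectors, min_score=0.01):
    print(doc_id, round(score, 3))
```

Document `c` points in a direction orthogonal to the query, so it scores zero and is dropped by the `min_score` cutoff.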

In [4]:
results = kiri.search("What's Kiri?", max_results=10, min_score=0.01)

# If you haven't modified this, then... well, you're going to get one result.
# Find some text you want to search yourself, and try making some Documents!

print(f"Results found: {results.total_results}")
print("======")
for res in results.results:
    # You can access your attributes here.
    print(res.document.attributes["title"])
    print(res.preview)
    print(f"Score: {res.score}")
    print("------")

Results found: 1
======
Kiri Search Demo
 The Kiri Natural Language Engine is a high-level library for NLP tasks: a sweet suite of modules.
Score: 0.5683653724383403
------


## What's Elastic?

Elasticsearch is an open source search engine -- it's commonly used in enterprise.
It's quick and scalable, and its documents are schema-free.

## Why use it with Kiri?

Elastic provides a production-ready backend for search with very little overhead required.

When used in combination with Kiri, you get all the benefits of Elastic, enhanced with Kiri's
semantic processing.

#### Setting up Elasticsearch
##### With Docker
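There's a simple Docker one-liner that can be used to set up Elasticsearch quickly. The official image doesn't publish a `latest` tag, so pin a version (7.10.1 is used here only as an example):
`docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.10.1`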

The `-d` flag refers to detached mode, meaning it will run in the background. Omit `-d` to get logs in your terminal instance.

If you'd like to host the Docker instance remotely, the AWS free tier includes a small instance with 750 hours per month, which is enough to leave it running constantly.

##### Local only
Refer to [this guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-install.html) to get Elasticsearch running locally on your computer.

Once you've got it installed and running, it's easy to use with Kiri.
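
Before pointing Kiri at the server, it can help to confirm it's actually reachable. The sketch below uses only the standard library and relies on Elasticsearch's root endpoint returning a JSON document that includes `version.number`. The helper name `elasticsearch_version` is made up for this example, and the URL assumes the default local setup from above.

```python
import http.client
import json
import urllib.request


def elasticsearch_version(url="http://localhost:9200", timeout=2.0):
    """Return the server's version string, or None if it can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.load(response)
        return info["version"]["number"]
    except (OSError, ValueError, KeyError, TypeError, http.client.HTTPException):
        # Connection refused, timeout, non-JSON reply, or unexpected payload
        return None


version = elasticsearch_version()
if version:
    print(f"Elasticsearch {version} is up")
else:
    print("Elasticsearch is not reachable yet")
```

A freshly started container can take a little while to accept connections, so a "not reachable" result right after `docker run` doesn't necessarily mean anything is wrong.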

#### Elastic-specific Kiri classes
Kiri includes variants of its Document and DocStore classes (ElasticDocument and ElasticDocStore).
These variants handle the necessary processing to turn a Document object into an Elastic-ready one,
as well as back into an object when retrieved *from* Elasticsearch.
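
To give a feel for what that round trip involves, here's a sketch using a hypothetical `ToyElasticDocument` dataclass. The method names `to_elastic` and `from_elastic` are assumptions for illustration, not necessarily what Kiri's `ElasticDocument` exposes. The one real detail is the shape of an Elasticsearch search hit, which carries the ID under `_id` and the stored body under `_source`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyElasticDocument:
    """Hypothetical stand-in; not Kiri's actual ElasticDocument."""
    content: str
    attributes: dict = field(default_factory=dict)
    id: Optional[str] = None

    def to_elastic(self):
        # The JSON-serialisable body that would be indexed
        return {"content": self.content, "attributes": self.attributes}

    @classmethod
    def from_elastic(cls, hit):
        # Rebuild the object from a search hit returned by Elasticsearch
        source = hit["_source"]
        return cls(content=source["content"],
                   attributes=source.get("attributes", {}),
                   id=hit["_id"])


doc = ToyElasticDocument("Kiri works with Elasticsearch as a backend. Nice.",
                         {"title": "Kiri Elastic Demo"})
hit = {"_id": "1", "_source": doc.to_elastic()}  # simulated search hit
restored = ToyElasticDocument.from_elastic(hit)
print(restored.id, restored.attributes["title"])
```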

In practice, this makes it incredibly easy to switch from a dev environment with `InMemoryDocStore` and standard `Document` objects to an Elastic one ready for production.

In [13]:
from kiri import ElasticDocStore, ElasticDocument

# An Elasticsearch index is roughly analogous to a database: a named collection of documents
e_doc_store = ElasticDocStore("http://localhost:9200", doc_class=ElasticDocument, 
                              index="kiri_example")

if api_key:
    e_kiri = Kiri(e_doc_store, api_key=api_key)
else:
    e_kiri = Kiri(e_doc_store, local=True)

# Making a document just like before.
# This time, we'll use the ElasticDocument class.
e_content = "Kiri works with Elasticsearch as a backend. Nice."
e_attrs = {"title": "Kiri Elastic Demo", "foo": "bar", "url": "https://kiri.ai"}

e_doc = ElasticDocument(content=e_content, attributes=e_attrs)

e_kiri.upload([e_doc])

### Searching an ElasticDocStore

Performing a search on an `ElasticDocStore` can be just as simple as searching an `InMemoryDocStore`. All you need is a query.

Kiri defaults to creating an Elastic query that checks cosine similarity between document and query vectors.
However, Elasticsearch includes capabilities to search via more complex queries. This will be covered in more detail in a separate notebook.
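
For reference, a cosine-similarity query in recent Elasticsearch 7.x releases is typically written as a `script_score` query over a `dense_vector` field. The sketch below builds such a request body as a plain dict. It is not necessarily the exact query Kiri generates: the helper name `build_cosine_query` and the field name `"vector"` are assumptions.

```python
import json


def build_cosine_query(query_vector, field="vector", max_results=10, min_score=None):
    """Build a script_score request body (field name is an assumption)."""
    body = {
        "size": max_results,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Elasticsearch scores can't be negative, hence the + 1.0 shift
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }
    if min_score is not None:
        body["min_score"] = min_score
    return body


print(json.dumps(build_cosine_query([0.1, 0.2, 0.3], min_score=1.01), indent=2))
```

Because of the `+ 1.0` shift, raw scores from a query like this fall between 0 and 2, so any `min_score` threshold would need to be on that shifted scale.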

In [17]:
# Easy...

e_results = e_kiri.search("What backends does Kiri support?", max_results=10, min_score=0.01)

print(f"Results found: {e_results.total_results}")
print("======")
for res in e_results.results:
    print(res.document.attributes["title"])
    # Again, you can print any attributes you added.
    print(f"Document Foo: {res.document.attributes['foo']}")
    print(res.preview)
    print(f"Score: {res.score}")
    print("------")

Results found: 1
======
Kiri Elastic Demo
Document Foo: bar
 Kiri works with Elasticsearch as a backend. Nice.
Score: 0.44349044316815744
------
