# ✨ ClassifAI Demo ✨

---

#### ClassifAI is a tool to help you create and serve searchable vector databases for text classification tasks.

#### There are three main concerns involved in making a live, searchable vector database for your applications:

1. **Vectorising** - Creating vectors from text
2. **Indexing** - Building a vector store by converting many texts to vectors
3. **Serving** - Wrapping the vector store in an API so it can be searched via endpoints

#### ClassifAI provides three key modules to address these (`vectorisers`, `indexers` and `servers`), letting you build REST API search systems from your text data.

#### Setup

```python
# !pip install git+https://github.com/datasciencecampus/<repo-to-be-public-soon>
```

## Vectorising

![Vectoriser_image](files/vectoriser.png)

#### We provide several vectoriser classes that you can use to convert text to embeddings (vectors):
```python
from classifai.vectorisers import (
    HuggingFaceVectoriser,
    GcpVectoriser,
    OllamaVectoriser
)
```

If none of these match your needs, you can define a custom vectoriser by extending our base class:
```python
from classifai.vectorisers import VectoriserBase
```
We'll discuss that option in more detail after this initial demo.
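Ahead of that discussion, here is a rough, dependency-free idea of what a custom vectoriser involves. We haven't reproduced `VectoriserBase`'s real interface here: `VectoriserLike` and `HashingVectoriser` are hypothetical names, and the only behaviour borrowed from this demo is a `.transform()` method that accepts either one string or a list of strings. The toy embedding is a hashed bag-of-words, not a semantic model, and it returns plain lists where the real vectorisers return arrays with a `.shape`.

```python
import math
import zlib
from abc import ABC, abstractmethod


class VectoriserLike(ABC):
    """Hypothetical stand-in for classifai's VectoriserBase; the real base class may differ."""

    @abstractmethod
    def transform(self, texts):
        """Convert one text (str) or several texts (list of str) to vectors."""


class HashingVectoriser(VectoriserLike):
    """Toy bag-of-words vectoriser using the hashing trick (illustration only)."""

    def __init__(self, dim=64):
        self.dim = dim

    def _embed(self, text):
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # crc32 is stable across runs, unlike the salted built-in hash()
            bucket = zlib.crc32(token.encode("utf-8")) % self.dim
            vec[bucket] += 1.0
        # L2-normalise so vectors can be compared by cosine similarity
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def transform(self, texts):
        # A single string gives one vector; a list of strings gives a list of vectors
        if isinstance(texts, str):
            return self._embed(texts)
        return [self._embed(t) for t in texts]


vectoriser = HashingVectoriser(dim=8)
print(len(vectoriser.transform("classifai is great")))  # 8
```

A real subclass would swap `_embed` for a call to whichever model or service you need.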

---

#### Initialising a vectoriser:

We'll download and use a small HuggingFace model that runs locally:

```python
from classifai.vectorisers import HuggingFaceVectoriser

# The embedding model is downloaded from HuggingFace, or loaded from the local cache if previously downloaded
# This also works with many other HuggingFace models!
vectoriser = HuggingFaceVectoriser(model_name="sentence-transformers/all-MiniLM-L6-v2")

# The `.transform()` method converts one text to a vector, or several texts to an array of vectors
my_first_vector = vectoriser.transform("classifai is a great tool for building AI applications.")
list_of_vectors = vectoriser.transform(["bag-of-words isn't as good as classifAI", "tf-idf isn't as good as classifAI"])

# all-MiniLM-L6-v2 produces 384-dimensional embeddings
my_first_vector.shape, list_of_vectors.shape
```

## Indexing

### The VectorStore class creates a vector database by converting a set of labelled texts to embeddings, using an associated Vectoriser.
#### Once created, it can be searched: queries are embedded as vectors with the same vectoriser, and their semantic similarity to the labelled texts in the VectorStore is calculated.
![VectorStore_image](files/VectorStore.png)
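Conceptually, a search like this is a nearest-neighbour lookup: embed the query, score it against every stored vector, and keep the best matches. The sketch below does that by brute force with cosine similarity, which is one common choice of metric; we're assuming it for illustration, and VectorStore may use a different metric or index internally. `brute_force_search` is a hypothetical helper, and its `n_results` argument simply mirrors the parameter used in the multi-query search further down.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query_vectors, doc_vectors, doc_labels, n_results=3):
    """For each query vector, return the n_results (label, score) pairs with the highest similarity."""
    results = []
    for q in query_vectors:
        # Score every stored vector against this query, then keep the top n_results
        scored = [(label, cosine_similarity(q, d)) for label, d in zip(doc_labels, doc_vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results.append(scored[:n_results])
    return results


docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
labels = ["red", "blue", "purple"]
print(brute_force_search([[0.9, 0.1]], docs, labels, n_results=2))  # 'red' ranks first, then 'purple'
```

Brute force is linear in the number of stored texts per query, which is fine for small demos; larger stores usually rely on vectorised maths or an approximate index.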


```python
from classifai.indexers import VectorStore

my_vector_store = VectorStore(
    file_name="data/testdata.csv",
    data_type="csv",
    vectoriser=vectoriser,
    meta_data={"colour": str, "language": str},
    overwrite=True,
)
```
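If you're preparing your own CSV, it may help to picture what `meta_data` describes: extra columns carried alongside each text, each paired with a type. The sketch below is our guess at that idea using Python's `csv` module, not VectorStore's actual loader. The `id` and `text` column names and the toy rows are made up for illustration, so check the ClassifAI docs for the layout it really expects.

```python
import csv
import io


def load_labelled_texts(csv_file, meta_data):
    """Read rows from a CSV file object, keeping 'id' and 'text' plus typed metadata columns.

    The 'id' and 'text' column names are hypothetical assumptions for this sketch.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        record = {"id": row["id"], "text": row["text"]}
        # Cast each declared metadata column with its type, e.g. str or int
        for column, cast in meta_data.items():
            record[column] = cast(row[column])
        rows.append(record)
    return rows


# Two made-up rows, just to show the shape of the data
sample = io.StringIO(
    "id,text,colour,language\n"
    "1,the sky on a clear day,blue,en\n"
    "2,fresh snow,white,en\n"
)
records = load_labelled_texts(sample, {"colour": str, "language": str})
print(records[1]["colour"])  # white
```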

### Once created, you can search the vector store by calling the `.search()` method:

```python
my_vector_store.search("What colour is snow?")
```

```python
# You can search multiple queries at once (and specify how many results you want per query)
my_vector_store.search(["What colour is snow?", "what is inside books"], n_results=5)
```

#### You can also search by ID by calling the `.reverse_search()` method on the object:

```python
my_vector_store.reverse_search(["1100", "1056"])
```
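`reverse_search` works in the opposite direction to `.search()`: rather than embedding a query, it looks entries up by ID, which is conceptually a keyed lookup. The sketch below assumes each stored record is a dict with a string `"id"` (the IDs `"1100"` and `"1056"` above are passed as strings). `lookup_by_id` is a hypothetical helper, and we haven't confirmed what `reverse_search` returns for unknown IDs, so returning `None` here is our own choice.

```python
def lookup_by_id(records, ids):
    """Return the record matching each requested id, in request order (None if not found)."""
    # Index records by id once, then answer each request with an O(1) dict lookup
    by_id = {record["id"]: record for record in records}
    return [by_id.get(i) for i in ids]


store = [{"id": "1100", "text": "a"}, {"id": "1056", "text": "b"}]
print(lookup_by_id(store, ["1056", "9999"]))  # [{'id': '1056', 'text': 'b'}, None]
```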

## Serving up your VectorStore!

#### So you've created a VectorStore with your chosen Vectoriser, and you can search it...

#### *Now, how do I host it so others can use it?*

![Server_Image](files/servers.png)

```python
# You wouldn't usually do this in a Jupyter notebook,
# and the kernel locks up when you start the FastAPI
# server, so we're just showing the necessary code here
# and we'll demo it via a Python script shortly
import nest_asyncio

from classifai.servers import start_api

# Allow the server's event loop to run inside the notebook's already-running loop
nest_asyncio.apply()
start_api(vector_stores=[my_vector_store], endpoint_names=["my_endpoint"], port=8000)

# Open http://localhost:8000/docs to see the Swagger API documentation and test it in the browser
```
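Since `start_api` serves the stores through FastAPI, the app should also publish a machine-readable OpenAPI schema, which FastAPI serves at `/openapi.json` by default (assuming `start_api` doesn't change that). Rather than guessing the exact route created for `my_endpoint`, the sketch below lists the routes the schema declares. `list_endpoints` and `fetch_openapi_spec` are hypothetical helpers built on the standard library.

```python
import json
import urllib.request

# Keys in an OpenAPI path item that are HTTP operations (others, like "parameters", are not)
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_endpoints(openapi_spec):
    """Return sorted (METHOD, path) pairs declared in a parsed OpenAPI document."""
    pairs = []
    for path, operations in openapi_spec.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:
                pairs.append((method.upper(), path))
    return sorted(pairs)


def fetch_openapi_spec(base_url="http://localhost:8000"):
    """Download and parse the OpenAPI schema from a running server."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# With the server running: print(list_endpoints(fetch_openapi_spec()))
```

Run against the live server (for example, started from a Python script), the output should include the route created for `my_endpoint`.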

## Roundup

#### That's it! You should now have a running REST API service that lets you search the texts you indexed from the test CSV.

#### Check out the GitHub repo, where the README has a quick start guide 😊