# ClassifAI Package 
## Object Oriented Demo ✨


#### The ClassifAI package is a tool to help create and serve vector databases for classification tasks.

#### This notebook is a quick guide to how the package separates the three main concerns involved in making a live, searchable vector database for your applications:

1. **Vectorising** - The creation of vectors from text  
2. **Indexing** - The creation of a vector store, converting many texts to vectors 
3. **Serving** - Wrapping the Vector Store in an API to make it searchable from endpoints

#### The package provides three key modules, one per concern (`vectorisers`, `indexers` and `servers`), which together let you build REST API search systems from your text data in a single development process.

## Setup

In [None]:
!pip install ipykernel
!pip install ipywidgets

# or use the corresponding uv commands

In [None]:
!pip install git+https://github.com/datasciencecampus/classifAI_package

In [None]:
# Optional: if your gcloud account has Vertex AI embedding models enabled, run this line to authenticate so you can use those models in this demo
!gcloud auth application-default login

## Vectorising

#### We provide several vectoriser classes that you can use to convert text into embeddings (vectors):

In [None]:
from IPython.display import Image, display

display(Image(filename="./files/vectoriser.png"))

In [None]:
from classifai_package.vectorisers import HuggingFaceVectoriser

# This demo uses a small sentence-transformers model, but many other Hugging Face models work too!
vectoriser = HuggingFaceVectoriser(model_name="sentence-transformers/all-MiniLM-L6-v2")

my_first_vector = vectoriser.transform("Classifai_package is a great tool for building AI applications.")

my_first_vector

#### Hugging Face models are probably the most accessible, but we also provide a `GcpVectoriser` if you have a Vertex AI account set up!

In [None]:
from classifai_package.vectorisers import GcpVectoriser

my_gcp_vectoriser = GcpVectoriser(
    project_id="<YOUR PROJECT ID>",
)

my_second_vector = my_gcp_vectoriser.transform("The quick brown fox jumps over the log")

my_second_vector.shape

#### Both of these vectoriser classes accept a string (or a list of strings) and return a NumPy array.
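
To make that shared contract concrete, here is a minimal sketch of a toy vectoriser that follows it: a `.transform()` method that takes a string or a list of strings and returns a NumPy array. The `ToyHashingVectoriser` class and its token-hashing trick are hypothetical and for illustration only; they are not part of ClassifAI, and real vectorisers use trained embedding models. We also assume one output row per input text, so a single string gives a 2-D array with one row; the package's actual output shape may differ.

```python
import hashlib

import numpy as np


class ToyHashingVectoriser:
    """Hypothetical vectoriser illustrating the str-or-list in, NumPy array out contract."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def _embed(self, text: str) -> np.ndarray:
        vec = np.zeros(self.dim, dtype=np.float32)
        for token in text.lower().split():
            # Stable hash (built-in hash() is salted per process)
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % self.dim
            vec[bucket] += 1.0
        norm = np.linalg.norm(vec)
        # L2-normalise so dot products between vectors equal cosine similarity
        return vec / norm if norm > 0 else vec

    def transform(self, texts):
        # Accept a single string or a list of strings (assumed contract)
        if isinstance(texts, str):
            texts = [texts]
        # One row per input text: shape (n_texts, dim)
        return np.vstack([self._embed(t) for t in texts])


toy_vectoriser = ToyHashingVectoriser(dim=64)
toy_vectors = toy_vectoriser.transform(["What colour is snow?", "snow is white"])
print(toy_vectors.shape)  # (2, 64)
```

Because the sketch only relies on `.transform()`, anything downstream that calls that method can work with it, which is the same property that lets you swap vectorisers in the next section.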

## Indexing

#### Next, we provide an indexer class, `VectorStore`, that creates and stores vectors. You can pass it **any** of the vectoriser classes.

#### Its job is to iterate over a CSV file you provide, convert its contents to vectors, and store them:

In [None]:
from IPython.display import Image, display

display(Image(filename="./files/VectorStore.png"))

In [None]:
from classifai_package.indexers import VectorStore

my_vector_store = VectorStore(
    file_name="data/testdata.csv",
    data_type="csv",
    vectoriser=vectoriser,  # or switch to the GcpVectoriser if you have it :)
    batch_size=10,
)
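
Conceptually, indexing a CSV in batches looks something like the sketch below: read the rows, send at most `batch_size` texts at a time to the vectoriser, and stack the results into one matrix alongside the row IDs. This is an illustration under assumptions, not the package's implementation: the `build_index` helper and the `id`/`text` column names are hypothetical, and the real `VectorStore` also handles persistence and metadata that this sketch ignores. `StubVectoriser` stands in for any object with a `.transform()` method.

```python
import csv
import io

import numpy as np


class StubVectoriser:
    """Hypothetical stand-in for any object with .transform(list[str]) -> np.ndarray."""

    def transform(self, texts):
        # Two toy features per text: character count and word count
        return np.array([[len(t), len(t.split())] for t in texts], dtype=np.float32)


def build_index(csv_file, vectoriser, batch_size=10):
    """Hypothetical helper: embed the 'text' column of a CSV in batches."""
    rows = list(csv.DictReader(csv_file))  # assumes 'id' and 'text' columns
    batches = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start : start + batch_size]
        # Each vectoriser call receives at most batch_size texts
        batches.append(vectoriser.transform([row["text"] for row in batch]))
    index_vectors = np.vstack(batches)
    ids = [row["id"] for row in rows]
    return ids, index_vectors


sample_csv = io.StringIO("id,text\n1,snow is white\n2,books contain pages\n3,the sky is blue\n")
ids, index_vectors = build_index(sample_csv, StubVectoriser(), batch_size=2)
print(ids, index_vectors.shape)  # ['1', '2', '3'] (3, 2)
```

Batching trades a few extra calls for bounded memory per call, which matters most for remote models such as Vertex AI where each request has size limits.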

#### Once it is created, you can search the vector store by calling its `.search()` method!

You might also notice that the vector store and its metadata are now stored in the `testdata` folder.

From then on, you can load an existing vector store from disk without re-running the indexing by calling the class method **VectorStore.from_filespace()**.
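
Reloading works because the expensive step (embedding every text) only has to happen once, after which its output lives on disk. As a rough sketch of that idea only, here is how vectors and their IDs could be saved to and reloaded from a folder with NumPy and JSON. The `save_store`/`load_store` helpers and the file names are hypothetical; the package's actual on-disk format and the arguments of `VectorStore.from_filespace()` may differ, so check its documentation.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_store(folder, ids, vectors):
    """Hypothetical helper: write vectors and their IDs to a folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    np.save(folder / "vectors.npy", vectors)
    (folder / "metadata.json").write_text(json.dumps({"ids": ids}))


def load_store(folder):
    """Hypothetical helper: reload stored vectors without re-running the vectoriser."""
    folder = Path(folder)
    vectors = np.load(folder / "vectors.npy")
    ids = json.loads((folder / "metadata.json").read_text())["ids"]
    return ids, vectors


with tempfile.TemporaryDirectory() as tmp:
    save_store(tmp, ["1", "2"], np.eye(2, dtype=np.float32))
    loaded_ids, loaded_vectors = load_store(tmp)
    print(loaded_ids, loaded_vectors.shape)  # ['1', '2'] (2, 2)
```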

In [None]:
my_vector_store.search("What colour is snow?")

In [None]:
# Or search with multiple queries at once, and specify how many results you want per query
my_vector_store.search(["What colour is snow?", "what is inside books"], n_results=5)

#### You can also search by ID by calling the `.reverse_search()` method on the object:

In [None]:
my_vector_store.reverse_search(["1100", "1056"])

#### This all seamlessly uses your vectoriser and the vector store you indexed to return the top-K search results.
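
A top-K vector search boils down to three steps: embed the queries, score them against every stored vector, and keep the K best matches per query. The sketch below assumes cosine similarity and brute-force scoring with NumPy; that metric and approach are our assumption, and the package may score or rank results differently. `top_k_search` is a hypothetical helper, and the tiny 2-D vectors are made up for illustration (zero vectors are not handled).

```python
import numpy as np


def top_k_search(query_vectors, store_vectors, ids, n_results=5):
    """Hypothetical helper: brute-force cosine-similarity search."""
    # L2-normalise rows so a dot product equals cosine similarity
    q = query_vectors / np.linalg.norm(query_vectors, axis=1, keepdims=True)
    s = store_vectors / np.linalg.norm(store_vectors, axis=1, keepdims=True)
    scores = q @ s.T  # shape (n_queries, n_store)
    k = min(n_results, len(ids))
    results = []
    for row in scores:
        # Indices of the k highest scores, best first
        best = np.argsort(-row)[:k]
        results.append([(ids[i], float(row[i])) for i in best])
    return results


store = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
queries = np.array([[1.0, 0.1]])
print(top_k_search(queries, store, ["a", "b", "c"], n_results=2))  # 'a' first, then 'c'
```

Brute force is fine for small stores like the demo CSV; large stores usually switch to an approximate nearest-neighbour index instead.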

## Serving up your VectorStore!

#### So you've created a vector store with your chosen vectoriser, and you can search it... **how do you host it so others can use it?**

In [None]:
from IPython.display import Image, display

display(Image(filename="./files/servers.png"))

In [None]:
import nest_asyncio

from classifai_package.servers import start_api

nest_asyncio.apply()  # lets the server run inside Jupyter's already-running event loop; you don't need this in a normal Python script

start_api(vector_stores=[my_vector_store], endpoint_names=["my_endpoint"], port=8000)
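
To see what "wrapping a vector store in an API" means in principle, the sketch below exposes a search function over HTTP using only Python's standard library. It is a conceptual illustration, not how `start_api` is implemented: the `/search` path, the `q` query parameter, the JSON response shape, and the `make_search_server` helper are all hypothetical. Port `0` asks the operating system for any free port.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def make_search_server(search_fn, host="127.0.0.1", port=0):
    """Hypothetical helper: serve search_fn(query) at GET /search?q=..."""

    class SearchHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path != "/search":
                self.send_error(404)
                return
            query = parse_qs(url.query).get("q", [""])[0]
            body = json.dumps({"query": query, "results": search_fn(query)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging in the notebook

    return ThreadingHTTPServer((host, port), SearchHandler)


# A toy search function stands in for a real vector store's search
server = make_search_server(lambda q: [q.upper()])
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/search?q=snow") as resp:
    print(json.load(resp))  # {'query': 'snow', 'results': ['SNOW']}
server.shutdown()
server.server_close()
```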

## Roundup

#### That's it: you should now have a running REST API service that lets you search the texts you indexed from the test CSV.

#### Check out the GitHub repo for a quick-start guide in the README.md 😊