# ClassifAI Package 
## Object Oriented Demo ✨


#### The ClassifAI package helps you create and serve vector databases for classification tasks.

#### This notebook is a quick guide showing how the package separates the three main concerns involved in building a live, searchable vector database for your applications:

1. **Vectorising** - The creation of vectors from text  
2. **Indexing** - The creation of a vector store, converting many texts to vectors 
3. **Serving** - Wrapping the Vector Store in an API to make it searchable from endpoints

#### The package provides three key modules, one for each concern, which together let you build REST API search systems from your text data in a single development process.

```markdown
![Workflow Diagram](files/intro.png)
```

## Setup

In [None]:
!pip install ipykernel
!pip install ipywidgets

In [None]:
!pip install git+https://github.com/datasciencecampus/classifAI_API@oo-prototype


In [None]:
!gcloud auth application-default login

## Vectorising

We provide several vectoriser classes that you can use to convert text to embeddings (vectors).

```markdown
![Workflow Diagram](files/vectoriser.png)
```

In [2]:
from classifAI_API.vectorisers import HuggingFaceVectoriser

# Other Hugging Face models can be used by changing model_name
vectoriser = HuggingFaceVectoriser(model_name="sentence-transformers/all-MiniLM-L6-v2")

my_first_vector = vectoriser.transform("ClassifAI is a great tool for building AI applications.")

my_first_vector  

array([[-4.22041267e-01, -4.10377234e-01, -4.79407161e-01,
        -7.37738162e-02,  1.84127182e-01,  7.83520713e-02,
        -1.13779500e-01,  2.69262046e-01, -2.63399690e-01,
        -2.31431574e-02, -2.26780459e-01, -1.37488946e-01,
         7.27206469e-02,  8.67812857e-02,  3.72249275e-01,
         1.19233511e-01, -1.69905335e-01,  1.92818478e-01,
        -4.00653422e-01, -3.09419453e-01,  2.70585865e-01,
         2.20776826e-01,  1.22355677e-01, -1.28483042e-01,
        -2.09892988e-01,  3.53233993e-01,  2.28323430e-01,
        -3.78446400e-01,  2.30527714e-01, -2.21412882e-01,
        -2.15664342e-01, -5.41087613e-02,  3.89351815e-01,
         2.29717374e-01, -4.39128220e-01,  5.05839765e-01,
        -4.14097816e-01,  6.41432777e-02, -2.26554405e-02,
        -2.16778591e-01, -5.79406857e-01,  2.08568498e-01,
         1.58676267e-01, -5.79368114e-01,  5.55266500e-01,
         1.76069494e-02, -4.18267787e-01, -2.87758768e-01,
         5.84228516e-01, -5.10404352e-03, -5.70754349e-0

#### Hugging Face models are probably the most accessible option, but we also provide a GcpVectoriser if you have a Vertex AI account set up!

In [None]:
from classifAI_API.vectorisers import GcpVectoriser

my_gcp_vectoriser = GcpVectoriser(project_id="<YOUR PROJECT ID>")

my_second_vector = my_gcp_vectoriser.transform("The quick brown fox jumps over the log")

my_second_vector.shape

#### Both of these Vectoriser classes accept a string (or a list of strings) and return a NumPy array.
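To make that contract concrete, here is a minimal sketch of a class with the same input and output shape. `ToyHashVectoriser` is a hypothetical stand-in written for this illustration and is not part of ClassifAI; it hashes tokens into buckets instead of running a language model. The only thing it shares with the real vectorisers is the interface described above: a single string or a list of strings goes in, and a 2D NumPy array with one row per text comes out.

```python
import hashlib

import numpy as np


class ToyHashVectoriser:
    """Hypothetical stand-in that mimics the str-or-list -> 2D array contract."""

    def __init__(self, dim=16):
        self.dim = dim

    def transform(self, texts):
        # Accept a single string by wrapping it in a list.
        if isinstance(texts, str):
            texts = [texts]
        vectors = np.zeros((len(texts), self.dim), dtype=np.float32)
        for row, text in enumerate(texts):
            for token in text.lower().split():
                # Stable hash so the same token always lands in the same bucket.
                digest = hashlib.md5(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "little") % self.dim
                vectors[row, bucket] += 1.0
        return vectors


toy = ToyHashVectoriser(dim=16)
single = toy.transform("Snow is white.")
batch = toy.transform(["Snow is white.", "Grass is green."])
print(single.shape)  # (1, 16)
print(batch.shape)   # (2, 16)
```

Because every vectoriser exposes the same `transform` call, the rest of the pipeline does not need to know which model produced the vectors.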

## Indexing

#### We then provide an indexer class, VectorStore, that creates and stores vectors. You can pass it **any** of the Vectoriser models.

#### Its job is to iterate over a CSV file you provide, convert the text to vectors, and store them:

In [None]:
from classifAI_API.indexers import VectorStore


my_vector_store = VectorStore(
    file_name="data/testdata.csv",
    data_type="csv",
    vectoriser=vectoriser,  # or pass my_gcp_vectoriser if you have set it up :)
    batch_size=10
)
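The `batch_size` argument suggests that rows are embedded in groups rather than one at a time. The sketch below shows that general pattern only; it is not the package's implementation. The column names `id` and `text`, the `toy_embed` function, and the `build_index` helper are all assumptions made for this illustration, since the layout of `data/testdata.csv` is not shown here.

```python
import csv
import io
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size items from an iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def toy_embed(texts):
    # Hypothetical embedding: (character count, word count) per text.
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def build_index(csv_file, embed, batch_size=10):
    """Read id/text rows, embed them batch by batch, and collect the results."""
    ids, texts, vectors = [], [], []
    reader = csv.DictReader(csv_file)
    for batch in batched(reader, batch_size):
        batch_texts = [row["text"] for row in batch]
        ids.extend(row["id"] for row in batch)
        texts.extend(batch_texts)
        vectors.extend(embed(batch_texts))
    return ids, texts, vectors


# In-memory CSV with the assumed column names `id` and `text`.
sample_csv = io.StringIO(
    "id,text\n1001,The sky is blue.\n1003,Grass is green.\n1004,Snow is white.\n"
)
ids, texts, vectors = build_index(sample_csv, toy_embed, batch_size=2)
print(ids)
print(vectors)
```

Batching keeps memory use bounded on large files and lets the embedding model process several texts per call.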

#### Once the vector store is created, you can search it by calling the .search() method on the object!

`You might also notice that the vector store and its metadata are now stored in the "classifai_vector_store" folder`

From here you can load an existing vector store from disk without repeating the indexing step: call the class method **VectorStore.from_filespace()**
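The idea of saving once and reloading later can be sketched as follows. This is only an illustration of the pattern: the file names `vectors.npy` and `metadata.json` and the helpers `save_store` and `load_store` are hypothetical, and the actual contents of the "classifai_vector_store" folder may look quite different.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_store(folder, vectors, metadata):
    """Write vectors and metadata side by side in one folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    np.save(folder / "vectors.npy", vectors)
    (folder / "metadata.json").write_text(json.dumps(metadata))


def load_store(folder):
    """Read back what save_store wrote, without re-embedding anything."""
    folder = Path(folder)
    vectors = np.load(folder / "vectors.npy")
    metadata = json.loads((folder / "metadata.json").read_text())
    return vectors, metadata


with tempfile.TemporaryDirectory() as tmp:
    original = np.array([[0.1, 0.2], [0.3, 0.4]])
    save_store(Path(tmp) / "toy_store", original, {"ids": ["1001", "1004"]})
    restored, meta = load_store(Path(tmp) / "toy_store")

print(np.array_equal(original, restored))
print(meta["ids"])
```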

In [None]:
my_vector_store.search("What colour is snow?")

| query_id (i64) | query_text (str) | doc_id (str) | doc_text (str) | rank (i64) | score (f64) |
|---|---|---|---|---|---|
| 0 | "What colour is snow?@" | "1004" | ""Snow is white."" | 0 | 18.900158 |
| 0 | "What colour is snow?@" | "1003" | ""Grass is green."" | 1 | 8.781666 |
| 0 | "What colour is snow?@" | "1001" | ""The sky is blue."" | 2 | 8.378674 |
| 0 | "What colour is snow?@" | "1009" | ""Rain falls from clouds."" | 3 | 6.892408 |
| 0 | "What colour is snow?@" | "1007" | ""Flowers bloom in spring."" | 4 | 5.258996 |
| 0 | "What colour is snow?@" | "1008" | ""The desert is hot."" | 5 | 4.887315 |
| 0 | "What colour is snow?@" | "1006" | ""Mountains are tall."" | 6 | 4.532707 |
| 0 | "What colour is snow?@" | "1002" | ""Apples are sweet."" | 7 | 3.380986 |
| 0 | "What colour is snow?@" | "1005" | ""The ocean is vast."" | 8 | 1.276922 |
| 0 | "What colour is snow?@" | "1010" | ""Books are full of knowledge."" | 9 | 0.950359 |


In [26]:
# Or search with multiple queries at once!
my_vector_store.search(["What colour is snow?", "what is inside books"], n_results=5)

| query_id (i64) | query_text (str) | doc_id (str) | doc_text (str) | rank (i64) | score (f64) |
|---|---|---|---|---|---|
| 0 | "What colour is snow?" | "1004" | ""Snow is white."" | 0 | 21.517035 |
| 0 | "What colour is snow?" | "1003" | ""Grass is green."" | 1 | 10.439695 |
| 0 | "What colour is snow?" | "1001" | ""The sky is blue."" | 2 | 10.01306 |
| 0 | "What colour is snow?" | "1009" | ""Rain falls from clouds."" | 3 | 8.291956 |
| 0 | "What colour is snow?" | "1006" | ""Mountains are tall."" | 4 | 6.119362 |
| 1 | "what is inside books" | "1001" | ""The sky is blue."" | 0 | 0.0 |
| 1 | "what is inside books" | "1001" | ""The sky is blue."" | 1 | 0.0 |
| 1 | "what is inside books" | "1001" | ""The sky is blue."" | 2 | 0.0 |
| 1 | "what is inside books" | "1001" | ""The sky is blue."" | 3 | 0.0 |
| 1 | "what is inside books" | "1001" | ""The sky is blue."" | 4 | 0.0 |


#### This all seamlessly uses the vectoriser model and the vector database you indexed to bring you the top K search results.
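A top-K lookup can be sketched in a few lines of NumPy. The scoring function used by the package is not stated here, so this example assumes a plain dot product; `top_k`, `doc_vecs`, and `query_vec` are hypothetical names and toy values chosen for the illustration.

```python
import numpy as np


def top_k(query_vec, doc_vecs, k=3):
    """Return (indices, scores) of the k highest dot-product scores, best first."""
    scores = doc_vecs @ query_vec
    k = min(k, len(scores))
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]


# Hypothetical 2-D document vectors; real embeddings have hundreds of dimensions.
doc_vecs = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.8, 0.6],
])
query_vec = np.array([1.0, 0.2])

indices, scores = top_k(query_vec, doc_vecs, k=2)
for rank, (idx, score) in enumerate(zip(indices, scores)):
    print(rank, int(idx), round(float(score), 2))
```

Each printed row pairs a rank with a document index and its score, which mirrors the `rank` and `score` columns in the search results.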

## Serving up your VectorStore!

#### So you've created a vector store with your chosen vectoriser, and you can search it... **how do you host it so others can use it?**

In [28]:
# Jupyter already runs an event loop, so top-level `await` works in a cell
async def main():
    print(1)

await main()

1


In [29]:
from classifAI_API.servers import start_api

start_api(vector_stores=[my_vector_store], endpoint_names=["easy_lib"], port=8000)

INFO - Starting ClassifAI API
INFO - Registering endpoints for: easy_lib


RuntimeError: asyncio.run() cannot be called from a running event loop

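### Starting the server doesn't work in Jupyter, because the notebook already runs an event loop and `asyncio.run()` cannot be called inside one. You'll need to run demo_part2.py (also in this folder) from the command line instead.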
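The error can be reproduced with the standard library alone, without ClassifAI. The sketch below is meant to be run as a plain script: the outer `asyncio.run()` succeeds because no loop is running yet, while the nested call fails in the same way as the cell above. The function names `noop` and `call_run_inside_loop` are made up for this illustration.

```python
import asyncio


async def noop():
    return "done"


async def call_run_inside_loop():
    """Try asyncio.run() while a loop is already running, as happens in Jupyter."""
    coro = noop()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


# In a plain script no loop is running yet, so this outer call works,
# while the nested asyncio.run() inside the coroutine is rejected.
message = asyncio.run(call_run_inside_loop())
print(message)
```

This is why the same `start_api` call is expected to behave differently from a command-line script, where no event loop is running beforehand.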