<img src="https://relevance.ai/wp-content/uploads/2021/11/logo.79f303e-1.svg" width="150" alt="Relevance AI" />
<h5> Developer-first vector platform for ML teams </h5>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RelevanceAI/workflows/blob/main/workflows/subclustering/basic_subclustering.ipynb)

# 🤖: Basic Sub-clustering

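This notebook is a quick guide to subclustering with Relevance AI. Subclustering lets you drill down into existing clusters: the documents in each parent cluster are clustered again on their own, and you can repeat this for as many levels as you need.

Basic subclustering, covered here, needs only a clustering model, the field holding the parent cluster labels, the vector field(s) to re-cluster on, and an alias for the new sub-cluster labels.

For more details, please refer to the [references](https://relevanceai.readthedocs.io/en/development/operations/cluster/subclustering.html).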
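
To make the idea concrete, here is a minimal pure-Python sketch of what one level of subclustering computes, independent of the Relevance AI SDK. It assumes a toy Lloyd's k-means seeded with the first k points (a stand-in for scikit-learn's `KMeans`) and a hypothetical `"parent-child"` label format; the labels the SDK actually writes may look different.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Toy Lloyd's k-means; the first k points seed the centroids."""
    k = min(k, len(points))  # a group cannot have more clusters than points
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def subcluster(points, parent_labels, k):
    """Run k-means separately inside each parent cluster (hypothetical label format)."""
    groups = defaultdict(list)
    for idx, parent in enumerate(parent_labels):
        groups[parent].append(idx)
    child_labels = [None] * len(points)
    for parent, idxs in groups.items():
        local = kmeans([points[i] for i in idxs], k)
        for i, child in zip(idxs, local):
            child_labels[i] = f"{parent}-{child}"
    return child_labels


points = [(0, 0), (100, 0), (0, 1), (100, 1), (0, 10), (100, 10), (0, 11), (100, 11)]
parents = kmeans(points, 2)
print(parents)                          # → [0, 1, 0, 1, 0, 1, 0, 1]
print(subcluster(points, parents, 2))   # → ['0-0', '1-0', '0-0', '1-0', '0-1', '1-1', '0-1', '1-1']
```

The key design point is that each parent group is clustered in isolation, so sub-cluster labels only mean something relative to their parent.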


In [None]:
!pip install -q 'RelevanceAI[notebook, excel]'

In [None]:
from relevanceai import Client

# Sign up or log in and find your credentials here: https://cloud.relevance.ai/sdk/api
# Once signed in, copy the value under `Authorization token` and paste it when prompted

client = Client()

# 🚣 Inserting data

Load a sample e-commerce dataset whose documents already contain vector fields, then upload it to a new dataset.

In [None]:
from relevanceai.utils.datasets import get_ecommerce_dataset_encoded

docs = get_ecommerce_dataset_encoded()
docs[0]
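
To pick a vector field for clustering, it helps to list the vector fields in a document. The sketch below assumes the naming convention visible in this notebook, where vector fields end in `_vector_` (for example `product_image_clip_vector_`); the document in it is hypothetical. In the notebook you could call `vector_fields(docs[0])`.

```python
def vector_fields(doc):
    """Return the keys that follow the `_vector_` suffix convention (assumed)."""
    return sorted(key for key in doc if key.endswith("_vector_"))


# Hypothetical document shaped like the ones used in this notebook
example_doc = {
    "_id": "abc",
    "product_title": "Red running shoes",
    "product_image_clip_vector_": [0.1, 0.2, 0.3],
    "sample_2_vector_": [0.5, 0.4],
}

print(vector_fields(example_doc))  # → ['product_image_clip_vector_', 'sample_2_vector_']
```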


In [None]:
ds = client.Dataset('basic_subclustering')
ds.upsert_documents(docs)

In [None]:
ds.schema

Run the initial (parent) clustering with KMeans:

In [None]:
from sklearn.cluster import KMeans

n_clusters = 10
vector_field = "product_image_clip_vector_"
parent_alias = f"kmeans_{n_clusters}"

model = KMeans(n_clusters=n_clusters)

# Cluster every document on its image vector and store the labels under the alias
cluster_ops = ds.cluster(
    model,
    vector_fields=[vector_field],
    alias=parent_alias,
)

# Field holding the parent cluster labels; you can also look it up in ds.schema
parent_field = f"_cluster_.{vector_field}.{parent_alias}"
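
The comment above notes that the parent field can also be found in the schema. Here is a small sketch of both routes: building the name with the `_cluster_.{vector_field}.{alias}` pattern used in the cell, and filtering schema keys for cluster fields. It assumes `ds.schema` is a dict keyed by dotted field names; the schema below is a hypothetical excerpt whose values are omitted.

```python
def cluster_field(vector_field, alias):
    """Build a cluster field name with the pattern used in the cell above."""
    return f"_cluster_.{vector_field}.{alias}"


def find_cluster_fields(schema):
    """Return keys shaped like `_cluster_.<vector_field>.<alias>` (assumed layout)."""
    return sorted(
        field for field in schema
        if field.startswith("_cluster_.") and field.count(".") == 2
    )


# Hypothetical schema excerpt: only the keys matter, so the values are None
example_schema = {
    "product_image_clip_vector_": None,
    "_cluster_": None,
    "_cluster_.product_image_clip_vector_": None,
    "_cluster_.product_image_clip_vector_.kmeans_10": None,
}

print(find_cluster_fields(example_schema))
# → ['_cluster_.product_image_clip_vector_.kmeans_10']
print(cluster_field("product_image_clip_vector_", "kmeans_10") in find_cluster_fields(example_schema))
# → True
```

If the schema assumption holds, `find_cluster_fields(ds.schema)` should list the parent field after the clustering cell has run.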


Running subclustering is then as simple as calling **ds.subcluster** with the parent field.

In [None]:

# Re-cluster the documents inside each parent cluster, using a different vector field
ds.subcluster(
    model=model,
    parent_field=parent_field,
    vector_fields=["sample_2_vector_"],
    alias="subcluster-kmeans-2",
)
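
After subclustering, each document should carry both a parent label and a sub-cluster label. The sketch below tallies documents per (parent, sub-cluster) pair from plain dicts. It assumes the labels are nested as `doc["_cluster_"][vector_field][alias]`, which mirrors the dotted `parent_field` path above but is not verified here; `make_doc`, `sample_docs` and the label strings are hypothetical.

```python
from collections import Counter


def get_field(doc, dotted_field):
    """Follow a dotted path like '_cluster_.<vector_field>.<alias>' through nested dicts."""
    value = doc
    for part in dotted_field.split("."):
        value = value[part]
    return value


def count_pairs(docs, parent_field, child_field):
    """Count documents for each (parent label, sub-cluster label) combination."""
    return Counter(
        (get_field(doc, parent_field), get_field(doc, child_field)) for doc in docs
    )


def make_doc(parent_label, child_label):
    """Build a toy document with labels nested under `_cluster_` (assumed layout)."""
    return {
        "_cluster_": {
            "product_image_clip_vector_": {"kmeans_10": parent_label},
            "sample_2_vector_": {"subcluster-kmeans-2": child_label},
        }
    }


sample_docs = [
    make_doc("cluster-0", "cluster-0-0"),
    make_doc("cluster-0", "cluster-0-1"),
    make_doc("cluster-0", "cluster-0-0"),
    make_doc("cluster-1", "cluster-1-0"),
]

counts = count_pairs(
    sample_docs,
    "_cluster_.product_image_clip_vector_.kmeans_10",
    "_cluster_.sample_2_vector_.subcluster-kmeans-2",
)
print(counts[("cluster-0", "cluster-0-0")])  # → 2
```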


You can then run subclustering again, this time using the sub-cluster field as the parent, to go one level deeper.

In [None]:

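# Use the sub-cluster field from the previous cell as the new parent.
# This name assumes the same `_cluster_.{vector_field}.{alias}` pattern as the
# parent field; confirm it in ds.schema before running.
subcluster_field = "_cluster_.sample_2_vector_.subcluster-kmeans-2"
ds.subcluster(
    model=model,
    parent_field=subcluster_field,
    vector_fields=["sample_2_vector_"],
    alias="subcluster-kmeans-2-level-2",
)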


You can keep nesting clusters as deeply as you need: each new `ds.subcluster` call takes the previous level's cluster field as its `parent_field`.
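
The repeated drill-down can be sketched in pure Python: at every level, each group from the level above is split again and its label gains one more component. For brevity the per-group clusterer here is a simple 1-D split at the group mean, a stand-in for KMeans rather than what the SDK does, and the `"0-1-0"` label format is hypothetical.

```python
def split_at_mean(values):
    """Stand-in clusterer: label 0 if at or below the group mean, else 1."""
    mean = sum(values) / len(values)
    return [0 if v <= mean else 1 for v in values]


def nested_labels(values, depth):
    """Return one label per value, with one '-'-separated component per level."""
    labels = [""] * len(values)
    for _ in range(depth):
        # Group indices by the previous level's label, which acts as the parent
        groups = {}
        for i, lab in enumerate(labels):
            groups.setdefault(lab, []).append(i)
        # Re-cluster each parent group on its own and extend its labels
        for parent, idxs in groups.items():
            children = split_at_mean([values[i] for i in idxs])
            for i, child in zip(idxs, children):
                labels[i] = f"{parent}-{child}" if parent else str(child)
    return labels


values = [1, 2, 10, 11, 100, 101, 110, 111]
print(nested_labels(values, 2))
# → ['0-0', '0-0', '0-1', '0-1', '1-0', '1-0', '1-1', '1-1']
```

Increasing `depth` adds one more level, just as each extra `ds.subcluster` call does.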


**Next steps**

More in-depth guides on adapting subclustering to different aliases and models are planned for the near future.


For more details, please refer to the [references](https://relevanceai.readthedocs.io/en/development/operations/cluster/subclustering.html).