# Installation

In [None]:
# remove `!` if running the line in a terminal
!pip install -U RelevanceAI[notebook]==1.4.0


# Setup

First, you need to set up a client object to interact with RelevanceAI.

In [None]:
from relevanceai import Client

# Sign up or log in to find your credentials: https://cloud.relevance.ai/sdk/api
# Once you have signed up, click on the value under `Activation token` and paste it here
client = Client()



# Data

You will need to have a dataset under your Relevance AI account. You can either use our e-commerce dataset as shown below or follow the tutorial on how to create your own dataset.

Our e-commerce dataset includes text fields such as `product_title`, as well as `product_title_clip_vector_`, the vectorized version of that field. You can load these documents as follows:



## Load the data

In [None]:
from relevanceai.datasets import get_ecommerce_dataset_encoded

documents = get_ecommerce_dataset_encoded()
# Preview the first document, leaving out the long vector fields
{k: v for k, v in documents[0].items() if '_vector_' not in k}


## Upload the data to Relevance AI

Run the following cell to upload these documents to your personal Relevance AI account under the name `quickstart_clustering_list_furthest`.

In [None]:
df = client.Dataset('quickstart_clustering_list_furthest')
df.insert_documents(documents)


## Check the data

In [None]:
df.health


# Clustering

We apply the K-means clustering algorithm to the vector field `product_title_clip_vector_` to cluster the documents.

In [None]:
from relevanceai.clusterer import KMeansModel

VECTOR_FIELD = "product_title_clip_vector_"
KMEAN_NUMBER_OF_CLUSTERS = 10
ALIAS = "kmeans_" + str(KMEAN_NUMBER_OF_CLUSTERS)

model = KMeansModel(k=KMEAN_NUMBER_OF_CLUSTERS)
clusterer = client.ClusterOps(alias=ALIAS, model=model)
clusterer.fit_predict_update(df, [VECTOR_FIELD])
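
For intuition about what a K-means step does to a set of vectors, here is a minimal pure-Python sketch of the classic assign-and-update loop (Lloyd's algorithm). It is not the implementation behind `KMeansModel`; the function name `kmeans_sketch`, the toy 2-D vectors, and the fixed iteration count are assumptions made for illustration.

```python
import math
import random


def kmeans_sketch(vectors, k, n_iter=20, seed=0):
    """Toy Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct vectors picked at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical 2-D vectors forming two well-separated groups.
toy_vectors = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
]
centroids, labels = kmeans_sketch(toy_vectors, k=2)
print(labels)
```

In the tutorial, the same idea is applied to the much higher-dimensional `product_title_clip_vector_` values with 10 clusters.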


Clustering results are automatically inserted into your dataset.
Here, we download a small sample and show the clustering results using `show_json`.

In [None]:
from relevanceai import show_json

sample_documents = df.sample(n=5)
samples = [{
    'product_title':d['product_title'],
    'cluster':d['_cluster_'][VECTOR_FIELD][ALIAS]
} for d in sample_documents]

show_json(samples, text_fields=['product_title', 'cluster'])
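
The cell above reads each cluster label from the nested path `_cluster_` → vector field → alias. As a small sketch of working with that layout, the snippet below counts documents per cluster. The documents, titles, and label strings such as `cluster-3` are invented for illustration, and `cluster_of` is a hypothetical helper, not part of the SDK.

```python
from collections import Counter

VECTOR_FIELD = "product_title_clip_vector_"
ALIAS = "kmeans_10"

# Hypothetical documents mimicking the nested layout used above;
# the titles and cluster labels are made up for this example.
toy_documents = [
    {"product_title": "Red running shoes",
     "_cluster_": {VECTOR_FIELD: {ALIAS: "cluster-3"}}},
    {"product_title": "Blue trail shoes",
     "_cluster_": {VECTOR_FIELD: {ALIAS: "cluster-3"}}},
    {"product_title": "Steel water bottle",
     "_cluster_": {VECTOR_FIELD: {ALIAS: "cluster-7"}}},
    {"product_title": "Not yet clustered item"},
]


def cluster_of(document, vector_field, alias):
    """Return the cluster label, or None if the document has none."""
    return document.get("_cluster_", {}).get(vector_field, {}).get(alias)


# Count how many documents fall into each cluster.
counts = Counter(cluster_of(d, VECTOR_FIELD, ALIAS) for d in toy_documents)
print(counts.most_common())
```

Using chained `.get` calls avoids a `KeyError` for documents that were inserted after clustering ran and so have no `_cluster_` entry yet.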



# List furthest from center

In [None]:

clusterer.list_furthest_from_center()
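
Documents far from their cluster center are the least typical members of that cluster, which makes them useful for spotting outliers or poorly fitting assignments. The sketch below illustrates the idea on toy data; it is not how `list_furthest_from_center` is implemented. The function name `furthest_from_center_sketch`, the Euclidean metric, and the toy vectors are all assumptions.

```python
import math
from collections import defaultdict


def furthest_from_center_sketch(vectors, labels, centroids, n=2):
    """For each cluster, return the indices of the n vectors furthest
    from that cluster's centroid (Euclidean distance assumed)."""
    by_cluster = defaultdict(list)
    for idx, (vector, label) in enumerate(zip(vectors, labels)):
        distance = math.dist(vector, centroids[label])
        by_cluster[label].append((distance, idx))
    return {
        # Sort by distance, largest first, and keep the top n indices.
        label: [idx for _, idx in sorted(items, reverse=True)[:n]]
        for label, items in by_cluster.items()
    }


# Hypothetical 2-D vectors, cluster labels, and centroids.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
toy_labels = [0, 0, 0, 1, 1]
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]

furthest = furthest_from_center_sketch(toy_vectors, toy_labels, toy_centroids, n=1)
print(furthest)
```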
