<img src="https://relevance.ai/wp-content/uploads/2021/11/logo.79f303e-1.svg" width="150" alt="Relevance AI" />
<h5> Developer-first vector platform for ML teams </h5>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RelevanceAI/workflows/blob/main/workflows/explain-text-clusters/explain-text-clusters_form.ipynb)

# 😄 Explain Text Clusters

This workflow prepares your text vectors for cluster analysis.

## Clustering

Please run the ['Cluster'](https://colab.research.google.com/github/RelevanceAI/workflows/blob/feature/sdk-418-convert-workflows-to-forms/workflows/cluster/Cluster_Your_Data_with_Relevance_AI_form.ipynb) workflow before running this 'Cluster Analysis' workflow.


## Cluster Analysis

This technique uses a marginal similarity measure to explain clusters.
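The workflow does not spell out how the marginal similarity measure is computed. One common reading, assumed here, is leave-one-out: remove each token in turn and record how much the similarity to a reference vector drops. The sketch below illustrates that idea only; `toy_encode`, `VOCAB`, and `marginal_scores` are hypothetical names, and the bag-of-words encoder stands in for a real sentence encoder.

```python
import math

# Hypothetical toy vocabulary; a real workflow would use a sentence encoder.
VOCAB = ["cat", "dog", "car"]

def toy_encode(text):
    """Bag-of-words counts over VOCAB (stand-in for a real text encoder)."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def marginal_scores(text, reference_vector):
    """Score each token by the similarity lost when that token is removed."""
    tokens = text.split()
    full = cosine(toy_encode(text), reference_vector)
    scores = []
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append((token, full - cosine(toy_encode(reduced), reference_vector)))
    return scores

# Reference vector pointing at "cat" (made-up value for illustration).
for token, score in marginal_scores("cat car", [1.0, 0.0, 0.0]):
    print(f"{token}: {score:+.4f}")
```

Under this reading, a positive score means the token pulls the text towards the reference vector, and a negative score means it pushes the text away.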

### Algorithm Choice

The `relational` algorithm compares the first document against the rest and is best used to explain why a document has been placed into this cluster with the other documents.

The `centroid` algorithm compares each document against the cluster's center vector and is best used to explain why a document has been placed into this cluster relative to that center. It has the downside of being noisy but is a more faithful representation of the cluster.
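The difference between the two choices comes down to which reference vector each document is compared against. The sketch below is a minimal illustration with made-up 2-D vectors and cosine similarity. It is not the SDK's implementation, and `relational_scores` and `centroid_scores` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relational_scores(vectors):
    """Compare the first vector against each of the remaining vectors."""
    first, rest = vectors[0], vectors[1:]
    return [cosine(first, v) for v in rest]

def centroid_scores(vectors):
    """Compare every vector against the mean (center) vector of the cluster."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [cosine(centroid, v) for v in vectors]

# Toy "document vectors" for a single cluster (made-up values).
cluster_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print("relational:", relational_scores(cluster_vectors))
print("centroid:  ", centroid_scores(cluster_vectors))
```

In this toy cluster the relational scores depend entirely on which document comes first, while the centroid scores treat every document symmetrically.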

In [None]:
#@title After filling this form, press the top left button.
# You can grab your token here https://cloud.relevance.ai/sdk/api

token = "<copy paste from https://cloud.relevance.ai/sdk/api>"  #@param  {type:"string"}
dataset_id = "<your dataset ID here>"                           #@param {type:"string"}

#@markdown Cluster Operations Parameters - Please specify which cluster to load (see `ds.schema`)

#@markdown e.g. `'_cluster_.product_title_clip_vector_.kmeans-20': 'text'`

cluster_alias = "<your cluster alias here>"                        #@param {type:"string"}
vector_field = "<your vector field here>"        #@param {type:"string"}

#@markdown Cluster Text Explanation Parameters

text_field =  "<your text field here>"                            #@param {type:"string"}

encode_function_or_model_id = "all-mpnet-base-v2"   #@param  {type:"string"} 
cluster_explanation_algorithm = "relational"        #@param  {type:"string"} 

if cluster_explanation_algorithm not in ['relational', 'centroid']:
    raise ValueError(f"cluster_explanation_algorithm must be either 'relational' or 'centroid', got '{cluster_explanation_algorithm}'.")

## Install deps
!pip install -q RelevanceAI==2.3.2
!pip install -q sentence-transformers==2.2.0


from relevanceai import Client 
import json

client = Client(token=token)

ds = client.Dataset(dataset_id)

try: 
  # Print the schema so the cluster and vector field names can be checked
  print(json.dumps(ds.schema, indent=2))
  cluster_ops = ds.ClusterOps(
      alias = cluster_alias, 
      vector_fields = [vector_field]
  )
  cluster_ops.create_centroids()

  cluster_ops.explain_text_clusters(
      text_field = text_field, 
      encode_fn_or_model = encode_function_or_model_id, 
      algorithm = cluster_explanation_algorithm
  )
except Exception as e:
  # Chain the original exception so its traceback is preserved
  raise ValueError(f'{e}') from e
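The form asks for a cluster alias and a vector field, and the example schema key `_cluster_.product_title_clip_vector_.kmeans-20` suggests the pattern `_cluster_.<vector_field>.<alias>`. Assuming that pattern holds for your dataset, a small helper like the one below can split a schema key into the two values. `split_cluster_field` is a hypothetical name, not part of the SDK.

```python
def split_cluster_field(schema_key):
    """Split '_cluster_.<vector_field>.<alias>' into (vector_field, alias).

    Assumes the alias is the final dot-separated segment, so an alias that
    itself contains a dot would not be handled correctly.
    """
    prefix = "_cluster_."
    if not schema_key.startswith(prefix):
        raise ValueError(f"'{schema_key}' does not start with '{prefix}'.")
    remainder = schema_key[len(prefix):]
    vector_field, separator, alias = remainder.rpartition(".")
    if not separator or not vector_field or not alias:
        raise ValueError(f"'{schema_key}' is not of the form '_cluster_.<vector_field>.<alias>'.")
    return vector_field, alias

vector_field, cluster_alias = split_cluster_field(
    "_cluster_.product_title_clip_vector_.kmeans-20"
)
print(vector_field, cluster_alias)
```

The two returned values correspond to the `vector_field` and `cluster_alias` form inputs.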


# 🌇 Next Steps

This is just a quick tutorial on Relevance AI. Many more applications are possible, such as zero-shot-based labelling, recommendations, anomaly detection, the projector and more:

- Explore our platform and check out new workflows at https://cloud.relevance.ai
- There are more in-depth tutorials and guides at https://docs.relevance.ai
- There are detailed library references at https://relevanceai.readthedocs.io/
- Join our Slack community at https://join.slack.com/t/relevance-ai/shared_invite/zt-11fo8oush-dHPd57wamhoQ7J5arNv1mg