# KeyPhrases Workflow

This workflow shows how to extract keyphrases from a dataset with Relevance AI.

In [None]:
!pip install -q RelevanceAI==1.3.2
!pip install -q rake-nltk

In [None]:
from relevanceai import Client 
client = Client()

# Input

In [None]:
# Modify inputs here if required
dataset_id = "ecommerce-example"
text_fields = ["product_title"]

In [None]:
ds = client.Dataset(dataset_id)
if dataset_id == "ecommerce-example":
    from relevanceai.datasets import get_dummy_ecommerce_dataset
    docs = get_dummy_ecommerce_dataset()
    ds.upsert_documents(docs)

# RAKE 

Relevance AI supports a keyword extraction algorithm called `rake`, which is the default. Users can also specify additional stopwords to exclude on top of `rake`, or use standard `nltk` keyword extraction instead. For example:
- `nltk` with `unigram` (single-word) extraction
- `nltk` with `bigram` (two-word) extraction
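
To give a feel for what a RAKE-style algorithm does, here is a minimal pure-Python sketch of the general idea: split text into candidate phrases at stopwords and punctuation, then score each phrase by summing the degree-to-frequency ratio of its words. This is not Relevance AI's or `rake-nltk`'s implementation; the tiny `STOPWORDS` set and the function names `candidate_phrases` and `rake_scores` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}

def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into runs of consecutive non-stopword words."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(text, stopwords=STOPWORDS):
    """Score each candidate phrase as the sum of degree/frequency of its words."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within a phrase, including the word itself.
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    # Highest-scoring phrases first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = rake_scores("Leather wallet for men with zip pocket and a card holder")
print(ranked)
```

Multi-word phrases such as "leather wallet" outrank the isolated word "men" here, because words that appear inside longer phrases accumulate a higher degree.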

In [None]:
ds.keyphrases(text_fields=text_fields, algorithm="rake")

# Specifying the Number of Words in the Keyphrases

In [None]:
ds.keyphrases(text_fields=text_fields, algorithm="nltk", n=3)
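
To illustrate what a phrase-length parameter like `n` typically controls, the sketch below counts n-word phrases with a sliding window in plain Python. It is an approximation of the concept, not the library's internals: the helper `ngram_counts` and the sample `titles` are hypothetical, and Relevance AI may tokenize and filter differently.

```python
from collections import Counter

def ngram_counts(texts, n, stopwords=frozenset()):
    """Count n-word phrases across texts after dropping stopwords."""
    counts = Counter()
    for text in texts:
        words = [w for w in text.lower().split() if w not in stopwords]
        # Slide a window of n words over the remaining tokens.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

# Made-up product titles, for illustration only.
titles = [
    "Blue cotton shirt for men",
    "Red cotton shirt for women",
    "Blue denim jacket",
]
unigrams = ngram_counts(titles, n=1, stopwords={"for"})
bigrams = ngram_counts(titles, n=2, stopwords={"for"})
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

One design choice to be aware of: this sketch removes stopwords before windowing, so words on either side of a stopword become neighbors ("shirt men"). An alternative is to break phrases at stopwords, as RAKE does.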

# Infinitely Hackable With Preprocessing Hooks

You can add preprocessing steps that run on each string before keyphrase extraction, such as removing possessive apostrophes or converting text to lower case.

In [None]:
def remove_apostrophe(string):
    return string.replace("'s", "")

def lower_case(string):
    return string.lower()

In [None]:
ds.keyphrases(text_fields=text_fields, algorithm="nltk", n=3, preprocess_hooks=[remove_apostrophe, lower_case])
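
The sketch below shows one plausible way a list of hooks could be applied to a string, assuming they run one after another in list order (the source does not confirm the order Relevance AI uses). The helper `apply_hooks` is hypothetical; the two hooks repeat the definitions from the cell above so the snippet runs on its own.

```python
def apply_hooks(text, hooks):
    """Pass text through each hook in list order (assumed behavior)."""
    for hook in hooks:
        text = hook(text)
    return text

def remove_apostrophe(string):
    return string.replace("'s", "")

def lower_case(string):
    return string.lower()

cleaned = apply_hooks("Men's Running SHOES", [remove_apostrophe, lower_case])
print(cleaned)

# Order can matter: an upper-case "'S" is only removed if lower-casing runs first.
print(apply_hooks("MEN'S", [remove_apostrophe, lower_case]))
print(apply_hooks("MEN'S", [lower_case, remove_apostrophe]))
```

If the data contains upper-case possessives, putting `lower_case` before `remove_apostrophe` is the safer ordering under this assumption.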

# Include Custom Stopwords

Users can also add custom stopwords on top of the default stopwords, so that frequent but uninformative words do not dominate the results.

In [None]:
ds.keyphrases(text_fields=text_fields, algorithm="nltk", n=3, additional_stopwords=["Men", "Women"], preprocess_hooks=[remove_apostrophe])

# Cluster Keyphrases

Users can also extract keyphrases for each cluster. This is helpful for labeling clusters automatically.

In [None]:
# First we run some clustering
clusterer = ds.auto_cluster("kmeans-5", vector_fields=["product_title_clip_vector_"])

In [None]:
ds.cluster_keyphrases(vector_fields=["product_title_clip_vector_"], cluster_alias="kmeans-5", text_fields=["product_title"])
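
Conceptually, per-cluster keyphrases amount to grouping documents by their cluster label and ranking phrases within each group. The following is a rough pure-Python sketch of that idea using single-word counts. The function `top_keyphrases_per_cluster`, the `cluster` field name, and the sample documents are all hypothetical and do not reflect how Relevance AI stores cluster labels.

```python
from collections import Counter, defaultdict

def top_keyphrases_per_cluster(docs, text_field, cluster_field, top_k=2):
    """Return the top_k most frequent words for each cluster label."""
    grouped = defaultdict(Counter)
    for doc in docs:
        words = doc[text_field].lower().split()
        grouped[doc[cluster_field]].update(words)
    return {
        cluster: [word for word, _ in counter.most_common(top_k)]
        for cluster, counter in grouped.items()
    }

# Made-up documents with hypothetical cluster labels.
docs = [
    {"product_title": "blue running shoes", "cluster": "cluster-0"},
    {"product_title": "red running shoes", "cluster": "cluster-0"},
    {"product_title": "leather wallet", "cluster": "cluster-1"},
    {"product_title": "leather belt", "cluster": "cluster-1"},
]
labels = top_keyphrases_per_cluster(docs, "product_title", "cluster")
print(labels)
```

The most frequent words in each group ("running", "shoes" versus "leather") can then serve as candidate labels for the clusters.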