# Overview

Sentence-Transformers can be used in several ways to cluster small or large sets of sentences. In this notebook, each sentence is mapped to a sentence embedding, and k-means clustering is then applied to those embeddings.
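The second step can be sketched as Lloyd's k-means algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a minimal NumPy illustration on hypothetical 2-D toy data, not scikit-learn's `KMeans`. The notebook uses `KMeans`, which additionally uses k-means++ initialization and optional restarts via `n_init`. The `kmeans` helper is illustrative only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means (illustrative helper): returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct rows of X chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary and depend on the random initialization. Only the grouping (first three points together, last three together) is meaningful.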

In [1]:
!pip install sentence-transformers==2.3.1

Collecting sentence-transformers==2.3.1
  Downloading sentence_transformers-2.3.1-py3-none-any.whl.metadata (11 kB)
Downloading sentence_transformers-2.3.1-py3-none-any.whl (132 kB)
132.8/132.8 kB 5.7 MB/s eta 0:00:00
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.3.1


In [2]:
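from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Load a pretrained sentence-embedding model (384-dimensional output)
embedder = SentenceTransformer('all-MiniLM-L6-v2')
# Truncate inputs longer than 256 tokens
embedder.max_seq_length = 256
embedder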

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  (2): Normalize()
)
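The printed architecture shows mean pooling over the token embeddings (`pooling_mode_mean_tokens: True`) followed by `Normalize()`. Below is a rough NumPy sketch of what those two modules compute, based on their names and settings. It uses random token vectors in place of real BERT outputs. Padding positions are masked out before averaging, and the pooled vector is then scaled to unit length. The helper `mean_pool_and_normalize` is hypothetical and not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the vectors of real (non-padding) tokens, then L2-normalize.

    token_embeddings: (batch, seq_len, dim) array of per-token outputs
    attention_mask:   (batch, seq_len) array, 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)     # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)     # avoid division by zero
    pooled = summed / counts                           # mean token vector per sentence
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)        # unit-length embeddings

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384))   # 2 "sentences", 4 token slots, 384 dims
mask = np.array([[1, 1, 1, 0],          # first sentence: 3 real tokens + 1 padding
                 [1, 1, 1, 1]])
emb = mean_pool_and_normalize(tokens, mask)
print(emb.shape)                        # (2, 384)
```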

We used the Gemini chatbot to generate 15 sentences describing Melbourne, Victoria.

In [3]:
melbourne_descriptions = [
    "Melbourne, the cultural capital of Australia, exudes a vibrant energy with its laneways teeming with coffee shops, street art, and hidden bars.",
    "The Royal Botanic Gardens offer a tranquil escape within the city, adorned with diverse flora and serene walking paths.",
    "The iconic Federation Square pulsates with life, hosting events, art installations, and the Melbourne Museum and National Gallery of Victoria.",
    "Cobblestone streets and Victorian architecture transport you back in time as you explore the charming neighborhoods of Carlton and Fitzroy.",
    "St. Kilda Beach, with its vibrant beach boxes and bustling boardwalk, is the perfect spot for a sunset stroll, swim, or a bite by the bay.",
    "Queen Victoria Market, one of the largest open-air markets in the world, bursts with fresh produce, artisan goods, and lively conversations.",
    "Flinders Street Station, with its grand dome and bustling platforms, is a gateway to the city and a landmark in its own right.",
    "Melbourne's sporting scene is legendary, with the MCG hosting cricket matches, Rod Laver Arena for tennis, and AAMI Park for AFL.",
    "Hidden laneways offer a surprising culinary adventure, from hole-in-the-wall restaurants to Michelin-starred establishments.",
    "World-class coffee culture pervades the city, with independent cafes serving up the perfect brew in unique settings.",
    "Art and creativity flourish in Melbourne, with street art adorning walls, galleries showcasing diverse talents, and festivals celebrating artistic expression.",
    "The Yarra River winds through the city, offering scenic cruises, riverside walks, and spots for kayaking or paddleboarding.",
    "From quirky shops to high-end boutiques, Melbourne's shopping scene caters to every taste and budget.",
    "Diverse neighborhoods like Richmond, Brunswick, and Footscray offer distinct cultural experiences and culinary delights.",
    "Whether you're seeking vibrant nightlife, cultural immersion, or outdoor adventures, Melbourne has something for everyone."
]

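# Encode the 15 sentences into a NumPy array of shape (15, 384)
corpus_embeddings = embedder.encode(melbourne_descriptions)
corpus_embeddings.size  # total number of values: 15 * 384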

5760
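The value 5760 is the total number of entries in the embedding array: 15 sentences × 384 dimensions, where 384 is the `word_embedding_dimension` reported above. Because `Normalize()` makes every embedding unit-length, the squared Euclidean distance that k-means works with is tied to cosine similarity by ‖a − b‖² = 2 − 2·cos(a, b). So on these embeddings, k-means effectively groups sentences by cosine similarity. The sketch below checks both facts using random unit vectors as stand-ins for the real embeddings.

```python
import numpy as np

n_sentences, dim = 15, 384
print(n_sentences * dim)   # 5760, matching corpus_embeddings.size

# Random stand-ins for normalized embeddings (not the real model output)
rng = np.random.default_rng(42)
E = rng.normal(size=(n_sentences, dim))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit length, as Normalize() produces

cos = E @ E.T                                             # cosine similarity = dot product for unit vectors
sq_dist = ((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(sq_dist, 2 - 2 * cos))                  # True
```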

In [4]:
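# Perform k-means clustering on the sentence embeddings
num_clusters = 5
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
# Cluster index (0 to num_clusters - 1) assigned to each sentence
cluster_assignment = clustering_model.labels_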
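After fitting, `labels_` gives each sentence the index of the cluster center (a row of `clustering_model.cluster_centers_`) that it is assigned to. The NumPy sketch below reproduces that nearest-center assignment using made-up centers and points. `assign_to_nearest` is a hypothetical helper, not part of scikit-learn. Note that `KMeans` starts from randomly chosen centers, so the clusters can differ between runs unless a seed is fixed, for example with `KMeans(n_clusters=num_clusters, random_state=0)`.

```python
import numpy as np

def assign_to_nearest(X, centers):
    """Hypothetical helper: index of the closest center (squared Euclidean) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Made-up centers and points, standing in for cluster_centers_ and corpus_embeddings
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(assign_to_nearest(X, centers))   # [0 1 0]
```

Applied to the fitted model, `assign_to_nearest(corpus_embeddings, clustering_model.cluster_centers_)` should agree with `cluster_assignment`. `clustering_model.predict(...)` performs the same kind of assignment for new embeddings.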



In [5]:
# Group the sentences by their assigned cluster
clustered_sentences = [[] for _ in range(num_clusters)]

for sentence_id, cluster_id in enumerate(cluster_assignment):
    clustered_sentences[cluster_id].append(melbourne_descriptions[sentence_id])

# Print the clusters, numbered from 1 for readability
for i, cluster in enumerate(clustered_sentences):
    print('Cluster', i + 1)
    print(cluster)

Cluster 1
['St. Kilda Beach, with its vibrant beach boxes and bustling boardwalk, is the perfect spot for a sunset stroll, swim, or a bite by the bay.', 'The Yarra River winds through the city, offering scenic cruises, riverside walks, and spots for kayaking or paddleboarding.']
Cluster 2
['The Royal Botanic Gardens offer a tranquil escape within the city, adorned with diverse flora and serene walking paths.', 'Cobblestone streets and Victorian architecture transport you back in time as you explore the charming neighborhoods of Carlton and Fitzroy.', 'Queen Victoria Market, one of the largest open-air markets in the world, bursts with fresh produce, artisan goods, and lively conversations.', 'Flinders Street Station, with its grand dome and bustling platforms, is a gateway to the city and a landmark in its own right.', "From quirky shops to high-end boutiques, Melbourne's shopping scene caters to every taste and budget."]
Cluster 3
['World-class coffee culture pervades the city, with i