**This text covers four Zoho products: Zoho CRM, Zoho Mail, Zoho Docs, and Zoho Books.**


corpus_text = "Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers. It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports. It is an exceptional email service that is clean, fast, and offers a user-friendly interface. Zoho mail provides a reliable and secure way for businesses to communicate internally and externally. With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub. Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely. It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration. Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease. It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management. With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses."


# Possible changes:
1. Model Training
2. K-Means
3. DBSCAN
4. Hierarchical Clustering
5. Replace 'all-MiniLM-L6-v2' with a different, more efficient model
6. Replace Sentence Transformers with another embedding method, such as TF-IDF or Word2Vec

In [None]:
pip install sentence-transformers

# K-Means

In [None]:
from sentence_transformers import SentenceTransformer, util
from sklearn.cluster import KMeans
import numpy as np
import nltk

In [None]:
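# Download the Punkt tokenizer data for sentence splitting
# (recent NLTK releases look for 'punkt_tab'; older ones use 'punkt')
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource)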

In [None]:
# Corpus with example sentences
corpus_text = "Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers. It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports. It is an exceptional email service that is clean, fast, and offers a user-friendly interface. Zoho mail provides a reliable and secure way for businesses to communicate internally and externally. With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub. Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely. It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration. Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease. It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management. With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses."

# Split the corpus text into sentences
corpus = nltk.sent_tokenize(corpus_text)
print(corpus)

embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Compute embeddings for each sentence in the corpus
corpus_embeddings = embedder.encode(corpus)
corpus_embeddings = corpus_embeddings /  np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
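
The division by the row norm above scales every embedding to unit length. This matters because KMeans and DBSCAN measure Euclidean distance by default, and for unit vectors the squared Euclidean distance equals 2 − 2·(cosine similarity), so "close in Euclidean terms" and "similar in cosine terms" agree. A minimal sketch of that identity, using made-up 3-dimensional vectors in place of real embeddings (the helper names are hypothetical and not part of the notebook):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (same effect as dividing by np.linalg.norm)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of any length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def squared_euclidean(a, b):
    """Squared straight-line distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors standing in for sentence embeddings (assumed values)
a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]

a_unit, b_unit = l2_normalize(a), l2_normalize(b)
lhs = squared_euclidean(a_unit, b_unit)
rhs = 2 - 2 * cosine_similarity(a, b)
print(lhs, rhs)  # the two values agree up to floating-point error
```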

In [None]:
# Perform kmeans clustering
num_clusters = 4
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_

# Assign each sentence to a cluster
clustered_sentences = [[] for _ in range(num_clusters)]
for sentence_id, cluster_id in enumerate(cluster_assignment):
    clustered_sentences[cluster_id].append(corpus[sentence_id])

# Print clustered sentences
for i, cluster in enumerate(clustered_sentences):
    print(f"Cluster {i+1}:")
    print(cluster)
    print("\n")

Cluster 1:
['Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers.', 'Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely.', 'Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease.', 'With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses.']


Cluster 2:
['It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports.', 'It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management.']


Cluster 3:
['It is an exceptional email service that is clean, fast, and offers a user-friendly interface.', 'Zoho mail provides a re
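
The "assign each sentence to a cluster" loop is repeated for every algorithm in this notebook. One way to factor it out is a small helper that works for any label array, including the -1 label that DBSCAN gives to noise points. This is a sketch only; `group_by_label` is a hypothetical name, and the toy labels below are assumed values rather than output from a real model:

```python
from collections import defaultdict

def group_by_label(sentences, labels):
    """Return {label: [sentences]} with labels in ascending order."""
    groups = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        groups[int(label)].append(sentence)
    return dict(sorted(groups.items()))

# Toy labels standing in for clustering_model.labels_ (assumed values)
demo_sentences = ["s0", "s1", "s2", "s3"]
demo_labels = [1, 0, 1, -1]  # -1 is the label DBSCAN gives to noise points

for label, members in group_by_label(demo_sentences, demo_labels).items():
    name = "Noise" if label == -1 else f"Cluster {label}"
    print(f"{name}: {members}")
```

Casting each label with `int()` keeps the dictionary keys plain Python integers even when the labels come from a NumPy array.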



# DBSCAN

In [None]:
pip install sentence_transformers

In [None]:
from sentence_transformers import SentenceTransformer, util
from sklearn.cluster import DBSCAN
import numpy as np
import nltk

In [None]:
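# Download the Punkt tokenizer data for sentence splitting
# (recent NLTK releases look for 'punkt_tab'; older ones use 'punkt')
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource)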

In [None]:
# Corpus with example sentences
corpus_text = "Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers. It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports. It is an exceptional email service that is clean, fast, and offers a user-friendly interface. Zoho mail provides a reliable and secure way for businesses to communicate internally and externally. With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub. Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely. It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration. Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease. It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management. With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses."

# Split the corpus text into sentences
corpus = nltk.sent_tokenize(corpus_text)

embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Compute embeddings for each sentence in the corpus
corpus_embeddings = embedder.encode(corpus)
corpus_embeddings = corpus_embeddings /  np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)





In [None]:
# Perform DBSCAN clustering
clustering_model = DBSCAN(eps=0.8, min_samples=2)
cluster_assignment = clustering_model.fit_predict(corpus_embeddings)


# Assign each sentence to a cluster
clustered_sentences = {}
for sentence_id, cluster_id in enumerate(cluster_assignment):
    if cluster_id not in clustered_sentences:
        clustered_sentences[cluster_id] = []

    clustered_sentences[cluster_id].append(corpus[sentence_id])

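# Print clustered sentences (DBSCAN labels noise points as -1, so report them separately)
for i, cluster in clustered_sentences.items():
    label = "Noise" if i == -1 else f"Cluster {i+1}"
    print(f"{label}:")
    print(cluster)
    print("\n")


Noise: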
['Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers.', 'It is an exceptional email service that is clean, fast, and offers a user-friendly interface.', 'It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration.']


Cluster 1:
['It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports.', 'It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management.']


Cluster 2:
['Zoho mail provides a reliable and secure way for businesses to communicate internally and externally.', 'With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub.']


Cluster 3:
['Zoho Docs is a comprehensive on
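
Because the embeddings were normalized to unit length before clustering, DBSCAN's Euclidean `eps` can be read as a cosine-similarity threshold: two unit vectors are within `eps` of each other exactly when their cosine similarity is at least 1 − eps²/2. The sketch below shows that conversion in both directions; the function names are hypothetical, and the relationship only holds under the unit-length assumption.

```python
import math

def eps_to_min_cosine(eps):
    """Cosine similarity at which two unit vectors are exactly eps apart."""
    return 1 - (eps ** 2) / 2

def cosine_to_eps(cos_sim):
    """Euclidean distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(2 - 2 * cos_sim)

print(round(eps_to_min_cosine(0.8), 2))  # 0.68
print(round(cosine_to_eps(0.5), 2))      # 1.0
```

With `eps=0.8`, two sentences count as neighbors only if their cosine similarity is roughly 0.68 or higher, which may help when tuning `eps` by hand.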

# TF-IDF for generating embeddings instead of SentenceTransformer

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN
import nltk

In [None]:
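# Download the Punkt tokenizer data (recent NLTK releases look for 'punkt_tab', older ones for 'punkt')
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource)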

# Corpus with example sentences
corpus_text = "Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers. It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports. It is an exceptional email service that is clean, fast, and offers a user-friendly interface. Zoho mail provides a reliable and secure way for businesses to communicate internally and externally. With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub. Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely. It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration. Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease. It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management. With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses."

# Split the corpus text into sentences
corpus = nltk.sent_tokenize(corpus_text)

# Use TfidfVectorizer to generate embeddings
vectorizer = TfidfVectorizer()
corpus_embeddings = vectorizer.fit_transform(corpus)

In [None]:
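# Perform DBSCAN clustering (the number of clusters cannot be set directly; it follows from eps and min_samples)
# With min_samples=1 every sentence is a core point, so nothing is labeled as noise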
clustering_model = DBSCAN(eps=0.8, min_samples=1)
cluster_assignment = clustering_model.fit_predict(corpus_embeddings)

# Assign each sentence to a cluster
clustered_sentences = {}
for sentence_id, cluster_id in enumerate(cluster_assignment):
    if cluster_id not in clustered_sentences:
        clustered_sentences[cluster_id] = []

    clustered_sentences[cluster_id].append(corpus[sentence_id])

# Print clustered sentences
for i, cluster in clustered_sentences.items():
    print(f"Cluster {i}:")
    print(cluster)
    print()

Cluster 0:
['Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers.']

Cluster 1:
['It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports.']

Cluster 2:
['It is an exceptional email service that is clean, fast, and offers a user-friendly interface.']

Cluster 3:
['Zoho mail provides a reliable and secure way for businesses to communicate internally and externally.']

Cluster 4:
['With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub.']

Cluster 5:
['Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely.']

Cluster 6:
['It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration.']

Cluster 
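
Every sentence landing in its own cluster is consistent with how this setup behaves: `min_samples=1` makes each point a core point, and `TfidfVectorizer` L2-normalizes its rows by default, so two sentences are merged only if their vectors lie within `eps=0.8` of each other. Short sentences that share few words are usually farther apart than that. The sketch below illustrates the effect with raw word counts as a simplified stand-in for TF-IDF weights (an assumption for brevity; IDF weighting would typically shrink the contribution of shared common words further). The helper names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a simplified stand-in for TF-IDF weights."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

# Two sentences taken from the corpus
mail = "Zoho mail provides a reliable and secure way for businesses to communicate internally and externally."
docs = "Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely."

cos_sim = cosine_similarity(bag_of_words(mail), bag_of_words(docs))
distance = math.sqrt(2 - 2 * cos_sim)  # Euclidean distance after L2 normalization
print(f"cosine={cos_sim:.3f}, distance={distance:.3f}, within eps=0.8: {distance <= 0.8}")
```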

# Hierarchical Clustering

In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.cluster import AgglomerativeClustering

In [None]:
# Load a pre-trained Sentence Transformer model
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

In [None]:
corpus_text = "Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers. It provides a comprehensive suite of features that allow businesses to track sales, automate marketing activities, and create custom reports. It is an exceptional email service that is clean, fast, and offers a user-friendly interface. Zoho mail provides a reliable and secure way for businesses to communicate internally and externally. With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub. Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely. It allows businesses to create, collaborate, and edit documents in real-time, making it an ideal tool for team collaboration. Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease. It offers a range of features including invoicing, expense tracking, inventory management, and sales and purchase order management. With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses."

# Split the corpus text into individual sentences, dropping the empty fragment after the final period
sentences = [s for s in corpus_text.split('.') if s.strip()]

# Encode the sentences into numerical vectors
sentence_embeddings = model.encode(sentences)


In [None]:
# Perform hierarchical clustering with 4 clusters
# (scikit-learn >= 1.2 uses `metric`; older releases call this parameter `affinity`)
cluster = AgglomerativeClustering(n_clusters=4, metric='cosine', linkage='average')
clusters = cluster.fit_predict(sentence_embeddings)

# Create a dictionary to store the sentences in each cluster
clustered_sentences = {}
for sentence, cluster_label in zip(sentences, clusters):
    if cluster_label not in clustered_sentences:
        clustered_sentences[cluster_label] = []
    clustered_sentences[cluster_label].append(sentence)

# Print the clusters
for cluster_label, cluster_sentences in clustered_sentences.items():
    print(f'Cluster {cluster_label + 1}:')
    for sentence in cluster_sentences:
        print(f'- {sentence}')
    print()

Cluster 1:
- Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers
-  It is an exceptional email service that is clean, fast, and offers a user-friendly interface
-  Zoho mail provides a reliable and secure way for businesses to communicate internally and externally
-  With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub
-  Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely
-  Zoho Books is a comprehensive accounting solution designed to help businesses manage their finances and cash flow with ease
-  With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses

Cluster 3:
-  It provides a comprehensive sui
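
Plain `str.split('.')` leaves a leading space on every fragment after the first and produces an empty fragment after the final period, which gets embedded and clustered like a real sentence unless it is filtered out. Where NLTK is not available, a regex-based splitter is one possible alternative. The sketch below is a rough approximation (it will still mis-split abbreviations such as "e.g."), and `split_sentences` is a hypothetical helper name:

```python
import re

def split_sentences(text):
    """Split after ., ! or ? followed by whitespace; drop empty fragments."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

demo = "Zoho Docs stores files. It supports real-time editing."
print(demo.split("."))        # ['Zoho Docs stores files', ' It supports real-time editing', '']
print(split_sentences(demo))  # ['Zoho Docs stores files.', 'It supports real-time editing.']
```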



# K-Means and TF-IDF Vectorizer (most accurate result so far)


In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Define the sentences related to Zoho products
sentences = [
    "Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers",
    "Zoho Mail is an exceptional email service that is clean, fast, and offers a user-friendly interface",
    "With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub",
    "Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely",
    "With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses"
]

# Convert sentences to TF-IDF vectors
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)

# Perform KMeans clustering with 4 clusters (separating Zoho CRM and Zoho Books)
kmeans = KMeans(n_clusters=4)
clusters = kmeans.fit_predict(X)

# Create a dictionary to store the sentences in each cluster
clustered_sentences = {}
for sentence, cluster_label in zip(sentences, clusters):
    if cluster_label not in clustered_sentences:
        clustered_sentences[cluster_label] = []
    clustered_sentences[cluster_label].append(sentence)

# Print the clusters
for cluster_label, cluster_sentences in clustered_sentences.items():
    print(f'Cluster {cluster_label}:')
    for sentence in cluster_sentences:
        print(f'- {sentence}')
    print()

Cluster 1:
- Zoho CRM is a powerful tool that revolutionizes the way businesses manage their relationships with customers

Cluster 0:
- Zoho Mail is an exceptional email service that is clean, fast, and offers a user-friendly interface
- With its powerful features such as instant chat, calendar, tasks, notes, and bookmarks, Zoho Mail goes beyond being just an email service and serves as a comprehensive communication hub

Cluster 2:
- Zoho Docs is a comprehensive online document management system used for storing, sharing, and managing files securely

Cluster 3:
- With its intuitive interface and powerful capabilities, Zoho Books simplifies the complex process of managing business finances, making it an ideal choice for small to medium-sized businesses
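
To put a number on "accurate", one option is cluster purity: the fraction of sentences that belong to the majority product of their cluster. The sketch below applies it to the output shown above. The `products` list is an assumption read off the five sentences by hand, and `purity` is a hypothetical helper, not something the notebook defines.

```python
from collections import Counter, defaultdict

def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = defaultdict(list)
    for truth, cluster in zip(true_labels, cluster_labels):
        by_cluster[cluster].append(truth)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in by_cluster.values())
    return majority_total / len(true_labels)

# Product each sentence describes, in the order of the `sentences` list (assigned by hand)
products = ["CRM", "Mail", "Mail", "Docs", "Books"]
# Cluster labels as they appear in the printed output above
shown_labels = [1, 0, 0, 2, 3]

print(purity(products, shown_labels))  # 1.0
```

A purity of 1.0 here means no cluster mixes products, though with only five hand-picked sentences and four clusters the score says little about how the approach behaves on the full corpus.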

