# Clustering Conversations: Discovering User Query Patterns

> **Series Overview**: This is the first notebook in a three-part series on systematically analyzing and improving RAG systems. We'll move from raw user queries to production-ready classifiers that enable data-driven improvements.

> **Prerequisites**: Install dependencies from `pyproject.toml` and set your `GOOGLE_API_KEY` for Gemini (required by Kura for summarization).

## Why This Matters

In large-scale RAG applications, you'll encounter thousands of user queries. Manually reviewing each is impossible, and simple keyword counting misses deeper patterns. **Topic modeling helps you systematically identify patterns in user queries**, giving you insights into what users are asking and how well your system serves them.

Topic modeling serves as the foundation for transforming raw user interactions into actionable insights by:

1. **Revealing clusters** of similar queries that might need specialized handling
2. **Providing evidence** for prioritizing improvements based on actual usage patterns
3. **Highlighting gaps** where your retrieval might be underperforming
4. **Creating a foundation** for building automated classification systems

While topic modeling isn't objective ground truth, it's an invaluable discovery tool that helps you understand where to focus limited engineering resources based on real user behavior rather than intuition.

## What You'll Learn

In this first notebook, you'll discover how to:

1. **Prepare Query Data for Analysis**
   - Format JSON data into Kura conversation objects
   - Structure query-document pairs with proper metadata
   - Set up data for effective clustering

2. **Run Hierarchical Topic Clustering**
   - Use Kura's LLM-enhanced clustering approach
   - Generate meaningful summaries of conversation groups
   - Visualize the topic hierarchies that emerge

3. **Analyze and Interpret Results**
   - Examine cluster themes and distribution patterns
   - Identify high-impact areas for system improvements
   - Recognize limitations in default summarization

## What You'll Discover

**By the end of this notebook, you'll uncover that just three major topics account for over two-thirds of all user queries**, with artifact management appearing as a dominant theme across 61% of conversations. However, you'll also discover that default summaries are too generic, missing crucial details about specific W&B features—a limitation that motivates the custom summarization approach in the next notebook.

## What Makes Kura Different

Traditional topic modeling approaches like BERTopic or LDA rely purely on embeddings to group similar documents. **Kura enhances this process by leveraging LLMs to**:

1. **Generate Meaningful Summaries** - Create human-readable descriptions rather than just numeric vectors
2. **Extract Key Intents** - Identify specific user goals beyond surface-level keywords
3. **Build Topic Hierarchies** - Create multi-level trees showing relationships between themes

By using LLMs for summarization before clustering, Kura produces more intuitive, actionable results than pure embedding-based approaches, setting the foundation for the systematic RAG improvement framework you'll build across this series.

# Understanding Our Dataset

## Our Dataset

We're working with 560 real user queries from the Weights & Biases documentation, each manually matched to a relevant document. This dataset gives us direct insight into how users interact with ML experiment tracking documentation.

By examining these query-document pairs, we gain valuable insights into:

* What information users actively seek and how they phrase questions
* Which documentation sections are most needed or confusing
* How different query patterns cluster together, revealing common user challenges

Topic modeling helps us identify semantically similar conversations, allowing us to group these queries into meaningful clusters that reveal broader patterns of user needs and pain points.

For anyone building RAG systems, this kind of dataset is gold. It helps you understand user intent, find gaps in your documentation, and prioritize improvements based on actual usage patterns rather than guesswork.

Without systematic analysis of such data, it's nearly impossible to identify patterns in how users interact with your system. Topic modeling gives us a data-driven way to improve retrieval strategies and function calling by understanding the most common user needs.

## Preparing Our Data

Before using Kura for topic modeling, we need to prepare our dataset. Each entry contains:
- `query`: The user's original question
- `matching_document`: The relevant document manually matched to this query
- `query_id`: Unique identifier for the query
- `matching_document_document_id`: ID of the matching document

Let's examine what this data looks like:

In [1]:
import json

# Load the query-document pairs from disk
with open("./data/conversations.json") as f:
    data = json.load(f)

data[0]

{'query_id': '5e878c76-25c1-4bad-8cae-6a40ca4c8138',
 'query': 'experiment tracking',
 'matching_document': '## Track Experiments\n### How it works\nTrack a machine learning experiment with a few lines of code:\n1. Create a W&B run.\n2. Store a dictionary of hyperparameters, such as learning rate or model type, into your configuration (`wandb.config`).\n3. Log metrics (`wandb.log()`) over time in a training loop, such as accuracy and loss.\n4. Save outputs of a run, like the model weights or a table of predictions.  \n\nThe proceeding pseudocode demonstrates a common W&B Experiment tracking workflow:  \n\n```python showLineNumbers\n\n# 1. Start a W&B Run\n\nwandb.init(entity="", project="my-project-name")\n\n# 2. Save mode inputs and hyperparameters\n\nwandb.config.learning\\_rate = 0.01\n\n# Import model and data\n\nmodel, dataloader = get\\_model(), get\\_data()\n\n# Model training code goes here\n\n# 3. Log metrics over time to visualize performance\n\nwandb.log({"loss": loss})\n\n#
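Before transforming the records, it can be worth checking that each one carries the four fields listed above. The following is a minimal sketch of such a check; `validate_entries` and the sample records are hypothetical and are not part of the dataset or of Kura.

```python
REQUIRED_FIELDS = (
    "query",
    "matching_document",
    "query_id",
    "matching_document_document_id",
)


def validate_entries(entries):
    """Return (index, missing_fields) pairs for entries with absent or empty fields."""
    problems = []
    for index, entry in enumerate(entries):
        missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if missing:
            problems.append((index, missing))
    return problems


# Made-up sample records: the second one has an empty query and no document ID.
sample = [
    {
        "query_id": "q1",
        "query": "experiment tracking",
        "matching_document": "## Track Experiments",
        "matching_document_document_id": "d1",
    },
    {"query_id": "q2", "query": "", "matching_document": "## Artifacts"},
]

problems = validate_entries(sample)
print(problems)
```

Running the same helper over `data` would surface incomplete records before they reach the summarization step.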

This raw format isn't immediately useful for topic modeling. We need to transform it into something that Kura can process effectively. 

To do so, we'll convert it to a `Conversation` class which `Kura` exposes. This format allows Kura to:

1. Process the conversation flow (even though we only have single queries in this example)
2. Generate summaries of each conversation
3. Embed and cluster conversations based on content and structure

We'll create a function to convert each query-document pair into a Kura Conversation object with a single user Message that combines both the query and retrieved document.

In [2]:
from kura.types import Message, Conversation
from datetime import datetime
from rich import print

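def process_query_obj(obj: dict) -> Conversation:
    """Wrap one query-document pair in a single-message Kura Conversation."""
    return Conversation(
        chat_id=obj["query_id"],
        created_at=datetime.now(),
        messages=[
            Message(
                created_at=datetime.now(),
                role="user",
                # Combine the query and its matched document into one user message
                content=f"""
User Query: {obj['query']}
Retrieved Information : {obj['matching_document']}
""",
            )
        ],
        # Keep the query ID so results can be traced back to the raw record
        metadata={"query_id": obj["query_id"]},
    )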


print(process_query_obj(data[0]))



Each individual `Conversation` object exposes a metadata field which allows us to provide additional context that can be valuable for analysis.

Here, we add the query ID to the metadata field so that it is preserved for downstream processing. By properly structuring our data and enriching it with metadata, we're setting a strong foundation for the topic modeling work ahead.

This careful preparation will pay off when we analyze the results and turn insights into actionable improvements.
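Because `chat_id` and the metadata both carry the query ID, a downstream step can trace any conversation ID back to its original record. Here is a minimal sketch using plain dictionaries as stand-ins; the sample records and the `cluster_chat_ids` list are made up for illustration.

```python
# Stand-in records shaped like the dataset entries (values are made up).
records = [
    {"query_id": "q1", "query": "experiment tracking"},
    {"query_id": "q2", "query": "how do I version a model artifact"},
]

# Index the raw records by query ID for constant-time lookups.
records_by_id = {record["query_id"]: record for record in records}

# Hypothetical downstream result: the conversation IDs grouped into one cluster.
cluster_chat_ids = ["q2", "q1"]

# Recover the original queries for every member of the cluster.
member_queries = [records_by_id[chat_id]["query"] for chat_id in cluster_chat_ids]
print(member_queries)
```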

## Running the Clustering Process

Now that we've converted our raw data into Kura's Conversation format, we're ready to run the clustering process. This is where we discover patterns across hundreds of conversations without needing to manually review each one.

We'll use Kura's built-in clustering capabilities to group similar conversations together, identify common themes, and build a hierarchical organization of topics. The clustering algorithm combines embedding similarity with LLM-powered summarization to create meaningful, interpretable results.

### The Clustering Pipeline

The hierarchical clustering process follows a systematic approach:

1. Summarization: First, each conversation is summarized by an LLM to capture its essence while removing sensitive details
2. Embedding: These summaries are converted into vector embeddings that capture their semantic meaning
3. Base Clustering: Similar conversations are grouped into small, initial clusters
4. Hierarchical Merging: Similar clusters are progressively combined into broader categories
5. Naming and Description: Each cluster receives a descriptive name and explanation generated by an LLM

By starting with many detailed clusters and gradually merging them into more general topics, we preserve fine-grained patterns while keeping the final set small enough for humans to review.
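To make the embedding, base clustering, and merging steps concrete, here is a toy sketch using only the standard library. It is not Kura's implementation: the bag-of-words `embed` function stands in for a real embedding model, the greedy centroid merge stands in for whatever clustering Kura applies internally, and the LLM summarization and naming steps are skipped entirely. All function names and sample summaries are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy bag-of-words embedding: one count per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def merge_until(clusters, vectors, target):
    """Repeatedly merge the two most similar clusters until `target` remain."""
    clusters = [list(members) for members in clusters]
    while len(clusters) > target:
        best_pair, best_score = None, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine(
                    centroid([vectors[k] for k in clusters[i]]),
                    centroid([vectors[k] for k in clusters[j]]),
                )
                if score > best_score:
                    best_pair, best_score = (i, j), score
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up stand-ins for LLM-generated conversation summaries.
summaries = [
    "log artifact versions",
    "download artifact versions",
    "configure sweep parameters",
    "tune sweep parameters",
]
vocabulary = sorted({word for summary in summaries for word in summary.split()})
vectors = [embed(summary, vocabulary) for summary in summaries]

# Start with one cluster per summary, then merge down to two broader topics.
base_clusters = [[index] for index in range(len(summaries))]
top_level = merge_until(base_clusters, vectors, target=2)
print(top_level)  # prints [[0, 1], [2, 3]]
```

The two artifact summaries and the two sweep summaries end up together, which mirrors on a small scale how detailed clusters are folded into broader topics.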

In [10]:
from kura import Kura

kura = Kura()
conversations = [process_query_obj(obj) for obj in data]
clusters = await kura.cluster_conversations(conversations)

Summarising 560 conversations: 100%|██████████| 560/560 [00:15<00:00, 36.78it/s]
Embedding Summaries: 100%|██████████| 560/560 [00:06<00:00, 90.22it/s] 
Generating Base Clusters: 100%|██████████| 56/56 [00:02<00:00, 21.47it/s]


Starting with 56 clusters


Embedding Clusters: 100%|██████████| 56/56 [00:01<00:00, 35.68it/s]
Generating Meta Clusters: 100%|██████████| 5/5 [00:06<00:00,  1.25s/it]


Reduced to 45 clusters


Embedding Clusters: 100%|██████████| 45/45 [00:01<00:00, 36.75it/s]
Generating Meta Clusters: 100%|██████████| 4/4 [00:05<00:00,  1.41s/it]


Reduced to 36 clusters


Embedding Clusters: 100%|██████████| 36/36 [00:01<00:00, 33.56it/s]
Generating Meta Clusters: 100%|██████████| 3/3 [00:04<00:00,  1.60s/it]


Reduced to 19 clusters


Embedding Clusters: 100%|██████████| 19/19 [00:01<00:00, 17.48it/s]
Generating Meta Clusters: 100%|██████████| 2/2 [00:03<00:00,  1.66s/it]


Reduced to 9 clusters




In the output, we can see the consolidation process happening in real time. Kura starts with 56 base clusters, then merges them over four rounds (56 → 45 → 36 → 19 → 9) until we reach 9 final top-level clusters. Each merge combines similar topics while preserving the essential distinctions between different conversation types.

Now, let's examine these top-level clusters to understand the main themes in our data. 

By looking at the cluster names, descriptions, and sizes, we can quickly identify what users are discussing most frequently and how these topics relate to each other.

In [34]:
# Get top-level clusters (those without parents)
parent_clusters = [cluster for cluster in clusters if cluster.parent_id is None]

# Format each cluster's info with name, description and number of chats
formatted_clusters = []
for cluster in parent_clusters:
    cluster_info = (
        f"[bold]{cluster.name}[/bold] : {cluster.description} : {len(cluster.chat_ids)}"
    )
    formatted_clusters.append(cluster_info)

# Join with newlines and print
print("\n\n".join(formatted_clusters))
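When reviewing the hierarchy, it can also help to rank the top-level clusters by size and list the sub-clusters beneath each one. The sketch below uses a hypothetical `ClusterStub` dataclass in place of Kura's cluster objects. It mirrors the `name`, `chat_ids`, and `parent_id` attributes used above, while the `id` field is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterStub:
    """Stand-in for a cluster; `id` is an assumed attribute, not confirmed here."""

    id: str
    name: str
    chat_ids: List[str] = field(default_factory=list)
    parent_id: Optional[str] = None


def top_level_by_size(clusters):
    """Return clusters without a parent, largest first."""
    roots = [cluster for cluster in clusters if cluster.parent_id is None]
    return sorted(roots, key=lambda cluster: len(cluster.chat_ids), reverse=True)


def children_of(clusters, parent):
    """Return the direct children of `parent`."""
    return [cluster for cluster in clusters if cluster.parent_id == parent.id]


# Made-up clusters: two top-level topics, one of which has two children.
stub_clusters = [
    ClusterStub("b", "Sweeps", ["q4"]),
    ClusterStub("a", "Artifacts", ["q1", "q2", "q3"]),
    ClusterStub("c", "Artifact versioning", ["q1", "q2"], parent_id="a"),
    ClusterStub("d", "Artifact downloads", ["q3"], parent_id="a"),
]

for root in top_level_by_size(stub_clusters):
    print(f"{root.name} ({len(root.chat_ids)} chats)")
    for child in children_of(stub_clusters, root):
        print(f"  - {child.name} ({len(child.chat_ids)} chats)")
```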

## Analysing Our Results

### Understanding Our Top-Level Clusters

Looking at the nine top-level clusters generated by Kura, we can identify clear patterns in how users are interacting with the documentation.

Three major clusters account for 67% of all queries:
1. Experiment Tracking and Artifact Management (143 conversations)
2. Tool Integration and Documentation (140 conversations)
3. Core Functionality Usage (93 conversations)

What's particularly notable is that artifact management appears as a significant theme across multiple clusters. Three clusters specifically focus on managing, creating, and versioning artifacts, totaling 342 conversations (61% of all queries). 

This suggests that users are consistently trying to figure out how to properly track and organize the results of their experiments.

This clustering suggests that improving documentation and features around artifact management would address the majority of user needs. By focusing on how users track experiment results and manage artifacts across their workflow, we could significantly improve the user experience while addressing the most common pain points revealed in these clusters.
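The percentages quoted in this section follow directly from the cluster sizes. A short arithmetic check, using only the counts reported above (the variable names are illustrative):

```python
total_queries = 560

# Sizes of the three largest top-level clusters.
top_three_sizes = [143, 140, 93]

# Conversations in the three artifact-focused clusters.
artifact_related = 342

top_three_share = sum(top_three_sizes) / total_queries
artifact_share = artifact_related / total_queries

print(f"Top three clusters: {sum(top_three_sizes)} of {total_queries} ({top_three_share:.0%})")
print(f"Artifact-related: {artifact_related} of {total_queries} ({artifact_share:.0%})")
```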

### Analysing Our Summaries

Let's now examine some of the summaries that Kura generated for our individual query-document pairs.

To do so, we'll load the conversations we started with and find the corresponding summary for each. This allows us to evaluate how well each summary represents its conversation.

In [37]:
from kura.types import ConversationSummary

with open(kura.summary_checkpoint_name) as f:
    summaries = [ConversationSummary(**json.loads(item)) for item in f.readlines()]

with open(kura.conversation_checkpoint_name) as f:
    conversations = [Conversation(**item) for item in json.loads(f.read())]

id_to_conversation = {
    conversation.chat_id: conversation
    for conversation in conversations
}

for i in range(3):
    print(summaries[i].summary)
    print(id_to_conversation[summaries[i].chat_id].messages[0].content)
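One rough way to gauge how specific a summary is would be to check whether it names any W&B feature at all. The sketch below assumes a small hand-picked term list and a hypothetical `mentions_feature` helper; the two example strings are the vague and precise summaries quoted in this notebook's conclusion. Substring matching is crude, so treat the result as a prompt for manual review rather than a measurement.

```python
# Hand-picked feature stems (an assumption); extend to match your own product.
FEATURE_TERMS = ("artifact", "config", "report")


def mentions_feature(summary):
    """Return True if the summary contains any of the feature stems."""
    text = summary.lower()
    return any(term in text for term in FEATURE_TERMS)


example_summaries = [
    "user seeks information about tracking",
    "user is managing W&B Artifacts for model versioning",
]

# Collect summaries that never name a feature, as candidates for closer review.
generic = [s for s in example_summaries if not mentions_feature(s)]
print(f"{len(generic)} of {len(example_summaries)} summaries name no feature")
```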

## Conclusion

### What You Learned

In this notebook, you discovered how to transform raw user queries into actionable insights for RAG system improvements. You learned to:

- **Prepare query data for Kura** by formatting JSON data into Conversation objects with proper metadata
- **Run hierarchical clustering** using Kura's built-in capabilities to group similar conversations
- **Analyze clustering results** to identify the most common user query patterns and pain points

### What We Accomplished

By leveraging Kura's clustering capabilities, we organized 560 user queries into nine meaningful clusters that revealed clear patterns in how users interact with Weights & Biases documentation. The analysis showed that the three largest clusters (Experiment Tracking and Artifact Management, Tool Integration and Documentation, and Core Functionality Usage) account for 67% of all queries, with artifact management appearing as a significant theme across multiple clusters (61% of conversations).

However, we also identified critical limitations in the default summarization approach. Our generated summaries lacked specificity about the tools users wanted to use and sometimes included irrelevant context from retrieved documents. For example, summaries described queries as "user seeks information about tracking" rather than capturing the specific W&B features involved.

### Next: Better Summaries

While our clustering revealed valuable high-level patterns, the generic summaries limit our ability to understand specific user needs. In the next notebook, "Better Summaries", we'll address this limitation by building a custom summarization model that:

- **Identifies specific W&B features** (Artifacts, Configs, Reports) mentioned in each query
- **Captures precise user intent** rather than generic descriptions  
- **Creates domain-specific summaries** tailored to W&B terminology and workflows

By replacing vague summaries like "user seeks information about tracking" with precise descriptions like "user is managing W&B Artifacts for model versioning", we'll create clusters that better reflect real user needs and provide more targeted, actionable insights for system improvements.