# Better Summaries: Customizing for Domain-Specific Clustering

## Why This Matters

The default summarization approach we used in our initial clustering produced overly generic summaries that failed to capture the essence of Weights & Biases-specific queries. When working with specialized domains like machine learning experiment tracking, these generic summaries miss critical details that would enable more effective query segmentation.

Custom summarization allows us to transform vague descriptions like "The user's overall request for the assistant is to provide information about experiment tracking using a specific tool" into precise, actionable insights like "The user is using Weights and Biases's experiment tracking features to monitor model training (metrics, configs, artifacts)." This precision is critical for building representative clusters that truly reflect how users interact with the platform.

Domain-specific summaries help us by:
1. Capturing the exact W&B features users are working with
2. Identifying specific user goals and pain points
3. Revealing underlying patterns in how users approach documentation
4. Creating a foundation for more targeted system improvements

## What You'll Learn

In this section, you'll discover how to:

1. **Create a Custom Summary Model**
   - Define a specialized summarization approach for W&B queries
   - Structure prompts that extract domain-specific information
   - Implement length constraints for concise, focused summaries

2. **Generate Better Summaries**
   - Compare default vs. custom summarization approaches
   - See how domain knowledge improves summary quality
   - Create summaries that highlight specific W&B features

3. **Build More Representative Clusters**
   - Use improved summaries as the foundation for clustering
   - Configure clustering parameters for optimal results
   - Visualize how better summaries lead to more cohesive clusters

By customizing our summarization approach to our specific domain, we'll create clusters that better reflect real user needs and provide actionable insights for improving our RAG system.

## Creating a Custom Summary Model

To address the limitations we identified in our default summaries, we'll now implement our own custom summary model specific to Weights & Biases queries. By replacing the generic summarization approach with a domain-tailored solution, we can generate summaries that precisely capture the tools, features, and goals relevant to W&B users.

The `WnBSummaryModel` class we'll create extends Kura's base `SummaryModel` with a specialized prompt that instructs the model to:

1. Identify specific W&B features mentioned in the query (e.g., Artifacts, Configs, Reports)
2. Clearly state the problem the user is trying to solve
3. Format responses concisely (25 words or less) to ensure summaries remain focused

This approach generates summaries that are not only more informative but also more consistent, making them ideal building blocks for meaningful clustering. Let's implement our custom model and see how it transforms our understanding of user query patterns.
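
A prompt can request a 25-word limit, but a language model may not always respect it. As a minimal sketch, the check below counts whitespace-separated words so over-long summaries can be flagged after generation. The helper names `word_count` and `within_limit` are illustrative and not part of Kura; the two example strings are the ones quoted in the "Why This Matters" section.

```python
def word_count(summary: str) -> int:
    """Count whitespace-separated words in a summary."""
    return len(summary.split())


def within_limit(summary: str, max_words: int = 25) -> bool:
    """Return True when the summary respects the prompt's word budget."""
    return word_count(summary) <= max_words


# Example strings quoted from the "Why This Matters" section.
generic = (
    "The user's overall request for the assistant is to provide information "
    "about experiment tracking using a specific tool"
)
targeted = (
    "The user is using Weights and Biases's experiment tracking features to "
    "monitor model training (metrics, configs, artifacts)."
)

for text in (generic, targeted):
    print(word_count(text), within_limit(text))
```

Counting on whitespace is a rough approximation; a stricter tokenizer could be swapped in if punctuation handling matters.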

### Loading in Conversation

Let's start by loading our conversations and parsing them into a list of `Conversation` objects that `Kura` can work with.

In [1]:
from lib.conversation import process_query_obj
import json

with open("./data/conversations.json") as f:
    data = json.load(f)

conversations = [process_query_obj(obj) for obj in data]



Let's now see what our default summaries look like.

In [2]:
from kura.summarisation import SummaryModel

summaries = await SummaryModel().summarise(conversations[:2])
for summary in summaries:
    print(summary)


Summarising 2 conversations: 100%|██████████| 2/2 [00:01<00:00,  1.00it/s]

chat_id='5e878c76-25c1-4bad-8cae-6a40ca4c8138' summary="The user's overall request for the assistant is to explain how to track machine learning experiments using a specific library by creating a run, storing hyperparameters, logging metrics, and saving outputs of the run as demonstrated in the pseudocode provided ." metadata={'conversation_turns': 1, 'query_id': '5e878c76-25c1-4bad-8cae-6a40ca4c8138'}
chat_id='d7b77e8a-e86c-4953-bc9f-672618cdb751' summary="The user's overall request for the assistant is to summarize information about Bayesian optimization, a hyperparameter tuning technique, and its implementation in Python using libraries like bayes_opt." metadata={'conversation_turns': 1, 'query_id': 'd7b77e8a-e86c-4953-bc9f-672618cdb751'}





Looking at these default summaries, we can identify several key limitations that prevent them from being truly useful for clustering W&B-specific queries:

**Problems with Default Summaries**

1. Lack of Specificity: The first summary refers to "a specific library" rather than explicitly naming Weights & Biases, missing the opportunity to highlight the domain context.

2. Missing Feature Details: Neither summary identifies which specific W&B features the users are interested in (experiment tracking, Bayesian optimization for hyperparameter tuning), which would be crucial for meaningful clustering.

These generic summaries would lead to clusters based primarily on query structure ("users asking for information") rather than meaningful W&B feature categories or user goals. 

By defining our own summarisation model, we can address these limitations and cluster our user queries based on the specific problems they face and the features they are trying to use.
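
To make the "query structure" problem concrete, here is a rough sketch that measures how much of the overlap between the two default summaries comes from their shared opening phrase. It uses Jaccard similarity over word sets as a simple lexical proxy. This is an assumption for illustration only: Kura clusters on embeddings, not on word overlap, so the numbers indicate the trend rather than what the clustering step computes.

```python
import re

# Boilerplate opening shared by both default summaries shown earlier.
PREFIX = "The user's overall request for the assistant is to "

summary_a = PREFIX + (
    "explain how to track machine learning experiments using a specific "
    "library by creating a run, storing hyperparameters, logging metrics, "
    "and saving outputs of the run as demonstrated in the pseudocode provided."
)
summary_b = PREFIX + (
    "summarize information about Bayesian optimization, a hyperparameter "
    "tuning technique, and its implementation in Python using libraries "
    "like bayes_opt."
)


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


with_prefix = jaccard(summary_a, summary_b)
without_prefix = jaccard(
    summary_a.removeprefix(PREFIX), summary_b.removeprefix(PREFIX)
)

print(f"overlap with boilerplate:    {with_prefix:.2f}")
print(f"overlap without boilerplate: {without_prefix:.2f}")
```

The two queries are about different topics, yet much of their lexical overlap disappears once the boilerplate opening is removed.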

### Defining Our New Summary Model

Let's now define a new `WnBSummaryModel` which will help address the shortcomings of the default summarisation model.

We'll do so by overriding the `summarise_conversation` method so that our summaries become more precise and feature-focused. This lets us better reflect how users interact with Weights and Biases, which in turn translates into more representative clusters.

In [45]:
from kura.types import Conversation, ConversationSummary
from kura.summarisation import SummaryModel, GeneratedSummary


class WnBSummaryModel(SummaryModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    async def summarise_conversation(
        self, conversation: Conversation
    ) -> ConversationSummary:
        # Get the default client and semaphore - This is going to be the Gemini GenAI client and a semaphore limit of around 50 concurrent requests 
        client = self.clients.get("default")  # type: ignore
        sem = self.sems.get("default")  # type: ignore

        async with sem:
            resp = await client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "user",
                        "content": """
Summarize the user's issue based on their query and the retrieved information from the Weights and Biases FAQ section.

In your response:

1. Identify the specific Weights and Biases feature(s) the user is working with (e.g., Artifacts, Configs, Reports), including any features implied but not directly named.

2. Clearly state the problem they're trying to solve.

Format your response in 25 words or less following these patterns:

If the query has a clear feature and problem:
"The user is using Weights and Biases's [feature(s)] to [problem] and needs help with [specific issue]."

If the query is ambiguous (e.g., "Bayesian optimization" without context):
"The user made a query about [topic]."

Analyze both the query and retrieved documents carefully to identify the user's actual goal rather than just repeating their keywords. Here is the message context that you should refer to:
<context>
{{ context }}
</context>

Be as specific as possible in your response.
""",
                    },
                ],
                response_model=GeneratedSummary,
                context={"context": conversation.messages[0].content},
            )

            return ConversationSummary(
                chat_id=conversation.chat_id,
                summary=resp.summary,
                metadata={
                    "conversation_turns": len(conversation.messages),
                },
            )

We can now see the generated summaries by calling the `summarise` method below. We'll use the same two conversations that we summarised with the default model above.

In [46]:
summaries = await WnBSummaryModel().summarise(conversations[:2])
for summary in summaries:
    print(summary)


Summarising 2 conversations: 100%|██████████| 2/2 [00:01<00:00,  1.01it/s]
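
Since the prompt asks for one of two sentence patterns, one way to spot-check generated summaries is to classify each against those patterns. The sketch below is an illustrative assumption, not part of Kura: `classify_summary` and the regular expressions are hypothetical helpers. The sample strings are not outputs of the cell above. The first is the target example from the introduction, the second is a made-up sentence following the prompt's "ambiguous" pattern, and the third is the opening of a default-style summary.

```python
import re
from collections import Counter

# Patterns mirroring the two formats requested in the prompt.
FEATURE_PATTERN = re.compile(
    r"^The user is using Weights and Biases's .+ to .+", re.DOTALL
)
AMBIGUOUS_PATTERN = re.compile(r"^The user made a query about .+", re.DOTALL)


def classify_summary(summary: str) -> str:
    """Label a summary as 'feature', 'ambiguous', or 'off_pattern'."""
    text = summary.strip().strip('"')
    if FEATURE_PATTERN.match(text):
        return "feature"
    if AMBIGUOUS_PATTERN.match(text):
        return "ambiguous"
    return "off_pattern"


samples = [
    "The user is using Weights and Biases's experiment tracking features to "
    "monitor model training (metrics, configs, artifacts).",
    "The user made a query about Bayesian optimization.",  # illustrative only
    "The user's overall request for the assistant is to explain how to "
    "track machine learning experiments using a specific library.",
]

labels = [classify_summary(s) for s in samples]
print(Counter(labels))
```

A high share of `off_pattern` labels would suggest the prompt needs tightening before the summaries are used for clustering.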


## Clustering with Enhanced Summaries

Now that we've developed a more domain-specific summarization approach tailored to the Weights & Biases ecosystem, we can apply these improved summaries to our clustering process. 

Our custom `WnBSummaryModel` captures the specific features, workflows, and user intentions that were missing in the default summaries, providing a stronger foundation for meaningful topic discovery.

This will help us reveal patterns in feature usage, common pain points, and documentation gaps that might have been obscured in the previous notebook's analysis. Let's see this in action below.


In [42]:
from kura import Kura

kura = Kura(
    summarisation_model=WnBSummaryModel(),
    max_clusters=5,
    checkpoint_dir="./checkpoints_2"
)

clusters = await kura.cluster_conversations(conversations)

Summarising 560 conversations: 100%|██████████| 560/560 [00:15<00:00, 35.40it/s]
Embedding Summaries: 100%|██████████| 560/560 [00:05<00:00, 100.27it/s]
Generating Base Clusters: 100%|██████████| 56/56 [00:03<00:00, 15.53it/s]


Starting with 56 clusters


Embedding Clusters: 100%|██████████| 56/56 [00:01<00:00, 45.32it/s]
Generating Meta Clusters: 100%|██████████| 5/5 [00:06<00:00,  1.27s/it]


Reduced to 27 clusters


Embedding Clusters: 100%|██████████| 27/27 [00:00<00:00, 27.52it/s]
Generating Meta Clusters: 100%|██████████| 3/3 [00:05<00:00,  1.87s/it]


Reduced to 22 clusters


Embedding Clusters: 100%|██████████| 22/22 [00:01<00:00, 20.16it/s]
Generating Meta Clusters: 100%|██████████| 2/2 [00:04<00:00,  2.46s/it]


Reduced to 11 clusters


Embedding Clusters: 100%|██████████| 11/11 [00:00<00:00, 23.41it/s]
Generating Meta Clusters: 100%|██████████| 1/1 [00:03<00:00,  3.56s/it]


Reduced to 6 clusters


Embedding Clusters: 100%|██████████| 6/6 [00:01<00:00,  4.37it/s]
Generating Meta Clusters: 100%|██████████| 1/1 [00:03<00:00,  3.98s/it]


Reduced to 6 clusters


Embedding Clusters: 100%|██████████| 6/6 [00:00<00:00, 13.74it/s]
Generating Meta Clusters: 100%|██████████| 1/1 [00:03<00:00,  3.33s/it]


Reduced to 3 clusters
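
The log above records the cluster count after each reduction pass. As a small sketch, the snippet below extracts those counts from the log text so the trajectory is easy to read. The run stops once the count is at or below `max_clusters=5`, which suggests, though the log alone does not confirm it, that reduction repeats until that threshold is met.

```python
import re

# Status lines copied from the clustering output above.
log = """\
Starting with 56 clusters
Reduced to 27 clusters
Reduced to 22 clusters
Reduced to 11 clusters
Reduced to 6 clusters
Reduced to 6 clusters
Reduced to 3 clusters
"""

max_clusters = 5  # value passed to Kura in the cell above

# Pull the integer out of every "Starting with N" / "Reduced to N" line.
counts = [
    int(n)
    for n in re.findall(r"(?:Starting with|Reduced to) (\d+) clusters", log)
]

print(" -> ".join(str(c) for c in counts))
print("final count within max_clusters:", counts[-1] <= max_clusters)
```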




In [54]:
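# The [bold] tags below are Rich console markup; this assumes `rich` is installed
# so that its print renders them instead of showing the tags literally.
from rich import print

# Get top-level clusters (those without parents)
parent_clusters = [cluster for cluster in clusters if cluster.parent_id is None]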

# Format each cluster's info with name, description and number of chats
formatted_clusters = []
for parent in parent_clusters:
    
    # Add parent cluster info
    cluster_info = (
        f"[bold]({parent.id}) {parent.name}[/bold] : {parent.description} : {len(parent.chat_ids)}\n"
    )
    
    # Get and format child clusters
    child_clusters = [c for c in clusters if c.parent_id == parent.id]
    for child in child_clusters:
        cluster_info += f"\n  • [bold]{child.name}[/bold] : {child.description} : {len(child.chat_ids)}"
        child_child_clusters = [c for c in clusters if c.parent_id == child.id]
        for child_child in child_child_clusters:
            cluster_info += f"\n    + [bold]{child_child.name}[/bold] : {child_child.description} : {len(child_child.chat_ids)}"
        
        cluster_info += "\n\n"
    
    formatted_clusters.append(cluster_info)
    formatted_clusters.append("\n====\n")

# Join with newlines and print
print("\n\n".join(formatted_clusters))

With these new summaries, our clustering surfaces three top-level clusters:

1. **Access Controls**: People are asking how to handle and export data in Weights and Biases.
2. **Deployment**: People are asking how to manage keys and service accounts, and how to integrate their data with SageMaker and custom images.
3. **Managing and Tracking Experiment Data**: People are looking for help with managing Artifacts, generating visualisations, and integrating W&B with their PyTorch and multi-GPU runs.

This is a huge upgrade over the clusters we obtained previously and gives us much more information to work with. Exploring these clusters in more depth would likely surface further detail that we could use to train classifiers down the line.
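
The printing cell above hard-codes three levels of nesting. If the hierarchy were deeper, a recursive walk is one alternative. The sketch below uses a hypothetical `ClusterStub` dataclass that exposes only the fields the cell reads (`id`, `name`, `description`, `parent_id`, `chat_ids`); it is a stand-in, not Kura's cluster type, and the sample names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClusterStub:
    """Hypothetical stand-in with the fields used by the printing cell."""
    id: str
    name: str
    description: str
    parent_id: Optional[str] = None
    chat_ids: list[str] = field(default_factory=list)


def render_tree(clusters: list[ClusterStub]) -> str:
    """Render clusters as an indented tree of any depth."""
    # Index children by parent id once instead of rescanning the list.
    children: dict[Optional[str], list[ClusterStub]] = defaultdict(list)
    for cluster in clusters:
        children[cluster.parent_id].append(cluster)

    lines: list[str] = []

    def walk(node: ClusterStub, depth: int) -> None:
        indent = "  " * depth
        lines.append(
            f"{indent}- {node.name}: {node.description} "
            f"({len(node.chat_ids)} chats)"
        )
        for child in children.get(node.id, []):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return "\n".join(lines)


# Made-up sample data for illustration.
sample = [
    ClusterStub("p", "Parent", "top level", None, ["a", "b", "c"]),
    ClusterStub("c", "Child", "middle level", "p", ["a", "b"]),
    ClusterStub("g", "Grandchild", "leaf level", "c", ["a"]),
]
tree = render_tree(sample)
print(tree)
```

Building the parent-to-children index up front avoids scanning the full cluster list once per node.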

## Conclusion

In this notebook, we've significantly improved our final clusters by implementing domain-specific summarization for user queries. By using a summary prompt tailored to Weights and Biases terminology and features, we transformed vague descriptions into precise, actionable insights that capture the issues users faced when interacting with the platform.

Our custom `WnBSummaryModel` helped us identify three distinct user query patterns:

1. Users seeking help with access controls and data export
2. Users trying to integrate W&B with cloud services and Docker images
3. Users managing and tracking experiment data through artifacts, visualizations, and multi-GPU runs

If we had user satisfaction data, we could identify which cluster types have the lowest satisfaction scores and prioritize those areas first. For example, if users asking about artifacts management consistently report poor experiences, we could build specialized retrieval pipelines with fine-tuned embeddings just for those queries.
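
As a rough sketch of that prioritization, the snippet below averages satisfaction ratings per cluster and sorts clusters from lowest to highest. Everything in it is hypothetical: the chat ids, the 1 to 5 ratings, and the `rank_clusters_by_satisfaction` helper are invented for illustration, since this notebook has no satisfaction data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping of chat id -> top-level cluster name.
chat_to_cluster = {
    "c1": "Access Controls",
    "c2": "Access Controls",
    "c3": "Deployment",
    "c4": "Deployment",
    "c5": "Managing and Tracking Experiment Data",
    "c6": "Managing and Tracking Experiment Data",
}

# Hypothetical 1-5 satisfaction ratings per chat.
satisfaction = {"c1": 4, "c2": 5, "c3": 3, "c4": 4, "c5": 2, "c6": 1}


def rank_clusters_by_satisfaction(
    chat_to_cluster: dict[str, str], satisfaction: dict[str, int]
) -> list[tuple[str, float]]:
    """Return (cluster, mean rating) pairs, lowest mean first."""
    ratings: dict[str, list[int]] = defaultdict(list)
    for chat_id, cluster in chat_to_cluster.items():
        if chat_id in satisfaction:  # skip chats without a rating
            ratings[cluster].append(satisfaction[chat_id])
    return sorted(
        ((cluster, mean(values)) for cluster, values in ratings.items()),
        key=lambda pair: pair[1],
    )


ranked = rank_clusters_by_satisfaction(chat_to_cluster, satisfaction)
for cluster, score in ranked:
    print(f"{score:.1f}  {cluster}")
```

With real data, the clusters at the top of this list would be the first candidates for specialized retrieval work.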

In the next notebook, we'll take this a step further by using the `instructor` library to build a classifier that can automatically identify queries related to managing, creating, and versioning artifacts.