
# Designing an LLM Pipeline for High-Risk User Queries

# Introduction

Large Language Models (LLMs) are widely used in real-world applications, but not all user queries should be answered directly. Queries related to self-harm, illegal activities, medical advice, or abusive content can lead to harmful or unsafe outputs if handled naively. The key problem addressed in this case study is:

> **How can an LLM system safely handle high-risk user queries while maintaining a good user experience?**

A single LLM response strategy is insufficient for real-world usage where different intents require different levels of control and intervention.

In this notebook, we build an LLM pipeline that:
- Classifies user intent to detect high-risk queries
- Applies safety checks before response generation
- Routes queries through appropriate response strategies such as direct answers, safe refusals, or redirections

# LET'S DO IT!!!

## LLM Pipeline Overview

Instead of sending every user query directly to an LLM, we use a multi-step pipeline:

1. Detect user intent and risk level
2. Apply safety guardrails
3. Route the query to an appropriate response strategy

## Dataset Overview
We use a dataset of 500 anonymized Reddit posts related to mental health discussions.
Each row represents a single user post, treated as a simulated user input to an LLM system.

In [48]:
import pandas as pd
import numpy as np
df = pd.read_csv("/kaggle/input/reddit-dataset/500_anonymized_Reddit_users_posts_labels - 500_anonymized_Reddit_users_posts_labels.csv")
df.head(2)

Unnamed: 0,User,Post,Label
0,user-0,"['Its not a viable option, and youll be leavin...",Supportive
1,user-1,['It can be hard to appreciate the notion that...,Ideation
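
The `Post` values shown above start with `['`, which suggests each cell holds the string form of a Python list of messages rather than plain text. Assuming that is the case, a minimal sketch for turning one cell back into a list could look like this. The helper name `parse_post` and the example string are hypothetical and not part of the original notebook.

```python
import ast


def parse_post(raw: str) -> list[str]:
    """Parse a stringified Python list of messages into a real list.

    Assumption: the cell looks like "['first message', 'second message']".
    If the text is not a valid Python literal, it is returned unchanged
    as a single-item list.
    """
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return [raw]
    if isinstance(value, list):
        return [str(item) for item in value]
    return [str(value)]


# Made-up example, not taken from the dataset
example = "['first message', 'second message']"
messages = parse_post(example)
print(len(messages), messages[0])
```

With pandas, the same helper could be applied column-wide via `df["Post"].apply(parse_post)`, which would let later checks run per message instead of on the raw string.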


In [49]:
df.describe()

Unnamed: 0,User,Post,Label
count,500,500,500
unique,500,500,5
top,user-0,"['Its not a viable option, and youll be leavin...",Ideation
freq,1,1,171


## Label Distribution

The dataset contains five intent labels, reflecting varying levels of risk.
These labels will be mapped to safety-aware risk categories in the pipeline.

In [50]:
df["Label"].value_counts()

Label
Ideation      171
Supportive    108
Indicator      99
Behavior       77
Attempt        45
Name: count, dtype: int64
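
To make the class imbalance easier to read, the counts above can be turned into percentage shares. The sketch below hard-codes the counts from the output so it runs on its own. In the notebook itself, `df["Label"].value_counts(normalize=True)` should give the same proportions.

```python
# Counts copied from the value_counts() output above
counts = {
    "Ideation": 171,
    "Supportive": 108,
    "Indicator": 99,
    "Behavior": 77,
    "Attempt": 45,
}

total = sum(counts.values())

# Percentage share of each label, rounded to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:<12}{pct:>5}%")
```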

## Mapping Labels to Risk Categories
To build a guardrailed LLM pipeline, we map dataset labels to system-level risk categories.
This allows the pipeline to apply different response strategies based on risk.

In [51]:
label_to_risk = {
    "Ideation": "HIGH_RISK", "Attempt": "HIGH_RISK", "Behavior": "HIGH_RISK",
    "Supportive": "SENSITIVE", "Other": "SAFE",
}
df["risk_category"] = df["Label"].map(label_to_risk)
df[["Post", "Label", "risk_category"]].head()

Unnamed: 0,Post,Label,risk_category
0,"['Its not a viable option, and youll be leavin...",Supportive,SENSITIVE
1,['It can be hard to appreciate the notion that...,Ideation,HIGH_RISK
2,"['Hi, so last night i was sitting on the ledge...",Behavior,HIGH_RISK
3,['I tried to kill my self once and failed badl...,Attempt,HIGH_RISK
4,['Hi NEM3030. What sorts of things do you enjo...,Ideation,HIGH_RISK


In [52]:
df["risk_category"].isna().sum()

99

In [53]:
df[df["risk_category"].isna()]["Label"].value_counts()

Label
Indicator    99
Name: count, dtype: int64

In [54]:
# "Indicator" has no entry in label_to_risk, so it maps to NaN; treat it as SENSITIVE
df["risk_category"] = df["Label"].map(label_to_risk)
df["risk_category"] = df["risk_category"].fillna("SENSITIVE")

In [55]:
df["risk_category"].isna().sum()

0

In [56]:
df["risk_category"].value_counts()

risk_category
HIGH_RISK    293
SENSITIVE    207
Name: count, dtype: int64
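
The `fillna` step above silently assigns `SENSITIVE` to any label missing from the mapping. One possible alternative, sketched here under the assumption that the five labels seen in this dataset are the only valid ones, is to list every label explicitly and fail loudly on anything unexpected. `LABEL_TO_RISK` and `map_labels` are hypothetical names used for illustration.

```python
# Explicit mapping for all five labels observed in the dataset.
# "Indicator" -> "SENSITIVE" mirrors the result of the fillna step above.
LABEL_TO_RISK = {
    "Ideation": "HIGH_RISK",
    "Attempt": "HIGH_RISK",
    "Behavior": "HIGH_RISK",
    "Indicator": "SENSITIVE",
    "Supportive": "SENSITIVE",
}


def map_labels(labels):
    """Map dataset labels to risk categories, rejecting unknown labels."""
    labels = list(labels)
    unknown = sorted(set(labels) - set(LABEL_TO_RISK))
    if unknown:
        raise ValueError(f"Unmapped labels: {unknown}")
    return [LABEL_TO_RISK[label] for label in labels]


print(map_labels(["Ideation", "Indicator", "Supportive"]))
```

The design choice here is that a new, unreviewed label stops the pipeline instead of inheriting a default risk level.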

In [57]:
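def contains_self_harm_signal(text: str) -> bool:
    """Return True if the text contains any self-harm keyword (case-insensitive)."""
    # Plain substring matching: "die" also matches inside words such as "diet".
    keywords = [
        "kill myself", "suicide", "end my life",
        "overdose", "hurt myself", "die"
    ]
    text = text.lower()
    return any(k in text for k in keywords)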
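
Because the check above uses substring matching, a short keyword like "die" also fires on unrelated words. A stricter variant, sketched below as one possible option and not the notebook's method, matches keywords only at word boundaries using the standard `re` module. The name `contains_self_harm_signal_strict` is hypothetical.

```python
import re

# Same keyword list as the function above
KEYWORDS = [
    "kill myself", "suicide", "end my life",
    "overdose", "hurt myself", "die",
]

# \b anchors each keyword at word boundaries, so "die" no longer matches "diet"
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def contains_self_harm_signal_strict(text: str) -> bool:
    """Return True only when a keyword appears as a whole word or phrase."""
    return PATTERN.search(text) is not None


for sample in ["My diet is going fine", "I want to die", "Suicide hotline info"]:
    print(sample, "->", contains_self_harm_signal_strict(sample))
```

The trade-off is that inflected forms such as "died" or "overdosed" are no longer caught, so in a safety setting the keyword list would likely need to be extended to cover them.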

## Guardrail Decision Logic
Once user intent is identified, the system applies guardrails to decide
whether the query can be answered directly, needs a safer response,
or must be refused. This layer prevents high-risk content from reaching the LLM unchecked.

In [58]:
def apply_guardrails(post: str, risk_category: str) -> str:
    if risk_category == "HIGH_RISK":
        if contains_self_harm_signal(post):
            return "BLOCK"
        else:
            return "SAFE_RESPONSE"

    if risk_category == "SENSITIVE":
        return "SAFE_RESPONSE"

    return "ALLOW"
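
The decision logic above can be checked with a small decision table. The sketch below repeats the two functions from this notebook so it runs on its own. The example posts are made up for illustration and do not come from the dataset.

```python
def contains_self_harm_signal(text: str) -> bool:
    # Copy of the keyword check defined earlier in the notebook
    keywords = [
        "kill myself", "suicide", "end my life",
        "overdose", "hurt myself", "die",
    ]
    text = text.lower()
    return any(k in text for k in keywords)


def apply_guardrails(post: str, risk_category: str) -> str:
    # Copy of the guardrail logic above
    if risk_category == "HIGH_RISK":
        if contains_self_harm_signal(post):
            return "BLOCK"
        return "SAFE_RESPONSE"
    if risk_category == "SENSITIVE":
        return "SAFE_RESPONSE"
    return "ALLOW"


# Hypothetical inputs covering each branch
cases = [
    ("I want to end my life", "HIGH_RISK"),
    ("Feeling low again today", "HIGH_RISK"),
    ("I want to end my life", "SENSITIVE"),
    ("Feeling low again today", "SENSITIVE"),
    ("How do I sort a list?", "SAFE"),
]

decisions = [apply_guardrails(post, risk) for post, risk in cases]
for (post, risk), decision in zip(cases, decisions):
    print(f"{risk:<10} {decision:<14} {post}")
```

One property this makes visible: the keyword check only runs for `HIGH_RISK` posts, so a `SENSITIVE` post containing an explicit self-harm phrase still receives `SAFE_RESPONSE` and not `BLOCK`.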

## Query Routing

Based on the guardrail decision, each query is routed to a different
response strategy to balance safety and usefulness.

In [59]:
def route_query(post: str, risk_category: str) -> str:
    decision = apply_guardrails(post, risk_category)

    if decision == "BLOCK":
        return high_risk_response()

    if decision == "SAFE_RESPONSE":
        return sensitive_response()

    return normal_response(post)

In [60]:
def high_risk_response():
    return (
        "I'm really sorry that you're going through something this difficult. "
        "I can't help with that request, but reaching out to a trusted person "
        "or a mental health professional could really help."
    )


def sensitive_response():
    return (
        "It sounds like you're dealing with something challenging. "
        "If you'd like, I can share general information or coping strategies."
    )


def normal_response(post: str):
    return f"LLM Response: Safely answering → {post[:100]}..."
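
In `route_query`, any decision other than `BLOCK` or `SAFE_RESPONSE` falls through to the normal response. A possible alternative, sketched here with placeholder templates and the hypothetical names `RESPONDERS` and `respond`, is a dispatch table that raises on unknown decisions, so a typo or a newly added decision cannot silently route to the unrestricted path.

```python
from typing import Callable, Dict

# Placeholder responders; the real notebook templates would go here
RESPONDERS: Dict[str, Callable[[str], str]] = {
    "BLOCK": lambda post: "<supportive refusal template>",
    "SAFE_RESPONSE": lambda post: "<sensitive response template>",
    "ALLOW": lambda post: f"LLM Response: {post[:100]}...",
}


def respond(decision: str, post: str) -> str:
    """Look up the responder for a guardrail decision, failing on unknown values."""
    try:
        handler = RESPONDERS[decision]
    except KeyError:
        raise ValueError(f"Unknown guardrail decision: {decision!r}") from None
    return handler(post)


print(respond("BLOCK", "example post"))
print(respond("ALLOW", "example post"))
```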

In [61]:
sample_df = df.sample(5)

for _, row in sample_df.iterrows():
    print("Post:", row["Post"][:120], "...")
    print("Risk Category:", row["risk_category"])
    print("Response:", route_query(row["Post"], row["risk_category"]))
    print("-" * 80)

Post: ['Im sorry youre feeling this way. Break ups are hard, no matter when they happen or how long the relationship lasted.Bu ...
Risk Category: HIGH_RISK
Response: I'm really sorry that you're going through something this difficult. I can't help with that request, but reaching out to a trusted person or a mental health professional could really help.
--------------------------------------------------------------------------------
Post: ['and if youre new to it, get help before youre too addictrd', 'Ive only really coded one thing, I made a pacman game ty ...
Risk Category: HIGH_RISK
Response: I'm really sorry that you're going through something this difficult. I can't help with that request, but reaching out to a trusted person or a mental health professional could really help.
--------------------------------------------------------------------------------
Post: ['Sorry for the late response. Im not saying itll get better - I dont know you, so itd be a worthless statement. I truly .

## Conclusion

In this case study, we designed a guardrailed LLM pipeline to safely handle high-risk user queries.
By combining label-based intent classification, conservative risk mapping, a secondary keyword check, and
routing to templated responses, the system keeps the highest-risk posts from being answered directly while preserving empathetic user responses. It illustrates how safety-first design can be implemented without relying solely on hard refusals, a pattern that real-world, high-stakes GenAI applications can build on.

---

## Limitations

This implementation relies on dataset labels and simple keyword-based heuristics rather than learned models. As a result, some posts may still be treated conservatively due to contextual ambiguity or inherited labels from conversational threads. Additionally, the response strategies are template-based and do not incorporate live LLM generation or external safety policies.

---

## Future Improvements

Future iterations of this pipeline could include:
- LLM-based or trained intent classifiers for improved accuracy
- Policy engines for fine-grained safety rules
- Human-in-the-loop escalation for borderline cases
- Logging and monitoring for auditability and continuous improvement
- Integration with RAG systems for safe, grounded responses

These extensions would further improve reliability and scalability in production environments.
