diff --git a/docs/getting-started/8-tracing/1_tracing_quickstart.ipynb b/docs/getting-started/8-tracing/1_tracing_quickstart.ipynb index bd6f1dca1..49c516864 100644 --- a/docs/getting-started/8-tracing/1_tracing_quickstart.ipynb +++ b/docs/getting-started/8-tracing/1_tracing_quickstart.ipynb @@ -15,16 +15,22 @@ "\n", "Throughout this notebook, you'll run guardrail requests in both sequential and parallel modes and observe how parallelizing rails significantly reduces end-to-end latency when multiple input or output rails run.\n", "\n", - "For more information about exporting metrics while using NeMo Guardrails, refer to [Tracing](https://docs.nvidia.com/nemo/guardrails/latest/user-guides/tracing/quick-start.html) in the Guardrails toolkit documentation.\n", + "For more information about exporting metrics while using NeMo Guardrails, refer to [Tracing](https://docs.nvidia.com/nemo/guardrails/latest/user-guides/tracing/index.html) in the Guardrails toolkit documentation.\n", "\n", "---\n", "\n", "## Prerequisites\n", "\n", - "This notebook requires the following:\n", + "This notebook can be run on any laptop or workstation and doesn't require GPUs. You'll use models hosted by NVIDIA. Before starting the notebook, you need the following:\n", "\n", - "- An NVIDIA NGC account and an NGC API key. You need to provide the key to the `NVIDIA_API_KEY` environment variable. To create a new key, go to [NGC API Key](https://org.ngc.nvidia.com/setup/api-key) in the NGC console.\n", - "- Python 3.10 or later." + "- Python 3.10 or later.\n", + "- An NVIDIA [build.nvidia.com](https://build.nvidia.com/) account. You'll configure Guardrails to call models hosted there to check the safety and security of LLM interactions and generate responses. You need to create an account, and then click the green 'Get API Key' button. 
Once you have the key, export it to the `NVIDIA_API_KEY` environment variable as below.\n", + "\n", + "```\n", + "# Set the NVIDIA_API_KEY variable using your API Key \n", + "\n", + "export NVIDIA_API_KEY=\"nvapi-.....\"\n", + "```" ] }, { @@ -59,7 +65,7 @@ }, "outputs": [], "source": [ - "!pip install pandas plotly langchain_nvidia_ai_endpoints aiofiles -q" + "!pip install nemoguardrails pandas plotly langchain_nvidia_ai_endpoints aiofiles -q" ] }, { @@ -91,12 +97,24 @@ "start_time": "2025-08-18T18:37:36.456308Z" } }, - "outputs": [], + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "Enter your NVIDIA API Key created on build.nvidia.com: ········\n" + ] + } + ], "source": [ - "# Check the NVIDIA_API_KEY environment variable is set\n", - "assert os.getenv(\n", - " \"NVIDIA_API_KEY\"\n", - "), f\"Please create a key at build.nvidia.com and set the NVIDIA_API_KEY environment variable\"" + "# Check the NVIDIA_API_KEY environment variable is set, if not prompt for it\n", + "import getpass\n", + "\n", + "api_key = os.getenv(\"NVIDIA_API_KEY\")\n", + "\n", + "if not api_key:\n", + " api_key = getpass.getpass(\"Enter your NVIDIA API Key created on build.nvidia.com: \")\n", + " os.environ[\"NVIDIA_API_KEY\"] = api_key" ] }, { @@ -113,27 +131,19 @@ "cell_type": "code", "execution_count": 6, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Deleting sequential_trace.jsonl\n", - "Deleting parallel_trace.jsonl\n" - ] - } - ], + "outputs": [], "source": [ "def delete_file_if_it_exists(filename: str) -> None:\n", " \"\"\"Check if a file exists, and delete it if so\"\"\"\n", - "\n", " if os.path.exists(filename):\n", " print(f\"Deleting {filename}\")\n", " os.remove(filename)\n", "\n", "\n", - "delete_file_if_it_exists(SEQUENTIAL_TRACE_FILE)\n", - "delete_file_if_it_exists(PARALLEL_TRACE_FILE)" + "def delete_trace_files():\n", + " \"\"\"Helper to delete trace files if they exist\"\"\"\n", + " 
delete_file_if_it_exists(SEQUENTIAL_TRACE_FILE)\n", + " delete_file_if_it_exists(PARALLEL_TRACE_FILE)" ] }, { @@ -176,7 +186,7 @@ " {\n", " \"type\": \"main\",\n", " \"engine\": \"nim\",\n", - " \"model\": \"meta/llama-3.3-70b-instruct\",\n", + " \"model\": \"meta/llama-4-scout-17b-16e-instruct\",\n", " },\n", " {\n", " \"type\": \"content_safety\",\n", @@ -258,8 +268,7 @@ "source": [ "## Prompts\n", "\n", - "Each Nemoguard model is fine-tuned for a specific task using a customized prompt. The prompts used at inference-time have to match the fine-tuning prompt for the best model performance. We'll load these prompts from other locations in the Guardrails repo and show them below.\n", - "\n" + "Each NemoGuard model is fine-tuned for a specific task using a customized prompt. The prompts used at inference time must match the fine-tuning prompt for the best model performance. We'll load these prompts from other locations in the Guardrails repo and show the content-safety and topic-control prompts below." ] }, { @@ -320,6 +329,124 @@ "print(\"\\n\".join(all_prompt_tasks))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Content-Safety Prompt\n", + "\n", + "The content-safety model can be used as an input rail (checking the user request only) or an output rail (checking the user request and the LLM response). The prompt has a task definition at the top, asking the LLM to decide if the `user_input` or `bot_response` strings fall into any of the unsafe categories listed in the prompt. The final instruction is to provide a JSON object with the `User Safety` and `Response Safety` fields. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "def print_prompt(prompts, task_name):\n", + " prompt_dict = [p for p in prompts[\"prompts\"] if p[\"task\"] == task_name][0]\n", + " prompt_content = prompt_dict[\"content\"]\n", + " print(prompt_content)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n", + "\n", + "\n", + "S1: Violence.\n", + "S2: Sexual.\n", + "S3: Criminal Planning/Confessions.\n", + "S4: Guns and Illegal Weapons.\n", + "S5: Controlled/Regulated Substances.\n", + "S6: Suicide and Self Harm.\n", + "S7: Sexual (minor).\n", + "S8: Hate/Identity Hate.\n", + "S9: PII/Privacy.\n", + "S10: Harassment.\n", + "S11: Threat.\n", + "S12: Profanity.\n", + "S13: Needs Caution.\n", + "S14: Other.\n", + "S15: Manipulation.\n", + "S16: Fraud/Deception.\n", + "S17: Malware.\n", + "S18: High Risk Gov Decision Making.\n", + "S19: Political/Misinformation/Conspiracy.\n", + "S20: Copyright/Trademark/Plagiarism.\n", + "S21: Unauthorized Advice.\n", + "S22: Illegal Activity.\n", + "S23: Immoral/Unethical.\n", + "\n", + "\n", + "\n", + "\n", + "user: {{ user_input }}\n", + "\n", + "response: agent: {{ bot_response }}\n", + "\n", + "\n", + "\n", + "Provide your safety assessment for the above conversation in the below output JSON format:\n", + "{\"User Safety\": safe or unsafe rating for the user message, \"Response Safety\": safe or unsafe rating for the agent response. Omit if no agent response present. 
\"Safety Categories\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n", + "\n", + "Do not include anything other than the output JSON in your response.\n", + "Output JSON:\n", + "\n" + ] + } + ], + "source": [ + "print_prompt(\n", + " content_safety_prompts, \"content_safety_check_output $model=content_safety\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Topic-control Prompt\n", + "\n", + "The topic-control model is an input-only rail. It checks whether the user request relates to one of the topics that are listed in the prompt below. For this example, we're checking for anything off-topic for a customer service agent." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. 
Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines\n", + "\n", + "Guidelines for the user messages:\n", + "- Do not answer questions related to personal opinions or advice on user's order, future recommendations\n", + "- Do not provide any information on non-company products or services.\n", + "- Do not answer enquiries unrelated to the company policies.\n", + "- Do not answer questions asking for personal details about the agent or its creators.\n", + "- Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.\n", + "- If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.\n", + "- Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.\n", + "- allow user comments that are related to small talk and chit-chat.\n", + "\n" + ] + } + ], + "source": [ + "print_prompt(topic_safety_prompts, \"topic_safety_check_input $model=topic_control\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -331,7 +458,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ @@ -345,7 +472,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 17, "metadata": { "scrolled": true }, @@ -376,12 +503,14 @@ "source": [ "### Running Sequential Request\n", "\n", - "To run a sequential request, you'll create a `RailsConfig` object with the sequential config YAML files from above. After you have that, you can create an LLMRails object and use it to issue guardrail inference requests." + "To run a sequential request, you'll create a `RailsConfig` object with the sequential config YAML files from above. 
After you have that, you can create an LLMRails object and use it to issue guardrail inference requests.\n", + "\n", + "You'll send a safe request, followed by an unsafe request. Guardrails will allow the safe request through to the Application LLM (and return the response), and block the unsafe request before sending to the Application LLM." ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2025-08-18T18:37:40.231716Z", @@ -402,7 +531,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2025-08-18T18:37:41.172531Z", @@ -414,7 +543,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "[{'role': 'assistant', 'content': \"Our company policy on Paid Time Off (PTO) is quite comprehensive and designed to support the overall well-being and work-life balance of our employees. According to our HR handbook, all full-time employees are eligible for PTO, which accrues at a rate of 10 days per year for the first two years of service, 15 days per year for years 2-5, and 20 days per year for 5+ years of service.\\n\\nOur PTO policy includes holidays, vacation days, sick leave, and personal days, all of which can be used to take a break from work, attend to personal matters, or simply recharge. Employees can start accruing PTO from their date of hire, but they cannot use it until they've completed 90 days of employment.\\n\\nTo ensure a smooth workflow and minimize disruptions, we require employees to provide at least two weeks' notice before taking PTO, whenever possible. However, we understand that unexpected situations may arise, and we're flexible when it comes to last-minute requests.\\n\\nIt's also worth noting that our company observes 10 paid holidays per year, which are separate from PTO. 
These holidays include New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving Day, and Christmas Day, among others.\\n\\nIf you're planning to take PTO, you can simply submit a request through our online HR portal, and it will be reviewed and approved by your supervisor. We also offer a convenient PTO tracking system, which allows you to view your available balance, schedule, and request time off all in one place.\\n\\nI hope that helps! Do you have any specific questions about our PTO policy or would you like more information on how to request time off?\"}]\n" + "Deleting sequential_trace.jsonl\n" ] } ], @@ -424,39 +553,73 @@ "sequential_rails_config = RailsConfig.model_validate(SEQUENTIAL_CONFIG)\n", "sequential_rails = LLMRails(sequential_rails_config)\n", "\n", - "safe_request = \"What is the company policy on PTO?\"\n", + "safe_request = \"Respond only with a short paragraph: what is the company policy on PTO? \"\n", + "unsafe_request = \"What's the easiest way to set a house on fire?\"\n", "\n", - "response = await sequential_rails.generate_async(\n", - " messages=[\n", - " {\n", - " \"role\": \"user\",\n", - " \"content\": safe_request,\n", - " }\n", - " ]\n", - ")\n", + "# Traces are appended to the JSONL file by default, so delete it to start fresh on each run\n", + "delete_file_if_it_exists(SEQUENTIAL_TRACE_FILE)\n", "\n", - "print(response.response)" + "safe_response = await sequential_rails.generate_async(\n", + " messages=[{\"role\": \"user\", \"content\": safe_request}]\n", + ")\n", + "unsafe_response = await sequential_rails.generate_async(\n", + " messages=[{\"role\": \"user\", \"content\": unsafe_request}]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Safe response: Our company policy on Paid Time Off, or PTO, is quite comprehensive and designed to support a healthy work-life balance. 
Full-time employees are eligible to accrue PTO from their hire date, with the accrual rate increasing with tenure. For example, employees with up to two years of service accrue 15 days of PTO per year, while those with five or more years accrue 25 days. Part-time employees accrue PTO on a pro-rata basis. Additionally, we offer a flexible PTO policy that allows employees to use their accrued time off for vacation, sick leave, or personal days, with the understanding that they must ensure their work responsibilities are covered during their absence. It's also worth noting that we have a blackout period around the holidays where PTO requests are not accepted, but this is communicated well in advance. If you have any specific questions or need more details, I'd be happy to help!\n" + ] + } + ], + "source": [ + "print(f\"Safe response: {safe_response.response[0]['content']}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsafe response: I'm sorry, I can't respond to that.\n" + ] + } + ], + "source": [ + "print(f\"Unsafe response: {unsafe_response.response[0]['content']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Running Parallel request\n", + "### Running Parallel Requests\n", "\n", - "Repeat the same request with the three input rails running in parallel, rather than running sequentially." + "You'll now send the same safe and unsafe requests, this time using the parallel rails configuration to check their safety and security. Guardrails should reach the same decisions as in the sequential case above (the safe request is answered and the unsafe request is blocked), since they don't depend on how the rail calls are orchestrated. The exact wording of the LLM response may differ between runs." 
] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "[{'role': 'assistant', 'content': \"Our company policy on Paid Time Off (PTO) is quite generous and designed to provide employees with a healthy work-life balance. According to our company handbook, all full-time employees are eligible for PTO, which includes vacation days, sick leave, and personal days.\\n\\nNew employees start with 15 days of PTO per year, which accrues at a rate of 1.25 days per month. This means that after just one month of employment, you'll already have 1.25 days of PTO available to use. And, as you accumulate more time with the company, your PTO balance will increase. For example, after one year of service, you'll have accrued a total of 15 days of PTO, and after two years, you'll have 20 days of PTO available.\\n\\nIt's worth noting that our company also observes 10 paid holidays per year, which are separate from your PTO balance. These holidays include New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving Day, and Christmas Day, among others.\\n\\nIn terms of requesting PTO, employees are required to provide at least two weeks' notice for vacation days and personal days, whenever possible. For sick leave, employees are expected to notify their manager as soon as possible, preferably on the same day.\\n\\nOne of the best parts of our PTO policy is that it's quite flexible. Employees can use their PTO days to take a relaxing vacation, attend to personal or family matters, or simply recharge and refocus. 
And, if you need to take an extended leave of absence, our company also offers a generous leave of absence policy, which includes options for unpaid leave, short-term disability, and family and medical leave.\\n\\nIf you have any specific questions about our PTO policy or need help requesting time off, I encourage you to reach out to your manager or our HR department. They'll be happy to guide you through the process and provide more detailed information. We're always looking for ways to support our employees' well-being and happiness, and our PTO policy is just one example of our commitment to work-life balance.\"}]\n" + "Deleting parallel_trace.jsonl\n" ] } ], @@ -466,16 +629,49 @@ "parallel_rails_config = RailsConfig.model_validate(PARALLEL_CONFIG)\n", "parallel_rails = LLMRails(parallel_rails_config)\n", "\n", - "response = await parallel_rails.generate_async(\n", - " messages=[\n", - " {\n", - " \"role\": \"user\",\n", - " \"content\": safe_request,\n", - " }\n", - " ]\n", - ")\n", + "# Traces are appended to the JSONL file by default, so delete it to start fresh on each run\n", + "delete_file_if_it_exists(PARALLEL_TRACE_FILE)\n", "\n", - "print(response.response)" + "safe_response = await parallel_rails.generate_async(\n", + " messages=[{\"role\": \"user\", \"content\": safe_request}]\n", + ")\n", + "unsafe_response = await parallel_rails.generate_async(\n", + " messages=[{\"role\": \"user\", \"content\": unsafe_request}]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Safe response: Our company policy on Paid Time Off, or PTO, is quite comprehensive. Full-time employees are eligible to accrue up to 15 days of PTO per year, which can be used for vacation, sick leave, or personal days. 
The accrual rate increases with tenure, so after three years of service, employees can accrue up to 20 days per year, and after five years, it's up to 25 days per year. PTO can be taken as soon as it's accrued, but we do have a blackout period around the holidays and during our annual company shutdown, which usually occurs in late December and early January. Employees are also allowed to carry over up to five days of unused PTO into the next year, but we encourage taking time off to recharge and relax!\n" + ] + } + ], + "source": [ + "print(f\"Safe response: {safe_response.response[0]['content']}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsafe response: I'm sorry, I can't respond to that.\n" + ] + } + ], + "source": [ + "print(f\"Unsafe response: {unsafe_response.response[0]['content']}\")" ] }, { @@ -500,7 +696,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 25, "metadata": {}, "outputs": [], "source": [ @@ -510,21 +706,27 @@ "def load_trace_file(filename):\n", " \"\"\"Load the JSONL format, converting into a list of dicts\"\"\"\n", " data = []\n", - " with open(filename) as infile:\n", - " for line in infile:\n", - " data.append(json.loads(line))\n", - " print(f\"Loaded {len(data)} lines from {filename}\")\n", + " try:\n", + " with open(filename) as infile:\n", + " for line in infile:\n", + " data.append(json.loads(line))\n", + " print(f\"Loaded {len(data)} lines from {filename}\")\n", + " except FileNotFoundError:\n", + " print(\n", + " f\"Couldn't load file {filename}, please rerun the notebook from the start\"\n", + " )\n", " return data" ] }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "def load_trace_data(trace_json_filename):\n", " \"\"\"Load a trace JSON file, returning pandas Dataframe\"\"\"\n", + "\n", " trace_data = 
load_trace_file(trace_json_filename)\n", "\n", " # Use the file creation time as a start time for the traces and spans\n", @@ -546,7 +748,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 27, "metadata": {}, "outputs": [], "source": [ @@ -562,15 +764,23 @@ " df = df[row_mask].copy()\n", "\n", " # Extract each rail name from the attributes dict. Top-level span doesn't have one\n", - " df[\"name\"] = df[\"attributes\"].apply(lambda x: x.get(\"rail.name\", None))\n", + " df[\"rail_name\"] = df[\"attributes\"].apply(lambda x: x.get(\"rail.name\", None))\n", + " df[\"rail_name_short\"] = df[\"rail_name\"].apply(\n", + " lambda x: \" \".join(x.split()[:4]) if x else x\n", + " )\n", "\n", " # Plotly Gantt charts require a proper datatime rather than relative seconds\n", " # So use the creation-time of each trace file as the absolute start-point of the trace\n", " df[\"start_dt\"] = pd.to_datetime(df[\"start_time\"] + df[\"epoch_seconds\"], unit=\"s\")\n", " df[\"end_dt\"] = pd.to_datetime(df[\"end_time\"] + df[\"epoch_seconds\"], unit=\"s\")\n", "\n", - " n_traces = df[\"trace_id\"].nunique()\n", - " assert n_traces == 1, f\"Found {n_traces} traces, expected 1. Please re-run notebook\"\n", + " # Add a boolean to the safe request trace (the first in the trace data)\n", + " trace_ids = df[\"trace_id\"].unique()\n", + " trace_id_to_num_lookup = {trace_id: idx for idx, trace_id in enumerate(trace_ids)}\n", + " df[\"trace_num\"] = df[\"trace_id\"].apply(lambda x: trace_id_to_num_lookup[x])\n", + " df[\"is_safe\"] = df[\"trace_id\"] == trace_ids[0]\n", + " df.index = range(df.shape[0])\n", + " print(f\"Found {len(trace_ids)} traces\")\n", "\n", " # Print out some summary stats on how many spans and rails were found\n", " n_top_spans = df[\"is_top_span\"].sum()\n", @@ -585,33 +795,27 @@ "source": [ "### Loading Trace Files\n", "\n", - "Using the helper functions, load and clean up the sequential and parallel data." 
+ "Using the helper functions, load and clean up the sequential and parallel data. You'll see two traces, labelled with `trace_num`. The safe request produced trace 0, and the unsafe request produced trace 1.\n", + "\n", + "The safe request passes through all input rails (content safety, topic safety, and jailbreak detection) before being passed to the Application LLM (generate user intent). The LLM response is then checked by the content safety check output rail before being returned to the user.\n", + "\n", + "The unsafe request is blocked by the content-safety and/or topic-control input rails. In this case, the request is not forwarded to the Application LLM, so neither 'generate user intent' nor the output rails are run." ] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Loaded 1 lines from sequential_trace.jsonl\n", - "Found 1 top-level spans, 5 rail spans\n" + "Loaded 2 lines from sequential_trace.jsonl\n", + "Found 2 traces\n", + "Found 2 top-level spans, 6 rail spans\n" ] - } - ], - "source": [ - "raw_sequential_df = load_trace_data(SEQUENTIAL_TRACE_FILE)\n", - "sequential_df = clean_trace_dataframe(raw_sequential_df)" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ + }, { "data": { "text/html": [ @@ -633,229 +837,127 @@ " \n", " \n", " \n", + " trace_num\n", + " rail_name_short\n", " name\n", - " span_id\n", - " parent_id\n", - " start_time\n", - " end_time\n", + " is_safe\n", " duration\n", - " span_type\n", - " span_kind\n", - " attributes\n", - " events\n", - " trace_id\n", - " epoch_seconds\n", - " is_rail\n", - " is_top_span\n", - " start_dt\n", - " end_dt\n", " \n", " \n", " \n", " \n", " 0\n", + " 0\n", " None\n", - " 65f79cb5-a93c-4581-94b4-cfeb2bf5a026\n", - " None\n", - " 0.000000\n", - " 7.403602\n", - " 7.403602\n", - " InteractionSpan\n", - " server\n", - " {'span.kind': 'server', 
'gen_ai.operation.name...\n", - " [{'name': 'guardrails.user_message', 'timestam...\n", - " 4c84db06-e7b7-41b6-b5b4-907cbdfa0232\n", - " 1756226960\n", - " False\n", + " guardrails.request\n", " True\n", - " 2025-08-26 16:49:20.000000000\n", - " 2025-08-26 16:49:27.403602123\n", + " 3.810076\n", " \n", " \n", " 1\n", - " content safety check input $model=content_safety\n", - " 911abc24-4111-43b5-90bb-65b521e75f61\n", - " 65f79cb5-a93c-4581-94b4-cfeb2bf5a026\n", - " 0.000000\n", - " 0.450512\n", - " 0.450512\n", - " RailSpan\n", - " internal\n", - " {'span.kind': 'internal', 'rail.type': 'input'...\n", - " NaN\n", - " 4c84db06-e7b7-41b6-b5b4-907cbdfa0232\n", - " 1756226960\n", + " 0\n", + " content safety check input\n", + " guardrails.rail\n", " True\n", - " False\n", - " 2025-08-26 16:49:20.000000000\n", - " 2025-08-26 16:49:20.450512171\n", + " 0.403598\n", " \n", " \n", - " 4\n", - " topic safety check input $model=topic_control\n", - " e9113960-9023-46ce-b4ec-e9454ecbfb43\n", - " 65f79cb5-a93c-4581-94b4-cfeb2bf5a026\n", - " 0.452292\n", - " 0.812895\n", - " 0.360603\n", - " RailSpan\n", - " internal\n", - " {'span.kind': 'internal', 'rail.type': 'input'...\n", - " NaN\n", - " 4c84db06-e7b7-41b6-b5b4-907cbdfa0232\n", - " 1756226960\n", + " 2\n", + " 0\n", + " topic safety check input\n", + " guardrails.rail\n", " True\n", - " False\n", - " 2025-08-26 16:49:20.452291965\n", - " 2025-08-26 16:49:20.812895060\n", + " 0.324701\n", " \n", " \n", - " 7\n", + " 3\n", + " 0\n", " jailbreak detection model\n", - " dc148a54-4168-46e4-b7fe-9379a7df1102\n", - " 65f79cb5-a93c-4581-94b4-cfeb2bf5a026\n", - " 0.814582\n", - " 1.151427\n", - " 0.336845\n", - " RailSpan\n", - " internal\n", - " {'span.kind': 'internal', 'rail.type': 'input'...\n", - " NaN\n", - " 4c84db06-e7b7-41b6-b5b4-907cbdfa0232\n", - " 1756226960\n", + " guardrails.rail\n", " True\n", - " False\n", - " 2025-08-26 16:49:20.814581871\n", - " 2025-08-26 16:49:21.151427031\n", + " 0.300511\n", " \n", " \n", - " 
9\n", + " 4\n", + " 0\n", " generate user intent\n", - " 65a93729-16f7-4d5e-86a8-d1f23d842c1a\n", - " 65f79cb5-a93c-4581-94b4-cfeb2bf5a026\n", - " 1.159738\n", - " 6.839181\n", - " 5.679443\n", - " RailSpan\n", - " internal\n", - " {'span.kind': 'internal', 'rail.type': 'genera...\n", - " NaN\n", - " 4c84db06-e7b7-41b6-b5b4-907cbdfa0232\n", - " 1756226960\n", + " guardrails.rail\n", " True\n", - " False\n", - " 2025-08-26 16:49:21.159738064\n", - " 2025-08-26 16:49:26.839180946\n", + " 2.236309\n", " \n", " \n", - " 12\n", - " content safety check output $model=content_safety\n", - " d62875aa-8517-45c0-84fc-6215e018a557\n", - " 65f79cb5-a93c-4581-94b4-cfeb2bf5a026\n", - " 6.839181\n", - " 7.403602\n", - " 0.564421\n", - " RailSpan\n", - " internal\n", - " {'span.kind': 'internal', 'rail.type': 'output...\n", - " NaN\n", - " 4c84db06-e7b7-41b6-b5b4-907cbdfa0232\n", - " 1756226960\n", + " 5\n", + " 0\n", + " content safety check output\n", + " guardrails.rail\n", " True\n", + " 0.532284\n", + " \n", + " \n", + " 6\n", + " 1\n", + " None\n", + " guardrails.request\n", + " False\n", + " 0.610056\n", + " \n", + " \n", + " 7\n", + " 1\n", + " content safety check input\n", + " guardrails.rail\n", " False\n", - " 2025-08-26 16:49:26.839180946\n", - " 2025-08-26 16:49:27.403602123\n", + " 0.610056\n", " \n", " \n", "\n", "" ], "text/plain": [ - " name \\\n", - "0 None \n", - "1 content safety check input $model=content_safety \n", - "4 topic safety check input $model=topic_control \n", - "7 jailbreak detection model \n", - "9 generate user intent \n", - "12 content safety check output $model=content_safety \n", - "\n", - " span_id \\\n", - "0 65f79cb5-a93c-4581-94b4-cfeb2bf5a026 \n", - "1 911abc24-4111-43b5-90bb-65b521e75f61 \n", - "4 e9113960-9023-46ce-b4ec-e9454ecbfb43 \n", - "7 dc148a54-4168-46e4-b7fe-9379a7df1102 \n", - "9 65a93729-16f7-4d5e-86a8-d1f23d842c1a \n", - "12 d62875aa-8517-45c0-84fc-6215e018a557 \n", - "\n", - " parent_id start_time end_time duration \\\n", 
- "0 None 0.000000 7.403602 7.403602 \n", - "1 65f79cb5-a93c-4581-94b4-cfeb2bf5a026 0.000000 0.450512 0.450512 \n", - "4 65f79cb5-a93c-4581-94b4-cfeb2bf5a026 0.452292 0.812895 0.360603 \n", - "7 65f79cb5-a93c-4581-94b4-cfeb2bf5a026 0.814582 1.151427 0.336845 \n", - "9 65f79cb5-a93c-4581-94b4-cfeb2bf5a026 1.159738 6.839181 5.679443 \n", - "12 65f79cb5-a93c-4581-94b4-cfeb2bf5a026 6.839181 7.403602 0.564421 \n", - "\n", - " span_type span_kind \\\n", - "0 InteractionSpan server \n", - "1 RailSpan internal \n", - "4 RailSpan internal \n", - "7 RailSpan internal \n", - "9 RailSpan internal \n", - "12 RailSpan internal \n", - "\n", - " attributes \\\n", - "0 {'span.kind': 'server', 'gen_ai.operation.name... \n", - "1 {'span.kind': 'internal', 'rail.type': 'input'... \n", - "4 {'span.kind': 'internal', 'rail.type': 'input'... \n", - "7 {'span.kind': 'internal', 'rail.type': 'input'... \n", - "9 {'span.kind': 'internal', 'rail.type': 'genera... \n", - "12 {'span.kind': 'internal', 'rail.type': 'output... \n", - "\n", - " events \\\n", - "0 [{'name': 'guardrails.user_message', 'timestam... 
\n", - "1 NaN \n", - "4 NaN \n", - "7 NaN \n", - "9 NaN \n", - "12 NaN \n", - "\n", - " trace_id epoch_seconds is_rail is_top_span \\\n", - "0 4c84db06-e7b7-41b6-b5b4-907cbdfa0232 1756226960 False True \n", - "1 4c84db06-e7b7-41b6-b5b4-907cbdfa0232 1756226960 True False \n", - "4 4c84db06-e7b7-41b6-b5b4-907cbdfa0232 1756226960 True False \n", - "7 4c84db06-e7b7-41b6-b5b4-907cbdfa0232 1756226960 True False \n", - "9 4c84db06-e7b7-41b6-b5b4-907cbdfa0232 1756226960 True False \n", - "12 4c84db06-e7b7-41b6-b5b4-907cbdfa0232 1756226960 True False \n", + " trace_num rail_name_short name is_safe \\\n", + "0 0 None guardrails.request True \n", + "1 0 content safety check input guardrails.rail True \n", + "2 0 topic safety check input guardrails.rail True \n", + "3 0 jailbreak detection model guardrails.rail True \n", + "4 0 generate user intent guardrails.rail True \n", + "5 0 content safety check output guardrails.rail True \n", + "6 1 None guardrails.request False \n", + "7 1 content safety check input guardrails.rail False \n", "\n", - " start_dt end_dt \n", - "0 2025-08-26 16:49:20.000000000 2025-08-26 16:49:27.403602123 \n", - "1 2025-08-26 16:49:20.000000000 2025-08-26 16:49:20.450512171 \n", - "4 2025-08-26 16:49:20.452291965 2025-08-26 16:49:20.812895060 \n", - "7 2025-08-26 16:49:20.814581871 2025-08-26 16:49:21.151427031 \n", - "9 2025-08-26 16:49:21.159738064 2025-08-26 16:49:26.839180946 \n", - "12 2025-08-26 16:49:26.839180946 2025-08-26 16:49:27.403602123 " + " duration \n", + "0 3.810076 \n", + "1 0.403598 \n", + "2 0.324701 \n", + "3 0.300511 \n", + "4 2.236309 \n", + "5 0.532284 \n", + "6 0.610056 \n", + "7 0.610056 " ] }, - "execution_count": 22, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "sequential_df" + "raw_sequential_df = load_trace_data(SEQUENTIAL_TRACE_FILE)\n", + "sequential_df = clean_trace_dataframe(raw_sequential_df)\n", + "sequential_df[[\"trace_num\", \"rail_name_short\", \"name\", \"is_safe\", 
\"duration\"]]" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Loaded 1 lines from parallel_trace.jsonl\n", - "Found 1 top-level spans, 5 rail spans\n" + "Loaded 2 lines from parallel_trace.jsonl\n", + "Found 2 traces\n", + "Found 2 top-level spans, 7 rail spans\n" ] }, { @@ -879,302 +981,123 @@ " \n", " \n", " \n", + " trace_num\n", + " rail_name_short\n", " name\n", + " is_safe\n", " duration\n", " \n", " \n", " \n", " \n", " 0\n", + " 0\n", " None\n", - " 8.248329\n", + " guardrails.request\n", + " True\n", + " 2.917370\n", " \n", " \n", " 1\n", - " content safety check input $model=content_safety\n", - " 0.456112\n", + " 0\n", + " content safety check input\n", + " guardrails.rail\n", + " True\n", + " 0.421178\n", " \n", " \n", - " 4\n", - " topic safety check input $model=topic_control\n", - " 0.359808\n", + " 2\n", + " 0\n", + " topic safety check input\n", + " guardrails.rail\n", + " True\n", + " 0.338333\n", " \n", " \n", - " 7\n", + " 3\n", + " 0\n", " jailbreak detection model\n", - " 0.330025\n", + " guardrails.rail\n", + " True\n", + " 0.284210\n", " \n", " \n", - " 9\n", + " 4\n", + " 0\n", " generate user intent\n", - " 7.212214\n", - " \n", - " \n", - " 12\n", - " content safety check output $model=content_safety\n", - " 0.577307\n", - " \n", - " \n", - "\n", - "" - ], - "text/plain": [ - " name duration\n", - "0 None 8.248329\n", - "1 content safety check input $model=content_safety 0.456112\n", - "4 topic safety check input $model=topic_control 0.359808\n", - "7 jailbreak detection model 0.330025\n", - "9 generate user intent 7.212214\n", - "12 content safety check output $model=content_safety 0.577307" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "raw_parallel_df = load_trace_data(PARALLEL_TRACE_FILE)\n", - "parallel_df = clean_trace_dataframe(raw_parallel_df)\n", - 
"parallel_df[[\"name\", \"duration\"]]" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", " \n", - " \n", - " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", - " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", + " \n", " \n", " \n", " \n", + " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", " \n", - " \n", - " \n", + " \n", " \n", " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", + " \n", + " \n", + " \n", + " \n", " \n", - " \n", - " \n", + " \n", " \n", " \n", "
namespan_idparent_idstart_timeend_timedurationspan_typespan_kindattributeseventstrace_idepoch_secondsis_railis_top_spanstart_dtend_dt
0Nonebebb78c1-8788-4f43-96cb-161f9b24077aNone0.0000008.2483298.248329InteractionSpanserver{'span.kind': 'server', 'gen_ai.operation.name...[{'name': 'guardrails.user_message', 'timestam...861c9588-daf4-4006-b8ce-48809ec682f41756226969Falseguardrails.railTrue2025-08-26 16:49:29.0000000002025-08-26 16:49:37.2483289241.977735
1content safety check input $model=content_safety97a3d33c-074e-4e95-9fb5-551d5bf2ef4cbebb78c1-8788-4f43-96cb-161f9b24077a0.0000000.4561120.456112RailSpaninternal{'span.kind': 'internal', 'rail.type': 'input'...NaN861c9588-daf4-4006-b8ce-48809ec682f4175622696950content safety check outputguardrails.railTrueFalse2025-08-26 16:49:29.0000000002025-08-26 16:49:29.4561119080.514885
4topic safety check input $model=topic_controlc5fc6e0b-19d5-4d3c-a300-4a1f90f5b2bebebb78c1-8788-4f43-96cb-161f9b24077a0.0000230.3598310.359808RailSpaninternal{'span.kind': 'internal', 'rail.type': 'input'...NaN861c9588-daf4-4006-b8ce-48809ec682f41756226969True61Noneguardrails.requestFalse2025-08-26 16:49:29.0000231272025-08-26 16:49:29.3598310950.329526
71jailbreak detection modelb206d6c5-fa4a-48dd-a0c9-22bba163759fbebb78c1-8788-4f43-96cb-161f9b24077a0.0000360.3300610.330025RailSpaninternal{'span.kind': 'internal', 'rail.type': 'input'...NaN861c9588-daf4-4006-b8ce-48809ec682f41756226969Trueguardrails.railFalse2025-08-26 16:49:29.0000357632025-08-26 16:49:29.3300609590.302264
9generate user intentab6d251e-f919-4e5b-b645-d1a5a025dcf1bebb78c1-8788-4f43-96cb-161f9b24077a0.4588087.6710227.212214RailSpaninternal{'span.kind': 'internal', 'rail.type': 'genera...NaN861c9588-daf4-4006-b8ce-48809ec682f41756226969TrueFalse2025-08-26 16:49:29.4588081842025-08-26 16:49:36.671022177
12content safety check output $model=content_safety047b45d9-43d6-4a97-b8c2-764a8d36a7f5bebb78c1-8788-4f43-96cb-161f9b24077a7.6710228.2483290.577307RailSpaninternal{'span.kind': 'internal', 'rail.type': 'output...NaN861c9588-daf4-4006-b8ce-48809ec682f41756226969True81topic safety check inputguardrails.railFalse2025-08-26 16:49:36.6710221772025-08-26 16:49:37.2483289240.000013
\n", "
" ], "text/plain": [ - " name \\\n", - "0 None \n", - "1 content safety check input $model=content_safety \n", - "4 topic safety check input $model=topic_control \n", - "7 jailbreak detection model \n", - "9 generate user intent \n", - "12 content safety check output $model=content_safety \n", + " trace_num rail_name_short name is_safe \\\n", + "0 0 None guardrails.request True \n", + "1 0 content safety check input guardrails.rail True \n", + "2 0 topic safety check input guardrails.rail True \n", + "3 0 jailbreak detection model guardrails.rail True \n", + "4 0 generate user intent guardrails.rail True \n", + "5 0 content safety check output guardrails.rail True \n", + "6 1 None guardrails.request False \n", + "7 1 jailbreak detection model guardrails.rail False \n", + "8 1 topic safety check input guardrails.rail False \n", "\n", - " span_id \\\n", - "0 bebb78c1-8788-4f43-96cb-161f9b24077a \n", - "1 97a3d33c-074e-4e95-9fb5-551d5bf2ef4c \n", - "4 c5fc6e0b-19d5-4d3c-a300-4a1f90f5b2be \n", - "7 b206d6c5-fa4a-48dd-a0c9-22bba163759f \n", - "9 ab6d251e-f919-4e5b-b645-d1a5a025dcf1 \n", - "12 047b45d9-43d6-4a97-b8c2-764a8d36a7f5 \n", - "\n", - " parent_id start_time end_time duration \\\n", - "0 None 0.000000 8.248329 8.248329 \n", - "1 bebb78c1-8788-4f43-96cb-161f9b24077a 0.000000 0.456112 0.456112 \n", - "4 bebb78c1-8788-4f43-96cb-161f9b24077a 0.000023 0.359831 0.359808 \n", - "7 bebb78c1-8788-4f43-96cb-161f9b24077a 0.000036 0.330061 0.330025 \n", - "9 bebb78c1-8788-4f43-96cb-161f9b24077a 0.458808 7.671022 7.212214 \n", - "12 bebb78c1-8788-4f43-96cb-161f9b24077a 7.671022 8.248329 0.577307 \n", - "\n", - " span_type span_kind \\\n", - "0 InteractionSpan server \n", - "1 RailSpan internal \n", - "4 RailSpan internal \n", - "7 RailSpan internal \n", - "9 RailSpan internal \n", - "12 RailSpan internal \n", - "\n", - " attributes \\\n", - "0 {'span.kind': 'server', 'gen_ai.operation.name... \n", - "1 {'span.kind': 'internal', 'rail.type': 'input'... 
\n", - "4 {'span.kind': 'internal', 'rail.type': 'input'... \n", - "7 {'span.kind': 'internal', 'rail.type': 'input'... \n", - "9 {'span.kind': 'internal', 'rail.type': 'genera... \n", - "12 {'span.kind': 'internal', 'rail.type': 'output... \n", - "\n", - " events \\\n", - "0 [{'name': 'guardrails.user_message', 'timestam... \n", - "1 NaN \n", - "4 NaN \n", - "7 NaN \n", - "9 NaN \n", - "12 NaN \n", - "\n", - " trace_id epoch_seconds is_rail is_top_span \\\n", - "0 861c9588-daf4-4006-b8ce-48809ec682f4 1756226969 False True \n", - "1 861c9588-daf4-4006-b8ce-48809ec682f4 1756226969 True False \n", - "4 861c9588-daf4-4006-b8ce-48809ec682f4 1756226969 True False \n", - "7 861c9588-daf4-4006-b8ce-48809ec682f4 1756226969 True False \n", - "9 861c9588-daf4-4006-b8ce-48809ec682f4 1756226969 True False \n", - "12 861c9588-daf4-4006-b8ce-48809ec682f4 1756226969 True False \n", - "\n", - " start_dt end_dt \n", - "0 2025-08-26 16:49:29.000000000 2025-08-26 16:49:37.248328924 \n", - "1 2025-08-26 16:49:29.000000000 2025-08-26 16:49:29.456111908 \n", - "4 2025-08-26 16:49:29.000023127 2025-08-26 16:49:29.359831095 \n", - "7 2025-08-26 16:49:29.000035763 2025-08-26 16:49:29.330060959 \n", - "9 2025-08-26 16:49:29.458808184 2025-08-26 16:49:36.671022177 \n", - "12 2025-08-26 16:49:36.671022177 2025-08-26 16:49:37.248328924 " + " duration \n", + "0 2.917370 \n", + "1 0.421178 \n", + "2 0.338333 \n", + "3 0.284210 \n", + "4 1.977735 \n", + "5 0.514885 \n", + "6 0.329526 \n", + "7 0.302264 \n", + "8 0.000013 " ] }, - "execution_count": 24, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "parallel_df" + "raw_parallel_df = load_trace_data(PARALLEL_TRACE_FILE)\n", + "parallel_df = clean_trace_dataframe(raw_parallel_df)\n", + "parallel_df[[\"trace_num\", \"rail_name_short\", \"name\", \"is_safe\", \"duration\"]]" ] }, { @@ -1185,15 +1108,15 @@ "\n", "The DataFrame below shows the time (in seconds) for the top-level end-to-end interaction, and 
each of the rails that are called during the interaction. These all run sequentially in this configuration. All input rails have to pass before the user query is passed to the LLM. \n", "\n", - "In the DataFrame below, the top-level span is named `interaction`, and represents the end-to-end server-side duration of the `generate_async()` call above. This top-level span comprises 5 rail actions, which are:\n", + "In the DataFrame below, the top-level span is labelled with the `is_top_span` boolean, and represents the end-to-end server-side duration of the `generate_async()` call. Each top-level span for a safe request comprises 5 rail actions, which are:\n", "\n", - " * `rail: content safety check input $model=content_safety'` : Time to check the user input by the [Content-safety Nemoguard NIM](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety).\n", - " * `rail: topic safety check input $model=topic_control'` : Time to check user input by the [Topic-Control Nemoguard NIM](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-topic-control).\n", - " * `rail: jailbreak detection model'` : Time to check the user input by the [Jailbreak Nemoguard NIM](https://build.nvidia.com/nvidia/nemoguard-jailbreak-detect).\n", - " * `rail: generate user intent'` : Time to generate a response to the user's question from the Main LLM ([Llama 3.3 70B Instruct](https://build.nvidia.com/meta/llama-3_3-70b-instruct)).\n", - " * `rail: content safety check output $model=content_safety` : Time to check the user input and LLM response by the [Content-safety Nemoguard NIM](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety).\n", + " * `content safety check input` : Time to check the user input by the [Content-safety Nemoguard NIM](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety).\n", + " * `topic safety check input` : Time to check user input by the [Topic-Control Nemoguard 
NIM](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-topic-control).\n", + " * `jailbreak detection model` : Time to check the user input by the [Jailbreak Nemoguard NIM](https://build.nvidia.com/nvidia/nemoguard-jailbreak-detect).\n", + " * `generate user intent` : Time to generate a response to the user's question from the Main LLM ([Llama 3.1 8B Instruct](https://build.nvidia.com/meta/llama-3_1-8b-instruct)).\n", + " * `content safety check output` : Time to check the user input and LLM response by the [Content-safety Nemoguard NIM](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety).\n", "\n", - "The durations should be roughly in the 400ms - 600ms range, depending on user traffic. The Llama 3.3 70B Instruct model that generates the response is an order of magnitude larger than the NemoGuard models, so it may take up to a minute to generate a response, depending on the cluster load." + "The durations should be roughly in the 400ms - 600ms range, depending on user traffic. The Llama 3.1 8B Instruct model that generates the response typically takes longer than the NemoGuard checks (around two seconds in this snapshot), depending on the cluster load." 
] }, { @@ -1207,7 +1130,17 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "PLOT_WIDTH = 800\n", + "PLOT_HEIGHT = 400" + ] + }, + { + "cell_type": "code", + "execution_count": 31, "metadata": {}, "outputs": [ { @@ -1233,7 +1166,7 @@ " \n", " is_rail\n", " is_top_span\n", - " name\n", + " rail_name_short\n", " duration\n", " \n", " \n", @@ -1243,77 +1176,85 @@ " False\n", " True\n", " None\n", - " 7.403602\n", + " 3.810076\n", " \n", " \n", " 1\n", " True\n", " False\n", - " content safety check input $model=content_safety\n", - " 0.450512\n", + " content safety check input\n", + " 0.403598\n", " \n", " \n", - " 4\n", + " 2\n", " True\n", " False\n", - " topic safety check input $model=topic_control\n", - " 0.360603\n", + " topic safety check input\n", + " 0.324701\n", " \n", " \n", - " 7\n", + " 3\n", " True\n", " False\n", " jailbreak detection model\n", - " 0.336845\n", + " 0.300511\n", " \n", " \n", - " 9\n", + " 4\n", " True\n", " False\n", " generate user intent\n", - " 5.679443\n", + " 2.236309\n", " \n", " \n", - " 12\n", + " 5\n", " True\n", " False\n", - " content safety check output $model=content_safety\n", - " 0.564421\n", + " content safety check output\n", + " 0.532284\n", + " \n", + " \n", + " 6\n", + " False\n", + " True\n", + " None\n", + " 0.610056\n", + " \n", + " \n", + " 7\n", + " True\n", + " False\n", + " content safety check input\n", + " 0.610056\n", " \n", " \n", "\n", "" ], "text/plain": [ - " is_rail is_top_span name \\\n", - "0 False True None \n", - "1 True False content safety check input $model=content_safety \n", - "4 True False topic safety check input $model=topic_control \n", - "7 True False jailbreak detection model \n", - "9 True False generate user intent \n", - "12 True False content safety check output $model=content_safety \n", - "\n", - " duration \n", - "0 7.403602 \n", - "1 0.450512 \n", - "4 0.360603 \n", - "7 0.336845 \n", - "9 5.679443 
\n", - "12 0.564421 " + " is_rail is_top_span rail_name_short duration\n", + "0 False True None 3.810076\n", + "1 True False content safety check input 0.403598\n", + "2 True False topic safety check input 0.324701\n", + "3 True False jailbreak detection model 0.300511\n", + "4 True False generate user intent 2.236309\n", + "5 True False content safety check output 0.532284\n", + "6 False True None 0.610056\n", + "7 True False content safety check input 0.610056" ] }, - "execution_count": 25, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "sequential_df[[\"is_rail\", \"is_top_span\", \"name\", \"duration\"]]" + "sequential_df[[\"is_rail\", \"is_top_span\", \"rail_name_short\", \"duration\"]]" ] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 32, "metadata": {}, "outputs": [ { @@ -1339,14 +1280,14 @@ "type": "bar", "x": [ "generate user intent", - "content safety check output $model=content_safety", - "content safety check input $model=content_safety", - "topic safety check input $model=topic_control", + "content safety check output", + "content safety check input", + "topic safety check input", "jailbreak detection model" ], "xaxis": "x", "y": { - "bdata": "AAAA4L+3FkAAAAAAvQ/iPwAAAAAx1dw/AAAAAB8U1z8AAAAA347VPw==", + "bdata": "AAAAAPbjAUAAAACAeAjhPwAAAACM1Nk/AAAAAObH1D8AAAAAkjvTPw==", "dtype": "f8" }, "yaxis": "y" @@ -2135,7 +2076,7 @@ } }, "title": { - "text": "Sequential Guardrails Rail durations" + "text": "Sequential Guardrails Rail durations (safe request)" }, "width": 800, "xaxis": { @@ -2159,8 +2100,7 @@ } } } - }, - "image/png": "" + } }, "metadata": {}, "output_type": "display_data" @@ -2169,13 +2109,15 @@ "source": [ "# Now let's plot a bar-graph of these numbers\n", "px.bar(\n", - " sequential_df[sequential_df[\"is_rail\"]].sort_values(\"duration\", ascending=False),\n", - " x=\"name\",\n", + " sequential_df[sequential_df[\"is_safe\"] & sequential_df[\"is_rail\"]].sort_values(\n", + " 
\"duration\", ascending=False\n", + " ),\n", + " x=\"rail_name_short\",\n", " y=\"duration\",\n", - " title=\"Sequential Guardrails Rail durations\",\n", - " labels={\"name\": \"Rail Name\", \"duration\": \"Duration (seconds)\"},\n", - " width=800,\n", - " height=800,\n", + " title=\"Sequential Guardrails Rail durations (safe request)\",\n", + " labels={\"rail_name_short\": \"Rail Name\", \"duration\": \"Duration (seconds)\"},\n", + " width=PLOT_WIDTH,\n", + " height=PLOT_HEIGHT * 2,\n", ")" ] }, @@ -2188,7 +2130,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 33, "metadata": {}, "outputs": [ { @@ -2200,11 +2142,11 @@ "data": [ { "base": [ - "2025-08-26T16:49:20.000000000", - "2025-08-26T16:49:20.452291965", - "2025-08-26T16:49:20.814581871", - "2025-08-26T16:49:21.159738064", - "2025-08-26T16:49:26.839180946" + "2025-09-05T14:42:28.000000000", + "2025-09-05T14:42:28.404770136", + "2025-09-05T14:42:28.731706858", + "2025-09-05T14:42:29.041482925", + "2025-09-05T14:42:31.277791977" ], "hovertemplate": "start_dt=%{base}
end_dt=%{x}
Rail Name=%{y}", "legendgroup": "", @@ -2220,22 +2162,23 @@ "textposition": "auto", "type": "bar", "x": { - "bdata": "wgFoAVABLxY0Ag==", + "bdata": "kwFEASwBvAgUAg==", "dtype": "i2" }, "xaxis": "x", "y": [ - "content safety check input $model=content_safety", - "topic safety check input $model=topic_control", + "content safety check input", + "topic safety check input", "jailbreak detection model", "generate user intent", - "content safety check output $model=content_safety" + "content safety check output" ], "yaxis": "y" } ], "layout": { "barmode": "overlay", + "height": 400, "legend": { "tracegroupgap": 0 }, @@ -3016,8 +2959,9 @@ } }, "title": { - "text": "Gantt chart of rails calls in sequential mode" + "text": "Gantt chart of rails calls in sequential mode (safe request)" }, + "width": 800, "xaxis": { "anchor": "y", "domain": [ @@ -3038,8 +2982,7 @@ } } } - }, - "image/png": "" + } }, "metadata": {}, "output_type": "display_data" @@ -3049,12 +2992,14 @@ "# Let's plot a Gantt chart, to show the sequence of when the rails execute\n", "\n", "fig = px.timeline(\n", - " sequential_df.loc[sequential_df[\"is_rail\"]],\n", + " sequential_df.loc[sequential_df[\"is_safe\"] & sequential_df[\"is_rail\"]],\n", " x_start=\"start_dt\",\n", " x_end=\"end_dt\",\n", - " y=\"name\",\n", - " title=\"Gantt chart of rails calls in sequential mode\",\n", - " labels={\"name\": \"Rail Name\"},\n", + " y=\"rail_name_short\",\n", + " title=\"Gantt chart of rails calls in sequential mode (safe request)\",\n", + " labels={\"rail_name_short\": \"Rail Name\"},\n", + " width=PLOT_WIDTH,\n", + " height=PLOT_HEIGHT,\n", ")\n", "fig.update_yaxes(autorange=\"reversed\")\n", "fig.show()" @@ -3071,7 +3016,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 34, "metadata": {}, "outputs": [ { @@ -3097,14 +3042,14 @@ "type": "bar", "x": [ "generate user intent", - "content safety check output $model=content_safety", - "content safety check input $model=content_safety", - 
"topic safety check input $model=topic_control", + "content safety check output", + "content safety check input", + "topic safety check input", "jailbreak detection model" ], "xaxis": "x", "y": { - "bdata": "AAAAoE7ZHEAAAAAATHniPwAAAADwMN0/AAAAABgH1z8AAAAAIh/VPw==", + "bdata": "AAAAwM2k/z8AAACA73ngPwAAAACU9No/AAAAAEGn1T8AAAAAgDDSPw==", "dtype": "f8" }, "yaxis": "y" @@ -3112,7 +3057,7 @@ ], "layout": { "barmode": "relative", - "height": 600, + "height": 800, "legend": { "tracegroupgap": 0 }, @@ -3893,7 +3838,7 @@ } }, "title": { - "text": "Sequential Guardrails Rail durations" + "text": "Parallel Guardrails Rail durations (safe request)" }, "width": 800, "xaxis": { @@ -3917,8 +3862,7 @@ } } } - }, - "image/png": "" + } }, "metadata": {}, "output_type": "display_data" @@ -3927,13 +3871,15 @@ "source": [ "# Now let's plot a bar-graph of these numbers\n", "px.bar(\n", - " parallel_df[parallel_df[\"is_rail\"]].sort_values(\"duration\", ascending=False),\n", - " x=\"name\",\n", + " parallel_df[parallel_df[\"is_safe\"] & parallel_df[\"is_rail\"]].sort_values(\n", + " \"duration\", ascending=False\n", + " ),\n", + " x=\"rail_name_short\",\n", " y=\"duration\",\n", - " title=\"Sequential Guardrails Rail durations\",\n", - " labels={\"name\": \"Rail Name\", \"duration\": \"Duration (seconds)\"},\n", - " width=800,\n", - " height=600,\n", + " title=\"Parallel Guardrails Rail durations (safe request)\",\n", + " labels={\"rail_name_short\": \"Rail Name\", \"duration\": \"Duration (seconds)\"},\n", + " width=PLOT_WIDTH,\n", + " height=PLOT_HEIGHT * 2,\n", ")" ] }, @@ -3946,7 +3892,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 35, "metadata": {}, "outputs": [ { @@ -3958,11 +3904,11 @@ "data": [ { "base": [ - "2025-08-26T16:49:29.000000000", - "2025-08-26T16:49:29.000023127", - "2025-08-26T16:49:29.000035763", - "2025-08-26T16:49:29.458808184", - "2025-08-26T16:49:36.671022177" + "2025-09-05T14:42:31.000000000", + "2025-09-05T14:42:31.000024796", + 
"2025-09-05T14:42:31.000034809", + "2025-09-05T14:42:31.424749851", + "2025-09-05T14:42:33.402485132" ], "hovertemplate": "start_dt=%{base}
end_dt=%{x}
Rail Name=%{y}", "legendgroup": "", @@ -3978,16 +3924,16 @@ "textposition": "auto", "type": "bar", "x": { - "bdata": "yAFnAUoBLBxBAg==", + "bdata": "pQFSARwBuQcCAg==", "dtype": "i2" }, "xaxis": "x", "y": [ - "content safety check input $model=content_safety", - "topic safety check input $model=topic_control", + "content safety check input", + "topic safety check input", "jailbreak detection model", "generate user intent", - "content safety check output $model=content_safety" + "content safety check output" ], "yaxis": "y" } @@ -4775,9 +4721,9 @@ } }, "title": { - "text": "Gantt chart of rails calls in parallel mode" + "text": "Gantt chart of rails calls in parallel mode (safe request)" }, - "width": 1000, + "width": 800, "xaxis": { "anchor": "y", "domain": [ @@ -4798,7 +4744,8 @@ } } } - } + }, + "image/png": "" }, "metadata": {}, "output_type": "display_data" @@ -4808,14 +4755,14 @@ "# Let's plot a Gantt chart, to show the sequence of when the rails execute\n", "\n", "fig = px.timeline(\n", - " parallel_df.loc[parallel_df[\"is_rail\"]],\n", + " parallel_df.loc[parallel_df[\"is_safe\"] & parallel_df[\"is_rail\"]],\n", " x_start=\"start_dt\",\n", " x_end=\"end_dt\",\n", - " y=\"name\",\n", - " title=\"Gantt chart of rails calls in parallel mode\",\n", - " labels={\"name\": \"Rail Name\"},\n", - " height=400,\n", - " width=1000,\n", + " y=\"rail_name_short\",\n", + " title=\"Gantt chart of rails calls in parallel mode (safe request)\",\n", + " labels={\"rail_name_short\": \"Rail Name\"},\n", + " width=PLOT_WIDTH,\n", + " height=PLOT_HEIGHT,\n", ")\n", "fig.update_yaxes(autorange=\"reversed\")\n", "fig.show()" @@ -4827,32 +4774,32 @@ "source": [ "### Compare Sequential and Parallel Trace Data\n", "\n", - "The following cells compare the input rail times for the sequential and parallel configurations." + "The following cells compare the input rail times for the sequential and parallel configurations. 
The latency difference between sequential and parallel rails is shown in the plots above. In sequential mode, the input-rail checking time is the sum of all three models. In parallel mode, the input-rail checking time is the maximum of the three rails. Let's quantify the time-saving below." ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "INPUT_RAIL_NAMES = {\n", - " \"content safety check input $model=content_safety\",\n", - " \"topic safety check input $model=topic_control\",\n", + " \"content safety check input\",\n", + " \"topic safety check input\",\n", " \"jailbreak detection model\",\n", "}" ] }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Sequential input rail time: 1.1480s\n" + "Sequential input rail time: 1.0288s\n" ] } ], @@ -4861,29 +4808,32 @@ "\n", "# Sum the sequential rail run-times\n", "sequential_input_rail_time = sequential_df.loc[\n", - " sequential_df[\"name\"].isin(INPUT_RAIL_NAMES), \"duration\"\n", + " sequential_df[\"is_safe\"] # Use the safe user-request\n", + " & sequential_df[\"rail_name_short\"].isin(INPUT_RAIL_NAMES),\n", + " \"duration\", # Use input-rails only\n", "].sum()\n", "print(f\"Sequential input rail time: {sequential_input_rail_time:.4f}s\")" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Parallel input rail time: 0.4561s\n", - "Parallel input speedup: 2.5168 times\n" + "Parallel input rail time: 0.4212s\n", + "Parallel input speedup: 2.4427 times\n" ] } ], "source": [ "# Final summary of the time-saving due to parallel rails\n", "parallel_input_rail_time = parallel_df.loc[\n", - " parallel_df[\"name\"].isin(INPUT_RAIL_NAMES), \"duration\"\n", + " parallel_df[\"is_safe\"] & 
parallel_df[\"rail_name_short\"].isin(INPUT_RAIL_NAMES),\n", + " \"duration\",\n", "].max()\n", "print(f\"Parallel input rail time: {parallel_input_rail_time:.4f}s\")\n", "print(\n", @@ -4891,6 +4841,45 @@ ")" ] }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "# Check the difference in overall time\n", + "total_sequential_time_s = sequential_df.loc[\n", + " sequential_df[\"is_safe\"] & sequential_df[\"is_rail\"], \"duration\"\n", + "].sum()\n", + "total_parallel_time_s = parallel_df.loc[\n", + " parallel_df[\"is_safe\"] & parallel_df[\"is_rail\"], \"duration\"\n", + "].sum()\n", + "\n", + "parallel_time_saved_s = total_sequential_time_s - total_parallel_time_s\n", + "parallel_time_saved_pct = (100.0 * parallel_time_saved_s) / total_sequential_time_s" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total sequential time: 3.80s\n", + "Total parallel time: 3.54s\n", + "Time saving: 0.26s, (6.87%)\n" + ] + } + ], + "source": [ + "print(f\"Total sequential time: {total_sequential_time_s:.2f}s\")\n", + "print(f\"Total parallel time: {total_parallel_time_s:.2f}s\")\n", + "print(f\"Time saving: {parallel_time_saved_s:.2f}s, ({parallel_time_saved_pct:.2f}%)\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -4899,7 +4888,9 @@ "\n", "# Conclusions\n", "\n", - "In this notebook, you learned how to trace Guardrails requests in both **sequential** and **parallel** modes. By sending a single request for each mode, you were able to trace and compare their latencies. Using the graphing tools, you visualized the latency breakdown into a table, bar chart, and Gantt chart, providing a clear visual comparison of how each mode performed. The Gantt charts for parallel and sequential rails clearly show the benefit of running all three in parallel, rather than sequentially. 
For the sample configuration and input request run in this notebook snapshot, parallel mode was ~2.5x faster." + "In this notebook, you learned how to trace Guardrails requests in both **sequential** and **parallel** modes. By sending a single request for each mode, you were able to trace and compare their latencies. Using the graphing tools, you visualized the latency breakdown into a table, bar chart, and Gantt chart, providing a clear visual comparison of how each mode performed. The Gantt charts for parallel and sequential rails clearly show the benefit of running all three in parallel, rather than sequentially. \n", + "\n", + "For the sample configuration and input request run in this notebook snapshot, running the input rails in parallel mode was ~2.44x faster, reducing overall latency by 6.87% for this example. " ] } ], diff --git a/docs/getting-started/8-tracing/2_tracing_with_jaeger.ipynb b/docs/getting-started/8-tracing/2_tracing_with_jaeger.ipynb index 1495ab539..0011cc89b 100644 --- a/docs/getting-started/8-tracing/2_tracing_with_jaeger.ipynb +++ b/docs/getting-started/8-tracing/2_tracing_with_jaeger.ipynb @@ -60,7 +60,7 @@ " jaegertracing/all-in-one:1.62.0\n", "```\n", "\n", - "You'll see that the container prints debug messages that end with the following lines. This indicates the Jaeger server is up and ready to accept requests.\n", + "You'll see that the container prints debug messages that end with the following lines. This indicates the Jaeger server is up and ready to accept requests. 
These can be sent over either gRPC or REST on the corresponding ports listed below.\n", "\n", "```bash\n", "{\"level\":\"info\",\"ts\":1756236324.295533,\"caller\":\"healthcheck/handler.go:118\",\"msg\":\"Health Check state change\",\"status\":\"ready\"}\n", @@ -190,7 +190,7 @@ "metadata": {}, "outputs": [], "source": [ - "CONFIG_MODELS: Dict[str, str] = [\n", + "CONFIG_MODELS: List[Dict[str, str]] = [\n", " {\n", " \"type\": \"main\",\n", " \"engine\": \"nim\",\n", @@ -256,7 +256,7 @@ "source": [ "### Tracing\n", "\n", - "The tracing configuration configures the adapter and any adapter-specific controls. Here we're storing traces in JSONL format. We'll use a different filename depending on whether we have a sequential or parallel workflow." + "The tracing configuration configures the adapter and any adapter-specific controls. Here we're sending traces over OpenTelemetry for visualization by another tool." ] }, { @@ -423,9 +423,9 @@ "tracer_provider = TracerProvider(resource=resource)\n", "trace.set_tracer_provider(tracer_provider)\n", "\n", - "# Export traces to the port location matching \n", + "# Export traces to the port location matching\n", "otlp_exporter = OTLPSpanExporter(endpoint=\"http://localhost:4317\", insecure=True)\n", - "tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))\n" + "tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))" ] }, {