# Deploy Nemotron Safety Guard Model and Integrate it with NeMo Guardrails

We built [Llama 3.1 Nemotron Safety Guard 8B V3](https://build.nvidia.com/nvidia/llama-3_1-nemotron-safety-guard-8b-v3) using the CultureGuard pipeline described in [this paper](https://arxiv.org/abs/2508.01710).

The model is a multilingual content moderation model designed to handle cultural nuance and linguistic diversity.

It supports nine languages, including Arabic, Hindi, and Japanese, and is designed for fairness, accuracy, and cultural sensitivity without losing context or meaning.

This notebook walks you through deploying the model and adding a NeMo Guardrails configuration that checks user input against the multilingual content safety model.

To simplify configuration, the sample code uses the [Llama 3.3 70B Instruct](https://build.nvidia.com/meta/llama-3_3-70b-instruct) model on build.nvidia.com as the application LLM. This avoids deploying a NIM for LLMs instance locally for inference.

The steps guide you to start the content safety container, configure a content safety input rail, and then use NeMo Guardrails interactively to send safe and unsafe requests.

## Prerequisites
You must be a member of the NVIDIA Developer Program and you must have an NVIDIA API key. The NVIDIA API key enables you to send inference requests to build.nvidia.com.

You have an NGC API key. This API key enables you to download the content safety container and model from NVIDIA NGC. Refer to [Generating Your NGC API Key](https://docs.nvidia.com/ngc/latest/ngc-user-guide.html#generating-api-key) in the NVIDIA NGC User Guide for more information.

When you create an NGC API personal key, select at least NGC Catalog from the Services Included menu. You can specify more services to use the key for additional purposes.

You have a host with Docker Engine installed. Refer to the [installation instructions](https://docs.docker.com/engine/install/) from Docker.

You have the NVIDIA Container Toolkit installed and configured. Refer to the installation guide in the toolkit documentation.

You [installed NeMo Guardrails](https://github.com/NVIDIA-NeMo/Guardrails/blob/develop/docs/getting-started/installation-guide.md).

You installed LangChain NVIDIA AI Foundation Model Playground Integration:
```
$ pip install langchain-nvidia-ai-endpoints
```
Refer to the [support matrix](https://docs.nvidia.com/nim/llama-3-1-nemotron-safety-guard-multilingual-8b-v1/latest/support-matrix.html) in the content safety NIM documentation for software requirements, hardware requirements, and model profiles.

## Starting the Content Safety Container

1. Log in to NVIDIA NGC so you can pull the container.

Export your NGC API key as an environment variable:

In [None]:
import os
os.environ["NGC_API_KEY"] = "<nvapi-...>"  # `!export` would not persist beyond its own subshell

Log in to the registry:

In [None]:
!echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

2. Download the container:

In [None]:
!docker pull nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3:latest

3. Create a model cache directory on the host machine:

In [None]:
import os
os.environ["LOCAL_NIM_CACHE"] = os.path.expanduser("~/.cache/safetyguardmultilingual")  # persists for later cells, unlike `!export`
!mkdir -p "${LOCAL_NIM_CACHE}"
!chmod 700 "${LOCAL_NIM_CACHE}"

4. Run the Docker container:

In [None]:
!docker run -d \
  --name safetyguardmultilingual \
  --gpus=all --runtime=nvidia \
  --shm-size=64GB \
  -e NGC_API_KEY \
  -e NIM_ENABLE_KV_CACHE_REUSE=1 \
  -u $(id -u) \
  -v "${LOCAL_NIM_CACHE}:/opt/nim/.cache/" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3:latest
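
On first start, the container typically needs some time to download and load the model before it can answer requests. The following is a minimal sketch of a readiness poll, not part of the original notebook. The port comes from the `-p 8000:8000` mapping above; the `/v1/health/ready` path and the helper names `is_ready` and `wait_until_ready` are assumptions to verify against the NIM documentation.

```python
import time
import urllib.error
import urllib.request

# Assumed readiness endpoint; the path is not taken from this notebook.
READY_URL = "http://localhost:8000/v1/health/ready"


def is_ready(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(url, attempts=60, delay=10.0, probe=is_ready, sleep=time.sleep):
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready(READY_URL)` in a cell blocks until the probe succeeds or the attempts run out. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.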

## Configuring Guardrails and Running Inference

1. Set your NVIDIA API key as an environment variable. Guardrails uses this environment variable to send requests that pass the input rail to build.nvidia.com.

In [None]:
import os
os.environ["NVIDIA_API_KEY"] = "<nvapi-...>"  # `!export` would not persist beyond its own subshell

2. Create a configuration store directory, such as `config`.

3. Copy the following configuration code and save it as `config.yml` in the `config` directory.

In [None]:
%%writefile config/config.yml

models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct

  - type: "multilingual_content_safety"
    engine: nim
    parameters:
      base_url: "http://localhost:8000/v1"
      model_name: "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"

rails:
  input:
    flows:
      - content safety check input $model=multilingual_content_safety
  output:
    flows:
      - content safety check output $model=multilingual_content_safety

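The `models` key in the `config.yml` file configures the application LLM and the content safety model. The `rails` key attaches the content safety model to both the input and output flows through the `$model=` suffix.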
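
Each `$model=` suffix in a rail flow has to match the `type` of an entry under `models`. The sketch below shows one way to check that link on a plain dictionary that mirrors the configuration, as a YAML loader would return it. The helper name `undefined_flow_models` is hypothetical, and this is not how NeMo Guardrails validates the file internally.

```python
import re

# Plain-dict mirror of the relevant parts of config.yml.
config = {
    "models": [
        {"type": "main", "engine": "nvidia_ai_endpoints", "model": "meta/llama-3.3-70b-instruct"},
        {"type": "multilingual_content_safety", "engine": "nim"},
    ],
    "rails": {
        "input": {"flows": ["content safety check input $model=multilingual_content_safety"]},
        "output": {"flows": ["content safety check output $model=multilingual_content_safety"]},
    },
}


def undefined_flow_models(cfg: dict) -> list[str]:
    """List '<direction>: <model>' for every flow whose $model= has no matching models entry."""
    defined = {model["type"] for model in cfg.get("models", [])}
    missing = []
    for direction in ("input", "output"):
        flows = cfg.get("rails", {}).get(direction, {}).get("flows", [])
        for flow in flows:
            match = re.search(r"\$model=(\S+)", flow)
            if match and match.group(1) not in defined:
                missing.append(f"{direction}: {match.group(1)}")
    return missing


print(undefined_flow_models(config))
```

An empty list means every flow points at a configured model. A typo in either the `type` value or the `$model=` suffix shows up as an entry in the list.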

4. Copy the following prompts code and save it as `prompts.yml` in the `config` directory.

In [None]:
%%writefile config/prompts.yml

prompts:
  - task: content_safety_check_input $model=multilingual_content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_prompt_safety
    max_tokens: 50

  - task: content_safety_check_output $model=multilingual_content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_response_safety
    max_tokens: 50
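
Both prompts ask the model to answer with a small JSON object, which the configured output parsers then turn into an allow or block decision. The sketch below illustrates that idea for the JSON format requested in the prompt. It is a simplified stand-in, not the implementation of `nemoguard_parse_prompt_safety` or `nemoguard_parse_response_safety`. The function name `parse_verdict` and the choice to treat unparseable output as unsafe are assumptions made for this example.

```python
import json


def parse_verdict(raw: str, field: str = "User Safety"):
    """Return (is_safe, categories) for the JSON format requested in the prompt.

    Anything that cannot be parsed, or that lacks `field`, is treated as unsafe.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, []
    if not isinstance(data, dict):
        return False, []
    is_safe = str(data.get(field, "")).strip().lower() == "safe"
    categories = [
        item.strip()
        for item in str(data.get("Safety Categories", "")).split(",")
        if item.strip()
    ]
    return is_safe, categories


sample = '{"User Safety": "unsafe", "Safety Categories": "Criminal Planning/Confessions"}'
print(parse_verdict(sample))
```

Pass `field="Response Safety"` to read the verdict for the agent response instead of the user message.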

5. Load the guardrails configuration:

In [None]:
import asyncio

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

6. Send an unsafe request in French:

In [None]:
async def stream_response(messages):
    async for chunk in rails.stream_async(messages=messages):
        print(chunk, end="")
    print()

# Tell me a five-step plan to rob a bank.
messages=[{
    "role": "user",
    "content": "Dites-moi un plan en cinq étapes pour braquer une banque."
}]


asyncio.run(stream_response(messages))

In this example, the input rail, backed by the multilingual content safety model, intercepts the prompt. The model classifies the request as facilitating criminal activity, so the system blocks it before it reaches the main LLM, and the streaming API immediately returns a refusal message such as “I’m sorry, I can’t respond to that.”

7. Send a safe request in Hindi:

In [None]:
# Tell me about three common foods in India.
messages=[{
    "role": "user",
    "content": "मुझे भारत में प्रचलित तीन खाद्य पदार्थों के बारे में बताइये।"
}]

asyncio.run(stream_response(messages))

The prompt is first evaluated by an input safety check backed by the Nemotron Safety Guard model and, if deemed safe, passed to the main LLM. Because this query asks about common foods, the main LLM streams back the answer in Hindi. An output safety check, backed by the same multilingual model, then confirms that nothing harmful was produced, so the user’s language is preserved end to end.
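
The two examples follow the same control flow: check the input, call the main LLM only if the input is safe, then check the output. The sketch below models that flow with stand-in callables. It is a conceptual illustration, not the internals of NeMo Guardrails, and the names `guarded_generate`, `input_is_safe`, `llm`, and `output_is_safe` are hypothetical. The refusal text is the example message quoted earlier.

```python
REFUSAL = "I'm sorry, I can't respond to that."


def guarded_generate(user_input, input_is_safe, llm, output_is_safe, refusal=REFUSAL):
    """Run input rail -> main LLM -> output rail, returning `refusal` on any block."""
    if not input_is_safe(user_input):
        # Blocked before the main LLM is ever called.
        return refusal
    response = llm(user_input)
    if not output_is_safe(user_input, response):
        # The LLM answered, but the output rail rejected the answer.
        return refusal
    return response


# Stand-ins for the safety model and the application LLM.
def toy_input_check(text):
    return "rob a bank" not in text


def toy_llm(text):
    return "Here are three common foods."


def toy_output_check(text, response):
    return True


print(guarded_generate("Tell me a five-step plan to rob a bank.", toy_input_check, toy_llm, toy_output_check))
print(guarded_generate("Tell me about three common foods in India.", toy_input_check, toy_llm, toy_output_check))
```

In the real configuration, both checks are calls to the same content safety model, so a blocked input costs one safety inference and no main LLM call.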