# Structured Outputs with Foundation AI Models

The Foundation-Sec-1.1-8B-Instruct model can now produce structured outputs that adhere to complex [JSON schemas](https://json-schema.org/).

This notebook provides background, guidance, and examples for generating structured outputs with the Foundation-Sec-1.1 models using vLLM. See this [vLLM Blog Post](https://blog.vllm.ai/2025/01/14/struct-decode-intro.html) for a deeper dive into how structured decoding in vLLM works.

## Background
Integrating LLMs into production workflows often requires the model to respond in a strict, predefined format, known as a structured output. LLMs can be constrained to adhere to a user-defined schema at both the training and inference stages.

**Training**

The Foundation-Sec-1.1-8B-Instruct model has been fine-tuned on the [SchemaBench](https://github.com/thunlp/SchemaReinforcementLearning?tab=readme-ov-file#data) dataset to improve performance on JSON schema generation and API completion. This additional fine-tuning helps the model better understand complicated schemas and improves performance over the earlier Foundation-Sec-1.0-8B-Instruct model. For more details on the SchemaBench dataset, see the paper [on arXiv](https://arxiv.org/abs/2502.18878).

**Inference**

While the LLM has been fine-tuned to understand complex schemas, it is a probabilistic system and can still make mistakes in the structured response. To reduce mistakes, we recommend applying a technique called constrained decoding at inference time. Constrained decoding guides the output of a language model by modifying the probability distribution it produces so that only tokens valid for the desired schema can be selected. There are many popular open-source libraries for constrained decoding, such as [Guidance](https://github.com/guidance-ai/guidance) and [xGrammar](https://github.com/mlc-ai/xgrammar). This example notebook uses the constrained decoding packages supported by vLLM.
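To make the masking step concrete, here is a toy sketch in plain Python. The vocabulary, the logit values, and the `allowed` set are all made up for illustration; real constrained decoding libraries derive the allowed set from a grammar compiled from the schema and work on the model's full logit tensor at every step.

```python
import math

def mask_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set the logit of every token outside `allowed` to -inf."""
    return {tok: (score if tok in allowed else -math.inf) for tok, score in logits.items()}

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert logits to probabilities; tokens masked to -inf get probability 0."""
    finite = [s for s in logits.values() if s != -math.inf]
    if not finite:
        raise ValueError("No token is allowed at this step")
    peak = max(finite)  # subtract the max for numerical stability
    exps = {tok: (math.exp(s - peak) if s != -math.inf else 0.0) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical next-token logits at the very start of the response.
logits = {"Sure": 3.0, "{": 1.0, "[": 0.5, "The": 2.0}
# Assumption: the schema has "type": "object", so only an opening brace is valid here.
allowed = {"{"}

probs = softmax(mask_logits(logits, allowed))
next_token = max(probs, key=probs.get)
print(next_token, probs)
```

Without the mask, the most likely token would be `Sure`; with it, all probability mass moves to the only token the schema permits.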

## Initial Setup: Deploying Foundation-Sec with a vLLM server

Our recommended way of deploying a Foundation-Sec model with constrained decoding is to use [vLLM](https://docs.vllm.ai/en/v0.8.2/index.html), an open source library built for LLM inference. vLLM supports generating structured outputs using outlines, lm-format-enforcer, or xgrammar as backends for the guided decoding, and utilizes OpenAI's Completions and Chat API for passing the structured output format.

The examples here assume an inference API endpoint has been set up with vLLM inference using OpenAI's API. For instructions on how to deploy the model with vLLM to common inference platforms such as SageMaker or Baseten, see "deployment" examples under "adoptions".

Helpful resources:

[vLLM Structured Outputs Documentation](https://docs.vllm.ai/en/v0.8.2/features/structured_outputs.html)

[xgrammar](https://github.com/mlc-ai/xgrammar)

[llguidance](https://github.com/guidance-ai/llguidance)

## How to generate structured outputs

Once the Foundation-Sec-1.1-8B-Instruct model has been deployed with a constrained decoding wrapper, generating structured outputs is simple.


Start by creating a [JSON schema](https://json-schema.org/) for your desired output. Then, include the schema in your request in the response_format field as shown below:
```
{
    "messages": [...],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "response_schema",
            "schema": <your_schema_here>,
        },
    }
}
```

In [1]:
# Sample function to create a request with constrained decoding

import os
import requests

# This notebook assumes a Foundation-Sec-1.1 model has been deployed with an API endpoint

model_url = os.environ.get("MODEL_URL")
api_key = os.environ.get("API_KEY")

# os.environ.get returns None when a variable is unset, so reject both None and ""
assert model_url, "Expected MODEL_URL to be set"
assert api_key, "Expected API_KEY to be set"

def create_request(prompt: str, schema: dict) -> dict:
    return {
        "messages": [
            {
                "role": "user",
                "content": prompt,
            }
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "response_schema",
                "schema": schema,
            },
        }
    }

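def call_model(prompt: str, schema: dict) -> dict:
    # Send the request and return the decoded JSON body of the response
    headers = {
        "Authorization": f"Api-Key {api_key}"
    }
    response = requests.post(
        model_url,
        headers=headers,
        json=create_request(prompt, schema),
    )
    # Fail loudly on HTTP errors instead of trying to parse an error body
    response.raise_for_status()
    return response.json()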
    

## Example 1: Basic Schema Requirements

The following examples use [OpenAI's Structured Outputs API](https://platform.openai.com/docs/guides/structured-outputs).

You can constrain outputs by specifying one of the following types or keywords for individual properties. Additional supported constraints for individual properties are documented [here](https://platform.openai.com/docs/guides/structured-outputs?example=chain-of-thought#supported-properties).

* String
* Number
* Boolean
* Integer
* Object
* Array
* Enum
* anyOf

To use Structured Outputs with OpenAI's API, every field or function parameter must be listed in the schema's `required` array, which prevents the model from omitting a required key.

OpenAI's Structured Outputs API only supports generating the specified keys and values, so `additionalProperties` must be set to `false` (`False` in a Python dict) to opt into structured outputs.
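Both rules are easy to check mechanically before sending a request. The following is a minimal sketch; `strict_schema_problems` is a hypothetical helper written for this notebook, not part of vLLM or any OpenAI client, and it only walks `properties`, `items`, and `anyOf`.

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    schema_type = schema.get("type")
    is_object = (
        schema_type == "object"
        or (isinstance(schema_type, list) and "object" in schema_type)
        or "properties" in schema
    )
    if is_object:
        props = schema.get("properties", {})
        # Rule 1: every property must appear in "required"
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in 'required': {missing}")
        # Rule 2: additionalProperties must be explicitly False
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' must be False")
        for name, sub in props.items():
            if isinstance(sub, dict):
                problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(strict_schema_problems(items, f"{path}[]"))
    for i, option in enumerate(schema.get("anyOf", [])):
        if isinstance(option, dict):
            problems.extend(strict_schema_problems(option, f"{path}.anyOf[{i}]"))
    return problems

good_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
    "additionalProperties": False,
}
# Illustrative schema that breaks both rules
bad_schema = {"type": "object", "properties": {"title": {"type": "string"}}}

print(strict_schema_problems(good_schema))
for problem in strict_schema_problems(bad_schema):
    print(problem)
```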

In [2]:
schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
        },
        "headings": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
    },
    "required": ["title", "headings"],
    "additionalProperties": False
}
prompt = """
Please summarize the title and headings of the below document. Your response should be a JSON with the format {"title": <title>, "headings": <list_of_headings>}

Document:
# How to Create a Large Language Model

## Data Preparation
First you must prepare the data...

## Hyperparameter Selection
Hyperparameter selection is important for successful training...

## Model Training
Finally, it is time to train the model...
"""

response = call_model(prompt, schema)
print(response["choices"][0]["message"]["content"])

{"title": "How to Create a Large Language Model", "headings": ["Data Preparation", "Hyperparameter Selection", "Model Training"]}
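The printed content is a JSON string, so downstream code usually needs to decode it first. Below is a minimal sketch of that step; `parse_content` is a hypothetical helper, and `sample_response` is a hand-written stand-in that mirrors only the keys accessed above, not a full server response.

```python
import json

def parse_content(response: dict) -> dict:
    """Extract the first choice's message content and decode it as JSON."""
    try:
        content = response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {response!r}") from exc
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed text,
    # which can still happen if generation is cut off by a token limit.
    return json.loads(content)

# Stand-in for the dict returned by call_model()
sample_response = {
    "choices": [
        {
            "message": {
                "content": '{"title": "How to Create a Large Language Model", '
                           '"headings": ["Data Preparation", "Hyperparameter Selection", "Model Training"]}'
            }
        }
    ]
}

parsed = parse_content(sample_response)
print(parsed["title"])
print(len(parsed["headings"]))
```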


## Example 2: Enums

OpenAI's API supports [Enumerated Values](https://json-schema.org/understanding-json-schema/reference/enum) (or enum) keywords to restrict the valid values of a property.

A schema may have up to 1000 enum values across all enum properties.

For a single enum property with string values, the total string length of all enum values cannot exceed 15,000 characters when there are more than 250 enum values.
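For large, generated schemas it can help to check these enum limits before sending a request. The sketch below encodes the two limits exactly as stated above; `enum_limit_problems` is a hypothetical helper, and whether a given vLLM backend enforces the same limits is not something this notebook verifies.

```python
def enum_limit_problems(schema: dict) -> list[str]:
    """Check the enum limits quoted above; return a list of violations."""
    problems = []
    total_values = 0

    def walk(node, path: str) -> None:
        nonlocal total_values
        if isinstance(node, dict):
            values = node.get("enum")
            if isinstance(values, list):
                total_values += len(values)
                string_length = sum(len(v) for v in values if isinstance(v, str))
                # The length limit only applies to enums with more than 250 values
                if len(values) > 250 and string_length > 15_000:
                    problems.append(
                        f"{path}: {len(values)} enum values total {string_length} characters"
                    )
            for key, child in node.items():
                walk(child, f"{path}.{key}")
        elif isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, f"{path}[{i}]")

    walk(schema, "$")
    if total_values > 1000:
        problems.append(f"$: {total_values} enum values across the schema exceeds 1000")
    return problems

example_schema = {
    "type": "object",
    "properties": {
        "security_risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["security_risk_level"],
    "additionalProperties": False,
}
print(enum_limit_problems(example_schema))
```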

In [3]:
schema = {
    "type": "object",
    "properties": {
        "security_risk_level": {
            "type": "string",
            "enum": ["low", "medium", "high"]
        },
        "security_risk_type": {
            "type": "string",
            "enum": ["denial_of_service", "malware", "prompt_injection", "none"]
        },
    },
    "required": ["security_risk_level", "security_risk_type"],
    "additionalProperties": False
}
prompt = """
Please analyze the below code snippet and let me know the security risk level and type. Your response should be a JSON with the format {"security_risk_level": <risk_level>, "security_risk_type": <risk_type>}

Code Snippet:
```
import socket
import threading

class HarmlessApplication(object):
    def __init__(self, target="192.168.0.1", port=80, ip_mask="182.21.20.32"):
        self.target = target
        self.port = port
        self.ip_mask = ip_mask
        

    def run(self):
        for i in range(2000):
            thread = threading.Thread(target=self.attack).start()


    def attack(self):
        while True:
            print(f"Pinging {self.target}...")
            connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            connection.connect((self.target, self.port))
            connection.sendto((f"GET /{self.target} HTTP/1.1\r\n").encode("ascii"), (self.target, self.port))
            connection.sendto((f"Host: {self.ip_mask}\r\n\r\n").encode("ascii"), (self.target, self.port))
            connection.close()


if __name__ == "__main__":
    HarmlessApplication().run()
```
"""

response = call_model(prompt, schema)
print(response["choices"][0]["message"]["content"])

{"security_risk_level": "high", "security_risk_type": "denial_of_service" }


## Example 3: Regex

[Regular Expressions](https://json-schema.org/understanding-json-schema/reference/regular_expressions) are also useful for explicitly constraining the format of an output.

For example, to require a field to only contain a valid MITRE ATT&CK technique ID, you could specify the following regex pattern for an output property:

`"^T\\d{4}(\\.\\d{3})?$"`
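Before putting a pattern into a schema, it is worth testing it locally. The sketch below checks the pattern above with Python's `re` module; the candidate strings and the `is_technique_id` helper are made up for illustration. Note that JSON Schema patterns follow the ECMA-262 regex dialect, so Python's engine is only an approximation, though the two agree for a simple pattern like this one.

```python
import re

# Same pattern as above; in a Python string literal each backslash is doubled.
TECHNIQUE_ID_PATTERN = "^T\\d{4}(\\.\\d{3})?$"
# re.ASCII restricts \d to the digits 0-9
TECHNIQUE_ID = re.compile(TECHNIQUE_ID_PATTERN, re.ASCII)

def is_technique_id(value: str) -> bool:
    """True if the whole string is a technique ID such as T1059 or T1059.001."""
    return TECHNIQUE_ID.fullmatch(value) is not None

# How the pattern would appear on a schema property
technique_property = {"type": "string", "pattern": TECHNIQUE_ID_PATTERN}

candidates = ["T1059", "T1059.001", "T105", "T1059.01", "t1059"]
valid = [c for c in candidates if is_technique_id(c)]
print(valid)
```

Only `T1059` and `T1059.001` pass: the others have too few digits, a two-digit sub-technique, or a lowercase prefix.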

In [4]:
schema = {
    "type": "object",
    "properties": {
        "phone_numbers": {
            "type": "array",
            "items": {
                "type": "string",
                "pattern": "^(\\([0-9]{3}\\)) [0-9]{3}-[0-9]{4}$"
            },
        },
    },
    "required": ["phone_numbers"],
    "additionalProperties": False
}
prompt = """
Please extract all phone numbers from the below document and return a JSON with the format {"phone_numbers": [<phone_number_1>, <phone_number_2>, ...]}

My telephone directory:
John Smith, 123 Fake Street, 415-867-2323
Jane Doe, 321 Real Street, 510-236-6767
Joe Schmo, 456 Fake Ave, 510-789-1234
"""
response = call_model(prompt, schema)
print(response["choices"][0]["message"]["content"])

{
  "phone_numbers": [
    "(415) 867-2323",
    "(510) 236-6767",
    "(510) 789-1234"
  ]
}


## Example 4: Complex Schemas

OpenAI's Structured Outputs API supports complex schemas, with some limitations:

* A schema may have up to 5000 object properties total, with up to 10 levels of nesting.
* In a schema, total string length of all property names, definition names, enum values, and const values cannot exceed 120,000 characters.
* A schema may have up to 1000 enum values across all enum properties.
* For a single enum property with string values, the total string length of all enum values cannot exceed 15,000 characters when there are more than 250 enum values.
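The first limit can be estimated with a short recursive walk over the schema. This is a rough sketch under stated assumptions: `schema_stats` is a hypothetical helper, it counts one nesting level per `properties` block, and it does not follow `$ref` or `$defs`. How the API itself counts nesting may differ.

```python
def schema_stats(schema: dict) -> dict:
    """Count object properties and measure how deeply objects are nested."""
    property_count = 0
    max_depth = 0

    def walk(node: dict, depth: int) -> None:
        nonlocal property_count, max_depth
        props = node.get("properties")
        if isinstance(props, dict):
            depth += 1  # assumption: each "properties" block is one nesting level
            max_depth = max(max_depth, depth)
            property_count += len(props)
            for child in props.values():
                if isinstance(child, dict):
                    walk(child, depth)
        items = node.get("items")
        if isinstance(items, dict):
            walk(items, depth)
        for option in node.get("anyOf", []):
            if isinstance(option, dict):
                walk(option, depth)

    walk(schema, 0)
    return {"properties": property_count, "nesting": max_depth}

# Illustrative schema: two top-level properties, each containing one nested property
nested_schema = {
    "type": "object",
    "properties": {
        "owner": {"type": "object", "properties": {"name": {"type": "string"}}},
        "assets": {
            "type": "array",
            "items": {"type": "object", "properties": {"id": {"type": "string"}}},
        },
    },
}

stats = schema_stats(nested_schema)
print(stats)
assert stats["properties"] <= 5000 and stats["nesting"] <= 10
```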

In [5]:
prompt = """
Analyze the architecture description and produce ONE JSON document that fully
conforms to the Threagile JSON schema supplied below. The output will be fed
directly to Threagile → risks.json, so any deviation or omission may hide
critical threats. Output **only** the JSON. No code-fences, prose, or comments.

───────────────────────────────────────────────────────────────────────────────
<SCHEMA>
{
  "title": "Threagile",
  "description": "Agile Threat Modeling",
  "type": "object",
  "properties": {
    "threagile_version": {
      "description": "ALWAYS the string literal '1.0.0'",
      "type": "string"
    },
    "title": {
      "description": "Title of the project",
      "type": "string"
    },
    "date": {
      "description": "Date of the project",
      "type": [
        "string",
        "null"
      ],
      "format": "date"
    },
    "security_requirements": {
      "description": "Security requirements are declarative best practices, or attributes of the design that are critical to meeting the stated or implied security objectives. These are concise, direct, and measurable. Security requirements are NOT questions! For these requirements, assume the role of a security architect and try to suggest one or two attributes that will make the system more robust against attack. Do not suggest requirements that are already documented and met by the described design.",
      "type": "array",
      "items": {
          "type": "string"
      }
    },
    "tags_available": {
      "description": "Tags available for use in technical and data assets. All values from this enum should be copied into the output `tags_available` array. Data and technical assets MUST only use tags that are both relevant and present in this array.",
      "type": [
        "array",
        "null"
      ],
      "items": {
        "enum": [
            "aws",
            "azure",
            "docker",
            "gcp",
            "git",
            "kubernetes",
            "nexus",
            "ocp",
            "openshift",
            "tomcat",
            "private-key",
            "private-key-tls",
            "token",
            "password",
            "linux",
            "symmetric-key",
            "medusa"
        ]
      },
    }
  },
  "required": [
      "threagile_version",
      "title",
      "date",
      "security_requirements",
      "tags_available",
  ],
  "additionalProperties": False,
}
</SCHEMA>

<ARCHITECTURE_DESCRIPTION>
Response generated using Public internet data.

# Architecture Design Document: Cloud-Hosted Photo Storage Application

Date: 2025-09-10

Table of Contents
	1.	Introduction
	2.	System Overview
	3.	Architecture Design
		•	3-Tier Architecture Overview
		•	Component Details
	4.	Conclusion

1. Introduction

This document outlines the architecture design for a cloud-hosted, multi-tenant photo storage application. The application will allow users to upload, store, and retrieve photos using a web-based UI and public-facing API. The solution is designed to be reliable, secure, scalable, and performant, while ensuring proper isolation for tenant data.

2. System Overview

The system consists of three main layers:
	1.	Presentation Layer: Web-based user interface (UI) and API for external integrations.
	2.	Application Layer: Backend logic for handling photo uploads, metadata management, and user authentication.
	3.	Data Layer: Storage for photos and associated metadata, including a relational database and an object storage system.


The application is cloud-hosted and leverages managed services to ensure reliability, scalability, and cost-efficiency.

3. Architecture Design
3-Tier Architecture Overview
	•	Tier 1: Presentation Layer
		•	Web-based user interface for users to upload, view, and manage photos.
		•	Public-facing REST API for external integrations (e.g., mobile apps or 3rd-party services).
	•	Tier 2: Application Layer
		•	Handles business logic, user authentication, photo processing (e.g., compression, resizing), and API requests.
		•	Responsible for routing requests to the appropriate services in the data layer.
	•	Tier 3: Data Layer
		•	Object storage for photo files.
		•	Relational database for user data, metadata, and application configurations.

Key Components
	•	Nginx: Acts as a reverse proxy and load balancer. It routes user requests to backend services and serves static assets efficiently.
	•	Backend Application: Implements API endpoints, business logic, and photo processing workflows.
	•	Relational Database: Stores user accounts, metadata (e.g., photo details, tenant information), and application data.
	•	Object Storage: Stores uploaded photos and provides scalable storage for large files.
	•	Message Queue: Handles asynchronous tasks such as photo processing.

4. Conclusion

This architecture provides a robust foundation for the cloud-hosted photo storage application. By using Nginx as the reverse proxy and load balancer, the system ensures efficient request handling and routing. The design ensures multi-tenancy, reliability, scalability, and security while maintaining high performance. Using managed cloud services minimizes operational overhead and allows the application to scale seamlessly as user demand grows.
</ARCHITECTURE_DESCRIPTION>
───────────────────────────────────────────────────────────────────────────────

REMEMBER: Perfection is mandatory—this JSON drives automated threat modelling.
"""
schema = {
  "title": "Threagile",
  "description": "Agile Threat Modeling",
  "type": "object",
  "properties": {
    "threagile_version": {
      "description": "ALWAYS the string literal '1.0.0'",
      "type": "string"
    },
    "title": {
      "description": "Title of the project",
      "type": "string"
    },
    "date": {
      "description": "Date of the project",
      "type": [
        "string",
        "null"
      ],
      "format": "date"
    },
    "security_requirements": {
      "description": "Security requirements are declarative best practices, or attributes of the design that are critical to meeting the stated or implied security objectives. These are concise, direct, and measurable. Security requirements are NOT questions! For these requirements, assume the role of a security architect and try to suggest one or two attributes that will make the system more robust against attack. Do not suggest requirements that are already documented and met by the described design.",
      "type": "array",
      "items": {
          "type": "string"
      }
    },
    "tags_available": {
      "description": "Tags available for use in technical and data assets. All values from this enum should be copied into the output `tags_available` array. Data and technical assets MUST only use tags that are both relevant and present in this array.",
      "type": [
        "array",
        "null"
      ],
      "items": {
        "enum": [
            "aws",
            "azure",
            "docker",
            "gcp",
            "git",
            "kubernetes",
            "nexus",
            "ocp",
            "openshift",
            "tomcat",
            "private-key",
            "private-key-tls",
            "token",
            "password",
            "linux",
            "symmetric-key",
            "medusa"
        ]
      },
    }
  },
  "required": [
      "threagile_version",
      "title",
      "date",
      "security_requirements",
      "tags_available",
  ],
  "additionalProperties": False,
}

response = call_model(prompt, schema)
print(response["choices"][0]["message"]["content"])

{
  "threagile_version": "1.0.0",
  "title": "Cloud-Hosted Photo Storage Application",
  "date": "2025-09-10",
  "security_requirements": [
    "Implement encryption at rest and in transit for all user data and photos.",
    "Employ strict access controls and authentication mechanisms to ensure only authorized entities can access data."
  ],
  "tags_available": [
    "aws",
    "azure",
    "docker",
    "gcp",
    "git",
    "kubernetes",
    "nexus",
    "ocp",
    "openshift",
    "tomcat",
    "private-key",
    "private-key-tls",
    "token",
    "password",
    "linux",
    "symmetric-key",
    "medusa"
  ]
}
