Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify how to model binary data in 3.1 #3727

Merged
merged 3 commits into from Apr 28, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
76 changes: 60 additions & 16 deletions versions/3.1.1.md
Expand Up @@ -159,6 +159,40 @@ The formats defined by the OAS are:
`number` | `double` | |
`string` | `password` | A hint to obscure the value.

#### <a name="binaryData"></a>Working With Binary Data

The OAS can describe either _raw_ or _encoded_ binary data.

* **raw binary** is used where unencoded binary data is allowed, such as when sending a binary payload as the entire HTTP message body, or as part of a `multipart/*` payload that allows binary parts
* **encoded binary** is used where binary data is embedded in a text-only format such as `application/json` or `application/x-www-form-urlencoded` (either as a message body or in the URL query string).

In the following table showing how to use Schema Object keywords for binary data, we use `image/png` as an example binary media type. Any binary media type, including `application/octet-stream`, is sufficient to indicate binary content.

Keyword | Raw | Encoded | Comments
------- | --- | ------- | --------
`type` | _omit_ | `string` | raw binary is [outside of `type`](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.3)
`contentMediaType` | `image/png` | `image/png` | can sometimes be omitted if redundant (see below)
`contentEncoding` | _omit_ | `base64`&nbsp;or&nbsp;`base64url` | other encodings are [allowed](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-8.3)

Note that the encoding indicated by `contentEncoding`, which inflates the size of data in order to represent it as 7-bit ASCII text, is unrelated to HTTP's `Content-Encoding` header, which indicates whether and how a message body has been compressed and is applied after all content serialization described in this section has occurred. Since HTTP allows unencoded binary message bodies, there is no standardized HTTP header for indicating base64 or similar encoding of an entire message body.

Using a `contentEncoding` of `base64url` ensures that URL encoding (as required in the query string and in message bodies of type `application/x-www-form-urlencoded`) does not need to further encode any part of the already-encoded binary data.

The `contentMediaType` keyword is redundant if the media type is already set:

* as the key for a [`MediaType Object`](#mediaTypeObject)
* in the `contentType` field of an [`Encoding Object`](#encodingObject)

If the Schema Object will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.

The following table shows how to migrate from OAS 3.0 binary data descriptions, continuing to use `image/png` as the example binary media type:

OAS < 3.1 | OAS 3.1 | Comments
--------- | ------- | --------
`type: string`<br />`format: binary` | `contentMediaType: image/png` | if redundant, can be omitted, often resulting in an empty Schema Object
`type: string`<br />`format: byte` | `type: string`<br />`contentMediaType: image/png`<br />`contentEncoding: base64` | note that `base64url` can be used to avoid re-encoding the base64 string to be URL-safe


### <a name="richText"></a>Rich Text Formatting
Throughout the specification `description` fields are noted as supporting CommonMark markdown formatting.
Where OpenAPI tooling renders rich text it MUST support, at a minimum, markdown syntax as described by [CommonMark 0.27](https://spec.commonmark.org/0.27/). Tooling MAY choose to ignore some CommonMark features to address security concerns.
Expand Down Expand Up @@ -1447,9 +1481,7 @@ application/json:

In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type.

In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. JSON Schema offers a `contentEncoding` keyword, which may be used to specify the `Content-Encoding` for the schema. The `contentEncoding` keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7). The encoding specified by the `contentEncoding` keyword is independent of an encoding specified by the `Content-Type` header in the request or response or metadata of a multipart body -- when both are present, the encoding specified in the `contentEncoding` is applied first and then the encoding specified in the `Content-Type` header.

JSON Schema also offers a `contentMediaType` keyword. However, when the media type is already specified by the Media Type Object's key, or by the `contentType` field of an [Encoding Object](#encodingObject), the `contentMediaType` keyword SHALL be ignored if present.
In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. Instead, JSON Schema's `contentEncoding` and `contentMediaType` keywords are used. See [Working With Binary Data](#binaryData) for how to model various scenarios with these keywords, and how to migrate from the previous `format` usage.

Examples:

Expand All @@ -1467,19 +1499,6 @@ content:
application/octet-stream: {}
```

Binary content transferred with base64 encoding:

```yaml
content:
image/png:
schema:
type: string
contentMediaType: image/png
contentEncoding: base64
```

Note that the `Content-Type` remains `image/png`, describing the semantics of the payload. The JSON Schema `type` and `contentEncoding` fields explain that the payload is transferred as text. The JSON Schema `contentMediaType` is technically redundant, but can be used by JSON Schema tools that may not be aware of the OpenAPI context.

These examples apply to either input payloads of file uploads or response payloads.

A `requestBody` for submitting a file in a `POST` operation may look like the following example:
Expand Down Expand Up @@ -1556,6 +1575,8 @@ When passing in `multipart` types, boundaries MAY be used to separate sections o

Per the JSON Schema specification, `contentMediaType` without `contentEncoding` present is treated as if `contentEncoding: identity` were present. While useful for embedding text documents such as `text/html` into JSON strings, it is not useful for a `multipart/form-data` part, as it just causes the document to be treated as `text/plain` instead of its actual media type. Use the Encoding Object without `contentMediaType` if no `contentEncoding` is required.

Note that only `multipart/*` media types with named parts can be described as shown here. Note also that while `multipart/form-data` originally defined a per-part `Content-Transfer-Encoding` header that could indicate base64 encoding (`contentEncoding: base64`), it has been deprecated for use with HTTP as of [RFC7578](https://www.rfc-editor.org/rfc/rfc7578#section-4.7).

Examples:

```yaml
Expand Down Expand Up @@ -1609,6 +1630,8 @@ This object MAY be extended with [Specification Extensions](#specificationExtens

##### Encoding Object Example

`multipart/form-data` allows for binary parts:

```yaml
requestBody:
content:
Expand Down Expand Up @@ -1644,6 +1667,27 @@ requestBody:
type: integer
```

`application/x-www-form-urlencoded` is a text format, which requires base64-encoding any binary data:

```YAML
requestBody:
content:
application/x-www-form-urlencoded:
schema:
type: object
properties:
name:
type: string
icon:
# default for type string is text/plain, need to declare
# the appropriate contentType in the Encoding Object
type: string
contentEncoding: base64url
encoding:
icon:
contentType: image/png, image/jpeg
```

#### <a name="responsesObject"></a>Responses Object

A container for the expected responses of an operation.
Expand Down