Commit 5a2536e

Move over data formats (#76)
1 parent db063a8 commit 5a2536e

1 file changed

src/7-new-content/4-data-formats.adoc

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
= Formatting API data

Request and response bodies are sent with a `Content-Type` header, which tells
tools how the data is formatted so it can be converted into something
meaningful in whichever programming language is being used.

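To make that concrete, here is a minimal Python sketch of a server choosing a parser based on the `Content-Type` header. The `parse_body` function is a hypothetical helper written for this illustration, not part of any framework, and it only knows about two of the formats covered in this chapter.

```python
import json
from urllib.parse import parse_qs


def parse_body(content_type: str, body: bytes):
    """Decode a request body according to its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/json":
        return json.loads(body)
    if media_type == "application/x-www-form-urlencoded":
        return parse_qs(body.decode("utf-8"))
    raise ValueError(f"unsupported content type: {media_type}")


print(parse_body("application/json; charset=utf-8", b'{"id": 1}'))  # {'id': 1}
print(parse_body("application/x-www-form-urlencoded", b"id=1"))     # {'id': ['1']}
```

The same bytes on the wire mean different things depending on the header, which is why sending the right `Content-Type` matters.
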
== JSON: The Modern Standard

JSON has become the de facto standard for API requests because it:

* Supports native data types (numbers, booleans, null)
* Allows nested structures
* Is human-readable
* Has excellent tooling support

An example of a more complex JSON request body:

{lang=json}
~~~~~~~~
{
  "place": {
    "name": "Central Park",
    "location": {
      "lat": 40.785091,
      "lon": -73.968285
    },
    "features": ["park", "landmark"],
    "isAccessible": true,
    "capacity": null
  }
}
~~~~~~~~

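As a rough illustration of what "native data types" buys you, the sketch below parses that same document with Python's standard `json` module. The variable names are only for this example.

```python
import json

raw = """
{
  "place": {
    "name": "Central Park",
    "location": {"lat": 40.785091, "lon": -73.968285},
    "features": ["park", "landmark"],
    "isAccessible": true,
    "capacity": null
  }
}
"""

place = json.loads(raw)["place"]

# Each JSON type maps onto a native Python type with no extra conversion.
print(type(place["location"]["lat"]).__name__)  # float
print(type(place["features"]).__name__)         # list
print(place["isAccessible"] is True)            # True
print(place["capacity"] is None)                # True
```
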
== XML: Ye Oldé Standard

Any modern API you interact with will support JSON. Occasionally it will
support XML as well.

XML is a relic of the early internet. It dominated web APIs in the 2000s with
standards like SOAP and XML-RPC, but was largely displaced by JSON in the 2010s
due to JSON's simplicity and natural fit with JavaScript. Today, XML persists
mainly in legacy systems, enterprise SOAP services, and specific domains like
publishing (DocBook), feed syndication (RSS/Atom), and configuration files
(Maven, Android manifests).

JSON is a lot easier to work with than XML, and it is a lot easier to read. It is
also more compact, which is important when you are sending data over the wire.

Here is a bunch of different data types in JSON, followed by roughly the same
data in XML.

{lang=json}
~~~~~~~~
{
  "place": {
    "id": 1,
    "name": "This is a bunch of text.",
    "is_true": false,
    "maybe": null,
    "empty_string": ""
  }
}
~~~~~~~~

{lang=xml}
~~~~~~~~
<places>
  <place>
    <id>1</id>
    <name>This is a bunch of text.</name>
    <is_true>0</is_true>
    <maybe />
    <empty_string />
  </place>
</places>
~~~~~~~~

Basically, in plain XML _everything_ is a string, so integers, booleans, and
nulls are easily confused. Both `maybe` and `empty_string` end up with the same
value, because without extras like XML Schema's `xsi:nil` there is no way to
denote a `null` value either. Gross.

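The ambiguity is easy to see in code. This sketch parses the XML example with Python's standard `xml.etree.ElementTree` module and collects each child element's text. The `values` dictionary is just a convenience for this illustration.

```python
import xml.etree.ElementTree as ET

raw = """
<places>
  <place>
    <id>1</id>
    <name>This is a bunch of text.</name>
    <is_true>0</is_true>
    <maybe />
    <empty_string />
  </place>
</places>
""".strip()

place = ET.fromstring(raw).find("place")
values = {child.tag: child.text for child in place}

print(repr(values["id"]))        # '1'  (a string, not an integer)
print(repr(values["is_true"]))   # '0'  (a string, not a boolean)
# ElementTree reports both empty elements identically.
print(values["maybe"] == values["empty_string"])  # True
```

Any typing has to be layered on by the application, for example by calling `int(values["id"])` yourself.
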
== Form Data: Legacy Format

Form Data uses the `application/x-www-form-urlencoded` MIME type, and is helpful
when accepting web forms from a browser using the `<form>` HTML tag. This was
very popular decades ago, but now that single-page applications (SPAs) and
mobile apps speak JSON natively, most people just don't bother with it anymore.

It's not just old: it's cumbersome to work with and, like XML, it lacks data
types, only with even more awkward syntax.

Everything is a string. To handle a boolean, a client has to send `1` or `0`,
which will be read as `"1"` or `"0"`. You could send `property=true`, but that is
a literal `"true"` string on the server.

{lang=http}
~~~~~~~~
POST /checkins HTTP/1.1
Host: api.example.org
Content-Type: application/x-www-form-urlencoded

place_id=1&message=This%20is%20a%20bunch%20of%20text.&with_friends[]=1&with_friends[]=2&with_friends[]=3
~~~~~~~~

This is a bit of a mess: the message needs to be "URL encoded", and
`with_friends` is an array expressed with an awkward `[]` suffix, a convention
that many frameworks understand but which is not part of the format itself. On
top of that, it's not clear what the data types are. It is also a bit of a pain
to work with on the server-side, as you have to parse the string, split it up,
and then convert each value to the correct data type.

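A minimal sketch of that server-side work, using Python's standard `urllib.parse.parse_qs` on the body from the request above. The manual conversions at the end are the part JSON would have handled for you. Note that `parse_qs` does not interpret the `[]` suffix; it is simply part of the key.

```python
from urllib.parse import parse_qs

body = (
    "place_id=1&message=This%20is%20a%20bunch%20of%20text."
    "&with_friends[]=1&with_friends[]=2&with_friends[]=3"
)

# Every value comes back as a list of strings, keyed by the raw field name.
fields = parse_qs(body)
print(fields["place_id"])        # ['1']
print(fields["with_friends[]"])  # ['1', '2', '3']

# The application has to know, and apply, the intended types by hand.
place_id = int(fields["place_id"][0])
message = fields["message"][0]
with_friends = [int(value) for value in fields["with_friends[]"]]
```
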
For comparison, the same request in JSON is a lot easier to create and work with.

{lang=http}
~~~~~~~~
POST /checkins HTTP/1.1
Host: api.example.org
Content-Type: application/json

{
  "place_id": 1,
  "message": "This is a bunch of text.",
  "with_friends": [1, 2, 3]
}
~~~~~~~~

This is a JSON object, and it is easy to see what is going on. The `place_id` is
an integer, the `message` is a string, and `with_friends` is an array of
integers.

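The difference shows up on the client side too. The sketch below builds both bodies from the same Python dictionary using only the standard library. The `form_fields` mapping is an assumption about how a client might adapt the data for form encoding, not a prescribed approach.

```python
import json
from urllib.parse import parse_qs, urlencode

payload = {
    "place_id": 1,
    "message": "This is a bunch of text.",
    "with_friends": [1, 2, 3],
}

# Form encoding: rename the list key by hand and let doseq=True repeat it.
form_fields = {
    "place_id": payload["place_id"],
    "message": payload["message"],
    "with_friends[]": payload["with_friends"],
}
form_body = urlencode(form_fields, doseq=True)

# JSON encoding: one call, and the types survive the round trip.
json_body = json.dumps(payload)

print(json.loads(json_body) == payload)       # True
print(parse_qs(form_body)["with_friends[]"])  # ['1', '2', '3']
```
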
== Multipart Form Data: An Occasionally Helpful Nightmare

Multipart forms are a way to send data in multiple parts as a single HTTP
request, often used in REST APIs for handling mixed types of data, such as JSON
and binary files (e.g., images or documents). Unlike a standard form submission,
where data is encoded as `application/x-www-form-urlencoded`, multipart forms
use the `multipart/form-data` encoding, which allows both text and file content
in the same request.

This is particularly useful for endpoints that need to process metadata (e.g.,
JSON) alongside uploaded files. Each part of the form is separated by a boundary
string and includes headers that describe the content type and disposition of
the part.

{lang=http}
~~~~~~~~
POST /checkins HTTP/1.1
Host: api.example.org
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="metadata"
Content-Type: application/json

{
  "place_id": 1,
  "message": "This is a bunch of text.",
  "with_friends": [1, 2, 3]
}
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="file"; filename="example.jpg"
Content-Type: image/jpeg

[Binary data of the image file]
------WebKitFormBoundary7MA4YWxkTrZu0gW--
~~~~~~~~

This is either confusing or brilliant depending on how you're looking at it, but it's generally a massive pain to work with.

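To give a feel for the moving parts, here is a rough sketch that splits a body like the one above into its parts using Python's standard `email` package, which understands MIME multipart. This is only an illustration: the image bytes are an ASCII placeholder, and a real server would normally rely on its web framework's multipart parser rather than doing this by hand.

```python
import json
from email import message_from_bytes

boundary = "----WebKitFormBoundary7MA4YWxkTrZu0gW"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="metadata"\r\n'
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"place_id": 1, "message": "This is a bunch of text.", "with_friends": [1, 2, 3]}\r\n'
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="file"; filename="example.jpg"\r\n'
    "Content-Type: image/jpeg\r\n"
    "\r\n"
    "fake-image-bytes\r\n"  # placeholder standing in for real binary data
    f"--{boundary}--\r\n"
).encode("ascii")

# Prepend the request's Content-Type header so the parser knows the boundary.
header = f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
message = message_from_bytes(header.encode("ascii") + body)

# Index each part by the "name" parameter of its Content-Disposition header.
parts = {}
for part in message.get_payload():
    name = part.get_param("name", header="content-disposition")
    parts[name] = part

metadata = json.loads(parts["metadata"].get_payload(decode=True))
upload = parts["file"]

print(metadata["with_friends"])   # [1, 2, 3]
print(upload.get_filename())      # example.jpg
print(upload.get_content_type())  # image/jpeg
```

Even in this small sketch, each part carries its own headers and its own format, so every part can fail in its own way.
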
== Best Practices

=== 1. Use JSON unless you absolutely can't

Work out which content type (or types) you actually need, and _stick to that_.
95% of the time, that's JSON.

Some people want to add CSV or HTML "just in case", and others want to add all
the fun new formats like BSON or MessagePack because they're "quicker" (without
doing the basic optimizations on their code or database that would likely yield
more meaningful performance gains). That might be a bit of fun, but it all adds
a maintenance burden and expects too much of your clients.

Start with JSON and wait for a big client to ask for a specific format, then
weigh it up against the cost of supporting it.

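One way to "stick to that" is to reject everything else explicitly. The sketch below returns HTTP `415 Unsupported Media Type` for any request body that is not JSON. The `handle_request` function and its `(status, payload)` return shape are hypothetical, chosen only to keep the example independent of any framework.

```python
import json

# The only request format this hypothetical API accepts.
SUPPORTED_MEDIA_TYPES = {"application/json"}


def handle_request(content_type: str, body: bytes) -> tuple:
    """Return a (status_code, response_payload) pair for a request body."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in SUPPORTED_MEDIA_TYPES:
        return 415, {"error": f"Unsupported media type: {media_type}"}

    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "Request body is not valid JSON"}

    return 200, {"received": payload}


print(handle_request("application/json", b'{"place_id": 1}')[0])  # 200
print(handle_request("application/xml", b"<place />")[0])         # 415
print(handle_request("application/json", b"{oops")[0])            # 400
```

Adding a second format later means adding one entry and one parser, which makes the cost of supporting it easy to see.
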
=== 2. Avoid Multipart forms

There are a few reasons to avoid this. It's hard to document, partial errors are
weird to handle, and it generally confuses beginners trying to work with an API.
An SDK can hide some of the complexity, but that won't solve the awkward partial
failures that pop up when you create something from the first "part", then the
second or third part fails, leaving you to roll back database transactions after
emails have already gone out.

Designing an API for the least experienced user is not necessarily the goal, but
making things unnecessarily complex isn't the plan either, so stick with "one
endpoint does one thing" and we can learn more about how to handle file uploads
and similar later.
