<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC4648 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml">
<!ENTITY RFC2119 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<?xml-stylesheet type='text/xsl' href='RFC2629.xslt' ?>
<?rfc compact="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<!-- Expand crefs and put them inline -->
<?rfc comments='yes' ?>
<?rfc inline='yes' ?>
<rfc docName="draft-faltstrom-base45-04" ipr="trust200902" category="std">
<front>
<title abbrev="Base45">
The Base45 Data Encoding
</title>
<author fullname="Patrik Faltstrom" initials="P." surname="Faltstrom">
<organization abbrev="Netnod">Netnod</organization>
<address>
<email>paf@netnod.se</email>
</address>
</author>
<author fullname="Fredrik Ljunggren" initials="F." surname="Ljunggren">
<organization abbrev="Kirei">Kirei</organization>
<address>
<email>fredrik@kirei.se</email>
</address>
</author>
<author fullname="Dirk-Willem van Gulik" initials="D." surname="van Gulik">
<organization abbrev="Webweaving">Webweaving</organization>
<address>
<email>dirkx@webweaving.org</email>
</address>
</author>
<date month="April" year="2021" day="03"/>
<area>Operations</area>
<keyword>BASE45</keyword>
<abstract>
<t>
This document describes the base 45 encoding scheme, which
builds upon the base 64, base 32 and base 16 encoding schemes.
</t>
</abstract>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>
QR and Aztec codes need a different encoding scheme than the
already established base 64, base 32 and base 16 encoding
schemes described in <xref
target="RFC4648">RFC 4648</xref>. Base 45 differs from those
schemes in its key table and in that padding with '=' is not
required.
</t>
</section>
<section title="Conventions Used in This Document">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in <xref target="RFC2119">RFC 2119</xref>.
</t>
</section>
<section title="Interpretation of Encoded Data">
<t>
Encoded data is to be interpreted as described in <xref
target="RFC4648">RFC 4648</xref> with the exception that a
different alphabet is selected.
</t>
</section>
<section title="The Base 45 Encoding">
<t>
A 45-character subset of US-ASCII is used: the 45 characters
that can be used in a QR or Aztec code. Whereas Base 64
encodes 3 bytes in 4 characters, Base 45 encodes 2 bytes
in 3 characters.
</t>
<t>
Each pair of bytes [A, B] is turned into three values [C, D, E]
such that (A*256) + B = C + (D*45) + (E*45*45), where C, D and E
are each in the range 0 to 44. The values C, D and E are then
looked up in Table 1 to produce a three-character string.
Decoding performs the reverse operations.
</t>
<t>
If the number of octets is not divisible by two, the last
remaining byte is represented by two characters: [A] is turned
into [C, D] where A = C + (D*45).
</t>
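<t>
The arithmetic above can be sketched in Python as follows. This is
a minimal, non-normative illustration: the names ALPHABET and
b45_encode are chosen for this example and are not defined by this
document, and the alphabet string is assumed to list the characters
of Table 1 in value order.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; the names below are assumptions of this example.
# The 45 characters of Table 1, indexed by their value 0..44.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45_encode(data: bytes) -> str:
    """Encode octets two at a time into three Table 1 characters."""
    out = []
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]   # (A*256) + B
        c, rest = n % 45, n // 45         # C is the least significant value
        d, e = rest % 45, rest // 45
        out.append(ALPHABET[c] + ALPHABET[d] + ALPHABET[e])
    if len(data) % 2 == 1:                # one remaining octet [A]
        a = data[-1]
        out.append(ALPHABET[a % 45] + ALPHABET[a // 45])
    return "".join(out)
```
]]></artwork></figure>
</t>
<t>
With this sketch, b45_encode(b"AB") returns "BB8".
</t>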
<section title="When to use Base45">
<t>
If binary data is to be stored in a QR code, one possible way
is to use the Alphanumeric encoding, which uses 11 bits for 2
characters as defined in Section 7.3.4 of <xref
target="ISO18004">ISO/IEC 18004:2015</xref>. The
ECI mode indicator for this encoding is 0010.
</t>
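<t>
The cost of this choice can be estimated with a small calculation.
The Python sketch below is illustrative only: the function names are
hypothetical, the count covers data bits only (the mode indicator and
the character count field are excluded), and the 6-bit cost of a
final unpaired character is taken from the Alphanumeric mode of
ISO/IEC 18004 rather than from this document.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; function names are assumptions of this example.
def base45_chars(octets: int) -> int:
    """Number of Base45 characters produced for a given number of octets."""
    return 3 * (octets // 2) + 2 * (octets % 2)


def qr_alphanumeric_data_bits(chars: int) -> int:
    """Data bits used by QR Alphanumeric mode for `chars` characters.

    Each pair of characters takes 11 bits; a final unpaired character
    takes 6 bits (assumption taken from ISO/IEC 18004).
    """
    return 11 * (chars // 2) + 6 * (chars % 2)


octets = 100
chars = base45_chars(octets)
print(chars, qr_alphanumeric_data_bits(chars), 8 * octets)
```
]]></artwork></figure>
</t>
<t>
Under these assumptions, 100 octets become 150 Base45 characters,
which occupy 825 data bits, compared with 800 bits for the raw
octets.
</t>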
<t>
If the data is to be sent over some other transport, a
transport encoding suitable for that transport should be used
instead of Base45. For example, it is not recommended to first
encode data in Base45 and then encode the resulting Base45
string in Base64 if the data is to be sent via email. Instead,
the Base45 encoding should be removed, and the data itself
should be encoded in Base64.
</t>
</section>
<section title="The alphabet used in Base45">
<t>
The alphanumeric code is defined to use the 45 characters
specified in Table 1.
</t>
<t>
<figure><artwork>
                 Table 1: The Base 45 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
   00 0            12 C            24 O            36 Space
   01 1            13 D            25 P            37 $
   02 2            14 E            26 Q            38 %
   03 3            15 F            27 R            39 *
   04 4            16 G            28 S            40 +
   05 5            17 H            29 T            41 -
   06 6            18 I            30 U            42 .
   07 7            19 J            31 V            43 /
   08 8            20 K            32 W            44 :
   09 9            21 L            33 X
   10 A            22 M            34 Y
   11 B            23 N            35 Z
</artwork></figure>
</t>
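<t>
For implementers, Table 1 can be represented as a single string
indexed by value, together with a reverse mapping for decoding. The
following Python fragment is a sketch only; ALPHABET and VALUE_OF
are names assumed for this example.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; the names below are assumptions of this example.
# Position in the string equals the "Value" column of Table 1.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# Reverse mapping from character to value, used when decoding.
VALUE_OF = {ch: value for value, ch in enumerate(ALPHABET)}

# Sanity checks: 45 distinct characters, Space at 36, ':' at 44.
assert len(ALPHABET) == 45 and len(VALUE_OF) == 45
assert ALPHABET[36] == " " and VALUE_OF[":"] == 44
```
]]></artwork></figure>
</t>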
</section>
<section title="Encoding example">
<t>
A series of bytes is split into groups of two. Each such
16-bit value is turned into a series of three values by
successive division by 45, taking the remainder at each step,
least significant value first. The values are in turn looked
up in Table 1.
</t>
<t>
It should be noted that although the examples are all text,
Base45 is an encoding for binary data where each octet can
have any value 0-255.
</t>
<t>
Encoding example 1: The string "AB" is the byte sequence [65
66]. The 16 bit value is 65 * 256 + 66 = 16706. 16706 equals
11 + 45 * 11 + 45 * 45 * 8 so the sequence in base 45 is [11
11 8]. By looking up these values in the table we get the
encoded string "BB8".
</t>
<t>
Encoding example 2: The string "Hello!!" is the byte
sequence [72 101 108 108 111 33 33]. Grouped into 16-bit
values, it is [18533 27756 28449 33]. Note the 33 for the
last byte, which has no partner. Expressing each value in
base 45, least significant value first, we get [[38 6 9]
[36 31 13] [9 2 14] [33 0]], where the last byte is
represented by two values. By looking up these values in the
table we get the encoded string "%69 VD92EX0".
</t>
<t>
Encoding example 3: The string "base-45" is the byte
sequence [98 97 115 101 45 52 53]. Grouped into 16-bit
values, it is [25185 29541 11572 53]. Note the 53 for the
last byte, which has no partner. Expressing each value in
base 45, least significant value first, we get [[30 19 12]
[21 26 14] [7 32 5] [8 1]], where the last byte is
represented by two values. By looking up these values in the
table we get the encoded string "UJCLQE7W581".
</t>
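<t>
The intermediate values in the examples above can be reproduced
with a short Python sketch. The function name base45_value_groups is
hypothetical and is used here only for illustration; it returns the
values before they are looked up in Table 1.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; the function name is an assumption of this example.
def base45_value_groups(data: bytes) -> list:
    """Return the base 45 values for each group of one or two octets."""
    groups = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        n = int.from_bytes(chunk, "big")       # 16-bit value, or 8-bit for a lone octet
        count = 3 if len(chunk) == 2 else 2    # a lone octet yields two values
        values = []
        for _ in range(count):
            values.append(n % 45)              # least significant value first
            n //= 45
        groups.append(values)
    return groups


for text in (b"AB", b"Hello!!", b"base-45"):
    print(text, base45_value_groups(text))
```
]]></artwork></figure>
</t>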
</section>
<section title="Decoding example">
<t>
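The series of characters is looked up in Table 1, and the
values for the characters, taken three at a time, are
interpreted as a number in base 45 with the least significant
value first. This number is then split into two bytes, that
is, two digits in base 256. A final group of two characters
yields a single byte.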
</t>
<t>
Decoding example 1: The string "QED8WEX0" corresponds, when
looked up in Table 1, to the values [26 14 13 8 32 14 33 0].
We arrange the numbers in groups of three (except the last
group, which has two) and get [[26 14 13] [8 32 14] [33 0]].
Interpreting each group in base 45 we get [26981 29798 33],
where the bytes are [[105 101] [116 102] [33]]. If we look
at the ASCII values we get the string "ietf!".
</t>
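<t>
The decoding steps can be sketched in Python as follows. This is a
non-normative illustration: the names ALPHABET, VALUE_OF and
b45_decode are assumptions of this example, and the way errors are
reported (exceptions raised by the Python runtime) is one possible
choice, not a requirement of this document.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; the names below are assumptions of this example.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
VALUE_OF = {ch: value for value, ch in enumerate(ALPHABET)}


def b45_decode(text: str) -> bytes:
    """Decode a Base45 string back into octets."""
    # A character outside Table 1 raises KeyError here.
    values = [VALUE_OF[ch] for ch in text]
    out = bytearray()
    for i in range(0, len(values), 3):
        group = values[i:i + 3]
        if len(group) == 3:
            c, d, e = group
            n = c + d * 45 + e * 45 * 45
            out += n.to_bytes(2, "big")    # OverflowError if n exceeds 65535
        elif len(group) == 2:
            c, d = group
            out += (c + d * 45).to_bytes(1, "big")  # OverflowError above 255
        else:
            # The encoding described here never produces a lone character.
            raise ValueError("trailing single character")
    return bytes(out)


print(b45_decode("QED8WEX0"))
```
]]></artwork></figure>
</t>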
</section>
</section>
<section title="IANA Considerations">
<t>
There are no considerations for IANA in this document.
</t>
</section>
<section title="Security Considerations">
<t>
When implementing encoding and decoding, it is important to be
very careful so that buffer overflows, or anything similar, do
not take place. This includes the calculations modulo 45 and
the lookup in the table of characters. A decoder must also be
robust regarding input, including proper handling of any octet
value 0-255, including the NUL character (ASCII 0).
</t>
<t>
It should be noted that Base 64 (for example) pads the string
so that the encoding has the correct number of
characters. This is something that Base 45 does not do,
i.e. Base 45 does not include padding. Because of this, special
care is to be taken when an odd number of octets is to be
encoded, which results not in N*3 characters but in (N-1)*3+2
characters in the encoded string, where N is the number of
groups including the final single octet, and vice versa when
the number of encoded characters is not divisible by 3.
</t>
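<t>
The length relationship described above can be checked before any
buffer is allocated. The following Python sketch is illustrative
only; the function names are hypothetical, and rejecting a length
that leaves one character over is an assumption that follows from
the encoding producing either three or two characters per group.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; function names are assumptions of this example.
def encoded_length(octets: int) -> int:
    """Characters produced when encoding `octets` octets."""
    # Three characters per full pair, two for a final single octet.
    return 3 * (octets // 2) + 2 * (octets % 2)


def decoded_length(chars: int) -> int:
    """Octets produced when decoding `chars` characters."""
    full, rest = divmod(chars, 3)
    if rest == 1:
        # No group of the encoding consists of a single character.
        raise ValueError("length leaves a single trailing character")
    return 2 * full + (1 if rest == 2 else 0)


print(encoded_length(7), decoded_length(8))
```
]]></artwork></figure>
</t>
<t>
For the examples in this document, the 7 octets of "Hello!!" give
11 characters, and the 8 characters of "QED8WEX0" give 5 octets.
</t>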
<t>
Furthermore, a base45 encoded piece of data includes
characters that are not URL-safe, so if base45 encoded data
has to be URL-safe, %-encoding has to be used.
</t>
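<t>
As an illustration, %-encoding can be applied with the Python
standard library. The function name b45_to_url_component is
hypothetical; quote and unquote are the functions from
urllib.parse. This is a sketch under the assumption that the
encoded string is placed in a single URL component.
</t>
<t>
<figure><artwork><![CDATA[
```python
# Illustrative sketch only; the function name is an assumption of this example.
from urllib.parse import quote, unquote


def b45_to_url_component(encoded: str) -> str:
    """Percent-encode a Base45 string for use inside a URL component."""
    # safe="" also escapes "/", which quote() would otherwise leave as is.
    return quote(encoded, safe="")


escaped = b45_to_url_component("%69 VD92EX0")
print(escaped)            # %2569%20VD92EX0
print(unquote(escaped))   # %69 VD92EX0
```
]]></artwork></figure>
</t>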
</section>
<section title="Acknowledgements">
<t>
The authors thank Alan Barrett, Tomas Harreveld, Christian
Landgren, Anders Lowinger, Jakob Schlyter, Peter Teufl and
Gaby Whitehead for the feedback. The authors also thank
everyone who has worked with Base64 over the years and has
thereby proven that the implementations are stable.
</t>
</section>
</middle>
<back>
<references title='Normative References'>
&RFC4648;
&RFC2119;
<reference anchor="ISO18004">
<front>
<title>
ISO/IEC 18004:2015 Information technology - Automatic
identification and data capture techniques - QR Code bar
code symbology specification
</title>
<author>
<organization>ISO/IEC JTC 1/SC 31</organization>
</author>
<date month="February" year="2015" />
</front>
<seriesInfo name="ISO/IEC 18004:2015"
value="https://www.iso.org/standard/62021.html" />
</reference>
</references>
</back>
</rfc>