Merge pull request #37 from jeremy-w/pstring

Pstring draft changes
commit 5c10e65da02e38428e3bf749bbece89baf93f091 2 parents 3f5ea40 + 17a0e97
Melissa 0xabad1dea authored
169 LIB/Draft-String-Format.xml
@@ -21,13 +21,13 @@
<date month="April" year="2012"/>
<area>Lib</area>
- <workgroup>0x10c Standards Commitee</workgroup>
+ <workgroup>0x10c Standards Committee</workgroup>
<keyword>String</keyword>
<keyword>String Format</keyword>
<keyword>Library Specifications</keyword>
<abstract>
<t>This document presents a general format for strings in
- libraries intended to be shared accross programs and with
+ libraries intended to be shared across programs and with
other users.</t>
</abstract>
</front>
@@ -37,60 +37,89 @@
<t>Shared Libraries depend on certain formats being given.
Strings, being a common argument type for cross-library
calls, should therefore be standardized. Length-prefixed
- strings are to be used for security and efficency reasons.</t>
+ strings are to be used for security and efficiency reasons.</t>
- <section anchor="intro-lang" title="Requirements Language">
+ <section anchor="intro-terms" title="Terminology">
+ <t>
+ <list style="hanging">
+ <t hangText="word">
+ <vspace />
+ 16 bits.
+ This is the smallest unit addressable by the DCPU.
+ </t>
+ <t hangText="character">
+ <vspace />
+ A group of bits representing a glyph or control sequence. A
+ control sequence is a non-printing character that
+ influences the text.
+ </t>
+ <t hangText="string">
+ <vspace />
+ A sequence of characters.
+ The length of the sequence can vary at runtime.
+ </t>
+ <t hangText="P-string">
+ <vspace />
+ A string whose length in words is indicated by prefixing
+ a word containing that length to the character string.
+ The prefix word itself MUST NOT be included when
+ determining the length.
+ </t>
+ <t hangText="C-string">
+ <vspace />
+ A string whose length is indicated by suffixing a NUL
+ character to the character string. The first occurrence of
+ such NUL character terminates the string. The NUL character
+ is the character whose bits are all zero.
+ </t>
+ <t hangText="library">
+ <vspace />
+ A group of functions used by different programs.
+ </t>
+ <t hangText="program">
+ <vspace />
+ A sequence of machine instructions for the DCPU.
+ </t>
+ <t hangText="user">
+ <vspace />
+ Any person using or creating a program.
+ </t>
+ </list>
+ </t>
+ </section>
+ <section anchor="intro-lang" title="Notational Conventions">
<t>The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document
are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
- <section anchor="intro-terms" title="Used Terms">
- <t>
- <list style="symbols">
- <t>"String" or "string" refers to a sequence of
- characters of variable length.</t>
- <t>"P-String", "pstring" or "length-prefixed string"
- refers to a 8-bit value representing the length of the
- following string, followed by n characters of length n,
- where n is the value of the length-prefix.</t>
- <t>"C-String", "cstring" or "null-terminated string"
- refers to a sequence of characters that may not include
- the character with the value 0, followed by the
- character with value 0 (also known as "null-terminator"
- or "null character")</t>
- <t>"Character" or "character" refers to a group of
- bits representing a letter, a number, a punctuation
- mark, or a control character. A control character is a
- non-printable character that influences the text.</t>
- <t>"Word" or "word" refers to a group of 16 bits
- representing a value. A word is the smallest addressable
- space in the DCPU.</t>
- <t>"Library" or "library" refers to a group of
- functions that are used across different programs.</t>
- <t>"Program" or "program" refers to a sequence of
- machine instructions for the DCPU.</t>
- <t>"User" or "user" refers to any person using or
- creating a program.</t>
- </list>
- </t>
- </section>
</section>
- <section anchor="structure" title="Structure of P-Strings">
- <t>A P-String is composed of a 16-bit length prefix and a
- sequence of n characters where n is the value of the prefix.
+ <section anchor="structure" title="Format">
+ <t>A P-string is composed of a 16-bit length prefix and a
+ sequence of n words where n is the value of the prefix.
</t>
- <t>P-String = n (nCHAR)</t>
- <t>The prefix for such a string MUST be present and MUST
- represent the exact number of following characters.</t>
+ <t>The prefix word MUST be present and MUST represent the exact
+ number of following words.</t>
+ <t>Note that the empty P-string consists entirely of a length word
+ containing 0x0000.</t>
+ <figure>
+ <artwork>
+ <![CDATA[
+ P-STRING = LENGTH BODY
+ LENGTH = n
+ BODY = nWORD
+ WORD = %x0000-ffff
+ ; any 16-bit value
+]]>
+ </artwork>
+ </figure>
</section>
- <section anchor="rationale" title="Rationale for using P-Strings">
- <section anchor="rationale-benefits" title="Benefits of P-Strings">
- <t>The rationale of using P-Strings is a simple matter of
- weighting benefits against disadvantages. Note that this is
- not a normative rationale, but one developed by the
- Standards Committee with focus on the DCPU design.</t>
+ <section anchor="rationale" title="Rationale">
+ <t>This section is not normative.</t>
+ <t>The rationale for using P-strings is a simple matter of
+ weighing benefits against disadvantages.</t>
+ <section anchor="rationale-benefits" title="Benefits">
<t>
<list style='symbols'>
<t>Accessing the length of the string is O(1) fast.</t>
@@ -103,19 +132,22 @@
</list>
</t>
<t>The null-character can be used in strings, while
- using it in C-Strings would terminate the string.</t>
- <t>The runtime cost of P-String concatenation is O(n),
- while the runtime cost of C-String concatenation is
+ using it in a C-string would terminate the string.</t>
+ <t>The runtime cost of P-string concatenation is O(n),
+ while the runtime cost of C-string concatenation is
O(n+m).</t>
</list>
</t>
</section>
- <section anchor="rationale-disadvantages" title="Disadvantages of P-Strings">
+ <section anchor="rationale-disadvantages" title="Disadvantages">
<t>
<list style='symbols'>
- <t>Instead of the conventional 0, indexing begins at 1.</t>
- <t>Even for fixed-length fields, the first word would
- be needed for the length.</t>
+ <t>Indexing begins at 1 instead of 0.</t>
+ <t>Fixed-length strings still require a prefixed length word.
+ The same would be true of fixed-length C-strings. Both formats
+ can be abbreviated by omitting the known prefix/terminator
+ at the cost of ceasing to be a P- or C-string per the
+ definitions given here.</t>
</list>
</t>
</section>
@@ -124,33 +156,30 @@
<t>
<list style='symbols'>
<t>Since the smallest addressable size in the DCPU is a
- word (16 bit), this effectively allows a maximum string
+ word (16 bits), this effectively allows a maximum string
length of 65535 characters without any loss in
- efficency.</t>
+ efficiency.</t>
<t>Cutting off the beginning of a string is more
- expensive for P-Strings, but cutting off the end is
- more expensive for C-Strings (other solutions are
- possible, but introduce memory leaks). Therefore, those
+ expensive for P-strings, but cutting off the end is
+ more expensive for C-strings. (Other solutions are
+ possible, but introduce memory leaks.) Therefore, the
two arguments negate each other.</t>
- <t>The names P-String and C-String come from Pascal and
- C respectively, both using their respective format.</t>
+ <t>The names P-string and C-string come from Pascal and
+ C, which use the respective format.</t>
+ <t>The current implementation of characters in the DCPU
+ effectively leaves the high 8 bits empty. It is therefore
+ possible to store two characters in one word. Such packed
+ strings are outside the scope of this RFC.</t>
</list>
</t>
</section>
- <section anchor="packed-strings" title="Packed P-Strings">
- <t>The current implementation of characters in the DCPU
- effectively leaves the high 8 bits empty. It is therefore
- possible to store two characters in one word. Such packed
- strings are out of the scope of this document and should
- be defined in another RFC.</t>
- </section>
<section anchor="security" title="Security Considerations">
- <t>Using P-Strings reduces the risk of arbitrary shell
+ <t>Using P-strings reduces the risk of arbitrary shell
code being executed by overflowing the input buffer.
However, hidden instructions in the string may be executed
later when the sections are not cleared and the stack or
- instruction pointer is moved there. Measurements should be
- taken by the user to assure that this does not happen.</t>
+ instruction pointer is moved there. Measures should be
+ taken by the user to ensure this does not happen.</t>
</section>
</middle>
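
The Format section above (LENGTH followed by BODY) can be illustrated with a short Python sketch. It is not part of the draft: the function names `pstring_encode` and `pstring_decode` are hypothetical, memory is modelled as a plain list of integers, and it assumes one character per 16-bit word, which is the unpacked case the draft describes.

```python
WORD_MAX = 0xFFFF  # largest value a 16-bit DCPU word can hold


def pstring_encode(text):
    """Return the P-string for `text` as a list of 16-bit words.

    Assumption: each character occupies exactly one word (no packing).
    """
    body = [ord(ch) for ch in text]
    if len(body) > WORD_MAX:
        raise ValueError("body does not fit a 16-bit length prefix")
    if any(word > WORD_MAX for word in body):
        raise ValueError("character does not fit in one word")
    # LENGTH counts the body words only, never the prefix word itself.
    return [len(body)] + body


def pstring_decode(words):
    """Return the text stored in a P-string given as a list of words."""
    if not words:
        raise ValueError("the length prefix MUST be present")
    length, body = words[0], words[1:]
    if length != len(body):
        raise ValueError("prefix MUST equal the number of following words")
    return "".join(chr(word) for word in body)
```

Under these assumptions the empty string encodes to the single word 0x0000, and a NUL character inside the body survives a round trip because nothing in the format treats it as a terminator.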
224 LIB/Draft_String_Format
@@ -1,224 +0,0 @@
-
-
-
-RFC X1005 (Draft-Lib) M. Schuetze
- April 2012
-
-
- String Format
-
-Abstract
-
- This document presents a general format for strings in libraries
- intended to be shared accross programs and with other users.
-
-
-Table of Contents
-
- 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
- 1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 2
- 1.2. Used Terms . . . . . . . . . . . . . . . . . . . . . . . . 2
- 2. Structure of P-Strings . . . . . . . . . . . . . . . . . . . . 2
- 3. Benefits of P-Strings . . . . . . . . . . . . . . . . . . . . . 3
- 4. Disadvantages of P-Strings . . . . . . . . . . . . . . . . . . 3
- 5. Other Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 3
- 6. Security Considerations . . . . . . . . . . . . . . . . . . . . 4
- 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 4
- Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 4
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Schuetze [Page 1]
-
- String Format April 2012
-
-
-1. Introduction
-
- Shared Libraries depend on certain formats being given. Strings,
- being a common argument type for cross-library calls, should
- therefore be standardized. Length-prefixed strings are to be used
- for security and efficency reasons.
-
-1.1. Requirements Language
-
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
- "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
- document are to be interpreted as described in RFC 2119 [RFC2119].
-
-1.2. Used Terms
-
- o "String" or "string" refers to a sequence of characters of
- variable length.
-
- o "P-String", "pstring" or "length-prefixed string" refers to a
- 8-bit value representing the length of the following string,
- followed by n characters of length n, where n is the value of the
- length-prefix.
-
- o "C-String", "cstring" or "null-terminated string" refers to a
- sequence of characters that may not include the character with the
- value 0, followed by the character with value 0 (also known as
- "null-terminator" or "null character")
-
- o "Character" or "character" refers to a group of 16 bits
- representing a letter, a number, a punctuation mark, or a control
- character. A control character is a non-printable character that
- influences the text.
-
- o "Library" or "library" refers to a group of functions that are
- used across different programs.
-
- o "Program" or "program" refers to a sequence of machine
- instructions for the DCPU.
-
- o "User" or "user" refers to any person using or creating a program.
-
-
-2. Structure of P-Strings
-
- A P-String is composed of a 8-bit length prefix and a sequence of n
- characters where n is the value of the prefix.
-
-
-
-
-
-Schuetze [Page 2]
-
- String Format April 2012
-
-
- An example for the string "Hello, World" would be
-
- | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 | Character
- 0 |1 2|3 4|5 6|7 8|9 0|1 2|3 4|5 6|7 8|9 0|1 2|3 4| Byte
- +--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |12| H | e | l | l | o | , | | W | o | r | l | d | Data
- +--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- Figure 1
-
- The prefix for such a string MUST be present and MUST represent the
- exact number of following characters.
-
-
-3. Benefits of P-Strings
-
- o Accessing the length of the string is O(1) fast.
-
- o Buffer overflows are prevented by being able to allocate enough
- space ahead of time.
-
- * This also increases the security of programs by preventing
- arbitrary shell code from being executed.
-
- o The null-character can be used in strings, while using it in
- C-Strings would terminate the string.
-
- o The runtime cost of P-String concatenation is O(n), while the
- runtime cost of C-String concatenation is O(n+m).
-
-
-4. Disadvantages of P-Strings
-
- o Instead of the conventional 0, indexing begins at 1.
-
- o Even for fixed-length fields, the first byte would be needed for
- the length.
-
-
-5. Other Notes
-
- o Since characters in the DCPU are one word (16 bit) wide, the
- length prefix should indicate the number of following characters,
- not the number of following bytes. This would mean a limit of 512
- characters for a string, which is a reasonable restriction for the
- DCPU.
-
-
-
-
-
-Schuetze [Page 3]
-
- String Format April 2012
-
-
- o Cutting off the beginning of a string is more expensive for
- P-Strings, but cutting off the end is more expensive for C-Strings
- (other solutions are possible, but introduce memory leaks).
- Therefore, those two arguments negate each other.
-
-
-6. Security Considerations
-
- Using P-Strings reduces the risk of arbitrary shell code being
- executed by overflowing the input buffer. However, hidden
- instructions in the string may be executed later when the sections
- are not cleared and the stack or instruction pointer is moved there.
- Measurements should be taken by the user to assure that this does not
- happen.
-
-
-7. References
-
- [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
- Requirement Levels", BCP 14, RFC 2119, March 1997.
-
-
-Author's Address
-
- Malte Schuetze
-
- Email: malte.schuetze@fgms.de
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Schuetze [Page 4]
-
184 LIB/Draft_String_Format.txt
@@ -10,22 +10,21 @@ RFC X1005 (Draft-Lib) M. Schuetze
Abstract
This document presents a general format for strings in libraries
- intended to be shared accross programs and with other users.
+ intended to be shared across programs and with other users.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
- 1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 2
- 1.2. Used Terms . . . . . . . . . . . . . . . . . . . . . . . . 2
- 2. Structure of P-Strings . . . . . . . . . . . . . . . . . . . . 3
- 3. Rationale for using P-Strings . . . . . . . . . . . . . . . . . 3
- 3.1. Benefits of P-Strings . . . . . . . . . . . . . . . . . . . 3
- 3.2. Disadvantages of P-Strings . . . . . . . . . . . . . . . . 3
- 4. Other Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 3
- 5. Packed P-Strings . . . . . . . . . . . . . . . . . . . . . . . 4
- 6. Security Considerations . . . . . . . . . . . . . . . . . . . . 4
- 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 2
+ 1.2. Notational Conventions . . . . . . . . . . . . . . . . . . 2
+ 2. Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 3. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 3.1. Benefits . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 3.2. Disadvantages . . . . . . . . . . . . . . . . . . . . . . . 3
+ 4. Other Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 5. Security Considerations . . . . . . . . . . . . . . . . . . . . 4
+ 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 4
@@ -52,6 +51,7 @@ Table of Contents
+
Schuetze [Page 1]
String Format April 2012
@@ -62,47 +62,47 @@ Schuetze [Page 1]
Shared Libraries depend on certain formats being given. Strings,
being a common argument type for cross-library calls, should
therefore be standardized. Length-prefixed strings are to be used
- for security and efficency reasons.
-
-1.1. Requirements Language
+ for security and efficiency reasons.
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
- "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
- document are to be interpreted as described in RFC 2119 [RFC2119].
+1.1. Terminology
-1.2. Used Terms
+ word
+ 16 bits. This is the smallest unit addressable by the DCPU.
- o "String" or "string" refers to a sequence of characters of
- variable length.
-
- o "P-String", "pstring" or "length-prefixed string" refers to a
- 8-bit value representing the length of the following string,
- followed by n characters of length n, where n is the value of the
- length-prefix.
-
- o "C-String", "cstring" or "null-terminated string" refers to a
- sequence of characters that may not include the character with the
- value 0, followed by the character with value 0 (also known as
- "null-terminator" or "null character")
-
- o "Character" or "character" refers to a group of bits representing
- a letter, a number, a punctuation mark, or a control character. A
- control character is a non-printable character that influences the
+ character
+ A group of bits representing a glyph or control sequence. A
+ control sequence is a non-printing character that influences the
text.
- o "Word" or "word" refers to a group of 16 bits representing a
- value. A word is the smallest addressable space in the DCPU.
+ string
+ A sequence of characters. The length of the sequence can vary at
+ runtime.
+
+ P-string
+ A string whose length in words is indicated by prefixing a word
+ containing that length to the character string. The prefix word
+ itself MUST NOT be included when determining the length.
- o "Library" or "library" refers to a group of functions that are
- used across different programs.
+ C-string
+ A string whose length is indicated by suffixing a NUL character to
+ the character string. The first occurrence of such NUL character
+ terminates the string. The NUL character is the character whose
+ bits are all zero.
- o "Program" or "program" refers to a sequence of machine
- instructions for the DCPU.
+ library
+ A group of functions used by different programs.
- o "User" or "user" refers to any person using or creating a program.
+ program
+ A sequence of machine instructions for the DCPU.
+ user
+ Any person using or creating a program.
+1.2. Notational Conventions
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [RFC2119].
@@ -113,25 +113,34 @@ Schuetze [Page 2]
String Format April 2012
-2. Structure of P-Strings
+2. Format
+
+ A P-string is composed of a 16-bit length prefix and a sequence of n
+ words where n is the value of the prefix.
+
+ The prefix word MUST be present and MUST represent the exact number
+ of following words.
- A P-String is composed of a 16-bit length prefix and a sequence of n
- characters where n is the value of the prefix.
+ Note that the empty P-string consists entirely of a length word
+ containing 0x0000.
- P-String = n (nCHAR)
- The prefix for such a string MUST be present and MUST represent the
- exact number of following characters.
+ P-STRING = LENGTH BODY
+ LENGTH = n
+ BODY = nWORD
+ WORD = %x0000-ffff
+ ; any 16-bit value
-3. Rationale for using P-Strings
-3.1. Benefits of P-Strings
+3. Rationale
- The rationale of using P-Strings is a simple matter of weighting
- benefits against disadvantages. Note that this is not a normative
- rationale, but one developed by the Standards Committee with focus on
- the DCPU design.
+ This section is not normative.
+
+ The rationale for using P-strings is a simple matter of weighing
+ benefits against disadvantages.
+
+3.1. Benefits
o Accessing the length of the string is O(1) fast.
@@ -141,25 +150,16 @@ Schuetze [Page 2]
* This also increases the security of programs by preventing
arbitrary shell code from being executed.
- o The null-character can be used in strings, while using it in
- C-Strings would terminate the string.
+ o The null-character can be used in strings, while using it in a
+ C-string would terminate the string.
- o The runtime cost of P-String concatenation is O(n), while the
- runtime cost of C-String concatenation is O(n+m).
+ o The runtime cost of P-string concatenation is O(n), while the
+ runtime cost of C-string concatenation is O(n+m).
-3.2. Disadvantages of P-Strings
+3.2. Disadvantages
- o Instead of the conventional 0, indexing begins at 1.
+ o Indexing begins at 1 instead of 0.
- o Even for fixed-length fields, the first word would be needed for
- the length.
-
-
-4. Other Notes
-
- o Since the smallest addressable size in the DCPU is a word (16
- bit), this effectively allows a maximum string length of 65536
- characters without any loss in efficency.
@@ -169,34 +169,42 @@ Schuetze [Page 3]
String Format April 2012
- o Cutting off the beginning of a string is more expensive for
- P-Strings, but cutting off the end is more expensive for C-Strings
- (other solutions are possible, but introduce memory leaks).
- Therefore, those two arguments negate each other.
+ o Fixed-length strings still require a prefixed length word. The
+ same would be true of fixed-length C-strings. Both formats can be
+ abbreviated by omitting the known prefix/terminator at the cost of
+ ceasing to be a P- or C-string per the definitions given here.
- o The names P-String and C-String come from Pascal and C
- respectively, both using their respective format.
+4. Other Notes
+
+ o Since the smallest addressable size in the DCPU is a word (16
+ bits), this effectively allows a maximum string length of 65535
+ characters without any loss in efficiency.
+
+ o Cutting off the beginning of a string is more expensive for
+ P-strings, but cutting off the end is more expensive for
+ C-strings. (Other solutions are possible, but introduce memory
+ leaks.) Therefore, the two arguments negate each other.
-5. Packed P-Strings
+ o The names P-string and C-string come from Pascal and C, which use
+ the respective format.
- The current implementation of characters in the DCPU effectively
- leaves the high 8 bits empty. It is therefore possible to store two
- characters in one word. Such packed strings are out of the scope of
- this document and should be defined in another RFC.
+ o The current implementation of characters in the DCPU effectively
+ leaves the high 8 bits empty. It is therefore possible to store
+ two characters in one word. Such packed strings are outside the
+ scope of this RFC.
-6. Security Considerations
+5. Security Considerations
- Using P-Strings reduces the risk of arbitrary shell code being
+ Using P-strings reduces the risk of arbitrary shell code being
executed by overflowing the input buffer. However, hidden
instructions in the string may be executed later when the sections
are not cleared and the stack or instruction pointer is moved there.
- Measurements should be taken by the user to assure that this does not
- happen.
+ Measures should be taken by the user to ensure this does not happen.
-7. References
+6. References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
@@ -212,13 +220,5 @@ Author's Address
-
-
-
-
-
-
-
-
Schuetze [Page 4]
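
The Rationale section compares length lookup, embedded NUL characters, and concatenation for the two formats. The Python sketch below only illustrates what each format has to read; it is not taken from the draft and does not measure cost. The helper names (`cstring_length`, `pstring_length`, `pstring_concat`) are hypothetical, and memory is again modelled as a list of 16-bit words.

```python
def cstring_length(memory, start):
    """Scan forward to the first NUL word; every body word is visited."""
    n = 0
    while memory[start + n] != 0:
        n += 1
    return n


def pstring_length(memory, start):
    """Read the prefix word; one access regardless of string length."""
    return memory[start]


def pstring_concat(a, b):
    """Join two P-strings, writing a fresh prefix for the result."""
    total = a[0] + b[0]
    if total > 0xFFFF:
        raise ValueError("result does not fit a 16-bit length prefix")
    # Copy exactly as many body words as each prefix announces.
    return [total] + a[1:1 + a[0]] + b[1:1 + b[0]]
```

With the words 0x61, 0x00, 0x62 as the intended content, the C-string scan stops at the embedded NUL and reports a length of 1, while the P-string prefix still reports 3.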