Merge pull request #37 from jeremy-w/pstring

Pstring draft changes
2 parents 3f5ea40 + 17a0e97, commit 5c10e65da02e38428e3bf749bbece89baf93f091
0xabad1dea committed Apr 10, 2012
Showing with 191 additions and 386 deletions.
  1. +99 −70 LIB/Draft-String-Format.xml
  2. +0 −224 LIB/Draft_String_Format
  3. +92 −92 LIB/Draft_String_Format.txt
@@ -21,13 +21,13 @@
<date month="April" year="2012"/>
<area>Lib</area>
- <workgroup>0x10c Standards Commitee</workgroup>
+ <workgroup>0x10c Standards Committee</workgroup>
<keyword>String</keyword>
<keyword>String Format</keyword>
<keyword>Library Specifications</keyword>
<abstract>
<t>This document presents a general format for strings in
- libraries intended to be shared accross programs and with
+ libraries intended to be shared across programs and with
other users.</t>
</abstract>
</front>
@@ -37,60 +37,89 @@
<t>Shared Libraries depend on certain formats being given.
Strings, being a common argument type for cross-library
calls, should therefore be standardized. Length-prefixed
- strings are to be used for security and efficency reasons.</t>
+ strings are to be used for security and efficiency reasons.</t>
- <section anchor="intro-lang" title="Requirements Language">
+ <section anchor="intro-terms" title="Terminology">
+ <t>
+ <list style="hanging">
+ <t hangText="word">
+ <vspace />
+ 16 bits.
+ This is the smallest unit addressable by the DCPU.
+ </t>
+ <t hangText="character">
+ <vspace />
+ A group of bits representing a glyph or control sequence. A
+ control sequence is a non-printing character that
+ influences the text.
+ </t>
+ <t hangText="string">
+ <vspace />
+ A sequence of characters.
+ The length of the sequence can vary at runtime.
+ </t>
+ <t hangText="P-string">
+ <vspace />
+ A string whose length in words is indicated by prefixing
+ a word containing that length to the character string.
+ The prefix word itself MUST NOT be included when
+ determining the length.
+ </t>
+ <t hangText="C-string">
+ <vspace />
+ A string whose length is indicated by suffixing a NUL
+ character to the character string. The first occurrence of
+ such NUL character terminates the string. The NUL character
+ is the character whose bits are all zero.
+ </t>
+ <t hangText="library">
+ <vspace />
+ A group of functions used by different programs.
+ </t>
+ <t hangText="program">
+ <vspace />
+ A sequence of machine instructions for the DCPU.
+ </t>
+ <t hangText="user">
+ <vspace />
+ Any person using or creating a program.
+ </t>
+ </list>
+ </t>
+ </section>
+ <section anchor="intro-lang" title="Notational Conventions">
<t>The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document
are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
- <section anchor="intro-terms" title="Used Terms">
- <t>
- <list style="symbols">
- <t>"String" or "string" refers to a sequence of
- characters of variable length.</t>
- <t>"P-String", "pstring" or "length-prefixed string"
- refers to a 8-bit value representing the length of the
- following string, followed by n characters of length n,
- where n is the value of the length-prefix.</t>
- <t>"C-String", "cstring" or "null-terminated string"
- refers to a sequence of characters that may not include
- the character with the value 0, followed by the
- character with value 0 (also known as "null-terminator"
- or "null character")</t>
- <t>"Character" or "character" refers to a group of
- bits representing a letter, a number, a punctuation
- mark, or a control character. A control character is a
- non-printable character that influences the text.</t>
- <t>"Word" or "word" refers to a group of 16 bits
- representing a value. A word is the smallest addressable
- space in the DCPU.</t>
- <t>"Library" or "library" refers to a group of
- functions that are used across different programs.</t>
- <t>"Program" or "program" refers to a sequence of
- machine instructions for the DCPU.</t>
- <t>"User" or "user" refers to any person using or
- creating a program.</t>
- </list>
- </t>
- </section>
</section>
- <section anchor="structure" title="Structure of P-Strings">
- <t>A P-String is composed of a 16-bit length prefix and a
- sequence of n characters where n is the value of the prefix.
+ <section anchor="structure" title="Format">
+ <t>A P-string is composed of a 16-bit length prefix and a
+ sequence of n words where n is the value of the prefix.
</t>
- <t>P-String = n (nCHAR)</t>
- <t>The prefix for such a string MUST be present and MUST
- represent the exact number of following characters.</t>
+ <t>The prefix word MUST be present and MUST represent the exact
+ number of following words.</t>
+ <t>Note that the empty P-string consists entirely of a length word
+ containing 0x0000.</t>
+ <figure>
+ <artwork>
+ <![CDATA[
+ P-STRING = LENGTH BODY
+ LENGTH = n
+ BODY = nWORD
+ WORD = %x0000-ffff
+ ; any 16-bit value
+]]>
+ </artwork>
+ </figure>
</section>
- <section anchor="rationale" title="Rationale for using P-Strings">
- <section anchor="rationale-benefits" title="Benefits of P-Strings">
- <t>The rationale of using P-Strings is a simple matter of
- weighting benefits against disadvantages. Note that this is
- not a normative rationale, but one developed by the
- Standards Committee with focus on the DCPU design.</t>
+ <section anchor="rationale" title="Rationale">
+ <t>This section is not normative.</t>
+ <t>The rationale for using P-strings is a simple matter of
+ weighing benefits against disadvantages.</t>
+ <section anchor="rationale-benefits" title="Benefits">
<t>
<list style='symbols'>
<t>Accessing the length of the string is O(1) fast.</t>
@@ -103,19 +132,22 @@
</list>
</t>
<t>The null-character can be used in strings, while
- using it in C-Strings would terminate the string.</t>
- <t>The runtime cost of P-String concatenation is O(n),
- while the runtime cost of C-String concatenation is
+ using it in a C-string would terminate the string.</t>
+ <t>The runtime cost of P-string concatenation is O(n),
+ while the runtime cost of C-string concatenation is
O(n+m).</t>
</list>
</t>
</section>
- <section anchor="rationale-disadvantages" title="Disadvantages of P-Strings">
+ <section anchor="rationale-disadvantages" title="Disadvantages">
<t>
<list style='symbols'>
- <t>Instead of the conventional 0, indexing begins at 1.</t>
- <t>Even for fixed-length fields, the first word would
- be needed for the length.</t>
+ <t>Indexing begins at 1 instead of 0.</t>
+ <t>Fixed-length strings still require a prefixed length word.
+ The same would be true of fixed-length C-strings. Both formats
+ can be abbreviated by omitting the known prefix/terminator
+ at the cost of ceasing to be a P- or C-string per the
+ definitions given here.</t>
</list>
</t>
</section>
@@ -124,33 +156,30 @@
<t>
<list style='symbols'>
<t>Since the smallest addressable size in the DCPU is a
- word (16 bit), this effectively allows a maximum string
+ word (16 bits), this effectively allows a maximum string
length of 65536 characters without any loss in
- efficency.</t>
+ efficiency.</t>
<t>Cutting off the beginning of a string is more
- expensive for P-Strings, but cutting off the end is
- more expensive for C-Strings (other solutions are
- possible, but introduce memory leaks). Therefore, those
+ expensive for P-strings, but cutting off the end is
+ more expensive for C-strings. (Other solutions are
+ possible, but introduce memory leaks.) Therefore, the
two arguments negate each other.</t>
- <t>The names P-String and C-String come from Pascal and
- C respectively, both using their respective format.</t>
+ <t>The names P-string and C-string come from Pascal and
+ C, which use the respective format.</t>
+ <t>The current implementation of characters in the DCPU
+ effectively leaves the high 8 bits empty. It is therefore
+ possible to store two characters in one word. Such packed
+ strings are outside the scope of this RFC.</t>
</list>
</t>
</section>
- <section anchor="packed-strings" title="Packed P-Strings">
- <t>The current implementation of characters in the DCPU
- effectively leaves the high 8 bits empty. It is therefore
- possible to store two characters in one word. Such packed
- strings are out of the scope of this document and should
- be defined in another RFC.</t>
- </section>
<section anchor="security" title="Security Considerations">
- <t>Using P-Strings reduces the risk of arbitrary shell
+ <t>Using P-strings reduces the risk of arbitrary shell
code being executed by overflowing the input buffer.
However, hidden instructions in the string may be executed
later when the sections are not cleared and the stack or
- instruction pointer is moved there. Measurements should be
- taken by the user to assure that this does not happen.</t>
+ instruction pointer is moved there. Measures should be
+ taken by the user to ensure this does not happen.</t>
</section>
</middle>
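
For readers of this diff, the word-based P-string layout in the revised "Format" section can be illustrated with a minimal Python sketch. It assumes one character per 16-bit word (the unpacked case the draft covers) and models DCPU memory as a plain list of integers. All function names here are hypothetical and are not part of the draft or of any DCPU library.

```python
MAX_WORD = 0xFFFF  # largest value a 16-bit word can hold


def pstring_encode(text):
    """Build a P-string: one length word, then one word per character."""
    body = [ord(ch) for ch in text]
    if len(body) > MAX_WORD:
        raise ValueError("body is too long for a 16-bit length prefix")
    if any(word > MAX_WORD for word in body):
        raise ValueError("character does not fit in one 16-bit word")
    return [len(body)] + body


def pstring_length(pstr):
    """Read the length from the prefix word; the body is never scanned."""
    return pstr[0]


def pstring_is_valid(pstr):
    """Check that the prefix equals the exact number of following words."""
    return (
        len(pstr) >= 1
        and pstr[0] == len(pstr) - 1
        and all(0 <= word <= MAX_WORD for word in pstr)
    )


def pstring_concat(a, b):
    """Concatenate two P-strings; the new prefix is the sum of both lengths."""
    total = a[0] + b[0]
    if total > MAX_WORD:
        raise ValueError("result is too long for a 16-bit length prefix")
    return [total] + a[1:] + b[1:]


def pstring_decode(pstr):
    """Turn the body words back into text, ignoring the prefix word."""
    return "".join(chr(word) for word in pstr[1:1 + pstr[0]])


# The empty P-string is a single length word containing 0x0000.
print(pstring_encode(""))
# A NUL word inside the body does not terminate the string.
print(pstring_length(pstring_encode("a\x00b")))
```

In this sketch the prefix word is not counted in the length, matching the draft's definition, and an embedded NUL is just another body word.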