Use less =item in POD in favor of =head*

By using =item for long sections we don't get those things included in
TOC's on e.g. metacpan.org. Just make them =head* instead, mostly
=head3's.

I'm fairly sure I got the intended heading level in sereal_spec.pod
right, but it's possible that I didn't.
1 parent 4034fcb, commit 076d0e808f91d568664e7a94df22a6c3d1e76767. avar committed Nov 14, 2012.
Showing with 39 additions and 66 deletions.
  1. +5 −9 Perl/Decoder/lib/Sereal/Decoder.pm
  2. +10 −17 Perl/Encoder/lib/Sereal/Encoder.pm
  3. +10 −14 README.pod
  4. +14 −26 sereal_spec.pod
@@ -84,25 +84,23 @@ Constructor. Optionally takes a hash reference as first parameter. This hash
reference may contain any number of options that influence the behaviour of the
decoder. These options are currently valid:
-=over 2
-
-=item refuse_snappy
+=head3 refuse_snappy
If set, the decoder will refuse Snappy-compressed input data. This can be
desirable for robustness. See the section C<ROBUSTNESS> below.
-=item refuse_objects
+=head3 refuse_objects
If set, the decoder will refuse deserializing any objects in the input stream and
instead throw an exception. Defaults to off. See the section C<ROBUSTNESS> below.
-=item validate_utf8
+=head3 validate_utf8
If set, the decoder will refuse invalid UTF-8 byte sequences. This is off
by default, but it's strongly encouraged to be turned on if you're dealing
with any data that has been encoded by an external source (e.g. HTTP cookies).
-=item max_recursion_depth
+=head3 max_recursion_depth
C<Sereal::Decoder> is recursive. If you pass it a Sereal document that is deeply
nested, it will eventually exhaust the C stack. Therefore, there is a limit on
@@ -113,7 +111,7 @@ Beware that setting it too high can cause hard crashes.
Do note that the setting is somewhat approximate. Setting it to 10000 may break at
somewhere between 9997 and 10003 nested structures depending on their types.
-=item max_num_hash_entries
+=head3 max_num_hash_entries
If set to a non-zero value (default: 0), then C<Sereal::Decoder> will refuse
to deserialize any hash/dictionary (or hash-based object) with more than
@@ -122,8 +120,6 @@ hash-collision attacks on Perl's hash function. Chances are, you don't want
or need this. For a gentle introduction to the topic from the cryptographic
point of view, see L<http://en.wikipedia.org/wiki/Collision_attack>.
-=back
-
=head1 INSTANCE METHODS
=head2 decode
@@ -69,12 +69,7 @@ Constructor. Optionally takes a hash reference as first parameter. This hash
reference may contain any number of options that influence the behaviour of the
encoder. Currently, the following options are recognized:
-=for comments
-Using =head3 insted of =item would be preferable so they can be linked to and would appear in the table of contents
-
-=over 2
-
-=item no_shared_hashkeys
+=head3 no_shared_hashkeys
When the C<no_shared_hashkeys> option is set to a true value, then
the encoder will disable the detection and elimination of repeated hash
@@ -83,7 +78,7 @@ By skipping the detection of repeated hash keys, performance goes up a bit,
but the size of the output can potentially be much larger.
Do not disable this unless you have a reason to.
-=item snappy
+=head3 snappy
If set, the main payload of the Sereal document will be compressed using
Google's Snappy algorithm. This can yield anywhere from no effect
@@ -96,19 +91,19 @@ Sereal documents transparently.
B<NOTE 1:> Do not use this if you want to parse multiple Sereal packets
from the same buffer. Use C<snappy_incr> instead.
-=item snappy_incr
+=head3 snappy_incr
Enables a version of the snappy protocol which is suitable for incremental
parsing of packets. See also the C<snappy> option above for more details.
-=item snappy_threshold
+=head3 snappy_threshold
The size threshold (in bytes) of the uncompressed output below which
snappy compression is not even attempted even if enabled.
Defaults to one kilobyte (1024 bytes). Set this to 0 and enable C<snappy>
to always compress.
-=item croak_on_bless
+=head3 croak_on_bless
If this option is set, then the encoder will refuse to serialize blessed
references and throw an exception instead.
@@ -117,15 +112,15 @@ This can be important because blessed references can mean executing
a destructor on a remote system or generally executing code based on
data.
-=item undef_unknown
+=head3 undef_unknown
If set, unknown/unsupported data structures will be encoded as C<undef>
instead of throwing an exception.
Mutually exclusive with C<stringify_unknown>.
See also C<warn_unknown> below.
-=item stringify_unknown
+=head3 stringify_unknown
If set, unknown/unsupported data structures will be stringified and
encoded as that string instead of throwing an exception. The
@@ -134,7 +129,7 @@ stringification may cause a warning to be emitted by perl.
Mutually exclusive with C<undef_unknown>.
See also C<warn_unknown> below.
-=item warn_unknown
+=head3 warn_unknown
Only has an effect if C<undef_unknown> or C<stringify_unknown>
are enabled.
@@ -146,7 +141,7 @@ data structures just the same as for a positive value with one
exception: For blessed, unsupported items that have string overloading,
we silently stringify without warning.
-=item max_recursion_depth
+=head3 max_recursion_depth
C<Sereal::Encoder> is recursive. If you pass it a Perl data structure
that is deeply nested, it will eventually exhaust the C stack. Therefore,
@@ -159,7 +154,7 @@ do so.
Do note that the setting is somewhat approximate. Setting it to 10000 may break at
somewhere between 9997 and 10003 nested structures depending on their types.
-=item sort_keys
+=head3 sort_keys
Normally C<Sereal::Encoder> will output hashes in whatever order is convenient,
generally that used by perl to actually store the hash, or whatever order
@@ -174,8 +169,6 @@ variables on use, and some of its rules are a little arcane (for instance utf8
keys), and so two hashes that might appear to be the same might still produce
different output as far as Sereal is concerned.
-=back
-
The thusly allocated encoder object and its output buffer will be reused
between invocations of C<encode()>, so hold on to it for an efficiency
gain if you plan to serialize multiple similar data structures, but destroy
@@ -21,14 +21,12 @@ the other projects.
=head2 OBJECTIVES
-=over 4
-
-=item References
+=head3 References
We wanted to be able to serialize shared references properly. Many
serialization formats do not support this out of the box.
-=item Weak References
+=head3 Weak References
Given that perl uses a reference counting garbage collection scheme,
Perl has the concept of a special type of reference called a
@@ -39,26 +37,26 @@ to be converted to one that will cause a memory leak on a remote system.
For cross-language compatibility, weak references can very easily
be ignored by other decoder implementations.
-=item Aliases
+=head3 Aliases
Perl supports aliases. These are a special kind of reference which is
effectively a C level pointer instead of a Perl language-level
reference. We needed to be able to represent these as well.
-=item Objects
+=head3 Objects
Promoting a plain data structure reference to an object, as is customary
in dynamic languages, can be dangerous in some circumstances. We needed
to be able to serialize objects safely and reliably, and we wanted a
sane control mechanism for doing so.
-=item Regular Expression Objects
+=head3 Regular Expression Objects
In Perl, a regular expression is a native type. We wanted to be
able to serialize these at a native level without losing data
such as modifiers.
-=item Space Efficiencies
+=head3 Space Efficiencies
We want to be able to represent common structures as small as is
reasonable. Although not to the extreme that this makes the protocol
@@ -68,32 +66,30 @@ include removing redundancy from the serialized structure
this kind of redundancy removal, but an encoder implementation can
choose to which extent it makes use of the technique.
-=item Speed Efficiencies
+=head3 Speed Efficiencies
We want to be able to serialize and deserialize quickly. Some of the design
decisions and trade-offs were aimed squarely at performance.
-=item Separate Decoder and Encoder
+=head3 Separate Decoder and Encoder
We wanted to separate the functions of serializing from deserializing
so they could be upgraded independently.
-=item Forward/Backward Compatibility
+=head3 Forward/Backward Compatibility
We wanted the protocol to be robust to forward/backwards compatibility
issues. It should be possible to partially read new formats with an
old decoder, and output old formats with a new encoder.
-=item Language Agnosticism
+=head3 Language Agnosticism
We want the format to be usable by other languages, especially dynamic
languages. We hope to have a Java port soon, right Eric? With the aim of making
this easier we have structured our repo so that implementations from other
languages can be easily added, and we would welcome any contributions
along these lines.
-=back
-
=head2 Performance Analysis
There are some graphs of how the Perl implementation of Sereal performs as
@@ -17,17 +17,15 @@ up of two parts, the header and the body.
=head2 General Points
-=over 4
-
-=item Little Endian
+=head3 Little Endian
All numeric data is in little endian format.
-=item IEEE Floats
+=head3 IEEE Floats
Floating point types are in IEEE 754 format.
-=item Varints
+=head3 Varints
Heavy use is made of a variable length integer encoding commonly called
a "varint" (Google calls it a Varint128). This encoding uses the high bit
@@ -38,23 +36,19 @@ next etc.
See L<Google's description|https://developers.google.com/protocol-buffers/docs/encoding#varints>.
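To make the varint description concrete, here is a minimal sketch of the encoding as Google's Protocol Buffers documentation describes it: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. It is written in Python only so that it is self-contained; the function names `encode_varint` and `decode_varint` are illustrative and are not part of any Sereal implementation.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a varint (7 bits per byte, low group first)."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest seven bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: this is the last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

For example, `encode_varint(300)` yields the two bytes `0xAC 0x02`, and `decode_varint` applied to those bytes returns the value 300 together with the position of the next unread byte.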
-=back
-
=head2 Header Format
A header consists of multiple components:
<MAGIC> <VERSION-TYPE> <HEADER-SUFFIX-SIZE> <OPT-SUFFIX>
-=over 4
-
-=item MAGIC
+=head3 MAGIC
A "magic string" that identifies a document as being in the Sereal format.
The value of this string is "=srl", and when decoded as an unsigned 32 bit
integer on a little endian machine has a value of 0x6c72733d.
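The relationship between the four magic bytes and the integer value quoted above can be checked with a short sketch. It is in Python purely for illustration; `has_sereal_magic` is a hypothetical helper name, not a function from any Sereal library.

```python
import struct

MAGIC = b"=srl"  # the four bytes that open every Sereal document

# Interpret the four bytes as one unsigned 32-bit little-endian integer.
MAGIC_AS_U32 = struct.unpack("<I", MAGIC)[0]


def has_sereal_magic(buf: bytes) -> bool:
    """Return True if buf starts with the Sereal magic string."""
    return buf[:4] == MAGIC
```

Here `hex(MAGIC_AS_U32)` gives `0x6c72733d`: the first byte `=` (0x3d) ends up in the least significant position because the integer is read as little endian.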
-=item VERSION-TYPE
+=head3 VERSION-TYPE
A single byte, of which the high 4 bits are used to represent the "type"
of the document, and the low 4 bits used to represent the version of the
@@ -80,7 +74,7 @@ Compressed Sereal format, using Google's Snappy compression internally.
Additional compression types are envisaged and will be assigned type
numbers by the maintainers of the protocol.
-=item HEADER-SUFFIX-SIZE
+=head3 HEADER-SUFFIX-SIZE
The structure of the header includes support for embedding additional data.
This is accomplished by specifying the length of the suffix
@@ -89,14 +83,12 @@ binary 0. This is intended for future format extensions that retain some
level of compatibility for old decoders (which know how to skip the
extended header due to the embedded length).
-=item OPT-SUFFIX
+=head3 OPT-SUFFIX
The suffix may contain whatever data the encoder wishes to embed in the
header. In version 1 of the protocol the decoder will never look inside
this data. Later versions may introduce new rules for this field.
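Putting the four header components together, a decoder's first step might look like the sketch below. It is a hedged illustration in Python, not the reference implementation: the names `parse_header` and `_read_varint` are made up here, and it assumes that HEADER-SUFFIX-SIZE is stored as a varint, which the excerpt shown above does not spell out. The suffix bytes are returned untouched, since a version 1 decoder never looks inside them.

```python
from typing import Dict, Tuple

MAGIC = b"=srl"


def _read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a varint at buf[pos]; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_header(buf: bytes) -> Dict[str, object]:
    """Split a document into header fields and the offset where the body starts."""
    if buf[:4] != MAGIC:
        raise ValueError("not a Sereal document: bad magic string")
    if len(buf) < 5:
        raise ValueError("truncated header: missing VERSION-TYPE byte")
    version_type = buf[4]
    doc_type = version_type >> 4    # high 4 bits: document type
    version = version_type & 0x0F   # low 4 bits: protocol version
    # Assumption: the suffix length is a varint (0 means "no suffix").
    suffix_size, pos = _read_varint(buf, 5)
    suffix = buf[pos:pos + suffix_size]
    if len(suffix) != suffix_size:
        raise ValueError("truncated header: suffix shorter than declared")
    return {
        "doc_type": doc_type,
        "version": version,
        "suffix": suffix,                  # opaque; skipped, not interpreted
        "body_offset": pos + suffix_size,  # first byte of the body
    }
```

With the made-up input `b"=srl\x11\x03abcBODY"`, this returns type 1, version 1, a three-byte suffix `b"abc"`, and a body offset of 9. Because the suffix length is embedded, an old decoder can skip an extended header without understanding it.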
-=back
-
=head2 Body Format
The body is made up of tagged data items:
@@ -107,9 +99,7 @@ Tagged items can be containers that hold other tagged items.
At the top level, the body holds only ONE tagged item (often
an array or hash) that holds others.
-=over 4
-
-=item TAG
+=head3 TAG
A tag is a single byte which specifies the type of the data being decoded.
@@ -124,14 +114,12 @@ Some tags, such as POS, NEG and SHORT_BINARY contain embedded in them
either the data (in the case of POS and NEG) or the length of the
OPT-DATA section (in the case of SHORT_BINARY).
-=item OPT-DATA
+=head3 OPT-DATA
This field may contain an arbitrary set of bytes, either determined
implicitly by the tag (such as for FLOAT), explicitly in the tag (as in
SHORT_BINARY) or in a varint following the tag (such as for STRING).
-=back
-
When referring to an offset below, what's meant is a varint encoded
absolute integer byte position. That is, an offset of 10 refers to the
tenth byte in the Sereal document (including its header).
@@ -274,7 +262,7 @@ tenth byte in the Sereal document (including its header).
=for autoupdater stop
-=head4 The Track Bit And Cyclic Data Structures
+=head3 The Track Bit And Cyclic Data Structures
The protocol uses a combination of the offset of a tracked tag and the
flag bit to be able to encode and reconstruct cyclic structures in a single
@@ -290,7 +278,7 @@ that tag. At a later point in the packet there will be an ALIAS or REFP
instruction which will refer to the item by its offset, and the decoder
will reuse it as needed.
-=head4 The COPY Tag
+=head3 The COPY Tag
Sometimes it is convenient to be able to reuse a previously emitted
sequence in the packet to reduce duplication. For instance a data
@@ -307,7 +295,7 @@ forbidden from referring to anything containing a COPY tag, with the
exception that a COPY tag used as a value may refer to a tag that uses
a COPY tag for a classname or hash key.
-=head4 String Types
+=head3 String Types
Sereal supports three string representations. Two are "encodingless" and
are SHORT_BINARY and BINARY, where binary means "raw bytes". The other
@@ -320,12 +308,12 @@ SHORT_BINARY stores the length of the string in the tag itself and is used
for strings less than 32 bytes long. Both BINARY and STR_UTF8
use a varint to indicate the number of B<bytes> (octets) in the string.
-=head4 Hash Keys
+=head3 Hash Keys
Hash keys are always one of the string types, or a COPY tag referencing a
string.
-=head4 Handling objects
+=head3 Handling objects
Objects are serialized as a class name and a tag which represents the
object's data. In Perl land this will always be a reference. Mapping perl
