Use less =item in POD in favor of =head*
By using =item for long sections we don't get those things included in
TOC's on e.g. metacpan.org. Just make them =head* instead, mostly
=head3's.

I'm fairly sure I got the intended heading level in sereal_spec.pod
right, but it's possible that I didn't.
avar committed Jan 7, 2013
1 parent 4034fcb commit 076d0e8
Showing 4 changed files with 39 additions and 66 deletions.
14 changes: 5 additions & 9 deletions Perl/Decoder/lib/Sereal/Decoder.pm
@@ -84,25 +84,23 @@ Constructor. Optionally takes a hash reference as first parameter. This hash
reference may contain any number of options that influence the behaviour of the
encoder. These options are currently valid:
-=over 2
-=item refuse_snappy
+=head3 refuse_snappy
If set, the decoder will refuse Snappy-compressed input data. This can be
desirable for robustness. See the section C<ROBUSTNESS> below.
-=item refuse_objects
+=head3 refuse_objects
If set, the decoder will refuse deserializing any objects in the input stream and
instead throw an exception. Defaults to off. See the section C<ROBUSTNESS> below.
-=item validate_utf8
+=head3 validate_utf8
If set, the decoder will refuse invalid UTF-8 byte sequences. This is off
by default, but it's strongly encouraged to be turned on if you're dealing
with any data that has been encoded by an external source (e.g. http cookies).
-=item max_recursion_depth
+=head3 max_recursion_depth
C<Sereal::Decoder> is recursive. If you pass it a Sereal document that is deeply
nested, it will eventually exhaust the C stack. Therefore, there is a limit on
@@ -113,7 +111,7 @@ Beware that setting it too high can cause hard crashes.
Do note that the setting is somewhat approximate. Setting it to 10000 may break at
somewhere between 9997 and 10003 nested structures depending on their types.
-=item max_num_hash_entries
+=head3 max_num_hash_entries
If set to a non-zero value (default: 0), then C<Sereal::Decoder> will refuse
to deserialize any hash/dictionary (or hash-based object) with more than
@@ -122,8 +120,6 @@ hash-collision attacks on Perl's hash function. Chances are, you don't want
or need this. For a gentle introduction to the topic from the cryptographic
point of view, see L<http://en.wikipedia.org/wiki/Collision_attack>.
-=back
=head1 INSTANCE METHODS
=head2 decode
27 changes: 10 additions & 17 deletions Perl/Encoder/lib/Sereal/Encoder.pm
@@ -69,12 +69,7 @@ Constructor. Optionally takes a hash reference as first parameter. This hash
reference may contain any number of options that influence the behaviour of the
encoder. Currently, the following options are recognized:
-=for comments
-Using =head3 instead of =item would be preferable so they can be linked to and would appear in the table of contents
-=over 2
-=item no_shared_hashkeys
+=head3 no_shared_hashkeys
When the C<no_shared_hashkeys> option is set to a true value, then
the encoder will disable the detection and elimination of repeated hash
@@ -83,7 +78,7 @@ By skipping the detection of repeated hash keys, performance goes up a bit,
but the size of the output can potentially be much larger.
Do not disable this unless you have a reason to.
-=item snappy
+=head3 snappy
If set, the main payload of the Sereal document will be compressed using
Google's Snappy algorithm. This can yield anywhere from no effect
@@ -96,19 +91,19 @@ Sereal documents transparently.
B<NOTE 1:> Do not use this if you want to parse multiple Sereal packets
from the same buffer. Use C<snappy_incr> instead.
-=item snappy_incr
+=head3 snappy_incr
Enables a version of the snappy protocol which is suitable for incremental
parsing of packets. See also the C<snappy> option above for more details.
-=item snappy_threshold
+=head3 snappy_threshold
The size threshold (in bytes) of the uncompressed output below which
snappy compression is not even attempted even if enabled.
Defaults to one kilobyte (1024 bytes). Set this to 0 and enable C<snappy>
to always compress.
-=item croak_on_bless
+=head3 croak_on_bless
If this option is set, then the encoder will refuse to serialize blessed
references and throw an exception instead.
@@ -117,15 +112,15 @@ This can be important because blessed references can mean executing
a destructor on a remote system or generally executing code based on
data.
-=item undef_unknown
+=head3 undef_unknown
If set, unknown/unsupported data structures will be encoded as C<undef>
instead of throwing an exception.
Mutually exclusive with C<stringify_unknown>.
See also C<warn_unknown> below.
-=item stringify_unknown
+=head3 stringify_unknown
If set, unknown/unsupported data structures will be stringified and
encoded as that string instead of throwing an exception. The
@@ -134,7 +129,7 @@ stringification may cause a warning to be emitted by perl.
Mutually exclusive with C<undef_unknown>.
See also C<warn_unknown> below.
-=item warn_unknown
+=head3 warn_unknown
Only has an effect if C<undef_unknown> or C<stringify_unknown>
are enabled.
@@ -146,7 +141,7 @@ data structures just the same as for a positive value with one
exception: For blessed, unsupported items that have string overloading,
we silently stringify without warning.
-=item max_recursion_depth
+=head3 max_recursion_depth
C<Sereal::Encoder> is recursive. If you pass it a Perl data structure
that is deeply nested, it will eventually exhaust the C stack. Therefore,
@@ -159,7 +154,7 @@ do so.
Do note that the setting is somewhat approximate. Setting it to 10000 may break at
somewhere between 9997 and 10003 nested structures depending on their types.
-=item sort_keys
+=head3 sort_keys
Normally C<Sereal::Encoder> will output hashes in whatever order is convenient,
generally that used by perl to actually store the hash, or whatever order
@@ -174,8 +169,6 @@ variables on use, and some of its rules are a little arcane (for instance utf8
keys), and so two hashes that might appear to be the same might still produce
different output as far as Sereal is concerned.
-=back
The thusly allocated encoder object and its output buffer will be reused
between invocations of C<encode()>, so hold on to it for an efficiency
gain if you plan to serialize multiple similar data structures, but destroy
24 changes: 10 additions & 14 deletions README.pod
@@ -21,14 +21,12 @@ the other projects.

=head2 OBJECTIVES

-=over 4

-=item References
+=head3 References

We wanted to be able to serialize shared references properly. Many
serialization formats do not support this out of the box.

-=item Weak References
+=head3 Weak References

Given that perl uses a reference counting garbage collection scheme,
Perl has the concept of a special type of reference called a
@@ -39,26 +37,26 @@ to be converted to one that will cause a memory leak on a remote system.
For cross-language compatibility, weak references can very easily
be ignored by other decoder implementations.

-=item Aliases
+=head3 Aliases

Perl supports aliases. These are a special kind of reference which is
effectively a C level pointer instead of a Perl language-level
reference. We needed to be able to represent these as well.

-=item Objects
+=head3 Objects

Promoting a plain data structure reference to an object, as is customary
in dynamic languages, can be dangerous in some circumstances. We needed
to be able to serialize objects safely and reliably, and we wanted a
sane control mechanism for doing so.

-=item Regular Expression Objects
+=head3 Regular Expression Objects

In Perl, a regular expression is a native type. We wanted to be
able to serialize these at a native level without losing data
such as modifiers.

-=item Space Efficiencies
+=head3 Space Efficiencies

We want to be able to represent common structures as small as is
reasonable. Although not to the extreme that this makes the protocol
@@ -68,32 +66,30 @@ include removing redundancy from the serialized structure
this kind of redundancy removal, but an encoder implementation can
choose to which extent it makes use of the technique.

-=item Speed Efficiencies
+=head3 Speed Efficiencies

We want to be able to serialize and deserialize quickly. Some of the design
decisions and trade-offs were aimed squarely at performance.

-=item Separate Decoder and Encoder
+=head3 Separate Decoder and Encoder

We wanted to separate the functions of serializing from deserializing
so they could be upgraded independently.

-=item Forward/Backward Compatibility
+=head3 Forward/Backward Compatibility

We wanted the protocol to be robust to forward/backwards compatibility
issues. It should be possible to partially read new formats with an
old decoder, and output old formats with a new encoder.

-=item Language Agnosticism
+=head3 Language Agnosticism

We want the format to be usable by other languages, especially dynamic
languages. We hope to have a Java port soon, right Eric? In aim of making
this easier we have structured our repo so that implementations from other
languages can be easily added, and we would welcome any contributions
along these lines.

-=back

=head2 Performance Analysis

There are some graphs of how the Perl implementations Sereal performs as
40 changes: 14 additions & 26 deletions sereal_spec.pod
@@ -17,17 +17,15 @@ up of two parts, the header and the body.

=head2 General Points

-=over 4

-=item Little Endian
+=head3 Little Endian

All numeric data is in little endian format.

-=item IEEE Floats
+=head3 IEEE Floats

Floating point types are in IEEE format.

-=item Varints
+=head3 Varints

Heavy use is made of a variable length integer encoding commonly called
a "varint" (Google calls it a Varint128). This encoding uses the high bit
@@ -38,23 +36,19 @@ next etc.

See L<Google's description|https://developers.google.com/protocol-buffers/docs/encoding#varints>.
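
The varint scheme described above can be sketched in a few lines. This is an illustrative Python sketch, not code from the Sereal repository: the function names are made up for the example, and it assumes the same layout as the linked Google description (7 payload bits per byte, least significant group first, high bit set on every byte except the last).

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear marks the final byte
            return value, pos
        shift += 7
```

Returning the next position alongside the value lets a caller keep reading from the same buffer without slicing it.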

-=back

=head2 Header Format

A header consists of multiple components:

<MAGIC> <VERSION-TYPE> <HEADER-SUFFIX-SIZE> <OPT-SUFFIX>

-=over 4

-=item MAGIC
+=head3 MAGIC

A "magic string" that identifies a document as being in the Sereal format.
The value of this string is "=srl", and when decoded as an unsigned 32 bit
integer on a little endian machine has a value of 0x6c72733d.
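
The stated integer value can be checked directly. The following Python sketch is only an illustration (the helper name `has_sereal_magic` is hypothetical); it uses the standard `struct` module to read the four magic bytes as a little-endian unsigned 32-bit integer.

```python
import struct

MAGIC = b"=srl"

# "<I" means little-endian unsigned 32-bit integer.
magic_as_u32 = struct.unpack("<I", MAGIC)[0]


def has_sereal_magic(data: bytes) -> bool:
    """Return True if data begins with the Sereal magic string."""
    return data[:4] == MAGIC
```

Comparing the raw bytes, as `has_sereal_magic` does, avoids any dependence on the byte order of the machine doing the check.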

-=item VERSION-TYPE
+=head3 VERSION-TYPE

A single byte, of which the high 4 bits are used to represent the "type"
of the document, and the low 4 bits used to represent the version of the
@@ -80,7 +74,7 @@ Compressed Sereal format, using Google's Snappy compression internally.
Additional compression types are envisaged and will be assigned type
numbers by the maintainers of the protocol.
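
Splitting the VERSION-TYPE byte into its two halves is a pair of bit operations. A minimal Python sketch, with a hypothetical function name; the concrete type numbers are elided from this hunk, so the value in the comment is purely illustrative.

```python
from typing import Tuple


def split_version_type(byte: int) -> Tuple[int, int]:
    """Return (doc_type, version) from a VERSION-TYPE byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("VERSION-TYPE is a single byte")
    doc_type = byte >> 4     # high 4 bits: document type
    version = byte & 0x0F    # low 4 bits: protocol version
    return doc_type, version


# Illustrative input only: 0x21 splits into type 2, version 1.
```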

-=item HEADER-SUFFIX-SIZE
+=head3 HEADER-SUFFIX-SIZE

The structure of the header includes support for embedding additional data.
This is accomplished by specifying the length of the suffix
@@ -89,14 +83,12 @@ binary 0. This is intended for future format extensions that retain some
level of compatibility for old decoders (which know how to skip the
extended header due to the embedded length).

-=item OPT-SUFFIX
+=head3 OPT-SUFFIX

The suffix may contain whatever data the encoder wishes to embed in the
header. In version 1 of the protocol the decoder will never look inside
this data. Later versions may introduce new rules for this field.
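
Putting the four header components together, a decoder can skip an unknown suffix using only its embedded length. The Python sketch below is an assumption-laden illustration, not the reference decoder: `Header`, `read_varint` and `parse_header` are hypothetical names, and it assumes HEADER-SUFFIX-SIZE is stored as a varint, a detail the diff context above elides.

```python
from typing import NamedTuple, Tuple

MAGIC = b"=srl"


class Header(NamedTuple):
    doc_type: int      # high 4 bits of VERSION-TYPE
    version: int       # low 4 bits of VERSION-TYPE
    suffix: bytes      # OPT-SUFFIX, left uninterpreted
    body_offset: int   # index of the first body byte


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_header(data: bytes) -> Header:
    """Parse <MAGIC> <VERSION-TYPE> <HEADER-SUFFIX-SIZE> <OPT-SUFFIX>."""
    if data[:4] != MAGIC:
        raise ValueError("not a Sereal document: bad magic")
    version_type = data[4]
    # Assumption: the suffix size is a varint.
    suffix_size, pos = read_varint(data, 5)
    suffix = data[pos:pos + suffix_size]
    if len(suffix) != suffix_size:
        raise ValueError("truncated header suffix")
    return Header(version_type >> 4, version_type & 0x0F,
                  suffix, pos + suffix_size)
```

The suffix is returned untouched, matching the rule that a version 1 decoder never looks inside it.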

-=back

=head2 Body Format

The body is made up of tagged data items:
@@ -107,9 +99,7 @@ Tagged items can be containers that hold other tagged items.
At the top level, the body holds only ONE tagged item (often
an array or hash) that holds others.

-=over 4

-=item TAG
+=head3 TAG

A tag is a single byte which specifies the type of the data being decoded.

@@ -124,14 +114,12 @@ Some tags, such as POS, NEG and SHORT_BINARY contain embedded in them
either the data (in the case of POS and NEG) or the length of the
OPT-DATA section (in the case of SHORT_BINARY).

-=item OPT-DATA
+=head3 OPT-DATA

This field may contain an arbitrary set of bytes, either determined
implicitly by the tag (such as for FLOAT), explicitly in the tag (as in
SHORT_BINARY) or in a varint following the tag (such as for STRING).

-=back

When referring to an offset below, what's meant is a varint encoded
absolute integer byte position. That is, an offset of 10 refers to the
tenth byte in the Sereal document (including its header).
@@ -274,7 +262,7 @@ tenth byte in the Sereal document (including its header).

=for autoupdater stop

-=head4 The Track Bit And Cyclic Data Structures
+=head3 The Track Bit And Cyclic Data Structures

The protocol uses a combination of the offset of a tracked tag and the
flag bit to be able to encode and reconstruct cyclic structures in a single
@@ -290,7 +278,7 @@ that tag. At a later point in the packet there will be an ALIAS or REFP
instruction which will refer to the item by its offset, and the decoder
will reuse it as needed.

-=head4 The COPY Tag
+=head3 The COPY Tag

Sometimes it is convenient to be able to reuse a previously emitted
sequence in the packet to reduce duplication. For instance a data
@@ -307,7 +295,7 @@ forbidden from referring to anything containing a COPY tag, with the
exception that a COPY tag used as a value may refer to a tag that uses
a COPY tag for a classname or hash key.

-=head4 String Types
+=head3 String Types

Sereal supports three string representations. Two are "encodingless" and
are SHORT_BINARY and BINARY, where binary means "raw bytes". The other
@@ -320,12 +308,12 @@ SHORT_BINARY stores the length of the string in the tag itself and is used
for strings less than 32 characters long. Both BINARY and STR_UTF8
use a varint to indicate the number of B<bytes> (octets) in the string.

-=head4 Hash Keys
+=head3 Hash Keys

Hash keys are always one of the string types, or a COPY tag referencing a
string.

-=head4 Handling objects
+=head3 Handling objects

Objects are serialized as a class name and a tag which represents the
object's data. In Perl land this will always be a reference. Mapping perl