This repository has been archived by the owner on Jun 15, 2020. It is now read-only.

doc(src) Add documentation, and README.md.
The `README.md` file is generated with `cargo-readme`.
Hywan committed Apr 25, 2018
1 parent c251f11 commit c6fa94b
Showing 10 changed files with 482 additions and 21 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -2,5 +2,5 @@
/Cargo.lock
/bindings/wasm/node_modules/
/bindings/wasm/package-lock.json
/bindings/wasm/parser.wasm
/bindings/wasm/parser.wasm.gz
/bindings/wasm/gutenberg_post_parser.wasm
/bindings/wasm/gutenberg_post_parser.wasm.gz
7 changes: 6 additions & 1 deletion Cargo.toml
@@ -1,10 +1,15 @@
[package]
name = "parser"
name = "gutenberg_post_parser"
version = "0.1.0"
authors = ["Ivan Enderlin <ivan.enderlin@hoa-project.net>"]
license = "BSD-3-Clause"
readme = "./README.md"
repository = "https://github.com/Hywan/gutenberg-parser-rs"

[lib]
name = "gutenberg_post_parser"
crate-type = ["lib", "cdylib"]
path = "src/lib.rs"

[profile.release]
debug = false
93 changes: 93 additions & 0 deletions README.md
@@ -0,0 +1,93 @@
## The Gutenberg post parser.

[Gutenberg] is a new post editor for the [WordPress] ecosystem. A post
has always been HTML, and it continues to be. The difference is that
the HTML is annotated. As in most annotation languages, the annotations
are located in comments, like this:

```html
<h1>Famous post</h1>

<!-- wp:component {attributes: "as JSON"} -->
lorem ipsum
<!-- /wp:component -->
```

The parser analyses a post and generates an Abstract Syntax Tree (AST) of it.
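
As a rough illustration, the annotated block above could end up as a value like the one below. This is a self-contained sketch, not the crate's API: the `Block` struct mirrors the one defined in `src/ast.rs`, the `core` namespace for an unqualified name follows the doc example in `src/lib.rs`, and `example_block` and `describe` are hypothetical helpers written for this example only. How the free text (`lorem ipsum`) is handled is not shown.

```rust
/// Mirrors the `Block` struct of `src/ast.rs`; every field borrows from the input.
#[derive(Debug, PartialEq)]
struct Block<'a> {
    /// `(namespace, block name)`.
    name: (&'a [u8], &'a [u8]),
    /// Raw JSON attributes, left undecoded.
    attributes: Option<&'a [u8]>,
    /// Nested blocks, if any.
    inner_blocks: Vec<Block<'a>>,
}

/// Hypothetical helper: builds by hand the block a parser might produce
/// for `<!-- wp:component {attributes: "as JSON"} -->`.
fn example_block() -> Block<'static> {
    Block {
        name: (&b"core"[..], &b"component"[..]),
        attributes: Some(&b"{attributes: \"as JSON\"}"[..]),
        inner_blocks: vec![],
    }
}

/// Hypothetical helper: renders the fully-qualified name as `namespace/name`.
fn describe(block: &Block) -> String {
    format!(
        "{}/{}",
        String::from_utf8_lossy(block.name.0),
        String::from_utf8_lossy(block.name.1)
    )
}
```

Calling `describe(&example_block())` gives the fully-qualified name of the block; the attributes stay a raw byte slice until a consumer decides to decode the JSON.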

### Platforms and bindings

The parser aims to be used on different platforms, such as the Web
(within multiple browsers), Web applications like [Electron], and
native applications on macOS, iOS, Windows, Linux, etc.

Thus, the parser can be compiled as a static library, embedded in any
Rust project, or compiled to [WebAssembly], with more targets to come.

This project uses [Justfile] as an alternative to Makefile. Every
command below uses `just`, so consider installing it. To list all
the available commands, run `just --list`.

#### Static library

To compile the parser to a static library, run:

```sh
$ just build-library
$ ls target/release/
```

#### WebAssembly

To compile the parser to a [WebAssembly] file, run:

```sh
$ just build-wasm
$ open bindings/wasm/index.html # for a demonstration
```

### Performance and guarantee

The parser guarantees that it never copies the input data in memory,
which makes it fast and memory efficient.
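
One way to picture this guarantee: every piece of the output is a sub-slice of the input buffer. The sketch below is self-contained and only illustrates the idea; `split_name` and `is_borrowed_from` are hypothetical helpers, not functions exposed by the parser.

```rust
/// Hypothetical helper (not part of the crate): splits `namespace/name`
/// into two sub-slices of the input, without allocating or copying.
fn split_name(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Find the first `/`; bail out with `None` when there is none.
    let slash = input.iter().position(|&byte| byte == b'/')?;

    Some((&input[..slash], &input[slash + 1..]))
}

/// Returns `true` when `part` lies entirely inside the memory of `whole`,
/// i.e. when `part` is a view into `whole` rather than a copy.
fn is_borrowed_from(part: &[u8], whole: &[u8]) -> bool {
    let whole_range = whole.as_ptr_range();
    let part_range = part.as_ptr_range();

    whole_range.start <= part_range.start && part_range.end <= whole_range.end
}
```

Because the returned slices share the lifetime of the input, the borrow checker also prevents them from outliving the buffer they point into.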

### License

The license is a classic `BSD-3-Clause`:

> New BSD License
>
> Copyright ©, Ivan Enderlin. All rights reserved.
>
> Redistribution and use in source and binary forms, with or without
> modification, are permitted provided that the following conditions are met:
>
> * Redistributions of source code must retain the above copyright
> notice, this list of conditions and the following disclaimer.
>
> * Redistributions in binary form must reproduce the above copyright
> notice, this list of conditions and the following disclaimer in the
> documentation and/or other materials provided with the distribution.
>
> * Neither the name of this project nor the names of its contributors may be
> used to endorse or promote products derived from this software without
> specific prior written permission.
>
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS AND CONTRIBUTORS BE
> LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> POSSIBILITY OF SUCH DAMAGE.

[Gutenberg]: https://github.com/WordPress/gutenberg/
[WordPress]: https://wordpress.org/
[Electron]: https://github.com/electron/
[Justfile]: https://github.com/casey/just/
[WebAssembly]: http://webassembly.org/

1 change: 1 addition & 0 deletions README.tpl
@@ -0,0 +1 @@
{{readme}}
14 changes: 7 additions & 7 deletions justfile
@@ -4,23 +4,23 @@ wasm_directory = "bindings/wasm"
build-library:
    cargo +nightly build --release

# Test the parser only (i.e. not the bindings to external languages).
# Test the parser only (i.e. not the bindings to external languages) and its documentation.
test-library:
    cargo +nightly test

# Build the documentation.
build-doc:
    cargo +nightly doc --release --all-features
    cargo +nightly doc --release --package gutenberg_post_parser

# Build the parser and the WASM binding.
build-wasm:
    cargo +nightly build --release --features "wasm" --target wasm32-unknown-unknown
    cp target/wasm32-unknown-unknown/release/parser.wasm {{wasm_directory}}
    cp target/wasm32-unknown-unknown/release/gutenberg_post_parser.wasm {{wasm_directory}}
    cd {{wasm_directory}} && \
        wasm-gc parser.wasm && \
        wasm-opt -Oz -o parser_opt.wasm parser.wasm && \
        mv parser_opt.wasm parser.wasm && \
        gzip --best --stdout parser.wasm > parser.wasm.gz
        wasm-gc gutenberg_post_parser.wasm && \
        wasm-opt -Oz -o gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm && \
        mv gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm && \
        gzip --best --stdout gutenberg_post_parser.wasm > gutenberg_post_parser.wasm.gz

# Pack the WASM binding and run an HTTP server to try it.
run-wasm: build-wasm
16 changes: 16 additions & 0 deletions src/ast.rs
@@ -1,10 +1,26 @@
/*!
The Abstract Syntax Tree (AST), i.e. the output of the parser.
*/

use super::Input;
#[cfg(feature = "wasm")] use alloc::Vec;

/// A block is the elementary component of the post format.
#[derive(PartialEq)]
#[cfg_attr(not(feature = "wasm"), derive(Debug))]
pub struct Block<'a> {
    /// The fully-qualified block name, where the left part of the
    /// pair represents the namespace, and the right part of the pair
    /// represents the block name.
    pub name: (Input<'a>, Input<'a>),

    /// A block can have attributes, just like an HTML element can
    /// have attributes. Attributes are encoded as a JSON string.
    pub attributes: Option<Input<'a>>,

    /// A block can have inner blocks, just like an HTML element can
    /// have inner HTML elements.
    pub inner_blocks: Vec<Block<'a>>
}
10 changes: 10 additions & 0 deletions src/combinators.rs
@@ -1,5 +1,15 @@
/*!
Additional combinators specifically tailored for this parser.
Warning: It's likely the combinators are public to the crate only, and
thus can be absent from the public documentation.
*/

/// `take_till_terminated(S, C)` is like `take_till` but with a lookahead
/// combinator `C`.
#[macro_export]
macro_rules! take_till_terminated (
($input:expr, $substr:expr, $submac:ident!( $($args:tt)* )) => (
{
153 changes: 150 additions & 3 deletions src/lib.rs
@@ -1,3 +1,101 @@
/*!
# The Gutenberg post parser.

[Gutenberg] is a new post editor for the [WordPress] ecosystem. A post
has always been HTML, and it continues to be. The difference is that
the HTML is annotated. As in most annotation languages, the annotations
are located in comments, like this:

```html
<h1>Famous post</h1>

<!-- wp:component {attributes: "as JSON"} -->
lorem ipsum
<!-- /wp:component -->
```

The parser analyses a post and generates an Abstract Syntax Tree (AST) of it.

## Platforms and bindings

The parser aims to be used on different platforms, such as the Web
(within multiple browsers), Web applications like [Electron], and
native applications on macOS, iOS, Windows, Linux, etc.

Thus, the parser can be compiled as a static library, embedded in any
Rust project, or compiled to [WebAssembly], with more targets to come.

This project uses [Justfile] as an alternative to Makefile. Every
command below uses `just`, so consider installing it. To list all
the available commands, run `just --list`.

### Static library

To compile the parser to a static library, run:

```sh
$ just build-library
$ ls target/release/
```

### WebAssembly

To compile the parser to a [WebAssembly] file, run:

```sh
$ just build-wasm
$ open bindings/wasm/index.html # for a demonstration
```

## Performance and guarantee

The parser guarantees that it never copies the input data in memory,
which makes it fast and memory efficient.

## License

The license is a classic `BSD-3-Clause`:

> New BSD License
>
> Copyright ©, Ivan Enderlin. All rights reserved.
>
> Redistribution and use in source and binary forms, with or without
> modification, are permitted provided that the following conditions are met:
>
> * Redistributions of source code must retain the above copyright
> notice, this list of conditions and the following disclaimer.
>
> * Redistributions in binary form must reproduce the above copyright
> notice, this list of conditions and the following disclaimer in the
> documentation and/or other materials provided with the distribution.
>
> * Neither the name of this project nor the names of its contributors may be
> used to endorse or promote products derived from this software without
> specific prior written permission.
>
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS AND CONTRIBUTORS BE
> LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> POSSIBILITY OF SUCH DAMAGE.

[Gutenberg]: https://github.com/WordPress/gutenberg/
[WordPress]: https://wordpress.org/
[Electron]: https://github.com/electron/
[Justfile]: https://github.com/casey/just/
[WebAssembly]: http://webassembly.org/
*/


#![cfg_attr(feature = "wasm", no_std)]
#![
cfg_attr(
@@ -24,24 +122,73 @@
use alloc::Vec;


// Export modules.
pub mod ast;
#[macro_use] pub mod combinators;
pub mod parser;
#[cfg(feature = "wasm")] pub mod wasm;


// Configure `wee_alloc`.
#[cfg(feature = "wasm")]
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;


/// Represent the type of the input elements.
/// Represents the type of a parser input element. See
/// [`Input`](./type.Input.html) for more information.
pub type InputElement = u8;

/// Represent the type of the input.
/// Represents the type of a parser input.
///

/// The parser does not analyse a `String` nor a `&str`, but a slice
/// of bytes `&[u8]`. One consequence is that there is no UTF-8
/// validation (Rust guarantees that all strings are valid UTF-8
/// data). There are many arguments for this decision; one of them is
/// that the post format is likely to contain JSON-encoded data, and
/// JSON has a weird encoding format for strings (e.g. surrogate
/// pairs), which might not be compatible with UTF-8. Other arguments
/// are mostly related to memory efficiency.
pub type Input<'a> = &'a [InputElement];

/// Test
/// The `root` function represents the axiom of the grammar, i.e. the top rule.
///
/// This is the main function to call to parse a traditional post.
///
/// # Examples
///
/// In this example, one might notice that the output is a pair, where
/// the left side contains the remaining data (i.e. data that have not
/// been parsed, because the parser has stopped), and the right side
/// contains the Abstract Syntax Tree (AST).
///
/// The left side should ideally always be empty.
///
/// ```
/// extern crate gutenberg_post_parser;
///
/// use gutenberg_post_parser::{root, ast::Block};
///
/// let input = &b"<!-- wp:foo {\"bar\": true} /-->"[..];
/// let output = Ok(
///     (
///         // The remaining data.
///         &b""[..],
///
///         // The Abstract Syntax Tree.
///         vec![
///             Block {
///                 name: (&b"core"[..], &b"foo"[..]),
///                 attributes: Some(&b"{\"bar\": true}"[..]),
///                 inner_blocks: vec![]
///             }
///         ]
///     )
/// );
///
/// assert_eq!(root(input), output);
/// ```
pub fn root(input: Input) -> Result<(Input, Vec<ast::Block>), nom::Err<Input>> {
    parser::block_list(input)
}
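
The doc comment on `Input` explains why the parser takes `&[u8]` rather than `&str`. The sketch below illustrates that trade-off in isolation. It is self-contained and makes its own assumptions: the `Input` alias is reproduced from `src/lib.rs`, while `as_text` and `starts_with_comment` are hypothetical helpers that the crate does not provide.

```rust
/// Same alias as in `src/lib.rs`, reproduced so the sketch stands alone.
type Input<'a> = &'a [u8];

/// Hypothetical helper: validates UTF-8 lazily, only when text is needed.
/// The returned `&str` borrows from `input`; nothing is copied.
fn as_text(input: Input) -> Option<&str> {
    std::str::from_utf8(input).ok()
}

/// Hypothetical helper: a byte slice can be inspected without any
/// UTF-8 validation at all.
fn starts_with_comment(input: Input) -> bool {
    input.starts_with(b"<!--")
}
```

A consumer that needs text can call `as_text` on the slices it cares about, whereas bytes that are not valid UTF-8 can still flow through helpers such as `starts_with_comment` untouched.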
