This repository has been archived by the owner on Sep 14, 2023. It is now read-only.

very slow on large argument string #195

Closed
BurntSushi opened this issue Aug 31, 2016 · 14 comments

Comments

@BurntSushi
Member

Rust program:

extern crate docopt;
extern crate rustc_serialize;

const USAGE: &'static str = "
Usage:
    bindgen [options] \
        [--link=<lib>...] \
        [--static-link=<lib>...] \
        [--framework-link=<framework>...] \
        [--match=<name>...] \
        [--raw-line=<raw>...] \
        [--dtor-attr=<attr>...] \
        [--opaque-type=<type>...] \
        [--blacklist-type=<type>...] \
        <input-header> \
        [-- <clang-args>...]
    bindgen (-h | --help)
Options:
    -h, --help                    Display this help message.
    -l=<lib>, --link=<lib>        Link to a dynamic library, can be provided
                                  multiple times.
    --static-link=<lib>           Link to a static library, can be provided
                                  multiple times.
    --framework-link=<framework>  Link to a framework.
    -o=<outputrustfile>           Write bindings to <output-rust-file>
    --match=<name>                Only output bindings for definitions from
                                  files whose name contains <name>. If multiple
                                  match options are provided, files matching any
                                  rule are bound to.
    --builtins                    Output bindings for builtin definitions
    --ignore-functions            Don't generate bindings for functions and
                                  methods. This is useful when you only care
                                  about struct layouts.
    --enable-cxx-namespaces       Enable support for C++ namespaces.
    --no-type-renaming            Don't rename types.
    --allow-unknown-types         Don't fail if we encounter types we do not
                                  support, instead treat them as void
    --emit-clang-ast              Output the ast
    --use-msvc-mangling           Handle MSVC C++ ABI mangling; requires that
                                  target be set to
    --override-enum-type=<type>   Override enum type, type name could be
                                    uchar
                                    schar
                                    ushort
                                    sshort
                                    uint
                                    sint
                                    ulong
                                    slong
                                    ulonglong
                                    slonglong
    --raw-line=<raw>              TODO
    --dtor-attr=<attr>            TODO
    --no-class-constants          TODO
    --no-unstable-rust            TODO
    --no-namespaced-constants     TODO
    --no-bitfield-methods         TODO
    --ignore-methods              TODO
    --opaque-type=<type>          TODO
    --blacklist-type=<type>       TODO
    <clang-args>                  Options other than stated above are passed
                                  directly through to clang.
";

#[derive(Debug, RustcDecodable)]
struct Args {
    arg_input_header: String,
    flag_link: Vec<String>,
    flag_static_link: Vec<String>,
    flag_framework_link: Vec<String>,
    flag_o: Option<String>,
    flag_match: Vec<String>,
    flag_builtins: bool,
    flag_ignore_functions: bool,
    flag_enable_cxx_namespaces: bool,
    flag_no_type_renaming: bool,
    flag_allow_unknown_types: bool,
    flag_emit_clang_ast: bool,
    flag_use_msvc_mangling: bool,
    flag_override_enum_type: String,
    flag_raw_line: Vec<String>,
    flag_dtor_attr: Vec<String>,
    flag_no_class_constants: bool,
    flag_no_unstable_rust: bool,
    flag_no_namespaced_constants: bool,
    flag_no_bitfield_methods: bool,
    flag_ignore_methods: bool,
    flag_opaque_type: Vec<String>,
    flag_blacklist_type: Vec<String>,
    arg_clang_args: Vec<String>,
}


fn main() {
    let args: Args = docopt::Docopt::new(USAGE)
        .and_then(|d| d.decode())
        .unwrap_or_else(|e| e.exit());
    println!("{:?}", args);
}

Argv string:

./target/release/docopt-slow --allow-unknown-types --no-unstable-rust --no-type-renaming --no-namespaced-constants --ignore-methods --raw-line use\ heapsize::HeapSizeOf\; --match ServoBindingList.h --match ServoBindings.h --match nsStyleStructList.h --raw-line pub\ enum\ nsINode\ \{\} --raw-line pub\ enum\ nsIDocument\ \{\} --raw-line pub\ enum\ nsIPrincipal\ \{\} --raw-line pub\ enum\ nsIURI\ \{\} --blacklist-type ServoComputedValuesStrong --raw-line pub\ type\ ServoComputedValuesStrong\ =\ ::sugar::ownership::Strong\<ServoComputedValues\>\; --blacklist-type ServoComputedValuesMaybeBorrowed -raw-line pub\ type\ ServoComputedValuesMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ ServoComputedValues\>\; -blacklist-type ServoComputedValues -raw-line pub\ enum\ ServoComputedValuesVoid\{\ \} -raw-line pub\ struct\ ServoComputedValues\(ServoComputedValuesVoid\)\; --blacklist-type RawServoStyleSheetStrong --raw-line pub\ type\ RawServoStyleSheetStrong\ =\ ::sugar::ownership::Strong\<RawServoStyleSheet\>\; --blacklist-type RawServoStyleSheetMaybeBorrowed -raw-line pub\ type\ RawServoStyleSheetMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ RawServoStyleSheet\>\; -blacklist-type RawServoStyleSheet -raw-line pub\ enum\ RawServoStyleSheetVoid\{\ \} -raw-line pub\ struct\ RawServoStyleSheet\(RawServoStyleSheetVoid\)\; --blacklist-type ServoDeclarationBlockStrong --raw-line pub\ type\ ServoDeclarationBlockStrong\ =\ ::sugar::ownership::Strong\<ServoDeclarationBlock\>\; --blacklist-type ServoDeclarationBlockMaybeBorrowed -raw-line pub\ type\ ServoDeclarationBlockMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ ServoDeclarationBlock\>\; -blacklist-type ServoDeclarationBlock -raw-line pub\ enum\ ServoDeclarationBlockVoid\{\ \} -raw-line pub\ struct\ ServoDeclarationBlock\(ServoDeclarationBlockVoid\)\; -blacklist-type RawGeckoNodeBorrowed --raw-line pub\ type\ RawGeckoNodeBorrowed\<\'a\>\ =\ \&\'a\ RawGeckoNode\; --blacklist-type 
RawGeckoNodeMaybeBorrowed --raw-line pub\ type\ RawGeckoNodeMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ RawGeckoNode\>\; -blacklist-type RawGeckoNode -raw-line pub\ enum\ RawGeckoNodeVoid\{\ \} -raw-line pub\ struct\ RawGeckoNode\(RawGeckoNodeVoid\)\; -blacklist-type RawGeckoElementBorrowed --raw-line pub\ type\ RawGeckoElementBorrowed\<\'a\>\ =\ \&\'a\ RawGeckoElement\; --blacklist-type RawGeckoElementMaybeBorrowed --raw-line pub\ type\ RawGeckoElementMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ RawGeckoElement\>\; -blacklist-type RawGeckoElement -raw-line pub\ enum\ RawGeckoElementVoid\{\ \} -raw-line pub\ struct\ RawGeckoElement\(RawGeckoElementVoid\)\; -blacklist-type RawGeckoDocumentBorrowed --raw-line pub\ type\ RawGeckoDocumentBorrowed\<\'a\>\ =\ \&\'a\ RawGeckoDocument\; --blacklist-type RawGeckoDocumentMaybeBorrowed --raw-line pub\ type\ RawGeckoDocumentMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ RawGeckoDocument\>\; -blacklist-type RawGeckoDocument -raw-line pub\ enum\ RawGeckoDocumentVoid\{\ \} -raw-line pub\ struct\ RawGeckoDocument\(RawGeckoDocumentVoid\)\; --blacklist-type RawServoStyleSetBorrowed --raw-line pub\ type\ RawServoStyleSetBorrowed\<\'a\>\ =\ \&\'a\ RawServoStyleSet\; --blacklist-type RawServoStyleSetBorrowedMut --raw-line pub\ type\ RawServoStyleSetBorrowedMut\<\'a\>\ =\ \&\'a\ mut\ RawServoStyleSet\; --blacklist-type RawServoStyleSetOwned --raw-line pub\ type\ RawServoStyleSetOwned\ =\ ::sugar::ownership::Owned\<RawServoStyleSet\>\; -blacklist-type RawServoStyleSet -raw-line pub\ enum\ RawServoStyleSetVoid\{\ \} -raw-line pub\ struct\ RawServoStyleSet\(RawServoStyleSetVoid\)\; --blacklist-type ServoNodeDataMaybeBorrowed --raw-line pub\ type\ ServoNodeDataMaybeBorrowed\<\'a\>\ =\ ::sugar::ownership::Borrowed\<\'a,\ ServoNodeData\>\; --blacklist-type ServoNodeDataMaybeBorrowedMut --raw-line pub\ type\ ServoNodeDataMaybeBorrowedMut\<\'a\>\ =\ ::sugar::ownership::BorrowedMut\<\'a,\ 
ServoNodeData\>\; --blacklist-type ServoNodeDataMaybeOwned --raw-line pub\ type\ ServoNodeDataMaybeOwned\ =\ ::sugar::ownership::MaybeOwned\<ServoNodeData\>\; -blacklist-type ServoNodeData -raw-line pub\ enum\ ServoNodeDataVoid\{\ \} -raw-line pub\ struct\ ServoNodeData\(ServoNodeDataVoid\)\; --blacklist-type nsStyleFont --raw-line use\ structs::nsStyleFont\; --raw-line unsafe\ impl\ Send\ for\ nsStyleFont\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleFont\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleFont\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleColor --raw-line use\ structs::nsStyleColor\; --raw-line unsafe\ impl\ Send\ for\ nsStyleColor\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleColor\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleColor\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleList --raw-line use\ structs::nsStyleList\; --raw-line unsafe\ impl\ Send\ for\ nsStyleList\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleList\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleList\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleText --raw-line use\ structs::nsStyleText\; --raw-line unsafe\ impl\ Send\ for\ nsStyleText\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleText\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleText\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleVisibility --raw-line use\ structs::nsStyleVisibility\; --raw-line unsafe\ impl\ Send\ for\ nsStyleVisibility\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleVisibility\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleVisibility\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleUserInterface --raw-line use\ structs::nsStyleUserInterface\; --raw-line unsafe\ impl\ Send\ for\ nsStyleUserInterface\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleUserInterface\ \{\} 
--raw-line impl\ HeapSizeOf\ for\ nsStyleUserInterface\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleTableBorder --raw-line use\ structs::nsStyleTableBorder\; --raw-line unsafe\ impl\ Send\ for\ nsStyleTableBorder\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleTableBorder\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleTableBorder\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleSVG --raw-line use\ structs::nsStyleSVG\; --raw-line unsafe\ impl\ Send\ for\ nsStyleSVG\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleSVG\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleSVG\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleVariables --raw-line use\ structs::nsStyleVariables\; --raw-line unsafe\ impl\ Send\ for\ nsStyleVariables\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleVariables\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleVariables\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleBackground --raw-line use\ structs::nsStyleBackground\; --raw-line unsafe\ impl\ Send\ for\ nsStyleBackground\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleBackground\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleBackground\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStylePosition --raw-line use\ structs::nsStylePosition\; --raw-line unsafe\ impl\ Send\ for\ nsStylePosition\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStylePosition\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStylePosition\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleTextReset --raw-line use\ structs::nsStyleTextReset\; --raw-line unsafe\ impl\ Send\ for\ nsStyleTextReset\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleTextReset\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleTextReset\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} 
--blacklist-type nsStyleDisplay --raw-line use\ structs::nsStyleDisplay\; --raw-line unsafe\ impl\ Send\ for\ nsStyleDisplay\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleDisplay\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleDisplay\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleContent --raw-line use\ structs::nsStyleContent\; --raw-line unsafe\ impl\ Send\ for\ nsStyleContent\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleContent\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleContent\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleUIReset --raw-line use\ structs::nsStyleUIReset\; --raw-line unsafe\ impl\ Send\ for\ nsStyleUIReset\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleUIReset\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleUIReset\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleTable --raw-line use\ structs::nsStyleTable\; --raw-line unsafe\ impl\ Send\ for\ nsStyleTable\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleTable\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleTable\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleMargin --raw-line use\ structs::nsStyleMargin\; --raw-line unsafe\ impl\ Send\ for\ nsStyleMargin\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleMargin\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleMargin\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStylePadding --raw-line use\ structs::nsStylePadding\; --raw-line unsafe\ impl\ Send\ for\ nsStylePadding\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStylePadding\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStylePadding\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleBorder --raw-line use\ structs::nsStyleBorder\; --raw-line unsafe\ impl\ Send\ for\ nsStyleBorder\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleBorder\ \{\} 
--raw-line impl\ HeapSizeOf\ for\ nsStyleBorder\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleOutline --raw-line use\ structs::nsStyleOutline\; --raw-line unsafe\ impl\ Send\ for\ nsStyleOutline\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleOutline\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleOutline\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleXUL --raw-line use\ structs::nsStyleXUL\; --raw-line unsafe\ impl\ Send\ for\ nsStyleXUL\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleXUL\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleXUL\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleSVGReset --raw-line use\ structs::nsStyleSVGReset\; --raw-line unsafe\ impl\ Send\ for\ nsStyleSVGReset\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleSVGReset\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleSVGReset\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleColumn --raw-line use\ structs::nsStyleColumn\; --raw-line unsafe\ impl\ Send\ for\ nsStyleColumn\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleColumn\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleColumn\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleEffects --raw-line use\ structs::nsStyleEffects\; --raw-line unsafe\ impl\ Send\ for\ nsStyleEffects\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleEffects\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleEffects\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleImage --raw-line use\ structs::nsStyleImage\; --raw-line unsafe\ impl\ Send\ for\ nsStyleImage\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleImage\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleImage\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleGradient --raw-line use\ structs::nsStyleGradient\; 
--raw-line unsafe\ impl\ Send\ for\ nsStyleGradient\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleGradient\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleGradient\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleCoord --raw-line use\ structs::nsStyleCoord\; --raw-line unsafe\ impl\ Send\ for\ nsStyleCoord\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleCoord\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleCoord\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleGradientStop --raw-line use\ structs::nsStyleGradientStop\; --raw-line unsafe\ impl\ Send\ for\ nsStyleGradientStop\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleGradientStop\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleGradientStop\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleImageLayers --raw-line use\ structs::nsStyleImageLayers\; --raw-line unsafe\ impl\ Send\ for\ nsStyleImageLayers\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleImageLayers\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleImageLayers\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type Layer --raw-line use\ structs::nsStyleImageLayers_Layer\ as\ Layer\; --blacklist-type LayerType --raw-line use\ structs::nsStyleImageLayers_LayerType\ as\ LayerType\; --blacklist-type nsStyleUnit --raw-line use\ structs::nsStyleUnit\; --raw-line unsafe\ impl\ Send\ for\ nsStyleUnit\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleUnit\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleUnit\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type nsStyleUnion --raw-line use\ structs::nsStyleUnion\; --raw-line unsafe\ impl\ Send\ for\ nsStyleUnion\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleUnion\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleUnion\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type CalcValue --raw-line use\ 
structs::nsStyleCoord_CalcValue\ as\ CalcValue\; --blacklist-type Calc --raw-line use\ structs::nsStyleCoord_Calc\ as\ Calc\; --blacklist-type nsRestyleHint --raw-line use\ structs::nsRestyleHint\; --blacklist-type ServoElementSnapshot --raw-line use\ structs::ServoElementSnapshot\; --blacklist-type nsChangeHint --raw-line use\ structs::nsChangeHint\; --blacklist-type SheetParsingMode --raw-line use\ structs::SheetParsingMode\; --blacklist-type nsMainThreadPtrHandle --raw-line use\ structs::nsMainThreadPtrHandle\; --blacklist-type nsMainThreadPtrHolder --raw-line use\ structs::nsMainThreadPtrHolder\; --blacklist-type nscolor --raw-line use\ structs::nscolor\; --blacklist-type nsFont --raw-line use\ structs::nsFont\; --blacklist-type FontFamilyList --raw-line use\ structs::FontFamilyList\; --blacklist-type FontFamilyType --raw-line use\ structs::FontFamilyType\; --blacklist-type nsIAtom --raw-line use\ structs::nsIAtom\; --blacklist-type nsStyleContext --raw-line use\ structs::nsStyleContext\; --raw-line unsafe\ impl\ Send\ for\ nsStyleContext\ \{\} --raw-line unsafe\ impl\ Sync\ for\ nsStyleContext\ \{\} --raw-line impl\ HeapSizeOf\ for\ nsStyleContext\ \{\ fn\ heap_size_of_children\(\&self\)\ -\>\ usize\ \{\ 0\ \}\ \} --blacklist-type StyleClipPath --raw-line use\ structs::StyleClipPath\; --blacklist-type StyleBasicShapeType --raw-line use\ structs::StyleBasicShapeType\; --blacklist-type StyleBasicShape --raw-line use\ structs::StyleBasicShape\; --blacklist-type nsCSSShadowArray --raw-line use\ structs::nsCSSShadowArray\; -o ../gecko_bindings/bindings.rs -- -x c++ -std=c++14 -DTRACING=1 -DIMPL_LIBXUL -DMOZ_STYLO_BINDINGS=1 -DMOZILLA_INTERNAL_API -DRUST_BINDGEN -DOS_POSIX=1 -DOS_MACOSX=1 -I /Users/manishearth/mozilla/muon-central/obj-x86_64-apple-darwin15.3.0//dist/include -I /Users/manishearth/mozilla/muon-central/obj-x86_64-apple-darwin15.3.0//dist/include/nspr -I /Users/manishearth/mozilla/muon-central/obj-x86_64-apple-darwin15.3.0//../nsprpub/pr/include 
-include /Users/manishearth/mozilla/muon-central/obj-x86_64-apple-darwin15.3.0//mozilla-config.h /Users/manishearth/mozilla/muon-central/obj-x86_64-apple-darwin15.3.0//dist/include/mozilla/ServoBindings.h

Not only does Docopt take a very long time, but it consumes a lot of memory as well.

Originally reported in: rust-lang/rust-bindgen#46

@anka-213
Contributor

This is not the cause of the issue, but the parser becomes confused by the lack of a blank line between the "Usage:" lines and the "Options:" lines. This in turn causes the parser to interpret every line in the documentation as a list of options/commands. Perhaps it's worth opening a separate issue for this (or not; it's pretty minor, but it is different from the reference implementation).

@anka-213
Contributor

Nope, I was wrong. This is the same behavior as in the reference implementation: Example

@anka-213
Contributor

Regarding the actual issue: from what I can tell, for n repeats of one argument and m repeats of another, the method Matcher::states has a complexity of at least O(n*m). In each iteration, the involved strings (in MState) are copied and (in most cases) retained until the whole function (Matcher::matches) returns.

Some options from the top of my head:

  • Increase laziness by replacing the Vec<MState> with some Iterator<MState>. This should make the best case linear, though it might increase retention in some cases.
  • Increase sharing, e.g. by converting the strings in MState into references of some kind (&/Rc/Cow). This is a case where manual memory management actually led to worse performance.
  • Reduce/remove backtracking as much as is possible without rejecting valid cases.
  • Scrap the whole thing (Matcher) and replace it with some proper parser framework (library or hand written). Nom comes to mind, but it is char/byte oriented, not token-oriented. Might be nice for cleaning up the rest of the parser though.
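The second option (more sharing) could look roughly like the sketch below. `SharedState` and `new_state` are hypothetical, simplified stand-ins for `MState`: the field names echo the debug output further down, but the types are assumptions, not docopt's definitions. Holding argument strings as `Rc<str>` means that forking a state for an alternative match clones the maps but only bumps reference counts for the string data.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical, simplified matcher state; not docopt's actual MState.
#[derive(Clone, Debug)]
struct SharedState {
    argvi: usize,                         // index of the next argv token
    counts: HashMap<Rc<str>, usize>,      // times each repeatable flag was consumed
    vals: HashMap<Rc<str>, Vec<Rc<str>>>, // collected values, shared rather than copied
}

fn new_state(key: &Rc<str>, value: &Rc<str>) -> SharedState {
    let mut counts = HashMap::new();
    counts.insert(Rc::clone(key), 1);
    let mut vals = HashMap::new();
    vals.insert(Rc::clone(key), vec![Rc::clone(value)]);
    SharedState { argvi: 0, counts, vals }
}

fn main() {
    let key: Rc<str> = Rc::from("raw-line");
    let value: Rc<str> = Rc::from("pub enum nsINode {}");
    let base = new_state(&key, &value);

    // Forking clones the maps, but both states point at the same string data.
    let mut fork = base.clone();
    fork.argvi += 1;

    println!("shared: {}", Rc::ptr_eq(&base.vals[&key][0], &fork.vals[&key][0]));
    // Three owners: `value` itself, `base`, and `fork`.
    println!("refcount of value: {}", Rc::strong_count(&value));
}
```

This prints `shared: true` and `refcount of value: 3`. Note that the maps themselves are still cloned, so sharing only shrinks the per-state cost; it does not reduce the number of states.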

Below is the final output from states when running with the input --match=1 --match=2 --raw-line=3 --raw-line=4 --match 5 x. During this run, the method is called 205 times (compared to 31 times with just x as the argument).

sitional("clang-args")))])])
MState { argvi: 1, counts: {Long("raw-line"): 1, Long("match"): 2}, max_counts: {Long("raw-line"): 0, Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 0, Long("match"): 2}, max_counts: {Long("raw-line"): 0, Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 2, Long("match"): 2}, max_counts: {Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 1, Long("match"): 1}, max_counts: {Long("raw-line"): 0, Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 0, Long("match"): 1}, max_counts: {Long("raw-line"): 0, Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 2, Long("match"): 1}, max_counts: {Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 1, Long("match"): 0}, max_counts: {Long("raw-line"): 0, Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 0, Long("match"): 0}, max_counts: {Long("raw-line"): 0, Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 2, Long("match"): 0}, max_counts: {Long("match"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 1, Long("match"): 3}, max_counts: {Long("raw-line"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 0, Long("match"): 3}, max_counts: {Long("raw-line"): 0}, vals: {Positional("input-header"): Plain(Some("x"))} }
MState { argvi: 1, counts: {Long("raw-line"): 2, Long("match"): 3}, max_counts: {}, vals: {Positional("input-header"): Plain(Some("x"))} }

As you can see, we try every possibility before we check the validity, which is why an iterator would probably help. In this code:

    m.states(pat, &init)
     .into_iter()
     .filter(|s| m.state_consumed_all_argv(s))
     .filter(|s| m.state_has_valid_flags(s))
     .filter(|s| m.state_valid_num_flags(s))

If m.states returned an iterator, we would only need to compute states until the first valid one, which should be the first, since we are greedy by default. However, the output above shows that this is not the case (both counts should be 0). On the other hand, I don't believe we really need to look at all the cases, especially not here, but I assume there are some cases where the greedy algorithm would fail.
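To illustrate the difference, here is a toy model, not docopt's Matcher: `Candidate`, `candidates`, `is_valid`, `eager`, and `lazy` are all hypothetical. It assumes candidates are produced in greedy order (which, as noted, the current code does not guarantee) and that only the state consuming every occurrence is valid. A `Cell` counter records how many candidates are actually generated.

```rust
use std::cell::Cell;

// Toy candidate: how many occurrences of two repeatable flags were consumed.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Candidate {
    a: usize,
    b: usize,
}

// Yields candidates lazily in greedy order (most occurrences first) and bumps
// `generated` for every candidate actually produced.
fn candidates<'c>(
    n: usize,
    m: usize,
    generated: &'c Cell<usize>,
) -> impl Iterator<Item = Candidate> + 'c {
    (0..=n).rev().flat_map(move |a| {
        (0..=m).rev().map(move |b| {
            generated.set(generated.get() + 1);
            Candidate { a, b }
        })
    })
}

// Stand-in for the validity filters: only a state that consumed every
// occurrence is valid in this model.
fn is_valid(c: &Candidate, n: usize, m: usize) -> bool {
    c.a == n && c.b == m
}

// Current shape: materialize every candidate, then filter.
fn eager(n: usize, m: usize) -> (Option<Candidate>, usize) {
    let generated = Cell::new(0);
    let all: Vec<Candidate> = candidates(n, m, &generated).collect();
    let first = all.into_iter().find(|c| is_valid(c, n, m));
    (first, generated.get())
}

// Lazy shape: stop at the first valid candidate.
fn lazy(n: usize, m: usize) -> (Option<Candidate>, usize) {
    let generated = Cell::new(0);
    let first = candidates(n, m, &generated).find(|c| is_valid(c, n, m));
    (first, generated.get())
}

fn main() {
    println!("eager: {:?}", eager(185, 77));
    println!("lazy:  {:?}", lazy(185, 77));
}
```

With 185 and 77 repeats, both find the same state, but the eager version generates 14508 candidates while the lazy one generates a single candidate.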

@BurntSushi
Member Author

@anka-213 Thank you for the analysis. :-)

To give my thoughts here: I'd like to scrap the parser and rewrite it from the ground up. I am personally not a big fan of parser frameworks like nom and would prefer we do it by hand.

Sorry I don't have the time right now to do any more digging with you, but it sounds like you've got it figured out. :-)

@anka-213
Contributor

anka-213 commented Sep 26, 2016

By the way, I'm curious about the next few lines in the code. They seem very redundant:

     .collect::<Vec<MState>>()
     .into_iter()
     .next()

Why not just call next() on the first iterator? ;)
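A small stand-alone comparison of the two shapes follows (the functions are hypothetical and unrelated to MState). Both return the same value as long as the filter closures have no side effects that later code relies on; the direct version also stops at the first match instead of evaluating and allocating everything.

```rust
// Mirrors the redundant chain: collect into a Vec, turn it back into an
// iterator, and take the first element.
fn first_even_collected(xs: &[u32]) -> Option<u32> {
    xs.iter()
        .copied()
        .filter(|x| x % 2 == 0)
        .collect::<Vec<u32>>() // evaluates and stores every element...
        .into_iter()
        .next() // ...only to keep the first one
}

// Same result without the intermediate Vec; `find` is the idiomatic
// spelling of `.filter(..).next()`.
fn first_even_direct(xs: &[u32]) -> Option<u32> {
    xs.iter().copied().find(|x| x % 2 == 0)
}

fn main() {
    let xs: [u32; 4] = [1, 3, 4, 6];
    assert_eq!(first_even_collected(&xs), first_even_direct(&xs));
    println!("{:?}", first_even_direct(&xs)); // Some(4)
}
```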

@anka-213
Contributor

anka-213 commented Sep 26, 2016

Ok. Sounds like a good idea. Might be a lot of work though, especially since there is no formal specification.

Just curious, why do you dislike big parser frameworks? You do get a lot for free.

@anka-213
Contributor

In this specific case, we have 185*raw-line, 77*blacklist-type, 4*match, 1*(allow-unknown-types, no-unstable-rust, no-type-renaming, no-namespaced-constants, ignore-methods, -o) and 19*<clang-args>. This means that the final copy of Vec<MState> will have length 186*78*2^6*20 = 18570240. Each of those has arrays containing on average 30 strings, which are on average 20 characters long. Thus, we will need at least 10 GB of memory and a lot of time to run this example. ;)
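The estimate above can be reproduced with a few lines of arithmetic. The helper names are hypothetical; the inputs are the counts quoted in the comment, and like the original estimate this leaves out a factor for the four --match occurrences. The byte count assumes about 30 strings of about 20 bytes per state and ignores allocator, HashMap, and String overhead, so it is a lower bound.

```rust
// A repeatable option seen k times allows k + 1 counts (0..=k); each single
// flag is either consumed or not (2 choices).
fn estimated_states(raw_line: u64, blacklist: u64, single_flags: u32, clang_args: u64) -> u64 {
    (raw_line + 1) * (blacklist + 1) * 2u64.pow(single_flags) * (clang_args + 1)
}

// ~30 strings of ~20 bytes each per state; overhead ignored.
fn estimated_bytes(states: u64) -> u64 {
    states * 30 * 20
}

fn main() {
    let states = estimated_states(185, 77, 6, 19);
    let bytes = estimated_bytes(states);
    println!("{} states, about {:.1} GB", states, bytes as f64 / 1e9);
}
```

This prints `18570240 states, about 11.1 GB`, consistent with the "at least 10 GB" figure.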

@anka-213
Contributor

Here is a benchmark that shows the issue. Each additional letter doubles the running time:

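#![feature(test)] // #[bench] needs nightly Rust; this attribute goes at the crate root
extern crate docopt;
extern crate test;

#[bench]
fn some_bench(b: &mut test::Bencher) {
    const USAGE: &'static str = "
    Usage:
        slow [-abcdefghijklmnopqrs...]
    ";

    let argv = &["slow", "-abcdefg"];
    let dopt: docopt::Docopt = docopt::Docopt::new(USAGE).unwrap().argv(argv);

    // Each extra letter in "-abcdefg" roughly doubles the time per iteration.
    b.iter(|| dopt.parse().unwrap());
}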

@anka-213
Contributor

I just realized that the example that makes backtracking necessary is not supported by the original script. I vote for removing backtracking completely. People should put mandatory positional arguments before repeatable positional arguments. Are there any software that uses the feature?

Alternatively, we could keep track of how many mandatory positional arguments are left and not consume more than that. That way we can keep... But that will probably not work in more complex cases. On the other hand, more complex cases are often not very useful anyway, since the structure gets lost. (But that example doesn't need backtracking either.)

Here is how to do it for the examples mentioned by keleshev:

prog <x> [<y>] <z>                // Mandatory posargs left = 1
prog <x> <y>... <z>               // Mandatory posargs left = 1
prog <x>... command <y>...        // Don't do this! It's just confusing for the user.
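The "mandatory posargs left" rule for the first two examples might be sketched like this. `Pos` and `assign` are hypothetical names, and the pattern is assumed to be a flat list of positional slots with no commands or nested groups, so the third example is deliberately out of scope.

```rust
// Hypothetical positional slot kinds; `Repeated` means one or more values,
// like `<y>...`.
#[derive(Clone, Copy, Debug)]
enum Pos {
    Required,
    Optional,
    Repeated,
}

// Fills slots left to right without backtracking. Before each slot it counts
// how many values the rest of the pattern still needs at minimum and never
// consumes into that reserve. Returns the values assigned to each slot.
fn assign<'a>(pattern: &[Pos], args: &[&'a str]) -> Option<Vec<Vec<&'a str>>> {
    let mut out = Vec::with_capacity(pattern.len());
    let mut i = 0; // index of the next unconsumed argument
    for (k, slot) in pattern.iter().enumerate() {
        let reserved = pattern[k + 1..]
            .iter()
            .filter(|p| !matches!(p, Pos::Optional))
            .count();
        let spare = args.len().saturating_sub(i).saturating_sub(reserved);
        let take = match slot {
            Pos::Required => 1,
            Pos::Optional => spare.min(1),
            Pos::Repeated => spare.max(1),
        };
        if i + take > args.len() {
            return None; // too few arguments for this usage line
        }
        out.push(args[i..i + take].to_vec());
        i += take;
    }
    if i == args.len() { Some(out) } else { None } // leftovers mean no match
}

fn main() {
    // prog <x> <y>... <z>
    let r = assign(&[Pos::Required, Pos::Repeated, Pos::Required], &["a", "b", "c", "d"]);
    println!("{:?}", r); // Some([["a"], ["b", "c"], ["d"]])
}
```

The same function covers the cp-style usage `<source>... <dest>` as `[Repeated, Required]`, and each argument is inspected a bounded number of times per slot.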

A hybrid approach would be to keep backtracking, but only do it when explicitly needed by later patterns (e.g., command above). This is more complicated and can't guarantee linear time.

Regardless, if we want to keep backtracking, we should only do it for positional arguments. Flags are inherently unordered, and it doesn't make sense to require multiple copies of the same flag while also making that flag repeatable.

TL;DR I don't think backtracking is ever necessary, at least not if we limit ourselves to a sensible subset of commands.

@BurntSushi
Member Author

I just realized that the example that makes backtracking necessary is not supported by the original script.

Correct.

Are there any software that uses the feature?

Yes. cp, as the example shows.

I think your analysis about doing backtracking only when necessary (e.g., only on positional arguments) might be worthwhile, but it certainly seems like a pretty complex endeavor.

@anka-213
Contributor

Actually, thinking a bit more, I wouldn't call the algorithm backtracking. It's more like brute force, since we enumerate (and even save!) every single possibility. (We would need a lazy list like in Haskell for this to be backtracking.) Changing it to do actual backtracking should make the best (and hopefully average) case linear, but I doubt we will ever get rid of an exponential (more specifically n^k, where n is the number of supplied arguments and k is the number of repeatable options) running time in the worst case if we want to support every possible combination.

// If we consume as little as possible in A in this case:
Usage: cmd A... opt B...

// We will (probably) need to check every possibility (4^2) in this case:
Usage: cmd (A -x)... (B -y)... [C...]

cmd a b c d e f g h -xxxx -yyyy
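For contrast with the current enumerate-everything approach, "actual backtracking" could look like the depth-first search below. `Elem` and `bt` are hypothetical; the pattern only knows repeated positionals and literal command words. It tries the greediest split first and returns the first complete match. With several repeated elements the worst case is still polynomial in the number of arguments with the number of repeats as the exponent, as described above.

```rust
// Hypothetical pattern elements.
#[derive(Clone, Copy, Debug)]
enum Elem<'p> {
    Many,          // `<x>...`: one or more positional values
    Word(&'p str), // a literal command, such as `command`
}

// Depth-first backtracking. `steps` counts recursive calls. Returns how many
// arguments each element consumed, or None if nothing matches.
fn bt(pat: &[Elem<'_>], args: &[&str], steps: &mut usize) -> Option<Vec<usize>> {
    *steps += 1;
    match pat.split_first() {
        None => {
            if args.is_empty() { Some(Vec::new()) } else { None }
        }
        Some((Elem::Word(w), rest)) => {
            if args.first() != Some(w) {
                return None;
            }
            let mut tail = bt(rest, &args[1..], steps)?;
            tail.insert(0, 1);
            Some(tail)
        }
        Some((Elem::Many, rest)) => {
            // Greedy first; on failure, give back one argument and retry.
            for take in (1..=args.len()).rev() {
                if let Some(mut tail) = bt(rest, &args[take..], steps) {
                    tail.insert(0, take);
                    return Some(tail);
                }
            }
            None
        }
    }
}

fn main() {
    // prog <x>... command <y>...
    let pat = [Elem::Many, Elem::Word("command"), Elem::Many];
    let mut steps = 0;
    let m = bt(&pat, &["a", "b", "command", "c", "d"], &mut steps);
    println!("{:?} after {} calls", m, steps); // Some([2, 1, 2]) after 7 calls
}
```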

However, ignoring all that, the fix for this particular issue is very simple. There is already a special case for flags inside optional groups. If we just extend it to support repeated flags inside optional groups as well, this particular issue goes away completely. I'll submit a pull request shortly.
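Why such a special case helps can be illustrated without any enumeration: because flags are unordered, the values of each repeatable option can be gathered in one pass over argv. This is a hypothetical sketch, not the code of the actual pull request; `collect_repeated` is an invented name, it only recognizes `--name value` and `--name=value`, and it stops at a bare `--`.

```rust
use std::collections::HashMap;

// Single pass over argv: record every value given to a repeatable long flag.
fn collect_repeated<'a>(argv: &[&'a str], repeatable: &[&str]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut found: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    let mut i = 0;
    while i < argv.len() {
        let tok = argv[i];
        if tok == "--" {
            break; // the rest belongs to positional arguments
        }
        if let Some(body) = tok.strip_prefix("--") {
            let (name, inline) = match body.split_once('=') {
                Some((n, v)) => (n, Some(v)),
                None => (body, None),
            };
            if repeatable.iter().any(|r| *r == name) {
                // Use the inline value, or else consume the next token.
                let value = match inline {
                    Some(v) => Some(v),
                    None => {
                        i += 1;
                        argv.get(i).copied()
                    }
                };
                if let Some(v) = value {
                    found.entry(name).or_default().push(v);
                }
            }
        }
        i += 1;
    }
    found
}

fn main() {
    let argv = [
        "--match", "ServoBindings.h",
        "--raw-line=use a;", "--raw-line", "use b;",
        "--builtins", "in.h",
    ];
    let found = collect_repeated(&argv, &["match", "raw-line"]);
    println!("{:?}", found["raw-line"]); // ["use a;", "use b;"]
}
```

The work is linear in the length of argv regardless of how many times each flag repeats.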

@BurntSushi
Member Author

@anka-213 Awesome! Thank you so much. You have no idea how much I appreciate your analysis on this!

anka-213 added a commit to anka-213/docopt.rs that referenced this issue Sep 28, 2016
BurntSushi added a commit that referenced this issue Sep 28, 2016
@anka-213
Contributor

So, I guess we could close this issue now and report back at rust-lang/rust-bindgen#46?

@BurntSushi
Member Author

Yup! Sorry I forgot about this. Looks like @Yamakaky gave them a heads up.


2 participants