simultaneous-productions

[0/3] overall timeline

[ ] implement S.P. (see code TODO)
[ ] begin S.P. real benchmarking, potentially alongside rsc pre-parsing
[ ] hack up a P.S. prototype and develop a pitch
- the pitch should explain that i haven’t published yet because i didn’t want to get scooped on the real prize

refs

packed AST format for idempotent evaluation of parent/child results

code TODO [33%]

[X] remove no_std impl
[X] graphviz for every intermediate graph in grammar processing! [3/3]
1. [X] S.P. [2/2]
  1. [X] implement graphvis for basic S.P. in text_backend
  2. [X] expose basic S.P. and rest of text_backend publicly!
2. [X] TokenGrammar
3. [X] PreprocessedGrammar
[X] text-based grammar input format (for generating test cases)! [4/4]
1. [X] make basic grammar format
2. [X] expose it publicly!
3. [X] make it support literal > chars!
  - a single literal ~>~ is escaped by simply doubling it: ~>>~ becomes ~>~!
    - this made more sense than needing to resort to a separate escape character such as backslash!
4. [X] make it go through SerializableGrammar [4/4]
  1. [X] implement SP::parse()
  2. [X] implement SP::serialize() [1/1]
    1. [X] make it accept any production name with doubled $$ if needed!
  3. [X] put SerializableGrammar in grammar_specification::constraints!
  4. [X] do proptest tests!!!
[ ] make parsing understand stack cycles [0/3]
1. [ ] represent an entire stack cycle graph as a case of NamedOrAnonStep
2. [ ] make it possible to traverse/resume all paths through the stack cycle graph in e.g. ParseableGrammar::stack_diff_pair_zipper
3. [ ] graphvis for ParseableGrammar!
[ ] introduce a mode to generate a string from a given PreprocessedGrammar
[-] enable regex operators! [2/8]
1. [X] introduce Group as a CaseElement!
2. [X] enable grouping with arbitrary depth in SPTextFormat
  - e.g. (<ab> -> $A$) to group CaseHead lists
  - THIS WAS CRAZY HARD AND EXTREMELY USEFUL!!!
3. [ ] VERIFY THAT ALL OF THESE TRANSFORMATIONS PARSE CORRECTLY ONCE PARSING WORKS!
  - i.e.: make parsing with stack cycles work first!!!
4. [-] suffix [1/4]
  1. [X] ?
  2. [ ] *
  3. [ ] +
  4. [ ] {n,m}
5. [ ] alternator |
6. [ ] literal
  - e.g. /a(b)+/, with // as a new type of CaseHead
7. [ ] generate these operators for serde tests!
8. [ ] do proptest tests for equality of PreprocessedGrammar instances (not just serde)!
  - e.g. <a>? is equivalent to (<a>)?
[ ] implement typed callbacks [0/5]
1. [ ] execute a callback whenever a sub-parse succeeds
2. [ ] signal parse failures
3. [ ] implement typed callbacks (for e.g. AST construction) by stuffing intermediate arguments in the stack symbols!
4. [ ] replace reconstruction.rs with this!
5. [ ] do a graphviz for this!
[ ] implement context-sensitive and recursively-enumerable grammatical constructs!
- also consider backrefs (can we make these sub-exponential?)!
[ ] make a single iteration of parsing/resolution into a re-entrant(/lock-free?) process!
- we’ll probably want to explicitly separate:
  1. selecting an adjacent pair to resolve: pop off an adjacent pair of sub-parses from the sub-parse forest.
  2. resolving the adjacent pair: push the merged sub-parse back onto the sub-parse forest.
- having a lock/atomic flag per sub-parse might be the easiest way to do this?
  - atomic flag is more feasible than usual for this use case, because there are so many other sub-parses to choose from
  - could make it: per sub-parse /greater than ~k~ tokens wide/?

S.P. paper

abstract

grammar specification

motivation

with some basic example(s), without introducing new notation
<<eureka>> realization that this nice interface that i’d been looking for happened to be highly amenable to parallelizable/incremental parsing
- and that i was starting off looking for something totally different!
- not too much into depth

usage

introduce notation for describing the grammar, ellipses, etc
delve a little into eureka

parsing algorithm description

describe the input

the input is a SimultaneousProductions instance
make sure to make it clear how this generalizes to arbitrary tokens, not just text
- and try to go into why

the lowering steps, eventually into PreprocessedGrammar

to be figured out in code

applying parsing

to be figured out in code

analysis

runtime

this is where you can show people how everyone has always been wrong. this should be the first section. no games.

reduction from SAT

don’t even need to mention this except in the abstract maybe? it can be a fun surprise and make the reader go “huh, i guess that’s where the runtime comes from”
- make it clear how this doesn’t become a nondeterministic turing machine
  - maybe this has something to do with the fact that it only processes straight line input? this might be wrong

differences from “formal grammars”

but don’t even go into this too much, just enough to explain how we can have better performance with a better interface
make sure to explain what has been wrong about parsing and not get caught up in why

implementation

talk a little about how rust is a truly fantastic language to implement algorithms in
- move construction by default and lifetimes are amazing for correctness
benchmarks
- what use cases does it do better or worse on?
- what’s holding it back?
PARALLELISM
- this needs some intense thought, because this is how we can demonstrate massive speedups over other methods

unknown / future work

simd or other stuff
- enough to show i’ve thought about how to implement it on a microprocessor level as well
- gives people who know what they’re talking about enough of a ladder to almost immediately do that

[0/3] running it in reverse to guess grammars YES, BEFORE PUBLISHING! (BUT AFTER THE FORWARD ALGORITHM)

this may all be invalidated by tweet translation
this is a good idea because we have proven the model can be reduced from SAT
- and therefore capable of arbitrary computation, or that’s the idea
- so if you figure out how to tweak the knobs you can maybe assume it’ll be a <<perfectly general inference method>>
  - (the idea of this is completely bonkers to me)
IF YOU DON’T PUBLISH THIS ALONG WITH THE ORIGINAL PAPER, SOMEONE ELSE WHO IS MORE FAMOUS WILL, SO YES, IT NEEDS TO BE IN HERE, AND IT NEEDS TO BE DEVELOPED
- this is a sad but unfortunate reality
- if you do this right though, then you really have you choice of <<phd>> locked in
  - so in that case, no need to rush
this should be a separate paper
- but it would need to be posted at the exact same time thanks to lack of trust
- should cite the first paper
[ ] find a good example of a nondeterministic sequentual input which isn’t necessarily hierarchical
- <<DNA/RNA>>
  - there may be many strong examples of this throughout bio which are not related to genes
    - alternative: guessing chaotic models based off of readings taken at regular intervals
      - e.g. heartbeat, see “Does God Play Dice?” CITE THAT BOOK!!!
- <<natural language>>
  - tweet translation!!!!!!
[ ] determine a good statistical model to tweak
- honestly, i would be very surprised if the answer wasn’t “hook up a monte carlo tree search and call it a day”
[ ] get a good result
- this is maybe going to be easier with natural language than with DNA/RNA due to data availability, however:
  1. i care about bio
  2. the natural language field is oversaturated and it’ll be hard to get a unique result
  3. i don’t think anyone is doing anything like this in bioinformatics (and i think they should be)
    - and i want that phd
- patience is key, i have forever
- we definitely want a good result, but we don’t need to go as hard as on the initial algorithm
  - i would love to take on a collaborator, but i don’t <<trust>> anyone enough
  - so we want something here that:
    1. is pretty significant
      - demonstrates clear advancement of the state of the art
      - could be considered a founding paper of a field
    2. shows i know what i’m talking about
    3. shows the idea was mine
this work is likely to spark ideas about the original algorithm!

tweet translation

a hell of a shower thought <2019-01-21 Mon 13:23:24> (MLK day)

why this is the best idea ever

allows me to stay at twitter (forever?)
- twitter likes using patents defensively (has taken a pledge to do so? FIND THE WORDING OF THIS PLEDGE)
- if the rsc pitch works, then i can even remain on the build team, which would be incredible
gives me ML hardware, expertise, and guidance
provides a FANTASTIC, maybe the BEST example of why “S.P in reverse” (“P.S.”?) is a great idea
- tweets are <<small bits of language>>, UNLIKE what other machine translation services train on (presumably)
  - S.P. allows for cross-serial dependencies and is a perfectly general inference method (?)
  - S.P. works in parallel by default as opposed to running sequentially across a long string of text
allows twitter to do its own translation
- can’t tell if this is immediately a win for cost/maintainability/flexibility reasons
  - it probably is, though, just because we don’t have to ship our text to an external service
    - and if the external service only knows about the individual tweet it’s asked to translate?
      - then the fact that tweets are small bits of language that twitter alone can train on at scale might mean we can achieve domain-specific accuracy that would be impossible for an external service to achieve

[0/2] rollout / pitch inside twitter

[ ] <<S.P. real benchmarking>>

start off with S.P. and showing there is some nontrivial speedup against at least lex/bison

<<rsc>>: this continues the investment in tooling performance as per rsc and expands the already-unprecedented mindshare we have for making compilers fast and easy to use
- (i think this is a very good pitch line)
start off with either of the following, to demonstrate some nontrivial speedup in specific scenarios:
1. implementing rsc pre-parsing to decouple file ingestion from compilation
2. implementing rsc pre-parsing along with S.P. at the same time
it may not be necessary to do it along with pre-parsing for rsc, but pre-parsing may be a good way for me to become familiar enough with the performance characteristics and benchmarking so that i can know whether to make the S.P. proposal

[ ] <<P.S. prototype>>

demonstrate some prototype of P.S. (reverse) working

this might be hard without asking for help
people are going to assume i think this is a good idea because it’s my pet project
- that can be fine, if we make part of the pitch “give me time to develop this P.S. concept” along with S.P
  - find clear success criteria to propose
  - iterate on the application
  - might be possible to get someone else excited about trying this or showing it doesn’t work
    - “showing it doesn’t work” would be an acceptable end goal for me, because i can then know for a fact it is ok to publish S.P. by itself, and be sure that i’m not missing out on <<the real prize>>
      - “the real prize” part can be a good pitch line
        
        it explains why i myself really want to investigate it, and why i really wanted to work with twitter for this
        
        (along with the relationship of S.P to rsc work)
        
        in the contex of “i am a compiler person who wants to write compilers” (easy to show), this is believable
        
        it also might excite someone else
“P.S.” also sounds like “post script”, and if i put that in the proposal, people will think it is funny and also maybe see more how it is the secondary goal
in pitch, can ask for “second half of the year” to work on P.S. (or something)

old

A Scala parser combinator library efficiently implementing “simultaneous productions”, a model equivalent to a Turing Machine (I think). The method of simultaneous productions allows specifying languages extremely naturally, and maps perfectly to the parser combinator operations I have in mind. It can also be implemented with a linear (?) partitioning algorithm.

Ideal Code

let expr = sp![
  E = ( e: E ) => e;
  E = (base:E "^" exp:E) => Pow(base, exp);
  /* \.E = { \.base[.E] "^" \.exp[.E] } => $Pow(.base, .exp); */
  /* \.E = { \.[.E] "^" \.[.E] } ~=> $Pow; */
  E = (E "^" E) => Pow(_.1, _.2);
  E = (E "^" E) => Pow;
  E = E "*" E;
  E = E "/" E;
  E = E "+" E;
  E = E "-" E;
  E = IntegerLiteral => IntLit(_);
  E = FloatingPointLiteral => FPLit(_);
];

val FloatingPointLiteral = sp.productions(
  ("float-signed" -> Cases(Parser(Tok("-") * Ref("float-unsigned"), { - _._2 }),
                           Parser(Tok("+") * Ref("float-unsigned"), { _._2 }))),
  // NB: should make sure sp.NumberLiterals returns 0 for an empty string
  ("float-base" -> SingleCase(sp.NumberLiterals)),
  ("float-mantissa" -> SingleCase(sp.NumberLiterals)),
  ("float-unsigned" -> Cases(Parser(Ref("float-base"), { toFloat(sp.parseIntegral(_._1)) }),
                             Parser(Ref("float-base") * Tok(".") * Ref("float-mantissa"), {
                               // glossing over the details of converting e.g. ".123" to 1/10 + 2/10 + 3/10
                               case (base, _, mantissa) => toFloat(sp.parseIntegral(base)) + sp.parseFloat(mantissa)
                             }))),
  ("exponent" -> Cases(Parser(Tok("e") * Ref("exponent-negated")),
                       Parser(Tok("e") * Ref("exponent-unsigned")))),
  ("exponent-negated" -> SingleCase(Parser(Tok("-") * Ref("exponent-unsigned")))),
  // sp.NumberLiterals is a whole Parser, and should probably return a string
  ("exponent-unsigned" -> SingleCase(sp.NumberLiterals)),
)

val WithWeirdIntegerLiterals = Grammars.C.productions.entry[IntegerLiteral] // Use a type-indexed map!
  .replaceCases
  .addCase(('A', SomeSubProductionType, 'C') ~> { (a: Token, inner: SomeSubProductionType, c: Token) =>
    IntegerLiteral(s"${a}${inner.toString}${c}")
  }.build() // This could be hidden behind an implicit.
)

[0/6]

[ ] specify a simple language so that it compiles
- use fixed strings instead of regex for now
- use strings instead of type-indexing the productions for now
[ ] implement the simple language so that it can be parsed
[ ] figure out how to allow productions to be type-indexed and require type-checking for that type in all the cases of the production
[ ] make a simple language that is usable for some simple task
- csv parsing? or at least a simple subset of it
[ ] develop benchmarking and (fuzz)? testing methods
[ ] parse C and C++

LICENSE

AGPL 3.0+

Name		Name	Last commit message	Last commit date
Latest commit History 308 Commits
core		core
literature		literature
macros		macros
papers		papers
src		src
.dir-locals.el		.dir-locals.el
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.org		README.org
graphviz-test-new.dot		graphviz-test-new.dot
graphviz-test-new.pdf		graphviz-test-new.pdf
graphviz-test.dot		graphviz-test.dot
graphviz-test.pdf		graphviz-test.pdf
license-header-template		license-header-template
rust-toolchain		rust-toolchain
rustfmt.toml		rustfmt.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

simultaneous-productions

[0/3] overall timeline

refs

code TODO [33%]

S.P. paper

abstract

grammar specification

motivation

usage

parsing algorithm description

describe the input

the lowering steps, eventually into PreprocessedGrammar

applying parsing

analysis

runtime

reduction from SAT

differences from “formal grammars”

implementation

unknown / future work

[0/3] running it in reverse to guess grammars YES, BEFORE PUBLISHING! (BUT AFTER THE FORWARD ALGORITHM)

tweet translation

why this is the best idea ever

[0/2] rollout / pitch inside twitter

old

Ideal Code

[0/6]

LICENSE

About

Licenses found

Releases

Packages

Languages

License

Licenses found

cosmicexplorer/simultaneous-productions

Folders and files

Latest commit

History

Repository files navigation

simultaneous-productions

[0/3] overall timeline

refs

code TODO [33%]

S.P. paper

abstract

grammar specification

motivation

usage

parsing algorithm description

describe the input

the lowering steps, eventually into PreprocessedGrammar

applying parsing

analysis

runtime

reduction from SAT

differences from “formal grammars”

implementation

unknown / future work

[0/3] running it in reverse to guess grammars YES, BEFORE PUBLISHING! (BUT AFTER THE FORWARD ALGORITHM)

tweet translation

why this is the best idea ever

[0/2] rollout / pitch inside twitter

old

Ideal Code

[0/6]

LICENSE

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages