New lexer #486
Conversation
Co-authored-by: Iban Eguia <razican@protonmail.ch>
I would go with option 3: lexing and parsing are one operation now. What do you think? @Razican @jasonwilliams
Agreed, option 3; they can't run separately anyway.
Yes, that was the reason we added the lexing part to the parsing benchmarks. Sorry I didn't have time over the last few weeks to review this. Hopefully I will get some time these days. This is an important step!
I reviewed some files, will continue when I have a bit of time. Thanks for the awesome work!!
```rust
/// It will fill the buffer with checked ASCII bytes.
pub(super) fn fill_bytes(&mut self, buf: &mut [u8]) -> io::Result<()> {
    unimplemented!("Lexer::cursor::fill_bytes {:?}", buf)
}
```
Do we need this?
I don't think so. It's from the string_literal code, and I assumed you were using it for a specific reason, so I left it.
Are we still using it in string literals, or did you find a better way to do it?
> Are we still using it in string literals, or did you find a better way to do it?
I didn't really change the string literal stuff, so it's still there. That said, all the tests pass, so it can't be doing anything major at the moment; I will need to look back into the source code and see.
It's effectively sugar for:
```rust
for x in 0..buf.len() {
    buf[x] = next()?;
}
```
so we could just write that out in the one place it is used.
Actually, I think we might need it: the only usage seems to require the raw `u8` bytes rather than a Rust `char`, so the regular `.next()` method won't work.
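To make the distinction concrete, here is a minimal sketch of a byte-level `fill_bytes` over a raw byte source. This is illustrative only, not Boa's actual cursor; `Cursor` and `next_byte` are hypothetical stand-ins for the real cursor and its byte source.

```rust
use std::io;

// Hypothetical stand-in for the lexer's cursor, reading raw u8 bytes
// (not decoded chars) from an underlying byte iterator.
struct Cursor<I: Iterator<Item = u8>> {
    iter: I,
}

impl<I: Iterator<Item = u8>> Cursor<I> {
    // Returns the next raw byte, or an error on unexpected end of input.
    fn next_byte(&mut self) -> io::Result<u8> {
        self.iter.next().ok_or_else(|| {
            io::Error::new(io::ErrorKind::UnexpectedEof, "unexpected end of input")
        })
    }

    // Fills `buf` with raw bytes; effectively the loop described above,
    // but operating on u8 rather than char.
    fn fill_bytes(&mut self, buf: &mut [u8]) -> io::Result<()> {
        for slot in buf.iter_mut() {
            *slot = self.next_byte()?;
        }
        Ok(())
    }
}

fn main() {
    let mut c = Cursor { iter: b"abc".iter().copied() };
    let mut buf = [0u8; 3];
    c.fill_bytes(&mut buf).expect("enough input");
}
```

The point is only that a byte-oriented helper is needed wherever the escape-sequence handling must see individual bytes rather than decoded characters.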
About benchmarks: it seems that the parser is now 2x-3x slower, except, as expected, for long files, where we get a huge speed bump (from 6.1±0.25ms to 742.7±54.50ns, an 87.8% speed improvement). I will review the parser, especially the parser cursor, to see if further improvements can be made.
It would be interesting to see a benchmark with a lot of goal symbol switches vs. one without.
I suspect the mechanism used to allow peeking 2 tokens ahead might be a good place to start (it might be faster to replace the VecDeque with just an array of a certain size).
Do you have an example code that would cause this?
Using an array on the stack would be very helpful, I think. What do we use the VecDeque for?
The VecDeque is basically a queue implementation used to store peeked elements. It is also used in one case for the `push_back` method, which pushes an element back onto the peek queue. We only ever need to skip a single element when peeking ahead, but because of the possibility of `push_back()`, the peek buffer needs to be of at least size 3 (peek + peek_skip + push_back).
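A fixed-size replacement could look something like the sketch below. This is a hypothetical illustration, not Boa's actual API: `PeekBuffer` and `Token` are made-up names, and the capacity of 3 mirrors the peek + peek_skip + push_back requirement described above.

```rust
// Stand-in for the real token type; PartialEq/Debug only for testing.
#[derive(Debug, Clone, PartialEq)]
struct Token(u32);

// Hypothetical fixed-capacity peek queue replacing VecDeque.
// Capacity 3 covers peek + peek_skip + one pushed-back token.
struct PeekBuffer {
    buf: [Option<Token>; 3],
    len: usize,
}

impl PeekBuffer {
    fn new() -> Self {
        Self { buf: [None, None, None], len: 0 }
    }

    // Push a token back so it is returned first (the push_back case above).
    fn push_front(&mut self, tok: Token) {
        assert!(self.len < 3, "peek buffer overflow");
        self.buf.rotate_right(1);
        self.buf[0] = Some(tok);
        self.len += 1;
    }

    // Append a freshly peeked token at the end of the queue.
    fn push_back(&mut self, tok: Token) {
        assert!(self.len < 3, "peek buffer overflow");
        self.buf[self.len] = Some(tok);
        self.len += 1;
    }

    // Pop the oldest token, shifting the rest forward.
    fn pop_front(&mut self) -> Option<Token> {
        if self.len == 0 {
            return None;
        }
        let tok = self.buf[0].take();
        self.buf.rotate_left(1);
        self.len -= 1;
        tok
    }
}

fn main() {
    let mut pb = PeekBuffer::new();
    pb.push_back(Token(1));
    pb.push_front(Token(0));
    assert_eq!(pb.pop_front(), Some(Token(0)));
}
```

Since the capacity is a compile-time constant, the whole buffer lives on the stack and avoids VecDeque's heap allocation, which is the potential win being suggested.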
```js
function foo(regex, num) {}
let i = 0;
while (i < 1000000) {
    foo(/ab+c/, 5.0/5);
    i++;
}
```
I will add this as a parsing benchmark, so that we can see differences.
Co-authored-by: Iban Eguia <razican@protonmail.ch>
@Lan2u could you rebase the branch to get the new benchmarks and compare? I think we will get nice insights there :)
Catchup master
Done (I think)
Should we close this in favour of #559?
This PR includes the new lexer and parser, made to work with goal symbols.
Addresses #294 primarily, and #456 as a side effect.
Hopefully, once this is finished, we can complete #12.