Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lexer #486

Closed
wants to merge 163 commits into from
Closed

New lexer #486

wants to merge 163 commits into from

Conversation

Lan2u
Copy link

@Lan2u Lan2u commented Jun 12, 2020

This PR includes the new lexer + parser made to work with goal symbols.

Addresses: #294 primarily and #456 as a side

Hopefully once this is finished we can complete #12

@Lan2u Lan2u marked this pull request as draft June 12, 2020 14:08
@Razican Razican added the lexer Issues surrounding the lexer label Jun 13, 2020
@HalidOdat
Copy link
Member

HalidOdat commented Jul 7, 2020

Thoughts?

I would go with option 3, lexing and parsing is one operation now because of goal symbols.

What do you think? @Razican @jasonwilliams

boa/benches/parser.rs Outdated Show resolved Hide resolved
@jasonwilliams
Copy link
Member

jasonwilliams commented Jul 7, 2020

Agree option 3 they can’t run separately anyway

@Razican
Copy link
Member

Razican commented Jul 8, 2020

Thoughts?

Yes, that was the reason behind us adding the lexing part to parsing benchmarks.

Sorry I didn't have time these last weeks to review this. Hopefully I will get some time these days. This is an important step!

Copy link
Member

@Razican Razican left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed some files, will continue when I have a bit of time. Thanks for the awesome work!!

Comment on lines +134 to +137
/// It will fill the buffer with checked ASCII bytes.
pub(super) fn fill_bytes(&mut self, buf: &mut [u8]) -> io::Result<()> {
unimplemented!("Lexer::cursor::fill_bytes {:?}", buf)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, its from the string_literal code and I assumed you were using it for a specific reason so I left it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still using it in string literals, or did you find a better way to do it?

Copy link
Author

@Lan2u Lan2u Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still using it in string literals, or did you find a better way to do it?

I didn't really change the string literal stuff so its still there but equally all the tests pass so it can't be doing anything major at the moment - will need to look back into the source code and see.

Its effectively sugar for:

for (x = 0; x < buf.len(); x++) {
buf[x] = next()?;
}

so we could just write that out in the one place it is used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think we might need it - the only usage seems to require the raw u8 bytes rather than a rust char (and so therefore the regular .next() method won't work).

boa/src/syntax/lexer/comment.rs Outdated Show resolved Hide resolved
boa/src/syntax/lexer/comment.rs Outdated Show resolved Hide resolved
boa/src/syntax/lexer/comment.rs Outdated Show resolved Hide resolved
boa/src/syntax/lexer/comment.rs Outdated Show resolved Hide resolved
boa/src/syntax/lexer/comment.rs Outdated Show resolved Hide resolved
@Razican
Copy link
Member

Razican commented Jul 8, 2020

About benchmarks, It seems that the parser is now 2x-3x slower. Except, as expected, for long files, where we get a huge speed bump (from 6.1±0.25ms to 742.7±54.50ns), which is an 87.8% speed improvement.

I will review the parser, especially the parser cursor, to see if further improvements can be made.

@Lan2u
Copy link
Author

Lan2u commented Jul 8, 2020

About benchmarks, It seems that the parser is now 2x-3x slower. Except, as expected, for long files, where we get a huge speed bump (from 6.1±0.25ms to 742.7±54.50ns), which is an 87.8% speed improvement.

I will review the parser, especially the parser cursor, to see if further improvements can be made.

It would be interesting to see a benchmark with a lot of goal symbol switches vs one without

@Lan2u
Copy link
Author

Lan2u commented Jul 8, 2020

About benchmarks, It seems that the parser is now 2x-3x slower. Except, as expected, for long files, where we get a huge speed bump (from 6.1±0.25ms to 742.7±54.50ns), which is an 87.8% speed improvement.

I will review the parser, especially the parser cursor, to see if further improvements can be made.

I suspect that mechanism used to allow peeking 2 tokens ahead might be a good place to start (it might be faster to replace the VecDeque with just an array of a certain size)

@Razican
Copy link
Member

Razican commented Jul 8, 2020

It would be interesting to see a benchmark with a lot of goal symbol switches vs one without

Do you have an example code that would cause this?

I suspect that mechanism used to allow peeking 2 tokens ahead might be a good place to start (it might be faster to replace the VecDeque with just an array of a certain size)

Using an array in the stack would be very helpful, I think. What do we use the VecDeque for?

@Lan2u
Copy link
Author

Lan2u commented Jul 8, 2020

It would be interesting to see a benchmark with a lot of goal symbol switches vs one without

Do you have an example code that would cause this?

I suspect that mechanism used to allow peeking 2 tokens ahead might be a good place to start (it might be faster to replace the VecDeque with just an array of a certain size)

Using an array in the stack would be very helpful, I think. What do we use the VecDeque for?

VecDeque is basically a queue implementation which is used to store peeked elements - it is also used in one case for the 'push_back' method which pushes an element back onto the peek queue.

We only ever need to skip a single element when peeking ahead but because of the possibility of push_back() we need the buffer peeked array to be of at least size 3 (peek + peek_skip + push_back).

@Lan2u
Copy link
Author

Lan2u commented Jul 8, 2020

It would be interesting to see a benchmark with a lot of goal symbol switches vs one without

Do you have an example code that would cause this?

I suspect that mechanism used to allow peeking 2 tokens ahead might be a good place to start (it might be faster to replace the VecDeque with just an array of a certain size)

Using an array in the stack would be very helpful, I think. What do we use the VecDeque for?

function foo(regex, num) {}

let i = 0;
while (i < 1000000) {
	foo(/ab+c/, 5.0/5);
  i++;
}

@Razican
Copy link
Member

Razican commented Jul 8, 2020

function foo(regex, num) {}

let i = 0;
while (i < 1000000) {
	foo(/ab+c/, 5.0/5);
  i++;
}

I will add this as a parsing benchmark, so that we can see differences.

Co-authored-by: Iban Eguia <razican@protonmail.ch>
@Razican
Copy link
Member

Razican commented Jul 9, 2020

@Lan2u could you rebase the branch to get the new benchmarks and compare? I think we will get nice insights there :)

@Lan2u
Copy link
Author

Lan2u commented Jul 9, 2020

@Lan2u could you rebase the branch to get the new benchmarks and compare? I think we will get nice insights there :)

Done (I think)

@Razican
Copy link
Member

Razican commented Jul 13, 2020

Should we close this in favour of #559?

@Lan2u Lan2u closed this Jul 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request lexer Issues surrounding the lexer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants