How to lex heredocs (Caddyfile) #929

Closed
francislavoie opened this issue Feb 16, 2024 · 4 comments

Comments

@francislavoie
Contributor

I'm working on updating the Caddyfile lexer, to fix bugs and support new syntax features we've implemented in the past couple years.

We've added heredoc support (https://caddyserver.com/docs/caddyfile/concepts#heredocs). I'm trying to figure out how to implement this in the lexer.

How do I store the heredoc marker (the part after <<), push into a "heredoc" state, and then pop the stack once that same string is found again? Is there some kind of storage mechanism in the state?

I see that Raku uses a custom mutator func. Is that my only option? Looks complicated.

I also noticed that since last time I contributed, most of the lexers got translated to XML. Is this something we should aim to do (tbh, ew 😬 looks like much worse UX to develop than the Go DSL) or is it fine for me to keep using Go?

@alecthomas
Owner

Look at how the bash lexer does it, it's the same concept.

I will not accept Go lexers unless there's a good reason, you should use XML. Chroma is used by a lot of people who aren't programmers, so XML is a better choice to foster contributions than Go. It's also much much more efficient in terms of binary size and startup time.

@francislavoie
Contributor Author

francislavoie commented Feb 16, 2024

> I will not accept Go lexers unless there's a good reason, you should use XML.

To be clear, I'm not making a new lexer, only updating an existing one. https://github.com/alecthomas/chroma/blob/master/lexers/caddyfile.go

> It's also much much more efficient in terms of binary size and startup time.

I find that surprising. Why is compiled Go code slower/larger?

> Look at how the bash lexer does it, it's the same concept.

I can't find the heredoc lexing. Can you point me to it? I don't see any << stuff in bash.xml.

Edit: oh, &lt;&lt;&lt; 🤦‍♂️

@alecthomas
Owner

If it's already in Go then it's definitely fine to update.

> I find that surprising. Why is compiled Go code slower/larger?

This is a bit surprising, but each lexer is a whole bunch of what is effectively dynamic code to the Go compiler, so it a) expands to a very large amount of machine code and b) takes a considerable amount of time to execute at init time. If the Go compiler were a lot smarter this would probably end up being static data sections, but it isn't.

> I can't find the heredoc lexing. Can you point me to it? I don't see any << stuff in bash.xml.

Search for &lt;&lt;. It's a single regex with a backreference.
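
To illustrate what "a single regex with a backreference" is doing, here is a minimal standalone Go sketch of the same idea: capture the marker after `<<`, then scan forward until that same string shows up again. Go's standard `regexp` package has no backreferences, so the sketch does the second step by hand. The names `heredocOpen` and `findHeredoc`, the marker character set, and the rule that the closing marker sits alone on its line are all assumptions made for illustration, not Chroma's or Caddy's actual rules.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// heredocOpen matches an opening marker such as "<<EOF" at the end of a line.
// The allowed marker characters are an assumption made for this sketch.
var heredocOpen = regexp.MustCompile(`<<([A-Za-z0-9_-]+)[ \t]*\r?\n`)

// findHeredoc returns the marker and body of the first heredoc in src.
// ok is false when no opening marker is found or the heredoc is never closed.
func findHeredoc(src string) (marker, body string, ok bool) {
	loc := heredocOpen.FindStringSubmatchIndex(src)
	if loc == nil {
		return "", "", false
	}
	marker = src[loc[2]:loc[3]] // the captured text after "<<"
	rest := src[loc[1]:]        // everything after the opening line

	var lines []string
	for _, line := range strings.SplitAfter(rest, "\n") {
		// Manual stand-in for the backreference: compare against the stored marker.
		if strings.TrimSpace(line) == marker {
			return marker, strings.Join(lines, ""), true
		}
		lines = append(lines, line)
	}
	return "", "", false
}

func main() {
	marker, body, ok := findHeredoc("respond <<EOF\nhello {name}\n  EOF\n")
	fmt.Printf("%q %q %v\n", marker, body, ok)
}
```

With a backreference-capable engine the comparison against the stored marker collapses into the pattern itself, which is why the whole heredoc can come out as one token.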

@francislavoie
Contributor Author

francislavoie commented Feb 16, 2024

I see the \2. So it's a single regexp.

Is there a way to do this with state though (push + pop)? I have some other token types I want to handle within the heredoc.

For example, if something like {foo} appears within the heredoc, I want that to be a LiteralStringEscape.
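
For anyone weighing the push/pop approach, here is a rough standalone sketch of what it would have to do, written in plain Go rather than against Chroma's API. It keeps a state stack plus a remembered marker, pushes a "heredoc" state on `<<MARKER`, splits `{foo}`-style placeholders out of the body, and pops when the marker reappears. The `token` type, the `lex` function, both regexes, and the line-at-a-time scanning are hypothetical; the token type names are plain strings that only mirror the names used in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// token is a hypothetical (type, value) pair; the types are plain strings here.
type token struct {
	Type  string
	Value string
}

var (
	// openRe matches "<<MARKER" at the end of a line (assumed marker charset).
	openRe = regexp.MustCompile(`<<([A-Za-z0-9_-]+)[ \t]*\r?\n?$`)
	// placeholderRe matches {foo}-style placeholders inside the body.
	placeholderRe = regexp.MustCompile(`\{[^{}\s]+\}`)
)

// lex tokenizes src line by line using a state stack and a stored marker.
func lex(src string) []token {
	var toks []token
	stack := []string{"root"} // the last entry is the active state
	marker := ""              // set when the heredoc state is pushed
	emit := func(typ, val string) {
		if val != "" {
			toks = append(toks, token{typ, val})
		}
	}
	for _, line := range strings.SplitAfter(src, "\n") {
		switch stack[len(stack)-1] {
		case "root":
			loc := openRe.FindStringSubmatchIndex(line)
			if loc == nil {
				emit("Text", line)
				continue
			}
			emit("Text", line[:loc[0]])
			emit("LiteralStringDelimiter", line[loc[0]:loc[3]])
			emit("Text", line[loc[3]:])
			marker = line[loc[2]:loc[3]]
			stack = append(stack, "heredoc") // push
		case "heredoc":
			if strings.TrimSpace(line) == marker {
				emit("LiteralStringDelimiter", line)
				stack = stack[:len(stack)-1] // pop back to the previous state
				continue
			}
			pos := 0
			for _, m := range placeholderRe.FindAllStringIndex(line, -1) {
				emit("LiteralStringHeredoc", line[pos:m[0]])
				emit("LiteralStringEscape", line[m[0]:m[1]])
				pos = m[1]
			}
			emit("LiteralStringHeredoc", line[pos:])
		}
	}
	return toks
}

func main() {
	for _, t := range lex("respond <<EOF\nHello {name}!\nEOF\n") {
		fmt.Printf("%-24s %q\n", t.Type, t.Value)
	}
}
```

The piece a purely declarative rule set lacks is the `marker` variable: the pop condition depends on text captured when the state was pushed, so something outside the rules has to carry it.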
