Tokenize css before beautifying #545

Open
bitwiseman opened this issue Sep 30, 2014 · 7 comments

@bitwiseman (Member)

See #506.
This is the kind of bug that is much less likely to occur if you actually tokenize, instead of just streaming through characters and spewing output.

@mhnaeem (Contributor) commented Apr 9, 2022

@bitwiseman Are you still interested in this? I might be willing to work on it a few tokens at a time, instead of tokenizing the entire beautifier at once.

@bitwiseman (Member, Author)

@mhnaeem
Yes, I absolutely am interested in this. This task has high long-term value: it will make the CSS beautifier much easier to maintain and improve. However, I think the bugs you've been fixing have much higher immediate value to the project.

I'm not sure how you'd convert to using tokens one or two at a time.
Maybe we could start by looking for strings: you'd have TK_RAW and TK_STRING, and the raw processor would basically be the current CSS beautifier.
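
As an illustration of that split (the function name and token shapes below are hypothetical, not js-beautify's actual tokenizer API), a minimal pass that only separates strings from everything else might look like this:

    // Split CSS source into TK_STRING tokens (quoted strings) and TK_RAW
    // tokens (everything else, to be handled by the existing beautifier code).
    function tokenize_strings(source) {
      var tokens = [];
      var raw_start = 0;
      var i = 0;
      while (i < source.length) {
        var ch = source[i];
        if (ch === '"' || ch === "'") {
          // flush everything before the quote as a raw token
          if (i > raw_start) {
            tokens.push({ type: 'TK_RAW', text: source.slice(raw_start, i) });
          }
          var j = i + 1;
          while (j < source.length && source[j] !== ch) {
            if (source[j] === '\\') { j += 1; }  // skip escaped characters
            j += 1;
          }
          j = Math.min(j + 1, source.length);    // include the closing quote if present
          tokens.push({ type: 'TK_STRING', text: source.slice(i, j) });
          raw_start = j;
          i = j;
        } else {
          i += 1;
        }
      }
      if (raw_start < source.length) {
        tokens.push({ type: 'TK_RAW', text: source.slice(raw_start) });
      }
      return tokens;
    }

    console.log(tokenize_strings('a[title="x { y }"]{content:"}"}'));
    // TK_RAW 'a[title=', TK_STRING '"x { y }"', TK_RAW ']{content:',
    // TK_STRING '"}"', TK_RAW '}'

With this split, characters inside strings never reach the raw processor, so they cannot be mistaken for structural CSS.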

@bitwiseman (Member, Author)

@mhnaeem
I've started a branch for this: feature/css-tokenizing.

If you see where I'm going from that mess of code, please go for it. Otherwise, hold off until I get comments and strings working; then it should be at a point where it's easier to add more tokens one at a time.

@mhnaeem (Contributor) commented Apr 10, 2022

@bitwiseman Actually, that is very similar to what I had in mind. I wrote a partially working tokenizer to understand the mechanics of CSS. My biggest issue was handling whitespace and newlines with partially tokenized code: when an unknown token comes along, you have to backtrack over all the whitespace and newline characters in the input scanner so the rest of the beautifier code can work as normal and peek for spaces if needed. I'll let you take a stab at fixing strings, to see how you handle working with positions, before I take a look.
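
A rough sketch of that backtracking (the Scanner and helper here are hypothetical stand-ins, not the project's InputScanner):

    // Consume whitespace while looking for a token the new tokenizer knows;
    // if an unknown token follows, push the whitespace back so the legacy
    // character-based code still sees it and can peek for spaces/newlines.
    function Scanner(text) {
      this._text = text;
      this._pos = 0;
    }
    Scanner.prototype.peek = function() { return this._text[this._pos]; };
    Scanner.prototype.next = function() { return this._text[this._pos++]; };
    Scanner.prototype.back = function() { if (this._pos > 0) { this._pos -= 1; } };

    function read_known_token(scanner, is_known_start) {
      var eaten = 0;
      while (/\s/.test(scanner.peek() || '')) {   // swallow whitespace/newlines
        scanner.next();
        eaten += 1;
      }
      if (is_known_start(scanner.peek())) {
        return true;                              // the tokenizer handles what follows
      }
      while (eaten--) {                           // unknown token: restore every
        scanner.back();                           // whitespace character we ate
      }
      return false;                               // fall back to the old code path
    }

    // Example: only strings are "known"; anything else falls through untouched.
    var scanner = new Scanner('  \n  color: red;');
    var is_quote = function(c) { return c === '"' || c === "'"; };
    console.log(read_known_token(scanner, is_quote)); // false
    console.log(scanner.peek());                      // ' ' (the whitespace was pushed back)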

Things that I learned from experimentation:

  • Strings can be parsed easily because CSS doesn't natively support template literals or multiline strings. There is no need for recursion; this.eatString-style code can be repurposed in the tokenizer.
  • Reducing direct usage of this._input is important, and local alternatives for lookBack and peekUntilAfter are needed.
  • There is no alternative for this._input.back() in TokenStream.
  • this.NON_SEMICOLON_NEWLINE_PROPERTY needs "grid-template-areas" added to it.
  • The flags need to go in a this._flags object instead of directly on the beautifier (this._insideSomething) so we can also check the flags of the enclosing frames; see the sketch after this list.
  • Class and ID selectors can be tokenized very easily:
    var pattern_reader = new Pattern(this._input);
    this.__patterns = {

        // https://stackoverflow.com/questions/448981/which-characters-are-valid-in-css-class-names-selectors
        class_selector: pattern_reader.matching(/\.-?[_a-zA-Z]+[_a-zA-Z0-9-]*/),
        id_selector: pattern_reader.matching(/#-?[_a-zA-Z]+[_a-zA-Z0-9-]*/)
    };
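
As a rough sketch of the flags idea above (the FlagStack name and frame fields are illustrative, not the beautifier's real flag set): keeping per-block state in a stack of frame objects, instead of booleans on the beautifier itself, lets inner blocks still inspect the state of enclosing ones.

    // Hypothetical sketch: a stack of flag frames instead of this._insideSomething booleans.
    function FlagStack() {
      this._frames = [{ mode: 'TopLevel', inside_at_rule: false }];
    }
    FlagStack.prototype.current = function() {
      return this._frames[this._frames.length - 1];
    };
    FlagStack.prototype.push = function(frame) {
      this._frames.push(frame);
    };
    FlagStack.prototype.pop = function() {
      if (this._frames.length > 1) { this._frames.pop(); }
    };
    // Look back through enclosing frames, e.g. "are we anywhere inside an at-rule?"
    FlagStack.prototype.any = function(predicate) {
      return this._frames.some(predicate);
    };

    // Example: "@media { .a { ... } }" pushes two frames; the inner rule can
    // still ask whether any enclosing frame is an at-rule.
    var flags = new FlagStack();
    flags.push({ mode: 'Block', inside_at_rule: true });   // @media { ... }
    flags.push({ mode: 'Block', inside_at_rule: false });  // .a { ... }
    console.log(flags.any(function(f) { return f.inside_at_rule; })); // true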

@bitwiseman (Member, Author) commented Apr 11, 2022

@mhnaeem
If you have already done the work, even partially, please share your branch or submit a PR. Even if it is incomplete, it would make for good discussion.

From your comments, it sounds like you've already done significant work.

@mhnaeem (Contributor) commented Apr 11, 2022

I have created a draft PR, #2032; I'm not sure how useful it is. I wasn't really focused on committing a lot of code because that was just me learning, so most of the work I did is lost 😅.

The code you have is definitely the right way to go if we were to write a tokenizer.

@bitwiseman (Member, Author)

@mhnaeem
I'll take a look later this week.

Given your comments, I really just wanted to see what you'd done and try to pull any useful parts of it into my work with commits that credit you as the author of those bits.
