Tokenize css before beautifying #545

Open
bitwiseman opened this issue Sep 30, 2014 · 7 comments

@bitwiseman (Member)

See #506.
This is the kind of bug that is much less likely to occur if you actually tokenize, instead of just streaming through characters and spewing output.

@mhnaeem (Contributor) commented Apr 9, 2022

@bitwiseman Are you still interested in this? I might be willing to work on it a few tokens at a time, instead of tokenizing the entire beautifier at once.

@bitwiseman (Member, Author)

@mhnaeem
Yes, I absolutely am interested in this. This task has high long-term value: it will make the CSS beautifier much easier to maintain and improve. However, I think the bugs you've been fixing have much higher immediate value to the project.

I'm not sure how you'd convert to using tokens one or two at a time.
Maybe we could start by looking for strings: you'd have TK_RAW and TK_STRING, and the raw processor would basically be the current CSS beautifier.
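
As an illustration of that split (the function name and token shapes below are hypothetical, not js-beautify's actual tokenizer API), a minimal pass that only separates strings from everything else might look like this:

    // Split CSS source into TK_STRING tokens (quoted strings) and TK_RAW
    // tokens (everything else, to be handled by the existing beautifier code).
    function tokenize_strings(source) {
      var tokens = [];
      var raw_start = 0;
      var i = 0;
      while (i < source.length) {
        var ch = source[i];
        if (ch === '"' || ch === "'") {
          // flush everything before the quote as a raw token
          if (i > raw_start) {
            tokens.push({ type: 'TK_RAW', text: source.slice(raw_start, i) });
          }
          var j = i + 1;
          while (j < source.length && source[j] !== ch) {
            if (source[j] === '\\') { j += 1; }  // skip escaped characters
            j += 1;
          }
          j = Math.min(j + 1, source.length);    // include the closing quote if present
          tokens.push({ type: 'TK_STRING', text: source.slice(i, j) });
          raw_start = j;
          i = j;
        } else {
          i += 1;
        }
      }
      if (raw_start < source.length) {
        tokens.push({ type: 'TK_RAW', text: source.slice(raw_start) });
      }
      return tokens;
    }

    console.log(tokenize_strings('a[title="x { y }"]{content:"}"}'));
    // TK_RAW 'a[title=', TK_STRING '"x { y }"', TK_RAW ']{content:',
    // TK_STRING '"}"', TK_RAW '}'

With this split, characters inside strings never reach the raw processor, so they cannot be mistaken for structural CSS.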

@bitwiseman (Member, Author)

@mhnaeem
I've started a branch for this: feature/css-tokenizing.

If you see where I'm going from that mess of code, please go for it. Otherwise, hold off until I get comments and strings working; then it should be at a point where it's easier to add more tokens one at a time.

@mhnaeem (Contributor) commented Apr 10, 2022

@bitwiseman Actually, that is very similar to what I had in mind. I wrote a partially working tokenizer to understand the mechanics of CSS. My biggest issue was handling whitespace and newlines with partially tokenized code: when an unknown token comes along, you have to backtrack over all the whitespace and newline characters in the input scanner so the rest of the beautifier code can work as normal and peek for spaces if needed. I'll let you take a stab at fixing strings, to see how you handle working with positions, before I take a look.
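
A rough sketch of that backtracking (the Scanner and helper here are hypothetical stand-ins, not the project's InputScanner):

    // Consume whitespace while looking for a token the new tokenizer knows;
    // if an unknown token follows, push the whitespace back so the legacy
    // character-based code still sees it and can peek for spaces/newlines.
    function Scanner(text) {
      this._text = text;
      this._pos = 0;
    }
    Scanner.prototype.peek = function() { return this._text[this._pos]; };
    Scanner.prototype.next = function() { return this._text[this._pos++]; };
    Scanner.prototype.back = function() { if (this._pos > 0) { this._pos -= 1; } };

    function read_known_token(scanner, is_known_start) {
      var eaten = 0;
      while (/\s/.test(scanner.peek() || '')) {   // swallow whitespace/newlines
        scanner.next();
        eaten += 1;
      }
      if (is_known_start(scanner.peek())) {
        return true;                              // the tokenizer handles what follows
      }
      while (eaten--) {                           // unknown token: restore every
        scanner.back();                           // whitespace character we ate
      }
      return false;                               // fall back to the old code path
    }

    // Example: only strings are "known"; anything else falls through untouched.
    var scanner = new Scanner('  \n  color: red;');
    var is_quote = function(c) { return c === '"' || c === "'"; };
    console.log(read_known_token(scanner, is_quote)); // false
    console.log(scanner.peek());                      // ' ' (the whitespace was pushed back)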

Things that I learned from experimentation:

  • Strings can be parsed easily because CSS doesn't natively support template literals or multiline strings. There is no need for recursion; this.eatString-style code can be repurposed in the tokenizer.
  • Reducing direct usage of this._input is important, and local alternatives for lookBack and peekUntilAfter are needed.
  • There is no alternative for this._input.back() in TokenStream.
  • this.NON_SEMICOLON_NEWLINE_PROPERTY needs "grid-template-areas" added to it.
  • The flags need to go in a this._flags object instead of directly on the beautifier (this._insideSomething) so we can also check the flags of the enclosing frames; see the sketch after this list.
  • Class and ID selectors can be tokenized very easily:
    var pattern_reader = new Pattern(this._input);
    this.__patterns = {

        // https://stackoverflow.com/questions/448981/which-characters-are-valid-in-css-class-names-selectors
        class_selector: pattern_reader.matching(/\.-?[_a-zA-Z]+[_a-zA-Z0-9-]*/),
        id_selector: pattern_reader.matching(/#-?[_a-zA-Z]+[_a-zA-Z0-9-]*/)
    };
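
As a rough sketch of the flags idea above (the FlagStack name and frame fields are illustrative, not the beautifier's real flag set): keeping per-block state in a stack of frame objects, instead of booleans on the beautifier itself, lets inner blocks still inspect the state of enclosing ones.

    // Hypothetical sketch: a stack of flag frames instead of this._insideSomething booleans.
    function FlagStack() {
      this._frames = [{ mode: 'TopLevel', inside_at_rule: false }];
    }
    FlagStack.prototype.current = function() {
      return this._frames[this._frames.length - 1];
    };
    FlagStack.prototype.push = function(frame) {
      this._frames.push(frame);
    };
    FlagStack.prototype.pop = function() {
      if (this._frames.length > 1) { this._frames.pop(); }
    };
    // Look back through enclosing frames, e.g. "are we anywhere inside an at-rule?"
    FlagStack.prototype.any = function(predicate) {
      return this._frames.some(predicate);
    };

    // Example: "@media { .a { ... } }" pushes two frames; the inner rule can
    // still ask whether any enclosing frame is an at-rule.
    var flags = new FlagStack();
    flags.push({ mode: 'Block', inside_at_rule: true });   // @media { ... }
    flags.push({ mode: 'Block', inside_at_rule: false });  // .a { ... }
    console.log(flags.any(function(f) { return f.inside_at_rule; })); // true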

@bitwiseman (Member, Author) commented Apr 11, 2022

@mhnaeem
If you have already done the work, even partially, please share your branch or submit a PR. Even if it is incomplete, it would make for good discussion.

From your comments, it sounds like you've already done significant work.

@mhnaeem (Contributor) commented Apr 11, 2022

I have created a draft PR, #2032; I'm not sure how useful it is. I wasn't really focused on committing a lot of code because that was just me learning, so most of the work I did is lost 😅.

The code you have is definitely the right way to go if we were to write a tokenizer.

@bitwiseman (Member, Author)

@mhnaeem
I'll take a look later this week.

Given your comments, I really just wanted to see what you'd done and try to pull any useful parts of it into my work with commits that credit you as the author of those bits.
