
Humbly Checking for Status Updates #10

Open
WonderPanda opened this issue Sep 30, 2020 · 15 comments
@WonderPanda commented Sep 30, 2020

First off, thanks for all the incredible things you do for the open source community @benjie. You're definitely an inspiration on multiple levels. Hopefully this issue doesn't come across as a nuisance, so please feel free to close it if this just isn't on your radar right now.

I went down a crazy rabbit hole today trying to find some kind of tool I could use inside our repo to apply consistent formatting to Postgres SQL migrations and a bunch of more complex Postgres functions that we manage. I'm spoiled into assuming that these kinds of tools are readily accessible in the Node ecosystem, but there don't seem to be many good options out there. This project definitely seems the most promising, both because of its integration with Prettier and because of the architectural approach you're taking.

I was just wondering whether you plan to revisit and continue work on this, or whether it's on the back burner because of your other, higher-profile projects (totally understandable). Anyway, thanks again, and a huge thumbs up: if this ever got released, it would definitely help out both the org I work for and me personally!

@benjie (Owner) commented Sep 30, 2020

Thanks for your kind words ❤️

This project eats at me. I really, really want to implement it. But time and again, whatever approach I take, it becomes clear that I need to build a new PostgreSQL parser to solve it. None of the existing ones will do: they're either not sufficiently accurate, or they're based on Postgres' own parser, which (unintuitively) has a wide range of issues that make it unsuitable for this task.

It's very frustrating. I pick at it in my hobby hours from time to time, and I'm pulling on a few threads in private. I've got some interesting ideas that might turn out great, or might not turn into anything (except maybe some pull requests to Postgres' docs...). Mostly I need to invest some time in learning parsing/lexing properly, rather than the dabbling I've done in the past.

I've been considering crowd-funding some work on it (e.g. via Kickstarter), but I'm not 100% sure I can deliver, and I'm not sure there's sufficient demand for the fundraise to be successful either; so right now it remains a hobby project that gently eats at my soul.

FWIW the best formatter I’m aware of is pgFormatter.

@WellspringCS

@benjie it pains me to think of this unfinished task eating away at you. I'm watching this project because I'm keen to see a good solution. But please don't lose sleep on it until people (like myself!) are paying you to do it. :)

@karlhorky

Would any of the newer parsing libraries like sql-surveyor offer the accuracy / capabilities that you would be looking for?

@benjie (Owner) commented Nov 19, 2020

Does it do deparsing too? If not, someone has to write code that converts the AST back to SQL for every single possible AST node, which is a lot of work, as I'm sure you can imagine. Also, if it's not PostgreSQL-specific, it's unlikely to have full coverage of the PostgreSQL features that many of us use; but I'd certainly like to see the results of running it against some decent schemas. A good starting point might be Graphile Starter's schema: https://github.com/graphile/starter/blob/main/data/schema.sql
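To illustrate what "deparsing" means here, a minimal sketch in TypeScript: a walker that turns an AST back into SQL text. The node shapes below are a hypothetical toy, not sql-surveyor's or any real parser's output; a real deparser needs a case for every node PostgreSQL can produce, which is the "lot of work" being described.

```typescript
// Toy AST for a tiny subset of SQL (hypothetical shapes for illustration only).
type Expr =
  | { kind: "literal"; value: string }
  | { kind: "ident"; name: string }
  | { kind: "call"; fn: string; args: Expr[] };

type Stmt = { kind: "select"; columns: Expr[] };

// A deparser walks the AST and emits SQL text. Every node kind needs its
// own printing rule; omit one and that construct cannot round-trip.
function deparseExpr(e: Expr): string {
  switch (e.kind) {
    case "literal":
      return e.value;
    case "ident":
      return e.name;
    case "call":
      return `${e.fn}(${e.args.map(deparseExpr).join(", ")})`;
  }
}

function deparse(s: Stmt): string {
  return `SELECT ${s.columns.map(deparseExpr).join(", ")};`;
}

// One statement from the schema discussed in this thread, as a toy AST:
const stmt: Stmt = {
  kind: "select",
  columns: [
    {
      kind: "call",
      fn: "pg_catalog.set_config",
      args: [
        { kind: "literal", value: "'search_path'" },
        { kind: "literal", value: "''" },
        { kind: "literal", value: "false" },
      ],
    },
  ],
};

console.log(deparse(stmt));
// SELECT pg_catalog.set_config('search_path', '', false);
```

A formatter is essentially this deparser plus layout decisions (line breaks, indentation), which is why a parser without a matching printer only solves half the problem.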

@karlhorky commented Nov 19, 2020

Deparsing (I guess aka printing? Or maybe I'm misunderstanding...) is not one of its goals, so that would need to be built. (Unless I'm misunderstanding, I guess pretty-printing usually makes up the majority of the code of a Prettier plugin anyway.)

But maybe @mtriff or @modeldba (maintainers at https://github.com/modeldba/sql-surveyor) could shed more light here on how this could be done and how difficult it would be.

Here's the output of that Graphile schema when parsed by sql-surveyor using SQLDialect.PLpgSQL: file.log

Excerpt:

ParsedSql {
  parsedQueries: {
    '163': ParsedQuery {
      outputColumns: [],
      referencedColumns: [],
      referencedTables: {},
      tokens: {
        '163': Token {
          value: 'SET',
          location: TokenLocation {
            lineStart: 8,
            lineEnd: 8,
            startIndex: 163,
            stopIndex: 165
          }
        },
        '167': Token {
          value: 'statement_timeout',
          location: TokenLocation {
            lineStart: 8,
            lineEnd: 8,
            startIndex: 167,
            stopIndex: 183
          }
        },
        '185': Token {
          value: '=',
          location: TokenLocation {
            lineStart: 8,
            lineEnd: 8,
            startIndex: 185,
            stopIndex: 185
          }
        },
        '187': Token {
          value: '0',
          location: TokenLocation {
            lineStart: 8,
            lineEnd: 8,
            startIndex: 187,
            stopIndex: 187
          }
        },
        '188': Token {
          value: ';',
          location: TokenLocation {
            lineStart: 8,
            lineEnd: 8,
            startIndex: 188,
            stopIndex: 188
          }
        }
      },
      query: 'SET statement_timeout = 0',
      queryType: 'DDL',
      queryLocation: TokenLocation {
        lineStart: 8,
        lineEnd: 8,
        startIndex: 163,
        stopIndex: 187
      },
      queryErrors: [],
      subqueries: {},
      commonTableExpressions: {}
    },
    '190': ParsedQuery {
      outputColumns: [],
      referencedColumns: [],
      referencedTables: {},
      tokens: {
        '190': Token {
          value: 'SET',
          location: TokenLocation {
            lineStart: 9,
            lineEnd: 9,
            startIndex: 190,
            stopIndex: 192
          }
        },
        '194': Token {
          value: 'lock_timeout',
          location: TokenLocation {
            lineStart: 9,
            lineEnd: 9,
            startIndex: 194,
            stopIndex: 205
          }
        },
        '207': Token {
          value: '=',
          location: TokenLocation {
            lineStart: 9,
            lineEnd: 9,
            startIndex: 207,
            stopIndex: 207
          }
        },
        '209': Token {
          value: '0',
          location: TokenLocation {
            lineStart: 9,
            lineEnd: 9,
            startIndex: 209,
            stopIndex: 209
          }
        },
        '210': Token {
          value: ';',
          location: TokenLocation {
            lineStart: 9,
            lineEnd: 9,
            startIndex: 210,
            stopIndex: 210
          }
        }
      },
      query: 'SET lock_timeout = 0',
      queryType: 'DDL',
      queryLocation: TokenLocation {
        lineStart: 9,
        lineEnd: 9,
        startIndex: 190,
        stopIndex: 209
      },
      queryErrors: [],
      subqueries: {},
      commonTableExpressions: {}
    },
    // ...
  }
}

@karlhorky

Or if by deparsing you mean getting the original SQL representation back as part of the parsed data structure, I guess that is happening (both in the Token.value fields and in the ParsedQuery.query field).

@mtriff commented Nov 19, 2020

Jumping into this conversation a bit blind, so I'm not totally familiar with this project and its goals.

Under the hood, sql-surveyor uses this ANTLR4 grammar to parse PL/pgSQL.

I'm assuming that the deparsing requirement is in support of this from the README:

Where a statement can be expressed in many orders, express it in the same order that pg_dump would

Deparsing like this isn't really a goal of sql-surveyor, but might be possible to build from the output if some additional metadata were added (i.e. specifying which tokens are reserved keywords, identifying expressions, etc.). But, like Benjie said, still a lot of work regardless.

@benjie (Owner) commented Nov 20, 2020

Thanks @karlhorky and @mtriff.

Looking at file.log, the first issue I see is that comments (not the COMMENT statement, but literal -- comments) are not represented in the AST; without them we cannot read a file, convert it to an AST, and write it back out without losing user content (which is what Prettier needs to do; it's a code formatter). Perhaps support for comments wouldn't be too complex to add to the grammar? I'm very new to parsers/lexers and have never worked with an ANTLR4 grammar before.
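The round-trip requirement above can be made concrete with a small sketch: before a formatter can reattach comments to AST nodes, it at least needs them extracted with positions. This is a deliberately naive illustration (a real SQL lexer must know that `--` inside a string literal is not a comment, handle `/* */` blocks, etc.), not an implementation.

```typescript
// A formatter must parse -> print without losing user content, so comments
// need to be captured somewhere, e.g. with their source positions, so they
// can be attached to nearby AST nodes. Naive sketch: line comments only.
interface SqlComment {
  line: number; // 1-based source line
  text: string; // the comment including the "--" marker
}

function extractLineComments(sql: string): SqlComment[] {
  const out: SqlComment[] = [];
  sql.split("\n").forEach((lineText, i) => {
    const idx = lineText.indexOf("--");
    // Caveat: this wrongly fires on "--" inside string literals; a real
    // lexer tracks quoting state. Good enough to show the data we need.
    if (idx !== -1) out.push({ line: i + 1, text: lineText.slice(idx) });
  });
  return out;
}

const sql = [
  "-- Reset timeouts before running migrations",
  "SET statement_timeout = 0;",
  "SET lock_timeout = 0; -- belt and braces",
].join("\n");

console.log(extractLineComments(sql));
// [ { line: 1, text: '-- Reset timeouts before running migrations' },
//   { line: 3, text: '-- belt and braces' } ]
```

If the parser's AST drops this information (as the file.log output suggests sql-surveyor does), the formatter would emit the two SET statements but silently delete both comments, which is unacceptable for a Prettier plugin.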

It also seems like we'd need to use the token stream, as the digests aren't generally sufficient for our needs. For example, in the parsed SELECT pg_catalog.set_config('search_path', '', false); statement below, the selection is represented as a single columnName string, which means we couldn't cascade the formatting into the function arguments.

    '325': ParsedQuery {
      outputColumns: [
        OutputColumn {
          columnName: "pg_catalog.set_config('search_path', '', false)",
          columnAlias: null,
          tableName: null,
          tableAlias: null
        }
      ],
      referencedColumns: [],
      referencedTables: {},
      tokens: {
        '325': Token {
          value: 'SELECT',
          location: TokenLocation {
            lineStart: 13,
            lineEnd: 13,
            startIndex: 325,
            stopIndex: 330
          }
        },
        '332': Token {
          value: 'pg_catalog',
          location: TokenLocation {
            lineStart: 13,
            lineEnd: 13,
            startIndex: 332,
            stopIndex: 341
          }
        },
        '342': Token {
          value: '.',

...

        '378': Token {
          value: ')',
          location: TokenLocation {
            lineStart: 13,
            lineEnd: 13,
            startIndex: 378,
            stopIndex: 378
          }
        },
        '379': Token {
          value: ';',
          location: TokenLocation {
            lineStart: 13,
            lineEnd: 13,
            startIndex: 379,
            stopIndex: 379
          }
        }
      },
      query: "SELECT pg_catalog.set_config('search_path', '', false)",
      queryType: 'DML',
      queryLocation: TokenLocation {
        lineStart: 13,
        lineEnd: 13,
        startIndex: 325,
        stopIndex: 378
      },
      queryErrors: [],
      subqueries: {},
      commonTableExpressions: {}
    },

Alas I don't think sql-surveyor would give us many advantages over pg-query-native-latest which we're using currently.

@mtriff commented Nov 23, 2020

The comments are available in the AST, but they're loaded onto a separate channel and sql-surveyor ignores them (not relevant for the use case). You may want to try playing around with the low-level library behind sql-surveyor, antlr4ts-sql. It will give you access to the entire AST. You'll also be able to go as granular as you need on the tokens so you can get access to all the function arguments, etc.

If you go this route, you'll probably want to implement a PL/pgSQL parser directly with antlr4ts eventually, since you don't need all the other dialects supported by antlr4ts-sql. But the package will save you from configuring and generating the parser/lexer while you're experimenting.
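The "separate channel" mechanism mentioned above can be sketched without the library: ANTLR lexers tag each token with a channel number, comments and whitespace typically go to a hidden channel, and a consumer that only reads the default channel never sees them. The `Tok` type below is our own simplified model (channel numbering follows ANTLR's convention of 0 = default, 1 = hidden), not antlr4ts's actual API.

```typescript
// Simplified model of ANTLR token channels. A tool like sql-surveyor reads
// only the default channel, so hidden-channel tokens (comments) are invisible
// to it even though the lexer produced them.
const DEFAULT_CHANNEL = 0;
const HIDDEN_CHANNEL = 1;

interface Tok {
  text: string;
  channel: number;
}

const tokens: Tok[] = [
  { text: "-- set timeout", channel: HIDDEN_CHANNEL },
  { text: "SET", channel: DEFAULT_CHANNEL },
  { text: "statement_timeout", channel: DEFAULT_CHANNEL },
  { text: "=", channel: DEFAULT_CHANNEL },
  { text: "0", channel: DEFAULT_CHANNEL },
];

// What the parser sees (default channel only):
const visible = tokens
  .filter((t) => t.channel === DEFAULT_CHANNEL)
  .map((t) => t.text);
console.log(visible.join(" ")); // SET statement_timeout = 0

// What a formatter must additionally recover (hidden channel):
const hidden = tokens
  .filter((t) => t.channel === HIDDEN_CHANNEL)
  .map((t) => t.text);
console.log(hidden); // [ '-- set timeout' ]
```

So the comments are not lost at the lexing stage; a formatter built directly on the token stream could read both channels and reattach the hidden tokens during printing.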

@benjie (Owner) commented Nov 23, 2020

Awesome, thanks @mtriff. At the moment this project is deep on the back burner, as I don't have sufficient time available to work on it (when I tried before, I figured it'd take all my OSS hours for a few months to get a decent solution out; I am a beginner at parsers/lexers, after all). But I'm keeping my eyes open for better solutions, and your pointers help 👍

@bradleyayers

@benjie (Owner) commented Feb 24, 2021

@wesselvdv

@benjie It seems the issue you raised has been resolved? Are there any more major blockers?

@benjie (Owner) commented Mar 29, 2021

Time ;) See the open PR

@karlhorky

Wrote about some alternatives over here too, in case anyone wants to try some things in the meantime:
