Skip to content
This repository has been archived by the owner on Jul 15, 2021. It is now read-only.

Feature Request: Give an API to reconstruct the query back from AST #23

Open
gurvinder372 opened this issue Sep 5, 2016 · 13 comments
Open

Comments

@gurvinder372
Copy link

This will really help in testing different queries and hence it will be possible to write an extensive and exhaustive test suite with it.

@jdrew1303
Copy link

This would be best implemented as a separate library. This is how many other projects handle this. There is one change still to be made to the AST (standardising the type and variant to a single type see issue #3 ). There could also be a bit of change in the AST with issue #20

I'd be willing to work on this, along with a traverser allowing a user to move along the AST in a standard manner. (I've built similar libraries for Salesforce SOQL using this AST as inspiration). It would be cool to give back if people are interested in this.

It could in theory be then used to build more advanced language features that would compile down to raw SQL (or different dialects), similar to SASS, Babel, etc. (This would be a move away from the original intention of the library as a validator for Codeschool SQL classes so not sure how that would go. Which by the way 👍 to Codeschool. 😄 )

@jdrew1303
Copy link

Hey @gurvinder372,
I thought I'd give a quick update on this. I'm working on the AST walker at the moment. Im hoping to have this finished this weekend. It's not in Github yet, I'll upload it once I've simple docs and some more of the main statement types covered.

This is it in action moving over create-check-1.json. So it's going well. :)
image

Once this is done I should have the generator done within the next week-ish (SQL is a lot more expressive than SOQL and that one took 2 days to build. Tests are the time consuming part). With this you'll be able to modify the AST in a standard manner and regenerate it back to SQL.

@nwronski I also may have some pull request to clean up some small bits of the AST structure/naming to make movement a bit easier, I'm not sure if they're 100% needed yet so I'll hold off on issues or pull requests. But just a heads up :)

Let me know what you think. Any ideas on the generator too would be good (template/string concat/etc).

@gurvinder372
Copy link
Author

Thanks James, I apologies for delayed reply but really appreciate your response.

Looking at AST-Walker in the screenshot, I think it will be helpful in getting more clarity on how AST is generated and what steps are involved once this AST walker has been implemented.

I want to go through the walker and understand in depth on how I can reverse the steps from AST and get the input string back again. It will help in creating a library which can construct input string from AST and will make adding test-queries easier and faster. I am looking forward to it.

@jdrew1303
Copy link

Hey,

The AST is generated from this library. It's just a simple JSON object. The walker works exactly the same as the estraverse library does for the JavaScript AST. The API is the exact same so the docs for that can give a good overview.

The walker is just for manipulating the AST, allowing you to build linters or items like that on top of it.

Generation of code from the AST is really just string concatenation. To give an example: if we take the following :

CREATE INDEX `bees`.`hive_state`
ON `hive` (`happiness` ASC, `anger` DESC)
WHERE
  `anger` != null AND NOT `happiness`

and pass it through sqlite-parser, then we get:

{
  "statement": [
    {
      "type": "statement",
      "target": {
        "type": "identifier",
        "variant": "index",
        "name": "bees.hive_state"
      },
      "on": {
        "type" : "on",
        "target": "hive",
        "columns": [
          {
            "type": "identifier",
            "variant": "column",
            "format": "indexed",
            "name": "happiness",
            "direction": "asc"
          },
          {
            "type": "identifier",
            "variant": "column",
            "format": "indexed",
            "name": "anger",
            "direction": "desc"
          }
        ]
      },
      "variant": "create",
      "format": "index",
      "where": [
        {
          "type": "expression",
          "format": "binary",
          "variant": "operation",
          "operation": "and",
          "left": {
            "type": "expression",
            "format": "binary",
            "variant": "operation",
            "operation": "!=",
            "left": {
              "type": "identifier",
              "variant": "column",
              "name": "anger"
            },
            "right": {
              "type": "literal",
              "variant": "null",
              "value": "null"
            }
          },
          "right": {
            "type": "expression",
            "format": "unary",
            "variant": "operation",
            "expression": {
              "type": "identifier",
              "variant": "column",
              "name": "happiness"
            },
            "operator": "not"
          }
        }
      ]
    }
  ]
}

To reverse the operation we walk the AST taking each node in turn and putting it through a template. A really basic version of this would be the following for the on node:

function on(node){
  var columns = node.columns.map((column) => {
    return `\`${column.name}\` ${column.direction}`
  }).join(', ')
  return `ON \`${node.target}\` (${columns})`;
}

You recurse down the tree and build up the program from the bottom up (it's easier). In this case we would create the node in the leave function of the traverser.

Again this is just an idea of how it might work. The final product needs to take into account edge cases, alternative syntax, etc. Also how you handle the strings is a case in itself. I've used template strings in the above case but you could use a template library (underscore, moustache, etc).

Some of the harder bits are when you have to handle logic for math operations, functions, etc. Hope this gives a bit of an overview?

@nwronski
Copy link
Contributor

nwronski commented Oct 1, 2016

@jdrew1303 sorry I haven't been able to respond sooner. I am definitely open to suggestions for how I can further standardize the AST. I am also hoping to clear out some of the outstanding issues soon, such as the expression parsing problems, in the next couple weeks. I think having a tool for traversing the AST would be greatly appreciated by many users, including myself, so thank you. I'll be available later today to take a look at what you've been working on, if you want, as well.

@jdrew1303
Copy link

Hey,

@gurvinder372 Better late than never, here's the preliminary repo. It will work at the moment for the majority of cases. I'll be spending today cleaning up the docs and code (ive a lot to delete 👍 ). If it blows up when encountering a node it doesn't understand then you can fallback to iteration. Also the code will be heavily refactored over the next day so dont look under the hood. 😃

@nwronski I'll also start writing up some more info on the AST (all nodes having a type is the main one, even the base node). On a side note, have you ever looked at Arel before? It's the AST manager thats used by Rails Active Record under the hood. I was thinking that maybe by looking at that as a model for the node types for the parser to standardise interactions/names/etc? If they've done the thinking before and worked out kinks might as well use it?

Hopefully tomorrow I can start working on a generator for SQLite. Ill make it pluggable to be able to hook in other syntaxes (MySQL, Postgres, etc). I'll be basing a lot of the work off squel.js. Never know it might be good for them to get on board to remove the burden of this. Also going to take a look at their AST to see if there's anything worth stealing borrowing. 🤔

@jdrew1303
Copy link

Traverser is good to go. (You'll have to use the iterator fallback until I finish pull requests for some ast changes in the sqlite-parser as there are some nodes that it doesnt know how to move through).

Also, may I present the generator repo: https://github.com/jdrew1303/sqlgenerate

It's in process at the moment. You'll see commits going through over the next few days. 👍 I've been a bit over with the traverser (and that was relatively straightforward vs the work needed for a generator) so it's looking like it'll take 2-ish weeks for a beta quality library. It will only generate SQLite syntax but over the next while I'll add in different syntax generators.

@nwronski
Copy link
Contributor

nwronski commented Oct 4, 2016

@jdrew1303 very cool! If you are thinking about submitting a PR for this repo, please take a look at this first and this first, as I've already started adding missing type properties where I find them. If you find other nodes without a type please let me know and I will add them as well.

@jdrew1303
Copy link

jdrew1303 commented Oct 6, 2016

@nwronski Thanks 😃 I think you've hit most of the ones I was looking at. 👍 If I come across any more I'll give you a shout for the more complicated stuff or I'll put together an issue and a PR for simple stuff. Oh actually, would you be able to add a tag to your current work? Im having problems getting the latest commit to be pulled in (its always pulling in 1.5-beta but that is missing a few commits).

@nwronski
Copy link
Contributor

nwronski commented Oct 9, 2016

@jdrew1303 I made a ton of changes and fixes over the last two weekends and I am almost ready to release v1.0.0. In the meantime, I published v1.0.0-beta so you can pull the v1.0.0-beta tag of the repo to get the latest stuff or you can install it via npm using npm i sqlite-parser@beta. Let me know if you have a chance to check it out.

@jdrew1303
Copy link

Hey @nwronski !
Cool. I was waiting until the majority of the changes were in before working on the generator. I'll be spending tomorrow reworking the traverser to accommodate the new node types and working on the generator after that. So far it looks really good 👍 😃 Thanks for cutting a version for it. For some reason I couldn't get node to pull in the hashed version into the test suite 😫 with the beta its able to pull that in. This is going to be so cool! 😎 I can't wait to start building tools on top of it. First up, a linter; its always great to ruin someones day :trollface: 😂

There are two items at the moment that aren't handled by the parser :

  • location and range information on nodes (for sourcemaps, adding comments inline, debugging, etc)
  • comments (will allow for documentation generation tools and retaining comments on code when the parser and generator is used for refactorings on codebases)

I dont know if these are items of value for you at the moment? If there's anything in there that doesn't make sense for codeschool let me know and I can handle development of those bits? ☺️

I'll give you a shout with anything else I come across :neckbeard: Sorry I didn't get back sooner on this (Github isn't sending me notifications on repos 😑)

@nwronski
Copy link
Contributor

I could probably add a node for comments if that is worthwhile. They are parsed but dropped from the AST right now. Could you create a new issue here for that one?

Will have to look into the sourcemap stuff some more to see what would be possible.

@MrGarry2016
Copy link

+1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants