Skip to content
This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Converting to different DSL #4

Closed
Ocramius opened this issue Apr 22, 2014 · 7 comments
Closed

Converting to different DSL #4

Ocramius opened this issue Apr 22, 2014 · 7 comments

Comments

@Ocramius
Copy link

Different data-sources, such as RDBMSs or MongoDB, use different DSLs (SQL/JS queries, etc).

What I found out so far is this page in the docs, which seems to simulate usage of a visitor on different types of filtering/aggregation requests.

It is really hard for newcomers to understand how to use the visitors to lazily evaluate Pinq expressions to convert them into, for example, SQL.

A simplified SQL that appends conditionals to a given SQL string would probably be useful in those examples.

@Ocramius
Copy link
Author

@TimeToogo to be more precise (and I didn't try it myself, but I will as soon as I have time), I want to be sure if I'm grokking this correctly:

  • the parameter FunctionExpressionTree $ExpressionTree passed to the ExpressionTreeEvaluator contains ALL the filters/expressions applied to a Pinq IRepository instance so far
  • the ExpressionTreeEvaluator should be responsible for generating the SQL from the given $ExpressionTree (from supported expressions)
  • the QueryScopeEvaluator would produce something like a CurrentQueryScope object that carries information such as the current SQL (produced via the ExpressionTreeEvaluator) and the current DB connection, and it would pass it down to the RequestEvaluator
  • RequestEvaluator applies GROUP BY or similar aggregation to the CurrentQueryScope (or creates a new one, if CurrentQueryScope is immutable) and executes it

Is this the right approach or am I getting it totally wrong? It really takes a bit of creativity to follow the example with the array, as it is still very near to the iterator logic that is already implemented within Pinq. Also, the examples also directly manipulate the data, which is something that isn't happening in any query-based persistence layer.

@TimeToogo
Copy link
Owner

Not quite, the structure first:

  • Starting with the IRequestQuery, this is the object that will be passed to IQueryProvider::load.
  • The IRequestQuery contains two parts, the scope (IRequestQuery::getScope()) and the request (IRequestQuery::getRequest()).
  • Scope
    • The scope consists of many segments, each of which contain a part of the users query. Each segment represents a transformation of the previous values.
    • Usually, each segment will represent a single called method such as Filter (where(...)) or Select (select(...)) but this is not the case for all. Some such as Join (join(...)->on(...)->to(...)) and OrderBy (orderBy(...)->thenBy(...)...) may represent multiple.
    • The FunctionExpressionTree class only ever represents one function. These are contained within the segment classes.
    • All segments are strongly typed with each as their own class and possess the appropriate methods to retrieve the transformation data.
    • Example: the Filter segment contains Filter::getFunctionExpressionTree() method whereas the OrderBy segment contains OrderBy::getFunctionExpressionTrees() and OrderBy::getIsAscendingArray().
  • Request
    • The request represents the actual values that are to be returned.
    • Like segments, requests are strongly typed with each as their own class.
    • asArray() would become the Values request class. count() would become the Count request class.
    • Some of these, such as Maximum, Minimum and Sum contain an optional projection function Maximum::getFunctionExpressionTree(). These would have to be evaluated before the final value is returned.

Sorry for this, it may confuse you even further. It got out of hand rather quickly. As for the most part, I was building the concept in my head aswell

Here is a rough guide to implementing SQL query provider:

I would start off with an class that represents a SQL SELECT statement with mutable properties such as COLUMNS/ALIASES FROM, JOIN, WHERE, ORDER BY, GROUP BY, LIMIT etc. Lets call that the SelectStatment class.

Interpreting the Scope

You will also need a (probably multiple) class that extends the ExpressionVisitor class to interpret the expression trees. I say you may need many as in some segments, the expression tree will have to be treated differently. For simplicity I would have each define a static method, take an Expression and return an equivalent SQL string. You may also need to pass some additional configuration such as table definitions / aliases.

To interpret this part of the query you should create a class which extends the SegmentVisitor class. This would create your SelectStatment and initialize it with the appropriate FROM clause.

Override each segment visitor method to handle the type of segment:

  • visitFilter: Add this as a WHERE clause. Infact if you wanted to get advanced, this could also be a HAVING clause if it was after a GroupBy segment as Pinq does not make the distinction. This segment contains a single FunctionExpressionTree. You can get it via Filter::getFunctionExpressionTree(), you would want to evaluate the return value expression . As this is so common, there is a special method for this: FunctionExpressionTree::getFirstResolvedReturnValueExpression() which will return the expression you want to evaluate. You would have to pass this to your ExpressionVisitor and convert this into the appropriate SQL.
  • visitOrderBy: Like the Filter segment you should convert the return value expression. But this segment contains multiple expression trees (OrderBy::getFunctionExpressionTrees()). Each of these should be a seperate column of the ORDER BY clause. It also contains an array of booleans representing the direction of each function (they are mapped via matching array keys), OrderBy::getIsAscendingArray(). This would allow you to add the additional ASC/DESC for each.
  • visitUnique: Add a DISTINCT clause
  • visitRange: Add the LIMIT clause with Range::getRangeAmount() and the OFFSET with Range::getRangeStart().
  • visitGroupBy: Pretty much the same approach as OrderBy but without the direction clause. This would create significant complexity as you would have to handle the values becoming ITraversable. I would not touch this at the moment.
  • visitSelect: Again want to convert the return value expression. For simplicity, just support parsing the return of an ArrayExpression with contant keys. These would become the SELECT clause in the format <Value expression> AS <Constant key value>,....
  • visitSelectMany: I am not sure if this would fit into the SQL domain without a proper ORM backing it up.
  • visitOperation: This would best be unsupported for now.
  • visitIndexBy: This is interesting, for this I would see fit as to convert the return value expression and select it under the alias of __indexColumn and this would have to sorted out in memory after retrieving the result set.
  • visitJoin/visitEqualityJoin: For this, I would verify the joined values (Join::getValues()) is of the appropriate type and from that get the table name. If you were to handle Join::isGroupJoin(), this for the most part would be mapped to LEFT JOIN instead of an INNER JOIN and there would be much more complexity as with GroupBy in handling the subsequent usages of ITraversable as the value. Ignoring that for now, there are two types of join filters, on/onEquality, for on just convert the return value expression as the ON clause. For onEquality convert the return value expressions for both the functions and stick a = between them. You would want to handle the Join::getJoiningFunctionExpressionTree() rather in the same manner as Select but you would have to handle the two join arguments instead of one.

The above is me just mocking how I would handle each query segment. Now this would become infinitely more complex when you apply things like multiple Select or even just the segments not being in the order of standard SQL. For example, if there was a Join after an OrderBy, the entire current query would have to become a derived table and there would have to be a new SelectStatement with a FROM as the original and the JOIN would be applied to that instead.

Interpreting the Requests

To interpret the request you should create a class which extends the RequestVisitor class. This will take your SelectStatment instance and where it will be finally be compiled and executed.

Again, override each method to handle every request type:

  • visitValues: Compile the existing SelectStatment execute it and return the result set as an \Iterator.
  • visitCount: Ignore COLUMNS, replace with COUNT(*), execute and return the value.
  • visitExists: Wrap statement with a SELECT EXISTS(...).
  • visitFirst: Return first row value and optimize by by setting LIMIT to 1 or null if none exists.
  • visitLast: Invert all the ORDER BY clauses and set LIMIT to 1, null if non existent.
  • visitContains: Verify the appropriate type from Contains::getValue(), then add WHERE with primary key contraint and wrap with SELECT EXISTS(...)
  • visitAggregate: This could not be supported.
  • visitMaximum/visitMinimum/visitSum/visitAverage/visitImplode: These aggregates contain optional projection function expression trees. These are like a shortcut Select for readablitity: ->select(function(x))->maximum() should always yield the same result as ->maximum(function(x)). So if Maximum::hasFunctionExpressionTree(), you would convert the return value expression, verify it returns a single column and wrap that in the appropriate aggregate function.
  • visitAll/visitAny: These are also like the above, but im not sure how they would be implemented in a standard way. I know MySQL supports BIT_AND and BIT_OR which could be used for this.
  • visitGetIndex/visitIssetIndex: Seems kind of useless to me but if I were to implement these, I would add a WHERE clause with the magic __indexColumn equal to the GetIndex::getIndex() value, or if there was no IndexBy segment maybe COUNT(*) > <Index value>.

All of this would come together in an IQueryProvider implementation, not much should change from the example at http://timetoogo.github.io/Pinq/query-provider.html.

IQueryProvider::load

  • Initialize your SegmentVisitor class.
  • Evaluate the IRequestQuery::getScope() with your SegmentVisitor::walk($scope).
  • Initialize your RequestVisitor class with SelectStatment from the segment visitor.
  • Return the RequestVisitor instance and it will evaluate the IRequestQuery::getRequest() and return the result.

I am planning to add a parallel to the current online example with SQL queries as you have mentioned.
I really hope, this helps you structure it properly in your mind. If you have any further queries, please let me know :)

@Ocramius
Copy link
Author

@TimeToogo woah, this is awesome feedback!

I'll read through it and see if I can make any use of it. Thanks a lot for this very detailed explanation.

Paid work calls, but I'll go through it as soon as I have time. Thanks a lot!

@TimeToogo
Copy link
Owner

@Ocramius Just realising this, but there is a perfect RequestVisitor class for this use case. So instead of subclassing Pinq\Queries\Requests\RequestVisitor, subclass the Pinq\Providers\Loadable\RequestEvaluator.

This is also a subclass of RequestVisitor but provides the functionality such that if the values are loaded into memory (Values request), any further requests on that scope will be evaluated in memory.

This will avoid unnessecary queries to the DB:

$someRows = $table->where(function ($row) { return $row['id'] <= 50; });

foreach($someRows as $row) {
    //This will load the rows, (Values request)
}

//This will be evaluated in memory, (Maximum request)
$maxId = $someRows->maximum(function ($row) { return $row['id']; });

But note that instead of overriding visit... methods they will instead be load....
You can view the source here, it is rather simple:
https://github.com/TimeToogo/Pinq/blob/master/Source/Providers/Loadable/RequestEvaluator.php

@TimeToogo
Copy link
Owner

I have just implemented Pinq\Providers\Loadable\Provider.
If using Pinq\Providers\Loadable\RequestEvaluator, you can extend and use this query provider instead.

This has additional capability of evaluating additional query scopes rather than just the same scope with different requests. This will even further reduce the number of unnecessary queries to the db:

$someRows = $table->where(function ($row) { return $row['id'] <= 50; });

foreach($someRows as $row) {
    //This will load the rows
}

//This will be evaluated in memory because the parent scope is loaded
$someOtherRows = $someRows->orderByDescending(function ($row) { return $row['id']; });

@TimeToogo
Copy link
Owner

Well better late than never. Here is a documented demo repository containing an implementation of the DSL query provider platform. This maps a subset of the PINQ API to a MySQL database backend.

https://github.com/TimeToogo/Pinq.Demo.Sql

@Ocramius
Copy link
Author

Absolutely awesome!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants