Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pivot LINQ extension #11963

Open
izzycoding opened this issue May 10, 2018 · 28 comments
Open

Add Pivot LINQ extension #11963

izzycoding opened this issue May 10, 2018 · 28 comments

Comments

@izzycoding
Copy link

I have various types in my app which are sudo-dynamic or generalised.
In table form I have an entity table that just has ID and type columns and then a details table with key, value, FK(ID) for the properties (not actually this but is the simplest way to demo).
It would be nice to be able to generate a query that gets the properties so that the key field is the column name and the values are listed grouping them by the ID. This would allow me to query data and apply it as-if it is a view and map it to a Poco.
This would also be useful for various statistics queries we need to run (which we have to do in C# currently due to mixed DB implementations of Pivot functionality).
It would be good to have this such that if the DB supports it then it’s done in SQL and if not it is done in-memory after executing the initial query.

@pmiddleton
Copy link
Contributor

@izzycoding - What do you envision the linq method would look like?

I actually started working on this a few weeks ago. I'm still debating what the actual Linq method should look like. That is why I'm curious to see how you think it should work.

Currently I am using QueryTypes to define the pivot results which has lead to pivot calls in this form.

public class PivotSchool
{
       public double? Wisconsin { get; set; }
       public double? Minnesota { get; set; }
       public double? Oklahoma { get; set; }
       public double? Alabama { get; set; }
}

var results = context.Schools.Pivot(s => s.Name,
                                    sl => sl.Sum(s => s.Attendence),
                                    s => new PivotSchool()).ToList();

In order to pivot you need to specify the pivot column and aggregate. In order to then compose further operators onto the query you need a result type. This syntax hits all of those points, but I'm still not completely sold on it.

cc: @divega

@ajcvickers ajcvickers added this to the Backlog milestone May 11, 2018
@ajcvickers ajcvickers added the help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. label May 11, 2018
@divega
Copy link
Contributor

divega commented May 12, 2018

@pmiddleton Does Pivot() in your example have two generic arguments for the source element type (School) and the output element type (PivotSchool)? I am undecided, but perhaps this is better, even if you have to repeat the source element type:

var results = context.Schools.Pivot<School, PivotSchool>(
    s => s.Name,
    sl => sl.Sum(s => s.Attendence)).ToList();

@izzycoding
Copy link
Author

@pmiddleton I would actually expect the composure to be done via .Select() or .Cast().
As the Select could be mapped into the SQL to perform filtering. Cast, would essentially just construct a type with any values available in the dataset returned. This would then fit in more with existing semantics and syntax used.

Something like:

var results = context.EntityData.Where(e => e.Id == ...Entity Id I Want...)
    .Pivot(k => k.Name, v => v.Value)
    .Select(ed => new EntityType {
        propA = ed.propA,
        propB = ed.propB
    });

// or;

var results = context.EntityData.Where(e => e.Id == ...Entity Id I Want...)
    .Pivot(k => k.Name, v => v.Value)
    .Cast<EntityData>();

This to me would feel cleaner and more readable from code.
How that is achieved I am not sure.

@izzycoding
Copy link
Author

izzycoding commented May 12, 2018

@divega - I would expect the extension method would already know about School in @pmiddleton example.
Also if just providing the type for result generically, then you loose the ability to compose all the data. E.G. if you store ledger info with qty & unitCost, how could you compose a Pivot for totalCost?
Not to mention that the .Cast() extension already exists and a Dev might want to further filter the Pivot results before returning them. With a direct cast you could potentially drop data, especially if the DB table holds property values for multiple different types.

@pmiddleton
Copy link
Contributor

@divega

I do have the overload you mentioned. If you are pivoting on an entity type or query type then you can use that overload and avoid the => new ... parameter.

However if the shape of the query feeding into the pivot is an anonymous type (say from a select or join) then you need the compiler to infer all the generic types. Therefore the => new ... syntax is required in order to set the pivot result shape.

I really don't like the syntax, but I don't see anyway around it. I'm all ears if anyone has an idea.

@izzycoding
Copy link
Author

@pmiddleton
Would like to see what you have so far to see how it could work for some of my projects.

@divega
Copy link
Contributor

divega commented May 14, 2018

@izzycoding Re these examples:

var results = context.EntityData.Where(e => e.Id == ...Entity Id I Want...)
    .Pivot(k => k.Name, v => v.Value)
    .Select(ed => new EntityType {
        propA = ed.propA,
        propB = ed.propB
    });

// or;

var results = context.EntityData.Where(e => e.Id == ...Entity Id I Want...)
    .Pivot(k => k.Name, v => v.Value)
    .Cast<EntityData>();

The main question is what type Pivot returns. It seems you are assuming that it would be able to dynamically make up a type that contains PropA and PropB based on the data returned by the query. Also Cast works differently. You seem to be after something more akin AutoMapper's ProjectTo method.

One interesting option your example suggest that I think could work is that we could have a version of Pivot that returns a property bag.

I am not sure if that would be possible for EF Core currently, but I suspect we could make it work. But I believe if we could make this work it would have a couple of nice characteristics:

  1. The in-memory implementation of this version of Pivot should be simple and not require reflection
  2. You can always bind to a type on the next step using something similar to AutoMapper's ProjectTo (although I am not sure if the latter currently works with property bags as an input, I think it would be a good feature for it to support).

So the example above would become something like this:

// manual biding:
var results = context.EntityData.Where(predicate)
    .Pivot(k => k.Name, v => v.Value)
    .Select(ed => new EntityType {
        propA = ed["propA"],
        propB = ed["propB"]
    });

// automatic binding:
var results = context.EntityData.Where(predicate)
    .Pivot(k => k.Name, v => v.Value)
    .ProjectTo<EntityData>();

cc @pmiddleton have you considered this as an option?

@izzycoding
Copy link
Author

@divega - I like the idea of having a PropertyBag as the output. However, as a first attempt I would guess an IEnumerable<KeyValuePair<string, string>> return value (as the most generic form), or just use reflection and then improve the implementation later. It would be easy then to provide an explicit conversion operator in MyCustomResultType to enable the Cast() to work. I have not really looked into the source of EFC as yet so unsure how the rest works with regards to mapping values to class properties. However, I would suspect some of the 2.1 converters might be useful to enable DBType to PivotType conversions.

@pmiddleton
Copy link
Contributor

@izzycoding - In your example I am confused as to what the parameters are in the pivot call

.Pivot(k => k.Name, v => v.Value)

In order to pivot you need a pivot column and and an aggregate. I don't follow what you expect k and v to be.

@pmiddleton
Copy link
Contributor

@divega - I had considered a client side implementation, but I wasn't sure if it was a good idea. Given that pivot is typically used for statistics/reporting a client side implementation could require pulling a lot of data into memory from the database in order to perform the pivot.

If we do want to add a client implementation I think adding an overload which returns a property bag should be doable.

One thing which I think would be extremely hard to do is trying to get a pivot to work server side without knowing the result type. You need that result type in order to know what the valid pivot values are. To work without it we would have to first run a query to get the valid pivot values and then build a second query with those values. I don't think things are built today to deal with that scenario.

@izzycoding
Copy link
Author

izzycoding commented May 15, 2018

@pmiddleton - My example was probably not clear enough. Please view it as:

.Pivot(col => col.RequiredKeyColumn, agr => agr.AgregateFunctionOrColumn);

In my example the Agregate was based on an ID column instead of an agregate function like Sum()

@divega
Copy link
Contributor

divega commented May 16, 2018

One thing which I think would be extremely hard to do is trying to get a pivot to work server side without knowing the result type...

@pmiddleton fair enough. I had completely forgotten that the column names were mandatory in the PIVOT syntax.

@pmiddleton
Copy link
Contributor

@smitpatel @ajcvickers @anpete

I have run into an issue that I need help with.

Take the following pivot query where PivotSchool is a Query Type and School is an Entity Type.

public class School
{
     public int Id { get; set; }
     public string Name { get; set; }
     public string Conference { get; set; }
     public double Attendence { get; set; }
}

public class PivotSchool
{
     public double? Wisconsin { get; set; }
     public double? Minnesota { get; set; }
     public double? Oklahoma { get; set; }
     public double? Alabama { get; set; }
}

context.Schools.Pivot<School, PivotSchool>(s => s.Name, sl => sl.Sum(s => s.Attendence)).ToList();

The Pivot feature is implemented as a ReLinq ResultOperator and handled by EF inside of RelationalResultOperatorHandler.

The issue I'm having with this query is that it directly selects the results of the pivot operation (in this case PivotSchool). However the shape of this query is set to School by RelationalEntityQueryableExpressionVisitor when the MainFromClause of the query model is visited.

I need a way to change the shape of the query in RelationalResultOperatorHandler, but there is no way to create a new Shaper/Materializer at this point in the code.

Does anyone have any ideas on how I can reshape the query, short of re-implementing RelationalResultOperatorHandler.CreateShaper inside of RelationalResultOperatorHandler?

@smitpatel
Copy link
Member

@pmiddleton - Presently there are only 2 kind of shaper changes happen in ResultOperatorHandler.

  • Single scalar value handler for aggregating operators (through TransformClientExpression)
  • GroupBy changes. But for server group by we just inject ValueBufferShaper.

Since this is new kind of requirement (which happens on server side) there is no way to do it right now.
Can you post the QueryModel generated, client Expression before you visit the ResultOperator & what is the expecation after you visit the ResultOperator?

@pmiddleton
Copy link
Contributor

@smitpatel - let's pretend for a moment I'm dumb... yea pretend :)

What exactly are you looking for? Just the results of calling .toString() on the model and expression, or is there a helper class which gives me something prettier?

@pmiddleton
Copy link
Contributor

@smitpatel - Oh and if it is just that simple then this is what I have

For the model (I don't have PivotResultOperator printing anything pretty right now)

value(Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1[Microsoft.EntityFrameworkCore.Query.PivotSqlServerTests+School]) => Pivot

The Expression going in is.

_ShapedQuery(queryContext, value(Microsoft.EntityFrameworkCore.Query.Internal.ShaperCommandContext), UnbufferedEntityShaper)

I believe I just need this coming out (won't know for sure until I can get this far)

_ShapedQuery(queryContext, value(Microsoft.EntityFrameworkCore.Query.Internal.ShaperCommandContext), UnbufferedEntityShaper)

RIght now I'm thinking of trying to refactor the CreateShape method out of RelationalEntityQueryableExpressionVisitor and into it's own class so that it can be shared with RelationalResultOperatorHandler.

I have already tried creating a fake EntityQueryable constant I could send through
RelationalEntityQueryableExpressionVisitor, but that hit a dead end because I don't have access to the IAsyncQueryProvider.

@smitpatel
Copy link
Member

output of following things inside your HandlePivot method in RelationalResultOperatorHandler
handlerContext.QueryModel.Print() (query model got extension method print for pretty printing)
new Microsoft.EntityFrameworkCore.Query.Internal.RelationalExpressionPrinter().Print(handlerContext.QueryModelVisitor.Expression)

And once you see output of the last statement, what is your expectation, how it should look like once pivot result operator is applied.

@pmiddleton
Copy link
Contributor

handlerContext.QueryModel.Print() gives me

(from School s in DbSet
select [s]).Pivot

new Microsoft.EntityFrameworkCore.Query.Internal.RelationalExpressionPrinter().Print(handlerContext.QueryModelVisitor.Expression) gives me

IEnumerable ShapedQuery(
|
_ queryContext: Unhandled parameter: queryContext,
|__ shaperCommandContext: SelectExpression:
| SELECT [s].[Id], [s].[Attendence], [s].[Conference], [s].[Name]
| FROM [Schools] AS [s],
|__ shaper: UnbufferedEntityShaper)

Here is what is currently outputting at the end of the PivotMethod.

IEnumerable ShapedQuery(
|
_ queryContext: Unhandled parameter: queryContext,
|__ shaperCommandContext: SelectExpression:
| SELECT 1
| FROM (
| SELECT [s].[Id], [s].[Attendence], [s].[Conference], [s].[Name]
| FROM [Schools] AS [s]
| ) AS [t]
| PIVOT(SUM([t].[Attendence]) for [t].[Name] in ([Alabama],[Minnesota],[Oklahoma],[Wisconsin])) AS [t0],
|__ shaper: UnbufferedEntityShaper)

What I need is is that UnbufferedEntityShaper changed to PivotSchool. The "Select 1" outer query Sql gets cleaned up after the HandlePivot method so you can ignore that here.

@smitpatel
Copy link
Member

Copy over code for CreateShaper from here.
You have SelectExpression (as above), Type & EntityType from PivotResultOperator.
https://github.com/aspnet/EntityFrameworkCore/blob/f053431cd078eb68a5ae6dd0d57cc5e278c9ec4e/src/EFCore.Relational/Query/ExpressionVisitors/RelationalEntityQueryableExpressionVisitor.cs#L241

@pmiddleton
Copy link
Contributor

@smitpatel - Ok that worked, but I don't like the fact that there is now duplicate code. I am going to refactor this into a shared class. Any thoughts on what folder it should live in? I'm leaning toward Relational/Query/Internal.

@smitpatel
Copy link
Member

@pmiddleton - I suggest keep it duplicated and file an issue to clean up. We may or may not need it in 3.0 depending on outcome of #12795

@pmiddleton
Copy link
Contributor

@divega

Sorry to dump more on you but I have run into a few issues which need some design decisions :)

I need a way to identify the values from the pivot column which go into the in clause on the pivot sql. Those values are all expressed as properties on the result type of the pivot operation, but it does not include all properties on that type. Therefore I need to add a fluent configuration method to setup those columns.

Here is an example of what I am talking about.

Take these two entity types which represent players on a school American Football team. Soccer teams will be supported in the unit tests 2.5 upgrade 😁

public class Player
{
       public int Id { get; set; }
       public string Name { get; set; }
       public string Position { get; set; }
       public int Weight { get; set; }

       public int SchoolId { get; set; }
}

public class School
{
      public int Id { get; set; }
      public string Name { get; set; }
}

Now take a pivot query against those two tables which produces the following output Query Type.

public class SchoolPositionWeight
{
      public int? Quarterback { get; set; }
      public int? WideReceiver { get; set; }
      public string SchoolName { get; set; }
}
context.Players.Select(p =>
new {
     ShoolName = p.Name,
     p.Position,
     p.Weight
}).Pivot(b => b.Position,
         bl => bl.Max(b => b.Weight),
         b => new SchoolPositionWeight())

In order to build the query correctly I need to run the following configuration during model creation.

var psp = modelBuilder.Query(typeof(SchoolPositionWeight));
psp.Property("Quarterback").IsPivot();
psp.Property("WideReceiver").IsPivot();

Ok now for the questions.

  1. What should we call this fluent method - right now I have IsPivot.
  2. Should we do a convention which tries to pickup the pivot columns automatically based on property names?
  3. The PropertyBuilder is shared between Entity Types and Query Types. However using an Entity Type as a pivot result doesn't make sense and therefore I currently have it limited to Query Types. Should we create a new PropertyBuilder for just Query Types to avoid having this IsPivot method on Entity Types where it doesn't make sense?
  4. Do you have any other ideas of ways to identify the pivot column values?

@pmiddleton
Copy link
Contributor

@divega - I have another one for yea :)

PostgreSQL does not support the Pivot operator as defined by Sql Server and Oracle. However it does have something very similar, the Crosstab function. This function performs a pivot on the source however it does not support an aggregate.

This means our Linq Pivot function needs some design work. How do we want to deal with some providers who support the aggregate and some who don't? We could go with two Pivot methods one with and one without the aggregate. We could make the aggregate the last parameter and optional. We could define a PostgreSql specific Crosstab function.

Any thoughts on the best way to go to keep in line with existing EF design.

@roji - What are your thoughts? Do you see anything in this thread which would be an issue for PostgreSql?

@roji
Copy link
Member

roji commented Aug 28, 2018

@pmiddleton I've tried to find some time to dive into this but haven't been very successful :( In any case you seem to know at least as much as I do about pivoting in PostgreSQL.

From a quick look, according to this issue it seems like aggregating in PostgreSQL is supported by using an aggregate function inside the crosstab() SQL - or am I totally misunderstanding the problem?

Is pivoting something that's generally supported in other databases (Oracle, MySQL..)? Is support more or less uniform (or is it fragmented, like the lack of aggregate for PostgreSQL)? Would client evaluation occur on databases which don't support pivoting?

@pmiddleton
Copy link
Contributor

@roji - Thanks for that link. Based on that it looks like PostgreSQL can support the same type of pivot that Sql and Oracle do. It will just require more because the base query needs to have the aggregate manually applied by the provider.

From my research Oracle, Sql, and PostgreSQL are the only databases with a pivot operation built in. Maybe @divega has more knowledge in this area for other databases.

I'm not sure about adding client pivoting support. @divega thoughts?

This feature as a whole is really dependent on the outcome of #12795 as it is currently tightly coupled to ReLinq.

@divega divega added good first issue This issue should be relatively straightforward to fix. help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. and removed help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. good first issue This issue should be relatively straightforward to fix. labels May 31, 2019
@ajcvickers ajcvickers added good first issue This issue should be relatively straightforward to fix. help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. and removed help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. good first issue This issue should be relatively straightforward to fix. labels Aug 5, 2019
@ajcvickers ajcvickers removed the help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. label Aug 30, 2019
@mojtabakaviani
Copy link

Can add this to efcore 6 roadmap?

@roji
Copy link
Member

roji commented Nov 10, 2020

@mojtabakaviani at the moment, since this feature has received few votes and is very non-trivial, and EF Core 6 already has a lot of features planned, there's probably little chance we'll be able to include it in that release. As always, that could change based on user feedback and requests.

@pmiddleton
Copy link
Contributor

I am going to try take another look at this again this winter/spring. Assuming I can find some time between work and helping with virtual school for my daughter. No guarantees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants