Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SkipWhile() #14104

Open
corstian opened this Issue Dec 6, 2018 · 14 comments

Comments

Projects
None yet
5 participants
@corstian
Copy link

corstian commented Dec 6, 2018

Feature Request

In order to facilitate my use case of implementing cursor based pagination I would find the LINQ .SkipWhile() method useful.

I would imagine this expression to result in something like the following SQL statement:

(DbSet<T>)
    .OrderBy(q => q.DepartureTime)
    .ThenBy(q => q.ArrivalTime)
    .SkipWhile(q => q.Id != "84CFC944-B64E-4C7D-80CD-053C771DC478")
    .Take(50);
SELECT * FROM [Flights] ORDER BY [DepartureTime], [ArrivalTime]
OFFSET (
    SELECT TOP 1 rn
    FROM (
        SELECT Id, ROW_NUMBER() OVER (ORDER BY [DepartureTime], [ArrivalTime]) as rn FROM [Flights]
    ) T
    WHERE T.Id = '84CFC944-B64E-4C7D-80CD-053C771DC478'
) rows
FETCH NEXT 50 ROWS ONLY

Until now I have been unable to find a reusable and composable way to achieve this behaviour by using the FromSql() extension method for multiple models.

Related Issues

#12046

@smitpatel smitpatel self-assigned this Dec 6, 2018

@ajcvickers ajcvickers added this to the Backlog milestone Dec 12, 2018

@smitpatel smitpatel removed their assignment Jan 16, 2019

@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 7, 2019

After almost creating another issue I've ended up here and thought I'd share my situation and get some more weight behind this proposal.

I'm building a GraphQL service and trying to support cursor-based pagination.

What I'm finding is that those articles make it sound really easy to implement, as do a lot of blog posts, but in reality that's rarely going to work. The examples I see always assume: "You sort by CreatedAt" or "You sort by BookingID" but that fails to address two potential issues:

  • The client wanting to order by another field
  • Order by another field that can have duplicates (see below)

In my situation right now, I'm facing both issues. I'm working on a set of Bookings. A booking has:

  • a booking id,
  • a booking date (date and time of arrival), and
  • a shop (where the booking is completed).

The client would like to be able to sort by booking date (issue 1), so this would therefore be the logical cursor to use, but when the client is an admin user, the result set (across shops) will have duplicate booking dates (issue 2) as it is essentially an discrete list of arrival times for a date. So, that means booking date alone can't be used as the cursor as getting a page after a given cursor would effectively skip all the remaining pages of bookings that had that same booking date.

So, the search for a cursor continues. I first though it could be a combination of booking date and booking id, but although that makes sense for the ORDER BY clause (ORDER BY BookingDate, BookingID), booking ids don't progress in line with booking dates, so the WHERE clause (WHERE BookingDate > Cursor.BookingDate AND BookingID > Cursor.BookingID) would effectively skip results if a booking with a later BookingDate has a lower BookingID than a previous page.

Finally, it would seem the best cursor to use would be a ROWNUMBER generated by SQL Server. That way I can do my ORDER BY clause across multiple fields and then in the WHERE clause filter by ROWNUMBER/cursor.

Ideal solution
Essentially what I'd like to be able to do take my existing IQueryable and just tell it to do a WHERE ROWNUMBER > cursor, and the OVER expression just dupes the ORDER BY of my query, happy days. Is that possible, or would it be easy to add support for that?

OFFSET solution
@corstian , issue poster here wrote his blog post with the same solution he's proposed above. Essentially he's calculating the OFFSET based on a subquery for the ROWNUMBER.

Is it possible now to re-use the IQueryable I've already built up with various filters and other logic earlier, and tack on this OFFSET clause with a subquery? Is there an elegant way to do that now in EF Core pending his SkipWhile suggestion above?

PK lookup first, then query results solution
Failing that, I think I'd be able to live with doing an upfront for the page of BookingIDs I'd need to do my full query on.

  1. SELECT BookingID, ROWNUMBER() OVER(BookingDate, BookingID) AS cursor FROM Booking ORDER BY BookingDate, BookingID) WHERE cursor > myCursor
    2.`SELECT * FROM Booking WHERE BookingID IN (...list of booking ids grabbed in above)

But similarly to above, I want to re-use my IQueryable and tack on the ROWNUMBER into my SELECT, how can I do that elegantly?

Painful solution
Use @corstian 's SqlKata solution to maybe do the PKs lookup first, therefore implementing all my filtering in that ORM and the queries to get totalCount, hasNextPage, hasPreviousPage, etc., just to get a list of PKs to do my real, object-returning EF query. Could be worse, but it's another dependency to bring on board probably with it's own quirks and not as much LINQ support as EF Core team can provide one would imagine.

@corstian

This comment has been minimized.

Copy link
Author

corstian commented Feb 7, 2019

The biggest issue is that I did not succeed in materializing the IQueryable in a way which is usable for me. My current solution is a way of building queries with SqlKata, comparable with EF LINQ, but weakly typed. It is this query which I am using as a subquery for pagination purposes. This way I could tackle all the problems you have described above (and yes, I'd also rather do that with EF Core, although SqlKata is an amazing library).

I'll write up another blogpost soon showing the details about my current solution, and maybe put it in a handy NuGet package. I'm currently running around on a exhibition so it might take a few days though.

@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 7, 2019

Thanks for the response. It definitely should be easier than it is, hopefully the EF Core team can prioritise it in the backlog. Would love to see your solution. I may need to do the same. I found a .ToSql() extension method that gets the raw SQL but like you said it's a bit of a nightmare to be dealing with something so raw. Theoretically I could just pop on the OFFSET with subquery in there and see what happens, but it's pretty loose haha.

Update: scratch that. Few people have said it's super slow to generate the SQL, which makes sense given it's using reflection. I wish I could just generate the WHERE clause somehow though, to pop it into a raw query for the PKs and ROWNUMBER()...

cc: @RahenSaeed , this will be useful once it's in place as far as EF Core goes for implementing cursor-based pagination effectively in real-world scenarios.

@corstian

This comment has been minimized.

Copy link
Author

corstian commented Feb 7, 2019

@benmccallum I'll elaborate it more expansively later, but this is the gist of it:

/// <summary>
/// A ggeneric method for applying cursor based pagination
/// </summary>
/// <param name="query">The base query to apply pagination to</param>
/// <param name="queryFactory">A queryfactory instance</param>
/// <param name="column">The column to use for cursor based pagination</param>
/// <param name="cursor">The cursor value</param>
/// <param name="count">The number of items to retrieve after the cursor</param>
/// <returns></returns>
public static Query CursoredOffset(
    this Query query,
    QueryFactory queryFactory,
    string column,
    string cursor,
    int count)
{
    var compiler = new SqlServerCompiler();
    var ctx = new SqlResult
    {
        Query = query
    };
            
    var order = compiler.CompileOrders(ctx);

    query.Clauses.RemoveAll(q => q.Component == "order");

    if (string.IsNullOrWhiteSpace(order))
    {
        throw new Exception($"{nameof(query)} does not have an order by clause");
    }

    var internalQuery = queryFactory.Query()
        .With("d", query)
        .From("d")
        .CombineRaw(order)
        .CombineRaw($@"OFFSET (
            SELECT TOP 1 rn
            FROM(
                SELECT [d].{column}, ROW_NUMBER() OVER({ order }) AS rn FROM [d]
            ) T
            WHERE T.{column.Split('.').LastOrDefault() ?? column} = '{cursor}'
        ) rows
        FETCH NEXT {count} ROWS ONLY");

    return internalQuery;
}

So few things to note here; it uses the query factory and makes life easier for me. The SqlKata query is required to have at least one order specified. If you want to retrieve the resultcount you can grab the count from the query you provided the method above.

@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 8, 2019

Thx for sharing. I've taken a slightly different approach for now as an example to our team and we're just deciding whether we want to roll that out around the place or try wait for EF Core to support something and avoid another dependency. I almost wonder how hard it would be (without any EF Core repo experience) for us to have a crack at this ourselves. My fear is that the combination of standard OrderBy, ThenBy and Take could makes things difficult unless we can detect the SkipWhile and branch off to generate the OFFSET clause.

My POC to my team I build the filtering query in SqlKata, then hand off to a function (that I could probably make generic if I parameterised it a bit more) that fires off the necessary queries to get the page of records' PKs with the SqlKata query and then feeding them into the EF Core query. It's super roundabout but at least I can support the whole Connection (hasNextPage, hasPreviousPage, totalCount, edges) in a way that I could make generic with a few more parameters into the query.

Code dump below in case it's useful. Note: there's a few things in there that look hairy, but essentially I'm using AutoMapper's ProjectTo<TDto> to only get the fields requested by the GQL client from the DB and a Connection wrapper built with a static method For to help me out a bit.

I quite like how I've been able to use Expressions to define the predicates for use in the query later, and that's even cleaner and strongly-typed when EF Core is possible, so the end goal is really to be able to pass all those expressions in to this helper method to get the paged result.

public async Task<ConnectionResponse<BookingDto>> Handle(Query request, CancellationToken cancellationToken)
{
    // Init query
    var db = new QueryFactory(new SqlConnection(_db.Database.GetDbConnection().ConnectionString), new SqlServerCompiler());
    var query = db.Query("Booking");

    // ... apply all query filters here...

    // Paging - the fun begins
    var totalCount = await query.Clone().CountAsync<int>();
    if (totalCount < 1)
    {
        return ConnectionResponse<BookingDto>.Empty();
    }

    var orderByFwd = $"{nameof(BookingEntity.ServiceDate)}, {nameof(BookingEntity.BookingID)}";
    var orderByBwd = $"{nameof(BookingEntity.ServiceDate)} DESC, {nameof(BookingEntity.BookingID)} DESC";
    return await GetPageByCursor(request, totalCount, query, orderByFwd, orderByBwd, cancellationToken);
}

private async Task<ConnectionResponse<BookingDto>> GetPageByCursor(Query request, int totalCount,
    SqlKata.Query query,
    string orderByFwd,
    string orderByBwd,
    CancellationToken cancellationToken)
{
    var getPage = "1 = 1";
    Func<int, string> hasPrevPageFactory = null;
    Func<int, string> hasNextPageFactory = null;
    var currentCursor = Cursor.FromCursor<int?>(request.PagingArgs.After ?? request.PagingArgs.Before);
    var limit = request.PagingArgs.First ?? request.PagingArgs.Last ?? 1;

    var isPagingForward = request.PagingArgs.First.HasValue;
    if (isPagingForward)
    {
        if (currentCursor.HasValue)
        {
            getPage = $"[RowNumber] > {currentCursor}";
            hasPrevPageFactory = currCursor => $"[RowNumber] <= {currCursor}";
        }
        hasNextPageFactory = nextCursor => $"[RowNumber] > {nextCursor}";
    }
    else
    {
        if (currentCursor.HasValue)
        {
            getPage = $"[RowNumber] < {currentCursor}";
            hasNextPageFactory = currCursor => $"[RowNumber] >= {currCursor}";
        }
        hasPrevPageFactory = nextCursor => $"[RowNumber] < {nextCursor}";
    }            

    var orderBy = isPagingForward ? orderByFwd : orderByBwd;
    query
        .SelectRaw($"ROW_NUMBER() OVER (ORDER BY {orderBy}) AS [RowNumber]")
        .Select("BookingID", "ServiceDate")
        .As("rows");

    var db = new QueryFactory(new SqlConnection(_db.Database.GetDbConnection().ConnectionString), new SqlServerCompiler());
    var getPageQuery = db.Query()
        .Select("BookingID", "RowNumber")
        .From(sq => query)
        .WhereRaw(getPage)
        .OrderByRaw(orderBy)
        .Take(limit);
    //var sqlResult = db.Compiler.Compile(getPageQuery);

    var page = db.Get(getPageQuery);
    var bookingIds = page.Select(b => (int)b.BookingID).ToList();
    var cursorsByBookingId = page.ToDictionary(b => (int)b.BookingID, b => (int)b.RowNumber);
    var cursors = page.Select(b => (int)b.RowNumber);

    var items = (await _db.Bookings
        .AsNoTracking()
        .Where(b => bookingIds.Contains(b.BookingID))
        .ProjectToArrayAsync<BookingDto>(_mapperConfig, request.FieldsRequested, cancellationToken))
        .OrderBy(b => bookingIds.IndexOf(b.BookingID))
        .ToArray();

    var hasPrevPageTask = Task.FromResult(false);
    if (request.PagingArgs.IsHasPreviousPageRequested && hasPrevPageFactory != null)
    {
        var cursor = isPagingForward ? currentCursor : (cursors.Any() ? cursors.First() : (int?)null);
        if (cursor.HasValue)
        {
            var hasPrevPage = hasPrevPageFactory(cursor.Value);
            hasPrevPageTask = db.Query()
                .From(sq => query)
                .WhereRaw(hasPrevPage)
                .OrderByRaw(orderBy)
                .AnyAsync();
        }
    }

    var hasNextPageTask = Task.FromResult(false);
    if (request.PagingArgs.IsHasNextPageRequested && hasNextPageFactory != null)
    {
        var cursor = isPagingForward ? (cursors.Any() ? cursors.Last() : (int?)null) : currentCursor;
        if (cursor.HasValue)
        {
            var hasNextPage = hasNextPageFactory(cursor.Value);
            hasNextPageTask = db.Query()
                .From(sq => query)
                .WhereRaw(hasNextPage)
                .OrderByRaw(orderBy)
                .AnyAsync();
        }
    }

    Task.WaitAll(hasPrevPageTask, hasNextPageTask);

    return ConnectionResponse<BookingDto>.For(items, b => cursorsByBookingId[b.BookingID],
        totalCount, await hasPrevPageTask, await hasNextPageTask);
}
@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 11, 2019

Having a dig into the EF Core code, I'm thinking we could take inspiration from how they do .UseRowNumberPaging() internally. See the implementation of the SELECT generation here.

@corstian, how would you suggest supporting reverse direction paging with your proposal. Would it just be a matter of reversing the ORDER BY expression and taking LIMIT? I guess so.

I'd be keen to hear whether @smitpatel or @ajcvickers think we can support your suggested syntax or whether we'd use a specific cursor-paging extension methods, like: GetNextPageByCursor(cursor, limit) and GetPreviousPageByCursor(cursor, limit), and then it's just a matter or extracting the ORDER BY in the query (if any), generating the OFFSET subquery and LIMIT and tacking it on.

I might give it a crack if we have some guidance from the core team first as to how they'd like it to look.

@corstian

This comment has been minimized.

Copy link
Author

corstian commented Feb 11, 2019

According to the docs it should be possible to use FETCH PRIOR instead of FETCH NEXT. Haven't tried that yet, though.

I think @benmccallum makes a valid point when it comes to specific cursor paging extension methods. While technically it would be totally possible to support backwards pagination using the SkipWhile method there is no way this will make sense semantically. In a more LINQ way I'd be thinking about something like:

dbSet
    .OrderBy(q => q.Timestamp)
    .From(q => q.Id == Guid.NewGuid())
    .Next(100);
// or `.Previous(100);`
@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 12, 2019

Ah, yea should've realised there was a FETCH PRIOR, didn't look at the time.

I like your LINQ method names more than mine! I wonder if From could get confused with SQL FROM clause though. Perhaps FromMatch or FromWhere (that's a little weird), but you get what I mean.

@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 13, 2019

Been digging into the EF Core source code and it's a little overwhelming haha. Came unstuck on the Getting And Building the Code guide running build /t:Compile.
image

If I can get past this then I'll be able to navigate the codebase a bit better. I want to try and find some tests that would be a good starting point to copy, paste, manipulate for what we want and then TDD this thing.

Update:
Strangely enough, running a straight up build without specifying the Compile target worked, so maybe the docs are out of date and like the error msg says, that target no longer exists...

Update 2:
Installed VS 2019 Preview and all the projects fail to open with the following. Calling it for tonight. C:\GitHub\EntityFrameworkCore\src\dotnet-ef\dotnet-ef.csproj : error : Project file is incomplete. Expected imports are missing.

I think the issue is that I can't run build /t:Compile and maybe that is why I don't have C:\Users\Developer/.dotnet/x64 which should be added as part of running that cmd.

@roji

This comment has been minimized.

Copy link
Member

roji commented Feb 14, 2019

@benmccallum the docs are indeed out of date with regards to /t:Compile, as you've found out a simple build should run fine.

The latest version of VS 2019 is a requirement to build the source. After build completes successfully, try starting VS by running startvs EFCore.Runtime.sln (that's a reduced solution which doesn't contain some projects which shouldn't be necessary for what you're trying to do). You can achieve the same by dragging and dropping the solution file over startvs.cmd in Windows Explorer.

If you're still having issues please post back.

@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Feb 23, 2019

Thanks @roji , I haven't tried this yet as in the end I realised it might be better for me to try implement this in 2.2 and port forward to 3.0. Will see how I go!

@corstian

This comment has been minimized.

Copy link
Author

corstian commented Mar 6, 2019

I finally got to finish that post on cursor based pagination. And I found ways to paginate both forwards and backwards, with reasonable speed.

TL;DR: Put the SQL statement in a CTE, select ROW_COUNT within the CTE, and use this for your final selection.

benmccallum added a commit to benmccallum/EntityFrameworkCore that referenced this issue Mar 10, 2019

@benmccallum

This comment has been minimized.

Copy link

benmccallum commented Mar 10, 2019

So I've been trying to introduce two new extension methods: TakeFromMatch and TakeBeforeMatch, both accepting matchPredicate and count. Naming would be subject to whatever everyone agrees is best of course. For the meantime, I've been trying to reverse-engineer how EF works, via unit tests and as I expected it's a beast.

PR draft linked above for where I've got to atm.

So far I've been able to:

  1. Add the extension methods in EntityFrameworkQueryableExtensions.
  2. With this in place the test failed and led me to find the MethodInfoBasedNodeTypeRegistryFactory which registers the MethodInfo setup above as a supported method in a MethodCallExpressionNodeBase implementation.
  3. Taking inspiration from the existing ones in EFCore\Query\ResultOperators\Internal, I've setup a TakeBesideMatchExpressionNode and TakeBesideMatchResultOperator, but this I'm a bit unsure at the moment.

Questions:

  1. Should my TakeBesideMatchExpressionNode be inheriting ResultOperatorExpressionNodeBase or one of the other ones here
  2. Should my TakeBesideMatchResultOperator be inheriting SequenceTypePreservingResultOperatorBase or one of the other ones here

I think next time I have a crack I need to take reference from how the Take/Skip or OrderBy implementations work, or maybe even the Where one. It seems like Re-linq Remotion is all about building a QueryModel, so I guess my goal is to figure out how to represent the OFFSET FETCH NEXT or previous in that query model and make that manipulation.

@corstian, great blog post btw!

@smitpatel

This comment has been minimized.

Copy link
Contributor

smitpatel commented Mar 14, 2019

@benmccallum - With Relinq going away, all that processing would be unnecessary. We need to add additional methods to translate newly added methods. Currently the PR I have is much in flux. I will let you know once it starts stabilizing but it would be certainly easier to implement this than with Relinq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.