Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to paginate and limit #16

Closed
mingard opened this issue Jan 19, 2017 · 21 comments
Closed

Ability to paginate and limit #16

mingard opened this issue Jan 19, 2017 · 21 comments
Assignees

Comments

@mingard
Copy link

mingard commented Jan 19, 2017

Currently there doesn't seem to be a good way to paginate or limit the results. An ideal solution would be something like:

.select()
  .from({$p: Person})
  .where({$p: {name: "Joe"}})
  .limit(5)  // Result limit
  .offset(1) // Page
@anywhichway
Copy link
Owner

@mingard Not sure of the semantics on this. Don't you really want to paginate a cursor? Doing things at the query level assumes you know what page in a result set things exist or that the query will get called multiple times with the caller incrementing the page. This would be quite inefficient, although it would conserve memory since only the ids on the given page would be resolved into objects.

Just want to make sure your request is understood. It would not be hard to implement.

@mingard
Copy link
Author

mingard commented Jan 20, 2017

@anywhichway totally agree. As per my comments in #17 I think there's scope for a reasondb-wrapper that takes care of the semantics and syntactic sugar

@mingard
Copy link
Author

mingard commented Jan 20, 2017

@anywhichway I would be happy to build said wrapper

@anywhichway
Copy link
Owner

@mingard seems like page and offset are a core capability that many people would want. It is also not hard to implement within the existing query builder ... probably just a couple of hours. The syntactic sugar for approaching projections is another matter and if you want to write a wrapper that would be great!

@anywhichway
Copy link
Owner

@mingard this enhancement has been pushed to GitHub in the master branch. Added 5 units tests to cover it. The call is limit(pagesize).page(offset).

@mingard
Copy link
Author

mingard commented Jan 22, 2017

@anywhichway testing this now

@mingard
Copy link
Author

mingard commented Jan 23, 2017

@anywhichway works well! It could perhaps do with a few extra values on the cursor, so along with maxCount, a totalCount, offset and totalPages (being calculated by Math.ceil(totalCount/pageSize).

The only really important one there is totalCount. I've started work on the NodeJS driver/wrapper so much of this can be taken care of in the metadata block on output.

Should I raise a separate ticket for this?

@anywhichway
Copy link
Owner

What is totalCount? Is it the same as count? Or, is it the number of records returned so far?

@mingard
Copy link
Author

mingard commented Jan 23, 2017

@anywhichway so this is what I'm trying to get the metadata to look like:

{
  "total": "{total matching records, after when is applied}",
  "limit": "{maxCount, seems to return the number of results actually returned}",
  "page": "{current page}",
  "limit": "{The requested limit}"
}

@anywhichway
Copy link
Owner

@mingard maxCount is the number of possible records in the cursor. When one loops through the cursor, undefined rows may be returned if the query included joins. The reason for this is joins are not evaluated until there is an attempt to access a row. This is extraordinarily beneficial for large result sets. The ReasonDB engine does not have to compute all possible combinations prior to yielding a cursor. The downside is that it is not possible to know how many records actually exist without looping across the entire result set. Rows are generated on the fly based on their index. Take a look at http://phrogz.net/lazy-cartesian-product. The .count() method gives you an actual count but can be very expensive. count()===maxCount for all single entity queries. count()===maxCount for all queries where there are matches on both side of a join, i.e. the result is the crossproduct of all matching rows from either entity otherwise count()<maxCount. Do you need a concrete example to grasp this?

{
  "total": "{total matching records, after when (do you mean where?) is applied}", <-- probably the same as count().
  "limit": "{maxCount, seems to return the number of results actually returned}",
  "page": "{current page}",
  "limit": "{The requested limit}"
}

@mingard
Copy link
Author

mingard commented Jan 23, 2017

@anywhichway
Yes when should read where

So in an example where an index is populated by X documents, where Y is the number of documents that are returned after the where filter, and Z is the documents after a limit() and page() filter, I would ideally like to be able to display Y as total, being the count of all documents available with the filtered query, excluding pagination and pagesize, in order to include prevPage and nextPage in the metadata.

The issue I'm having is that maxCount will be 2 if I set the limit value to 2, even though without limit, the number of documents that will return after the where filter is far greater.

If indeed the page and limit methods don't have a positive impact on the response time on the server, it sounds like this is something I should be handling in a separate driver/wrapper so I'm able to leverage maxCount.

Ideally, I'd building something with a similar syntax as the nodeJS MongoDB Driver with a cursor wrapper to minimise boilerplate when using within the RESTful API work I've been using ReasonDB across.

@anywhichway
Copy link
Owner

Indeed, you may need a client side cursor. BTW, if you want an array that is just the results of the query with all joins resolved and no undefined rows, then use the code below. Just be careful, it is easy to create multi-million row results across multiple objects.

const results = await yourCursor.forEach((row) => row);

or

yourCursor.forEach((row) => row).then((results) { ... });

Would a utility method yourCursor.toArray() be useful?

@mingard
Copy link
Author

mingard commented Jan 24, 2017

@anywhichway toArray would certainly be useful. Currently I've been doing as you suggest and looping. Difference is I'm using node 6.9, so promise based

@mingard
Copy link
Author

mingard commented Jan 30, 2017

@anywhichway Started on the wrapper I mentioned: https://github.com/mingard/reasondb-wrapper
Very much work in progress :)

@anywhichway
Copy link
Owner

@mingard Took a quick look. I like it.

@mingard
Copy link
Author

mingard commented Feb 2, 2017

@anywhichway I've been testing with and without the page/limit method. with 5k records, all moderate in size (single articles with 20 fields) the entire collection returns on 0.1 seconds, but if I specify page = 1 and limit = 10 it takes 4 seconds

@mingard
Copy link
Author

mingard commented Feb 2, 2017

Update: It appears that regardless of the page and limit parameters, the more records in the cursor the slower it responds. By 5000 mid-sized records the response time is around 5 seconds

@mingard
Copy link
Author

mingard commented Feb 2, 2017

Further update: This only happens when I don't have a complex filter. There doesn't seem to be a way to get all results without specifying a where condition so I've been using this { '$obj': { '@key': { '$neq': null } } }. Is there an alternative option?

@anywhichway
Copy link
Owner

This is because of the lazy cross-product behind the cursor. The lazy cross-product does not know if joins will result in a row of nulls; hence, the maxCount of a lazy cross-product may be larger than the actuall number of useful rows. This is why the cursor can return so quickly. When you specify a page and size it forces the cursor to iterate across rows, resolve joins and ids to actual objects and return those. Try the same amount of data and simply loop through the cursor without paging and the app will appear to be faster since it can do something with each record before resolving the any other joins or ids.

The need for { '$obj': { '@key': { '$neq': null } } } is intentional. We want people to think hard before making a request to retrieve all records.

@mingard
Copy link
Author

mingard commented Feb 3, 2017

@anywhichway there are definitely times where a filter, other than page and limit, is required. Take the example of a CMS where you might wish to list all documents with pagination, and 20 per page. Currently with only 5k records each page load will take 3 seconds despite the fact you're requesting so little data at a time. What would you suggest here?

@anywhichway
Copy link
Owner

@mingard ReasonDB has re-implemented cursors. An attempt to load an object is now only done when the next value is pulled from a cursor. This works across network boundaries; hence, pagination is not necessary now. Just use slice and you will only get as many as you request:

db.select()
.from({$p: Person})
.where({$p: {name: "Joe"}})
.slice(5)
.forEach(item => ...)

alternatively, just keep the cursor available and access it one item at a time until you are done or have what you need. The cursors work with for(item of cursor) also.

const cursor = db.select()
.from({$p: Person})
.where({$p: {name: "Joe"}}).

for(await const item of cursor) {

}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants