Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: X-Result-Count is always the limit when limit is specified #9422

Open
1 task done
mdavis332 opened this issue Nov 29, 2023 · 2 comments
Open
1 task done

Bug: X-Result-Count is always the limit when limit is specified #9422

mdavis332 opened this issue Nov 29, 2023 · 2 comments
Labels
needs triage This issue has been automatically labelled and needs further triage

Comments

@mdavis332
Copy link

Actual behavior

on /attributes/restSearch endpoint when specifying a limit param, the X-Result-Count response header is usually limit+1 (the +1 seems odd).

Expected behavior

I would expect that X-Result-Count always contains the total number of overall results for pagination.

Steps to reproduce

curl -vv \
 -d '{"returnFormat":"json","enforceWarninglist":"true","to_ids":"true","last_seen":["2023-11-28T13:56:54.841Z","2023-11-29T15:23:18.000Z"],"limit":1000}' \
 -H "Authorization: <api_key>" \
 -H "Accept: application/json" \
 -H "Content-type: application/json" \
 -X POST http://misp_host/attributes/restSearch

See that the X-Result-Count = 1001

Version

2.4.178

Operating System

CentOS

Operating System version

Stream 8

PHP version

7.4

Browser

Chrome, Edge

Browser version

119.0.2151.72 (64-bit)

Relevant log output

No response

Extra attachments

This appears to be similar to #7950 and #4161.

Additionally, correcting this behavior would, I believe, resolve #9129 since you could specify a limit of 0 or 1 to effectively get the same end result. That, or giving the /attributes/restSearch endpoint the ability to take the same metadata flag as the /events/restSearch endpoint.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@mdavis332 mdavis332 added the needs triage This issue has been automatically labelled and needs further triage label Nov 29, 2023
@advbarbieux
Copy link

advbarbieux commented Mar 12, 2024

Facing the same issue in a project involving MISP at my company.

Further investigation on our end has led us to the following conclusions.

1. Most of the time, X-Result-Count equals page * limit + 1 when querying all pages prior to the last page

On a local instance of MISP (based on the Docker image) running on my machine with attributes fed from the CIRCL OSINT Feed default feed (277,934 attributes), the POST /attributes/restSearch API yielded the following results upon querying 10,000-attribute pages:

  • Page 1 ({"page": 1, "limit": 10000}) returns 10,000 results with X-Result-Count header equal to 10,001;
  • Page 2 ({"page": 2, "limit": 10000}) returns 10,000 results with X-Result-Count header equal to 20,001;
  • ...
  • Page 27 ({"page": 27, "limit": 10000}) returns 10,000 results with X-Result-Count header equal to 270,001;

Through source code-level investigation, we figured out the following:

2. Most of the time, X-Result-Count equals the correct total number of attributes when querying the last page

That is, (page - 1) * limit + <number of attributes in the last page>.

In our example, querying page 28 ({"page": 28, "limit": 10000}) returns 7,934 results with a X-Result-Count header equal to 277,934, which is the expected result.

The same process as described above leads to this correct result, thanks to 2 changes:

  • The overwriting of $elementCounter in Attribute.php:1897 is based on count($results), which equals the number of attributes in the last page (< limit) when querying the last page;
  • In Attribute.php:3245, $incrementTotalBy is set to 0 when the number of attributes retrieved is lower than the limit (signalling that we've just reached the last page), so the + 1 in the page * limit + 1 formula in section 1. of this comment no longer applies.

3. X-Result-Count equals (page - 1) * limit + 1 when querying all pages beyond the total number of attributes

In our example:

  • querying page 29 ({"page": 29, "limit": 10000}) returns 0 result with X-Result-Count header equal to 280,001, (not 290,001);
  • querying page 30 ({"page": 30, "limit": 10000}) returns 0 result with X-Result-Count header equal to 290,001, (not 300,001)
  • and so on.

This is due to the same reasons as detailed in 2., with count($results) equal to 0 this time.

4. Should the total number of attributes be a multiple of limit, the API will never be able to return the actual total attribute count in X-Result-Count

However, there is a downside to the current approach used to detect the last page: whenever the total number of attributes is a multiple of limit, the API is no longer able to detect that we've reached the last page.

In our example, had we had exactly 270,000 entries in our database, the actual last page would have been page 27.
But due to the count($results) < $params['limit'] condition in Attribute.php:3245, the $incrementTotalBy variable would not have been set to 0, and the returned X-Result-Count value would have been 270,001.

Then, querying page 28 would have returned 0 result with a X-Result-Count header equal to 270,001 by the logic described in 3..

Therefore, at no point will the correct value (270,000) ever be returned by the API.

However, in the title of sections 1. and 2., I specified "most of the time", because of the following.

5. Behavior changes a fair bit as soon as you exclude decayed attributes

Specifying "excludeDecayed" : 1 in the API causes decayed attributes to be excluded as expected, but also causes both X-Result-Count and the page size to be inconsistent.

As an example, with decay models "Phishing model", "NIDS Simple Decaying Model" and "Vishing model" on the very same dataset in my local MISP instance:

  • Page 1 ({"page": 1, "limit": 10000}) : returns 5,097 results with X-Result-Count header equal to 10,000;
  • Page 2 ({"page": 2, "limit": 10000}) : returns 6,096 results with X-Result-Count header equal to 20,000;
  • Page 3 ({"page": 3, "limit": 10000}) : returns 8,102 results with X-Result-Count header equal to 30,000;
  • ...
  • Page 26 ({"page": 26, "limit": 10000}) : returns 3,354 results with X-Result-Count header equal to 260,000;
  • Page 27 ({"page": 27, "limit": 10000}) : returns 5,662 results with X-Result-Count header equal to 270,000;
  • Page 28 ({"page": 28, "limit": 10000}) : returns 7,288 results with X-Result-Count header equal to 277,934;
  • Page 29 ({"page": 29, "limit": 10000}) : returns 0 result with X-Result-Count header equal to 280,001;

Here, the X-Result-Count value no longer matches the number of matching attributes anymore.
Instead, it matches the number of non-decayed matching attributes found up to the current page (page * limit), until the last page is reached ((page - 1) * limit + <number of attributes in last page> attributes are then returned).
After reaching the last page, the X-Result-Count is computed as (page - 1) * limit + 1 as described in 3..*

In this example we're rather lucky, because none of the pages returns 0 result, but nothing stops the MISP API as-is from returning empty pages before the last page should we retrieve a page of which all attributes are decayed.

This means API clients cannot rely on the condition that a page is empty to detect whether the end of the attribute list has been reached.

Investigation on our end showed the following:

  • Decayed attributes are filtered out, not by the actual database query, but using a continue PHP instruction in a for loop iterating over all (and potentially non-decayed) attributes in Attribute.php:1962;
  • But the $result_count is still computed as count($results) ($results being the attribute entries retrieved using the database query before decay score-based filtering);
  • Also, due to the number of results returned by Attribute::fetchAttributes being strictly lower than limit, the $incrementTotalBy = 0; instruction in Attribute.php:3245 is executed and causes all pages until the last one to no longer add 1 to the value of X-Result-Count, as though we had we reached the last page;
  • After reaching the last page, the break instruction in Attribute.php:3231 is executed because no entry has been retrieved, so we're back to incrementing the X-Result-Count header value by 1, causing the 280,001 result for page 29 above.

Overall, API clients wishing to retrieve all attributes can work around this unexpected behavior exhibited by the X-Result-Count header by querying pages starting with page 1 until an X-Result-Count value lower or equal to (page - 1) * limit + 1 is returned, but this stop condition seems rather unintuitive, and depending on this behavior to query all attributes appears risky to us.

Fixing this issue appears to require a fair bit of redesigning, but could the following be considered?

  • Returning an HTTP 404 error when querying a page beyond the last page in order to provide a reliable pagination stop condition;
  • Update the docs to further elaborate on values returned by the X-Result-Count header, which is not the total number of attributes matching the filters but more of an indicator telling us whether we've reached the end of the attribute list.

These sound to us like reasonable quick wins.

@mdavis332
Copy link
Author

incredible sleuthing! and nice suggestions for some quick wins. perhaps that will help the team address some of this more quickly. thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs triage This issue has been automatically labelled and needs further triage
Projects
None yet
Development

No branches or pull requests

2 participants