Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes feed with descending=true is incorrect #4714

Open
glynnbird opened this issue Aug 2, 2023 · 10 comments
Open

Changes feed with descending=true is incorrect #4714

glynnbird opened this issue Aug 2, 2023 · 10 comments

Comments

@glynnbird
Copy link
Contributor

Description

A CouchDB changes feed from a database where q is > 1, when queried with ?descending=true provides a set of results with repeated seq values and a negative pending value.

{
  "results": [
    {
      "seq": "4-g1AAAAfjeJy91E9OhDAUBvDqmLh0bqBrE5DS8qcr5wZ6A32vLSETBowDa72B3kBvoDfQG-gN9AZ6AwZSE-isBhPYtAkJ75ePryUjhMzTmSKnCmVxqxcKOXVx5UjpVGtnXVRl6njclVlRKchLN9dl1ryzDwSP6rpepjMgq-bBIXIRSImDJtkwi3aBcd6seGzZkiMoHgwathV6N_uktc8sWwQRAAsGDduy-U6219rnlh0q5oOAQcP-UzYuWvvCsn3gKJN4_L4vW_vqz94zZ43GGITe-H1ft3Zh5Y5ChrE_Qd83rX1n5eYq9CjK0fvOD5qV3Ddbwz902YFHcUjD0Ts3_qPxn3p-wBOqx7_nxn82_kvv3AlGpfBG7974r8Z_6_JTRlWMdKL-343_0fmaAgjtT9T_p_G_uu8fg9JaRRP1_238n15-xUB4cqL-f43f-_coLlkC_qCByw34y4Gw",
      "id": "6f92c987479f11d6a435ef6e54b79341",
      "changes": [
        {
          "rev": "1-23202479633c2b380f79507a776743d5"
        }
      ]
    },
    {
      "seq": "4-g1AAAAfjeJy91E9OhDAUBvDqmLh0bqBrE5DS8qcr5wZ6A32vLSETBowDa72B3kBvoDfQG-gN9AZ6AwZSE-isBhPYtAkJ75ePryUjhMzTmSKnCmVxqxcKOXVx5UjpVGtnXVRl6njclVlRKchLN9dl1ryzDwSP6rpepjMgq-bBIXIRSImDJtkwi3aBcd6seGzZkiMoHgwathV6N_uktc8sWwQRAAsGDduy-U6219rnlh0q5oOAQcP-UzYuWvvCsn3gKJN4_L4vW_vqz94zZ43GGITe-H1ft3Zh5Y5ChrE_Qd83rX1n5eYq9CjK0fvOD5qV3Ddbwz902YFHcUjD0Ts3_qPxn3p-wBOqx7_nxn82_kvv3AlGpfBG7974r8Z_6_JTRlWMdKL-343_0fmaAgjtT9T_p_G_uu8fg9JaRRP1_238n15-xUB4cqL-f43f-_coLlkC_qCByw34y4Gw",
      "id": "c712333ff3f3297b9df038cea77b9cb2",
      "changes": [
        {
          "rev": "1-23202479633c2b380f79507a776743d5"
        }
      ]
    },
    {
      "seq": "4-g1AAAAfjeJy91E9OhDAUBvDqmLh0bqBrE5DS8qcr5wZ6A32vLSETBowDa72B3kBvoDfQG-gN9AZ6AwZSE-isBhPYtAkJ75ePryUjhMzTmSKnCmVxqxcKOXVx5UjpVGtnXVRl6njclVlRKchLN9dl1ryzDwSP6rpepjMgq-bBIXIRSImDJtkwi3aBcd6seGzZkiMoHgwathV6N_uktc8sWwQRAAsGDduy-U6219rnlh0q5oOAQcP-UzYuWvvCsn3gKJN4_L4vW_vqz94zZ43GGITe-H1ft3Zh5Y5ChrE_Qd83rX1n5eYq9CjK0fvOD5qV3Ddbwz902YFHcUjD0Ts3_qPxn3p-wBOqx7_nxn82_kvv3AlGpfBG7974r8Z_6_JTRlWMdKL-343_0fmaAgjtT9T_p_G_uu8fg9JaRRP1_238n15-xUB4cqL-f43f-_coLlkC_qCByw34y4Gw",
      "id": "c9fb072787c20167ab71bd567f03c3d7",
      "changes": [
        {
          "rev": "1-23202479633c2b380f79507a776743d5"
        }
      ]
    },
    {
      "seq": "4-g1AAAAfjeJy91E9OhDAUBvDqmLh0bqBrE5DS8qcr5wZ6A32vLSETBowDa72B3kBvoDfQG-gN9AZ6AwZSE-isBhPYtAkJ75ePryUjhMzTmSKnCmVxqxcKOXVx5UjpVGtnXVRl6njclVlRKchLN9dl1ryzDwSP6rpepjMgq-bBIXIRSImDJtkwi3aBcd6seGzZkiMoHgwathV6N_uktc8sWwQRAAsGDduy-U6219rnlh0q5oOAQcP-UzYuWvvCsn3gKJN4_L4vW_vqz94zZ43GGITe-H1ft3Zh5Y5ChrE_Qd83rX1n5eYq9CjK0fvOD5qV3Ddbwz902YFHcUjD0Ts3_qPxn3p-wBOqx7_nxn82_kvv3AlGpfBG7974r8Z_6_JTRlWMdKL-343_0fmaAgjtT9T_p_G_uu8fg9JaRRP1_238n15-xUB4cqL-f43f-_coLlkC_qCByw34y4Gw",
      "id": "801f6800f561950b6a0dadc353512e7c",
      "changes": [
        {
          "rev": "1-23202479633c2b380f79507a776743d5"
        }
      ]
    }
  ],
  "last_seq": "4-g1AAAAfjeJy91E9OhDAUBvDqmLh0bqBrE5DS8qcr5wZ6A32vLSETBowDa72B3kBvoDfQG-gN9AZ6AwZSE-isBhPYtAkJ75ePryUjhMzTmSKnCmVxqxcKOXVx5UjpVGtnXVRl6njclVlRKchLN9dl1ryzDwSP6rpepjMgq-bBIXIRSImDJtkwi3aBcd6seGzZkiMoHgwathV6N_uktc8sWwQRAAsGDduy-U6219rnlh0q5oOAQcP-UzYuWvvCsn3gKJN4_L4vW_vqz94zZ43GGITe-H1ft3Zh5Y5ChrE_Qd83rX1n5eYq9CjK0fvOD5qV3Ddbwz902YFHcUjD0Ts3_qPxn3p-wBOqx7_nxn82_kvv3AlGpfBG7974r8Z_6_JTRlWMdKL-343_0fmaAgjtT9T_p_G_uu8fg9JaRRP1_238n15-xUB4cqL-f43f-_coLlkC_qCByw34y4Gw",
  "pending": -4
}

Note the changes feed with descending=false (the default) is fine. This only affects ?descending=true requests.

Steps to Reproduce

Create a database and populate:

curl -X PUT '/mydb?q=16'
curl -X POST -H'Content-type:application/json' -d'{}' "$COUCH_URL/mydb"
curl -X POST -H'Content-type:application/json' -d'{}' "$COUCH_URL/mydb"
curl -X POST -H'Content-type:application/json' -d'{}' "$COUCH_URL/mydb"
curl -X POST -H'Content-type:application/json' -d'{}' "$COUCH_URL/mydb"

Fetch the changes feed with ?descending=true

curl "$COUCH_URL/mydb/_changes?descending=true"

Note: if I use q=1 on database creation, then I correctly get four different seq values back in descending order, but still with a negative pending value.

Expected Behaviour

It is expected that fetching a CouchDB's changes feed with descending=true would provide an array of results in descending seq order, a last_seq and a positive pending number indicating how many more changes are available to fetch.

Your Environment

  • 3.3.1
  • curl
  • MacOS Ventura - M1
@nickva
Copy link
Contributor

nickva commented Aug 2, 2023

I had tried it in 2.3.1 and I see the same negative pending value behavior. Not clear if it was meant to do that (there is a plausible semantic explanation for it) or it was a genuine bug.

@nickva
Copy link
Contributor

nickva commented Aug 2, 2023

Returning some of the same start sequence numbers comes as a side-effect of having individual shards.

Each shard has an integer sequence number that's incrementing. 0 is a special "start" sequence value. When we emit a sequence row in the changes feed we combine all the sequences from individual shards, the numeric prefix is the sum of all the individual sequences in each range. In the regular, forward direction, all sequences initialize at 0. So if there are 2 shard ranges (q=2), 2 docs on one range and 3 on the other, we might get this sequence:

  0+1 = 1
  1+1 = 2
  1+2 = 3
  1+3 = 4
  2+3 = 5

Could also be 0+1, 0+2, 0+3, 1+3, 2+3, etc.

In the reverse case, we initialize first with the top sequence from each shard range (2 and 3) instead of 0. So then, if we don't hear from a row yet we assume it's has its top sequence (2 or 3), so we might get:

  2+3 = 5
  2+3 = 5
  1+3 = 4
  1+2 = 3
  1+1 = 2

So we get 5 emitted twice. For a large Q we'd get a higher chance of emitting identical sum prefixes with a frequency proportional to Q or so.

The prefix bit, turns out, is mostly cosmetic, it's usually thrown away anyway when we get back a sequence in Apache CouchDB. (Though, older version of CouchDB or other couches might parse it). We can probably improve this part and attempt to emit a decrementing sum, at the expense that it won't actually match the sum of sequences from the encoded per-node sequence part (-g1AAAAf...) associated with it.

@nickva
Copy link
Contributor

nickva commented Aug 2, 2023

I guess a plausible interpretation of the pending number is that it reflects both the direction and offset from the end of the changes feed, regardless in which direction we're traversing.

In a simple q=1 case with 5 changes, where the end/top is 5:

  • If limit=2&descending=false we emit 2 docs and get pending = +3. That is, 3 more docs to go forward to get to 5.

  • If we do limit=2&descending=true, we get pending = -2 since we have to go 2 docs backwards to get to the 5.

I think it would make more sense to return it as a positive value, reflecting how many more rows are left till the end of the current direction of traversal. If we're traversing backwards it would be how many till we reach 0, so it would return 3 in both cases. But, given it's been like this for quite a while I don't know if we should change it...

nickva added a commit that referenced this issue Aug 3, 2023
Previously, pending changes count for the `descending=true` didn't indicate the
number of pending changes. Instead, it returned the negative value of the
already emitted rows. So for `descending=true&limit=2`, it would return -2; for
`descending=true&limit=5`, it would return -5. That wasn't a very useful value,
so let's fix it to return the actual number of pending changes.

When traversing the changes feed backwards, the number of pending changes until
the start will be equal the total number of changes minus the number of changes
from the current start sequence until the end.

Issue: #4714
@nickva
Copy link
Contributor

nickva commented Aug 3, 2023

Let's try to fix the pending value to actually return something useful: #4715

@glynnbird
Copy link
Contributor Author

glynnbird commented Aug 3, 2023

Thanks for looking @nickva and for looking at the pending value 👍.

The repeated values make it tricky, and impossible in some cases, to traverse the changes feed backwards, but it's a bit of a niche requirement (I've been using CouchDB for a decade and have never hit the changes feed with descending=true).

@rnewson
Copy link
Member

rnewson commented Aug 3, 2023

in theory it should be possible to fix _changes so it works in reverse order, I remember some previous fixes in this space. I suspect it's a bigger fix than this though. Just fixing the pending count but not addressing the encoded sequences doesn't seem useful. It is potentially misleading even.

@nickva
Copy link
Contributor

nickva commented Aug 3, 2023

Just fixing the pending count but not addressing the encoded sequences doesn't seem useful. It is potentially misleading even.

Hmm, I don't see how the pending count fix in #4715 is misleading. It shows the number of changes left to process in the direction of the traversal. It's should be about the same level of usefulness as the forward pending count. Perhaps the pending count overall is not a terribly used thing? But at lest they can now both work in about the same way.

@rnewson
Copy link
Member

rnewson commented Aug 3, 2023

"pending" means "this is how many newer changes there are yet to be received" so it only means pending in one direction

@nickva
Copy link
Contributor

nickva commented Aug 3, 2023

"pending" means "this is how many newer changes there are yet to be received"

Exactly. That's what #4715 fixes it to be.

Our docs state that:

pending (number) – Count of remaining items in the feed

@rnewson
Copy link
Member

rnewson commented Aug 3, 2023

🤷 ok then.

nickva added a commit that referenced this issue Aug 3, 2023
Previously, pending changes count for the `descending=true` didn't indicate the
number of pending changes. Instead, it returned the negative value of the
already emitted rows. So for `descending=true&limit=2`, it would return -2; for
`descending=true&limit=5`, it would return -5. That wasn't a very useful value,
so let's fix it to return the actual number of pending changes.

When traversing the changes feed backwards, the number of pending changes until
the start will be equal the total number of changes minus the number of changes
from the current start sequence until the end.

Issue: #4714
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants