
Smoosh never triggered with conflicted documents #3410

@wknd

Auto compaction is never triggered in certain situations, causing unusually large disk usage.

Description

When updating documents, smoosh triggers at predictable intervals and cleans up the old files.
However, when the new data comes in via replication (PouchDB), compaction does not seem to trigger, or at least not in all cases.
Manually triggering compaction with curl does work, see the example below.
This happened on a production server, so the exact cause is a bit unclear to me, but I was able to reproduce it on a fresh Docker container using curl.
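
For reference, this is the manual compaction that works for me (a minimal sketch, using the test database and the curl config from the reproduction steps below):

# manually compact the database; needs admin rights and the JSON Content-Type header from the curl config
curl -s -X POST http://localhost:5984/testtest/_compact -K curl-proxyauth.conf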

Steps to Reproduce

Set up a fresh install of CouchDB, set an admin password and create a user database; whatever it is you usually do.
Create a database and add a document to it; 'testtest/8029e17efe934779a44c54c7050006ec' will be used in my examples.
In the document I set a random property with an extremely large string, to be able to trigger compaction faster. A sketch of this setup is shown below.
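
For example, something like this sets up the test data (a sketch; the 100000-character filler size is arbitrary, and extraTestData is simply the property name my scripts below read back):

# create the test database (needs the admin session / curl config described below)
curl -s -X PUT http://localhost:5984/testtest -K curl-proxyauth.conf
# seed the document with a large filler property so every update wastes noticeable space
bigstring=$(head -c 100000 /dev/zero | tr '\0' 'x')
curl -s -X PUT http://localhost:5984/testtest/8029e17efe934779a44c54c7050006ec \
  -K curl-proxyauth.conf -d "{\"extraTestData\": \"$bigstring\"}"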

I use this curl config (saved as curl-proxyauth.conf and passed with -K in the commands below) to make things a bit easier:

compressed
cookie = "cookie_couch.txt"
cookie-jar = "cookie_couch.txt"
header = "Content-Type: application/json"
header = "Accept: application/json"
write-out = "\n"
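
With that saved as curl-proxyauth.conf, every request below only needs the URL and -K, for example:

curl -s -K curl-proxyauth.conf http://localhost:5984/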

Log in to your CouchDB server:

curl -X POST http://localhost:5984/_session -K curl-proxyauth.conf \
-d '{"name": "admin", "password": "adminPassword"}'

Proof smoosh triggers when expected:

This script will GET the related document, then PUT the same one back.
Each iteration prints sizes.file, sizes.active and their ratio.
It also prints two booleans that represent when the default settings should trigger compaction (ratio above 2, or file minus active of at least 16777216 bytes).

db="testtest";
doc="8029e17efe934779a44c54c7050006ec";
while true; do
  curl -s -X GET http://localhost:5984/$db/$doc -K curl-proxyauth.conf | curl -s -X PUT http://localhost:5984/$db/$doc -K curl-proxyauth.conf -d @- > /dev/null;
  str=`curl -s -X GET http://localhost:5984/testtest -K curl-proxyauth.conf | jq '[.sizes.file, .sizes.active, .sizes.file/.sizes.active, .sizes.file/.sizes.active > 2, .sizes.file - .sizes.active >= 16777216]'`
  echo $str
  if [[ $str =~ "true" ]]; then
    break
  fi
done;
# ask and print once more
curl -s -X GET http://localhost:5984/$db -K curl-proxyauth.conf | jq '.sizes';

You should see sizes.file increase constantly until the script stops, after which auto compaction will trigger (or, because of rounding, it could be exactly one write later).
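
Once it triggers, the running compaction should also show up in the active tasks (as far as I can tell, with type database_compaction):

# list running database compactions (admin only)
curl -s -X GET http://localhost:5984/_active_tasks -K curl-proxyauth.conf | jq '[.[] | select(.type == "database_compaction")]'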

Attempting to create document conflicts, which never triggers smoosh:

I wasn't exactly sure how to intentionally create conflicts, but this seemed to at the very least demonstrate the problem.
It is similar to the above approach, except we're not going to stop after reaching the threshold; each write pushes a revision with new_edits=false, the way replication would.

db="testtest";
doc="8029e17efe934779a44c54c7050006ec";
originalrev=`curl -s -X GET http://localhost:5984/$db/$doc -K curl-proxyauth.conf | jq -r '._rev'`;
echo "creating branch on $originalrev";
# create update on correct branch
curl -s -X GET http://localhost:5984/$db/$doc -K curl-proxyauth.conf | curl -s -X PUT http://localhost:5984/$db/$doc -K curl-proxyauth.conf -d @- > /dev/null;
while true; do

  curl -s -X GET http://localhost:5984/$db/$doc -K curl-proxyauth.conf | jq -r --arg REV $originalrev '{ _id: ._id, extraTestData: .extraTestData, _rev: $REV}' | curl -s -X PUT http://localhost:5984/$db/$doc?new_edits=false -K curl-proxyauth.conf -d @- > /dev/null
  str=`curl -s -X GET http://localhost:5984/testtest -K curl-proxyauth.conf | jq '[.sizes.file, .sizes.active, .sizes.file/.sizes.active, .sizes.file/.sizes.active > 2, .sizes.file - .sizes.active >= 16777216]'`
  echo $str
  #if [[ $str =~ "true" ]]; then
  #  break
  #fi
done;
# ask and print once more
curl -s -X GET http://localhost:5984/$db -K curl-proxyauth.conf | jq '.sizes';

(If you do want to stop automatically at the right time, uncomment the if statement.)
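
To confirm the document actually ended up with conflicting revisions, you can ask for them explicitly (the _conflicts field only shows up when losing revisions exist):

# list conflicting revisions of the test document, if any
curl -s -X GET "http://localhost:5984/$db/$doc?conflicts=true" -K curl-proxyauth.conf | jq '._conflicts'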

Expected Behaviour

I expect compaction to start after the default ratio of 2 is reached. It doesn't.
It will trigger if you run the first script again for a while, or you can compact the data manually.
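
For context, the "default ratio of 2" is, as far as I understand the defaults, the ratio_dbs smoosh channel, i.e. roughly this in the configuration:

[smoosh.ratio_dbs]
min_priority = 2.0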

Your Environment

  • CouchDB version used: 3.1.1
  • Browser name and version: curl
  • Operating system and version: Ubuntu 20.04

Additional Context

In the real world, this was happening with just a few clients (that may be running an old version of our software and aren't updating..) constantly spamming the server with invalid or conflicted documents.

We'd get an alert about excessive disk usage. I'd identify the evil user and compact his database.

This temporarily resolves the server issue, but it'd be better if auto compaction did it for me.
(And even better if those clients answered their emails and followed the provided steps to stop sending the evil data in the first place.. but that's out of my control.)
