Handle invalid UTF-8 byte sequences gracefully by replacing them with 0xFFFD

CouchDB's Erlang JSON parser allows invalid UTF-8 byte sequences to be stored. The Query Server inside CouchDB fails when it encounters these byte sequences, and the view process fails for the current batch of document updates. The result is that the view remains broken.

Until now, the only remedy has been to remove the offending document, but finding it is hard: `log()` inside the Query Server dies on the invalid byte sequence as well, because our protocol is synchronous and map results and the `log()` messages generated alongside them are submitted together.

This patch replaces invalid bytes with the Unicode replacement character 0xFFFD.

Closes COUCHDB-1425.

Patch by Sam Rijs <firstname.lastname@example.org> and Paul Davis.

Eventually, this should be fixed at the HTTP level, so that no documents with invalid byte sequences can be written to CouchDB. The jiffy encoder we'll get with BigCouch will do that for us. This is a fix for the releases until then.
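
As an illustration of the replacement strategy described above (not the patch itself, which lives in CouchDB's own code), here is a minimal Python sketch. The function name `replace_invalid_utf8` is hypothetical; it relies only on the standard `bytes.decode` with `errors="replace"`, which substitutes U+FFFD for bytes that cannot be decoded as UTF-8.

```python
def replace_invalid_utf8(raw: bytes) -> str:
    """Decode `raw` as UTF-8, substituting U+FFFD for undecodable bytes.

    Hypothetical helper for illustration only; it is not part of CouchDB.
    """
    return raw.decode("utf-8", errors="replace")


# A well-formed UTF-8 string passes through unchanged.
valid = "café".encode("utf-8")

# 0xE9 is Latin-1 'é'; on its own it is not a valid UTF-8 sequence.
broken = b"caf\xe9"

clean_valid = replace_invalid_utf8(valid)
clean_broken = replace_invalid_utf8(broken)
```

With this approach the consumer always receives a decodable string, at the cost of losing the original invalid bytes, which matches the trade-off the commit message describes.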
Showing with 17 additions and 13 deletions.