-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Improve escapeLiteral performance of long strings by 30x #3553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I don't understand the error the linter is returning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m against the complexity of having a second path for short strings. (I don’t even think this function should have a second path for strings without backslashes, but that’d have to be a pg 9 change.) There are probably cleaner optimization options, since IIRC the original implementation here was just ported from C.
Also not sure how much performance matters past a certain point: as mentioned in #3194, people should pretty much always be using parameters instead of escapeLiteral
anyway.
The linter is complaining about trailing spaces.
|
Yeah, then I can go for the first one, which is way less complex.
The problem is that the performance of parameters is very low. I have a 4x penalty with 5k strings being 100k characters long each one, so if we can help the users to improve performance by just writing optimized code, I think that it should be the way to go. They still have the option to go the low performant way in this case with parameters if they want, but if they need performance (which is pretty usual when you are moving hundreds of thousands of characters), they can use the included utils in the library. If we keep This function is actually the one I use because I need that performance, and I'm not alone as there's another ticket with that question. And also there are going to be users that just struggle with the performance because there's nothing better but don't open a issue: "this insert query lasts 5 seconds. It is what it is. Next ticket". So I just decided to publish it to help users gain this performance they didn't knew they were losing.
Ok, problems of the GitHub editor. I've done it online directly. Going to find the trailing spaces. EDIT: Can't find the trailing spaces in the editor. If you like this, I'll clone and |
How are you using parameters? Do you have a benchmark you can share for this too? It’s possible that a parameter-based approach is inherently slower with PostgreSQL, but it’s also possible that pg is just missing several optimizations that we could implement1, with a much cleaner result than serializing data into code – and/or that the way you’re using parameters is suboptimal.
I know it feels that way, but consider that the library’s const escapeLiteralStandard = str =>
`'${str.replaceAll("'", "''")}'`; or const escapeLiteralStandard = str => {
if (typeof str !== 'string') {
throw new TypeError('string expected');
}
let escaped = "'";
let start = 0;
for (let i; (i = str.indexOf("'", start)) !== -1;) {
escaped += str.substring(start, i);
escaped += "''";
start = i + 1;
}
escaped += str.substring(start);
escaped += "'";
return escaped;
}; If being one configurable setting away from SQL injection makes you uneasy, I’d still opt for consistently returning C-escape string literals: const escapeLiteralPg = str => {
if (typeof str !== 'string') {
throw new TypeError('string expected');
}
let escaped = "E'";
let start = 0;
const re = /[\\']/g;
for (let m; (m = re.exec(str)) !== null;) {
escaped += str.substring(start, m.index);
escaped += '\\';
escaped += m[0];
start = m.index + 1;
}
escaped += str.substring(start);
escaped += "'";
return escaped;
}; or const escapeLiteralPg = str => {
if (typeof str !== 'string') {
throw new TypeError('string expected');
}
let escaped = "E'";
let start = 0;
for (;;) {
const i = str.indexOf("'", start);
const sub = i === -1 ? str : str.substring(0, i);
for (let j; (j = sub.indexOf('\\', start)) !== -1;) {
escaped += str.substring(start, j);
escaped += '\\\\';
start = j + 1;
}
if (i === -1) {
break;
}
escaped += str.substring(start, i);
escaped += "\\'";
start = i + 1;
}
escaped += str.substring(start);
escaped += "'";
return escaped;
}; Footnotes
|
I noticed that on long strings,
escapeLiteral
goes really slow. I created my ownescapeLiteral
and I decided to push it. I saw this issue when searching: #3194On short strings, the performance loss is very low compared to the benefit on long strings. On short strings the original is only 1.17x faster, but on long strings the new one is 30x (!!) faster, so at the moment that the strings are a bit longer than 10 characters, the performance profit is going to kick in, and the performance loss on short strings is negligible.
Anyway, with this information I decided to add a quick check: If the string is shorter than 13 characters, use the original one, and if it is longer, use the optimized one. It has the performance profit in both situations:

EDIT: Had a typo in the benchmark, it is only 30x faster, not 55x.
Here the benchmark code: