Managing strings is very slow in a large component #6207
That is too expensive for the few attributes we need. Issue #6207
We already have some improvements ready for 4.7.1 which should improve the situation. In our tests it reduced the processing time to 60-70% of the original, but it depends on many aspects.
I tried bleeding just now:
Browser is blocked for all the processing time, so... 5 or 3 or even 1 minute is effectively the same - not usable. It seems to spend most of the time in SQL; I sampled some queries:

```sql
SELECT "trans_variant"."id", "trans_variant"."component_id", "trans_variant"."variant_regex", "trans_variant"."key" FROM "trans_variant" INNER JOIN "trans_variant_defining_units" ON ("trans_variant"."id" = "trans_variant_defining_units"."variant_id") WHERE "trans_variant_defining_units"."unit_id" = 1432484

UPDATE "trans_unit" SET "translation_id" = 64, "id_hash" = 1475286454103327925, "location" = 'data/text/english/dialog/palguard.msg:103, data/text/english/dialog/pcargrd.msg:103', "context" = '', "note" = '', "flags" = '', "source" = 'What a nice day. I hope we don''t have any attacks today.', "previous_source" = '', "target" = 'Hace un día precioso. Espero que hoy no nos ataquen.', "state" = 20, "original_state" = 20, "details" = '{}', "position" = 14051, "num_words" = 12, "priority" = 100, "pending" = false, "timestamp" = '2017-08-28T21:30:59+00:00'::timestamptz, "extra_flags" = '', "explanation" = '', "variant_id" = NULL, "source_unit_id" = 1427001 WHERE "trans_unit"."id" = 469669

UPDATE "trans_unit" SET "translation_id" = 64, "id_hash" = -5770396855701994789, "location" = 'data/text/english/game/combatai.msg:7013', "context" = '', "note" = '', "flags" = '', "source" = 'Let''s flatline you!', "previous_source" = '', "target" = '¡Vamos a ponerte a raya!', "state" = 20, "original_state" = 20, "details" = '{}', "position" = 21114, "num_words" = 3, "priority" = 100, "pending" = false, "timestamp" = '2017-08-28T21:32:51+00:00'::timestamptz, "extra_flags" = '', "explanation" = '', "variant_id" = NULL, "source_unit_id" = 1433689 WHERE "trans_unit"."id" = 476732

SELECT "trans_unit"."id", "trans_unit"."translation_id", "trans_unit"."id_hash", "trans_unit"."location", "trans_unit"."context", "trans_unit"."note", "trans_unit"."flags", "trans_unit"."source", "trans_unit"."previous_source", "trans_unit"."target", "trans_unit"."state", "trans_unit"."original_state", "trans_unit"."details", "trans_unit"."position", "trans_unit"."num_words", "trans_unit"."priority", "trans_unit"."pending", "trans_unit"."timestamp", "trans_unit"."extra_flags", "trans_unit"."explanation", "trans_unit"."variant_id", "trans_unit"."source_unit_id" FROM "trans_unit" WHERE "trans_unit"."id" = 1435741 LIMIT 21

UPDATE "trans_unit" SET "location" = 'dialog/dcjoey.msg:241', "note" = '', "flags" = '', "position" = 5429 WHERE "trans_unit"."id" = 1367982

SELECT "trans_variant"."id", "trans_variant"."component_id", "trans_variant"."variant_regex", "trans_variant"."key" FROM "trans_variant" INNER JOIN "trans_variant_defining_units" ON ("trans_variant"."id" = "trans_variant_defining_units"."variant_id") WHERE "trans_variant_defining_units"."unit_id" = 1374356
```

Could it be that after a string is added or deleted, Weblate goes on to recalculate unit ids, and whatever is based on them, and does that one by one?
Bleeding probably still doesn't have the changes; tomorrow's build should have them.
Built one myself, tried it:
I doubt it's actually bleeding. If you just build https://github.com/WeblateOrg/docker/ you will get what is published as
Yes, I did use the bleeding script.
Added dogslow:
This looks pretty much as expected - Weblate rescans the file after removing the string, and this select is the most expensive part of the update.
So... are you saying that this performance is expected? Or you can't reproduce it? I enabled query logging for Postgres; apparently deleting a string results in ~50000 SELECT + 50000 UPDATE queries.

```shell
$ # before
$ docker logs wl_database_1 2>&1 | grep "LOG: statement" | awk '{print $7}' | sort | uniq -c
      5
    863 BEGIN
    614 CLOSE
    862 COMMIT
    614 DECLARE
     13 DELETE
   1226 FETCH
    861 INSERT
   2306 SELECT
      3 UPDATE
$ # after
$ docker logs wl_database_1 2>&1 | grep "LOG: statement" | awk '{print $7}' | sort | uniq -c
      5
    863 BEGIN
    614 CLOSE
    862 COMMIT
    614 DECLARE
     33 DELETE
   1226 FETCH
    863 INSERT
      1 ROLLBACK
  51667 SELECT
      1 SHOW
  49324 UPDATE
```
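For anyone repeating this measurement without awk, the same tally can be sketched in Python; the sample log lines below are hypothetical, only mimicking the `LOG: statement` format grepped above:

```python
from collections import Counter

# Count statement types (SELECT, UPDATE, ...) from postgres log lines,
# mirroring the grep | awk | sort | uniq -c pipeline. The lines here are
# made-up samples in the same shape as real "LOG: statement" entries.
log_lines = [
    "2021-08-01 LOG: statement: SELECT 1",
    "2021-08-01 LOG: statement: SELECT 2",
    "2021-08-01 LOG: statement: UPDATE t SET x = 1",
    "2021-08-01 some unrelated log line",
]

counts = Counter(
    line.split("statement:", 1)[1].split()[0]
    for line in log_lines
    if "LOG: statement" in line
)
```

On the real log this gives the same per-statement counts as the shell pipeline, just keyed by statement keyword.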
All strings present in the file after the deleted one will be updated, so the number of updates is expected. It's this query:

```sql
UPDATE "trans_unit" SET "location" = 'dialog/dcjoey.msg:241', "note" = '', "flags" = '', "position" = 5429 WHERE "trans_unit"."id" = 1367982
```

There should not be that many select queries, though. Maybe it's the variants causing this, as this query should be prefetched at once and not executed for every unit:

```sql
SELECT "trans_variant"."id", "trans_variant"."component_id", "trans_variant"."variant_regex", "trans_variant"."key" FROM "trans_variant" INNER JOIN "trans_variant_defining_units" ON ("trans_variant"."id" = "trans_variant_defining_units"."variant_id") WHERE "trans_variant_defining_units"."unit_id" = 1374356
```
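The prefetch idea described above can be illustrated with plain SQLite. The tiny `variant`/`defining_units` schema is a hypothetical stand-in for `trans_variant` and `trans_variant_defining_units`, not Weblate's actual code:

```python
import sqlite3

# N+1 pattern vs. a single prefetch: one variant lookup per unit, or one
# query covering all units at once and grouped in Python afterwards.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (id INTEGER PRIMARY KEY, key TEXT)")
conn.execute("CREATE TABLE defining_units (variant_id INTEGER, unit_id INTEGER)")
conn.executemany("INSERT INTO variant VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO defining_units VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 12)])

unit_ids = [10, 11, 12]

# N+1: one query per unit - what the sampled query log suggests happens.
per_unit = {
    uid: conn.execute(
        "SELECT v.id, v.key FROM variant v "
        "JOIN defining_units d ON v.id = d.variant_id "
        "WHERE d.unit_id = ?", (uid,)).fetchall()
    for uid in unit_ids
}

# Prefetch: one query for all units, results grouped by unit afterwards.
prefetched = {}
placeholders = ",".join("?" * len(unit_ids))
for uid, vid, key in conn.execute(
        "SELECT d.unit_id, v.id, v.key FROM variant v "
        "JOIN defining_units d ON v.id = d.variant_id "
        f"WHERE d.unit_id IN ({placeholders})", unit_ids):
    prefetched.setdefault(uid, []).append((vid, key))
```

Both approaches produce identical data; the prefetch just replaces N lookups with one.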
This saves an additional query per created unit, as there cannot be variants for a unit which has just been created. Issue #6207
So, it's like I thought. But adding a variant does not result in "processing po, content changed", so maybe deleting also needn't? I did some more tests:
So, reads and writes are likely doubled somewhere. But even if that is fixed, working with large components is impossible. I guess deletion should be offloaded? In fact, in PO terms strings probably should not be deleted from the file at all, simply marked as "obsolete". Although, in fact, I'm more interested in adding than deleting. The problem is that while adding to small POs also works fine, with large ones I have even less luck: I didn't have the patience to get even one variant added. It's just loading forever.
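For reference, the PO format already supports this: gettext tools keep removed entries in the file as obsolete entries prefixed with `#~` rather than deleting them outright. Reusing a string from the sampled queries above, an obsolete entry looks like:

```po
#~ msgid "Let's flatline you!"
#~ msgstr "¡Vamos a ponerte a raya!"
```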
I see only one SQL query in PG activity, but it's been in idle state for 10 minutes already:
This avoids updating every unit separately in most cases. Issue #6207
This makes it super slow for big translations and brings very little value. Such cases should be caught in the test suite and not at runtime. Issue #6207
There is always the source string which needs to be updated as well...
I've made updating of the unit positions a single query in 4976d8b; that should make the removal the same speed regardless of the actual location of the string in the file.
That should be way faster with 277150d.
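The single-query position shift described above (a sketch of the idea, not the actual change in 4976d8b) can be demonstrated against a toy `unit` table:

```python
import sqlite3

# After deleting a unit, shift all following positions down with ONE
# UPDATE, instead of issuing one UPDATE per following unit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (id INTEGER PRIMARY KEY, position INTEGER)")
conn.executemany("INSERT INTO unit VALUES (?, ?)",
                 [(i, i) for i in range(1, 6)])

deleted_position = 2
conn.execute("DELETE FROM unit WHERE position = ?", (deleted_position,))

# One statement closes the gap for every unit after the deleted one,
# so the cost no longer depends on where in the file the string was.
conn.execute("UPDATE unit SET position = position - 1 WHERE position > ?",
             (deleted_position,))

positions = [row[0] for row in
             conn.execute("SELECT position FROM unit ORDER BY position")]
```

With five units and position 2 deleted, the remaining positions collapse back to a contiguous 1..4.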
I built 277150d
Is the deleting slow when executed after adding a variant, or even without that (for example after another deletion or after importing the translation)? Looking at the code, adding a variant might mess this up...
Also make sure there are no pending changes prior to doing the removal, otherwise you are measuring commit + delete.
Assume the underlying code is reliable and skip the validation at runtime. Issue #6207
It will be added to the end in most cases. Issue #6207
I tried deleting first; it was before adding (cold start).
Also detect whether it is needed on creating units. Issue #6207
I can't reproduce this. Can you please share your component configuration (especially file format, new base, and monolingual base)? Is this a newly imported translation or something migrated from an older Weblate version? Adding a variant should perform better with bc18758.
Still 25k on deletion with bc18758. Weblate upgraded from 3.1.1. PO format, bilingual, no base. Adding triggers only one set of UPDATEs, but still all of them one by one, which feels pretty slow:
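One common way to cut the cost of many one-by-one UPDATEs is to batch them into a single submission. A sketch with SQLite's `executemany` (hypothetical `unit` table; this illustrates reduced per-statement overhead, not Weblate's actual update path):

```python
import sqlite3

# The same logical UPDATEs issued one by one vs. batched in one call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (id INTEGER PRIMARY KEY, position INTEGER)")
conn.executemany("INSERT INTO unit VALUES (?, ?)",
                 [(i, 0) for i in range(1000)])

updates = [(i, i) for i in range(1000)]  # (new_position, id) pairs

# One statement per row - the pattern the observed query log suggests:
for new_position, unit_id in updates:
    conn.execute("UPDATE unit SET position = ? WHERE id = ?",
                 (new_position, unit_id))

# Batched submission of the identical updates in a single call:
conn.executemany("UPDATE unit SET position = ? WHERE id = ?", updates)

final = dict(conn.execute("SELECT id, position FROM unit"))
```

Both paths leave the table in the same state; with a networked database the batched form additionally saves one round trip per row.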
Do you always start from a fresh container? In the past, the position of source strings was not properly tracked, so the first deletion would fill these in, and that could explain the number of queries we see. In case you do the removal repeatedly on the same database, it should not trigger that many updates on subsequent runs...
The same database; only the weblate container is re-created.
Do you have "Template for new translations" configured?
6474d1b should address the 50k update queries. |
It's 25k now, and unfortunately still there for me. More importantly, adding appears to perform worse and worse with each new variant added:
I did several tests with the PO file you've referenced, and I cannot get the 25k queries on removal - it always stays within a sane range, around 50.
Regardless of the number of queries, what times are you getting? cdd78b0 results:
If it takes so long, translators just won't use it. And considering that adding seems to perform worse with each variant... for now there are only 30, but there will be thousands. To be usable, both actions would need to take the same time as saving a translation - under 1 sec, 2 max. Edit: and I take back what I said earlier about not worrying much about deleting. Both adding and deleting are important.
Thank you for your report; the issue you have reported has just been fixed.
Describe the issue
Using the new "Manage strings" permission, I delete strings from components. Small ones work reasonably well; large ones take an unreasonably long time.
Adding is even worse... uwsgi has been crunching something for 15 minutes already, with no result.
My POs are a few MBs.
I already tried
Nothing comes to mind, really.
I searched for a large component with "manage strings" enabled on Hosted to compare and see if it's the same, but didn't find any.
To Reproduce the issue
Delete/add a string from/to a large PO component.
Expected behavior
String deleted/added quickly.
Server configuration and status
Using Docker 4.7-1 with a pretty much default config.
Weblate deploy checks