Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow automatic translation from other components #9118

Closed
2 tasks done
tomkolp opened this issue Apr 19, 2023 · 27 comments · Fixed by #9267
Closed
2 tasks done

Slow automatic translation from other components #9118

tomkolp opened this issue Apr 19, 2023 · 27 comments · Fixed by #9267
Assignees
Labels
bug Something is broken.
Milestone

Comments

@tomkolp
Copy link
Contributor

tomkolp commented Apr 19, 2023

Describe the issue

After using the automatic translation of a component with "other components", the translation never ends.
The console always shows:

weblate_1       | celery-translate stderr | [2023-04-19 09:29:47,038: DEBUG/422] Refreshing Lock('lock:component-update-lock-4787').
weblate_1       | celery-translate stderr | [2023-04-19 09:29:47,038: DEBUG/ForkPoolWorker-1] Refreshing Lock('lock:component-update-lock-4787').
weblate_1       | celery-translate stderr | [2023-04-19 10:06:18,320: DEBUG/426] Refreshing Lock('lock:component-update-lock-2550').
weblate_1       | celery-translate stderr | [2023-04-19 10:06:18,320: DEBUG/ForkPoolWorker-2] Refreshing Lock('lock:component-update-lock-2550').
weblate_1       | celery-translate stderr | [2023-04-19 10:06:27,128: DEBUG/422] Refreshing Lock('lock:component-update-lock-4787').
weblate_1       | celery-translate stderr | [2023-04-19 10:06:27,128: DEBUG/ForkPoolWorker-1] Refreshing Lock('lock:component-update-lock-4787').
weblate_1       | celery-celery stderr | [2023-04-19 10:06:44,291: INFO/MainProcess] Task weblate.utils.tasks.heartbeat[2b0af915-e0e8-4849-b386-656b577b6725] received
weblate_1       | nginx stdout | 127.0.0.1 - - [19/Apr/2023:10:06:45 +0200] "GET /healthz/ HTTP/1.1" 200 12 "-" "curl/7.74.0"
weblate_1       | celery-translate stderr | [2023-04-19 10:06:58,322: DEBUG/426] Refreshing Lock('lock:component-update-lock-2550').
weblate_1       | celery-translate stderr | [2023-04-19 10:06:58,322: DEBUG/ForkPoolWorker-2] Refreshing Lock('lock:component-update-lock-2550').
weblate_1       | celery-translate stderr | [2023-04-19 10:07:07,130: DEBUG/422] Refreshing Lock('lock:component-update-lock-4787').
weblate_1       | celery-translate stderr | [2023-04-19 10:07:07,130: DEBUG/ForkPoolWorker-1] Refreshing Lock('lock:component-update-lock-4787').

One of the automatic translation tasks was triggered manually via the web interface. The second task comes from the automatic translation add-on.
Automatic translation does not work completely.

I already tried

  • I've read and searched the documentation.
  • I've searched for similar issues in this repository.

Steps to reproduce the behavior

No response

Expected behavior

No response

Screenshots

No response

Exception traceback

No response

How do you run Weblate?

Docker container

Weblate versions

  • Weblate: 4.18-e9fceeacde
  • Django: 4.2
  • siphashc: 2.1
  • translate-toolkit: 3.8.6
  • lxml: 4.9.2
  • Pillow: 9.5.0
  • nh3: 0.2.9
  • python-dateutil: 2.8.2
  • social-auth-core: 4.4.1
  • social-auth-app-django: 5.2.0
  • django-crispy-forms: 2.0
  • oauthlib: 3.2.2
  • django-compressor: 4.3.1
  • djangorestframework: 3.14.0
  • django-filter: 23.1
  • django-appconf: 1.0.5
  • user-agents: 2.2.0
  • filelock: 3.12.0
  • rapidfuzz: 3.0.0
  • openpyxl: 3.1.2
  • celery: 5.2.7
  • django-celery-beat: 2.5.0
  • kombu: 5.2.4
  • translation-finder: 2.15
  • weblate-language-data: 2023.4
  • html2text: 2020.1.16
  • pycairo: 1.23.0
  • pygobject: 3.44.1
  • diff-match-patch: 20200713
  • requests: 2.28.2
  • django-redis: 5.2.0
  • hiredis: 2.2.2
  • sentry_sdk: 1.19.1
  • Cython: 0.29.34
  • misaka: 2.1.1
  • GitPython: 3.1.31
  • borgbackup: 1.2.4
  • pyparsing: 3.0.9
  • pyahocorasick: 2.0.0
  • python-redis-lock: 4.0.0
  • charset-normalizer: 3.1.0
  • Python: 3.11.3
  • Git: 2.30.2
  • psycopg2: 2.9.6
  • psycopg2-binary: 2.9.6
  • phply: 1.2.6
  • ruamel.yaml: 0.17.21
  • tesserocr: 2.6.0
  • boto3: 1.26.115
  • zeep: 4.2.1
  • aeidon: 1.12
  • iniparse: 0.5
  • mysqlclient: 2.1.1
  • Mercurial: 6.4.2
  • git-svn: 2.30.2
  • git-review: 2.3.1
  • Redis server: 7.0.11
  • PostgreSQL server: 15.2
  • Database backends: django.db.backends.postgresql
  • Cache backends: default:RedisCache, avatar:FileBasedCache
  • Email setup: django.core.mail.backends.smtp.EmailBackend:
  • OS encoding: filesystem=utf-8, default=utf-8
  • Celery: redis://cache:6379/1, redis://cache:6379/1, regular
  • Platform: Linux 5.15.0-70-generic (x86_64)

Weblate deploy checks

System check identified no issues (1 silenced).

Additional context

very slow database process:

database_1      | 2023-04-19 08:10:56.024 UTC [22] LOG:  checkpoint starting: time
database_1      | 2023-04-19 08:10:57.326 UTC [22] LOG:  checkpoint complete: wrote 13 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.275 s, sync=0.014 s, total=1.302 s; sync files=13, longest=0.008 s, average=0.002 s; distance=56 kB, estimate=20112 kB
weblate=# SELECT * FROM pg_stat_activity WHERE state <> 'idle';
 datid | datname |  pid  | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type | wait_event | state  | backend_xid | backend_xmin | query_id |                                                                                                                                                                                                                                                                                                                  query                                                                                                                                                                                                                                                                                                                  |  backend_type
-------+---------+-------+------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------
 18558 | weblate |  1292 |            |       10 | weblate |                  | 172.18.0.5  |                 |       35028 | 2023-04-19 07:38:18.202637+00 | 2023-04-19 07:38:18.259572+00 | 2023-04-19 07:38:18.27536+00  | 2023-04-19 07:38:18.275368+00 |                 |            | active |             |      1136538 |          | SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_translation"."component_id" = 2550 AND "trans_translation"."plural_id" = 401 AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0)))                                                                                                  | client backend
 18558 | weblate | 22623 |            |       10 | weblate | psql             |             |                 |          -1 | 2023-04-19 08:37:47.484502+00 | 2023-04-19 08:38:26.597332+00 | 2023-04-19 08:38:26.597332+00 | 2023-04-19 08:38:26.597336+00 |                 |            | active |             |      1147954 |          | SELECT * FROM pg_stat_activity WHERE state <> 'idle';                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | client backend
 18558 | weblate |    73 |            |       10 | weblate |                  | 172.18.0.5  |                 |       52322 | 2023-04-19 07:17:46.942815+00 | 2023-04-19 07:17:47.042901+00 | 2023-04-19 07:17:47.05182+00  | 2023-04-19 07:17:47.051825+00 |                 |            | active |             |      1136531 |          | SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_component"."project_id" = 5 AND "trans_component"."source_language_id" = 131 AND "trans_translation"."plural_id" = 401 AND NOT ("trans_unit"."translation_id" = 9651) AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 WHERE (U0."translation_id" = 9651 AND U0."state" = 0))) | client backend
(3 rows)
weblate=# SELECT * FROM pg_stat_activity;
 datid | datname |  pid  | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin | query_id |                                                                                                                                                                                                                                                                                                                  query                                                                                                                                                                                                                                                                                                                  |         backend_type
-------+---------+-------+------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------
       |         |    26 |            |          |         |                  |             |                 |             | 2023-04-19 07:15:55.599719+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | autovacuum launcher
       |         |    27 |            |       10 | weblate |                  |             |                 |             | 2023-04-19 07:15:55.600019+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | logical replication launcher
 18558 | weblate | 22739 |            |       10 | weblate |                  | 172.18.0.5  |                 |       49838 | 2023-04-19 08:43:55.050462+00 |                               | 2023-04-19 08:43:55.071166+00 | 2023-04-19 08:43:55.073476+00 | Client          | ClientRead          | idle   |             |              |          | SELECT "django_celery_beat_periodictasks"."ident", "django_celery_beat_periodictasks"."last_update" FROM "django_celery_beat_periodictasks" WHERE "django_celery_beat_periodictasks"."ident" = 1 LIMIT 21                                                                                                                                                                                                                                                                                                                                                                                                                               | client backend
 18558 | weblate |  1292 |            |       10 | weblate |                  | 172.18.0.5  |                 |       35028 | 2023-04-19 07:38:18.202637+00 | 2023-04-19 07:38:18.259572+00 | 2023-04-19 07:38:18.27536+00  | 2023-04-19 07:38:18.275368+00 |                 |                     | active |             |      1136538 |          | SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_translation"."component_id" = 2550 AND "trans_translation"."plural_id" = 401 AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0)))                                                                                                  | client backend
 18558 | weblate | 22623 |            |       10 | weblate | psql             |             |                 |          -1 | 2023-04-19 08:37:47.484502+00 | 2023-04-19 08:43:55.567296+00 | 2023-04-19 08:43:55.567296+00 | 2023-04-19 08:43:55.5673+00   |                 |                     | active |             |      1147956 |          | SELECT * FROM pg_stat_activity;                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | client backend
 18558 | weblate |    73 |            |       10 | weblate |                  | 172.18.0.5  |                 |       52322 | 2023-04-19 07:17:46.942815+00 | 2023-04-19 07:17:47.042901+00 | 2023-04-19 07:17:47.05182+00  | 2023-04-19 07:17:47.051825+00 |                 |                     | active |             |      1136531 |          | SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_component"."project_id" = 5 AND "trans_component"."source_language_id" = 131 AND "trans_translation"."plural_id" = 401 AND NOT ("trans_unit"."translation_id" = 9651) AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 WHERE (U0."translation_id" = 9651 AND U0."state" = 0))) | client backend
       |         |    23 |            |          |         |                  |             |                 |             | 2023-04-19 07:15:55.58512+00  |                               |                               |                               | Activity        | BgWriterHibernate   |        |             |              |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | background writer
       |         |    22 |            |          |         |                  |             |                 |             | 2023-04-19 07:15:55.584846+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | checkpointer
       |         |    25 |            |          |         |                  |             |                 |             | 2023-04-19 07:15:55.599472+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | walwriter
(9 rows)
@nijel
Copy link
Member

nijel commented Apr 19, 2023

This query is fetching the translations: SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_translation"."component_id" = 2550 AND "trans_translation"."plural_id" = 401 AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0))). So updating the lock while the query is running is okay (the difference to previous versions is that log updates are logged).

Can you try running EXPLAIN on the query? (EXPLAIN ANALYZE would be useful as well, but that will run the same long as the actual query).

What work_mem do you have configured for PostgreSQL?

@nijel nijel added the question This is more a question for the support than an issue. label Apr 19, 2023
@github-actions
Copy link

This issue looks more like a support question than an issue. We strive to answer these reasonably fast, but purchasing the support subscription is not only more responsible and faster for your business but also makes Weblate stronger.

In case your question is already answered, making a donation is the right way to say thank you!

@tomkolp
Copy link
Contributor Author

tomkolp commented Apr 19, 2023

This query is fetching the translations: SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_translation"."component_id" = 2550 AND "trans_translation"."plural_id" = 401 AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0))). So updating the lock while the query is running is okay (the difference to previous versions is that log updates are logged).

Can you try running EXPLAIN on the query? (EXPLAIN ANALYZE would be useful as well, but that will run the same long as the actual query).

What work_mem do you have configured for PostgreSQL?

work_mem=16MB

@tomkolp
Copy link
Contributor Author

tomkolp commented Apr 19, 2023

I understand that I have to wait for some result?
EXPLAIN ANALYZE SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_translation"."component_id" = 2550 AND "trans_translation"."plural_id" = 401 AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0)));

CONTAINER ID   NAME                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
52da6cb0fd17   weblate-docker_weblate_1        1.21%     3.053GiB / 39.16GiB   7.80%     1.15GB / 235MB    129MB / 142MB    80
1d245341a3b9   weblate-docker_cache_1          0.54%     47.24MiB / 39.16GiB   0.12%     97.2MB / 518MB    0B / 1.22GB      7
d9686a64d916   weblate-docker_database_1       300.53%   3.343GiB / 39.16GiB   8.54%     113MB / 632MB     5.24GB / 394MB   12
96d7495b7a35   weblate-docker_https-portal_1   0.00%     5.926MiB / 39.16GiB   0.01%     25.4MB / 9.42MB   115kB / 360kB    14

Edit:
After 140 minutes, the translation of the file that was triggered from the interface ended.

Refreshing Lock for this file still appears. Does turning on automatic translation through the web interface, closing the window (no response of the progress bar), turning it on again for this file, create several translation queues?
EDIT2:

docker-compose exec --user weblate weblate supervisorctl stop celery-translate
docker-compose exec --user weblate weblate supervisorctl start celery-translate

What helped temporarily:

  1. I stopped the translation queue
  2. I started the translation queue
  3. I stopped all containers
  4. I started all containers

The translation queue has cleared. I tried to translate the file through the web interface. Now it took 30 seconds to translate 90,000 strings. Previously, this file was queued for 240 minutes.

I took away everyone's permission to auto-translate for the duration of the test. I am waiting for the auto translation add-on to run on schedule.

EDIT3:
Simply stopping celer and restarting does not clear the queue. How to clear the automatic translation queue?

@tomkolp
Copy link
Contributor Author

tomkolp commented Apr 21, 2023

Unfortunately, once automatic translation works in a few seconds, and sometimes it takes several hours. In tests, it doesn't matter if I use all components or just one component as the source. Today there are no celer queues, I turned on the translations of one component, I entered the same component as the source. Again, the translation takes several hours and the progress bar has not moved. What else can I check? I first reported the problem on Feb 23 #8832 and it's been there ever since.

@nijel
Copy link
Member

nijel commented Apr 24, 2023

Try increasing work_mem to 32MB, but 16MB should be IMHO enough.

As for the SQL do just explain, that will not actually execute the query and show what the scheduler would do:

EXPLAIN SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") WHERE ("trans_unit"."state" >= 20 AND "trans_translation"."component_id" = 2550 AND "trans_translation"."plural_id" = 401 AND "trans_unit"."source" IN (SELECT U0."source" FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0))); 

What would be useful as well is to know how many strings we're hitting in this query:

SELECT COUNT("id") FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);

@tomkolp
Copy link
Contributor Author

tomkolp commented Apr 25, 2023

I use sentry to measure performance. I ran a test automatic translation on a component with 100 segments.
The translation took 1.36m. For example, before February 2023, the same operation lasted about 5-10s.
The problem seems to be update_checks.
I will test the components with a lot more strings and give the results.
image

@tomkolp
Copy link
Contributor Author

tomkolp commented Apr 25, 2023

Screenshot_20230425_201915_Edge
‐---------------------------------------------------------
Screenshot_20230425_203556_Edge

@tomkolp
Copy link
Contributor Author

tomkolp commented Apr 26, 2023

I found the reason why automatic translation stops working.

When there is no automatic addition of suggestions in the background, automatic translation works.
I tested 20 components:

  1. I chose automatic translation
  2. For untranslated strings (about 20,000 in each component)
  3. I selected all other components as the source
    All 20 components were successfully translated, and the translation time for each of them did not exceed 5 seconds.

Then I turned on translation which adds suggestions for untranslated strings (about 4000 strings). Adding suggestions has been going on for over an hour now.

Then I used the automatic translation again as before (points 1,2,3). The automatic translation of the component (4,000 strings) took 30 minutes.

I've had this problem since February. Earlier, even when adding suggestions, I had no problems with automatic translation.

@nijel
Copy link
Member

nijel commented Apr 26, 2023

Can you please try looking at the query I mentioned before ():

SELECT COUNT("id") FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);

How many rows does it hit? How slow it is?

EXPLAIN ANALYZE SELECT COUNT("id") FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);

It's a subquery of the main query seen above.

@tomkolp
Copy link
Contributor Author

tomkolp commented May 5, 2023

The next automatic translations will be when new components appear, then I will test these database queries.

I've been getting new errors for a few days, I don't know if they are related to this problem? If not, I can open a separate ticket if it's something serious. This is the first time I see this kind of error.

https://eso-spolszczenie.sentry.io/share/issue/31cf28698d534edc9f10d15e23ae8d31/

and

https://eso-spolszczenie.sentry.io/share/issue/30d82b5b7d7e429b8348a82a52c48eb4/

@tomkolp
Copy link
Contributor Author

tomkolp commented May 8, 2023

Can you please try looking at the query I mentioned before ():

SELECT COUNT("id") FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);

How many rows does it hit? How slow it is?

EXPLAIN ANALYZE SELECT COUNT("id") FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);

It's a subquery of the main query seen above.

I couldn't check this query specifically because I have an error:
EXPLAIN ANALYZE SELECT COUNT("id") FROM "trans_unit" U0 LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id") WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);
@nijel
I tested this query:

weblate=# EXPLAIN (ANALYZE, BUFFERS, SETTINGS, WAL)
weblate-# SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target"
weblate-# FROM "trans_unit"
weblate-# INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id")
weblate-# INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id")
weblate-# WHERE ("trans_unit"."state" >= 20
weblate(# AND "trans_component"."project_id" = 5
weblate(# AND "trans_component"."source_language_id" = 131
weblate(# AND "trans_translation"."plural_id" = 401
weblate(# AND NOT ("trans_unit"."translation_id" = 5427)
weblate(# AND "trans_unit"."source" IN (
weblate(#     SELECT U0."source"
weblate(#     FROM "trans_unit" U0
weblate(#     LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
weblate(#     WHERE (U0."translation_id" = 5350 AND U2."id" IS NULL AND U0."state" = 0)
weblate(# ));



                                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=106389.99..107052.50 rows=196 width=110) (actual time=67858.005..18949744.960 rows=322 loops=1)
   Buffers: shared hit=583403950
   ->  Nested Loop  (cost=106389.71..106565.68 rows=198 width=114) (actual time=67857.954..18949741.635 rows=327 loops=1)
         Buffers: shared hit=583402969
         ->  Nested Loop  (cost=106389.42..106445.53 rows=397 width=114) (actual time=1083.759..18948668.221 rows=62466 loops=1)
               Buffers: shared hit=583215571
               ->  HashAggregate  (cost=106374.78..106374.79 rows=1 width=57) (actual time=183.456..339.424 rows=22847 loops=1)
                     Group Key: u0.source
                     Batches: 1  Memory Usage: 9001kB
                     Buffers: shared hit=15545
                     ->  Hash Left Join  (cost=701.96..106374.77 rows=1 width=57) (actual time=67.917..159.512 rows=23404 loops=1)
                           Hash Cond: (u0.id = u2.unit_id)
                           Filter: (u2.id IS NULL)
                           Buffers: shared hit=15545
                           ->  Bitmap Heap Scan on trans_unit u0  (cost=689.04..106337.91 rows=6384 width=61) (actual time=67.713..153.149 rows=23404 loops=1)
                                 Recheck Cond: (translation_id = 5350)
                                 Filter: (state = 0)
                                 Rows Removed by Filter: 23813
                                 Heap Blocks: exact=15337
                                 Buffers: shared hit=15538
                                 ->  Bitmap Index Scan on trans_unit_translation_id_513bb910  (cost=0.00..687.44 rows=47868 width=0) (actual time=5.754..5.754 rows=47876
                                       Index Cond: (translation_id = 5350)
                                       Buffers: shared hit=48
                           ->  Hash  (cost=9.63..9.63 rows=263 width=8) (actual time=0.171..0.173 rows=263 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 19kB
                                 Buffers: shared hit=7
                                 ->  Seq Scan on trans_suggestion u2  (cost=0.00..9.63 rows=263 width=8) (actual time=0.023..0.096 rows=263 loops=1)
                                       Buffers: shared hit=7
               ->  Bitmap Heap Scan on trans_unit  (cost=14.65..70.58 rows=16 width=114) (actual time=829.211..829.337 rows=3 loops=22847)
                     Recheck Cond: (source = u0.source)
                     Rows Removed by Index Recheck: 416
                     Filter: ((state >= 20) AND (translation_id <> 5427))
                     Rows Removed by Filter: 3
                     Heap Blocks: exact=916068 lossy=140000
                     Buffers: shared hit=583200026
                     ->  Bitmap Index Scan on unit_source_fulltext  (cost=0.00..14.64 rows=18 width=0) (actual time=827.684..827.684 rows=286 loops=22847)
                           Index Cond: (source = u0.source)
                           Buffers: shared hit=582098223
         ->  Index Scan using trans_translation_pkey on trans_translation  (cost=0.28..0.30 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=62466)
               Index Cond: (id = trans_unit.translation_id)
               Filter: (plural_id = 401)
               Rows Removed by Filter: 1
               Buffers: shared hit=187398
   ->  Index Scan using trans_component_pkey on trans_component  (cost=0.28..2.46 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=327)
         Index Cond: (id = trans_translation.component_id)
         Filter: ((project_id = 5) AND (source_language_id = 131))
         Rows Removed by Filter: 0
         Buffers: shared hit=981
 Settings: effective_cache_size = '30GB', effective_io_concurrency = '2', max_parallel_workers = '30', max_parallel_workers_per_gather = '15', work_mem = '8738kB'
 Planning:
   Buffers: shared hit=957
 Planning Time: 18.808 ms
 JIT:
   Functions: 41
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 5.742 ms, Inlining 0.000 ms, Optimization 2.328 ms, Emission 56.711 ms, Total 64.782 ms
 Execution Time: 18949782.708 ms

@tomkolp
Copy link
Contributor Author

tomkolp commented May 8, 2023


weblate=# EXPLAIN ANALYZE
weblate-# SELECT COUNT("id")
weblate-# FROM "trans_unit" U0
weblate-# LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
weblate-# WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);
ERROR:  column reference "id" is ambiguous
LINE 2: SELECT COUNT("id")

I changed:
SELECT COUNT("id")
on
SELECT COUNT(U0."id")

EXPLAIN ANALYZE
SELECT COUNT(U0."id")
FROM "trans_unit" U0
LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);

and I have this result:

weblate=# EXPLAIN ANALYZE
weblate-# SELECT COUNT(U0."id")
weblate-# FROM "trans_unit" U0
weblate-# LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
weblate-# WHERE (U0."translation_id" = 5427 AND U2."id" IS NULL AND U0."state" = 0);
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=89457.65..89457.66 rows=1 width=8) (actual time=40.987..40.992 rows=1 loops=1)
   ->  Hash Left Join  (cost=413.86..89457.65 rows=1 width=4) (actual time=10.452..39.877 rows=15507 loops=1)
         Hash Cond: (u0.id = u2.unit_id)
         Filter: (u2.id IS NULL)
         ->  Bitmap Heap Scan on trans_unit u0  (cost=400.95..89426.34 rows=4903 width=4) (actual time=10.300..36.916 rows=15507 loops=1)
               Recheck Cond: (translation_id = 5427)
               Filter: (state = 0)
               Rows Removed by Filter: 21587
               Heap Blocks: exact=7045
               ->  Bitmap Index Scan on unit_source_fulltext  (cost=0.00..399.72 rows=36763 width=0) (actual time=8.912..8.913 rows=41895 loops=1)
                     Index Cond: (translation_id = 5427)
         ->  Hash  (cost=9.63..9.63 rows=263 width=8) (actual time=0.136..0.138 rows=263 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 19kB
               ->  Seq Scan on trans_suggestion u2  (cost=0.00..9.63 rows=263 width=8) (actual time=0.010..0.071 rows=263 loops=1)
 Planning Time: 1.273 ms
 Execution Time: 41.364 ms
(16 rows)

@nijel
Copy link
Member

nijel commented May 18, 2023

I've been getting new errors for a few days, I don't know if they are related to this problem? If not, I can open a separate ticket if it's something serious. This is the first time I see this kind of error.

This is a new feature from Sentry, that's why you didn't see it before.

https://eso-spolszczenie.sentry.io/share/issue/31cf28698d534edc9f10d15e23ae8d31/

Hard to tell as it does not show which query was actually slow.

https://eso-spolszczenie.sentry.io/share/issue/30d82b5b7d7e429b8348a82a52c48eb4/

This is expected, the Zen mode fetches glossary for each string and this cannot be done in less queries due to their complexity.

-> Bitmap Index Scan on unit_source_fulltext (cost=0.00..14.64 rows=18 width=0) (actual time=827.684..827.684 rows=286 loops=22847)
Index Cond: (source = u0.source)
Buffers: shared hit=582098223

This is actually the problematic part, it appears that the index is not really helping here. It might be a similar issue as in #8719, just hidden in a subquery.

@tomkolp
Copy link
Contributor Author

tomkolp commented May 18, 2023

What else can I do to diagnose the problem?

@nijel
Copy link
Member

nijel commented May 18, 2023

Can you try whether adding an index will make the difference?

  1. Add an index:

    CREATE INDEX test_tmp ON trans_unit(source);

    It will take some time to build.

  2. Please check the sizes of the indexes:

    SELECT pg_table_size('unit_source_fulltext'), pg_table_size('test_tmp');
  3. Now rerun the explain query from Slow automatic translation from other components #9118 (comment)

  4. Drop the index, so that the database is back in the original state:

    DROP INDEX test_tmp;

Having output from 2 and 3 will help in deciding how to address this.

@tomkolp
Copy link
Contributor Author

tomkolp commented May 18, 2023

  1. Now rerun the explain query from Refreshing Lock never ends. #9118 (comment)

Please indicate which query to use.

@nijel
Copy link
Member

nijel commented May 18, 2023

This one:

weblate=# EXPLAIN (ANALYZE, BUFFERS, SETTINGS, WAL)
weblate-# SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target"
weblate-# FROM "trans_unit"
weblate-# INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id")
weblate-# INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id")
weblate-# WHERE ("trans_unit"."state" >= 20
weblate(# AND "trans_component"."project_id" = 5
weblate(# AND "trans_component"."source_language_id" = 131
weblate(# AND "trans_translation"."plural_id" = 401
weblate(# AND NOT ("trans_unit"."translation_id" = 5427)
weblate(# AND "trans_unit"."source" IN (
weblate(#     SELECT U0."source"
weblate(#     FROM "trans_unit" U0
weblate(#     LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
weblate(#     WHERE (U0."translation_id" = 5350 AND U2."id" IS NULL AND U0."state" = 0)
weblate(# ));

@nijel nijel changed the title Refreshing Lock never ends. Slow automatic translation from other components May 18, 2023
@nijel nijel added bug Something is broken. and removed question This is more a question for the support than an issue. labels May 18, 2023
@nijel nijel modified the milestones: 5.0, 4.18 May 18, 2023
@tomkolp
Copy link
Contributor Author

tomkolp commented May 18, 2023

weblate=# CREATE INDEX test_tmp ON trans_unit(source);
CREATE INDEX
weblate=# SELECT pg_table_size('unit_source_fulltext'), pg_table_size('test_tmp');
 pg_table_size | pg_table_size
---------------+---------------
         16384 |          8192

->

weblate=# EXPLAIN (ANALYZE, BUFFERS, SETTINGS, WAL)
weblate-# SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target"
weblate-# FROM "trans_unit"
weblate-# INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id")
weblate-# INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id")
weblate-# WHERE ("trans_unit"."state" >= 20
weblate(# AND "trans_component"."project_id" = 5
weblate(# AND "trans_component"."source_language_id" = 131
weblate(# AND "trans_translation"."plural_id" = 401
weblate(# AND NOT ("trans_unit"."translation_id" = 5427)
weblate(# AND "trans_unit"."source" IN (
weblate(#     SELECT U0."source"
weblate(#     FROM "trans_unit" U0
weblate(#     LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
weblate(#     WHERE (U0."translation_id" = 5350 AND U2."id" IS NULL AND U0."state" = 0)
weblate(# ));
                                                                                QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.74..38.02 rows=1 width=68) (actual time=0.054..0.068 rows=0 loops=1)
   Join Filter: (trans_translation.component_id = trans_component.id)
   Buffers: shared hit=2
   ->  Nested Loop Semi Join  (cost=4.61..29.86 rows=1 width=72) (actual time=0.050..0.059 rows=0 loops=1)
         Buffers: shared hit=2
         ->  Nested Loop  (cost=0.29..16.34 rows=1 width=72) (actual time=0.046..0.052 rows=0 loops=1)
               Buffers: shared hit=2
               ->  Index Scan using trans_translation_plural_id_5cf36dc7 on trans_translation  (cost=0.14..8.16 rows=1 width=8) (actual time=0.040..0.042 rows=0 loops=1)
                     Index Cond: (plural_id = 401)
                     Buffers: shared hit=2
               ->  Index Scan using trans_unit_translation_id_513bb910 on trans_unit  (cost=0.14..8.17 rows=1 width=72) (never executed)
                     Index Cond: (translation_id = trans_translation.id)
                     Filter: ((state >= 20) AND (translation_id <> 5427))
         ->  Nested Loop Left Join  (cost=4.33..13.52 rows=1 width=32) (never executed)
               Filter: (u2.id IS NULL)
               ->  Index Scan using test_tmp on trans_unit u0  (cost=0.14..0.83 rows=1 width=36) (never executed)
                     Index Cond: (source = trans_unit.source)
                     Filter: ((translation_id = 5350) AND (state = 0))
               ->  Bitmap Heap Scan on trans_suggestion u2  (cost=4.18..12.64 rows=4 width=8) (never executed)
                     Recheck Cond: (u0.id = unit_id)
                     ->  Bitmap Index Scan on trans_suggestion_unit_id_eb46fceb  (cost=0.00..4.18 rows=4 width=0) (never executed)
                           Index Cond: (unit_id = u0.id)
   ->  Index Scan using trans_component_pkey on trans_component  (cost=0.12..8.14 rows=1 width=4) (never executed)
         Filter: ((project_id = 5) AND (source_language_id = 131))
 Planning:
   Buffers: shared hit=818 read=1
 Planning Time: 21.878 ms
 Execution Time: 0.494 ms

@tomkolp
Copy link
Contributor Author

tomkolp commented May 18, 2023

weblate=# CREATE INDEX test_tmp ON trans_unit(source);
CREATE INDEX
weblate=# SELECT pg_table_size('unit_source_fulltext'), pg_table_size('test_tmp');
 pg_table_size | pg_table_size
---------------+---------------
         16384 |          8192

->

weblate=# EXPLAIN (ANALYZE, BUFFERS, SETTINGS, WAL)
weblate-# SELECT "trans_unit"."source", "trans_unit"."state", "trans_unit"."target"
weblate-# FROM "trans_unit"
weblate-# INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id")
weblate-# INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id")
weblate-# WHERE ("trans_unit"."state" >= 20
weblate(# AND "trans_component"."project_id" = 5
weblate(# AND "trans_component"."source_language_id" = 131
weblate(# AND "trans_translation"."plural_id" = 401
weblate(# AND NOT ("trans_unit"."translation_id" = 5427)
weblate(# AND "trans_unit"."source" IN (
weblate(#     SELECT U0."source"
weblate(#     FROM "trans_unit" U0
weblate(#     LEFT OUTER JOIN "trans_suggestion" U2 ON (U0."id" = U2."unit_id")
weblate(#     WHERE (U0."translation_id" = 5350 AND U2."id" IS NULL AND U0."state" = 0)
weblate(# ));
                                                                                QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.74..38.02 rows=1 width=68) (actual time=0.054..0.068 rows=0 loops=1)
   Join Filter: (trans_translation.component_id = trans_component.id)
   Buffers: shared hit=2
   ->  Nested Loop Semi Join  (cost=4.61..29.86 rows=1 width=72) (actual time=0.050..0.059 rows=0 loops=1)
         Buffers: shared hit=2
         ->  Nested Loop  (cost=0.29..16.34 rows=1 width=72) (actual time=0.046..0.052 rows=0 loops=1)
               Buffers: shared hit=2
               ->  Index Scan using trans_translation_plural_id_5cf36dc7 on trans_translation  (cost=0.14..8.16 rows=1 width=8) (actual time=0.040..0.042 rows=0 loops=1)
                     Index Cond: (plural_id = 401)
                     Buffers: shared hit=2
               ->  Index Scan using trans_unit_translation_id_513bb910 on trans_unit  (cost=0.14..8.17 rows=1 width=72) (never executed)
                     Index Cond: (translation_id = trans_translation.id)
                     Filter: ((state >= 20) AND (translation_id <> 5427))
         ->  Nested Loop Left Join  (cost=4.33..13.52 rows=1 width=32) (never executed)
               Filter: (u2.id IS NULL)
               ->  Index Scan using test_tmp on trans_unit u0  (cost=0.14..0.83 rows=1 width=36) (never executed)
                     Index Cond: (source = trans_unit.source)
                     Filter: ((translation_id = 5350) AND (state = 0))
               ->  Bitmap Heap Scan on trans_suggestion u2  (cost=4.18..12.64 rows=4 width=8) (never executed)
                     Recheck Cond: (u0.id = unit_id)
                     ->  Bitmap Index Scan on trans_suggestion_unit_id_eb46fceb  (cost=0.00..4.18 rows=4 width=0) (never executed)
                           Index Cond: (unit_id = u0.id)
   ->  Index Scan using trans_component_pkey on trans_component  (cost=0.12..8.14 rows=1 width=4) (never executed)
         Filter: ((project_id = 5) AND (source_language_id = 131))
 Planning:
   Buffers: shared hit=818 read=1
 Planning Time: 21.878 ms
 Execution Time: 0.494 ms

Sorry, but I got the database wrong and checked on the test one, which is practically empty.

nijel added a commit to nijel/weblate that referenced this issue May 18, 2023
This improves exact strings lookups like done in automatic translation
or consistency checks.

Issue WeblateOrg#9118
Issue WeblateOrg#1994
@tomkolp
Copy link
Contributor Author

tomkolp commented May 18, 2023

Point 1 fails in production:

weblate=# CREATE INDEX test_tmp ON trans_unit(source);
ERROR:  index row size 2736 exceeds btree version 4 maximum 2704 for index "test_tmp"
DETAIL:  Index row references tuple (39270,18) in relation "trans_unit".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

nijel added a commit to nijel/weblate that referenced this issue May 18, 2023
This improves exact strings lookups like done in automatic translation
or consistency checks.

Issue WeblateOrg#9118
Issue WeblateOrg#1994
@nijel
Copy link
Member

nijel commented May 18, 2023

Hmm, we will need something more complex then. I've created #9267, let's see if all tests pass, I will then test it on some larger database.

@nijel nijel linked a pull request May 19, 2023 that will close this issue
5 tasks
nijel added a commit that referenced this issue May 19, 2023
This improves exact strings lookups like done in automatic translation
or consistency checks.

Issue #9118
Issue #1994
@github-actions
Copy link

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.

@nijel
Copy link
Member

nijel commented May 19, 2023

After testing #9267 on several test data, it seems to address the issue, but it would be great to see how that performs in your environment.

@nijel nijel self-assigned this May 19, 2023
@tomkolp
Copy link
Contributor Author

tomkolp commented May 19, 2023

I will test when the bleeding docker image appears

@nijel
Copy link
Member

nijel commented May 19, 2023

It should be there tomorrow.

@tomkolp
Copy link
Contributor Author

tomkolp commented May 20, 2023

I did the updates, I don't have access to the console today. I checked adding suggestions using the weblate interface, instead of several hours now it takes ten seconds. @nijel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants