Support for PostgreSQL and MySQL only #2825

nijel · 2019-06-22T06:22:08Z

Currently Weblate supports any database backend supported by Django. That's nice, however that limits possibilities to use database specific features.

For example switching to PostgreSQL only would allow us to use it's full text search removing need for external full text engine. The Whoosh usage currently leads to some problems for example see #2549 and #2543. Having in database full text engine would also heavily improve it's performance, because currently it's first doing search in Whoosh to get fulltext matches and then filter the results in the database to perform filtering by translation and permission checks, what leads to issues as #2876.

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

nijel · 2019-06-22T06:25:11Z

The migration guide for the database can be found here https://www.calazan.com/migrating-django-app-from-mysql-to-postgresql/

We might eventually drop support for MySQL, so better give users an early warning. Issue #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

We will most likely drop support for MySQL, so avoid pointing new users to it. See #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

nijel · 2019-07-25T12:25:57Z

Another nice benefit of switching to Postgres might be using Agagama instead of writing translation memory service on our own, see https://github.com/translate/amagama

Add check showing warning about not supported setup in future and add migration docs. Issue #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

nijel · 2019-08-31T13:16:34Z

Things to improve based on this change:

Migrate to JSONField
Use fulltext search

maxking · 2019-09-20T16:21:07Z

While whoosh is pretty slow, we have seen a lot of performance improvements when using Django-haystack with Elasticsearch in Mailman project. It is definitely another service to support, but with containers it isn't a lot of work to setup. There are system packages for most operating system.

Xapian is written in C++ and also is one the supported backends in Django-haystack.

Just a thought, in case the only reason to drop support for other databases is slow searching.

nijel · 2019-09-20T17:53:06Z

In case the PostgreSQL only solution will not perform well, Elasticsearch is certainly worth trying. However my current impression is that we can do all what we need with PostgreSQL and reasonable query parser. We might stick with Whoosh for query parsing as it was quite easy to adapt for our purpose, see https://github.com/WeblateOrg/weblate/blob/master/weblate/utils/search.py.

PS: The performance is not the biggest issue with Whoosh for me, it does search quite fast. The biggest issue is huge memory consumption (the updating process easily consumes gigabytes of memory) and lockups while updating (I've not yet found time to debug this).

See #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

unho · 2019-11-20T18:17:01Z

Another nice benefit of switching to Postgres might be using Agagama instead of writing translation memory service on our own, see https://github.com/translate/amagama

amaGama is a standalone service, so there is no need for Weblate to migrate to PostgreSQL since Weblate will communicate with it using http://docs.translatehouse.org/projects/amagama/en/latest/api.html. And yes, amaGama is PostgreSQL only, AFAICT.

nijel · 2019-11-21T07:45:25Z

@unho I know it's standalone service. Still for most deployments it makes sense if they can share same database.

Anyway, the main motivation of switch to PostgreSQL is to have database with decent full-text search and to get rid of Whoosh as a full-text engine which apparently doesn't scale well for our purposes.

DominicWatson · 2019-11-26T15:45:24Z

I would love it if the database engine implementation remained agnostic. For those of us deploying Weblate into existing infrastructure, being able to use our existing database engines is a real bonus.

If this were Postgres only, I simply wouldn't use it (I don't expect anyone else to care about that, but just stating my case here).

Seems crazy to break compatibility when you already have it.

nijel · 2019-11-26T19:27:23Z

I'd love to keep the compatibility, but I don't see way to do it right now. We need to replace Whoosh with something what scales better and doing the full-text search in the database seems to be the way to go. Django has built in support for it on PostgreSQL so that looks like the best choice. Support for MySQL is not implemented (see for example adamchainz/django-mysql#314 which is open for years) and we certainly don't have manpower to implement that.

I won't reject merge requests adding support for other databases, but we will definitely switch to using in database full-text search and native JSON fields in the 4.0 with initial support for PostgreSQL only.

unho · 2019-11-27T22:30:23Z

Sorry for interject. I guess that Whoosh was being used to provide translation memory matches, and you need a faster replacemente, have you considered using Elasticsearch instead?

nijel · 2019-11-28T07:18:41Z

It's used for search in the translations as well. Doing that in Elasticsearch would mean either to duplicate whole Weblate database there (to be able to do filtering based on user permissions, projects or languages). Right now we do ID lookup in Whoosh and then additional filtering in the database, what performs terrible in case of common terms which produce millions of matches in the full-text search.

But you're right that Elasticseach would be indeed great solution for translation memory, which is pretty much write only and needs only fulltext searches. We can also switch to using external service for translation memory such as amaGama.

DominicWatson · 2019-11-28T16:05:03Z

Support for MySQL is not implemented (see for example adamchainz/django-mysql#314 which is open for years) and we certainly don't have manpower to implement that.

Roger that (didn't realise that MySQL was already not supported, spent quite a bit of time trying to get it working ;) ).

Makes sense in that case.

unho · 2019-11-28T18:45:08Z

If I recall correctly Pootle has the concept of external TM which are not meant to be altered (like amaGama), and local TM using Elasticsearch which are updated every time a translation is saved and can be updated/bootstraped using TMX or PO files. More in https://pootle.readthedocs.io/en/latest/features/translation_memory.html

Just sharing in case it might help.

nijel · 2019-11-28T21:36:34Z

MySQL is currently supported. The Django integration just doesn't support features we will use in upcoming 4.0 release.

@unho We have external read only memories supported as well. What we're looking for is read-write support and I'm still not sure whether we will support that for third party services, but some people are interested in that as well.

nijel · 2020-02-20T14:40:53Z

Did some more research on this and it seems that keeping support for MySQL is doable:

Skip using native JSON fields (for now we do not need lookups inside them, so that is not that big drawback)
Support both MySQL and PostgreSQL full-text by adding appropriate index in migration (https://stackoverflow.com/a/45232678/225718 shows how to execute SQL depending on database engine)
Implement search either using custom SQL or combine django.contrib.postgresql search with MySQL one (see https://stackoverflow.com/a/55985693/225718)
It still won't perform as good as PostgreSQL because MySQL cannot do index merge for full-text (see https://dev.mysql.com/doc/refman/8.0/en/index-merge-optimization.html)

The question is whether it's worth of the effort ;-).

nijel · 2020-02-21T07:29:37Z

On the other side, there are some limitations of MySQL which we are hitting already and causing performance penalties or other limitations:

Lack of support for distinct on field. See https://code.djangoproject.com/ticket/17974, current workaround is to fetch all results and do this in Python:

weblate/weblate/trans/util.py

Lines 94 to 108 in 651586b

    
           def get_distinct_translations(units): 
        
               """Return list of distinct translations. 
        
               It should be possible to use distinct('target') since Django 1.4, but it is not 
        
               supported with MySQL, so let's emulate that based on presumption we won't get too 
        
               many results. 
        
               """ 
        
               targets = {} 
        
               result = [] 
        
               for unit in units: 
        
                   if unit.target in targets: 
        
                       continue 
        
                   targets[unit.target] = 1 
        
                   result.append(unit) 
        
               return result

Bulk creation limits, big queries make server fail with "MySQL server has gone away" error, so bulk_insert has to be used with limits

Case insensitive comparison forces us to compare strings again in Python, see https://code.djangoproject.com/ticket/9682 and workaround we use:

weblate/weblate/trans/tasks.py

Lines 212 to 221 in 651586b

    
           sugs = Suggestion.objects.filter( 
        
               unit=suggestion.unit, target=suggestion.target 
        
           ).exclude(id=suggestion.id) 
        
           # Do not rely on the SQL as MySQL compares strings case insensitive 
        
           for other in sugs: 
        
               if other.target == suggestion.target: 
        
                   suggestion.delete_log( 
        
                       anonymous_user, change=Change.ACTION_SUGGESTION_CLEANUP 
        
                   ) 
        
                   break

The utf8m4 charset (required for higher Unicode plans such as emojis) limits indexed char field to at most 192 chars
The regex engine in MySQL is limited what reduces search capabilities
The fulltext index behaves wrongly inside transactions, see https://dev.mysql.com/doc/refman/8.0/en/innodb-fulltext-index.html#innodb-fulltext-index-transaction

rob006 · 2020-02-21T09:45:15Z

* Case insensitive comparison forces us to compare strings again in Python, see https://code.djangoproject.com/ticket/9682 and workaround we use:

That depends on used collation. If you want strict comparison, utf8mb4_bin can be used as table charset.

* The utf8m4 charset (required for higher Unicode plans such as emojis) limits indexed char field to at most 192 chars

This is configurable, and default configuration was changed years ago, it should not be a problem anymore.

nijel · 2020-02-21T11:35:55Z

That depends on used collation. If you want strict comparison, utf8mb4_bin can be used as table charset.

True, but that breaks case-insensitive lookups in Django. See https://code.djangoproject.com/ticket/9682

This is configurable, and default configuration was changed years ago, it should not be a problem anymore.

Thanks for update, I was not aware of that.

I believe that many of the issues we see could be addressed at ORM level in Django, but PosgtgreSQL seems to be the primary focus for them.

mestaritonttu · 2020-02-27T19:11:44Z

Just ran into this, which might be useful for you: Postgresql Search: From the trenches

nijel · 2020-02-28T08:36:08Z

Thanks, I've already read a lot on the topic and I'm aware that it's not a silver bullet.

I've done some testing on data from Hosted Weblate and PostgreSQL performs reasonably well on that (about 10 times faster than our current implementation using Whoosh) and MySQL is performance wise about similar to current solution (this is mostly caused by lack of index merging). In both cases it would address huge memory consumption we see with Whoosh. So, it seems to be the way to go for now because it will scale from small installations to bigger ones.

As for Elasticseach, I don't think it's way to go for us. It's definitely great in doing searches on text. The problem is in the more complex queries we support. We probably would have to mirror most of our database to it (what could easily go wrong). The alternative approach of doing full text in Elasticsearch and then filtering results in database doesn't scale well for things that have millions of matches in Elasticsearch and would be heavily filtered on the database side.

Issue #2825

Support for it will be removed in the 4.0 release. Issue #2825

Issue #2825

It will be gone in 4.0. Issue #2825

This is probably more widely used than original MySQL. Issue #2825

github-actions · 2020-03-10T18:45:35Z

Thank you for your report, the issue you have reported has just been fixed.

In case you see a problem with the fix, please comment on this issue.
In case you see similar problem, please open separate issue.
If you are happy with the outcome, consider supporting Weblate by donating.

nijel · 2020-03-12T09:50:46Z

The good news is that I've maintained MySQL support for 4.0. The bad news is that it seems to perform really badly. The Weblate testsuite takes 35 minutes on PostgreSQL, but 150 minutes on MySQL/MariaDB. I've probably made something wrong there (or MySQL really sucks in this use case). Any suggestions to improve the peformance are welcome.

mestaritonttu · 2020-03-12T10:45:14Z

The good news is that I've maintained MySQL support for 4.0. The bad news is that it seems to perform really badly. The Weblate testsuite takes 35 minutes on PostgreSQL, but 150 minutes on MySQL/MariaDB. I've probably made something wrong there (or MySQL really sucks in this use case). Any suggestions to improve the peformance are welcome.

In your commit 9c75b52 you have a comment "This is workaroud for MySQL as FULL TEXT index does not work well inside a transaction, so we avoid using transactions for tests".

From what little I know, InnoDB perf can be horrible, if not utilising transactions.

nijel · 2020-03-12T12:35:13Z

Yes, that can be reason for the slowness (what would be good as it would not affect real usage). The tests before executed in reasonable time (https://github.com/WeblateOrg/weblate/runs/501385902?check_suite_focus=true, though there are several failures caused by not updated fulltext indices). This change however also makes MySQL actually update the full-text indices, so that might be the slow part as well.

The method however doesn't affect whether transactions are user or not, but only how tests are executed in Django. The default approach is to use transactions to rollback database to original state, which sadly doesn't work with MySQL because fulltext index is updated only on commit and thus using it during the transaction results in not getting up to date results (see https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html#innodb-fulltext-index-transaction).

nijel added enhancement Adding or requesting a new feature. undecided These features might not be implemented. Can be prioritized by sponsorship. labels Jun 22, 2019

nijel added a commit that referenced this issue Jun 24, 2019

Docs: Recommend PostgreSQL database

b5417ac

We might eventually drop support for MySQL, so better give users an early warning. Issue #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

nijel added a commit that referenced this issue Jul 5, 2019

Docs: Remove MySQL instructions from quick install

cf775b0

We will most likely drop support for MySQL, so avoid pointing new users to it. See #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

nijel mentioned this issue Jul 11, 2019

OpenShift image build should not be adding database details to image. #2875

Closed

nijel mentioned this issue Jul 18, 2019

Large memory usage by celery workers #2549

Closed

nijel removed the undecided These features might not be implemented. Can be prioritized by sponsorship. label Aug 31, 2019

nijel added this to the 4.0 milestone Aug 31, 2019

nijel added a commit that referenced this issue Aug 31, 2019

Backend: Prepare for supporting PostgreSQL only

ff3327c

Add check showing warning about not supported setup in future and add migration docs. Issue #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

nijel changed the title ~~Reconsider database backends support~~ Support PostgreSQL only Aug 31, 2019

nijel mentioned this issue Sep 1, 2019

Support object store besides file store for persistancy in Weblate #2984

Open

nijel pinned this issue Sep 1, 2019

This was referenced Sep 16, 2019

Add morphology support to glossary and search #3023

Open

Cannot run container 3.8.1 WeblateOrg/docker#379

Closed

nijel added a commit that referenced this issue Sep 22, 2019

Docs: Make example config prepared for PosgtreSQL

cf8d074

See #2825 Signed-off-by: Michal Čihař <michal@cihar.com>

nijel mentioned this issue Jan 17, 2020

Add batch API for machinery #3393

Closed

zenhack mentioned this issue Feb 29, 2020

Use Weblate for continuous translation sandstorm-io/sandstorm#3248

Open

nijel changed the title ~~Support PostgreSQL only~~ Support for PostgreSQL and MySQL only Mar 2, 2020

nijel added a commit that referenced this issue Mar 2, 2020

CI: Add migration tests for MySQL

cafc54b

Issue #2825

nijel added a commit that referenced this issue Mar 2, 2020

CI: Remove migration tests for SQLite

5242e48

Support for it will be removed in the 4.0 release. Issue #2825

nijel added a commit that referenced this issue Mar 2, 2020

CI: Switch tests to MySQL

3f366cc

Issue #2825

nijel added a commit that referenced this issue Mar 2, 2020

Misc: Remmove mentions of sqlite support

dcc9392

It will be gone in 4.0. Issue #2825

nijel added a commit that referenced this issue Mar 2, 2020

CI: Add tests using MariaDB

18c3c81

This is probably more widely used than original MySQL. Issue #2825

nijel self-assigned this Mar 10, 2020

nijel unpinned this issue Mar 10, 2020

nijel closed this as completed in a091b10 Mar 10, 2020

nijel mentioned this issue Aug 6, 2020

Weblate+Apache+MySQL+WSCGI setup docs are unusable (/ non-existent?) #4261

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for PostgreSQL and MySQL only #2825

Support for PostgreSQL and MySQL only #2825

nijel commented Jun 22, 2019 •

edited

nijel commented Jun 22, 2019

nijel commented Jul 25, 2019

nijel commented Aug 31, 2019

maxking commented Sep 20, 2019

nijel commented Sep 20, 2019

unho commented Nov 20, 2019

nijel commented Nov 21, 2019

DominicWatson commented Nov 26, 2019

nijel commented Nov 26, 2019

unho commented Nov 27, 2019

nijel commented Nov 28, 2019

DominicWatson commented Nov 28, 2019

unho commented Nov 28, 2019

nijel commented Nov 28, 2019

nijel commented Feb 20, 2020

nijel commented Feb 21, 2020 •

edited

rob006 commented Feb 21, 2020

nijel commented Feb 21, 2020

mestaritonttu commented Feb 27, 2020

nijel commented Feb 28, 2020

github-actions bot commented Mar 10, 2020

nijel commented Mar 12, 2020

mestaritonttu commented Mar 12, 2020

nijel commented Mar 12, 2020

Support for PostgreSQL and MySQL only #2825

Support for PostgreSQL and MySQL only #2825

Comments

nijel commented Jun 22, 2019 • edited

nijel commented Jun 22, 2019

nijel commented Jul 25, 2019

nijel commented Aug 31, 2019

maxking commented Sep 20, 2019

nijel commented Sep 20, 2019

unho commented Nov 20, 2019

nijel commented Nov 21, 2019

DominicWatson commented Nov 26, 2019

nijel commented Nov 26, 2019

unho commented Nov 27, 2019

nijel commented Nov 28, 2019

DominicWatson commented Nov 28, 2019

unho commented Nov 28, 2019

nijel commented Nov 28, 2019

nijel commented Feb 20, 2020

nijel commented Feb 21, 2020 • edited

rob006 commented Feb 21, 2020

nijel commented Feb 21, 2020

mestaritonttu commented Feb 27, 2020

nijel commented Feb 28, 2020

github-actions bot commented Mar 10, 2020

nijel commented Mar 12, 2020

mestaritonttu commented Mar 12, 2020

nijel commented Mar 12, 2020

nijel commented Jun 22, 2019 •

edited

nijel commented Feb 21, 2020 •

edited