Support for PostgreSQL and MySQL only #2825
Comments
The migration guide for the database can be found here https://www.calazan.com/migrating-django-app-from-mysql-to-postgresql/ |
We might eventually drop support for MySQL, so better give users an early warning. Issue #2825 Signed-off-by: Michal Čihař <michal@cihar.com>
We will most likely drop support for MySQL, so avoid pointing new users to it. See #2825 Signed-off-by: Michal Čihař <michal@cihar.com>
Another nice benefit of switching to Postgres might be using amaGama instead of writing a translation memory service on our own, see https://github.com/translate/amagama
Add check showing warning about not supported setup in future and add migration docs. Issue #2825 Signed-off-by: Michal Čihař <michal@cihar.com>
Things to improve based on this change:
While Whoosh is pretty slow, we have seen a lot of performance improvements when using django-haystack with Elasticsearch in the Mailman project. It is definitely another service to support, but with containers it isn't a lot of work to set up. There are system packages for most operating systems. Xapian is written in C++ and is also one of the supported backends in django-haystack. Just a thought, in case the only reason to drop support for other databases is slow searching.
In case the PostgreSQL-only solution does not perform well, Elasticsearch is certainly worth trying. However, my current impression is that we can do all that we need with PostgreSQL and a reasonable query parser. We might stick with Whoosh for query parsing as it was quite easy to adapt for our purpose, see https://github.com/WeblateOrg/weblate/blob/master/weblate/utils/search.py. PS: Performance is not the biggest issue with Whoosh for me; it searches quite fast. The biggest issues are huge memory consumption (the updating process easily consumes gigabytes of memory) and lockups while updating (I've not yet found time to debug this).
See #2825 Signed-off-by: Michal Čihař <michal@cihar.com>
amaGama is a standalone service, so there is no need for Weblate to migrate to PostgreSQL since Weblate will communicate with it using http://docs.translatehouse.org/projects/amagama/en/latest/api.html. And yes, amaGama is PostgreSQL only, AFAICT. |
@unho I know it's a standalone service. Still, for most deployments it makes sense if they can share the same database. Anyway, the main motivation for the switch to PostgreSQL is to have a database with decent full-text search and to get rid of Whoosh as a full-text engine, which apparently doesn't scale well for our purposes.
I would love it if the database engine implementation remained agnostic. For those of us deploying Weblate into existing infrastructure, being able to use our existing database engines is a real bonus. If this were Postgres only, I simply wouldn't use it (I don't expect anyone else to care about that, but just stating my case here). Seems crazy to break compatibility when you already have it. |
I'd love to keep the compatibility, but I don't see a way to do it right now. We need to replace Whoosh with something that scales better, and doing the full-text search in the database seems to be the way to go. Django has built-in support for it on PostgreSQL, so that looks like the best choice. Support for MySQL is not implemented (see for example adamchainz/django-mysql#314, which has been open for years) and we certainly don't have the manpower to implement that. I won't reject merge requests adding support for other databases, but we will definitely switch to in-database full-text search and native JSON fields in 4.0, with initial support for PostgreSQL only.
Sorry to interject. I guess that Whoosh was being used to provide translation memory matches, and you need a faster replacement. Have you considered using Elasticsearch instead?
It's used for search in the translations as well. Doing that in Elasticsearch would mean duplicating the whole Weblate database there (to be able to filter based on user permissions, projects or languages). Right now we do an ID lookup in Whoosh and then additional filtering in the database, which performs terribly for common terms that produce millions of matches in the full-text search. But you're right that Elasticsearch would indeed be a great solution for translation memory, which is pretty much write-only and needs only full-text searches. We can also switch to using an external service for translation memory, such as amaGama.
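To illustrate the two-phase lookup described in the comment above, here is a minimal pure-Python sketch. It is not Weblate's code: the `UNITS` data, `build_index` and `two_phase_search` are hypothetical names made up for this illustration, and a plain dictionary stands in for the external full-text engine. The point is only that the engine has to return every ID matching the term before permission filtering can discard most of them.

```python
from collections import defaultdict

# Hypothetical sample data: (unit_id, project, text). Not Weblate's schema.
UNITS = [
    (1, "public", "save file"),
    (2, "private", "save changes"),
    (3, "private", "open file"),
    (4, "public", "close window"),
]


def build_index(units):
    """Build a word -> set of unit IDs mapping, standing in for the external engine."""
    index = defaultdict(set)
    for unit_id, _project, text in units:
        for word in text.split():
            index[word].add(unit_id)
    return index


def two_phase_search(index, units, word, allowed_projects):
    """Phase 1: full-text lookup by ID. Phase 2: permission filtering afterwards."""
    # The engine knows nothing about permissions, so it returns every match.
    candidate_ids = index.get(word, set())
    rows = {unit[0]: unit for unit in units}
    # Only now can the candidates be narrowed down to what the user may see.
    matches = sorted(uid for uid in candidate_ids if rows[uid][1] in allowed_projects)
    return matches, len(candidate_ids)


index = build_index(UNITS)
matches, candidates_fetched = two_phase_search(index, UNITS, "save", {"public"})
```

With a common term the candidate set grows with the whole database while the filtered result stays small, which is the scaling problem mentioned above. A single in-database query can apply both conditions at once.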
Roger that (didn't realise that MySQL was already not supported, spent quite a bit of time trying to get it working ;) ). Makes sense in that case. |
If I recall correctly, Pootle has the concept of external TMs which are not meant to be altered (like amaGama), and a local TM using Elasticsearch which is updated every time a translation is saved and can be updated/bootstrapped using TMX or PO files. More in https://pootle.readthedocs.io/en/latest/features/translation_memory.html Just sharing in case it might help.
MySQL is currently supported. The Django integration just doesn't support features we will use in the upcoming 4.0 release. @unho We have external read-only memories supported as well. What we're looking for is read-write support, and I'm still not sure whether we will support that for third-party services, but some people are interested in that as well.
Did some more research on this and it seems that keeping support for MySQL is doable:
The question is whether it's worth the effort ;-).
On the other hand, there are some limitations of MySQL which we are already hitting and which cause performance penalties or other problems:
That depends on used collation. If you want strict comparison,
This is configurable, and the default configuration was changed years ago, so it should not be a problem anymore.
True, but that breaks case-insensitive lookups in Django. See https://code.djangoproject.com/ticket/9682
Thanks for the update, I was not aware of that. I believe that many of the issues we see could be addressed at the ORM level in Django, but PostgreSQL seems to be the primary focus for them.
Just ran into this, which might be useful for you: Postgresql Search: From the trenches |
Thanks, I've already read a lot on the topic and I'm aware that it's not a silver bullet. I've done some testing on data from Hosted Weblate: PostgreSQL performs reasonably well on it (about 10 times faster than our current implementation using Whoosh), while MySQL performs about the same as the current solution (this is mostly caused by the lack of index merging). In both cases it would address the huge memory consumption we see with Whoosh. So it seems to be the way to go for now, because it will scale from small installations to bigger ones. As for Elasticsearch, I don't think it's the way to go for us. It's definitely great at searching text. The problem is in the more complex queries we support. We would probably have to mirror most of our database to it (which could easily go wrong). The alternative approach of doing full-text search in Elasticsearch and then filtering the results in the database doesn't scale well for queries that have millions of matches in Elasticsearch and are heavily filtered on the database side.
Support for it will be removed in the 4.0 release. Issue #2825
This is probably more widely used than original MySQL. Issue #2825
Thank you for your report, the issue you have reported has just been fixed.
The good news is that I've maintained MySQL support for 4.0. The bad news is that it seems to perform really badly. The Weblate testsuite takes 35 minutes on PostgreSQL, but 150 minutes on MySQL/MariaDB. I've probably done something wrong there (or MySQL really sucks in this use case). Any suggestions to improve the performance are welcome.
In your commit 9c75b52 you have a comment "This is workaroud for MySQL as FULL TEXT index does not work well inside a transaction, so we avoid using transactions for tests". From what little I know, InnoDB perf can be horrible, if not utilising transactions. |
Yes, that can be the reason for the slowness (which would be good, as it would not affect real usage). The tests previously executed in reasonable time (https://github.com/WeblateOrg/weblate/runs/501385902?check_suite_focus=true, though there are several failures caused by full-text indices not being updated). This change, however, also makes MySQL actually update the full-text indices, so that might be the slow part as well. The method, however, doesn't affect whether transactions are used or not, only how tests are executed in Django. The default approach is to use transactions to roll the database back to its original state, which sadly doesn't work with MySQL because the full-text index is updated only on commit, so using it during a transaction does not return up-to-date results (see https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html#innodb-fulltext-index-transaction).
Currently Weblate supports any database backend supported by Django. That's nice, but it limits the possibilities to use database-specific features.
For example, switching to PostgreSQL only would allow us to use its full-text search, removing the need for an external full-text engine. The Whoosh usage currently leads to some problems, for example see #2549 and #2543. Having an in-database full-text engine would also heavily improve search performance, because currently the search first runs in Whoosh to get full-text matches and then the results are filtered in the database by translation and permission checks, which leads to issues such as #2876.