UWSGI workers got busy for too long when processing big PO files upload #5691

Closed
dgarana opened this issue Mar 18, 2021 · 6 comments

dgarana commented Mar 18, 2021

When handling large PO files (5 MB, about 20,000 phrases per component), uWSGI workers stay busy for a very long time, so Weblate gets completely stuck because no workers are left to process other requests.

I already tried splitting the file into chunks and using a different upload "method" to handle the file (using add instead of replace), but I haven't seen any improvement in response time for projects with 20,000 phrases and 120 languages.
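For reference, the chunking attempt described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used: `chunked` and `chunk_size` are hypothetical names, and the items are plain values here, whereas with polib they would be the entries of a parsed PO file, each chunk saved and uploaded separately with the add method.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` holding at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


# Stand-in for PO entries: five message ids split into chunks of two.
message_ids = ["id-0", "id-1", "id-2", "id-3", "id-4"]
chunks = list(chunked(message_ids, 2))
```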

To Reproduce the issue

I've created a script to reproduce this locally:

# System imports
import logging
import uuid

# Third-party imports
import polib
import requests

# Local imports
# Set up logging capabilities
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(message)s')
LOGGER = logging.getLogger("tester")

# Set up Weblate AUTH
TOKEN = 'YOUR TOKEN GOES HERE'
WEBLATE_URL = 'http://localhost:8080/api'
HEADERS = {'Authorization': 'Token {}'.format(TOKEN)}

# Set up globals
PROJECT = 'testing_project'
COMPONENT = 'testing_component'
SOURCE_LANGUAGE = 'en'
LANGUAGES = ['es', 'it', 'fr', 'pt', 'de']
POT_FILE = 'example.pot'
PO_ENTRIES = 100

DEFAULT_METADATA = {
    'Project-Id-Version': '1.0',
    'Report-Msgid-Bugs-To': 'you@example.com',
    'POT-Creation-Date': '2007-10-18 14:00+0100',
    'PO-Revision-Date': '2007-10-18 14:00+0100',
    'Last-Translator': 'you <you@example.com>',
    'Language-Team': 'English <yourteam@example.com>',
    'MIME-Version': '1.0',
    'Content-Type': 'text/plain; charset=utf-8',
    'Content-Transfer-Encoding': '8bit',
}


def generate_pot_file(filename, num_entries, with_context=False, metadata=DEFAULT_METADATA):
    LOGGER.debug('Generating POT file')
    pot_file = polib.POFile()
    # Copy the dict so the shared default argument is never mutated
    pot_file.metadata = dict(metadata)
    for _ in range(num_entries):
        unique_id = str(uuid.uuid4())
        entry_args = {
            'msgid': unique_id,
            'msgstr': "",
        }
        if with_context:
            entry_args['msgctxt'] = '/scontext/{}'.format(unique_id)
        pot_entry = polib.POEntry(**entry_args)
        pot_file.append(pot_entry)
    pot_file.save(filename)
    LOGGER.debug('Generated POT file')


def translate_pot(pot_filename, language, is_source=False):
    po_filename = '{}.po'.format(language)
    LOGGER.debug('Generating PO file for %s', language)
    po_file = polib.pofile(pot_filename)
    for entry in po_file:
        entry.msgstr = entry.msgid if is_source else '{} -- {}'.format(language, entry.msgid)
    po_file.save(po_filename)
    LOGGER.debug('Generated PO file for %s', language)
    return po_filename


def create_weblate_project(project_name):
    LOGGER.debug('Creating project %s on weblate', project_name)
    r = requests.post('{}/projects/'.format(WEBLATE_URL),
                      headers=HEADERS,
                      data={'name': project_name,
                            'slug': project_name,
                            'web': 'https://google.com'})
    try:
        r.raise_for_status()
    except Exception as e:
        LOGGER.critical(r.text)
        raise
    LOGGER.debug('Created project %s on weblate', project_name)

    
def create_weblate_component(project_name, component_name, po_file):
    LOGGER.debug('Creating component %s under project %s in weblate', component_name, project_name)
    data = {
        "project": project_name,
        "file_format": "po-mono",
        "edit_template": True,
        "filemask": "*.po",
        "name": component_name,
        "repo": "local:",
        "slug": component_name,
        "vcs": "local",
        "source_language": "en",
    }
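    # Open the file in a context manager so the handle is closed afterwards
    # and requests sends the real file name with the upload.
    with open(po_file, 'rb') as handle:
        r = requests.post('{}/projects/{}/components/'.format(WEBLATE_URL, project_name),
                          headers=HEADERS,
                          data=data,
                          files={"docfile": handle})
    try:
        r.raise_for_status()
    except requests.HTTPError:
        LOGGER.critical(r.text)
        raise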
    LOGGER.debug('Created component %s under project %s in weblate', component_name, project_name)


def create_component_translation(project_name, component_name, language):
    LOGGER.debug('Creating %s translation for component %s under project %s in weblate', language, component_name, project_name)
    r = requests.post('{}/components/{}/{}/translations/'.format(WEBLATE_URL, project_name, component_name),
                     headers=HEADERS,
                     data={'language_code': language})
    try:
        r.raise_for_status()
    except Exception as e:
        LOGGER.critical(r.text)
        raise
    LOGGER.debug('Created %s translation for component %s under project %s in weblate', language, component_name, project_name)


def upload_component_translation(project_name, component_name, po_file, language, method="replace"):
    LOGGER.debug('Uploading %s translation for component %s under project %s in weblate', language, component_name, project_name)
    data = {
        "method": method,
    }
    # Open the file in a context manager so the handle is closed afterwards
    with open(po_file, 'rb') as handle:
        r = requests.post('{}/translations/{}/{}/{}/file/'.format(WEBLATE_URL, project_name, component_name, language),
                          headers=HEADERS,
                          data=data,
                          files={"file": handle})
    try:
        r.raise_for_status()
    except requests.HTTPError:
        LOGGER.critical(r.text)
        raise
    LOGGER.debug('Uploaded %s translation for component %s under project %s in weblate', language, component_name, project_name)

generate_pot_file(POT_FILE, PO_ENTRIES, with_context=True)
create_weblate_project(PROJECT)
create_weblate_component(PROJECT, COMPONENT, translate_pot(POT_FILE, SOURCE_LANGUAGE, is_source=True))
for language in LANGUAGES:
    create_component_translation(PROJECT, COMPONENT, language)
    upload_component_translation(PROJECT, COMPONENT, translate_pot(POT_FILE, language), language)

Ideally, every time there's a long-running task, it should be handled by celery, as is already done when creating a component, where the API returns a task_url: https://docs.weblate.org/en/latest/api.html#post--api-projects-(string-project)-components-
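To illustrate what a client of such an asynchronous upload might look like, here is a rough polling sketch. It is only an assumption of how this could work, not an existing Weblate client: `wait_for_task` and `fetch_status` are hypothetical names, and the `completed` and `progress` keys are assumed fields of the task status payload. In practice `fetch_status` would issue an authenticated GET against the returned task_url; here it is faked so the sketch runs on its own.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Poll `fetch_status` until it reports completion or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # Assumed payload shape: {"completed": bool, "progress": int, ...}
        if status.get("completed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not complete within {} seconds".format(timeout))
        time.sleep(interval)


# Fake status source standing in for GET requests against task_url.
_responses = iter([
    {"completed": False, "progress": 40},
    {"completed": True, "progress": 100},
])
final_status = wait_for_task(lambda: next(_responses), interval=0.0)
```

With this shape the HTTP request returns immediately and the uWSGI worker is freed, while the client decides how long it is willing to wait.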

I've attached some screenshots of what silk reports regarding the performance:
[Image: Screenshot 2021-03-18 at 11 26 57]

@nijel
Member

nijel commented Mar 18, 2021

You upload all the files sequentially, and that causes considerable processing overhead on the Weblate side (at the very least, the monolingual base file has to be parsed for every language). Creating a ZIP file with all the translations and uploading it at once when creating the component will perform much better.
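A minimal sketch of the ZIP approach, assuming the PO files are already available as bytes. `build_translations_zip` is a hypothetical helper, and the name of the form field used to send the archive when creating the component (I believe it is `zipfile`) should be checked against the API documentation for your Weblate version.

```python
import io
import zipfile
from typing import Dict


def build_translations_zip(po_files: Dict[str, bytes]) -> bytes:
    """Pack {archive name: file content} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort the names so the archive layout is deterministic.
        for name, content in sorted(po_files.items()):
            archive.writestr(name, content)
    return buffer.getvalue()


archive_bytes = build_translations_zip({
    "en.po": b'msgid "hello"\nmsgstr "hello"\n',
    "es.po": b'msgid "hello"\nmsgstr "es -- hello"\n',
})
# Assumption: archive_bytes is then sent as a file field in the single
# component-creation request instead of one upload request per language.
```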

using add instead of replace

Add performance will be much better in 4.5.2.

Ideally, every time there's a long-running task, it should be handled by celery

That would have to be opt-in to avoid breaking existing API usage.

@nijel nijel added the question This is more a question for the support than an issue. label Mar 18, 2021
@github-actions

This issue looks more like a support question than an issue. We strive to answer these reasonably fast, but purchasing the support subscription is not only more responsible and faster for your business but also makes Weblate stronger. In case your question is already answered, making a donation is the right way to say thank you!

@github-actions

This issue has been automatically marked as stale because there wasn’t any recent activity.

It will be closed soon if no further action occurs.

Thank you for your contributions!

@github-actions github-actions bot added the wontfix Nobody will work on this. label Mar 29, 2021
@github-actions github-actions bot closed this as completed Apr 2, 2021
@nijel nijel reopened this Jul 12, 2021
@nijel nijel self-assigned this Jul 12, 2021
@nijel nijel removed the wontfix Nobody will work on this. label Jul 12, 2021

@github-actions github-actions bot added the wontfix Nobody will work on this. label Jul 23, 2021
@nijel nijel added bug Something is broken. and removed question This is more a question for the support than an issue. wontfix Nobody will work on this. labels Jul 23, 2021
@nijel nijel added this to the 4.8 milestone Jul 23, 2021
@nijel nijel modified the milestones: 4.8, 4.8.1 Aug 21, 2021
@nijel nijel modified the milestones: 4.8.1, 4.9 Sep 10, 2021
@nijel
Member

nijel commented Nov 9, 2021

There are several changes which should improve the performance in 4.9. Still, it will run synchronously - the API change is something I'd like to avoid at this point.

@nijel nijel closed this as completed Nov 9, 2021
@github-actions

github-actions bot commented Nov 9, 2021

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.
