New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling task failures+retries during testing #57

Closed
rpkilby opened this Issue Apr 9, 2018 · 12 comments

Comments

3 participants
@rpkilby
Contributor

rpkilby commented Apr 9, 2018

As someone new to dramatiq, the retry middleware may cause task failures during testing to appear as if the test worker is hanging indefinitely or is otherwise failing to execute tasks. What actually happened:

  • Some of my application task code failed during testing, causing the retry middleware to kick in.
  • The retry middleware defaults to a 7-day maximum backoff.
  • pytest tends to capture stdout/stderr, and terminating the test run causes that output to be lost.

Given the lack of output and apparent hang, my initial reaction was confusion and to assume that I had somehow misconfigured the test worker. The reality was that everything was working correctly, and pytest was just eating the output.

The unit testing guide should probably mention that apparent hangs may be the result of the retry middleware, and that they should try temporarily setting max_retries=0 or consider disabling the retry middleware during testing.

@rpkilby

This comment has been minimized.

Contributor

rpkilby commented Apr 9, 2018

Additionally, would it make sense for the broker and worker .join() methods to have a timeout?

@Bogdanp Bogdanp self-assigned this Apr 10, 2018

@Bogdanp Bogdanp added the enhancement label Apr 10, 2018

@Bogdanp

This comment has been minimized.

Owner

Bogdanp commented Apr 10, 2018

I've run into this problem myself and been confused by it so it's definitely something that ought to get fixed. I think adding a timeout to join() (as you suggest) could be a first step toward improving this. In addition to that, it may make sense to have some kind of a mode where join raises task exceptions in the current thread (I think this is possible, though it might turn out a little ugly).

@Bogdanp Bogdanp closed this in d676c3a Apr 14, 2018

@nachopro

This comment has been minimized.

nachopro commented Jul 18, 2018

Excuseme againg @Bogdanp :)

I'm using PyTest (and I followed your example with a broker and a worker as fixtures) in a Django project and I not understand how to set the timeout properly (if it's the fix for this issue).

Setting an timeout in the worker.join and/or worker.join raises a django.db.utils.OperationalError: database table is locked exception.

Following that @rpkilby says, I tried to disable dramatiq.middleware.Retries from the settins.py that I use for testing, but raises a django.db.utils.OperationalError: database table is locked exception too.
Using Actor's decorator with max_retries=0 raises the same exception.

Thanks!

Here is my code:
conf.test

import dramatiq
import pytest


@pytest.fixture
def broker():
    broker = dramatiq.get_broker()
    broker.flush_all()
    return broker


@pytest.fixture
def worker(broker):
    worker = dramatiq.Worker(broker, worker_timeout=100)
    worker.start()
    yield worker
    worker.stop()

Some test_blabla.py

@pytest.mark.django_db
def test_optout(broker, worker):
    # A bunch of code (?)
    create_my_model_instance_task.send("Pepe")

    broker.join("default")
    worker.join()

    assert MyModel.objects.get(name="Pepe")
@rpkilby

This comment has been minimized.

Contributor

rpkilby commented Jul 18, 2018

Hi @nachopro - I believe you need to use @pytest.mark.django_db(transaction=True).

Based on the error, I assume you're using sqlite, which doesn't support concurrent writes (see: "High Concurrency"). What's happening is that your test case opens a database transaction, creating a lock and preventing other connections from writing. While the test case holds the transaction open, the task is sent and processed in the worker thread, attempting to issue a write with a different connection. Since the database is already locked by a separate connection, sqlite complains with an OperationalError.

By using transaction=True, this changes the test case to mirror django.test.TransactionTestCase, which counterintuitively doesn't wrap the test case in a transaction. Without the open transaction and database lock, the task/worker thread is free to write to the database.


@Bogdanp - please correct me if I got anything wrong here. My knowledge of this area is not that deep.

@Bogdanp

This comment has been minimized.

Owner

Bogdanp commented Jul 18, 2018

@rpkilby I think you're right. @nachopro please try @rpkilby's suggestion and let us know if that solves the issue for you.

@nachopro

This comment has been minimized.

nachopro commented Jul 18, 2018

Hi @rpkilby
Yes, I'm using SQLite. I tried with transaction=True but I have the same issue.

The real exception that my create_my_model_instance_task raises is django.db.utils.IntegrityError: NOT NULL constraint failed: ... (I saw it after removing .send from my source code).

Maybe I must test the task isolated (without .send) previous to integration test in order to use SQLite.
This is the first testcase that generates this kind of issue in my short life with Dramatiq.

I hope @Bogdanp has a good idea :) (Sorry if I sound rude or pedantic, English is't my native language).

Thanks @Bogdanp and @rpkilby

@nachopro

This comment has been minimized.

nachopro commented Jul 24, 2018

Hi @Bogdanp I wrote a sample project that expose the problem that I'm facing.
Thank you!

https://github.com/nachopro/test_failed

@Bogdanp

This comment has been minimized.

Owner

Bogdanp commented Jul 24, 2018

@nachopro cool! I'll try to take a look tonight.

@nachopro

This comment has been minimized.

nachopro commented Aug 5, 2018

Sorry @Bogdanp 😊
Any suggestion about this? Thanks a lot!

@Bogdanp

This comment has been minimized.

Owner

Bogdanp commented Aug 5, 2018

Apologies @nachopro, I could swear I made a PR to the project you linked a couple of weeks ago fixing the problem, but I don't see it anymore so I must've done something wrong.

If you change this line to

@pytest.mark.django_db(transaction=True)

(what @rpkilby was suggesting), everything will work fine.

@nachopro

This comment has been minimized.

nachopro commented Aug 5, 2018

Ohhh!!! So so sorry to @rpkilby and you too @Bogdanp
When I tried transaction = True (some weeks ago), I only applied it to the specific test that "broken" my suite and not made difference :goberserk: . But now I applied him to the whole tests and runs fine! 😊

Thanks!

@Bogdanp

This comment has been minimized.

Owner

Bogdanp commented Aug 5, 2018

@nachopro no worries! Glad to hear it's working for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment