New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear environment caches and recomputations upon failures #131

Merged
merged 1 commit into from Nov 6, 2015

Conversation

Projects
None yet
3 participants
@npiganeau
Contributor

npiganeau commented Oct 21, 2015

Two fixes here for v8.0:

Recompute fields before closing the cursor

Current behaviour
When there are fields to be recomputed inside the job function and that the latter crashes with a retryable OperationalError, nothing happens. The job remains marked 'started' and blocks the channel. On the log, we can see a Unable to use closed cursor exception after the OperationalError of the job.

Expected behaviour
The job should be set to pending according thanks to rety_postpone() function, and free the channel.

Explanation of the problem
If the job fails on a transaction where there are fields to recompute (in Environments.todo) then
these fields will never be recomputed because the cursor will be closed and this will lead to a "Unable to use close cursor exception on the next write operation (since write triggers the recomputation of fields).

Since the next write operation is the postponing of the job in retry_postpone(), the job remains marked as 'started' whereas it is actually crashed.

Proposed solution
The proposed solution is to clear the recompute Environments.todo list when an exception occurs.

Show outdated Hide outdated connector/controllers/main.py
raise

This comment has been minimized.

@guewen

guewen Oct 23, 2015

Member

This code is duplicated with the next except. Could you add an inner try: except: so that the raise for OperationalError which are not in PG_CONCURRENCY_ERRORS_TO_RETRY will be trapped by the global Exception except? As we'll have one more indentation level, it would be worth to extract the processing.

Something as follows:

try:
    try:
        return self._try_perform_job(session_hdl, job)
    except OperationalError as err:
        if err.pgcode not in PG_CONCURRENCY_ERRORS_TO_RETRY:
            raise
        retry_postpone(job, unicode(err), seconds=PG_RETRY)
        [...]
except RetryableJobError as err:
    [...]
except (FailedJobError, Exception):
    [...]
@guewen

guewen Oct 23, 2015

Member

This code is duplicated with the next except. Could you add an inner try: except: so that the raise for OperationalError which are not in PG_CONCURRENCY_ERRORS_TO_RETRY will be trapped by the global Exception except? As we'll have one more indentation level, it would be worth to extract the processing.

Something as follows:

try:
    try:
        return self._try_perform_job(session_hdl, job)
    except OperationalError as err:
        if err.pgcode not in PG_CONCURRENCY_ERRORS_TO_RETRY:
            raise
        retry_postpone(job, unicode(err), seconds=PG_RETRY)
        [...]
except RetryableJobError as err:
    [...]
except (FailedJobError, Exception):
    [...]
Show outdated Hide outdated connector/session.py
# Try to recompute fields before closing the cursor
self.env['ir.model'].recompute()
except:
self.rollback()

This comment has been minimized.

@guewen

guewen Oct 23, 2015

Member

I am not comfortable with this change, because I had great trouble to understand why it happens and how this fix resolves it. retry_postpone opens a new transaction and a new Environment (https://github.com/OCA/connector/blob/9.0/connector/session.py#L73), so I assumed that the todo added before the OperationalError would not leak into the new transaction. Happens I was wrong, because todo is shared by all environments (https://github.com/odoo/odoo/blob/9.0/openerp/api.py#L852) which is rather unexpected.

However, this fix seems a bit "radical" to me, shouldn't we prevent this to happen at the first place and clear the Environment (Environment.clear() https://github.com/odoo/odoo/blob/9.0/openerp/api.py#L821-L826) only at the moment when we catch the OperationalError?

@guewen

guewen Oct 23, 2015

Member

I am not comfortable with this change, because I had great trouble to understand why it happens and how this fix resolves it. retry_postpone opens a new transaction and a new Environment (https://github.com/OCA/connector/blob/9.0/connector/session.py#L73), so I assumed that the todo added before the OperationalError would not leak into the new transaction. Happens I was wrong, because todo is shared by all environments (https://github.com/odoo/odoo/blob/9.0/openerp/api.py#L852) which is rather unexpected.

However, this fix seems a bit "radical" to me, shouldn't we prevent this to happen at the first place and clear the Environment (Environment.clear() https://github.com/odoo/odoo/blob/9.0/openerp/api.py#L821-L826) only at the moment when we catch the OperationalError?

This comment has been minimized.

@npiganeau

npiganeau Oct 23, 2015

Contributor

This means we would remove the todo instead of calculating them. It should come to the same result since they are stored fields, and is probably cleaner than trying to calculate them.

But shouldn't we keep this cleaning in ConnectorSession.close() in case we get the error from somewhere else ?

Or maybe we could modify ConnectorSessionHandler.session() by using clear_upon_failure this way ?

        with openerp.api.Environment.manage():
            db = openerp.sql_db.db_connect(self.db_name)
            session = ConnectorSession(db.cursor(),
                                       self.uid,
                                       context=self.context)
            try:
                RegistryManager.check_registry_signaling(self.db_name)
                with session.clear_upon_failure():
                    yield session
                RegistryManager.signal_caches_change(self.db_name)
            except:
@npiganeau

npiganeau Oct 23, 2015

Contributor

This means we would remove the todo instead of calculating them. It should come to the same result since they are stored fields, and is probably cleaner than trying to calculate them.

But shouldn't we keep this cleaning in ConnectorSession.close() in case we get the error from somewhere else ?

Or maybe we could modify ConnectorSessionHandler.session() by using clear_upon_failure this way ?

        with openerp.api.Environment.manage():
            db = openerp.sql_db.db_connect(self.db_name)
            session = ConnectorSession(db.cursor(),
                                       self.uid,
                                       context=self.context)
            try:
                RegistryManager.check_registry_signaling(self.db_name)
                with session.clear_upon_failure():
                    yield session
                RegistryManager.signal_caches_change(self.db_name)
            except:

This comment has been minimized.

@guewen

guewen Oct 23, 2015

Member

This means we would remove the todo instead of calculating them. It should come to the same result since they are stored fields, and is probably cleaner than trying to calculate them.

Yes and calculating them seems useless if anyway the transaction is broken.

But shouldn't we keep this cleaning in ConnectorSession.close() in case we get the error from somewhere else ?
Or maybe we could modify ConnectorSessionHandler.session() by using clear_upon_failure this way ?

I would prefer if we could first prove that it could happen somewhere else and then fix this other place.

Also ConnectorSession is quite redundant with Odoo's Environment and in the future we could start to remove ConnectorSession in favor of openerp.Environment so we should try not to add behavioural changes that we wouldn't know how to replace.

@guewen

guewen Oct 23, 2015

Member

This means we would remove the todo instead of calculating them. It should come to the same result since they are stored fields, and is probably cleaner than trying to calculate them.

Yes and calculating them seems useless if anyway the transaction is broken.

But shouldn't we keep this cleaning in ConnectorSession.close() in case we get the error from somewhere else ?
Or maybe we could modify ConnectorSessionHandler.session() by using clear_upon_failure this way ?

I would prefer if we could first prove that it could happen somewhere else and then fix this other place.

Also ConnectorSession is quite redundant with Odoo's Environment and in the future we could start to remove ConnectorSession in favor of openerp.Environment so we should try not to add behavioural changes that we wouldn't know how to replace.

Show outdated Hide outdated connector/controllers/main.py
if err.pgcode not in PG_CONCURRENCY_ERRORS_TO_RETRY:
raise
session.env.clear()

This comment has been minimized.

@guewen

guewen Oct 23, 2015

Member

One last thing, could you add a comment explaining why we need to clear the env?

Thanks already for the changes!

@guewen

guewen Oct 23, 2015

Member

One last thing, could you add a comment explaining why we need to clear the env?

Thanks already for the changes!

This comment has been minimized.

@guewen

guewen Oct 23, 2015

Member

I was able to reproduce this issue after this correction. I will investigate on Monday. BTW we can wonder if it should not be fixed in openerp itself.

@guewen

guewen Oct 23, 2015

Member

I was able to reproduce this issue after this correction. I will investigate on Monday. BTW we can wonder if it should not be fixed in openerp itself.

This comment has been minimized.

@guewen

guewen Oct 23, 2015

Member

Note: the session variable here is surely closed as it is used outside its context manager. Not sure it is a problem though. Maybe we should put the OperationalError except inside of a session_hdl.session context manager and pass the session to try_perform_job instead of the handler, thus we would act on the correct env. I can try that Monday.

@guewen

guewen Oct 23, 2015

Member

Note: the session variable here is surely closed as it is used outside its context manager. Not sure it is a problem though. Maybe we should put the OperationalError except inside of a session_hdl.session context manager and pass the session to try_perform_job instead of the handler, thus we would act on the correct env. I can try that Monday.

This comment has been minimized.

@npiganeau

npiganeau Oct 23, 2015

Contributor

Actually, this code might have undesirable side effects.

Indeed, if we have another env (completely outside the connector) which has fields to recalculate at the same time that our job is failing, then the fields for that env will be cleared too since the todos are shared between envs.

I wonder what is the correct way to handle this in Odoo ...

@npiganeau

npiganeau Oct 23, 2015

Contributor

Actually, this code might have undesirable side effects.

Indeed, if we have another env (completely outside the connector) which has fields to recalculate at the same time that our job is failing, then the fields for that env will be cleared too since the todos are shared between envs.

I wonder what is the correct way to handle this in Odoo ...

This comment has been minimized.

@guewen

guewen Oct 26, 2015

Member

Actually, this code might have undesirable side effects.

Indeed, if we have another env (completely outside the connector) which has fields to recalculate at the same time that our job is failing, then the fields for that env will be cleared too since the todos are shared between envs.

This controller is the entry point of the request. So we should not have other environments than the ones we created here for the handling of the job or for its execution.

@guewen

guewen Oct 26, 2015

Member

Actually, this code might have undesirable side effects.

Indeed, if we have another env (completely outside the connector) which has fields to recalculate at the same time that our job is failing, then the fields for that env will be cleared too since the todos are shared between envs.

This controller is the entry point of the request. So we should not have other environments than the ones we created here for the handling of the job or for its execution.

This comment has been minimized.

@guewen

guewen Oct 26, 2015

Member

Thinking again, I think your proposition to use clear_upon_failure in ConnectorSessionHandler is better and should cover all places where this error can occur.

    @contextmanager
    def session(self):
        """ Context Manager: start a new session and ensure that the
        session's cursor is:

        * rollbacked on errors
        * commited at the end of the ``with`` context when no error occured
        * always closed at the end of the ``with`` context
        * it handles the registry signaling
        """
        with openerp.api.Environment.manage():
            db = openerp.sql_db.db_connect(self.db_name)
            session = ConnectorSession(db.cursor(),
                                       self.uid,
                                       context=self.context)

            try:
                with session.env.clear_upon_failure():
                    RegistryManager.check_registry_signaling(self.db_name)
                    yield session
                    RegistryManager.signal_caches_change(self.db_name)
            except:
                session.rollback()
                raise
            else:
                session.commit()
            finally:
                session.close()

@guewen

guewen Oct 26, 2015

Member

Thinking again, I think your proposition to use clear_upon_failure in ConnectorSessionHandler is better and should cover all places where this error can occur.

    @contextmanager
    def session(self):
        """ Context Manager: start a new session and ensure that the
        session's cursor is:

        * rollbacked on errors
        * commited at the end of the ``with`` context when no error occured
        * always closed at the end of the ``with`` context
        * it handles the registry signaling
        """
        with openerp.api.Environment.manage():
            db = openerp.sql_db.db_connect(self.db_name)
            session = ConnectorSession(db.cursor(),
                                       self.uid,
                                       context=self.context)

            try:
                with session.env.clear_upon_failure():
                    RegistryManager.check_registry_signaling(self.db_name)
                    yield session
                    RegistryManager.signal_caches_change(self.db_name)
            except:
                session.rollback()
                raise
            else:
                session.commit()
            finally:
                session.close()

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Oct 26, 2015

Member

I opened odoo/odoo#9236 for the first part of the bug.

@npiganeau if you don't mind to split the Mark the job as failed after non-recoverable OperationalError part in another PR, we could get it merged sooner as it's ok for me.

Member

guewen commented Oct 26, 2015

I opened odoo/odoo#9236 for the first part of the bug.

@npiganeau if you don't mind to split the Mark the job as failed after non-recoverable OperationalError part in another PR, we could get it merged sooner as it's ok for me.

npiganeau added a commit to npiganeau/connector that referenced this pull request Oct 26, 2015

npiganeau added a commit to npiganeau/connector that referenced this pull request Oct 26, 2015

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Oct 26, 2015

Member

Does the last correction works for you? Could you remove the commits of #132 and squash?

Thanks for your great work on this PR and #132!

Member

guewen commented Oct 26, 2015

Does the last correction works for you? Could you remove the commits of #132 and squash?

Thanks for your great work on this PR and #132!

@npiganeau

This comment has been minimized.

Show comment
Hide comment
@npiganeau

npiganeau Oct 26, 2015

Contributor

Here you go. It works for me.

Contributor

npiganeau commented Oct 26, 2015

Here you go. It works for me.

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Oct 27, 2015

Member

#132 is merged, could you remove the changes in connector/controllers/main.py which are done in #132?

Member

guewen commented Oct 27, 2015

#132 is merged, could you remove the changes in connector/controllers/main.py which are done in #132?

@guewen guewen changed the title from [8.0] Fixes on job error catching to Clear environment caches and recomputations upon failures Oct 28, 2015

@guewen

This comment has been minimized.

Show comment
Hide comment
@guewen

guewen Nov 6, 2015

Member

Works fine, thanks a lot

👍

Member

guewen commented Nov 6, 2015

Works fine, thanks a lot

👍

@lmignon

This comment has been minimized.

Show comment
Hide comment
@lmignon

lmignon Nov 6, 2015

Contributor

I've not personally tested the changes but followed your discussions and all these changes makes sens... 👍

Contributor

lmignon commented Nov 6, 2015

I've not personally tested the changes but followed your discussions and all these changes makes sens... 👍

guewen added a commit that referenced this pull request Nov 6, 2015

Merge pull request #131 from npiganeau/8.0-npiganeau-patch-1
Clear environment caches and recomputations upon failures

@guewen guewen merged commit 10015f8 into OCA:8.0 Nov 6, 2015

2 checks passed

ci/runbot runbot build 3118404-131-cf2b6b (runtime 30s)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment