Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with Jobs Queue #148

Closed
aignermi opened this issue Sep 17, 2015 · 29 comments
Closed

Problem with Jobs Queue #148

aignermi opened this issue Sep 17, 2015 · 29 comments

Comments

@aignermi
Copy link

hi,

we went live yesterday with our odoo setup. the whole sync from magento worked till we came to the sales-order part. 1200 sales orders were imported fine, but then the jobs queue started to make problems. i have now around 200 jobs in the queue which are all in state "Started" but are not running. We have 5 workers running and when i look in the odoo-connector.log i see the following error message:

2015-09-17 08:54:22,125 31162 ERROR odoo-production openerp.sql_db: bad query: UPDATE "queue_job" SET "job_function_id"=7,"channel"='root.magento' WHERE id IN (8267)
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/openerp/sql_db.py", line 234, in execute
res = self._obj.execute(query, params)
TransactionRollbackError: could not serialize access due to concurrent update
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."queue_worker" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"

Can I pause all jobs and restart each one by one to find the issue or what is the best way to solve this?

thanks,
michael

@guewen
Copy link
Member

guewen commented Sep 17, 2015

Hi,

The concurrent update in the logs should not be an issue.
However, having 200 jobs started is all but normal.
Which revision of connector and connector-magento do you use?

By the way, if you don't yet, I recommend you to use the new job runner instead of the jobs workers.

As a quick fix, you can restart the connector workers and requeue all the started jobs.

@aignermi
Copy link
Author

hi,

thanks for the quick reply. we use connector 8.0.3.1.0 and connector-magento 8.0.3.0.0.

ok i'll definitely have a look at the job runner.

i tried restarting the job connector workers and reduced the workers to 1 and requeued all jobs and now the one active worker has 200 started jobs. something seems really wrong here.

and i just found another error message in the odoo-server.log:

2015-09-17 09:35:10,223 31367 ERROR odoo-production openerp.http: Exception during JSON request handling.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 537, in _handle_exception
return super(JsonRequest, self)._handle_exception(exception)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 574, in dispatch
result = self._call_function(*_self.params)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 310, in _call_function
return checked_call(self.db, *args, *_kwargs)
File "/usr/lib/python2.7/dist-packages/openerp/service/model.py", line 113, in wrapper
return f(dbname, _args, *_kwargs)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 307, in checked_call
return self.endpoint(_a, *_kw)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 803, in call
return self.method(_args, *_kw)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 403, in response_wrap
response = f(_args, *_kw)
File "/usr/lib/python2.7/dist-packages/openerp/addons/bus/bus.py", line 188, in poll
raise Exception("bus.Bus unavailable")
Exception: bus.Bus unavailable

@guewen
Copy link
Member

guewen commented Sep 17, 2015

the bus.Bus unavailable error is unrelated and comes from the bus addon, search on help.odoo.com for that.

Never saw such a thing. Could you activate the debug logs and show the logs of a few jobs?

@aignermi
Copy link
Author

sorry for my late reply, i was traveling the last days. could you please tell me how to activate the debug logs for the connector and where i can find the log file for it?

@guewen
Copy link
Member

guewen commented Sep 24, 2015

You should start your server with the option --log-level=debug or change this option in openerp.cfg.
You could also set log_handler = :INFO,openerp.addons.connector:DEBUG in openerp.cfg
The log file for the workers depends of the way you start it, I suggest you to provide a path in the --log-file.

@guewen
Copy link
Member

guewen commented Sep 24, 2015

(or just start to use the job runner and don't lose time on that)

@aignermi
Copy link
Author

ok you convinced me, i just tried setting up the job runner and i guess i have some questions to the setup. here is what i did, please correct me if i'm doing something wrong:

  1. add the following to the openerp-server.conf:
    workers = 5
    load = web,connector

  2. add the following to the /etc/init.d/odoo script:
    ODOO_CONNECTOR_CHANNELS=root:5
    ODOO_CONNECTOR_PORT=8069

  3. restart odoo

  4. disable the enqueue jobs scheduler process in odoo

but how are the still open jobs now started? is the job runner automatically picking them up? I enabled the debug log option but I don't see anything the in log file regarding the job runner.

thanks for your help!

@guewen
Copy link
Member

guewen commented Sep 24, 2015

If you could copy-paste your /etc/init.d/odoo in something like hastebin.com it could help.

AFAIK load is not read from the configuration file so you should add --load=web,connector on start command line.

ODOO_CONNECTOR_CHANNELS=root:5
ODOO_CONNECTOR_PORT=8069

Look like you declared them as local variables but they should be set as environment variables either with export either by prefixing the start command with them.

@aignermi
Copy link
Author

Thanks, I guess I'm at the next step right now. At least I see something in the log now:

2015-09-24 12:48:03,325 8151 INFO ? openerp.addons.connector.jobrunner: starting jobrunner thread (in prefork server)
2015-09-24 12:48:03,326 8151 DEBUG ? openerp.service.server: Multiprocess starting
2015-09-24 12:48:03,331 8159 INFO ? openerp.service.server: Worker WorkerHTTP (8159) alive
2015-09-24 12:48:03,332 8158 INFO ? openerp.service.server: Worker WorkerHTTP (8158) alive
2015-09-24 12:48:03,334 8157 INFO ? openerp.service.server: Worker WorkerHTTP (8157) alive
2015-09-24 12:48:03,341 8156 INFO ? openerp.service.server: Worker WorkerHTTP (8156) alive
2015-09-24 12:48:03,343 8155 INFO ? openerp.service.server: Worker WorkerHTTP (8155) alive
2015-09-24 12:48:03,347 8162 INFO ? openerp.service.server: Worker WorkerCron (8162) alive
2015-09-24 12:48:03,348 8161 INFO ? openerp.service.server: Worker WorkerCron (8161) alive
...
2015-09-24 12:48:08,335 8151 INFO ? openerp.addons.connector.jobrunner.runner: starting
2015-09-24 12:48:08,335 8151 INFO ? openerp.addons.connector.jobrunner.runner: initializing database connections
...
2015-09-24 12:48:08,945 8151 WARNING ? openerp.addons.connector.jobrunner.channels: unknown channel root.magento, using root channel for job d15339cd-7d6e-492b-aa3a-ff4798d78a6e
2015-09-24 12:48:08,945 8151 DEBUG ? openerp.addons.connector.jobrunner.channels: job d15339cd-7d6e-492b-aa3a-ff4798d78a6e marked failed in channel root(C:5,Q:0,R:0,F:1)
2015-09-24 12:48:08,945 8151 WARNING ? openerp.addons.connector.jobrunner.channels: unknown channel root.magento, using root channel for job c30b7189-8c18-4b07-a298-12fd547e63e5
2015-09-24 12:48:08,945 8151 DEBUG ? openerp.addons.connector.jobrunner.channels: job c30b7189-8c18-4b07-a298-12fd547e63e5 marked failed in channel root(C:5,Q:0,R:0,F:2)
2015-09-24 12:48:08,945 8151 WARNING ? openerp.addons.connector.jobrunner.channels: unknown channel root.magento, using root channel for job cbcb1057-d147-4f12-b75a-9c0267bb4891
2015-09-24 12:48:08,945 8151 DEBUG ? openerp.addons.connector.jobrunner.channels: job cbcb1057-d147-4f12-b75a-9c0267bb4891 marked failed in channel root(C:5,Q:0,R:0,F:3)
... (after I requeued the last failed job I get this message in the log)
2015-09-24 12:55:15,958 8151 WARNING ? openerp.addons.connector.jobrunner.channels: unknown channel root.magento, using root channel for job cbcb1057-d147-4f12-b75a-9c0267bb4891
2015-09-24 12:55:15,958 8151 DEBUG ? openerp.addons.connector.jobrunner.channels: job cbcb1057-d147-4f12-b75a-9c0267bb4891 marked pending in channel root(C:5,Q:169,R:1130,F:121)

But still, no processes are really worked on. I also still see one Worker (which is not alive) and has a lot of Jobs attached. Can it be that this is causing a problem?

@guewen
Copy link
Member

guewen commented Sep 24, 2015

The workers become useless with the job runner even if it shows one, just ignore it.

From what I see, the jobs are correctly executed, but they seem to all be failing. You should check why jobs are failing on the form view of the jobs.

What do you mean by?

But still, no processes are really worked on.

Note aside: you should either put more Odoo workers, either put less places in the job runner than 5 (5 is too much for Magento IMO), otherwise the job runner can use the 5 Odoo workers and it will struggle to answer to HTTP requests.

@aignermi
Copy link
Author

What I mean by "no process is really worked on" is the following: I had a failed process, which I requeued and this is the log output:

2015-09-24 12:48:08,945 8151 WARNING ? openerp.addons.connector.jobrunner.channels: unknown channel root.magento, using root channel for job cbcb1057-d147-4f12-b75a-9c0267bb4891
2015-09-24 12:48:08,945 8151 DEBUG ? openerp.addons.connector.jobrunner.channels: job cbcb1057-d147-4f12-b75a-9c0267bb4891 marked failed in channel root(C:5,Q:0,R:0,F:3)

... (after I requeued this job I get this message in the log)

2015-09-24 12:55:15,958 8151 WARNING ? openerp.addons.connector.jobrunner.channels: unknown channel root.magento, using root channel for job cbcb1057-d147-4f12-b75a-9c0267bb4891
2015-09-24 12:55:15,958 8151 DEBUG ? openerp.addons.connector.jobrunner.channels: job cbcb1057-d147-4f12-b75a-9c0267bb4891 marked pending in channel root(C:5,Q:169,R:1130,F:121)

and the job is still not running and in status "Pending" (and no other job is worked on too).

@aignermi
Copy link
Author

Ok, well it seems that the jobrunner is executing jobs, but I found a new error message in the server-log:

2015-09-24 20:55:00,616 21644 DEBUG XXX openerp.modules.registry: Multiprocess signaling check: [Registry - old# 2 new# 2] [Cache - old# 3671 new# 3671]
2015-09-24 20:55:00,621 21644 INFO XXX werkzeug: 127.0.0.1 - - [24/Sep/2015 20:55:00] "GET /connector/runjob?db=XXX&job_uuid=59126664-fdf6-442c-bf87-19e01ea8e59f HTTP/1.1" 500 -
2015-09-24 20:55:00,621 21637 DEBUG ? urllib3.connectionpool: "GET /connector/runjob?db=XXX&job_uuid=59126664-fdf6-442c-bf87-19e01ea8e59f HTTP/1.1" 500 291
2015-09-24 20:55:00,628 21644 ERROR XXX werkzeug: Error on request:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/werkzeug/serving.py", line 177, in run_wsgi
execute(self.server.app)
File "/usr/lib/python2.7/dist-packages/werkzeug/serving.py", line 165, in execute
application_iter = app(environ, start_response)
File "/usr/lib/python2.7/dist-packages/openerp/service/wsgi_server.py", line 216, in application
return application_unproxied(environ, start_response)
File "/usr/lib/python2.7/dist-packages/openerp/service/wsgi_server.py", line 202, in application_unproxied
result = handler(environ, start_response)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 1290, in call
return self.dispatch(environ, start_response)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 1264, in call
return self.app(environ, start_wrapped)
File "/usr/lib/python2.7/dist-packages/werkzeug/wsgi.py", line 579, in call
return self.app(environ, start_response)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 1435, in dispatch
result = ir_http._dispatch()
File "/usr/lib/python2.7/dist-packages/openerp/addons/crm/ir_http.py", line 13, in _dispatch
response = super(ir_http, self)._dispatch()
File "/usr/lib/python2.7/dist-packages/openerp/addons/base/ir/ir_http.py", line 177, in _dispatch
return self._handle_exception(e)
File "/usr/lib/python2.7/dist-packages/openerp/addons/base/ir/ir_http.py", line 147, in _handle_exception
return request._handle_exception(exception)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 666, in _handle_exception
return super(HttpRequest, self)._handle_exception(exception)
File "/usr/lib/python2.7/dist-packages/openerp/addons/base/ir/ir_http.py", line 173, in _dispatch
result = request.dispatch()
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 684, in dispatch
r = self._call_function(*_self.params)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 310, in _call_function
return checked_call(self.db, *args, *_kwargs)
File "/usr/lib/python2.7/dist-packages/openerp/service/model.py", line 113, in wrapper
return f(dbname, _args, *_kwargs)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 307, in checked_call
return self.endpoint(_a, *_kw)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 803, in call
return self.method(_args, *_kw)
File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 403, in response_wrap
response = f(_args, *_kw)
File "/opt/openerp-custom/addons/connector/connector/controllers/main.py", line 97, in runjob
retry_postpone(job, unicode(err))
File "/opt/openerp-custom/addons/connector/connector/controllers/main.py", line 53, in retry_postpone
job.postpone(result=message, seconds=seconds)
File "/opt/openerp-custom/addons/connector/connector/queue/job.py", line 592, in postpone
self.eta = timedelta(seconds=eta_seconds)
TypeError: unsupported type for timedelta seconds component: NoneType

Do you have an idea whats going on here?

@guewen
Copy link
Member

guewen commented Sep 25, 2015

Seems same than OCA/connector#102. It has been fixed in the release 3.2.0 of OCA/connector (OCA/connector#103)

@aignermi
Copy link
Author

Perfect, that worked!

But I now have a problem syncing products:

Traceback (most recent call last):
File "/opt/openerp-custom/addons/connector/connector/controllers/main.py", line 81, in runjob
job.perform(session)
File "/opt/openerp-custom/addons/connector/connector/queue/job.py", line 467, in perform
self.result = self.func(session, _self.args, *_self.kwargs)
File "/opt/openerp-custom/addons/connector-magento/magentoerpconnect/unit/import_synchronizer.py", line 379, in import_record
importer.run(magento_id, force=force)
File "/opt/openerp-custom/addons/connector-magento/magentoerpconnect/unit/import_synchronizer.py", line 225, in run
self._after_import(binding)
File "/opt/openerp-custom/addons/connector-magento/magentoerpconnect/product.py", line 584, in _after_import
image_importer.run(self.magento_id, binding.id)
File "/opt/openerp-custom/addons/connector-magento/magentoerpconnect/product.py", line 358, in run
binary = self._get_binary_image(images.pop())
File "/opt/openerp-custom/addons/connector-magento/magentoerpconnect/product.py", line 339, in _get_binary_image
binary = urllib2.urlopen(request)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(_args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(_args)
File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error

@guewen
Copy link
Member

guewen commented Sep 25, 2015

Seems to be an error returned by Magento when Odoo tries to download an image.

@aignermi
Copy link
Author

Well thank you very much for your support, the jobrunner is working like a charm now!

For everyone who does not directly understand the howto, here are the steps that it took me to migrate to the jobrunner:

  1. change workers setting in openerp-server.conf:
    my setting is now:
workers = 10
  1. change the startup script in /etc/init.d/odoo:
    add (choose whatever count fits best for you):
export ODOO_CONNECTOR_CHANNELS=root:3
export ODOO_CONNECTOR_PORT=8069

and change the start-stop-daemon to:

start-stop-daemon --start --quiet --pidfile $PIDFILE --chuid $USER:$USER --background --make-pidfile --exec $DAEMON -- --config $CONFIG --logfile $LOGFILE --load=web,connector
  1. disable "Enqueue Jobs" cron:
    go to Settings->Automation->Scheduled Actions and deactivate Enqueue Jobs

  2. update to 3.2.0 of OCA/connector

@guewen
Copy link
Member

guewen commented Sep 25, 2015

Great! Thanks for reporting back what you did.

@guewen guewen closed this as completed Sep 25, 2015
@l-arnold
Copy link

What is the "Add" declaration about: (as in)
add (choose whatever count fits best for you):

Example might be (5) or (50 ?)

@guewen
Copy link
Member

guewen commented Sep 25, 2015

@l-arnold you may find what you are looking for on http://odoo-connector.com/guides/jobrunner.html

@l-arnold
Copy link

Thx @guewen for the context. JobRunner link says "obviously not for production purposes". Is that still the case?

@l-arnold
Copy link

My links above had to do with some "renaming" that had been done in a project and Michael's outline made a good outline of the different compents that are "interlaced" to get the server running properly.

Add to that my test Connector stopped working the other day and this seemed potentially a path towards a fix.

@sbidoul
Copy link
Member

sbidoul commented Sep 25, 2015

Thx @guewen for the context. JobRunner link says "obviously not for production purposes". Is that still the case?

The rendering of the documentation page is not optimal. What is not for production purpose is "running with the threaded server". The jobrunner is perfectly fit for production.

@l-arnold
Copy link

Thanks @sbidoul Testing this on my side.

I've gotten a generally clean start now. I had to remove (from Jobrunner Page) "–workers > 1" which Michael also does not declare but does declare in openerp-server.conf. (workers = 4 in my testing)

The problem I have is that I have 2 "requeued" jobs now, but they do not ever start. I will go back to see if any other "addons" need to be set for jobrunner. Using a Setup likely from July, but with current Git Clones.

It is possible that a "requeue" is not possible as it had been created "pre jobrunner setup". I just created a NEW JOB however and getting the same result.

Likely need to look at the logs. Any other ways to check JobRunner's functioning?

UPDATE
I can see "attempts" being made, as "Execute only After" xyz time, will move back 10 Minutes at a Step.

@sbidoul
Copy link
Member

sbidoul commented Sep 25, 2015

I've gotten a generally clean start now. I had to remove (from Jobrunner Page) "–workers > 1"
which Michael also does not declare but does declare in openerp-server.conf.
(workers = 4 in my testing)

Hm. Indeed --workers > 1 is not a valid command line option for Odoo. The doc just means you must use a value greated than 1.

Likely need to look at the logs. Any other ways to check JobRunner's functioning?

Looking at the log with the debug option mentioned by @guewen above is the way to go.

A frequent issue is when Odoo is forced-shutdown. In such a situation, running jobs are killed while the runner has no way to notice it, so they stay in the started state when they are actually not running.
Such jobs must be requeued.

@l-arnold
Copy link

Not really getting to "started". Just "enqueued".

I assume there were no new dependencies that needed to be installed for JobRunner to function. Is Jobrunner a component of the Connector or a component of the Odoo Base?

My current setup has obscured Logs so that will take me a bit to provision.

@l-arnold
Copy link

Previously I had not had the following as part of my .init script.
--load=web,connector
Can I put this rather in my openerp-server.conf file as workers is also loaded?
(UPDATE) I believe these have been loading all along as part of my AddOns. Not seeing if these are "jobrunner" specific declarations.

@sbidoul
Copy link
Member

sbidoul commented Sep 25, 2015

@l-arnold I've never tried to put --load=web,connector in the config file.

Jobs that stay enqueued after a forced-shutdown must be requeued too.

@l-arnold
Copy link

@sbidoul A: I will work on some soft Shutdowns.
the Jobs actually say "pending" rather than enqued (will work on that font trick of yours).

Getting some different response now. Looks like I have either a Protocol Error (can I run over https ?) or a XMLRPC Error

(when set to http)
ProtocolError: ProtocolError for www.mydomain.com/index.php/api/xmlrpc: 404 Not Found

(when set to https)
A network error caused the failure of the job: [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:581)

@l-arnold
Copy link

@sbidoul (Hadn't looked at the Magento Server).. In ouch mode. Restarting. Most probably the issue, though need to see what is underlying that.

Your Guidance on those details will be helpful moving forward. The jobrunner is much much faster to my view. I will explore other Updates that may have come from the code.

Seems to have solved it. Everything worked through except one order showed up with a New Payment Option (need to fix). So we are getting there.

Resolution:

  • Go to 'Sales > Configuration > Sales > Customer Payment Method
  • Create a new Payment Method with name 'paypal_express'
    -Eventually link the Payment Method to an existing Workflow Process or create a new one.

Shouldn't this be able to run over HTTPS ?

UPDATE: I need to update my Server Cert on Magento, thus I am running the connector for test purposes over HTTP. I expect the error over HTTPS is generated by CERT error. Not completely clear however.

Kudos to all for a remarkable system.

jcoux pushed a commit to camptocamp/connector-magento that referenced this issue May 14, 2019
[BIZ-2512] fix file parameter value
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants