Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataPusher raises exception when updating resource #5597

Closed
vitorbaptista opened this issue Sep 11, 2020 · 13 comments · Fixed by #6137
Closed

DataPusher raises exception when updating resource #5597

vitorbaptista opened this issue Sep 11, 2020 · 13 comments · Fixed by #6137
Assignees

Comments

@vitorbaptista
Copy link
Contributor

CKAN version: 2.9 (commit f2c92c3)
DataPusher version: 45be714c

Describe the bug

Everything runs fine when I create a new dataset resource. However, when I try to update the data on that resource, I get an Internal server error. The logs say:

2020-09-10 22:01:05,018 DEBUG [ckanext.datapusher.plugin] Submitting resource a1b7fa8e-6d34-49b2-9771-ac5651e6c5a7 to DataPusher
2020-09-10 22:01:05,062 DEBUG [ckanext.datapusher.plugin] Skipping DataPusher submission for resource a1b7fa8e-6d34-49b2-9771-ac5651e6c5a7
2020-09-10 22:01:05,228 ERROR [ckan.config.middleware.flask_app] 'NoneType' object has no attribute 'transaction'
Traceback (most recent call last):
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/flask/views.py", line 163, in dispatch_request
    return meth(*args, **kwargs)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/ckan-2.10.0a0-py3.7.egg/ckan/config/middleware/../../views/resource.py", line 367, in post
    get_action(u'resource_update')(context, data)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/ckan-2.10.0a0-py3.7.egg/ckan/logic/__init__.py", line 473, in wrapped
    result = _action(context, data_dict, **kw)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/ckan-2.10.0a0-py3.7.egg/ckan/logic/action/update.py", line 104, in resource_update
    updated_pkg_dict = _get_action('package_update')(context, pkg_dict)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/ckan-2.10.0a0-py3.7.egg/ckan/logic/__init__.py", line 473, in wrapped
    result = _action(context, data_dict, **kw)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/ckan-2.10.0a0-py3.7.egg/ckan/logic/action/update.py", line 344, in package_update
    model.repo.commit()
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/sqlalchemy/orm/scoping.py", line 163, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
    self.transaction.commit()
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 504, in commit
    self._prepare_impl()
  File "/home/vitor/Projetos/okfn/ckan/env/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 474, in _prepare_impl
    stx = self.session.transaction
AttributeError: 'NoneType' object has no attribute 'transaction'

Steps to reproduce

  1. Create a new dataset
  2. Add a CSV resource to the dataset
  3. Modify the resource, removing and submitting the same CSV

Expected behavior

The resource will be updated, with the new data being submitted to the DataStore.

Additional details

This is the same error described in #5173, #5137, and #4751. However, while these issues talk about creating a new resource, this error only happens when updating a resource.

@elrokket
Copy link

elrokket commented Oct 9, 2020

I have the same error, when update dataset with associated resources return this error 2020-10-09 11:27:30,649 ERROR [ckan.views.api] 'NoneType' object has no attribute 'transaction'

any news about this issue? thanks

@jodiegardiner
Copy link

jodiegardiner commented Oct 14, 2020

can confirm we're having the same issue with resource_update via API. We get a 500 status code (but nevertheless, the resource is successfully updated).

2020-10-13 13:33:55,455 ERROR [ckan.views.api] 'NoneType' object has no attribute 'transaction'
Traceback (most recent call last):

File "/usr/lib/ckan/default/src/ckan/ckan/views/api.py", line 291, in action
  result = function(context, request_data)

File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 473, in wrapped
  result = _action(context, data_dict, **kw)

File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/update.py", line 104, in resource_update
  updated_pkg_dict = _get_action('package_update')(context, pkg_dict)

File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 473, in wrapped
  result = _action(context, data_dict, **kw)

File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/update.py", line 344, in package_update
  model.repo.commit()

File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 162, in do
  return getattr(self.registry(), name)(*args, **kwargs)

File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1027, in commit
  self.transaction.commit()

File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 494, in commit
  self._prepare_impl()

File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 464, in _prepare_impl
  stx = self.session.transaction

AttributeError: 'NoneType' object has no attribute 'transaction'

2020-10-13 13:33:55,484 INFO  [ckan.config.middleware.flask_app]  /api/3/action/resource_update render time 1.153 seconds

[pid: 31928|app: 0|req: 6100/54849] 84.203.62.62 () {52 vars in 949 bytes} [Tue Oct 13 13:33:54 2020] POST /api/3/action/resource_update => generated 189 bytes in 1204 msecs (HTTP/1.1 500) 3 headers in 156 bytes (1 switches on core 0)

@rohillion
Copy link

rohillion commented Oct 16, 2020

Any clues of when this will be fixed? We are getting a 500 error after uploading a new CSV to an existing resource.

@jonathansberry
Copy link

jonathansberry commented Oct 30, 2020

I've spent a while looking in to this... As @amercader has already laid out in the PR: #5173

  • the notify method of the datapusher plugin is calling...
  • the _submit_to_datapusher method of the datapusher plugin calling...
  • the datapusher_submit action calling the...
  • update_task_status action which...
  • gets the main session, makes some changes, commits then closes the session

I noticed that we we were closing the session here. However, removing the close statement didn't fix the issue. So I tried using a new session with ckan.model.meta.create_local_session() instead of ckan.model.Session and that appeared to fix the issue. I then started going back through the git history and found this commit showing that in CKAN 2.8 when this feature was stable, we did indeed create a new session back then (which is noticeable as everywhere else we are just using model.Session):

ca4464a

Maybe @smotornyuk could comment on the intent of the above commit and how it checks out against this issue? Whilst I acknowledge that in CKAN we generally shouldn't create new DB sessions, I'm hypothesising that a new session was created here for the purpose of escaping this very bug, but that @smotornyuk wasn't aware of this when making his commit "Fix legacy tests" and it hasn't been picked up by an integration test. Though I could well be wrong and no doubt others will understand better how this commit impacts the rest of the project...

For anyone else (@rohillion, @jodiegardiner, @elrokket) as a stop-gap measure whilst we wait for a solution in a stable core ckan release, we have overridden the 'task_status_update' action from one of our extensions, and implemented this fix there:
fjelltopp/ckanext-unaids@5b5d719

@abedkhooli
Copy link

Any clues of when this will be fixed? We are getting a 500 error after uploading a new CSV to an existing resource.

Also confirming same in 2.9.1

@smotornyuk
Copy link
Member

Yes, I replaced the call to create_local_session because it was the single usage of the local session through the whole codebase, and there was no real distinction of this action from any other, so this line looked really nasty. Even now it looks like an attempt to fix the consequences of a problem, instead of solving its origin, so it would be great to search for alternative solutions(not using local session). BTW, I removed the local session there, because some tests were failing on py3 because of it, so it's more than "just getting this line back" in any case

@jonathansberry
Copy link

jonathansberry commented Nov 10, 2020

@smotornyuk That's great. Thanks for the explanation - also great to know that there are py3 tests failing because of it. We'll just hold onto this as a stop gap fix and watch carefully for a better solution devised by the team.

@jqnatividad
Copy link
Contributor

Also confirming that I'm getting this error as well...

@jqnatividad
Copy link
Contributor

@smotornyuk digging into this a bit, there was a specific reason why create_local_session was explicitly created by @johnglover for task_status_update back in Dec 2011.

Per the commit msg:

Use a local session object in task_status_update so we can commit it at the end of the function

520959c#diff-0d9877c849b4b462a8bdc9337551a3e63ddb1189513bd0e13ed59fc502d9d42c

CKAN was using sqlalchemy 0.7.3 at the time, and sqlalchemy has seen a lot of changes since then in terms of handling sessions...

Perhaps, its not necessarily a "nasty" but necessary thing to "commit it at the end of the function" at the time, and there's now a more elegant way of doing it as the CKAN codebase is being modernized?

cc @jonathansberry @abedkhooli @rohillion @jodiegardiner @elrokket @vitorbaptista

@jqnatividad
Copy link
Contributor

@smotornyuk , just wondering if the root cause for using local session has been found, and if not, if it will be reverted back to using a local session in the meantime, so it can be part of 2.9.2...

@ghost
Copy link

ghost commented Apr 16, 2021

Hey, I'm using 2.9.2 and facing the exact same issue. Is it going to be fixed in the next release?

@jqnatividad
Copy link
Contributor

Any ETA on a fix for this? I can confirm that the workaround (using LocalSession) works. Has the team considered reverting back in the meantime?

@smotornyuk
Copy link
Member

smotornyuk commented Jun 7, 2021

Opened PR #6137 with a fix

amercader added a commit that referenced this issue Jun 9, 2021
[#5597] Datapusher error during resource_update
amercader added a commit that referenced this issue Jun 9, 2021
Create local session in the DataPusher submit action and pass that to
task_status_update to avoid interferring with the overall
resource_update transaction.

See #6137 for details
fostermh added a commit to cioos-siooc/ckan that referenced this issue May 27, 2022
* [docs-only] Override click version to avoid conflict

* [docs-only] Remove click version from other req files

* [docs-only] Remove click override

* Remove erroneous argument

`--config` isn't a valid option to `rebuild`

* Catch TypeError from invalid thrown by dateutils

Also provide better logs so that it is possible to identify the issue.

* Update search index date tests to throw TypeErrors

* Revert "Catch TypeError from invalid dates thrown by dateutils"

* Update version for 2.9.4b

* [ckan#5597] Fix DataPusher error during resource_update

Create local session in the DataPusher submit action and pass that to
task_status_update to avoid interferring with the overall
resource_update transaction.

See ckan#6137 for details

* Improve documentation: resource_patch

When someone reads "Uploading a new version of a resource file" is not expected that deletes all extras.

* [ckan#6113] Fix user email validator when using name as id parameter

CKAN 2.9 introduced the `email_is_unique` validator to prevent duplicates. But the
current implementation only works when the id param contains the actual UUID not
the name.

* Update ckan (paster) commands for ckan 2.9.x

* Update ckan (paster) commands for ckan 2.9.x

* [I18N] [FIX ckan#5586] Don't cache license translations across requests

* Licenses are read and cached in the package model
* Translate the license in the helper for each request.

* Use CKANRequest object implicitly in templates

* Test to ensure that we are using CKANRequest in templates

* fix test

* fix pep8 test...

* [ckan#5998] db init error in alembic

* ckan#6086 fix sqlalchemy configuration for datastore

* render activity timestamps with title= attribute

* Fixture for plugin DB migrations

* Remove unnecessary imports

* Add pending-migrations

* Drop extra tables after tests

* update argument

* Fix py3 security alerts

* Remove blogspam link from the documentation footer

* [ckan#6162] fix pagination links for custom org types

* [ckan#6179] Fix exception when using solr_user and solr_password on Py3

Fixes ckan#6179

Use the base64 module to encode the Solr Basic Auth header to avoid
bytes/str exception on Python 3

* [ckan#6179] Encoding definition

* [ckan#6181] Fix page render errors when search facets are not defined

* fix guard clause on has_more_facets, ckan#6190

* restore Page import so that other modules can refer to it via 'h.Page', ckan#6190

* [ckan#6207] Fix for g.__timer

* Remove not accessed user object

* Restore user object to _get_action

* Remove not accessed variables

* [ckan#2317] fix default ordering consuming resource content from datastore

* [ckan#2317] added changelog fragement

* [ckan#2317] made default sort by _id less aggressive

* [ckan#2317] restrict default order to non distinct case

* [2317] accepted proposed change in changelog

Co-authored-by: Ian Ward <ian@excess.org>

* [ckan#6159] Add and Update Docstring for existing functions

* Add doctring to `chained_action` function
* Update `chained_auth` docstring
* update `chained_helper` docstring

* solve conflict

* fix pep8 error

* update `chained_helper` and `chain_action` doctsting

* Fix documentation typo

* [ckan#6270] Typo in default_package_sort config example

* DataPusher and XLoader documented in Source docs

* Update install-from-source.rst

* fixes HTTP basic auth cred handling

* [ckan#6319] Fix datetime formatting when listing user tokens on py2.

* Fix typo

* coerce query string keys/values before passing to quote()

* ensure fallback behaviour matches default

* remove unused import

* reversed function used

* failing test case for validation error list position

* Revert "[ckan#4801] Simplify even more the erros handling in navl"

This reverts commit 7b6702b.

* Revert "[ckan#4801] Refactor processing of errors in NAVL"

This reverts commit a632078.

* Revert "[ckan#4801] Refactor processing of errors in NAVL"

This reverts commit a632078.

* Remove valid lists from error output, add tests

* [ckan#6340] Handle Traceback Exception for HTTP and HTTP status Code in logging

The traceback for HTTP Exception will be added only when debug is true

* [ckan#6340] Add the changelog

* [ckan#6340] Coding Standard

* [ckan#6340] Change the error level to debug/info

* fix for render_datetime

* [ckan#6051] Allow UTF-8 in JS translations

See issue ckan#6051

* Fix runaway preview height

* Fix unpriviledged users being able to access bulk process

Any user was able to access and view the bulk process page for organization
management. This fix checks access and returns unauthorized exception
if theuser shouldn't be there.

* Move auth check to _prepare method

Previously implementation only worked for GET requests, now works with
GET and POST requests

* Use bulk_update_public to check access

Was using group_update. Also removed duplicate check from POST request

* Update tests

* fixing access atom feed for a deleted organization

* fixing access atom feed for a deleted group

* Call to defaul action to allow links to be opened in new tab

* Switch to JQuery recommended method

* [ckan#6051] Py2/py3 compatible version of open

* Remove leftover paste import

* Coding standards

* Rebuild requirements.txt to sync it with requirements.in

* Fix io.open call in lib/i18n.py

* Pin watchdog on py3 so it installs a compatible version

* Use codecs instead of io

* Fix bad merge of ckan#6149

* Rename lib/io.py module which was giving problems

* Rename __init__.py file in extension

* Show job title on job start/finish log messages

To make it easier to debug background job calls.

Before:

```
INFO  [ckan.lib.jobs] Worker rq:worker:f0792c8bd67344f288b5704d39c43124 starts job 2baa42e5-4582-4103-92e5-b4a384d0b1da from queue "default"
```

After:

```
INFO  [ckan.lib.jobs] Worker rq:worker:f0792c8bd67344f288b5704d39c43124 starts job 2baa42e5-4582-4103-92e5-b4a384d0b1da (Process data fields) from queue "default"
```

* Add missing __init__.py file

* String literals

* snippet names rendered in non-debug mode

* Update changelog for 2.9.4

* Build frontend

* [i18n] Pull po files from Transifex

* [i18n] Compile mo files

* Upgrade version for 2.9.4

* Update version for 2.9.5b

* Consistent cli behavior

* pep8

* Py2 compatible fix for ckan#6135

* [ckan#6390] fix user create/edit email validators

* Allow strict types for user/group uploads

CKAN 2.9 specific changes when cherry-picking:

* Replace f-strings with .format()
* Don't use faker / Pillow for tests, as there is no faker fixture in
the Python 2 version

* Add changelog entry for group image types

* Move type verification into upload method

* Fix APIToken CLI test

* Update docs

* Link to config options from changelog

* Allow children for select2

* Fix children type

* [ckan#6531] Py2/py3 compatible version of open

* Add select2 features

* Undo change

* Replace f-string

* Fix standards

* [ckan#6530] Add Solr 8 support

* Set logging level to error in error mail handler

* Add RootPathMiddleware to flask stack to support non-root installs running on python 3

* Add previously removed RootPathMiddleware back to common middleware as it is still needed

* Added utility functions for common CKAN admin commands.

* Use correct auth function when editing organizations

* [ckan#5820] fix invite user with existing email error

* Fix regression when validating resource subfields (by @TomeCirun)

* [ckan#6408] Add timeout param to request get calls (by @EricSoroos)

* [ckan#6408] Document new options

* Accept empty string in one of validator

* Negate empty string check

* Fix pep8

* [i18n] Pull translation from Transifex

* [i18n] Compile mo files

* Compile frontend

* Small fix adding virtual env path to ckan command.

* Update changelog before 2.9.5

* Include the Solr 8 schema file in the 2.9 branch

* Update version for 2.9.5

* Updates to ckan_utils.sh.

* move spatial harvester into ckanext-cioos_harvest extension and allow POST requests to the spatial search api endpoint in the spatial api

* document spatil harvester config

* update submodules

* create dev branch in submodules and update

* add compile css command

* upgrade solr to 8.11.1

* update

* update all submodules to latest dev version

* update submodules again

* update schema

* Updated submodule contrib/docker/src/ckanext-cioos_theme

* Updated submodule contrib/docker/src/ckanext-spatial

* fix a few integration bugs

* update

* Updated submodule contrib/docker/src/ckanext-cioos_theme

* update translations

* fix bugs, update logos

* Updated submodule contrib/docker/src/ckanext-scheming

* add atlantic eov icons

* Updated submodule contrib/docker/src/ckanext-cioos_theme

* add collapsed option to indicate how truncated fields initially load

* add wasRevisionOf to schema.org profile output

* release all submodules -  merge dev into main

* remove geoview pip file from dockerfile

Co-authored-by: amercader <amercadero@gmail.com>
Co-authored-by: Chris Wood <wood-chris@users.noreply.github.com>
Co-authored-by: Ken Tsang <ken.tsang@digital.cabinet-office.gov.uk>
Co-authored-by: Jari Voutilainen <jari.voutilainen@iki.fi>
Co-authored-by: Guillermo Ferrer Bosque <22233599+guibos@users.noreply.github.com>
Co-authored-by: Carlo Cancellieri <geo.ccancellieri@gmail.com>
Co-authored-by: Eric Soroos <eric@derilinx.com>
Co-authored-by: Alexandr Cherniavskyi <cherniavskyi.alexandr@linkdigital.com.au>
Co-authored-by: etj <etj@geo-solutions.it>
Co-authored-by: chris48s <chris.shaw480@gmail.com>
Co-authored-by: Sergey Motornyuk <sergey.motornyuk@linkdigital.com.au>
Co-authored-by: robbi5 <maxi@richt.name>
Co-authored-by: ThrawnCA <shell_layer-github@yahoo.com.au>
Co-authored-by: pdelboca <patriciodelboca@gmail.com>
Co-authored-by: Christian Buck <christian.buck@exxcellent.de>
Co-authored-by: wrinklenose <67383107+wrinklenose@users.noreply.github.com>
Co-authored-by: Ian Ward <ian@excess.org>
Co-authored-by: steveoni <steohenoni2@gmail.com>
Co-authored-by: Andres Vazquez <andres@data99.com.ar>
Co-authored-by: Gauravp-NEC <66116382+Gauravp-NEC@users.noreply.github.com>
Co-authored-by: Erin Doherty <erin.doherty@team.telstra.com>
Co-authored-by: Niall McAndrew <niall@mediasuite.co.nz>
Co-authored-by: Sunny-NEC <t_sunny.malik@india.nec.com>
Co-authored-by: Shubham Mahajan <mr.shubhammahajan@gmail.com>
Co-authored-by: Tome Cirun <cirun@live.com>
Co-authored-by: jescgom <jescgom@arte-consultores.com>
Co-authored-by: franz osorio <f.osorio@zbw.eu>
Co-authored-by: Dan Mihaila <danmihaila@gmail.com>
Co-authored-by: Francesco Frassinelli <francesco.frassinelli@nina.no>
Co-authored-by: Teemu Erkkola <teemu.erkkola@gofore.com>
Co-authored-by: Jeff Cullis <jcullis@dal.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants