Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ckanext-datajson - Error decoding JSON object - spatial? #3549

Closed
3 tasks
jbrown-xentity opened this issue Nov 19, 2021 · 20 comments
Closed
3 tasks

ckanext-datajson - Error decoding JSON object - spatial? #3549

jbrown-xentity opened this issue Nov 19, 2021 · 20 comments
Assignees
Labels
bug Software defect or bug CKAN component/catalog Related to catalog component playbooks/roles

Comments

@jbrown-xentity
Copy link
Contributor

jbrown-xentity commented Nov 19, 2021

Not sure how necessary it is to fix this, but wanted to track somewhere...

Current deployment, are able to harvest invalid spatial metadata with ckanext-datajson. See example here, spatial value POLYGON ((-125 49, -67 49, -67 25, -125 25)) does not conform to dcat-us spec.

New CKAN2.9 on cloud.gov throws an error, see harvest logs on 11/19 like ckan.logic.ValidationError: None - {'spatial': ['Error decoding JSON object: Expecting value: line 1 column 39 (char 38)']}. It looks to be trying to format the string into a geojson type object (or something else?), but of course that will fail.

How to reproduce

  1. Create harvest source for https://www.usda.gov/sites/default/files/documents/data.json on local catalog
  2. Note harvest failures

Expected behavior

Harvest succeeds (see latest prod harvest here)

Actual behavior

Harvest fails with errors

Sketch

  • Find code that is changing the input value (maybe in a library?)
  • Evaluate if the output from the DCAT-US transformation matches ckanext-spatial expectations
  • Remove/fix
@jbrown-xentity jbrown-xentity added the bug Software defect or bug label Nov 19, 2021
@FuhuXia
Copy link
Member

FuhuXia commented Nov 22, 2021

This is one the the errors captured in ticket #3532 that is not properly handled and crash gather process.

@mogul
Copy link
Contributor

mogul commented Dec 16, 2021

See additional logs in #3597

@nickumia-reisys nickumia-reisys changed the title ckanext-datajson parse changes ckanext-datajson - Error decoding JSON object - spatial? Jan 30, 2023
@nickumia-reisys
Copy link
Contributor

Here's a full example of backtrace related to this error:

    2023-01-27 22:50:02,087 ERROR [ckanext.datajson.datajson] Failed to create package south-carolina-coastal-erosion-study-data-report-for-observations-october-2003-april-2004 from https://data.doi.gov/data.json
   None - {'spatial': ["Error decoding JSON object: Expecting ',' delimiter or ']': line 1 column 41 (char 40)"]}
   File "/home/vcap/deps/1/src/ckanext-spatial/ckanext/spatial/plugin/__init__.py", line 127, in check_spatial_extra
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/simplejson/__init__.py", line 525, in loads
     geometry = json.loads(extra.value)
     return _default_decoder.decode(s)
     obj, end = self.raw_decode(s)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/simplejson/decoder.py", line 372, in decode
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/simplejson/decoder.py", line 402, in raw_decode
     return self.scan_once(s, idx=_w(s, idx).end())
   Traceback (most recent call last):
   During handling of the above exception, another exception occurred:
   File "/home/vcap/deps/1/bin/ckan", line 8, in <module>
     sys.exit(ckan())
     return self.main(*args, **kwargs)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 782, in main
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 829, in __call__
     rv = self.invoke(ctx)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
     return _process_result(sub_ctx.command.invoke(sub_ctx))
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
     return callback(*args, **kwargs)
     return ctx.invoke(self.callback, **ctx.params)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 610, in invoke
   File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/cli.py", line 249, in fetch_consumer
     utils.fetch_consumer()
   File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/utils.py", line 355, in fetch_consumer
   File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/queue.py", line 497, in fetch_callback
     fetch_callback(consumer, method, header, body)
   File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/queue.py", line 515, in fetch_and_import_stages
     success_import = harvester.import_stage(obj)
     pkg = get_action('package_create')(self.context(), pkg)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckanext/datajson/datajson.py", line 779, in import_stage
     result = _action(context, data_dict, **kw)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckan/logic/__init__.py", line 504, in wrapped
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckanext/geodatagov/logic.py", line 524, in package_create
     item.create(pkg)
     return up_func(context, data_dict)
   File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckan/logic/action/create.py", line 207, in package_create
     self.check_spatial_extra(package)
     raise tk.ValidationError(error_dict, error_summary=package_error_summary(error_dict))
   File "/home/vcap/deps/1/src/ckanext-spatial/ckanext/spatial/plugin/__init__.py", line 130, in check_spatial_extra
ckan.logic.ValidationError: None - {'spatial': ["Error decoding JSON object: Expecting ',' delimiter or ']': line 1 column 41 (char 40)"]}
Exit status 1

I would say that this is a pretty annoying and important bug to fix as agencies are unaware of spatial data not being harvested properly since it is not reported to them from the harvester.

@btylerburton btylerburton self-assigned this Mar 14, 2023
@jbrown-xentity
Copy link
Contributor Author

It looks like we have fixed and provided long term tests for this logic (though I think it would be better served by splitting these 6 assertions into separate tests): https://github.com/GSA/ckanext-geodatagov/blob/main/ckanext/geodatagov/tests/test_update_geo.py#L20-L38
This is testing this logic, such that it is taking the 4 possible inputs from dcat-us and translating into the geojson standard that ckanext-spatial requires.
We should search new relic for any instances of this class of error, and if we don't see any we can possibly close this.

@nickumia-reisys
Copy link
Contributor

Recent occurrences of this issue in NR.

@nickumia-reisys
Copy link
Contributor

nickumia-reisys commented Mar 20, 2023

Unfortunately with the fix merged above on 3/17, this is still an issue... I even verified that the deploy was successful...

image

P.S. If it's any consolation, there might be less errors? 🤷‍♀️

@btylerburton
Copy link
Contributor

Might be worth printing the object in the error case?

@nickumia-reisys
Copy link
Contributor

That sounds like a good idea. I can do that since you're on O&M.

@nickumia-reisys
Copy link
Contributor

With debug statements added, we'll wait for the next occurrence and then handle the errors that we find.

@nickumia-reisys nickumia-reisys self-assigned this Mar 20, 2023
@nickumia-reisys
Copy link
Contributor

nickumia-reisys commented Mar 21, 2023

So.... I don't know how to process these logs, but it seems like some coordinates are just not in an array? Like.. they're just numbers that are command separated and since they're not a formal array, that's why json can't load them? @FuhuXia @jbrown-xentity @jbrown-xentity @btylerburton @Jin-Sun-tts

image

This is my best guess...

image

@jbrown-xentity
Copy link
Contributor Author

That use case is actually a valid use case. See documentation on DCAT-US here, and a sample we test in geodatagov here. Now we don't have a e2e test, where we validate all of this together. We could add that to catalog, but I feel like this test already does that.
I think the stuff you posted above, unless an error follows it that's exactly like the form ckan.logic.ValidationError: None - {'spatial': ["Error decoding JSON object: Expecting ',' delimiter or ']': line 1 column 41 (char 40)"]}, then the code logic is functioning correctly.

@nickumia-reisys
Copy link
Contributor

@jbrown-xentity I think I'm missing something. Given that (1) the input data is valid, (2) the transform logic is working properly, (3) the Error decoding JSON object message never makes anywhere and (4) the output data is valid... why do we have an error?

@jbrown-xentity
Copy link
Contributor Author

Summary: this class of error still occurs. See logs here (and search app_name:catalog-fetch "spatial" "Error decoding JSON object") for examples. Once you find some, try zooming in on a few and then removing the filter text, so that you can see all logs. One example has numbers of the form 10., with a decimal at the end. Since eventually these should be numbers corresponding to lat/longs in a JSON format, maybe we should make sure we don't just input the numbers as given, but actually transform them to a number.
There might be other problems with data entry, and we should examine other errors to see other examples.

@nickumia-reisys
Copy link
Contributor

Just to clarify, the intent with this ticket is to fix the issues in geodatagov before it gets processed by spatial, right? We don't want to make PRs to upstream spatial, right?

@btylerburton
Copy link
Contributor

We don't want to make PRs to upstream spatial, right?

Correct.

@nickumia-reisys
Copy link
Contributor

More errors have been captured and handled. Will revisit on Monday and subsequently the next week if it resurfaces because of another error. I strongly believe this error is a characteristic of the ckan harvesting logic, this would not re-appear in the same way in all harvesting systems. There have been comments added to the code as to why it was necessary to do the pre-processing of the data, if this does need to be handled in a newer version of harvesting, there are clues as to what to consider.

@btylerburton
Copy link
Contributor

Can we close this out, @nickumia-reisys , or do you want to leave open until we confirm we've covered all the cases?

@nickumia-reisys
Copy link
Contributor

It seems like we were able to fix it "completely" this time though? It was happening every day, now it seems like 3 days without errors?
image

@nickumia-reisys
Copy link
Contributor

I was too excited to wait, but this would be technically more than one week without errors 🥳

image

We can wait another few days if we want to..

@nickumia-reisys
Copy link
Contributor

Zero occurrences in the past two weeks 🥂
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Software defect or bug CKAN component/catalog Related to catalog component playbooks/roles
Projects
Status: 🗄 Closed
Development

No branches or pull requests

5 participants