Better debug support for JSON decoding errors #171

Open
WolfgangFahl opened this issue Aug 17, 2021 · 1 comment


WolfgangFahl commented Aug 17, 2021

Since I couldn't make anything of the exception

Expecting property name enclosed in double quotes: line 406865 column 8 (char 11883759)

that I got when running a SPARQL query against a Wikidata endpoint, I modified the _convertJSON function so that the JSON string that is the culprit can be inspected:

import json

def _convertJSON(self):
    """
    Convert a JSON result into a Python dict. This method can be overwritten in a subclass
    for a different conversion method.

    :return: converted result.
    :rtype: dict
    """
    jsonStr = self.response.read().decode("utf-8")
    try:
        return json.loads(jsonStr)
    except json.JSONDecodeError:
        # Save the raw response so the offending text can be inspected afterwards.
        jsonFileName = "/tmp/sparqlerror.json"
        with open(jsonFileName, "w", encoding="utf-8") as jsonFile:
            jsonFile.write(jsonStr)
        # Re-raise the original exception with its traceback intact.
        raise

Inspecting the /tmp/sparqlerror.json file revealed:

tail -120 sparqlerror.json
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q183"
      },
      "cityId" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q98225433"
      }
    }, {
      "city" : {
        "xml:lang" : "en",
        "type" : "literal",
        "value" : "Cultural heritage D-4-5834-0143 in Weißenbrunn"
      },
      "cityCoord" : {
        "datatype" : "http://www.opengis.net/ont/geosparql#wktLiteral",
        "type" : "literal",
    SPARQL-QUERY: queryStr=
...

java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:292)

So it looks as if there is a timeout, and the timeout information is written directly into the JSON stream and not caught in any other way by the library. It would therefore be good to be able to inspect the string that causes the JSON decoding to choke as a standard feature.
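As a rough illustration of what such an inspection feature could look like, here is a minimal sketch that uses only the attributes Python's json.JSONDecodeError already carries (msg, doc, pos, lineno, colno). The helper name json_error_context and its width parameter are my own invention for this example and are not part of the library; the broken string is a shortened imitation of the response shown above.

```python
import json


def json_error_context(err: json.JSONDecodeError, width: int = 60) -> str:
    """Return the error message plus the text surrounding the failure position.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    start = max(err.pos - width, 0)
    end = min(err.pos + width, len(err.doc))
    snippet = err.doc[start:end]
    return f"{err.msg} at line {err.lineno} column {err.colno}: ...{snippet}..."


# Shortened imitation of a response where a server message interrupts the JSON.
broken = (
    '{"type" : "literal",\n'
    "    SPARQL-QUERY: queryStr=\n"
    "java.util.concurrent.TimeoutException"
)

try:
    json.loads(broken)
except json.JSONDecodeError as err:
    report = json_error_context(err)
    print(report)
```

With something like this, the exception text itself would already show the server's timeout message instead of only a character offset.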

The query that caused this behavior is shown below. For our purposes it is run once per region, and it succeeds in some 3000+ cases covering most regions of the world. For some regions with a very high number of human settlements known to Wikidata, however, we observed the timeout:

regionId    regionIsoCode   # of settlements
Q980        DE-BY           ?
Q18677983   FR-GES          19345
Q21         GB-ENG          ?
Q1356       IN-WB           42346

For FR-GES and IN-WB a second attempt was successful, while the other two seem to time out systematically.

# get cities by region for geograpy3
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT distinct (?cityQ as ?cityId) ?city ?geoNameId ?gndId ?regionId ?countryId ?cityCoord ?cityPopulation WHERE { 
  VALUES ?hsType {
      wd:Q1549591 wd:Q3957 wd:Q5119 wd:Q15284 wd:Q62049 wd:Q515 wd:Q1637706 wd:Q1093829 wd:Q486972 wd:Q532
  }
  
  VALUES ?region {
         wd:Q980
  }
  
  # region the city should be in
  ?cityQ wdt:P131* ?region.
  # type of human settlement to try
  ?hsType ^wdt:P279*/^wdt:P31 ?cityQ.
  
  # label of the City
  ?cityQ rdfs:label ?city filter (lang(?city) = "en").
   
  # geoName Identifier
  OPTIONAL {
      ?cityQ wdt:P1566 ?geoNameId.
  }

  # GND-ID
  OPTIONAL { 
      ?cityQ wdt:P227 ?gndId. 
  }
  
  OPTIONAL{
     ?cityQ wdt:P625 ?cityCoord .
  }
  
  # region this city belongs to
  OPTIONAL {
    ?cityQ wdt:P131 ?regionId .     
  }
  
  OPTIONAL {
     ?cityQ wdt:P1082 ?cityPopulation
  }

  # country this city belongs to
  OPTIONAL {
      ?cityQ wdt:P17 ?countryId .
  }
}


I think the library should have a built-in option to analyze the JSON result further. The docstring remark
This method can be overwritten in a subclass for a different conversion method.
already hints that a specialized version of the standard approach might be possible. How would the overriding be achieved?
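To make the question concrete, this is the kind of override I have in mind, sketched against a stand-in class rather than the library's real one. BaseResult, DebugResult and the rawText attribute are hypothetical names used only for this example; how the library would be told to instantiate such a subclass is exactly the open part of the question and is not shown.

```python
import io
import json


class BaseResult:
    """Hypothetical stand-in for the library's result class."""

    def __init__(self, response):
        self.response = response

    def _convertJSON(self):
        return json.loads(self.response.read().decode("utf-8"))


class DebugResult(BaseResult):
    """Hypothetical subclass that keeps the raw text available for inspection."""

    def _convertJSON(self):
        # Remember the raw payload before decoding so it survives a failure.
        self.rawText = self.response.read().decode("utf-8")
        return json.loads(self.rawText)


# Usage with in-memory responses instead of a real endpoint.
good = DebugResult(io.BytesIO(b'{"a": 1}'))
print(good._convertJSON())

bad = DebugResult(io.BytesIO(b'{"a": 1,\njava.util.concurrent.TimeoutException'))
try:
    bad._convertJSON()
except json.JSONDecodeError:
    print(bad.rawText[-40:])
```

Keeping the raw text on the result object avoids writing to a fixed file such as /tmp/sparqlerror.json, at the cost of holding the full response in memory.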

I am willing to create a pullrequest based on the results of the discussion of this issue.

WolfgangFahl changed the title from "Better debug support for JSON deding errors" to "Better debug support for JSON decoding errors" on Aug 17, 2021
WolfgangFahl (author) commented

The same issue arises if an endpoint is not set properly, e.g. if you try to use https://query.wikidata.org/ as an endpoint, which will return HTML code and not JSON.
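A cheap pre-check on the start of the payload would be enough to tell these cases apart before the JSON decoder is even called. The following is only a sketch; the function name guess_payload_kind and its return values are assumptions made up for this example, not an existing API.

```python
def guess_payload_kind(text: str) -> str:
    """Guess the payload type from its first non-blank characters.

    Hypothetical helper: name and return values are assumptions for illustration.
    """
    head = text.lstrip()[:200].lower()
    if head.startswith("{") or head.startswith("["):
        return "json"
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    if head.startswith("<"):
        return "xml"
    return "unknown"


print(guess_payload_kind('{"head": {"vars": []}}'))
print(guess_payload_kind("<!DOCTYPE html><html><body>Query Service</body></html>"))
```

A result of "html" could then be turned into an error message that points at a misconfigured endpoint rather than at a character offset in the response.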
