Due to the large WDQS server lag throughout the month, duplicate detection mechanisms relying on WDQS being up to date fail in droves.
This needs fixing, and since I cannot easily fix the source, it will have to be the symptoms.
Here is a query that finds PMIDs (filtered by publication date, to avoid a timeout) that occur more than once on Wikidata:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT DISTINCT ?value
  (COUNT(DISTINCT ?item) AS ?ct)
  (GROUP_CONCAT(DISTINCT STRAFTER(STR(?item), "/entity/"); SEPARATOR = ", ") AS ?items)
  (GROUP_CONCAT(DISTINCT ?title; SEPARATOR = "/// ") AS ?titles)
WHERE {
  VALUES (?earliest) { ("2017-12-01T00:00:00Z"^^xsd:dateTime) }
  VALUES (?latest) { ("2031-12-31T00:00:00Z"^^xsd:dateTime) }
  ?item wdt:P577 ?date_time.
  hint:Prior hint:rangeSafe "true"^^xsd:boolean.
  FILTER(?date_time >= ?earliest)
  FILTER(?date_time <= ?latest)
  ?item wdt:P698 ?value.
  ?item wdt:P1476 ?title.
}
GROUP BY ?value ?ct ?items ?titles
HAVING (?ct > 1)
ORDER BY DESC(?ct)
LIMIT 100000
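If the query still times out for a wide date range, one option is to run it once per narrower window. The sketch below only generates month-sized `?earliest`/`?latest` pairs and renders the two `VALUES` clauses from the query; the helper names `month_windows` and `values_clauses` are made up for illustration and are not part of any existing tool. Note the assumption that both items of a duplicate pair carry publication dates in the same window; pairs that straddle a window boundary would be missed.

```python
from datetime import date


def month_windows(start: date, end: date):
    """Yield (earliest, latest) xsd:dateTime strings, one pair per calendar month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # First day of the following month closes the current window.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        earliest = f"{year:04d}-{month:02d}-01T00:00:00Z"
        latest = f"{next_year:04d}-{next_month:02d}-01T00:00:00Z"
        yield earliest, latest
        year, month = next_year, next_month


def values_clauses(earliest: str, latest: str) -> str:
    """Render the two VALUES clauses of the query for one window (hypothetical helper)."""
    return (
        f'VALUES (?earliest) {{ ("{earliest}"^^xsd:dateTime) }}\n'
        f'VALUES (?latest) {{ ("{latest}"^^xsd:dateTime) }}'
    )


windows = list(month_windows(date(2017, 12, 1), date(2018, 2, 1)))
```

Each rendered pair would replace the two `VALUES` clauses before the query is submitted; the rest of the query stays unchanged.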
That query currently gives 20150 results, so I will keep an eye on it for a day or so to see how it develops, and then start some batches for merging.
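As a rough illustration of how a merge batch could be prepared from the query output, the sketch below turns one `?items` cell (the comma-separated item IDs produced by `GROUP_CONCAT`) into tab-separated QuickStatements `MERGE` lines. The function name `merge_commands` is hypothetical, the item IDs in the usage example are placeholders, and keeping the lowest-numbered item as the merge target is an assumption for this sketch, not necessarily how the actual batch was built.

```python
def merge_commands(items: str) -> list[str]:
    """Turn one ?items cell, e.g. "Q30, Q12, Q7", into QuickStatements MERGE lines."""
    # Deduplicate and sort numerically so the lowest Q number comes first.
    qids = sorted(
        {q.strip() for q in items.split(",") if q.strip()},
        key=lambda q: int(q[1:]),
    )
    if len(qids) < 2:
        return []  # nothing to merge
    target, *duplicates = qids
    # Assumption: merge every other item into the lowest-numbered one.
    return [f"MERGE\t{target}\t{dup}" for dup in duplicates]


# Placeholder IDs, not real duplicates.
example = merge_commands("Q30, Q12, Q7")
```

A shared PMID does not by itself prove that two items describe the same publication, so the generated lines are worth spot-checking against the `?titles` column before a batch is submitted.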
Current number is 20183.
Current number is 20179, so it seems someone has cleaned things up a bit.
A fix batch is running: https://tools.wmflabs.org/quickstatements/#/batch/4962 .
That batch has finished, and the number of such duplicates right now is 428.
The current number is 2805, so we probably need a new batch run soon.
No results right now, even with a LIMIT of 1.