Duplicates the datasets #15
Comments
To check whether the remote server supported pagination, we were only checking whether the content of a request was the same as the previous one. This is quite fragile, as some fields might get updated on each request, e.g. the modified date for real-time data. We now check whether the guids from a request are the same as the previous ones, which should be more reliable.
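For anyone following along, here is a rough sketch of the idea, not the actual ckanext-dcat code. It assumes a hypothetical `fetch_page(page)` callable that returns a list of `(guid, content)` tuples; the names `harvest_guids` and `max_pages` are also made up for illustration. The loop stops when a page carries the same set of guids as the previous one, even if the content (e.g. a timestamp) differs.

```python
def harvest_guids(fetch_page, max_pages=100):
    """Collect guids page by page, stopping when a page repeats the previous one.

    `fetch_page` is a hypothetical callable: page number -> list of
    (guid, content) tuples. It stands in for the real HTTP request and parsing.
    """
    collected = []
    previous_guids = None
    for page in range(1, max_pages + 1):
        guids = [guid for guid, _content in fetch_page(page)]
        if not guids:
            break  # empty page: nothing more to harvest
        if previous_guids is not None and set(guids) == set(previous_guids):
            # Same guids as the last page: the server most likely ignored the
            # page parameter, so stop instead of importing everything twice.
            break
        collected.extend(guids)
        previous_guids = guids
    return collected
```

Comparing guids rather than raw content means a field that changes on every request no longer makes two identical pages look different.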
@montxo5 That was caused by the harvesters not being careful when checking whether two requests had the same contents (to check if the remote server supported pagination). In Madrid's case, there are some real-time datasets whose timestamp got updated on each request: <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-06-11T03:05:31</dct:modified> Can you update your sources and check if you only get 101 records?
Thanks for the reply. Sorry, but I didn't understand what you mean by updating my resources.
I meant doing
I've updated ckanext-dcat with git pull and it's still duplicating datasets.
Did you restart the two harvester consumers?
You were right, I forgot to restart the consumers... Thanks!! Now it works perfectly!
Glad you got it working! :)
When I try to harvest this RDF/XML: http://datos.madrid.es/egob/catalogo.rdf
the process inserts the datasets twice. Instead of 101, 202 datasets appear.
I've also tried with this one: http://datos.gijon.es/set.rdf
and in this case it works OK.
I think the problem is some kind of redirect in Madrid's case. Would it be possible to handle these cases?
Thanks in advance!