Smarter automatic deduplication

Automatic deduplication works well (#25). However, when duplicates are found and removed, the Datastore table and the resource file are no longer in sync.

Smarter dedup can be handled three ways. When dupes are found:
1. Stop the DP+ job and show the dupe error in the Datastore tab.
2. Replace the resource file with the deduplicated CSV.
3. Take advantage of `qsv dedup`'s `--dupes-output` option and create two new resources, RESOURCENAME_dupes.csv and RESOURCENAME_dedupped.csv, both of which are pushed to the Datastore. The original resource with dupes is NOT pushed. The Data Publisher can then use the CKAN interface to decide which resources to keep (e.g. delete the original and the _dupes resources, then rename the _dedupped resource to drop the _dedupped suffix).
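
As a rough illustration of option 3, the sketch below derives the two proposed file names and builds the matching `qsv dedup` invocation. The helper names (`dedup_paths`, `build_dedup_command`, `split_duplicates`) are hypothetical and not part of DP+. It assumes `qsv` is on the PATH and that `--output` is used alongside `--dupes-output` to write the deduplicated rows to a file. Creating the CKAN resources and pushing them to the Datastore is left out.

```python
from pathlib import Path
import subprocess


def dedup_paths(resource_path: str) -> tuple[Path, Path]:
    """Return the (dupes, dedupped) file paths next to the original resource."""
    src = Path(resource_path)
    dupes = src.with_name(f"{src.stem}_dupes.csv")
    dedupped = src.with_name(f"{src.stem}_dedupped.csv")
    return dupes, dedupped


def build_dedup_command(resource_path: str) -> list[str]:
    """Build the qsv command line that splits a CSV into dupes and unique rows."""
    dupes, dedupped = dedup_paths(resource_path)
    return [
        "qsv", "dedup", str(resource_path),
        "--dupes-output", str(dupes),   # removed duplicate rows go here
        "--output", str(dedupped),      # deduplicated rows go here
    ]


def split_duplicates(resource_path: str) -> subprocess.CompletedProcess:
    """Run qsv dedup; interpreting the exit code is left to the caller."""
    return subprocess.run(
        build_dedup_command(resource_path),
        capture_output=True,
        text=True,
    )
```

With a resource named `parcels.csv`, this would produce `parcels_dupes.csv` and `parcels_dedupped.csv` in the same directory, which the job could then upload as two new resources in place of the original.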

Smarter automatic deduplication #31
