Duplication control #255

Open
anderspeders opened this issue Jun 2, 2017 · 1 comment

Comments

@anderspeders

We might need to have some type of duplication control as part of the upload process for ResourceProjects.

@mattfullerton
Contributor

mattfullerton commented Jun 6, 2017

Well... 🙂 we do. New companies and projects get flagged as potential duplicates and can then be merged together, flagged as duplicates to be ignored, or treated as independent (not duplicates). The system is also designed to remember these decisions for future imports.
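To make the three outcomes above concrete, here is a minimal sketch of how reviewer decisions could be stored and reused on later imports. It is not the actual ResourceProjects code: `Decision`, `DuplicateReview`, and the example company names are all hypothetical, and a real system would persist the decisions in a database rather than in memory.

```python
from enum import Enum


class Decision(Enum):
    """The three outcomes a reviewer can choose for a flagged pair."""
    MERGE = "merge"              # same entity, merge the records
    IGNORE = "ignore"            # flagged as a duplicate to be ignored
    INDEPENDENT = "independent"  # not duplicates, keep both


class DuplicateReview:
    """Hypothetical in-memory store of past reconciliation decisions."""

    def __init__(self):
        # Key is an unordered pair of names, so (a, b) and (b, a) match.
        self._decisions = {}

    def record(self, a, b, decision):
        self._decisions[frozenset((a, b))] = decision

    def lookup(self, a, b):
        # Returns None when the pair has never been reviewed.
        return self._decisions.get(frozenset((a, b)))

    def needs_review(self, a, b):
        return self.lookup(a, b) is None


review = DuplicateReview()
review.record("Acme Mining Ltd", "ACME Mining Limited", Decision.MERGE)
```

On a later import, a pair that already has a recorded decision can skip the human reconciliation step, and only pairs for which `needs_review` returns `True` are shown to a reviewer.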

However, the tough parts are avoiding too many false duplicate flags (programming side) and finding the time to do the reconciliation (human side). There is also the question of whether some of this can be moved into a pre-import step. For company names at least, this sounds very plausible: mapping each company name to a single fixed company name, and to a company group if appropriate, by checking against our existing data or OpenCorporates.
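A rough sketch of what such a pre-import step for company names might look like, using only Python's standard library. Everything here is an assumption for illustration: the function names, the suffix list, the 0.85 cutoff, and the sample data are hypothetical, and the lookup runs against a local dictionary rather than the real database or the OpenCorporates service.

```python
import difflib
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "inc", "plc", "llc", "corp", "corporation"}


def normalize(name):
    """Lowercase, strip punctuation, and drop legal-form suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def map_company(name, known, cutoff=0.85):
    """Map an incoming name to a known canonical name and company group.

    `known` maps canonical company name -> company group (or None).
    Returns (canonical_name, group), or None if nothing is close enough.
    """
    index = {normalize(k): k for k in known}
    matches = difflib.get_close_matches(
        normalize(name), list(index), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    canonical = index[matches[0]]
    return canonical, known[canonical]


# Made-up sample data standing in for existing records.
known = {"Acme Mining Limited": "Acme Group", "Borealis Oil plc": None}
result = map_company("ACME Mining Ltd.", known)
```

The cutoff is the knob for the trade-off mentioned above: a lower value catches more spelling variants but flags more false duplicates, so names that fall below it would still go to human reconciliation.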

2 participants