Duplicate relations after importing aliases #81

Open
Rizziepit opened this issue Nov 4, 2014 · 6 comments
Comments

@Rizziepit
Contributor

No description provided.

@Rizziepit Rizziepit changed the title Creation of duplicate relations in loader Duplicate relations after importing aliases Nov 10, 2014
@pudo

pudo commented Nov 20, 2014

This is -- to some extent -- the code that causes it:

https://github.com/granoproject/grano/blob/master/grano/logic/entities.py#L136

The question is how that code should decide when to delete duplicate links, because it may want to consider more than just source and target. The only fully logical solution I can see is to load all entities first, then de-dupe, and then load relations. But that would be a major refactor.
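A minimal sketch of that "entities first, then de-dupe, then relations" ordering, in plain Python rather than grano's actual code. The `aliases` mapping, the `resolve` and `load_relations` helpers, and the choice of `(source, target, schema)` as the de-dupe key are all assumptions made for illustration; a real key may need more fields, as noted above.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape: (source name, target name, schema name).
Relation = Tuple[str, str, str]


def resolve(name: str, aliases: Dict[str, str]) -> str:
    """Follow alias links until a canonical name is reached."""
    seen = set()
    while name in aliases and name not in seen:
        seen.add(name)  # guard against alias cycles
        name = aliases[name]
    return name


def load_relations(relations: Iterable[Relation],
                   aliases: Dict[str, str]) -> List[Relation]:
    """Resolve both endpoints first, then keep one relation per key."""
    resolved = (
        (resolve(source, aliases), resolve(target, aliases), schema)
        for source, target, schema in relations
    )
    # dict.fromkeys drops repeats while preserving first-seen order.
    return list(dict.fromkeys(resolved))
```

Because every endpoint is mapped to its canonical entity before any relation is stored, two rows that differ only by an alias collapse into one.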

@Rizziepit
Contributor Author

Would it not be possible to merge relations based on the uniqueness constraints in the schemata?

@pudo

pudo commented Nov 20, 2014

Hm, but the uniqueness constraints aren't actually in the schema; they're in the loaders. Which may be a problem anyway: if the schema knew about de-dupe, we could just POST whole objects without checking for them first, which would halve the number of HTTP requests we need to make to load a dataset.
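To illustrate the idea of a schema that knows its own unique fields, here is a toy in-memory sketch. The `Store` class, its `upsert` method, and the `unique_fields` mapping are hypothetical stand-ins and not part of grano's API; the point is only that one call can both find and create.

```python
from typing import Dict, Tuple


class Store:
    """Toy stand-in for a server that knows each schema's unique fields."""

    def __init__(self, unique_fields: Dict[str, Tuple[str, ...]]):
        self.unique_fields = unique_fields
        self.rows: Dict[tuple, dict] = {}

    def upsert(self, obj: dict) -> dict:
        """Create the object, or merge it into the existing match."""
        fields = self.unique_fields[obj["schema"]]
        key = (obj["schema"],) + tuple(obj[f] for f in fields)
        existing = self.rows.setdefault(key, {})
        existing.update(obj)  # later values overwrite earlier ones
        return existing
```

With this shape the client sends each object once, instead of a lookup followed by a create or update.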

@Rizziepit
Contributor Author

I was thinking of something along the lines of a grano command that takes the schema file as an argument and de-dupes the relations.

What are good reasons for keeping grano ignorant of uniqueness constraints? Simpler code?

@pudo

pudo commented Nov 20, 2014

Well, there could be different uniqueness constraints for different data sources, but that actually seems more like a bug now that I think of it.

@Rizziepit
Contributor Author

Perhaps for now I can add a relation de-duping command to granoloader. It should be able to merge relations efficiently enough by paging through relations ordered by unique fields.
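A rough sketch of that paging approach, assuming a hypothetical `fetch_page(offset, limit)` callable that returns relations already ordered by the unique fields (this is not an existing granoloader function). Since duplicates are adjacent in sorted order, only one group needs to be held in memory at a time.

```python
from itertools import groupby
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Relation = Dict[str, object]


def iter_pages(fetch_page: Callable[[int, int], List[Relation]],
               page_size: int = 100) -> Iterator[Relation]:
    """Yield relations page by page until an empty page comes back."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += len(page)


def find_duplicates(relations: Iterable[Relation],
                    unique_fields: Tuple[str, ...]
                    ) -> Iterator[Tuple[Relation, List[Relation]]]:
    """Yield (keep, duplicates) for each run of relations sharing a key.

    Assumes the input is already ordered by `unique_fields`.
    """
    def key(relation: Relation) -> tuple:
        return tuple(relation[f] for f in unique_fields)

    for _, group in groupby(relations, key=key):
        keep, *dupes = list(group)
        if dupes:
            yield keep, dupes
```

The caller would then merge the properties of each duplicate into `keep` and delete the rest. Because the pages are chained into one stream, a run of duplicates that straddles a page boundary is still grouped together.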

2 participants