[Git] origin has been changed in _fix_item method #1016

Open
xiao623 opened this issue Nov 8, 2021 · 1 comment

xiao623 commented Nov 8, 2021

item['origin'] = anonymize_url(item['origin'])

The line above shows that origin is modified in the _fix_item method.
For example, if the original value of origin is https://xxx:xxx@xxx.com, it is changed to https://xxx.com.
However, the uuid is generated from the original value of origin:
perceval/backend.py#L424

            'uuid': uuid(self.origin, self.metadata_id(item)),

If we later re-run perceval to fetch all commits (from-date = 1970-01-01) of the same repo, but with a different URL such as https://xxx2:xxx2@xxx.com, the same commit ends up stored in two docs in ES, because the two runs produce different uuids.

What I want is a single doc in ES for each commit of a given repo, where the repo is identified at least by the value of origin after _fix_item.
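The mismatch can be reproduced outside perceval with a minimal sketch. Note that `strip_credentials` and `make_uuid` are hypothetical stand-ins for `anonymize_url` and `uuid`, written here only for illustration (the SHA-1-over-joined-fields rule is an assumption, not perceval's confirmed implementation), and `commit_id` is a placeholder for whatever `metadata_id(item)` returns.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Hypothetical stand-in for anonymize_url: drop 'user:password@'."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def make_uuid(*fields):
    """Hypothetical stand-in for uuid(): SHA-1 over the joined fields."""
    return hashlib.sha1(":".join(fields).encode("utf-8")).hexdigest()


commit_id = "abc123"  # placeholder for metadata_id(item)
first_run = "https://xxx:xxx@xxx.com"
second_run = "https://xxx2:xxx2@xxx.com"

# Both runs end up with the same stored origin ...
same_origin = strip_credentials(first_run) == strip_credentials(second_run)
# ... but the uuids, computed from the raw origins, differ.
same_uuid = make_uuid(first_run, commit_id) == make_uuid(second_run, commit_id)
print(same_origin, same_uuid)
```

Under these assumptions the stored origin is identical for both runs while the uuids are not, which is why the same commit would be indexed twice.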


xiao623 commented Nov 8, 2021

I think we should avoid changing the value of origin in the _fix_item method. But if we must do that, we need to change the rule for generating the uuid.
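One way the second option could look, as a rough sketch only: derive the uuid from the credential-free origin instead of the raw one. `stable_uuid` is a hypothetical helper, not an existing perceval function, and the SHA-1 hashing scheme is assumed for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def stable_uuid(origin, metadata_id):
    """Hash the credential-free origin so the id survives a change of login."""
    parts = urlsplit(origin)
    # Keep only what follows 'user:password@' in the network location.
    netloc = parts.netloc.rpartition("@")[2]
    clean_origin = parts._replace(netloc=netloc).geturl()
    data = f"{clean_origin}:{metadata_id}"
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


a = stable_uuid("https://xxx:xxx@xxx.com", "abc123")
b = stable_uuid("https://xxx2:xxx2@xxx.com", "abc123")
print(a == b)
```

A rule like this would presumably also change the uuid of items that were already stored from a URL containing credentials, so existing indexes might need to be regenerated.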
