
Allow exceptions #25

Closed
dschaehi opened this issue Mar 23, 2024 · 13 comments

Comments

@dschaehi

Thanks for the nice plugin, @ChenglongMa!
In my case, I sometimes have items with the same title but of different types. For example, I have a paper with an accompanying blog post written by the same author. In this case, I'd like to keep both items but don't want them to be identified as duplicates. Is there a way to exclude such items from the Zotero collection "Duplicate Items"?

@ChenglongMa
Owner

Hi @dschaehi,

Thanks for your interest in this plugin!

I am currently using Zotero's default duplicate detection method, but I would be very happy to implement this new feature.

Would you be able to provide me with examples of different items sharing the same title/author?
These examples would be invaluable for debugging the feature.

Thanks!

Best regards,
Chenglong

@dschaehi
Author

Hi @ChenglongMa,

Thanks for your reply!
I am happy to share some duplicate items:

Example 1:

  • Book: Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. Cambridge, Massachusetts: The MIT Press. Available from: http://www.deeplearningbook.org/.
    and
  • Journal article: LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature, 521(7553), pp.436–444.

Example 2:

Example 3:

Example 4:

  • Conference paper: Ge, X., Lee, J.H., Renz, J. and Zhang, P. (2016). Hole in One: Using Qualitative Reasoning for Solving Hard Physical Puzzle Problems. In: Proceedings of the 22nd European Conference on Artificial Intelligence. ECAI 2016. The Hague, The Netherlands: IOS Press, pp.1762–1763. Available from: http://ebooks.iospress.nl/volumearticle/45020
  • Workshop paper: Ge, X., Lee, J.H., Renz, J. and Zhang, P. (2016). Hole in One: Using Qualitative Reasoning for Solving Hard Physical Puzzle Problems. In: 29th International Workshop on Qualitative Reasoning. Available from: https://ivi.fnwi.uva.nl/tcs/QRgroup/qr16/pdf/QR2016Proceedings.pdf#page=52.

Note:
In example 1, the names of the authors are not even the same.
In example 4, I'd like to keep both versions, as the workshop version is longer than the conference version.

I also attached a zipped Zotero RDF file for your convenience.

Thanks!

Best,
Jae
duplicate_items.rdf.zip

@ChenglongMa
Owner

Hi @dschaehi,

Thank you so much for the detailed examples.

I'll check the content of these items and implement the new features.

I'll let you know once I finish.

Cheers,
Chenglong

@ChenglongMa
Owner

Hi @dschaehi,

I have investigated the built-in duplication detection method in Zotero.
It uses a very sensitive, aggressive strategy to identify duplicates:

  1. Compare the DOI, if available;
  2. Compare the ISBN, if available;
  3. Compare the title and at least one of the authors.

That's why the items in the first example are labeled as duplicates (their titles are the same, and they share the same author: Bengio, Y).
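The three checks above can be approximated in a few lines of TypeScript. This is a minimal sketch, not Zotero's actual code: the `Item` shape, the `isLikelyDuplicate` name, the simple ASCII title normalization, and the rule that a DOI or ISBN comparison decides the outcome on its own whenever both items carry one are all assumptions made for illustration.

```typescript
// Hypothetical, simplified item shape; real Zotero items expose many more fields.
interface Item {
  title: string;
  creators: { lastName: string; firstName: string }[];
  DOI?: string;
  ISBN?: string;
}

// Lower-case and collapse non-alphanumeric ASCII runs so "Deep Learning" equals "Deep learning".
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Identify an author by last name plus first initial, e.g. "bengio y".
function authorKey(c: { lastName: string; firstName: string }): string {
  return `${c.lastName.toLowerCase()} ${c.firstName.charAt(0).toLowerCase()}`;
}

// Assumption: when both items carry an identifier, that comparison alone decides.
function isLikelyDuplicate(a: Item, b: Item): boolean {
  // 1. Compare DOI if both items have one (case-insensitive).
  if (a.DOI && b.DOI) {
    return a.DOI.toLowerCase() === b.DOI.toLowerCase();
  }
  // 2. Compare ISBN if both items have one (hyphens ignored).
  if (a.ISBN && b.ISBN) {
    return a.ISBN.replace(/-/g, "") === b.ISBN.replace(/-/g, "");
  }
  // 3. Otherwise require the same normalized title and at least one shared author.
  if (normalizeTitle(a.title) !== normalizeTitle(b.title)) {
    return false;
  }
  const authorsA = new Set(a.creators.map(authorKey));
  return b.creators.some((c) => authorsA.has(authorKey(c)));
}
```

Fed Example 1 (neither item is compared by identifier), `isLikelyDuplicate` returns `true`: both titles normalize to `deep learning` and both author lists contain Bengio, Y., which matches the explanation above.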

As discussed in the Zotero Forums (here and here), the developers seem interested but have deferred developing and maintaining the relevant features.

So my plan is:

  1. I will try to refine the strategy and submit it to the Zotero developers for evaluation (hopefully they have time);
  2. Meanwhile, I will add a feature to Zoplicate that lets users mark such items as non-duplicates.
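One way to approach step 2 of the plan is to record each pair the user confirms and filter it out of the duplicate candidates. The sketch below is hypothetical: it does not reflect how Zoplicate actually stores non-duplicates, and the `NonDuplicateStore` class and numeric item IDs are assumptions for illustration. The pair key is order-independent, so marking (A, B) also covers (B, A).

```typescript
// Hypothetical in-memory store of item-ID pairs confirmed as NOT duplicates.
class NonDuplicateStore {
  private pairs = new Set<string>();

  // Order-independent key so (a, b) and (b, a) map to the same entry.
  private key(a: number, b: number): string {
    return a < b ? `${a}:${b}` : `${b}:${a}`;
  }

  mark(a: number, b: number): void {
    this.pairs.add(this.key(a, b));
  }

  unmark(a: number, b: number): void {
    this.pairs.delete(this.key(a, b));
  }

  isMarked(a: number, b: number): boolean {
    return this.pairs.has(this.key(a, b));
  }

  // Drop candidate pairs the user has already marked as non-duplicates.
  filter(candidates: [number, number][]): [number, number][] {
    return candidates.filter(([a, b]) => !this.isMarked(a, b));
  }
}
```

Keeping the confirmed pairs separate from the detection logic means the same store could sit on top of Zotero's built-in check or a refined one.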

Any thoughts?
Thanks!

@dschaehi
Author

Hi @ChenglongMa,

Thanks for putting so much effort into it!
The vanilla strategy is indeed very sensitive.
I like your plan very much. I could also imagine letting users customize the search criteria, similar to creating a "saved search" collection in Zotero.

Best,
Jae

@ChenglongMa
Owner

Hi @dschaehi,

Thanks for your kind words.

Could you please explain more about the "custom search criteria"? How does it differ from existing features? And how does it relate to duplicate detection functionality?

Thanks!

@dschaehi
Author

dschaehi commented Mar 25, 2024

Hi @ChenglongMa,

Sure. What I mean is to let users create their own criteria for finding duplicates, such as "all author names must match", "at least one author name must match", or "DOIs must match", because different users have different preferences for what counts as a duplicate. I thought about a GUI similar to "saved search" in Zotero, where you specify search criteria to create a saved search collection (although that does not let you find duplicates). An alternative is to let users write JavaScript snippets, similar to Better BibTeX (see https://retorque.re/zotero-better-bibtex/exporting/scripting/index.html); given a few example snippets for finding duplicates, people would be able to build their own search criteria.
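The criteria listed above map naturally onto small predicates that a user combines, which also leaves room for user-written snippets. This is a hypothetical sketch: the `Criterion` type, the criterion names, `makeDuplicateRule`, and the `Item` shape are invented for illustration and are not part of Zoplicate, Zotero, or Better BibTeX. The `sameItemType` criterion is an extra assumption motivated by the original request. A pair counts as a duplicate only if every selected criterion holds.

```typescript
// Hypothetical item shape for illustration only.
interface Item {
  title: string;
  itemType: string; // e.g. "book", "journalArticle"
  authors: string[]; // e.g. ["Bengio, Y."]
  DOI?: string;
}

// A criterion is a predicate over a pair of items.
type Criterion = (a: Item, b: Item) => boolean;

// Titles equal after trimming and lower-casing.
const sameTitle: Criterion = (a, b) =>
  a.title.trim().toLowerCase() === b.title.trim().toLowerCase();

// "All author names must match" (order-insensitive).
const allAuthorsMatch: Criterion = (a, b) => {
  const x = a.authors.slice().sort();
  const y = b.authors.slice().sort();
  return x.length === y.length && x.every((name, i) => name === y[i]);
};

// "At least one author name must match".
const someAuthorMatches: Criterion = (a, b) =>
  a.authors.some((name) => b.authors.indexOf(name) !== -1);

// "DOIs must match": both items need a DOI, compared case-insensitively.
const sameDOI: Criterion = (a, b) =>
  a.DOI !== undefined && b.DOI !== undefined &&
  a.DOI.toLowerCase() === b.DOI.toLowerCase();

// Never pair items of different types (the original request in this issue).
const sameItemType: Criterion = (a, b) => a.itemType === b.itemType;

// Combine the user's chosen criteria: every one must hold.
function makeDuplicateRule(criteria: Criterion[]): Criterion {
  return (a, b) => criteria.every((c) => c(a, b));
}
```

For instance, `makeDuplicateRule([sameTitle, someAuthorMatches])` roughly mirrors the title-and-author check and would flag Example 1, while adding `sameItemType` keeps the book and the journal article apart.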

Does this answer your questions?

Cheers,
Jae

@ChenglongMa
Owner

Hi @dschaehi,

That's brilliant! I will try to add these features to this plugin.

Thank you so much!

Best,
Chenglong

@dschaehi
Author

Thanks for the new version of the plugin, @ChenglongMa!
In my case, the new version works for duplicates of the same type but not for duplicates of different types. Can you check this? For example, Examples 1 and 2 in #25 (comment) cannot be marked as non-duplicates.

@ChenglongMa
Owner

Oh, thanks for your testing, @dschaehi!

I'll check and fix the bug soon.

Thanks!

Best regards,
Chenglong

@ChenglongMa
Owner

Hi @dschaehi,

I tested the function as you suggested.

I think you meant that you couldn't find the right button here:

[Screenshot]

You can mark them as non-duplicates via the "They are NOT duplicates" context-menu item, or by editing the "Non-duplicates" section in the sidebar:

  1. [Screenshot]

  2. Go to their collection and click one of them; you will find a "Non-duplicates" section in the right sidebar:

[Screenshot]

Of course, you can also manually select two items and use the context menu:

[Screenshot]

In the next version, I will add this button to the location shown in the first screenshot.

Thanks!

Best,
Chenglong

@dschaehi
Author

Ah, I see. I didn't know I could right-click.
That solved my problem. Now my "Duplicate Items" collection is clean, which gives me a good feeling 😄. Thanks again!

@ChenglongMa
Owner

Thank you @dschaehi,

I still think your idea of custom duplicate search and JavaScript scripting is amazing.

I'll try to implement it.

Thanks!

Best regards,
Chenglong
