Search Enhancement: search bar should search for the preprint if it's not already in the database #89

georgiamoon · 2020-04-15T02:12:15Z

If a user searches for a preprint that is not already in the database, lookup the preprint on various servers and allow the user to request or add a review

IssueHunt Summary

Backers (Total: $100.00)

prereview ($100.00)

sajacy · 2020-04-18T18:30:06Z

A couple things to note:

For both Preprints.org and Research Square, I was not able to find open APIs nor ToS or T&Cs for directly proxying preprint PDFs. How should we proceed with requests for preprints hosted by these types of sites? It seems inadvisable to simply reverse-engineer / scrape the PDF URLs.
The getpreprints repo only actually has EuropePMC implemented, whose catalog is about a day delayed (versus Crossref, for instance).

The sources that have APIs, documentation - which I can get hooked up into both search and resolving when a review is requested:

If there are other places to search, can we add a prioritized list here?

issuehunt-oss · 2020-07-07T21:13:43Z

@prereview has funded $100.00 to this issue.

wetneb · 2020-07-29T09:33:51Z

Because the search field says "Search preprints with PREreviews or requests for review by DOI, arXiv ID or title", I expect that if I paste in a DOI in that field, it will fetch the DOI metadata on the fly to display the paper in Prereview, letting me request a review or add one myself. Currently, it will return no results if the paper has not been added to Prereview before.

Recognizing such ids and fetching the corresponding metadata from the relevant services would perhaps be a good first step towards this issue. It is a lot easier than arbitrary search: fetching metadata with a known id is a lot cheaper than searching by free text.
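As a minimal sketch of that first step, the search handler could classify the input before deciding whether to do a direct lookup. Everything named here (`classify_query`, `metadata_url`, the prefix list) is hypothetical and not taken from the PREreview codebase. Resolving DOIs through the Crossref REST API and arXiv IDs through the arXiv export API is an assumption too; some preprint DOIs are registered elsewhere and would need a different lookup.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote

# A DOI starts with "10.", a registrant code of 4-9 digits, a slash, then a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
ARXIV_NEW_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style arXiv IDs: archive[.SC]/YYMMNNN, with an optional version suffix.
ARXIV_OLD_RE = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")

# Prefixes that users commonly paste along with the identifier (assumed list).
PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
    "https://arxiv.org/abs/", "http://arxiv.org/abs/", "arxiv:",
)


def classify_query(raw: str) -> Tuple[str, str]:
    """Return (kind, value), where kind is 'doi', 'arxiv', or 'text'."""
    query = raw.strip()
    lowered = query.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            query = query[len(prefix):].strip()
            break
    if DOI_RE.match(query):
        return ("doi", query.lower())  # DOIs are case-insensitive
    if ARXIV_NEW_RE.match(query) or ARXIV_OLD_RE.match(query):
        return ("arxiv", query)
    return ("text", raw.strip())  # fall back to free-text search


def metadata_url(kind: str, value: str) -> Optional[str]:
    """Build the lookup URL for a known identifier; None for free text."""
    if kind == "doi":
        return "https://api.crossref.org/works/" + quote(value)
    if kind == "arxiv":
        return "http://export.arxiv.org/api/query?id_list=" + quote(value)
    return None
```

Only the `text` case needs a real search; the other two reduce to a single metadata request for a known identifier.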

In my experience, querying multiple third-party search APIs to return search results to the user in real time is a bit brittle. We used to do this in https://dissem.in/ and that was pretty slow (for instance Crossref's API can be less reliable at times). We now ingest the sources proactively in our database (which is a challenge of its own given the size of these sources, of course).
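If real-time querying is kept anyway, one way to soften the brittleness is to query all sources in parallel under a shared deadline and answer with whatever came back in time. The sketch below is an illustration under assumptions, not how dissem.in or PREreview does it: `search_all` and the three stub sources are hypothetical, and real fetchers would wrap HTTP calls to the individual services.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

Fetcher = Callable[[str], List[dict]]


def search_all(
    query: str, sources: Dict[str, Fetcher], deadline_s: float = 2.0
) -> Tuple[Dict[str, List[dict]], List[str]]:
    """Query every source in parallel; keep what answers before the deadline."""
    results: Dict[str, List[dict]] = {}
    skipped: List[str] = []
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fetch, query): name for name, fetch in sources.items()}
    done, not_done = wait(futures, timeout=deadline_s)
    for future in done:
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception:
            skipped.append(name)  # the source raised: treat it as unavailable
    for future in not_done:
        skipped.append(futures[future])  # too slow: answer without it
    pool.shutdown(wait=False)  # do not block the response on stragglers
    return results, sorted(skipped)


# Stub sources standing in for third-party APIs (hypothetical).
def fast_source(query: str) -> List[dict]:
    return [{"title": query, "source": "fast"}]


def slow_source(query: str) -> List[dict]:
    time.sleep(1.0)  # simulates an API that responds after the deadline
    return []


def broken_source(query: str) -> List[dict]:
    raise ConnectionError("upstream unavailable")


results, skipped = search_all(
    "spike protein",
    {"fast": fast_source, "slow": slow_source, "broken": broken_source},
    deadline_s=0.2,
)
```

This bounds the response time, but results still vary from one request to the next depending on which sources answered. Proactive ingestion avoids that.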

TheGuardianWolf · 2020-08-04T01:26:53Z

@harumhelmy this one seems good to start on, I've got a proposal for your search implementation if you've not already considered it.

I'm currently at a company that recently implemented search for a product by building it from scratch, and we found this limiting compared to using a third-party search API such as the ones provided by Azure or Google. Would you be interested in integrating one of these cloud search engines into the application?

How this would work is that you have a document store that is in your database. You submit an index to Azure or Google, and then use their APIs to run your search queries. This provides you with a host of features such as search suggestions and a more powerful search engine.

For this project, I would suggest proactively constructing an index from all the search sources, including your own data and third-party data, as @wetneb suggests, then submitting it to one of the search services and querying their API with your search.
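To make the shape of that proposal concrete, here is a toy in-memory index standing in for the hosted service. It is not the Azure or Google API: `ToyIndex`, the document IDs, and the `origin` field are all made up for illustration. A hosted engine would add ranking, suggestions, and typo tolerance on top of the same add-then-query flow.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """In-memory stand-in for a hosted search index (hypothetical)."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, doc: dict) -> None:
        """Store the document and index the tokens of its title."""
        self.docs[doc_id] = doc
        for token in tokenize(doc["title"]):
            self.postings[token].add(doc_id)

    def search(self, query: str) -> List[dict]:
        """Return documents whose titles contain every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.docs[doc_id] for doc_id in sorted(hits)]


index = ToyIndex()
# Internal records and third-party records go into the same index.
index.add("local:1", {"title": "Spike protein structure", "origin": "prereview"})
index.add("europepmc:PPR1", {"title": "Spike protein antibodies", "origin": "europepmc"})

matches = index.search("spike protein")
```

Both records match the query here, so a preprint that is not yet in the site's own database would still show up in the results.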

You might look into a closer integration with Azure, since you are already using that platform: the search service can pull and index data from an Azure db without much glue code, if you don't mind platform dependence.

TheGuardianWolf · 2020-08-13T10:16:46Z

I'm looking to work on this issue. Has anyone read my proposal above?

harumhelmy · 2020-08-14T19:05:36Z

@TheGuardianWolf sorry for the delay here! I think this might be a good solution! The rub is that we're also separately working on transitioning the site's backend to postgres (it's currently on couchDB). I don't know much about integrating cloud search engines yet, so I'm wondering whether you know how reusable your fix would be with a postgres backend?

TheGuardianWolf · 2020-08-14T20:11:15Z

For this situation, let's imagine I've finished implementing the cloud solution, the end products are:

The indexer for the internal data stored on couchDB
Adapter to reshape 3rd party data from their api into a workable format
The indexer for any 3rd party data
The adapter for the cloud search engine API
The Search UI itself in the frontend

Of these things, the only thing that needs to be rewritten is the indexer for internal data, as you'd need to fetch via sql rather than nosql.

Because you are moving from nosql to sql, I imagine there will be a moderate amount of schema change. I can try to abstract out the data fetching from the DB as much as possible in this case, to minimise the time spent on rewriting that part.
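A rough sketch of what that abstraction could look like: the indexer depends only on a small interface, so swapping the backing store means writing one new source class. All names here (`PreprintRecord`, `RecordSource`, the `name` field, the row layout) are hypothetical, and the two sources read from in-memory data rather than a real CouchDB or postgres connection.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol, Tuple


@dataclass
class PreprintRecord:
    """Store-independent shape that the indexer works with."""
    id: str
    title: str


class RecordSource(Protocol):
    def fetch_all(self) -> Iterable[PreprintRecord]: ...


class CouchLikeSource:
    """Adapts document-style dicts, as a NoSQL store would return them."""

    def __init__(self, docs: List[dict]) -> None:
        self._docs = docs

    def fetch_all(self) -> Iterator[PreprintRecord]:
        for doc in self._docs:
            yield PreprintRecord(id=doc["_id"], title=doc["name"])


class SqlLikeSource:
    """Adapts (id, title) tuples, as a SQL cursor would return them."""

    def __init__(self, rows: List[Tuple[int, str]]) -> None:
        self._rows = rows

    def fetch_all(self) -> Iterator[PreprintRecord]:
        for row_id, title in self._rows:
            yield PreprintRecord(id=str(row_id), title=title)


def build_index_payload(source: RecordSource) -> List[dict]:
    """The indexer: unchanged regardless of which store backs the source."""
    return [{"id": r.id, "title": r.title} for r in source.fetch_all()]


before = build_index_payload(CouchLikeSource([{"_id": "42", "name": "A preprint"}]))
after = build_index_payload(SqlLikeSource([(42, "A preprint")]))
```

With this split, the migration only touches the source class; the third-party adapters, the search engine adapter, and the UI keep working against the same payload.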

It would be good to get a bit more information about the current data structure vs the proposed new one along with any existing data indexing processes.

TheGuardianWolf · 2020-08-14T20:17:15Z

Would you be able to give me a working invite to your slack team? The one on the readme seems to be dead :(

I think we could talk about this more effectively via chat

harumhelmy · 2020-08-14T21:36:00Z

@TheGuardianWolf good call: https://join.slack.com/share/zt-gfm50o5z-Web8LW5Xt7c0_3SbmZPoEA

harumhelmy · 2020-08-14T21:38:13Z

I'm logging off for the evening (EDT here), but for a bit more (vagueish) context: the new data structure is still WIP, but we should finalize it on Tuesday, or a little bit after, and in the meantime I'll dig up a spreadsheet that might help with elucidating the current data structure

TheGuardianWolf · 2020-08-15T10:37:09Z

Unfortunately I can't accept a shared channel request, I don't have the paid version of slack!

For the repo, could I suggest https://gitter.im for developer discussions? Unless there's a better solution I'm not aware of

harumhelmy · 2020-08-17T21:29:25Z

Sorry for the bad invite 😅 can you try this one: https://join.slack.com/t/prereview/shared_invite/zt-9qpk9pc5-6fsyuI6hwMuenjusPxDTCw

murkatr · 2020-11-19T22:39:34Z

@harumhelmy @rudietuesdays Am I correct to say this issue is now also linked to the New Merge Platform project as issue #14?

rudietuesdays · 2020-11-24T21:10:02Z

@murkatr correct, though some of the implementation of this in the new merged platform is also related to building the API. Either way, it is related to the building taking place in the new merged platform.

cc @harumhelmy

georgiamoon added the enhancement and COVID-19 labels Apr 15, 2020

rudietuesdays added this to To do in COVID-19 response May 12, 2020

murkatr removed this from To do in COVID-19 response Jun 10, 2020

murkatr added this to the M1: Modifications to OSrPRE milestone Jul 3, 2020

georgiamoon added the Mozilla 2020 Sprints label Jul 7, 2020

issuehunt-oss bot added the 💵 Funded on Issuehunt label Jul 7, 2020

leonardosfl mentioned this issue Jul 22, 2020

WIP: fix #143 #146

Draft

murkatr mentioned this issue Jul 28, 2020

Enhancement: Improve ResearchSquare Preprint display #143

Open

harumhelmy added the priority label Aug 3, 2020

murkatr added this to In progress in Wellcome Trust API Integrations Sep 2, 2020

rudietuesdays added the question label Oct 2, 2020
