Rework search fields #696
Comments
Note that the "title" field already indexes authors too. But if you type a name such as "Marie Farge" you will have some false positives (for instance because "Marie" and "Farge" would appear in different authors of the same paper). One thing that we could do is to maintain a search index for
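The false positive described above can be reproduced with a small sketch. This is not Dissemin's actual search code: the function names and the sample paper are hypothetical, and the matching is a plain bag-of-words check rather than a real search backend.

```python
def words(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())

def matches_bag_of_words(query, title, authors):
    """True if every query word appears somewhere in the title or in any author name.

    This mimics a single field in which titles and authors are indexed together.
    """
    bag = words(title)
    for author in authors:
        bag |= words(author)
    return words(query) <= bag

def matches_single_author(query, authors):
    """True only if all query words belong to the same author name."""
    query_words = words(query)
    return any(query_words <= words(author) for author in authors)

# Hypothetical paper whose two authors share the words of the query between them.
title = "Wavelets and turbulence"
authors = ["Marie Dupont", "Jean Farge"]

false_positive = matches_bag_of_words("Marie Farge", title, authors)
strict_match = matches_single_author("Marie Farge", authors)
```

Here `false_positive` is true while `strict_match` is false, which is the behaviour a dedicated per-author index would avoid.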
I would suggest making DOIs searchable, since this is an easy way to find out whether a single publication is open access or not. This shouldn't require too much magic, since DOIs have a very specific format. Maybe we could fetch CrossRef data in the background in case the publication is not yet in the system, or simply redirect to a view where we put the DOI directly into the URL.
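As a rough illustration of how a unified search field could recognise a DOI, here is a minimal sketch. The regular expression is a commonly used heuristic for modern DOIs, not an exhaustive definition, and `extract_doi` and `PREFIXES` are hypothetical names that do not come from the Dissemin code base.

```python
import re

# Heuristic: "10.", a 4 to 9 digit registrant code, "/", then a suffix.
# Assumption: this covers the DOIs of interest; some legacy DOIs will not match.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

# Prefixes that users commonly paste together with the DOI.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

def extract_doi(query):
    """Return the DOI contained in a search query, or None if it does not look like one."""
    candidate = query.strip()
    lowered = candidate.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            candidate = candidate[len(prefix):].strip()
            break
    return candidate if DOI_RE.match(candidate) else None
```

A search view could call `extract_doi` first and redirect to the paper page when it returns a value, falling back to the normal title and author search when it returns `None`.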
This is actually already implemented as Huge 👍 however for having it more easily discoverable and visible in the search forms!
The trend is definitely to use one universal search field. On the other hand, a researcher should be able to formulate a complex search, but experience shows that this is often not the case. Indeed, a complex search is not user friendly. I am not sure what would happen if we put, e.g., all titles and names in one index and presented the output to the user.

I understand the current search logic is a binary search. This kind of search is nice if the result set is small. Making the result set small is usually done by adding more and more restrictions during the search. This works for most bibliographic resources, e.g. you can use author names, words from the title and the year, but this is of course not possible when searching by name only.

More sophisticated search mechanisms, on the other hand, try to search by relevance. This does not really affect the results, only their ordering. The idea is not to cut down the number of results during the search, but to put the presumably best result first. Implementing a relevance search implies giving up the timeline, but the question remains: what is relevant?

Omitting the possibility of searching by "author + title", we can build two indexes: one for authors, one for titles. The main search form performs a query on both and merges the results. On the results page we then need a one-click solution for eliminating one of the two sets. The idea is that most names appear in few titles, so we can distinguish between a name and a title.

Nevertheless, it would be interesting to measure something. Maybe we can try to measure the overlap of author names with title words (there are some nice and fast algorithms out there) and do some experimenting: add separate indexes, feed in the user queries, and compare how they do on average without presenting anything to the user. Of course, this would put considerably more load on the server.
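The two-index proposal can be sketched as follows. This is only an in-memory illustration with hypothetical names (`build_index`, `unified_search`, `keep_only`, `vocabulary_overlap`) and made-up sample documents; a real deployment would rely on the search backend rather than Python dictionaries.

```python
from collections import defaultdict

def build_index(docs, field):
    """Map each lowercased word of `field` to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc[field].lower().split():
            index[word].add(doc_id)
    return index

def lookup(index, query):
    """Ids of documents containing every word of the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

def unified_search(query, author_index, title_index):
    """Query both indexes and record which index produced each hit."""
    hits = defaultdict(set)
    for doc_id in lookup(author_index, query):
        hits[doc_id].add("author")
    for doc_id in lookup(title_index, query):
        hits[doc_id].add("title")
    return dict(hits)

def keep_only(hits, source):
    """The 'one-click' filter: keep only hits that came from the given index."""
    return {doc_id: found for doc_id, found in hits.items() if source in found}

def vocabulary_overlap(author_index, title_index):
    """Share of author words that also occur as title words (0.0 if there are none)."""
    author_words = set(author_index)
    if not author_words:
        return 0.0
    return len(author_words & set(title_index)) / len(author_words)

# Made-up sample data, for illustration only.
docs = {
    1: {"title": "Wavelets in turbulence", "authors": "Marie Farge"},
    2: {"title": "A biography of Marie Farge", "authors": "Jane Doe"},
}
author_index = build_index(docs, "authors")
title_index = build_index(docs, "title")
hits = unified_search("marie farge", author_index, title_index)
```

Because each hit carries the name of the index it came from, the results page can drop one of the two sets without running a second query, and `vocabulary_overlap` gives a first, crude measure of how often author words collide with title words.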
👍 So far, what I have in mind would be to have a unified search field on the front page, similar to what they have at http://arxiv.org/ for instance:
I would keep the advanced search form with all of the dedicated fields. The remaining issue there is "how to handle the search form on the results page?". This is somewhat a detail for now though. Concerning the unified search:
I am not sure how easy it is to separate the author words from the title words in a search query. It's an interesting problem, but it is indeed likely to be more expensive than simply searching against a single field where both are indexed, and I am not sure how much we would gain in terms of accuracy in the results. Relevance-based ordering would definitely make sense too (especially given that the publication dates are not so reliable). I personally know very little about search. All I know is that it takes about a week to re-generate a lightweight version of the index (without search publishers and journals). If you want to test any of your proposals at scale, I am happy to deploy other indexes in parallel (for instance by serving them on other ports or domains). We can do that prototyping in parallel with the other import processes as long as we are not touching the production index.
This is an issue to centralize and discuss what should be done with the search fields.
From #683 (comment)
I agree with this and I'd be more inclined to have a single search field in the main page to search for both titles and authors. In the end, this is what arXiv is doing (https://arxiv.org) for instance and it works quite well.
From #684 (comment)
That would be an option in the advanced search fields, which would have the nice effect of simplifying the search logic for us. I think there were two fields in the very beginning of Dissemin (if I remember correctly) and then this was moved to a single author field, wasn't it?
From #661 (comment)