Search not looking into content of PDF files #44

villalbamartin · 2017-11-07T13:04:27Z

The current search does not explore the text of the papers themselves, only their metadata. As pointed out in issue #39, this is less than ideal.

The search functionality would be greatly enhanced if we could look into the content of the PDF files themselves. This is likely to be quite complicated.

I'm opening this issue as a feature proposal, in order to collect ideas.

knmnyn · 2017-11-07T18:39:04Z

The best option at present is to index the abstracts, some of which are already exported into the per-paper ACL XML metadata. I suggest doing some development along these lines after the basic problems of ensuring replicability are solved. Cheers, Min
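To make the abstract-indexing idea concrete, here is a minimal sketch of an inverted index built from per-paper XML. The element names (`volume`, `paper`, `title`, `abstract`), the `id` attribute, the sample records, and the helper names `tokenize`, `build_index`, and `search` are all assumptions made for illustration; the real Anthology XML schema and search backend may look quite different.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: the real Anthology XML layout may differ.
SAMPLE_XML = """
<volume>
  <paper id="1">
    <title>Parsing with Neural Networks</title>
    <abstract>We present a neural parser for dependency trees.</abstract>
  </paper>
  <paper id="2">
    <title>A Study of Word Alignment</title>
    <abstract>We study alignment models for machine translation.</abstract>
  </paper>
  <paper id="3">
    <title>Paper Without Abstract</title>
  </paper>
</volume>
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(xml_text):
    """Map each token to the set of paper ids whose title or abstract contains it."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for paper in root.iter("paper"):
        paper_id = paper.get("id")
        for field in ("title", "abstract"):
            node = paper.find(field)
            if node is None:
                # Not every paper has an abstract; index what is available.
                continue
            text = "".join(node.itertext())
            for token in tokenize(text):
                index[token].add(paper_id)
    return index


def search(index, query):
    """Return ids of papers that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = None
    for token in tokens:
        ids = index.get(token, set())
        results = set(ids) if results is None else results & ids
    return results


index = build_index(SAMPLE_XML)
print(search(index, "neural parser"))
```

Papers without an abstract are still reachable through their titles, so partial abstract coverage would not make search any worse than the metadata-only behaviour described in this issue.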

mjpost · 2019-03-06T21:22:21Z

This should be fixed when the static rewrite goes live, since Google and others will index and link pages on the same host. The static rewrite also adds the abstracts to the paper pages, which should further help.

villalbamartin added the enhancement label Nov 7, 2017

mjpost mentioned this issue Jan 30, 2019

Write XML → BibTeX conversion script #122

Closed

eseyffarth mentioned this issue Feb 7, 2019

Papers out of sync #128

Closed

mjpost added this to To do in Static Rewrite of the Anthology via automation Mar 6, 2019

mjpost assigned mjpost and mbollmann Mar 6, 2019

mjpost added this to the Static Rewrite milestone Mar 6, 2019

mjpost closed this as completed Mar 6, 2019

Static Rewrite of the Anthology automation moved this from To do to Done Mar 6, 2019
