More content extraction #10

jonnybazookatone · 2015-04-07T13:50:01Z

This is a general comment on content extraction. It should be possible to extract more meaningful content from the files that are ingested. For example, PDF, OCR and TXT files do not differentiate between the different parts of their content, unlike HTML and XML files. Once a schema is in place, it can be applied to all of the simple-text files that are extracted.
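As a rough illustration of what applying a schema to plain text could look like, here is a minimal sketch that splits extracted text into named sections using heading patterns. Everything in it is an assumption made for illustration: the section names, the regular expressions, and the `split_sections` helper are hypothetical and do not come from this project's code.

```python
import re

# Hypothetical schema: section name -> pattern for a line that is only a heading.
# These names and patterns are illustrative assumptions, not the project's schema.
SECTION_PATTERNS = {
    "abstract": re.compile(r"^\s*abstract\s*:?\s*$", re.IGNORECASE),
    "acknowledgements": re.compile(r"^\s*acknowledge?ments?\s*:?\s*$", re.IGNORECASE),
    "references": re.compile(r"^\s*(references|bibliography)\s*:?\s*$", re.IGNORECASE),
}


def split_sections(text, default="body"):
    """Split plain text into sections keyed by the schema's section names.

    Lines seen before any recognised heading are stored under `default`.
    Heading lines themselves are dropped from the output.
    """
    sections = {default: []}
    current = default
    for line in text.splitlines():
        for name, pattern in SECTION_PATTERNS.items():
            if pattern.match(line):
                # A heading starts a new section; later lines go there.
                current = name
                sections.setdefault(current, [])
                break
        else:
            # Not a heading: the line belongs to the current section.
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


if __name__ == "__main__":
    sample = "Title line\nAbstract\nWe study stars.\nReferences\n[1] Foo 2015"
    for name, content in split_sections(sample).items():
        print(name, "->", repr(content))
```

The same function would work on any simple-text output (PDF, OCR or TXT), since it only relies on line-level headings. Real documents would likely need looser patterns, particularly for noisy OCR text.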

jonnybazookatone added the enhancement label Apr 7, 2015

jonnybazookatone added the low priority label Apr 17, 2015

jonnybazookatone self-assigned this Apr 23, 2015

jonnybazookatone added this to the Smart extraction for all types milestone Apr 23, 2015

jonnybazookatone removed their assignment Jul 20, 2016

marblestation removed this from the Smart extraction for all types milestone Feb 20, 2018
