Fix for TIKA-1787 : NamedEntityParser #61
Add OpenNLPNERecogniser as default
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
please remove star imports
@thammegowda great work! See my comments and please update. Thank you.
+ Removed star imports
+ Removed dead code / commented code
+ Added License header to missing files
@chrismattmann Thanks for the feedback. Issues resolved!
file.getParentFile().mkdirs()
inStream = urlConn.getInputStream()
outStream = new FileOutputStream(file)
//IOUtils.copyLarge(inStream, outStream)
@thammegowda can you remove this line? commented code.
One more minor update @thammegowda and this is ready to go!
@thammegowda can you also write up a quick tutorial on http://wiki.apache.org/tika/TikaAndNER that shows how to install Stanford NER and run this?
You will need wiki karma, so let me know your username and I'll grant you karma.
@chrismattmann Sure thing. I might have missed a few such comments. I will review one more time. Please give me permission to create/edit the NER wiki page; my username is "ThammeGowda".
…ontributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62 git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1714931 13f79535-47bb-0310-9956-ffa450edef68
…ontributed by Thamme Gowda N and Yueheng He this closes apache#61 this closes apache#62 git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1714835 13f79535-47bb-0310-9956-ffa450edef68
UPDATE : Wiki URL : https://wiki.apache.org/tika/TikaAndNER
Added NamedEntityParser that supports loading of different NER implementations at runtime.
The default NER implementation based on OpenNLP is supplied.
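As a rough illustration of what loading NER implementations at runtime can look like, here is a minimal sketch that instantiates recognisers from a comma-separated list of class names via reflection. The `Recogniser` interface, `NoOpRecogniser` and `RecogniserLoader` are hypothetical names made up for this example; they are not the classes added in this PR, and the real parser may resolve its implementations differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical interface for this sketch; not the actual Tika API.
interface Recogniser {
    // Maps an entity type (e.g. "PERSON") to the names found in the text.
    Map<String, Set<String>> recognise(String text);
}

// Hypothetical stand-in implementation that never finds anything.
class NoOpRecogniser implements Recogniser {
    @Override
    public Map<String, Set<String>> recognise(String text) {
        return Collections.emptyMap();
    }
}

class RecogniserLoader {
    // Instantiates every class named in a comma-separated list, skipping
    // names that cannot be loaded or that do not implement Recogniser.
    static List<Recogniser> load(String classNames) {
        List<Recogniser> loaded = new ArrayList<>();
        for (String raw : classNames.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            try {
                Object instance = Class.forName(name)
                        .getDeclaredConstructor()
                        .newInstance();
                if (instance instanceof Recogniser) {
                    loaded.add((Recogniser) instance);
                }
            } catch (ReflectiveOperationException | LinkageError e) {
                // For example, an optional library is missing from the classpath.
                System.err.println("Skipping " + name + ": " + e);
            }
        }
        return loaded;
    }
}
```

Because classes are looked up by name only when `load` runs, an implementation that lives in a separate artifact (for licensing reasons, say) only needs to be on the classpath at runtime, not at compile time.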
Another implementation based on StanfordCoreNLP is located here. This is GNU GPL 3, so it is kept separate. See UPDATE 2 below.
@chrismattmann This is not 100% complete; here are a few TODOs :
EDIT :
2. Looking for the best way to read parsed text from non-text streams within the NamedEntityParser (not sure whether a parser can read the output of previous parsers such as HTML or PDF). Please suggest how to resolve this TODO.
Using a secondary parser to get the text content.
UPDATE : 1. Added a regex-based NER. It can recognise many more patterns than just names (I am using it to recognise weapon names and weapon types).
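The idea behind a regex-based recogniser can be sketched as follows: each entity type is paired with a pattern, and every match in the text is reported under that type. `RegexEntityFinder` and the patterns in the usage note are hypothetical and chosen only for illustration; the implementation in this PR may load and apply its patterns differently.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class for this sketch; not the recogniser added in the PR.
class RegexEntityFinder {
    // Entity type -> compiled pattern, kept in registration order.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    // Registers one pattern for an entity type; returns this for chaining calls.
    RegexEntityFinder register(String entityType, String regex) {
        patterns.put(entityType, Pattern.compile(regex));
        return this;
    }

    // Returns every distinct match, grouped by entity type.
    // Types with no match are absent from the result.
    Map<String, Set<String>> recognise(String text) {
        Map<String, Set<String>> found = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                found.computeIfAbsent(entry.getKey(), k -> new LinkedHashSet<>())
                     .add(matcher.group());
            }
        }
        return found;
    }
}
```

A finder could be set up with, for example, `new RegexEntityFinder().register("WEAPON_TYPE", "(?i)\\b(?:rifle|pistol|shotgun)\\b")` and then applied to extracted text with `recognise(text)`.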
UPDATE 2 : Added CoreNLP NER with runtime class binding. This one still uses the Java binding instead of command invocation, because :
UPDATE 3 : Chaining support :
Now we can chain many NER implementations (OpenNLP, CoreNLP, RegEx) to the NamedEntityParser.
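One way to picture chaining is a wrapper that runs each recogniser over the same text and merges the results per entity type. The sketch below assumes each recogniser can be modelled as a function from text to a map of entity type to names; `ChainedRecogniser` is a hypothetical name, and the merge strategy (a plain union) is an assumption rather than a description of what the NamedEntityParser does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical class for this sketch; not the chaining code in the PR.
class ChainedRecogniser {
    // Each link maps text to (entity type -> names found).
    private final List<Function<String, Map<String, Set<String>>>> chain;

    ChainedRecogniser(List<Function<String, Map<String, Set<String>>>> chain) {
        this.chain = new ArrayList<>(chain);
    }

    // Runs every link on the same text and unions the names per entity type.
    Map<String, Set<String>> recognise(String text) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Function<String, Map<String, Set<String>>> link : chain) {
            link.apply(text).forEach((type, names) ->
                    merged.computeIfAbsent(type, k -> new LinkedHashSet<>())
                          .addAll(names));
        }
        return merged;
    }
}
```

With a union merge, a name reported by two links under the same type appears once, and types reported by only one link are carried through unchanged.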