[Issue #33] (Team 7) Implementation for NE Extractor #102
Conversation
2. First Implementation for NE Extractor. 3. Enable the real test in the test case 4. Fixed a problem in NEExtractorTestConstants.java. There are some counting problems for start and end.
…ributes for desire search field. 2. Fixed a bug merge two spans. As the NLP package do it's own tokenizing, some time characters like " ' " would be use as delimiter, so when checking two adjacent words, the distance should be less or equal to one instead of one. 3. Added one NE constant as "Number". 3. Refactor test cases to follow the initialization. 4. Add one more test cases for passing two fields while only searching on field.
…tor-Team7 Conflicts: textdb/textdb-dataflow/pom.xml
| public void open() throws Exception { | ||
| try { | ||
| sourceOperator.open(); | ||
| sourceTuple = sourceOperator.getNextTuple(); |
There was a problem hiding this comment.
Why do we get the next tuple during open()? Can we do it in the first getNextTuple()?
|
The branch name "NEExtractor-Team7" is confusing. It should be "Team7-NEExtractor". I saw another branch called "Team7-NEExtractor". Please delete one of them to avoid confusions. |
| IField spanField = new ListField<Span>(spanList); | ||
| List<IField> fields = new ArrayList<IField>(); | ||
| fields.add(spanField); | ||
| ITuple resultTuple = new DataTuple(new Schema(SchemaConstants.SPAN_LIST_ATTRIBUTE), fields.toArray(new IField[fields.size()])); |
There was a problem hiding this comment.
@sandeepreddy602 and @rajesh9625 do we have a helper function already for something similar?
There was a problem hiding this comment.
@chenlica .. We don't have any helper function for this. It is a normal constructor call.
|
I gave some high-level comments. Please take care of them first. |
Refactor Annotation name to documentAnnocation Corrected some grammar Add an example for mergeSpans()
Now append a field that contain a list of spans to to the original tuple.
…into NEExtractor-Team7
Set returnSchema to null when close() is called.
|
@chenlica Just changed the return type to match other operator. Also changed the test cases. Please review it. Thanks! |
|
Fix the PR title using the same naming convention as other PRs. |
| * Return the recoginized data as a list of spans. | ||
| * | ||
| * For example: Given tuple with two field named: sentence1, sentence2. | ||
| * Return the recoginized data as a list of spans that appended to the original tuple as a field. |
There was a problem hiding this comment.
"that appended" -> "that are appended". Please pay attention to grammar :-)
|
@kishore-narendran : feel free to review this PR if you are available. |
| * Key -> NE_Constant | ||
| */ | ||
| private List<Span> extractNESpans(IField iField, String fieldName) { | ||
| List<Span> spanList = new ArrayList<>(); |
There was a problem hiding this comment.
The code of this function is the core of this operator, and it needs good comments to explain.
| * 2. The two spans are in the same field. They should have the same fieldName. | ||
| * 3. The two spans have the same key (Organization, Person,... etc) | ||
| */ | ||
| private Span mergeTwoSpan(Span previousSpan, Span currentSpan) { |
There was a problem hiding this comment.
mergeTwoSpan -> mergeTwoSpans
|
I left more comments. Please take care of them first. In general, please be more careful about the language in your comments. |
|
@chenlica I just added more comments for the extractNESpans() functions. Please take a look to see if its clear. Sorry about the grammar problem I know I'm really bad at that. I'll pay more attention while I'm writing comments. |
| * return: "Doc1", 10, 18, "Location", "New York" | ||
| * <p> | ||
| * The caller needs to make sure: | ||
| * 1. The two spans are adjacent. |
There was a problem hiding this comment.
What if these conditions are not satisfied? Throw an exception or return null?
There was a problem hiding this comment.
@chenlica I don't want to add this inside this method. It's a private method only being used inside this class and condition should satisfy before calling this method. The condition checking part is in line156 ~line 158. If this looks good I'll make the merge.
|
In general, the PR looks very good now. After taking care of my comments, please go ahead to do the merge! I left some minor comments. |
### What changes were proposed in this PR? Add an `emergency` label fast-path to Auto Queue. A PR with this label is bumped before any non-emergency PR regardless of CREATED_AT, and its presence in BEHIND bypasses the in-flight guard so a non-emergency PR's running CI doesn't delay the bump. Within each priority class CREATED_AT-ASC ordering is preserved. Eligibility gates (auto-merge / not draft / not conflicting / APPROVED / threads resolved) still apply — this only reorders the bump, it does not bypass review. Label name is set by the `EMERGENCY_LABEL` constant (one-line change if `priority/P0` or similar is preferred later). ### Any related issues, documentation, discussions? Builds on #4672, #4678, #4845. ### How was this PR tested? `yaml.safe_load` parses; `node --check` parses the wrapped script body. Unit test on the partition logic: `[#100 docs, #101 emergency, #102 plain, #103 emergency+fix]` → `[101, 103, 100, 102]`. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.7, 1M context) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chenlica Please review our code. Thanks.