Skip to content

[Issue #33] (Team 7) Implementation for NE Extractor #102

Merged
sam0227 merged 15 commits into
masterfrom
NEExtractor-Team7
May 18, 2016
Merged

[Issue #33] (Team 7) Implementation for NE Extractor #102
sam0227 merged 15 commits into
masterfrom
NEExtractor-Team7

Conversation

@sam0227
Copy link
Copy Markdown
Contributor

@sam0227 sam0227 commented May 16, 2016

@chenlica Please review our code. Thanks.

sam0227 added 4 commits May 12, 2016 19:20
2. First Implementation for NE Extractor.
3. Enable the real test in the test case
4. Fixed a problem in NEExtractorTestConstants.java. There are some counting problems for start and end.
…ributes for desire search field.

2. Fixed a bug merge two spans. As the NLP package do it's own tokenizing, some time characters like "  ' " would be use as delimiter, so when checking two adjacent words, the distance should be less or equal to one instead of one.
3. Added one NE constant as "Number".
3. Refactor test cases to follow the initialization.
4. Add one more test cases for passing two fields while only searching on field.
…tor-Team7

Conflicts:
	textdb/textdb-dataflow/pom.xml
public void open() throws Exception {
try {
sourceOperator.open();
sourceTuple = sourceOperator.getNextTuple();
Copy link
Copy Markdown
Contributor

@chenlica chenlica May 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we get the next tuple during open()? Can we do it in the first getNextTuple()?

@chenlica
Copy link
Copy Markdown
Contributor

The branch name "NEExtractor-Team7" is confusing. It should be "Team7-NEExtractor". I saw another branch called "Team7-NEExtractor". Please delete one of them to avoid confusions.

IField spanField = new ListField<Span>(spanList);
List<IField> fields = new ArrayList<IField>();
fields.add(spanField);
ITuple resultTuple = new DataTuple(new Schema(SchemaConstants.SPAN_LIST_ATTRIBUTE), fields.toArray(new IField[fields.size()]));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandeepreddy602 and @rajesh9625 do we have a helper function already for something similar?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenlica .. We don't have any helper function for this. It is a normal constructor call.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then.

@chenlica
Copy link
Copy Markdown
Contributor

I gave some high-level comments. Please take care of them first.

@sam0227
Copy link
Copy Markdown
Contributor Author

sam0227 commented May 17, 2016

@chenlica Just changed the return type to match other operator. Also changed the test cases. Please review it. Thanks!

@chenlica
Copy link
Copy Markdown
Contributor

Fix the PR title using the same naming convention as other PRs.

* Return the recoginized data as a list of spans.
*
* For example: Given tuple with two field named: sentence1, sentence2.
* Return the recoginized data as a list of spans that appended to the original tuple as a field.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"that appended" -> "that are appended". Please pay attention to grammar :-)

@chenlica
Copy link
Copy Markdown
Contributor

@kishore-narendran : feel free to review this PR if you are available.

* Key -> NE_Constant
*/
private List<Span> extractNESpans(IField iField, String fieldName) {
List<Span> spanList = new ArrayList<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code of this function is the core of this operator, and it needs good comments to explain.

* 2. The two spans are in the same field. They should have the same fieldName.
* 3. The two spans have the same key (Organization, Person,... etc)
*/
private Span mergeTwoSpan(Span previousSpan, Span currentSpan) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeTwoSpan -> mergeTwoSpans

@chenlica
Copy link
Copy Markdown
Contributor

I left more comments. Please take care of them first. In general, please be more careful about the language in your comments.

@sam0227 sam0227 changed the title (TEAM 7 ) NE Extractor Implementation [ISSUE #33] (TEAM 7) Implementation for NE Extractor May 18, 2016
@sam0227 sam0227 changed the title [ISSUE #33] (TEAM 7) Implementation for NE Extractor [Issue #33] (TEAM 7) Implementation for NE Extractor May 18, 2016
@sam0227 sam0227 changed the title [Issue #33] (TEAM 7) Implementation for NE Extractor [Issue #33] (Team 7) Implementation for NE Extractor May 18, 2016
@sam0227
Copy link
Copy Markdown
Contributor Author

sam0227 commented May 18, 2016

@chenlica I just added more comments for the extractNESpans() functions. Please take a look to see if its clear. Sorry about the grammar problem I know I'm really bad at that. I'll pay more attention while I'm writing comments.

* return: "Doc1", 10, 18, "Location", "New York"
* <p>
* The caller needs to make sure:
* 1. The two spans are adjacent.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if these conditions are not satisfied? Throw an exception or return null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenlica I don't want to add this inside this method. It's a private method only being used inside this class and condition should satisfy before calling this method. The condition checking part is in line156 ~line 158. If this looks good I'll make the merge.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then.

@chenlica
Copy link
Copy Markdown
Contributor

In general, the PR looks very good now. After taking care of my comments, please go ahead to do the merge! I left some minor comments.

@sam0227 sam0227 merged commit 60b8723 into master May 18, 2016
@sam0227 sam0227 deleted the NEExtractor-Team7 branch May 18, 2016 20:23
Yicong-Huang added a commit that referenced this pull request May 3, 2026
### What changes were proposed in this PR?

Add an `emergency` label fast-path to Auto Queue. A PR with this label
is bumped before any non-emergency PR regardless of CREATED_AT, and its
presence in BEHIND bypasses the in-flight guard so a non-emergency PR's
running CI doesn't delay the bump. Within each priority class
CREATED_AT-ASC ordering is preserved.

Eligibility gates (auto-merge / not draft / not conflicting / APPROVED /
threads resolved) still apply — this only reorders the bump, it does not
bypass review. Label name is set by the `EMERGENCY_LABEL` constant
(one-line change if `priority/P0` or similar is preferred later).

### Any related issues, documentation, discussions?

Builds on #4672, #4678, #4845.

### How was this PR tested?

`yaml.safe_load` parses; `node --check` parses the wrapped script body.
Unit test on the partition logic: `[#100 docs, #101 emergency, #102
plain, #103 emergency+fix]` → `[101, 103, 100, 102]`.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants