[Issue #33] (Team 7) Implementation for NE Extractor by sam0227 · Pull Request #102 · apache/texera

sam0227 · 2016-05-16T17:43:23Z

@chenlica Please review our code. Thanks.

2. First Implementation for NE Extractor. 3. Enable the real test in the test case 4. Fixed a problem in NEExtractorTestConstants.java. There are some counting problems for start and end.

…ributes for the desired search field. 2. Fixed a bug when merging two spans. Because the NLP package does its own tokenizing, characters such as " ' " are sometimes used as delimiters, so when checking two adjacent words the distance should be less than or equal to one instead of exactly one. 3. Added one NE constant, "Number". 4. Refactored test cases to follow the initialization. 5. Added one more test case that passes two fields while searching on only one field.

…tor-Team7 Conflicts: textdb/textdb-dataflow/pom.xml

chenlica · 2016-05-16T18:56:26Z

    public void open() throws Exception {
        try {
            sourceOperator.open();
+            sourceTuple = sourceOperator.getNextTuple();


Why do we get the next tuple during open()? Can we do it in the first getNextTuple()?
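A minimal sketch of the alternative suggested here, in which `open()` only opens the source and the first tuple is pulled inside `getNextTuple()`. The `TupleOperator` interface, `CountingSource`, and `LazyExtractor` are hypothetical stand-ins written for this illustration, not the project's actual classes, and tuples are reduced to plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical, minimal operator contract (assumption: the real interfaces are richer).
interface TupleOperator {
    void open();
    String getNextTuple(); // returns null when the input is exhausted
    void close();
}

// In-memory source that counts how many times getNextTuple() was called on it.
class CountingSource implements TupleOperator {
    private final List<String> data;
    private Deque<String> pending;
    int pulls = 0;

    CountingSource(List<String> data) { this.data = data; }

    @Override public void open() { pending = new ArrayDeque<>(data); }

    @Override public String getNextTuple() {
        pulls++;
        return pending.poll(); // null once the queue is empty
    }

    @Override public void close() { pending = null; }
}

// Downstream operator: open() does no reading; tuples are pulled on demand.
class LazyExtractor implements TupleOperator {
    private final TupleOperator source;

    LazyExtractor(TupleOperator source) { this.source = source; }

    @Override public void open() {
        source.open(); // no getNextTuple() call here
    }

    @Override public String getNextTuple() {
        String sourceTuple = source.getNextTuple();
        if (sourceTuple == null) {
            return null;
        }
        return sourceTuple.toUpperCase(); // placeholder for the real extraction step
    }

    @Override public void close() { source.close(); }
}
```

With this shape, opening an operator that is never read costs nothing beyond opening the source, and `open()` and `close()` stay symmetric.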

chenlica · 2016-05-16T19:02:04Z

The branch name "NEExtractor-Team7" is confusing. It should be "Team7-NEExtractor". I saw another branch called "Team7-NEExtractor". Please delete one of them to avoid confusion.

chenlica · 2016-05-16T19:05:55Z

+            IField spanField = new ListField<Span>(spanList);
+            List<IField> fields = new ArrayList<IField>();
+            fields.add(spanField);
+            ITuple resultTuple = new DataTuple(new Schema(SchemaConstants.SPAN_LIST_ATTRIBUTE), fields.toArray(new IField[fields.size()]));


@sandeepreddy602 and @rajesh9625 do we have a helper function already for something similar?

@chenlica We don't have a helper function for this. It is a normal constructor call.

chenlica · 2016-05-16T19:13:14Z

I gave some high-level comments. Please take care of them first.

sam0227 · 2016-05-17T19:25:11Z

@chenlica I just changed the return type to match the other operators and updated the test cases accordingly. Please review it. Thanks!

chenlica · 2016-05-17T22:07:07Z

Fix the PR title using the same naming convention as other PRs.

chenlica · 2016-05-17T22:07:54Z

- *         Return the recoginized data as a list of spans.
- *
- *         For example: Given tuple with two field named: sentence1, sentence2.
+ *         Return the recoginized data as a list of spans that appended to the original tuple as a field.


"that appended" -> "that are appended". Please pay attention to grammar :-)

chenlica · 2016-05-17T22:12:38Z

@kishore-narendran : feel free to review this PR if you are available.

chenlica · 2016-05-17T22:13:25Z

+     * Key   -> NE_Constant
+     */
+    private List<Span> extractNESpans(IField iField, String fieldName) {
+        List<Span> spanList = new ArrayList<>();


The code of this function is the core of this operator, and it needs good comments to explain it.

chenlica · 2016-05-17T22:14:19Z

+     * 2. The two spans are in the same field. They should have the same fieldName.
+     * 3. The two spans have the same key (Organization, Person,... etc)
+     */
+    private Span mergeTwoSpan(Span previousSpan, Span currentSpan) {


mergeTwoSpan -> mergeTwoSpans

chenlica · 2016-05-17T22:19:26Z

I left more comments. Please take care of them first. In general, please be more careful about the language in your comments.

sam0227 · 2016-05-18T07:53:15Z

@chenlica I just added more comments for the extractNESpans() function. Please take a look to see if it's clear. Sorry about the grammar problems; I know I'm really bad at that. I'll pay more attention when writing comments.

chenlica · 2016-05-18T14:56:40Z

+     * return:   "Doc1", 10, 18, "Location", "New York"
+     * <p>
+     * The caller needs to make sure:
+     * 1. The two spans are adjacent.


What if these conditions are not satisfied? Throw an exception or return null?

@chenlica I don't want to add this check inside the method. It's a private method that is only used inside this class, and the conditions should be satisfied before it is called. The condition checking is in lines 156-158. If this looks good, I'll do the merge.
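The division of labor described in this thread (the caller checks the preconditions, the private merge method assumes them) can be sketched roughly as follows. `NeSpan`, `SpanMerger`, and `canMerge` are hypothetical names for this illustration, not the PR's actual code. The adjacency test uses the "distance less than or equal to one" rule from the commit message, and joining the two values with a space when one character was skipped is an assumption; a real implementation would likely read the text from the document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified span: field name, start/end offsets, NE key, matched text.
final class NeSpan {
    final String fieldName;
    final int start;
    final int end;
    final String key;
    final String value;

    NeSpan(String fieldName, int start, int end, String key, String value) {
        this.fieldName = fieldName;
        this.start = start;
        this.end = end;
        this.key = key;
        this.value = value;
    }
}

final class SpanMerger {
    // Caller-side guard: same field, same NE key, and at most one character apart.
    static boolean canMerge(NeSpan prev, NeSpan cur) {
        return prev.fieldName.equals(cur.fieldName)
                && prev.key.equals(cur.key)
                && cur.start >= prev.end
                && cur.start - prev.end <= 1;
    }

    // Assumes canMerge(prev, cur) already holds; performs no checks of its own.
    static NeSpan mergeTwoSpans(NeSpan prev, NeSpan cur) {
        // Assumption: a single skipped character is rendered as a space.
        String gap = (cur.start - prev.end == 1) ? " " : "";
        return new NeSpan(prev.fieldName, prev.start, cur.end, prev.key,
                prev.value + gap + cur.value);
    }

    // Folds a position-ordered list of spans, merging neighbors that qualify.
    static List<NeSpan> mergeAdjacent(List<NeSpan> spans) {
        List<NeSpan> merged = new ArrayList<>();
        for (NeSpan cur : spans) {
            int last = merged.size() - 1;
            if (last >= 0 && canMerge(merged.get(last), cur)) {
                merged.set(last, mergeTwoSpans(merged.get(last), cur));
            } else {
                merged.add(cur);
            }
        }
        return merged;
    }
}
```

Keeping the guard in a separate method means the check lives in one place at the call site, while the merge itself stays a plain field-by-field combination.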

chenlica · 2016-05-18T15:02:17Z

In general, the PR looks very good now. I left some minor comments. After taking care of them, please go ahead and merge!

### What changes were proposed in this PR?

Add an `emergency` label fast-path to Auto Queue. A PR with this label is bumped before any non-emergency PR regardless of CREATED_AT, and its presence in BEHIND bypasses the in-flight guard so a non-emergency PR's running CI doesn't delay the bump. Within each priority class CREATED_AT-ASC ordering is preserved. Eligibility gates (auto-merge / not draft / not conflicting / APPROVED / threads resolved) still apply; this only reorders the bump, it does not bypass review. Label name is set by the `EMERGENCY_LABEL` constant (one-line change if `priority/P0` or similar is preferred later).

### Any related issues, documentation, discussions?

Builds on #4672, #4678, #4845.

### How was this PR tested?

`yaml.safe_load` parses; `node --check` parses the wrapped script body. Unit test on the partition logic: `[#100 docs, #101 emergency, #102 plain, #103 emergency+fix]` → `[101, 103, 100, 102]`.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
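The ordering rule in the description above (emergency-labeled PRs first, CREATED_AT ascending within each class) can be illustrated with a small sketch. This is not the workflow's actual script: `QueuedPr`, `BumpOrder`, and the numeric creation timestamps are hypothetical, and the eligibility gates and in-flight guard are not modeled. Only the `EMERGENCY_LABEL` constant name and the `emergency` label come from the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical view of a queued PR: number, labels, and creation time.
final class QueuedPr {
    final int number;
    final Set<String> labels;
    final long createdAtEpochSeconds;

    QueuedPr(int number, Set<String> labels, long createdAtEpochSeconds) {
        this.number = number;
        this.labels = labels;
        this.createdAtEpochSeconds = createdAtEpochSeconds;
    }
}

final class BumpOrder {
    static final String EMERGENCY_LABEL = "emergency";

    // Returns PR numbers in bump order: emergency first, then oldest first.
    static List<Integer> order(List<QueuedPr> prs) {
        List<QueuedPr> sorted = new ArrayList<>(prs);
        sorted.sort(Comparator
                // false sorts before true, so emergency PRs come first
                .comparing((QueuedPr p) -> !p.labels.contains(EMERGENCY_LABEL))
                .thenComparingLong(p -> p.createdAtEpochSeconds));
        List<Integer> numbers = new ArrayList<>();
        for (QueuedPr p : sorted) {
            numbers.add(p.number);
        }
        return numbers;
    }
}
```

Expressing the rule as a two-key comparator keeps the creation-time order within each priority class without a separate partition step.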

sam0227 added 4 commits May 12, 2016 19:20

1. Add dependency on pom.xml for Stanford NLP package

80f5794

2. First Implementation for NE Extractor. 3. Enable the real test in the test case 4. Fixed a problem in NEExtractorTestConstants.java. There are some counting problems for start and end.

Corrected some comments.

aac48d3

Merge commit 'd464d88885d7a97edf3d4441fb1d4fd78f4ae63a' into NEExtrac…

64135cc

…tor-Team7 Conflicts: textdb/textdb-dataflow/pom.xml

chenlica reviewed May 16, 2016

sam0227 added 8 commits May 17, 2016 10:15

Merge branch 'master' into NEExtractor-Team7

24b2e0f

Changed logic for get tuple from source operator. No more getNextTupl…

55c60b7

…e() in Open().

Refactor the function getSpans() to extractNESpans()

4462d27

Refactor Annotation name to documentAnnocation Corrected some grammar Add an example for mergeSpans()

Merge branch 'master' into NEExtractor-Team7

b0a7934

Changed the design of return tuple.

7c587c9

Now append a field that contain a list of spans to to the original tuple.

Merge branch 'NEExtractor-Team7' of https://github.com/TextDB/textdb …

3342283

…into NEExtractor-Team7

Add an If to makes sure the createSpanSchema only call once.

835e787

Set returnSchema to null when close() is called.

Modified the test constants to matches the new return design.

708ad22

chenlica reviewed May 17, 2016

sam0227 changed the title ~~(TEAM 7 ) NE Extractor Implementation~~ [ISSUE #33] (TEAM 7) Implementation for NE Extractor May 18, 2016

sam0227 changed the title ~~[ISSUE #33] (TEAM 7) Implementation for NE Extractor~~ [Issue #33] (TEAM 7) Implementation for NE Extractor May 18, 2016

sam0227 changed the title ~~[Issue #33] (TEAM 7) Implementation for NE Extractor~~ [Issue #33] (Team 7) Implementation for NE Extractor May 18, 2016

sam0227 added 3 commits May 18, 2016 00:10

1. Assign null value to returnSchema in Open().

660631d

2. Fixed grammar error.

Added more detail comment for the extraction function

bfe51bc

Fixed some grammar error.

f5db4ef

chenlica reviewed May 18, 2016

sam0227 merged commit 60b8723 into master May 18, 2016

sam0227 deleted the NEExtractor-Team7 branch May 18, 2016 20:23
