Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add explodeSpec #8698

Closed
wants to merge 3 commits into from
Closed

add explodeSpec #8698

wants to merge 3 commits into from

Conversation

SEKIRO-J
Copy link
Contributor

@SEKIRO-J SEKIRO-J commented Oct 18, 2019

Description

Add explodeSpec feature, to give user ability to explode array objects without writing their own pre-processing scripts.

trying to add a forbidden-apis

in FileIteratingFirehose, the field private final ArrayList<InputRow> parsedInputRows; should be better declared as LinkedList, as consequential operation involves lots of push and pop, so better to use a queue. ArrayDeque doesn't work here because we need to insert null


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • JSONExplodeSpec
  • JSONExploderMaker
  • ObjectExploder
  • ObjectExploders

@SEKIRO-J
Copy link
Contributor Author

SEKIRO-J commented Nov 6, 2019

@clintropolis @vogievetsky
Finished the support for explodeSpec for Json Parser, with unit test implemented.
I think we can go ahead review it and test it in a cluster(not sure how to do it though).

Since the ObjectExploder is general, adding support for other parsers like Avro, Parquet, will be easy but long, so I plan to add them after Json Parser looks all good.
And will make NestedDataParseSpec a more general interface.

@jihoonson
Copy link
Contributor

Would you please check the build failure? It looks legit.

[ERROR] /Users/jihoonson/Codes/druid/indexing-service/src/test/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskTestBase.java:[105,15] no suitable constructor found for JSONParseSpec(org.apache.druid.data.input.impl.TimestampSpec,org.apache.druid.data.input.impl.DimensionsSpec,org.apache.druid.java.util.common.parsers.JSONPathSpec,com.google.common.collect.ImmutableMap<java.lang.Object,java.lang.Object>)
[ERROR]     constructor org.apache.druid.data.input.impl.JSONParseSpec.JSONParseSpec(org.apache.druid.data.input.impl.TimestampSpec,org.apache.druid.data.input.impl.DimensionsSpec,org.apache.druid.java.util.common.parsers.JSONPathSpec,java.util.List<org.apache.druid.java.util.common.parsers.JSONExplodeSpec>,java.util.Map<java.lang.String,java.lang.Boolean>) is not applicable
[ERROR]       (actual and formal argument lists differ in length)
[ERROR]     constructor org.apache.druid.data.input.impl.JSONParseSpec.JSONParseSpec(org.apache.druid.data.input.impl.TimestampSpec,org.apache.druid.data.input.impl.DimensionsSpec) is not applicable
[ERROR]       (actual and formal argument lists differ in length)

@@ -77,8 +82,22 @@ public InputRow nextRow() throws IOException
if (!hasMore()) {
throw new NoSuchElementException();
}
if (!parsedInputRows.isEmpty()) {
return parsedInputRows.remove(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayList.remove() internally moves the positions of remaining elements and this will happen for every row in this case which could be expensive. Since parsedInputRows won't be large in general, I suggest to use an iterator on the list instead of modifying the list.

    public E remove(int index) {
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index);

        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // clear to let GC do its work

        return oldValue;
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, it's a good idea, I was worrying about this issue too.
Tried to use LinkedList so that's O(1) pop, but I wonder why it's a forbidden API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because usually LinkedList is not a good idea. See #6112.

};
}

public interface ExploderMaker<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add javadoc to this class and methods? For class, It should include what this is and where it is used. For methods, it should say what the method does, what inputs and outputs are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

{
List<Object> getExplodeArray(T node, String expr);

T setObj(T node, Object value, String expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.. don't know what the best name is for this method, but setObj() sounds not intuitive enough. Maybe createNewNodeWithValue()? Or you can split this into two methods of copy() and setValue().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will think of more intuitive names for those two methods.

@@ -82,6 +82,7 @@
null
),
new JSONPathSpec(true, ImmutableList.of()),
null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a sampler end-to-end unit test with an explodeSpec?

@fjy
Copy link
Contributor

fjy commented Nov 21, 2019

@SEKIRO-J this has a bunch of conflicts

@SEKIRO-J
Copy link
Contributor Author

SEKIRO-J commented Nov 21, 2019

@SEKIRO-J this has a bunch of conflicts
This PR is blocked by #8812. Will factor most of the code after that PR is done.

@stale
Copy link

stale bot commented Jan 20, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jan 20, 2020
@stale
Copy link

stale bot commented Feb 17, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Feb 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants