Fix NPE when partitionNumber 0 does not exist. #1190
Conversation
Can you also add a unit test for this fix?
Let me make sure I understand this fix correctly. Essentially you are taking away the assumption that the chunk is chunk 0, but retaining the assumption that there is only one chunk. Is that correct?
```diff
-import com.google.common.collect.Lists;
-import com.google.common.collect.Ordering;
-import com.google.common.collect.Sets;
+import com.google.common.collect.*;
```
Convention is to have all the imports spelled out explicitly.
OK, I will fix it.
Instead of getting partition 0, I just try to get the first one using the iterator. For the IndexerSQLMetadataStorageCoordinator that does not work, so I iterate over all partitions.
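The two lookups described above can be sketched with a toy partition holder. Note these are illustrative stand-ins: `Holder`, `getChunk`, and the segment strings model the behavior being discussed, not Druid's real `PartitionHolder` API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

// Toy stand-in for a partition holder: chunks keyed by partition number.
class Holder implements Iterable<String> {
  private final TreeMap<Integer, String> chunks = new TreeMap<Integer, String>();

  void add(int partitionNum, String chunk) { chunks.put(partitionNum, chunk); }

  // Returns null when the requested partition number is absent, which is
  // what made the original getChunk(0)-style lookup NPE downstream.
  String getChunk(int partitionNum) { return chunks.get(partitionNum); }

  public Iterator<String> iterator() { return chunks.values().iterator(); }
}

public class ChunkLookup {
  public static void main(String[] args) {
    Holder holder = new Holder();
    holder.add(1, "part-1");   // partition 0 is missing (the misconfigured case)
    holder.add(2, "part-2");

    // Fixed lookup: take the first chunk that actually exists.
    String first = holder.iterator().next();
    System.out.println(first);                     // part-1

    // Coordinator-style lookup: iterate over all partitions instead.
    List<String> all = new ArrayList<String>();
    for (String chunk : holder) { all.add(chunk); }
    System.out.println(all);                       // [part-1, part-2]
  }
}
```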
@msprunck: I submitted a PR to your branch which completely removes chunk count assumptions, though I have very limited capacity to test it:
@drcrallen: Thank you. I will test it on my index task as soon as possible.
```diff
@@ -267,7 +267,7 @@ public Sink getSink(long timestamp)
     throw new ISE("No timeline entry at all!");
   }

-  final Sink theSink = holder.getObject().getChunk(0).getObject();
+  final Sink theSink = holder.getObject().iterator().next().getObject();
```
I missed one >.<
I have no idea how to manage this case.
The realtime plumber always uses SingleElementPartitionChunk, so there actually is no bug here to fix.
@gianm, @drcrallen: I removed the fix from the PR.
@nishantmonu51: I was able to get a unit test up at: The test fails in master but passes with the changes in this branch.
It will return an empty list if the timeline does not find all chunks, so currently the IngestSegmentFirehoseFactory should either yield ALL segments or NO segments.
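That all-or-nothing behavior can be sketched as follows. This is a minimal model, assuming a lookup that only succeeds when every expected partition number is present; `lookupAll` and the segment strings are hypothetical names, not the real factory methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AllOrNothing {
  // Yields the chunks only if every expected partition number is present;
  // otherwise yields NO segments, mirroring the timeline behavior above.
  static List<String> lookupAll(Map<Integer, String> chunks, Set<Integer> expected) {
    if (!chunks.keySet().containsAll(expected)) {
      return Collections.emptyList();
    }
    List<String> result = new ArrayList<String>();
    for (int p : expected) {
      result.add(chunks.get(p));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, String> chunks = new HashMap<Integer, String>();
    Set<Integer> expected = new TreeSet<Integer>(Arrays.asList(0, 1));

    chunks.put(1, "part-1");                         // partition 0 missing
    System.out.println(lookupAll(chunks, expected)); // []

    chunks.put(0, "part-0");
    System.out.println(lookupAll(chunks, expected)); // [part-0, part-1]
  }
}
```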
I got the following injection error:
To test the missing chunk 0, you could start your segment list from partition number 1 in the IngestSegmentFirehoseFactoryTest::setUpStatic method.
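A minimal sketch of that test setup, using plain collections rather than the real IngestSegmentFirehoseFactoryTest fixtures (all names here are stand-ins):

```java
import java.util.TreeMap;

public class MissingChunkZeroSketch {
  public static void main(String[] args) {
    // Build the segment list starting from partition num 1 so chunk 0 is absent.
    TreeMap<Integer, String> partitions = new TreeMap<Integer, String>();
    for (int i = 1; i <= 3; i++) {
      partitions.put(i, "segment-part-" + i);
    }

    // The pre-fix lookup would fetch partition 0 and NPE on the null result.
    System.out.println(partitions.get(0));                     // null

    // The fixed lookup just takes the first available chunk.
    System.out.println(partitions.values().iterator().next()); // segment-part-1
  }
}
```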
Hmm, would you mind sharing the load JSON you are using? Please sanitize it first.
The JSON of the task:
This error seems to come directly from the overlord. I receive it as an HTTP response when submitting the job. There is no binding for TaskToolboxFactory in CliOverlord like there is in CliPeon. It seems linked to the change you made in IngestSegmentFirehoseFactory.java: vigiglobe@853f1c2
Yes, that's what I was thinking. Once I'm in the office I'll get a new patch.
The less-obvious problem is that injecting an injector is "something we try to avoid".
@msprunck: as a sanity check, can you please confirm that all the segments load properly on historical nodes?
I'm running locally with the following config and it is going fine so far:

```json
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "tpch_year2",
      "granularitySpec" : {
        "intervals" : [
          "1990-01-01T00:00:00.000Z/2000-01-01T00:00:00.000Z"
        ],
        "segmentGranularity" : "YEAR",
        "type" : "uniform"
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "__time",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions" : null,
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "COUNT",
          "type" : "count"
        },
        {
          "fieldName" : "l_quantity",
          "name" : "L_QUANTITY_longSum",
          "type" : "longSum"
        },
        {
          "fieldName" : "l_extendedprice",
          "name" : "L_EXTENDEDPRICE_doubleSum",
          "type" : "doubleSum"
        },
        {
          "fieldName" : "l_discount",
          "name" : "L_DISCOUNT_doubleSum",
          "type" : "doubleSum"
        },
        {
          "fieldName" : "l_tax",
          "name" : "L_TAX_doubleSum",
          "type" : "doubleSum"
        },
        {
          "fieldName" : "l_comment",
          "name" : "L_COMMENT_HLL",
          "type" : "hyperUnique"
        },
        {
          "fieldName" : "l_orderkey",
          "name" : "L_ORDERKEY_HLL",
          "type" : "hyperUnique"
        }
      ]
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "ingestSegment",
        "dataSource" : "tpch_year",
        "interval" : "1990-01-01T00:00:00.000Z/2000-01-01T00:00:00.000Z"
      }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : -1,
      "rowFlushBoundary" : 0,
      "numShards" : 1
    }
  }
}
```

I'll give more info once it has completed.
It succeeded at least. |
Well, it succeeded, but it's really easy to screw up the metric naming when re-ingesting.
It seems to work now, but it has been running for more than 5 hours for one day of data, which is too slow. A batch ingestion on the same machine took me 2 hours.
I don't think firehose ingestion has anywhere near the parallelism of batch ingestion.
Hi @msprunck, this pull request is not mergeable right now. Do you mind syncing with master? Also, please sign our CLA here: http://druid.io/community/cla.html
@msprunck: FYI, TravisCI fixes just went in a couple of minutes ago; if this is failing on the AsyncQueryForwardingServletTest, it may be worth simply closing and reopening the PR.
Wait, never mind; I was just informed @xvrl was able to restart the PR build jobs en masse.
@msprunck: Can you squash the commits as per https://github.com/druid-io/druid/blob/master/CONTRIBUTING.md please?
Cool! 👍
Since I also contributed to this, we'll probably need a non-PR-contributor to also give it a review.
```java
        return new SegmentToMergeHolder(segment, input.getInterval(), file);
      }
    }
final List<SegmentToMergeHolder> segmentsToMerge = Lists.newArrayList(
```
`segmentsToMerge` can just be an Iterable; there is no need to materialize the list.
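The reviewer's point, sketched with plain Java 8 streams rather than Guava (assumed equivalent for illustration): an eager copy materializes every element up front, while an Iterable view transforms elements only as they are consumed, which is all a downstream for-each loop needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LazyVsEager {
  public static void main(String[] args) {
    List<String> segments = Arrays.asList("seg-0", "seg-1", "seg-2");

    // Eager: builds the whole second list immediately
    // (analogous to wrapping a transformed iterable in Lists.newArrayList).
    List<String> materialized = segments.stream()
        .map(s -> s + "-holder")
        .collect(Collectors.toList());

    // Lazy: a view that maps each element only when the caller iterates.
    Iterable<String> lazy = () -> segments.stream().map(s -> s + "-holder").iterator();

    List<String> consumed = new ArrayList<String>();
    for (String s : lazy) {
      consumed.add(s);
    }

    System.out.println(materialized); // [seg-0-holder, seg-1-holder, seg-2-holder]
    System.out.println(consumed);     // [seg-0-holder, seg-1-holder, seg-2-holder]
  }
}
```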
@msprunck: put in vigiglobe#4 for the comments.
@msprunck: I can carry the PR in an MMX repo if you want to take a break from the last mile of the PR :) You'll still get credit.
* Replace with generic iterables
I merged your PR. It should be OK now :-)
Fix NPE when partitionNumber 0 does not exist.
Fixes #1053

As described in #1053 and in the Google group (https://groups.google.com/forum/#!msg/druid-development/k-Q-Rj52czk/AOLtNu0Dr1kJ), when using the IngestSegmentFirehose, an NPE is thrown when there is no partitionNumber 0 in a segment due to a misconfiguration. I replaced getChunk(0) with iterator().next().