avro-extensions -- feature to specify avro reader schema inline #3249

Merged: 1 commit merged into apache:master on Aug 10, 2016

Conversation

@himanshug (Contributor) opened this pull request.

@skoppar commented Aug 4, 2016

This is something we need in our setup. Thanks, Himanshu. Which release is this expected to be a part of?


Inline review comment on the diff:

private final Schema schemaObj;

private Map<String, Object> schema;

Reviewer (Contributor): final?

@himanshug (Contributor Author): added
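
A minimal sketch of the resolved declarations, reconstructed from the review exchange above (the surrounding class is not shown in the diff context):

// Both fields final after the review fix; types as shown in the diff.
private final Schema schemaObj;
private final Map<String, Object> schema;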

@drcrallen (Contributor)

This should be reasonable to include in 0.9.2, and even more so if there's a community need.

@himanshug (Contributor Author)

@skoppar do you have a single schema for all the events in the Kafka topic, or multiple? Is it possible for you to test this patch?

Are you already trying to use the avro extension with schema_repo and facing some error?

@nishantmonu51 (Member)

LGTM, 👍

@himanshug changed the title from "WIP: avro-extensions -- feature to specify avro reader schema inline" to "avro-extensions -- feature to specify avro reader schema inline" on Aug 5, 2016
@himanshug (Contributor Author)

updated the docs.

@skoppar commented Aug 5, 2016

Hi Himanshu,
I haven't tried the schema repo; I was about to extend AvroBytesDecoder myself, and that's when I saw your comment on a Google Groups thread: https://groups.google.com/forum/#!searchin/druid-user/avro$20druid%7Csort:relevance/druid-user/72NKnm6eBRY/VCgyaTKEAwAJ

I am testing your patch now. Will keep you posted.

@skoppar commented Aug 8, 2016

publicAvroKafka.txt

Hi Himanshu,

I am not able to get the Avro stream to work with the new jar. I packaged your changes, which produced "druid-avro-extensions-0.9.2-SNAPSHOT.jar". I moved the original jar into a subdirectory so that there are no conflicts and copied the snapshot jar into the current directory (imply-1.3.0/dist/druid/extensions/druid-avro-extensions). I also changed imply-1.3.0/conf-quickstart/druid/_common/common.runtime.properties to include the Avro library:

druid.extensions.loadList=["druid-avro-extensions", "druid-histogram", "druid-datasketches", "druid-kafka-indexing-service"]

I've modified kafka.json in imply-1.3.0/conf-quickstart/tranquility to the one attached.

Upon starting imply with:
bin/supervise -c conf/supervise/quickstart.conf

I do not see any topics being picked up in the tranquility logs [imply-1.3.0/var/sv/tranquility-kafka.log]:

2016-08-08 17:27:56,913 [main] INFO k.c.ZookeeperConsumerConnector - [cust1-tranquility_skoppar-mac15.com-1470677276325-37ef097f], Creating topic event watcher for topics ((?!))
2016-08-08 17:27:56,929 [main] INFO k.c.ZookeeperConsumerConnector - [cust1-tranquility_skoppar-mac15.com-1470677276325-37ef097f], Topics to consume = List()

Doesn't that mean that no topics have been registered?

The only relevant entries in the log are:

2016-08-08 17:27:56,151 [main] INFO c.m.t.kafka.writer.WriterController - Ready: [topicPattern] -> dataSource mappings:
2016-08-08 17:27:56,152 [main] INFO c.m.t.kafka.writer.WriterController - [(?!)] -> AvroSource (priority: 1)

I am new to Druid, so please pardon my ignorance. Is the publicAvrokafka.json file right? At this point I haven't added any business functionality; the derived fields come from an ETL process outside of Druid. I just want to be able to ingest Avro data through Kafka, and that doesn't seem to work. To be sure it's not a Kafka issue, I have tried kafka-console-consumer, and I can receive events there.

regards
Sunita

@himanshug (Contributor Author)

@skoppar I haven't really used Kafka with Tranquility, and the information provided does not look like an Avro parsing failure.
I don't see any Kafka topic input information in your spec, so I'm not sure how you provide it to Tranquility... @gianm can you spot something?

@skoppar commented Aug 8, 2016

Thanks for the quick response, Himanshu.
From what I understand,

"properties": {
  "topicPattern": "raw-cust1"
}

provides the topic, which is at the per-datasource level, while the properties below provide the overall cluster connection. Adding kafka.group.id to the per-datasource specs might be more advisable, but I haven't tried that yet. I am using all default ports.

"kafka.group.id": "cust1-tranquility",
"zookeeper.connect": "localhost",
"kafka.zookeeper.connect": "localhost"

regards
Sunita

@skoppar commented Aug 9, 2016

Another data point: ZooKeeper is able to recognize the consumer group, and the group is visible in the ZooKeeper shell. From the Tranquility side, the ZooKeeper connection also seems to be present. The Kafka broker connection is what seems to be missing, and the topic list is empty, as mentioned in the previous comment. I'd appreciate any help, @gianm.
I also tried with 0.8.2.0-kafka-1.3.0 instead of 0.9.0-kafka-2.0.1.

@nishantmonu51 (Member) commented Aug 9, 2016

@skoppar it seems like you have specified the properties in the wrong place in the JSON spec file.
The global properties need to be defined outside the spec block, as done here:
https://github.com/druid-io/tranquility/blob/master/distribution/src/universal/conf/kafka.json.example
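
For illustration, a minimal sketch of that nesting, built from the topic and connection values in the earlier comments (the empty "spec" object stands in for the dataSchema/tuningConfig blocks, and the exact key set may differ by Tranquility version; see the linked example for a full file):

{
  "dataSources" : [
    {
      "spec" : { },
      "properties" : {
        "topicPattern" : "raw-cust1"
      }
    }
  ],
  "properties" : {
    "zookeeper.connect" : "localhost",
    "kafka.zookeeper.connect" : "localhost",
    "kafka.group.id" : "cust1-tranquility"
  }
}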

@skoppar commented Aug 9, 2016

Thanks, Nishant, for bringing this to my notice.
Yes, I see there is some discrepancy in the spec now. It was indeed the issue with recognizing the topic: after moving the topic details out of the spec block, Tranquility is able to find the Kafka topic.

I'm currently getting some class-loading issues with the new jar. I probably need to double-check the extensions loadList. Will update here.

@skoppar commented Aug 9, 2016

Hi Himanshu,

It looks like there is a dependency on io.druid.druid-api-0.9.1.jar, based on the errors I see. imply-1.3.0/dist/tranquility/lib has a huge list of jars; I hope I only have to replace a few of them. Are there any documented dependencies? Attaching the exception I get now.
WithPR.txt

@gianm (Contributor) commented Aug 9, 2016

@skoppar what is the Tranquility config you are using now?

@skoppar commented Aug 9, 2016

@gianm attaching the current tranquility-kafka json file.
publicAvroKafka.txt

@himanshug (Contributor Author)

@skoppar @gianm given that the exception came from StringInputRowParser, I believe the avro_stream parser did not get recognized, which probably means the avro-extensions module wasn't loaded. Did you see any avro-extensions loading failures, or a message about avro-extensions being loaded, in the logs?

@skoppar commented Aug 9, 2016

@himanshug it doesn't look like loading the jar is the issue, though I could be wrong. Attaching the indexing log from imply-1.3.0/var/druid/indexing-logs, which shows druid-avro-extensions-0.9.2-SNAPSHOT.jar being added. Please let me know if I am looking in the wrong place.
index_realtime_metrics-kafka.txt

@himanshug (Contributor Author)

@skoppar one of the problems in your JSON is that "parseSpec" is inside "avroBytesDecoder", while it should be one level up, inside "parser".
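
For illustration, a sketch of the intended nesting, with "parseSpec" as a sibling of "avroBytesDecoder" inside "parser". The placeholder values, the "schema_inline" decoder type (the name this feature uses in the Druid docs), and the "timeAndDims" format are assumptions to be checked against the avro-extensions docs for your Druid version:

"parser" : {
  "type" : "avro_stream",
  "avroBytesDecoder" : {
    "type" : "schema_inline",
    "schema" : { }
  },
  "parseSpec" : {
    "format" : "timeAndDims",
    "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
    "dimensionsSpec" : { "dimensions" : [] }
  }
}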

@skoppar commented Aug 9, 2016

Oops! Let me take another look at the complete JSON.

@skoppar commented Aug 9, 2016

It's working now. Two more changes I had to make were:

  1. Specify the format in parseSpec
  2. Add the path to the Druid extensions manually in quickstart.conf

With that done to the attached conf, I am able to process records with the schema inline. However, I do not see the datasource in the Druid console or Pivot. I can see metrics-kafka (I had run the example), so I was expecting this one to show up as well.

I'm using topic name raw-avro for this test. Below is what the tranquility-kafka log shows: the Avro schema, and then this message:
2016-08-09 21:34:23,038 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {raw-avro={receivedCount=800, sentCount=0, droppedCount=0, unparseableCount=800}} pending messages in 0ms and committed offsets in 3ms.

@himanshug (Contributor Author)

@skoppar glad to know that it worked out for you and thanks for testing the patch.

@fjy (Contributor) commented Aug 10, 2016

👍

@fjy merged commit 46da682 into apache:master on Aug 10, 2016

@skoppar commented Aug 10, 2016

Well, not completely. I'm still trying to figure out why segments are not getting created, but the patch did work well in terms of pulling the schema and getting the right count of events, so this must be something else I am missing. I have 800 messages received and unparseableCount=800. I tried changing the timestamp to be current, but still get the same results. I'd appreciate any inputs you have on this; however, it probably is not related to the changes in this PR.

@himanshug (Contributor Author)

@skoppar enable debug logging, and that will show you more info.

@skoppar commented Aug 10, 2016

Figured it out with the help of a Kafka consumer class. The console consumer showed the data just fine, so I couldn't spot the error there. The root cause is that I was sending the Avro messages using StringEncoder; using DefaultEncoder on the producer side fixed the issue. Thanks @himanshug and @gianm. Awaiting the 0.9.2 release :) 👍
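
For anyone hitting the same issue: with the old Scala producer API in Kafka 0.8.x/0.9.x (which this setup appears to use; an assumption), the fix amounts to configuring the producer to hand raw bytes to the broker instead of strings, e.g. in the producer properties:

# send raw byte[] payloads (already Avro-encoded) rather than string-encoded ones
serializer.class=kafka.serializer.DefaultEncoder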
