New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read header and footer if available #99
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebastian-alfers, thank you for the pull request, I had some comments on it. Let me know if you have any question regarding to them.
public abstract class AbstractAvroEventDeserializer implements EventDeserializer { | ||
|
||
protected AvroSchemaType schemaType; | ||
protected ResettableInputStream ris; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to change this to private final
: just add it to the constructor and replace the this.ris = ris
with a super(ris)
call in the subclasses.
There is no need for the subclasses to access this and pushing the responsibility of setting it (and also giving the possibility to modify it) to the subclasses would make it very error-prone.
= "flume.avro.schema.literal"; | ||
|
||
|
||
protected void initialize(ResettableInputStream ris) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter is not needed if AbstractAvroEventDeserializer.ris
is changed to private final
.
@@ -26,6 +26,7 @@ | |||
public enum EventDeserializerType { | |||
LINE(LineDeserializer.Builder.class), | |||
AVRO(AvroEventDeserializer.Builder.class), | |||
FLUME(FlumeEventAvroEventDeserializer.Builder.class), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FLUME_AVRO
would be a more expressive name for this.
Please add this to the user guide as well. Thanks.
private FlumeEventAvroEventDeserializer(Context context, ResettableInputStream ris) { | ||
this.ris = ris; | ||
|
||
schemaType = AvroSchemaType.valueOf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (line 48-57) is a copy-paste from AvroEventDeserializer
. Could you please simplify it by pushing it (from here and from the AvroEventDeserializer
) up to the AbstractAvroEventDeserializer
?
If the logic is moved to the parent class then its schemaType
field can be changed to private final
.
SeekableResettableInputBridge in = new SeekableResettableInputBridge(ris); | ||
long pos = in.tell(); | ||
in.seek(0L); | ||
fileReader = new DataFileReader<GenericRecord>(in, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: explicit type argument can be replaced with <>
@adenes Thanks fore your comments, I updated the pr |
@adenes any comments? |
@adenes ping |
@laxman-ch @adenes @simonati anyone likes to push this PR forward? Comments welcome |
Thanks for the heads up! Let me take a look today. |
This change is quite big and hard to follow. Please give me more time to finish the review or help any other reviewer by making consumption of your changes easier by leaving some more hints about the ideas behind your changes. |
@simonati Thanks for you comment. Actually, I wrote down all my ideas in the corresponding jira ticket. If you have any particular question feel free to point them out |
Push |
Can one of the admins verify this patch? |
dsync: prometheus
This commit fixes an issue, that header-values can not be restored correctly after an event was avro-serialized to disk.
The problem was, that deserializing the event from disk, the body of the event contained the binary and the header.
See: https://issues.apache.org/jira/browse/FLUME-2942
Discussion welcome!