AVRO-3266: Cast to PathOutputCommitter #1431
Change the cast of the outputCommitter to a PathOutputCommitter. This allows the system to use a different outputCommitter than just the FileOutputCommitter, for example the MagicS3GuardCommitter.
LGTM!
Hello! It looks like PathOutputCommitter was added in Hadoop 3.0.0. Is there an EOL date for Hadoop 2.x?
Hello! This ends up failing when compiled with the -Phadoop2 profile. I don't think Hadoop 2.x is EOL yet, so we may need to use a different tactic.
I imagine that we could generate two different artifact profiles for Apache Hadoop 2.x and 3.x, but it might be simpler to make the call via reflection, or behind an "instanceof" check. If I understand correctly, this method isn't called frequently (once per split, not once per record), so the cost of reflection should be negligible.
If you can think of a way to support both versions in the same artifact, in a maintainable and readable way, I'll be suitably impressed!
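As a rough illustration of the reflection idea above, here is a minimal, self-contained sketch. It is not the Avro or Hadoop code: `LegacyCommitter` and `CloudCommitter` are hypothetical stand-ins for `FileOutputCommitter` and a non-file committer, and plain strings stand in for Hadoop's `Path`. The point is only that looking up `getWorkPath()` by name avoids a compile-time dependency on a class that exists only in Hadoop 3.x.

```java
import java.lang.reflect.Method;

public class WorkPathLookup {

    // Hypothetical stand-in for FileOutputCommitter (not the real Hadoop class).
    public static class LegacyCommitter {
        public String getWorkPath() {
            return "/tmp/job/_temporary/attempt_0";
        }
    }

    // Hypothetical stand-in for a committer that is not a FileOutputCommitter.
    public static class CloudCommitter {
        public String getWorkPath() {
            return "s3a://bucket/job/__magic/attempt_0";
        }
    }

    // Hypothetical committer that exposes no work path at all.
    public static class NoWorkPathCommitter {
    }

    /**
     * Looks up a public, no-argument getWorkPath() method by name and invokes it,
     * so the caller never names the committer's concrete type at compile time.
     */
    public static String workPathOf(Object committer) {
        try {
            Method m = committer.getClass().getMethod("getWorkPath");
            return (String) m.invoke(committer);
        } catch (ReflectiveOperationException e) {
            // Covers a missing method, an inaccessible method, and exceptions
            // thrown by the invoked method itself.
            throw new IllegalStateException(
                "Committer " + committer.getClass().getName()
                    + " has no usable getWorkPath()", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(workPathOf(new LegacyCommitter()));
        System.out.println(workPathOf(new CloudCommitter()));
    }
}
```

Because the lookup happens once per split rather than once per record, the reflective call should not matter for throughput. In the real code the return type and checked exceptions of `getWorkPath()` would need to be handled as the Hadoop API declares them.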
LGTM. You will also need it for the fast and safe ABFS/GCS committer coming in MAPREDUCE-7341.
Are you using the new methods in |
I'm also interested in fixing this, so I created #1618, which uses reflection to invoke |
Avro has just upgraded its Hadoop dependency to 3.3.3 via Dependabot. Not sure if that was intentional. If Java 11 is wanted, it would have to be 3.2.0+ anyway.
@steveloughran Do you mean #1697? But there is also Line 469 in 7997697
I guess this one could be bumped to 2.10.2 before releasing 1.11.1.
Sorry, I got confused and am now confusing everyone else. I've been looking at the build, and here is my understanding now:
I think it is time to turn off Hadoop 2 support. If Avro builds are Java 8+ anyway, they aren't that likely to run on the Hadoop 2.x clusters still in production. I know this because whenever I have to produce something that works on a cluster of that era, I have to use a Java 7 JDK to make the release. Do that and I will provide the shim library to get at the maximum-performance IO operations available against HDFS and cloud storage, when present.
See AVRO-6535: we should revisit this implementation as we drop Hadoop 2.
Fixes Output stream incompatible with MagicS3GuardCommitter