[CARBONDATA-4122] Use CarbonFile API instead of java File API for Flink CarbonLocalWriter #4090
Conversation
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5438/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3677/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5439/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3678/
Force-pushed from 5171f27 to 42567df
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5440/ |
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3679/ |
docs/flink-integration-guide.md (Outdated)

```diff
@@ -78,7 +78,7 @@ limitations under the License.
 val carbonProperties = new Properties
 // Set the carbon properties here, such as date format, store location, etc.

-// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'.
+// Create carbon bulk writer factory. Three writer types are supported: 'Local', 'Hdfs' and 'S3'.
```
If we use the file factory API everywhere and support HDFS conf input, one writer should be enough, right? Do we really need three writers?
In the carbon table code and the SDK we don't create multiple writer types to handle this kind of scenario.
Yes, I thought about the same. But since writers for the LOCAL and S3 types were already implemented, I implemented one for HDFS. However, I can see some differences specific to the S3 writer: it needs some extra configuration, and it does not create directories when writing stage directories in S3. You can check CarbonS3Writer.commit.
Changed the code to use CarbonLocalWriter itself to handle both the Local and HDFS file systems. Please review.
Force-pushed from 42567df to 8a6d987
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5450/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3689/
LGTM
Why is this PR needed?
Currently, only two writer types (Local & S3) are supported for Flink carbon streaming. If a user wants to ingest data from Flink in carbon format directly into an HDFS carbon table, there is no writer type that supports it.
What changes were proposed in this PR?
Since the code for writing Flink stage data is the same for the Local and HDFS file systems, the existing CarbonLocalWriter can also write data into HDFS by using the CarbonFile API instead of the java File API.
Changed the code to use the CarbonFile API instead of java.io.File.
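The idea behind the change can be sketched as follows (in Python for brevity; the class and method names here are illustrative stand-ins, not CarbonData's actual CarbonFile/FileFactory signatures): a writer that talks only to a file abstraction, rather than to java.io.File directly, works unchanged whether the backing store is the local disk or HDFS.

```python
import os
from abc import ABC, abstractmethod

class StageFile(ABC):
    """Illustrative stand-in for a CarbonFile-style abstraction."""
    @abstractmethod
    def exists(self) -> bool: ...
    @abstractmethod
    def mkdirs(self) -> None: ...

class LocalStageFile(StageFile):
    """Local-disk backend. An HDFS-backed class would implement the same
    interface, which is why a separate HDFS writer type is unnecessary."""
    def __init__(self, path: str):
        self.path = path
    def exists(self) -> bool:
        return os.path.exists(self.path)
    def mkdirs(self) -> None:
        os.makedirs(self.path, exist_ok=True)

def commit_stage(stage_dir: StageFile) -> None:
    # The writer's commit step only uses the abstraction, mirroring how
    # this PR routes CarbonLocalWriter through the CarbonFile API.
    if not stage_dir.exists():
        stage_dir.mkdirs()
```

With this shape, supporting HDFS is a matter of supplying an HDFS-backed file implementation, rather than cloning the writer.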
Does this PR introduce any user interface change?
Is any new testcase added?