-
Notifications
You must be signed in to change notification settings - Fork 704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CARBONDATA-4137] Refactor CarbonDataSourceScan without the soruces.Filter of Spark 3 #4097
Conversation
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5503/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3738/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5504/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3739/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5506/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3741/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3744/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5509/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5511/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3746/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5514/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3749/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5515/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3750/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5518/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3753/ |
retest this please |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3754/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5519/ |
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5520/ |
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3755/ |
7f5a5c1
to
b62aad2
Compare
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5521/ |
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3756/ |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5098/ |
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3339/ |
b62aad2
to
10b6a95
Compare
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5108/ |
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3349/ |
@@ -59,23 +58,3 @@ class InPolygonRangeListUDF extends ((String, String) => Boolean) with Serializa | |||
true // Carbon applies the filter. So, Spark do not have to apply filter. | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this filter is use for Spatial index, if remove how about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
it will still keep the udf and expression for Spatial index
-
In the old flow, carbon converts spark expression to spark filter, and then converts spark filter to carbon expression.
So, after creating carbon expression, this filter will be removed from memory.
Now in the new flow, it will convert spark expression to carbon expression directly, and not need this filter.
@@ -34,13 +32,3 @@ class TextMatchMaxDocUDF extends ((String, Int) => Boolean) with Serializable { | |||
v1.length > 0 | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same as the Spatial index, if removed, how to use the fun.
…ilter of Spark 3 Why is this PR needed? 1. In spark version 3, org.apache.spark.sql.sources.Filter is sealed, carbon can't extend it in carbon code. 2. The name of CarbonLateDecodeStrategy class is incorrect, the code is complex and hard to read 3. CarbonDataSourceScan can be the same for 2.3 and 2.4, and should support both batch reading and row reading. What changes were proposed in this PR? 1. translate spark Expression to carbon Expression directly, skip the spark Filter step. Remove all spark Filters in carbon code. old follow: Spark Expression => Spark Filter => Carbon Expression new follow: Spark Expression => Carbon Expression 2. Remove filter reorder, need to implement expression reorder (added CARBONDATA-4138). 3. separate CarbonLateDecodeStrategy to CarbonSourceStrategy and DMLStrategy, and simplify the code of CarbonSourceStrategy. 4. move CarbonDataSourceScan back to the source folder, use one CarbonDataSourceScan for all versions CarbonDataSourceScan supports both VectorReader and RowReader, Carbon will not use RowDataSourceScanExec. Does this PR introduce any user interface change? No Is any new testcase added? No
10b6a95
to
314db91
Compare
LGTM |
merged |
Why is this PR needed?
What changes were proposed in this PR?
translate spark Expression to carbon Expression directly, skip the spark Filter step. Remove all spark Filters in carbon code.
old follow: Spark Expression => Spark Filter => Carbon Expression
new follow: Spark Expression => Carbon Expression
Remove filter reorder, need to implement expression reorder (added CARBONDATA-4138).
separate CarbonLateDecodeStrategy to CarbonSourceStrategy and DMLStrategy, and simplify the code of CarbonSourceStrategy.
move CarbonDataSourceScan back to the source folder, use one CarbonDataSourceScan for all versions
CarbonDataSourceScan supports both VectorReader and RowReader, Carbon will not use RowDataSourceScanExec.
Does this PR introduce any user interface change?
Is any new testcase added?