[CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table. #3608
Conversation
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/214/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1916/
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/221/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1923/
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/229/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1931/
26cb931 to c5c1b46 Compare
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/235/
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/236/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1938/
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/237/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1939/
7807f9f to dd5eb17 Compare
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/244/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1946/
dd5eb17 to dbf4283 Compare
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/248/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1951/
dbf4283 to 9ce0d3f Compare
retest this please
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/252/
please add description
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1955/
9ce0d3f to bfb367b Compare
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/261/
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/264/
cd239e4 to a0a2609 Compare
* threshold of high cardinality
*/
@CarbonProperty
public static final String HIGH_CARDINALITY_THRESHOLD = "high.cardinality.threshold";
I think this is not required now
yes, removed
/**
* Default value for SI segment Compaction / merge small files
* Making this true degrades the LOAD performance
please explain in the comment when the user should set this to true?
done
@@ -2341,4 +2347,78 @@ private CarbonCommonConstants() {
* Default first day of week
*/
public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY";

@CarbonProperty
public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER =
Please explain in the comment when the user should set this to true.
/**
* Method to prune the segments based on task min/max values
*
* @param segments
remove it if you are not writing description
done
@@ -32,14 +32,14 @@
* It is the wrapper around datamap and related filter expression. By using it user can apply
* datamaps in expression style.
*/
public interface DataMapExprWrapper extends Serializable {
public abstract class DataMapExprWrapper implements Serializable {
you can still use an interface; a Java 8 interface can have default implementations
since some users are still using older Java versions, I think we can keep it this way until we completely move out.
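As a side note, a minimal sketch of the reviewer's suggestion (the method names below are illustrative only, not the actual DataMapExprWrapper API): a Java 8 interface can carry default implementations, so a plain interface could still serve the same purpose.

import java.io.Serializable;

// Illustrative sketch only; not the actual CarbonData API.
interface ExprWrapper extends Serializable {

  // Every implementation must provide this.
  String getUniqueId();

  // Shared no-op behaviour can live in a default method instead of an abstract class.
  default void validate() {
    // no-op by default; override where validation is needed
  }
}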
public void serializeMemoryBlock() {
}

public void copyToMemoryBlock() {
why empty implementation? not abstract?
since these are required only for the unsafe implementation.
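To illustrate the design choice being discussed, a minimal sketch (the class names are made up, not the actual CarbonData types): the base class keeps no-op bodies so that only the unsafe (off-heap) variant has to override them, while on-heap variants simply inherit the no-ops.

// Illustrative sketch only; these are not the real CarbonData classes.
abstract class AbstractBlockStore {

  // No-op for on-heap stores; only the unsafe store needs these hooks,
  // which is why they are not declared abstract.
  public void serializeMemoryBlock() {
  }

  public void copyToMemoryBlock() {
  }
}

class UnsafeBlockStore extends AbstractBlockStore {

  @Override
  public void copyToMemoryBlock() {
    // copy the row data into an off-heap (unsafe) memory block here
  }
}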
identifierWrapper.getConfiguration(), indexInfos);
identifierWrapper.isAddToUnsafe(),
identifierWrapper.getConfiguration(),
identifierWrapper.isSerializeDmStore(), indexInfos);
move indexInfos to next line
done
identifierWrapper.getConfiguration(), indexInfos);
identifierWrapper.isAddToUnsafe(),
identifierWrapper.getConfiguration(),
identifierWrapper.isSerializeDmStore(), indexInfos);
move indexInfos to next line
done
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/265/
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/268/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1971/
* merge the data files for upcoming loads or run SI rebuild command which does this job for all
* segments. (REBUILD INDEX <index_table>)
*/
public static final String DEFAULT_CARBON_SI_SEGMENT_MERGE = "false";
DEFAULT_CARBON_SI_SEGMENT_MERGE => CARBON_SI_SEGMENT_MERGE_DEFAULT
done
@@ -61,7 +61,8 @@

public DiskBasedDMSchemaStorageProvider(String storePath) {
this.storePath = CarbonUtil.checkAndAppendHDFSUrl(storePath);
this.mdtFilePath = storePath + CarbonCommonConstants.FILE_SEPARATOR + "datamap.mdtfile";
this.mdtFilePath = CarbonUtil.checkAndAppendHDFSUrl(
this.mdtFilePath = this.storePath + CarbonCommonConstants.FILE_SEPARATOR + "datamap.mdtfile";
done
* inside the secondary index table, we need to delete the stale carbondata files.
* refer {@link org.apache.spark.sql.secondaryindex.rdd.CarbonSIRebuildRDD}
*/
private static void cleanUpDataFilesAfterSmallFIlesMergeForSI(CarbonTable table,
cleanUpDataFilesAfterSmallFIlesMergeForSI => cleanUpDataFilesAfterSmallFilesMergeForSI
done
case indexTableName ~ table ~ cols ~ indexStoreType ~ tblProp =>

if (!("carbondata".equalsIgnoreCase(indexStoreType) ||
"org.apache.carbondata.format".equalsIgnoreCase(indexStoreType))) {
support only "carbondata"
done
}

protected lazy val showIndexes: Parser[LogicalPlan] =
(SHOW ~> opt(FORMATTED)) ~> (INDEXES | INDEX) ~> ON ~> ident ~ opt((FROM | IN) ~> ident) <~
how about
(SHOW ~> INDEXES ~> ON ~> (ident <~ ".").? ~ ident
done
throw new ErrorMessage(
s"Parent Table ${ carbonTable.getDatabaseName }." +
s"${ carbonTable.getTableName }" +
s" is Partition Table and Secondary index on Partition table is not supported ")
is it easy to support it in the future?
yes, there will be some issues in the pruning part, but we can support it
## See the License for the specific language governing permissions and
## limitations under the License.
## ------------------------------------------------------------------------
org.apache.spark.sql.CarbonSource
already exists in integration/spark2 module
removed
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/275/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1978/
2d68230 to ec1d823 Compare
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/278/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1981/
LGTM
Why is this PR needed?
There is a random failure of 11 test cases in CI.

What changes were proposed in this PR?
a) After analyzing and adding logs, it was found that the test case added in #3608 did not reset the carbon.enable.auto.load.merge carbon property. The failure is random: if any other test case in CarbonIndexFileMergeTestCase (which resets that property) runs before the offending test case, it sets the value to false and masks the issue, and the order of test case execution within a suite is not always the same.
b) The test case added by #3832 has a problem: "test cdc with compaction" fails randomly because segment 12.2 may not exist, and the merge carbon property is not reset when it fails. With auto compaction, segment 0.1 sometimes goes to marked-for-delete during the merge operation. There is no record mismatch; segment 12.2 simply does not exist and the data ends up in a segment with a different name. The check for segment 12.2 has been removed.

Does this PR introduce any user interface change?
No

Is any new testcase added?
No

This closes #3948
Why is this PR needed?
Currently we have datamaps such as the default datamaps, which are block and blocklet, coarse-grained datamaps like bloom, and fine-grained datamaps like lucene, which help in better pruning during queries. What if we introduce another kind of datamap which can hold the blocklet ID as an index? At the initial level we call it an index, and it works as a child table to the main table, like the MV in our current code.
Yes, let's introduce the secondary index on the carbon table. It is a child table to the main table and can be created on a column, just as we create a lucene datamap by giving index columns. In the same way, we create a secondary index on a column, so the index entries for that column are blocklet IDs, which helps in better pruning and faster queries when there is a filter on the index column.
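For illustration, a rough usage sketch (the table, column, and index names are made up, and it assumes a SparkSession configured with the Carbon extensions; the exact DDL grammar is defined by the parser changes in this PR):

import org.apache.spark.sql.SparkSession;

public class SecondaryIndexExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("si-example")
        .getOrCreate();

    // Create a secondary index on a frequently filtered column of an existing carbon table.
    // The index table stores the mapping from column values to blocklet IDs used for pruning.
    spark.sql("CREATE INDEX sales_customer_idx ON TABLE sales (customer_id) AS 'carbondata'");

    // Filter queries on the indexed column can now be pruned through the secondary index.
    spark.sql("SELECT * FROM sales WHERE customer_id = 'C001'").show();

    // Merge the small SI data files for all segments, as mentioned above.
    spark.sql("REBUILD INDEX sales_customer_idx");
  }
}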
What changes were proposed in this PR?
Introduced the SI (Secondary Index) feature.
It contains:
Does this PR introduce any user interface change?
Is any new testcase added?