New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-7098: File Metadata Metastore Plugin #1754
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vdiravka, there are two classes which I think it is better to leave in the exec module: SchemaContainer
and SchemaTransformer
.
Also, there are some other classes which would be good to move to the metastore-api module: TableMetadataProvider
, MetadataProviderManager
, TableMetadataProviderBuilder
and others. Could you please move them also?
@@ -76,7 +76,7 @@ static boolean isNullOrEmpty(ColumnStatistics stat) { | |||
|| !stat.containsStatistic(ColumnStatisticsKind.MIN_VALUE) | |||
|| !stat.containsStatistic(ColumnStatisticsKind.MAX_VALUE) | |||
|| !stat.containsStatistic(ColumnStatisticsKind.NULLS_COUNT) | |||
|| (long) stat.getStatistic(ColumnStatisticsKind.NULLS_COUNT) == GroupScan.NO_COLUMN_STATS; | |||
|| (long) stat.getStatistic(ColumnStatisticsKind.NULLS_COUNT) == Statistic.NO_COLUMN_STATS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are some extra spaces before Statistic
interface here and in other places were added. Could you please remove them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableMetadataProviderBuilder
and TableMetadataProvider
are moved to metastore
module.
MetadataProviderManager
depends on DrillStatsTable
, which hardly coupled with java-exec
. I will consider of creating additional abstractions here. I will update this answer then.
I have returned SchemaContainer
and SchemaTransformer
, but if MetadataProviderManager
is moved to metastore
module, SchemaProvider
along with SchemaContainer
should be in metastore
module too.
@@ -76,7 +76,7 @@ static boolean isNullOrEmpty(ColumnStatistics stat) { | |||
|| !stat.containsStatistic(ColumnStatisticsKind.MIN_VALUE) | |||
|| !stat.containsStatistic(ColumnStatisticsKind.MAX_VALUE) | |||
|| !stat.containsStatistic(ColumnStatisticsKind.NULLS_COUNT) | |||
|| (long) stat.getStatistic(ColumnStatisticsKind.NULLS_COUNT) == GroupScan.NO_COLUMN_STATS; | |||
|| (long) stat.getStatistic(ColumnStatisticsKind.NULLS_COUNT) == Statistic.NO_COLUMN_STATS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added minor comments connected with formatting.
I have a question regarding packing these new modules into jdbc-all
jar. Looks like a client should not use them, but they weren't excluded from the java-exec
module in jdbc-all
pom file. Should they be excluded, or they should be specified explicitly to be put into the jar?
@@ -151,13 +152,13 @@ public long getColumnValueCount(SchemaPath column) { | |||
long colNulls; | |||
if (columnStats != null) { | |||
Long nulls = (Long) columnStats.getStatistic(ColumnStatisticsKind.NULLS_COUNT); | |||
colNulls = nulls != null ? nulls : GroupScan.NO_COLUMN_STATS; | |||
colNulls = nulls != null ? nulls : Statistic.NO_COLUMN_STATS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove extra space here:
colNulls = nulls != null ? nulls : Statistic.NO_COLUMN_STATS; | |
colNulls = nulls != null ? nulls : Statistic.NO_COLUMN_STATS; |
return GroupScan.NO_COLUMN_STATS == tableRowCount | ||
|| GroupScan.NO_COLUMN_STATS == colNulls | ||
? GroupScan.NO_COLUMN_STATS : tableRowCount - colNulls; | ||
return Statistic.NO_COLUMN_STATS == tableRowCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here
return Statistic.NO_COLUMN_STATS == tableRowCount | |
return Statistic.NO_COLUMN_STATS == tableRowCount |
|| GroupScan.NO_COLUMN_STATS == colNulls | ||
? GroupScan.NO_COLUMN_STATS : tableRowCount - colNulls; | ||
return Statistic.NO_COLUMN_STATS == tableRowCount | ||
|| Statistic.NO_COLUMN_STATS == colNulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|| Statistic.NO_COLUMN_STATS == colNulls | |
|| Statistic.NO_COLUMN_STATS == colNulls |
? GroupScan.NO_COLUMN_STATS : tableRowCount - colNulls; | ||
return Statistic.NO_COLUMN_STATS == tableRowCount | ||
|| Statistic.NO_COLUMN_STATS == colNulls | ||
? Statistic.NO_COLUMN_STATS : tableRowCount - colNulls; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? Statistic.NO_COLUMN_STATS : tableRowCount - colNulls; | |
? Statistic.NO_COLUMN_STATS : tableRowCount - colNulls; |
} else { | ||
// merges specified schema with schema from table | ||
fields.forEach((schemaPath, majorType) -> { | ||
if (SchemaPathUtils.getColumnMetadata(schemaPath, schema) == null) { | ||
SchemaPathUtils.addColumnMetadata(schema, schemaPath, majorType); | ||
MetadataUtils.addColumnMetadata(schema, schemaPath, majorType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetadataUtils.addColumnMetadata(schema, schemaPath, majorType); | |
MetadataUtils.addColumnMetadata(schema, schemaPath, majorType); |
@@ -37,8 +36,8 @@ public Long mergeStatistics(Collection<? extends BaseMetadata> statistics) { | |||
long rowCount = 0; | |||
for (BaseMetadata statistic : statistics) { | |||
Long statRowCount = getValue(statistic); | |||
if (statRowCount == null || statRowCount == GroupScan.NO_COLUMN_STATS) { | |||
rowCount = GroupScan.NO_COLUMN_STATS; | |||
if (statRowCount == null || statRowCount == Statistic.NO_COLUMN_STATS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (statRowCount == null || statRowCount == Statistic.NO_COLUMN_STATS) { | |
if (statRowCount == null || statRowCount == Statistic.NO_COLUMN_STATS) { |
if (statRowCount == null || statRowCount == GroupScan.NO_COLUMN_STATS) { | ||
rowCount = GroupScan.NO_COLUMN_STATS; | ||
if (statRowCount == null || statRowCount == Statistic.NO_COLUMN_STATS) { | ||
rowCount = Statistic.NO_COLUMN_STATS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rowCount = Statistic.NO_COLUMN_STATS; | |
rowCount = Statistic.NO_COLUMN_STATS; |
@@ -50,7 +49,7 @@ public Long mergeStatistics(Collection<? extends BaseMetadata> statistics) { | |||
@Override | |||
public Long getValue(BaseMetadata metadata) { | |||
Long rowCount = (Long) metadata.getStatistic(this); | |||
return rowCount != null ? rowCount : GroupScan.NO_COLUMN_STATS; | |||
return rowCount != null ? rowCount : Statistic.NO_COLUMN_STATS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return rowCount != null ? rowCount : Statistic.NO_COLUMN_STATS; | |
return rowCount != null ? rowCount : Statistic.NO_COLUMN_STATS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- To move
MetadataProviderManager
tometastore
it is necessary to rework Stats classes. Since stats should be stored in Drill Metastore, additional abstractions should be created later. drill-metastore
classes are removed fromjdbc-all
, they are not needed for Drill clinets now. Possibly it can be reconsidered in future.- Spaces are removed, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the PR and addressing CR comments.
Changes look good, +1.
Also, please squash the commits.
1b1537d
to
bfc968b
Compare
The commits are squashed. The branch is rebased onto Drill master branch |
Introducing Drill Metastore module with main Metastore API. It also should be used for creating Drill Metastore plugins, they can be created as separate modules inside main Metastore module.
The main reason of this improvement is decoupling
drill-exec
anddrill-metastore
code.drill-metastore
is used as dependency fordrill-exec
.