[GOBBLIN-1488] Added option to set perm group at table level #3334
aplex merged 2 commits into apache:master
Codecov Report
@@             Coverage Diff              @@
##             master    #3334      +/-   ##
============================================
+ Coverage     46.54%   46.55%   +0.01%
- Complexity    10133    10138       +5
============================================
  Files          2051     2051
  Lines         79537    79546       +9
  Branches       8878     8880       +2
============================================
+ Hits          37018    37035      +17
+ Misses        39089    39082       -7
+ Partials       3430     3429       -1
aplex left a comment:
We also have "writer.group.name" and "publisher.final.dir.group". Do they not cover this use case, or do we need extra flexibility here?
public static final String DATA_PUBLISHER_OVERWRITE_ENABLED = DATA_PUBLISHER_PREFIX + ".overwrite.enabled";
// This property is used to specify the owner group of the data publisher final output directory
public static final String DATA_PUBLISHER_FINAL_DIR_GROUP = DATA_PUBLISHER_PREFIX + ".final.dir.group";
public static final String DATA_PUBLISHER_OUTPUT_DIR_GROUP = DATA_PUBLISHER_PREFIX + ".output.dir.group";
What's the difference between the "final" and "output" dir for the publisher?
Although FINAL DIR and OUTPUT DIR are essentially the same directory (/db/table in the case of databases), DATA_PUBLISHER_FINAL_DIR_GROUP is applied at the leaf level (/db/table/yyyy/mm/dd/hh), while DATA_PUBLISHER_OUTPUT_DIR_GROUP is applied at the table level (/db/table), which is what we want.
"writer.group.name" is applied to the actual data files (Avro, ORC), while "publisher.final.dir.group" (DATA_PUBLISHER_FINAL_DIR_GROUP) only applies the group at the leaf level.
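To make the three scopes described in the replies above easier to compare, here is a minimal sketch that maps each setting to the path it is said to target. It only restates the descriptions in this thread and has not been checked against Gobblin's actual behaviour. The class name, the method, and the example data file name are hypothetical, and the key "data.publisher.output.dir.group" is inferred from the constant in the diff, assuming DATA_PUBLISHER_PREFIX is "data.publisher".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Gobblin: restates which path each
// group setting targets according to the comments in this thread.
final class GroupScopeSketch {
  private GroupScopeSketch() {}

  /**
   * @param tableDir table-level directory, e.g. /db/table
   * @param leafDir  leaf partition directory, e.g. /db/table/yyyy/mm/dd/hh
   * @param dataFile a data file under the leaf directory (name is illustrative)
   * @return setting name mapped to the path it is described as applying to
   */
  static Map<String, String> scopes(String tableDir, String leafDir, String dataFile) {
    Map<String, String> scopes = new LinkedHashMap<>();
    // Applied to the actual data files (Avro, ORC).
    scopes.put("writer.group.name", dataFile);
    // Described as applied at the leaf level only.
    scopes.put("data.publisher.final.dir.group", leafDir);
    // Key inferred from the diff (assumption); applied at the table level.
    scopes.put("data.publisher.output.dir.group", tableDir);
    return scopes;
  }

  public static void main(String[] args) {
    String leaf = "/db/table/yyyy/mm/dd/hh";
    scopes("/db/table", leaf, leaf + "/part-0.avro")
        .forEach((setting, path) -> System.out.println(setting + " -> " + path));
  }
}
```

Note that none of the three entries targets an intermediate directory such as /db/table/yyyy/mm; the sketch leaves that case out because the descriptions above do not cover it.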
This is confusing. If FINAL DIR and OUTPUT DIR are the same, why does the group apply to different folders?
Also, which group will apply to folders at intermediate levels, like "/db/table/yyyy/mm"?
Discussed over chat. In summary, the old setting works incorrectly, and even if it worked correctly, it would not be what we need.
From vikrambohra:
So the publisherOutputDir can be one of the following, depending on the use case:
- data.publisher.final.dir (if data.publisher.appendExtractToFinalDir is set to false)
- data.publisher.final.dir/db/table (if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = namespace_table) [THIS IS THE CASE FOR brooklin-etl]
- data.publisher.final.dir/table (if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = tablename)
- and a default
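The cases listed above can be sketched as a small decision function. This is an illustration under stated assumptions, not Gobblin's implementation: the class, method, and parameter names are hypothetical, paths are joined as plain strings, and the "default" case is returned as an empty Optional because the thread does not say what the default is.

```java
import java.util.Optional;

// Hypothetical helper, not part of Gobblin: mirrors the cases listed in the thread.
final class PublisherOutputDirSketch {
  private PublisherOutputDirSketch() {}

  /**
   * @param finalDir      value of data.publisher.final.dir
   * @param appendExtract value of data.publisher.appendExtractToFinalDir
   * @param pathType      value of writer.file.path.type, may be null
   * @param db            database (namespace) name, illustrative parameter
   * @param table         table name, illustrative parameter
   * @return the publisher output directory, or empty for the unspecified default case
   */
  static Optional<String> resolve(
      String finalDir, boolean appendExtract, String pathType, String db, String table) {
    if (!appendExtract) {
      // Case 1: the final dir is used as is.
      return Optional.of(finalDir);
    }
    if ("namespace_table".equals(pathType)) {
      // Case 2: final dir plus /db/table.
      return Optional.of(finalDir + "/" + db + "/" + table);
    }
    if ("tablename".equals(pathType)) {
      // Case 3: final dir plus /table.
      return Optional.of(finalDir + "/" + table);
    }
    // "and a default": not specified in the thread, so left unresolved here.
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(resolve("/data", false, "namespace_table", "db", "table"));
    System.out.println(resolve("/data", true, "namespace_table", "db", "table"));
    System.out.println(resolve("/data", true, "tablename", "db", "table"));
  }
}
```

Whatever directory this resolves to is the one the new table-level group option is meant to apply to, as opposed to the leaf partition directories beneath it.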
…3334)
This option will allow us to set permissions for publisher output, on table level.
The publisher output directory can be one of the following:
* data.publisher.final.dir (if data.publisher.appendExtractToFinalDir is set to false)
* data.publisher.final.dir/db/table (if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = namespace_table)
* data.publisher.final.dir/table (if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = tablename)
* and a default
Deprecated data.publisher.final.dir.group since it is set incorrectly.
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Allow an option to change the permission group at the leaf level of the publisher dir.
If publisherdir = /db/table, then the permission group is set for the path /table.
Tests
Commits