Skip to content

[GOBBLIN-1488] Added option to set perm group at table level#3334

Merged
aplex merged 2 commits intoapache:masterfrom
vikrambohra:permFix
Jul 19, 2021
Merged

[GOBBLIN-1488] Added option to set perm group at table level#3334
aplex merged 2 commits intoapache:masterfrom
vikrambohra:permFix

Conversation

@vikrambohra
Copy link
Copy Markdown
Contributor

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    Allow option to change perm group at publisher dir leaf level.
    if publisherdir = /db/table then the perm group is set for path /table

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 16, 2021

Codecov Report

Merging #3334 (485c6a5) into master (9ed315a) will increase coverage by 0.01%.
The diff coverage is 33.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3334      +/-   ##
============================================
+ Coverage     46.54%   46.55%   +0.01%     
- Complexity    10133    10138       +5     
============================================
  Files          2051     2051              
  Lines         79537    79546       +9     
  Branches       8878     8880       +2     
============================================
+ Hits          37018    37035      +17     
+ Misses        39089    39082       -7     
+ Partials       3430     3429       -1     
Impacted Files Coverage Δ
...pache/gobblin/configuration/ConfigurationKeys.java 0.00% <ø> (ø)
...rg/apache/gobblin/publisher/BaseDataPublisher.java 72.72% <33.33%> (-1.15%) ⬇️
...a/org/apache/gobblin/cluster/GobblinHelixTask.java 60.21% <0.00%> (-2.16%) ⬇️
...main/java/org/apache/gobblin/yarn/YarnService.java 15.14% <0.00%> (+0.78%) ⬆️
.../apache/gobblin/runtime/api/JobExecutionState.java 80.37% <0.00%> (+0.93%) ⬆️
.../org/apache/gobblin/cluster/GobblinTaskRunner.java 63.46% <0.00%> (+0.96%) ⬆️
...gobblin/service/modules/core/GitConfigMonitor.java 83.05% <0.00%> (+1.69%) ⬆️
...in/java/org/apache/gobblin/cluster/HelixUtils.java 38.01% <0.00%> (+5.78%) ⬆️
...lin/util/filesystem/FileSystemInstrumentation.java 100.00% <0.00%> (+7.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9ed315a...485c6a5. Read the comment docs.

Copy link
Copy Markdown
Contributor

@aplex aplex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have "writer.group.name" and "publishe.final.dir.group". Do they not cover the needed use case, and we need extra flexibility here?

public static final String DATA_PUBLISHER_OVERWRITE_ENABLED = DATA_PUBLISHER_PREFIX + ".overwrite.enabled";
// This property is used to specify the owner group of the data publisher final output directory
public static final String DATA_PUBLISHER_FINAL_DIR_GROUP = DATA_PUBLISHER_PREFIX + ".final.dir.group";
public static final String DATA_PUBLISHER_OUTPUT_DIR_GROUP = DATA_PUBLISHER_PREFIX + ".output.dir.group";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between "final" and "output" dir for publisher?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the FINAL DIR and OUTPUT DIR are essentially the same dir (/db/table in case of databases) but the DATA_PUBLISHER_FINAL_DIR_GROUP is applied at the leaf level (/db/table/yyyy/mm/dd/hh) while DATA_PUBLISHER_OUTPUT_DIR_GROUP is applied at the table level (/db/table which is what we want)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"writer.group.name" is applied to the actual data file (avro, orc) while "publisher.final.dir.group" only applied the group at leaf level (DATA_PUBLISHER_FINAL_DIR_GROUP)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing. If FINAL DIR and OUTPUT DIR are the same, why does the group apply to different folders?

Also, what group will apply to folder at intermediate level, like "/db/table/yyyy/mm" ?

Copy link
Copy Markdown
Contributor

@aplex aplex Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed over the chat. The summary is that the old setting works incorrectly, but even if it would work correctly, it would not be what we need.

From vikrambohra:
so the publisherOutputDir can be the following based on use case

  • data.publisher.final.dir ( if data.publisher.appendExtractToFinalDir is set to false)
  • data.publisher.final.dir/db/table ( if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = namespace_table) [ THIS IS THE CASE FOR brooklin-etl ]
  • data.publisher.final.dir/table ( if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = tablename)
  • and a default

@aplex aplex merged commit 07e76fc into apache:master Jul 19, 2021
jack-moseley pushed a commit to jack-moseley/gobblin that referenced this pull request Aug 24, 2022
…3334)

This option will allow us to set permissions for publisher output, on table level.

The publisher output directory can be one of the following:
* data.publisher.final.dir ( if data.publisher.appendExtractToFinalDir is set to false)
* data.publisher.final.dir/db/table ( if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = namespace_table)
* data.publisher.final.dir/table ( if data.publisher.appendExtractToFinalDir is set to true and writer.file.path.type = tablename)
*and a default

Deprecated data.publisher.final.dir.group since it is set incorrectly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants