Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-1497] Reduce the number of calls on FlowSpec initialization where possible,… #3340

Merged
merged 3 commits into from Jul 28, 2021

Conversation

Will-Lo
Copy link
Contributor

@Will-Lo Will-Lo commented Jul 26, 2021

… and configToProperties/vice versa

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

FlowSpecs contain both configs and property fields that duplicate the same key/value information, however there are slight differences with properties being mutable, configs being type-safe, and legacy code being compatible with properties but not configs.

As such, FlowSpec creation and configToProperties are memory expensive operations at scale, and we should aim to minimize them as much as possible.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@codecov-commenter
Copy link

codecov-commenter commented Jul 26, 2021

Codecov Report

Merging #3340 (63b4bbb) into master (48af4c6) will decrease coverage by 0.00%.
The diff coverage is 85.71%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3340      +/-   ##
============================================
- Coverage     46.54%   46.54%   -0.01%     
- Complexity    10136    10137       +1     
============================================
  Files          2051     2052       +1     
  Lines         79547    79563      +16     
  Branches       8880     8882       +2     
============================================
+ Hits          37027    37030       +3     
- Misses        39091    39102      +11     
- Partials       3429     3431       +2     
Impacted Files Coverage Δ
.../modules/scheduler/GobblinServiceJobScheduler.java 59.58% <50.00%> (+0.73%) ⬆️
...n/java/org/apache/gobblin/util/SchedulerUtils.java 84.90% <100.00%> (+0.29%) ⬆️
...obblin/service/modules/restli/FlowConfigUtils.java 72.97% <100.00%> (+0.75%) ⬆️
...a/org/apache/gobblin/util/limiter/NoopLimiter.java 40.00% <0.00%> (-20.00%) ⬇️
...lin/util/filesystem/FileSystemInstrumentation.java 85.71% <0.00%> (-14.29%) ⬇️
...in/java/org/apache/gobblin/cluster/HelixUtils.java 34.71% <0.00%> (-3.31%) ⬇️
...he/gobblin/service/modules/flow/FlowGraphPath.java 81.15% <0.00%> (-2.67%) ⬇️
...in/service/modules/core/GobblinServiceManager.java 50.20% <0.00%> (-0.84%) ⬇️
...main/java/org/apache/gobblin/util/ConfigUtils.java 58.59% <0.00%> (-0.27%) ⬇️
.../org/apache/gobblin/service/ServiceConfigKeys.java 0.00% <0.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48af4c6...63b4bbb. Read the comment docs.

@Will-Lo Will-Lo force-pushed the optimize-config-properties branch from 7181b29 to 4e38fdc Compare July 27, 2021 07:58
@Will-Lo
Copy link
Contributor Author

Will-Lo commented Jul 27, 2021

@arjun4084346 @jack-moseley @umustafi could you take a look?

@arjun4084346
Copy link
Contributor

minor comments to simply things. LGTM otherwise. good changes. maybe we should get rid of configAsProperties entirely. but that may need more changes.

@Will-Lo
Copy link
Contributor Author

Will-Lo commented Jul 27, 2021

@arjun4084346 yea we should look into configAsProperties especially at the rate that GaaS is growing at. Would require some refactoring around some legacy classes and probably more conversions from propertiesToConfig. However since those would be short-lived compared to FlowSpecs it would still be an overall memory improvement

Copy link
Contributor

@umustafi umustafi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These replacements look good!

Copy link
Contributor

@sv2000 sv2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment.

Copy link
Contributor

@sv2000 sv2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.

@sv2000 sv2000 merged commit 1367614 into apache:master Jul 28, 2021
phet pushed a commit to phet/gobblin that referenced this pull request Aug 9, 2021
…where possible,… (apache#3340)

* Reduce the number of calls on FlowSpec initialization where possible, and configToProperties/vice versa

* address review

* Address comments
@Will-Lo Will-Lo deleted the optimize-config-properties branch May 4, 2022 07:38
jack-moseley pushed a commit to jack-moseley/gobblin that referenced this pull request Aug 24, 2022
…where possible,… (apache#3340)

* Reduce the number of calls on FlowSpec initialization where possible, and configToProperties/vice versa

* address review

* Address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants