Skip to content

Optimize memory usage on the TLS manager/store #10302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

ddtmachado
Copy link
Contributor

What does this PR do?

Optimize the memory footprint of TLS certificate handling on the TLS Manager and Store, mainly:

  • Avoid parsing the certificate multiple times
  • Avoid copying certificates back and forth multiple times
  • Do not wipe the store on every refresh (dynamic config update)

I included a small bench test for the manager in the PR, it's goal is to switch TLS certificates in batches of 10 from a pool of 100. With it I observed a large decrease from +10k allocs/op average to 1.9k allocs/op average as follows:

#Before
122390966 ns/op	 3410744 B/op	   10843 allocs/op

#After
704874 ns/op	  187272 B/op	    1981 allocs/op

Motivation

Since v2 Traefik received many reports with concerns and problems caused by the increased memory usage, specially while handling TLS certs.

I had some free time to investigate and found the code on the manager and store overly complex and probably trading memory for compatibility without strong justification, other than "I won't risk touching this code" maybe? :D

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

I did not had time to properly test it in a real environment with real data. Needs way more testing.

@nmengin
Copy link
Contributor

nmengin commented Jan 3, 2024

Hello @ddtmachado,

Thank you for this PR, we are very interested in it but won't be able to test/review/merge it before releasing Traefik v3.0.
That's why I moved it to the next milestone.

On your side, could you rebase it on the master branch?

Copy link
Member

@rtribotte rtribotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ddtmachado!

Nice to see you around and thanks for this contribution!
I took a look at the changes and I have a few comments.
Regarding the next steps of the review process, do you want to work and iterate on the changes, or you don't mind if we push some review commits?

@ddtmachado
Copy link
Contributor Author

Hey @ddtmachado!

Nice to see you around and thanks for this contribution! I took a look at the changes and I have a few comments. Regarding the next steps of the review process, do you want to work and iterate on the changes, or you don't mind if we push some review commits?

Hey Romain, great to see you're on this!!
And sorry for the delay but somehow I'm not getting notifications from anything on the traefik repo and I just checked it is enabled.

Anyway, I'm running out of free time lately due to other demands and would not mind if you just push some commits on top, feel free as I completely agree with your comments and did not validade for those use cases.

But in case you run out of time as well or get prioritized in something else let me know and I'll give it a shot in the coming weeks.

@rtribotte rtribotte force-pushed the tls-memory branch 2 times, most recently from ae8c9a0 to ab81fd8 Compare March 4, 2024 14:58
@rtribotte
Copy link
Member

rtribotte commented Mar 5, 2024

Hello @ddtmachado,

I pushed review commits and also rebased the changes.
I have marked the PR as need-review, I think I could LGTM it right away, but I prefer to sleep on it to give it maybe more thought :-)

Copy link
Member

@rtribotte rtribotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rtribotte rtribotte removed their assignment Mar 6, 2024
@kevinpollet kevinpollet modified the milestones: 3.1, next Jun 28, 2024
@kevinpollet kevinpollet modified the milestones: 3.2, next Oct 3, 2024
@nmengin nmengin added priority/P1 need to be fixed in next release and removed priority/P1 need to be fixed in next release labels Nov 27, 2024
@nmengin nmengin assigned juliens and unassigned kevinpollet Dec 9, 2024
@kevinpollet kevinpollet modified the milestones: 3.3, next Dec 17, 2024
@juliens
Copy link
Member

juliens commented Jan 28, 2025

Hello @ddtmachado

Thank you for your PR.

After discussing with other maintainers, I’m closing this PR for now as its current implementation has unintended impacts on the management of certificate. Currently, certificates are switched atomically, in sync with the entire dynamic configuration. However, the changes in this PR lead to incremental updates, which could conflict with the rest of the dynamic configuration.

That said, for v4, we plan to explore the possibility of improving the global atomic switch to make it more granular. This effort will also include considerations for memory optimization, particularly around TLS management.

Thank you for your understanding, and I look forward to revisiting these ideas in the future.

@juliens juliens closed this Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants