Skip to content

Conversation

gyfora
Copy link
Contributor

@gyfora gyfora commented Apr 21, 2022

This PR improves/reworks the configuration management of the oprator. Problems with the current approach:

  • Configuration and FlinkOperatorConfigurations are passed around everywhere
  • Every component generates effective configurations on their own (without consistency which can cause problems)
  • Configs are generated again and again creating a large number of temporary files in every reconcile iteration
  • Configuration is completely static and impossible to change without operator restart

To try to tackle all these problems at once I have introduced the FlinkConfigManager object which is responsible for:

  • Generating configuration for deployment / cluster interactions (observe)
  • The config generation logic is consistently aware of currently deployed spec (this is important due to rollback logic)
  • Caching the generated configs for a given specification to speed things up and avoid extra tmp files
  • Allow modifying default configurations on the fly by watching ConfigMap changes

While these changes touch almost all components, it is mostly cosmetic. The core improvements are located in the FlinkConfigManager class and which config is requested in the different parts of the reconcile logic.

@gyfora
Copy link
Contributor Author

gyfora commented Apr 21, 2022

@wangyang0918
Copy link
Contributor

Just a quick note: instead of creating a watch on the ConfigMap, I prefer to simply use ScheduledExecutorService#scheduleAtFixedRate to detect default configuration file change on the fly. It is similar to monitorInterval in log4j. Because naked watch is not stable and could be closed in many cases, we need to use informer instead. But the informer might be an over-kill.

@gyfora
Copy link
Contributor Author

gyfora commented Apr 22, 2022

Just a quick note: instead of creating a watch on the ConfigMap, I prefer to simply use ScheduledExecutorService#scheduleAtFixedRate to detect default configuration file change on the fly. It is similar to monitorInterval in log4j. Because naked watch is not stable and could be closed in many cases, we need to use informer instead. But the informer might be an over-kill.

Makes sense @wangyang0918 thats a simple change to make it more robust

@gyfora
Copy link
Contributor Author

gyfora commented Apr 22, 2022

Reworked the watcher logic and added some more tests

@gyfora gyfora force-pushed the FLINK-27303 branch 3 times, most recently from c12d32a to 153dc8c Compare April 24, 2022 14:37
Copy link

@Aitozi Aitozi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@gyfora gyfora merged commit c377ebe into apache:main Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants