Formalize our usage of the Pulumi configuration#81

Merged
mehalter merged 17 commits into main from centralized_config on Sep 4, 2024
Member

@mehalter mehalter commented Aug 22, 2024

Currently we have many different methods and models for interacting with the Pulumi configuration. The result is fragmented, and it is hard to work out what is really going on at any given place in the configuration. This PR investigates formalizing, centralizing, and improving how we handle configuration throughout the infrastructure.

Plan

This is the current plan. It is subject to change at any moment through development and discussion, and will be kept up to date.

  • Manage all interaction of the Pulumi configuration with a util.config library
  • Centralize the idea of the "configuration" rather than placing that information arbitrarily
    • Keep in mind that we want to maintain the ease of adding/removing pieces in the configuration
  • Fully resolve the configuration, merging it with the defaults, as step 1 of the infrastructure so that the configuration is finalized before any infrastructure is created
    • This happens at the component level: each component is in charge of its own configuration. If components need to access each other's configuration, they should do so through the .config variable of a component that has been created and initialized

TO TEST

Run pulumi preview on main and this branch and see that there are no differences

Closes #69

@mehalter mehalter force-pushed the centralized_config branch from 6018efd to f1aa2be Compare August 22, 2024 15:45
@mehalter mehalter force-pushed the centralized_config branch 5 times, most recently from f7ca01e to 75066d9 Compare September 3, 2024 16:33
@mehalter mehalter marked this pull request as ready for review September 3, 2024 20:28
Member Author

mehalter commented Sep 3, 2024

I just finished a first pass at implementing this PR and I believe everything is at a point where it is ready to start getting reviewed.

Important

This is a pretty significant change and would be best walked through on a call.

This takes a very modular and integrated approach to configuration management. It builds a model where the configuration of a component is managed by the component's class itself.

Lifecycle of Configuration

Initialization of a component

During initialization, the component obtains its configuration either from Pulumi or from a configuration passed into it. The latter is useful when a component builds out many other components, such as datalakehouses building several tributaries, which in turn build several buckets. Each component can define its own default_config property, which is used to fill in any missing configuration values that are required for operation.
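A minimal sketch of that initialization step, to make the flow concrete. This is not the code from this PR: `ExampleComponent`, `merge_defaults`, and `load_from_pulumi` are hypothetical names, the defaults are invented, and the Pulumi lookup is stubbed out so the snippet runs on its own.

```python
from copy import deepcopy
from typing import Any, Mapping, Optional


def merge_defaults(defaults: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict:
    """Return `overrides` with any missing keys filled in from `defaults`, recursively."""
    merged = deepcopy(dict(defaults))
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are nested sections: merge them key by key.
            merged[key] = merge_defaults(merged[key], value)
        else:
            # Explicit configuration always wins over the default.
            merged[key] = deepcopy(value)
    return merged


def load_from_pulumi(name: str) -> dict:
    """Hypothetical stub for reading this component's section of the Pulumi config."""
    # Assumption: a real version might use pulumi.Config().get_object(name).
    return {}


class ExampleComponent:
    """Hypothetical component that owns and resolves its own configuration."""

    def __init__(self, name: str, config: Optional[Mapping[str, Any]] = None):
        self.name = name
        # Use the config handed in by a parent component, else fetch it.
        raw = config if config is not None else load_from_pulumi(name)
        # After this line the configuration is treated as fully resolved.
        self.config = merge_defaults(self.default_config, raw)

    @property
    def default_config(self) -> dict:
        # Invented defaults, for illustration only.
        return {"bucket": {"versioning": True, "prefix": "raw"}}
```

A parent component would pass a sub-section of its own configuration as `config`; a top-level component would omit it and fall back to the Pulumi lookup.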

Accessing configuration within a component

After resolution, the configuration is provided as a CapeConfig object under the config property of the component. CapeConfig has a get function that accepts an arbitrary number of keys to search into the configuration, plus a fallback default value that is returned if the exact path of keys cannot be resolved to a value. This is the main entry point for interacting with the configuration and should be considered the "final configuration source of truth" for the component. If the component creates other components, parts of the configuration may be read and passed into the constructors of those components, each of which manages its own default configuration and handles the new CapeConfig object it receives.
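To illustrate the lookup behaviour described above, here is a rough sketch of a get that walks a path of keys and falls back to a default. `CapeConfigSketch` is a hypothetical stand-in and not the actual CapeConfig class; in particular, the keyword-only `default` argument and the example keys are assumptions.

```python
from typing import Any, Mapping


class CapeConfigSketch:
    """Hypothetical stand-in for CapeConfig; the real signature may differ."""

    def __init__(self, data: Mapping[str, Any]):
        self._data = dict(data)

    def get(self, *keys: str, default: Any = None) -> Any:
        """Follow `keys` into the nested config; return `default` if the path breaks."""
        node: Any = self._data
        for key in keys:
            if not isinstance(node, Mapping) or key not in node:
                return default
            node = node[key]
        return node


# Invented configuration, for illustration only.
cfg = CapeConfigSketch(
    {"tributaries": {"hai": {"buckets": {"raw": {"name": "hai-raw"}}}}}
)

bucket_name = cfg.get("tributaries", "hai", "buckets", "raw", "name")
schedule = cfg.get("tributaries", "hai", "crawler", "schedule", default="daily")

# A sub-section can be pulled out and handed to a child component's constructor.
hai_section = cfg.get("tributaries", "hai", default={})
```

Returning the default whenever any key in the path is missing keeps call sites free of nested existence checks.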

Accessing configuration across components

There may be times when a component needs to access the configuration of another component. This should be done by reading the .config property of an instance of that component. This is imperative because configuration objects are not considered "final" or "fully resolved" until the component has been initialized. Accessing a component's configuration before it is initialized blurs the semantic boundaries between contexts and suggests that something may be logically incorrect.
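A small sketch of the cross-component pattern, assuming two hypothetical components (`BucketSketch` and `CrawlerSketch`) that are not taken from this repository. The point is only that the second component reads the resolved value from an already initialized instance rather than from the raw configuration.

```python
from typing import Any, Mapping, Optional


class BucketSketch:
    """Hypothetical component; defaults are invented for illustration."""

    default_config = {"versioning": True}

    def __init__(self, name: str, config: Optional[Mapping[str, Any]] = None):
        self.name = name
        # Only after this assignment is the configuration "fully resolved".
        self.config = {**self.default_config, **(config or {})}


class CrawlerSketch:
    """Hypothetical component that depends on another component's settings."""

    def __init__(self, name: str, source: BucketSketch):
        self.name = name
        # Read from the initialized instance, so defaults are already applied.
        self.source_is_versioned = source.config["versioning"]


raw_bucket = BucketSketch("raw", {"prefix": "incoming"})
crawler = CrawlerSketch("raw-crawler", raw_bucket)
```

Here the raw input never mentions `versioning`, so reading it before `BucketSketch` is initialized would miss the default that the component itself supplies.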

Member

@thecaffiend thecaffiend left a comment

One stale comment to remove. One thing to explain to me. One review to rule them all.

Comment thread capeinfra/datalake/datalake.py Outdated
Comment thread capeinfra/pipeline/data.py
@mehalter mehalter merged commit a81545b into main Sep 4, 2024
@mehalter mehalter deleted the centralized_config branch September 4, 2024 16:26
Development

Successfully merging this pull request may close these issues.

implement common config file access

2 participants