
[DPE-1939][BUGFIX] Fixing hardcoded default + refactor (spark8t/lib) #9

Closed
wants to merge 2 commits

Conversation

juditnovak
Contributor

The Defaults.kubectl_cmd value (potentially configured by env var SPARK_KUBECTL) wasn't taken into account in this command invocation chain.

@@ -634,7 +634,7 @@ def __init__(
self,
kube_config_file: Union[str, Dict[str, Any]],
context_name: Optional[str] = None,
kubectl_cmd: str = "kubectl",
kubectl_cmd: str = Defaults().kubectl_cmd,
Contributor Author


The shipped default couldn't be overridden anywhere in the execution chain.

Of course there may be other corners where Defaults should be taken into account -- this is one solution.

Contributor

@deusebio deusebio May 30, 2023


Uhm, I would probably place this within the cli submodule, and not as a default in the class. I feel a bit uncomfortable that the defaults of the class depend on environment variables.

I believe it is best to add this when instantiating the registry object in the scripts, e.g. here and here, with something like

registry = K8sServiceAccountRegistry(..., kubectl_cmd = defaults.kubectl_cmd)

ps. By the way, we could factor out how we instantiate the registry object in params.py, similarly to how we are currently doing this for kube_interface
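A minimal sketch of what that factoring-out might look like. The shapes of `Defaults`, `K8sServiceAccountRegistry`, and the helper `get_registry` are assumptions based on this thread, not spark8t's actual API:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Defaults:
    """Sketch of spark8t's Defaults: values resolved from an environ mapping."""

    environ: dict = field(default_factory=lambda: dict(os.environ))

    @property
    def kubectl_cmd(self) -> str:
        return self.environ.get("SPARK_KUBECTL", "kubectl")


class K8sServiceAccountRegistry:
    """Stand-in for the real registry; it only records the command it was given."""

    def __init__(self, kubectl_cmd: str = "kubectl"):
        self.kubectl_cmd = kubectl_cmd


def get_registry(defaults: Defaults) -> K8sServiceAccountRegistry:
    # SPARK_KUBECTL is resolved here, in the CLI layer, instead of being
    # baked into the class's default argument.
    return K8sServiceAccountRegistry(kubectl_cmd=defaults.kubectl_cmd)
```

With this shape, the class default stays a plain `"kubectl"` string and only the CLI scripts consult the environment.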

Contributor Author

@juditnovak juditnovak May 31, 2023


All right, in this case I'm proposing the following refactor + update.

First of all, I think it is a mistake to mix our CLI executables with library functions. Second, I think it's also a mistake to mix our parameter-parsing library module with additional logic. (The latter results in 'growing' modules combining mixed functionalities.)

My proposal is the following:

spark8t/
    cli/
        <CLI executables>
    lib/
        <various lib functions, like>
        <our own parameter-parsing function -- currently called params.py>
        <interpreting our settings (CLI arguments, config files, env vars, hardcoded defaults, etc.) -- currently I called it process_settings.py>
    resources/
        <k8s config>
    <other core modules>

Within this frame, process_settings.py would hold things like get_kube_interface(), where the relevant settings are applied.

I think this provides both a more organized structure, and the correct settings to be applied on the correct level.
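As a rough sketch of what such a `get_kube_interface()` could look like (the module layout, parameter names, and the `defaults` shape here are hypothetical, not the existing spark8t code):

```python
from types import SimpleNamespace


def get_kube_interface(args, defaults):
    """Sketch of a process_settings-style factory (names are hypothetical).

    CLI arguments win over defaults; the precedence decision lives here, in
    one place, instead of inside the interface classes themselves.
    """
    kubectl_cmd = getattr(args, "kubectl_cmd", None) or defaults["kubectl_cmd"]
    kube_config = getattr(args, "kubeconfig", None) or defaults["kube_config"]
    # Stand-in for constructing KubeInterface(...); only the resolved
    # settings matter for this sketch.
    return SimpleNamespace(kubectl_cmd=kubectl_cmd, kube_config_file=kube_config)
```

The classes then receive fully resolved values and never look at env vars themselves.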

I feel a bit uncomfortable that the defaults of the class depends on environment variables.

The same module is using Defaults all across, "sticking" it onto each LightKube or SparkInterface object that's instantiated... So all those instantiations depend on env vars... (Sorry, I may have got this comment wrong if it's not what you meant.)

Contributor

@deusebio deusebio Jun 1, 2023


Uhm, I'm not sure about this. The idea of the cli submodule is to implement the command line interface, centralizing the business logic for accessing the functionalities of the package via the command line. Ideally, one could expose the same functionalities through a REST API. In that case, beside cli, we could also have a submodule rest where we centralize argument parsing and the structure of the endpoints (and therefore -- similarly to here -- all logic to parse URL endpoint arguments should live within the rest submodule).

Therefore, I'd really rather keep the business logic that parses the inputs from the command line in the cli submodule, and not move it into another, more general lib submodule.

So all those instantions depend on env vars

Not really. I believe there is a subtle but relevant difference: they depend on the Defaults class, not on env vars. One could even create Defaults from configuration files or a hard-coded dict (as one may need to do in tests).

In the CLI, it makes sense to create a Defaults object from env vars. And this is indeed what we do here. Using the REST API example as before, in that case Defaults should probably be parsed from a config file, e.g.

defaults = Defaults(read_yaml("my-config-file.yaml"))

ps. Actually, here we provide the default to read this from the env, but I actually don't like this, and we should probably remove the default value of environ there as well.
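The distinction can be illustrated with a toy Defaults that takes an explicit mapping, so the caller decides where the values come from (the constructor signature here is an assumption for illustration, not spark8t's actual one):

```python
import os


class Defaults:
    """Toy Defaults: depends only on the mapping it is handed, never on
    os.environ directly -- the caller chooses the source."""

    def __init__(self, environ):
        self._environ = dict(environ)

    @property
    def kubectl_cmd(self):
        return self._environ.get("SPARK_KUBECTL", "kubectl")


# The caller picks the source:
cli_defaults = Defaults(os.environ)                              # CLI: the environment
file_defaults = Defaults({"SPARK_KUBECTL": "microk8s.kubectl"})  # e.g. values parsed from YAML
test_defaults = Defaults({})                                     # tests: fully hard-coded
```

Classes receiving a Defaults instance are then insulated from *how* it was populated.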

I hope this clarifies the thoughts behind some of the choices.

Contributor Author

@juditnovak juditnovak Jun 2, 2023


Splitting the response into two chapters, addressing both ongoing threads.

Structure:

I see your point regarding the cli submodule, and I get the very valid rationale behind it. However, I'm missing a response on the original points I've raised, which thus still remain open. I believe there should be a solution that satisfies both objectives. (As I guess "dumping" everything altogether into a cli/ and separately a rest/ folder would not take us closer to a clean, organized structure either :-) )

Refreshing my original points: I believe that it's a mistake to

  1. Mix executable "endpoint" scripts with helper/library modules (even if related).
     This works against a clear structure within the repository.
  2. Mix different functional aspects within the same module.
     (In our case: CLI parameter parsing vs. initialization of core components of the application.)

Addressing both your points and mine, we could also go with this structure:

spark8t/
   cli/
      lib/
      bin/

or, less clean/organized:

spark8t/
   cli/
      lib/
      <executable1>
      <executable2>
      ...

...but if I get it right, we're trying to hold on to a mostly flat structure, which contradicts further embedded levels of directory hierarchy. However, I overlooked the point you've raised in your other comment, so we may not be able to keep the structure simple and flat. (Especially if in the future we'd like to have further functionalities such as rest/...)

I agree that a list of solutions could be good here, and we shouldn't forget about the point you've raised. I'm just emphasizing the importance of a clear structure both within the repository (1) and on the level of individual modules (2). I'm glad to find any solution complying with these objectives.

Defaults:

I'm not sure I get your point. I assume that we have a fixed structure for our Defaults, as an interface schema: a clear specification of the potential configuration options (regardless of what source the configuration comes from). An interface definition (typically like Defaults().spark_home, Defaults().kubectl_cmd) that we can safely refer to in all modules using Defaults.

In this context, I may be overlooking something (I'm not being sarcastic, I honestly mean it!!! :-) ), but at first glance I don't see the difference, or why the originally proposed solution would depend more on env vars (the concern you raised originally) than the following, already existing references within the same module (the LightKube class):

https://github.com/canonical/spark-k8s-toolkit-py/blob/main/spark8t/services.py#L278
-> the same interface is then used directly in create() -- where ALL these references may depend on SPARK_HOME (just as the proposed solution may or may not depend on the SPARK_KUBECTL variable)

There are similar references all across the services.py module, also including the SparkInterface class.

I honestly mean that I may be missing a point, or design objective here, and that there may be an underlying difference that didn't come clear to me.

Of course I agree that, for consistency, we should rather have all Defaults passed into KubeInterface initialization as well (as we do for its equal-level sibling LightKube, the other child of AbstractKubeInterface), keeping the two implementations as close to each other as possible. In this respect, instead of the originally proposed solution, this may be better:

# We already have this
class LightKube(AbstractKubeInterface):
    def __init__(self, defaults: Defaults, ...):
        self.defaults = defaults
        [..]

    def create(...):
        with open(self.defaults.template_serviceaccount):
            [..]

# This could be adjusted
class KubeInterface(AbstractKubeInterface):
    def __init__(self, defaults: Defaults, ...):
        self.defaults = defaults
        [..]

    def create(...):
        # self.defaults.kubectl_cmd used here as needed
        [..]

with the proposal of get_kube_interface() instantiating KubeInterface by passing the complete spark8t.cli.defaults object.

@juditnovak juditnovak marked this pull request as ready for review May 30, 2023 16:05
@juditnovak juditnovak force-pushed the DPE-2023_fixing_hardcoded_default branch from f761ccf to 55eb2cf Compare May 31, 2023 08:38
@juditnovak juditnovak force-pushed the DPE-2023_fixing_hardcoded_default branch from 55eb2cf to e300d7c Compare May 31, 2023 10:14
@juditnovak juditnovak changed the title [DPE-1939][BUGFIX] Fixing hardcoded default [DPE-1939][BUGFIX] Fixing hardcoded default + refactor (spark8t/lib) May 31, 2023
from spark8t.services import AbstractKubeInterface, KubeInterface, LightKube


def get_kube_interface(args: Namespace) -> AbstractKubeInterface:
Contributor

@deusebio deusebio Jun 1, 2023


todo: not sure this is still relevant given the discussion here, but we should probably also provide the defaults as an argument of the function here. In general, we should really not import this from cli, as it creates a circular dependency (cli depends on lib, which depends on cli)
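One way to avoid such a cycle is dependency injection: lib-level code takes the defaults as a parameter and never imports from cli, so the import arrow only points one way. A sketch, with module names and the `EnvDefaults` class purely hypothetical:

```python
# --- spark8t/lib/process_settings.py (sketch) ---
def get_kube_interface(args, defaults):
    # Everything this function needs arrives as arguments, so there is no
    # "from spark8t.cli import defaults" here and hence no import cycle.
    return {"kubectl_cmd": getattr(args, "kubectl_cmd", None) or defaults.kubectl_cmd}


# --- spark8t/cli/<script> (sketch) -- cli depends on lib, never the reverse ---
class EnvDefaults:
    """Hypothetical defaults holder the CLI would build from env vars."""

    kubectl_cmd = "microk8s.kubectl"


def main(args):
    return get_kube_interface(args, EnvDefaults())
```

The CLI layer constructs the defaults object and injects it downward, which also makes the lib function trivially testable with a stubbed defaults.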

Contributor Author

@juditnovak juditnovak Jun 2, 2023


That's a valid point, my bad, I overlooked something on this level. I got it right now, after following up on your comment. Sorry for that.

This may encourage one of the proposed structures I've listed in response to the above-mentioned thread.

@deusebio
Contributor

deusebio commented Jun 1, 2023

BTW, following the comment here, we should probably also remove the resource/conf/spark-defaults.conf file within this ticket

@deusebio
Contributor

deusebio commented Jun 7, 2023

shall we close this?

@deusebio
Contributor

Closing as stale

@deusebio deusebio closed this Jun 27, 2023
@deusebio deusebio deleted the DPE-2023_fixing_hardcoded_default branch June 29, 2023 14:35