
[DPE-1939][BUGFIX] Fixing hardcoded default + refactor (spark8t/lib) #9

Closed
wants to merge 2 commits

Conversation

juditnovak
Contributor

The Defaults.kubectl_cmd value (potentially configured by env var SPARK_KUBECTL) wasn't taken into account in this command invocation chain.

@@ -634,7 +634,7 @@ def __init__(
self,
kube_config_file: Union[str, Dict[str, Any]],
context_name: Optional[str] = None,
kubectl_cmd: str = "kubectl",
kubectl_cmd: str = Defaults().kubectl_cmd,
Contributor Author


The shipped default couldn't be overridden anywhere in the execution chain.

Of course there may be other corners where Defaults should be taken into account -- this is one solution.

Contributor

@deusebio deusebio May 30, 2023


Uhm, I would probably place this within the cli submodule, and not as a default in the class. I feel a bit uncomfortable that the defaults of the class depend on environment variables.

I believe it is best to add this when instantiating the registry object in the scripts, e.g. here and here, with something like

registry = K8sServiceAccountRegistry(..., kubectl_cmd = defaults.kubectl_cmd)

ps. By the way, we could factor out how we instantiate the registry object in params.py, similarly to how we are currently doing this for kube_interface
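A minimal sketch of what that factoring-out might look like. The shapes of `Defaults`, `K8sServiceAccountRegistry`, and the helper `get_registry` are assumptions based on this thread, not spark8t's actual API:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Defaults:
    """Sketch of spark8t's Defaults: values resolved from an environ mapping."""

    environ: dict = field(default_factory=lambda: dict(os.environ))

    @property
    def kubectl_cmd(self) -> str:
        return self.environ.get("SPARK_KUBECTL", "kubectl")


class K8sServiceAccountRegistry:
    """Stand-in for the real registry; it only records the command it was given."""

    def __init__(self, kubectl_cmd: str = "kubectl"):
        self.kubectl_cmd = kubectl_cmd


def get_registry(defaults: Defaults) -> K8sServiceAccountRegistry:
    # SPARK_KUBECTL is resolved here, in the CLI layer, instead of being
    # baked into the class's default argument.
    return K8sServiceAccountRegistry(kubectl_cmd=defaults.kubectl_cmd)
```

With this shape, the class default stays a plain `"kubectl"` string and only the CLI scripts consult the environment.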

Contributor Author

@juditnovak juditnovak May 31, 2023


All right, in this case I'm proposing the following refactor + update.

First of all, I think it is a mistake to mix our CLI executables with library functions. Second, I think it's also a mistake to mix our parameter-parsing library module with additional logic. (The latter results in 'growing' modules combining mixed functionalities.)

My proposal is the following:

spark8t/
    cli/
        <CLI executables>
    lib/
        <various lib functions, like>
        <our own parameter-parsing function -- currently called params.py>
        <interpreting our settings (CLI arguments, config files, env vars, hardcoded defaults, etc.) -- currently I called it process_settings.py>
    resources/
        <k8s config>
    <other core modules>

Within this frame, process_settings.py would hold things like get_kube_interface(), where the relevant settings are applied.

I think this provides both a more organized structure, and the correct settings to be applied on the correct level.
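As a rough sketch of what such a `get_kube_interface()` could look like (the module layout, parameter names, and the `defaults` shape here are hypothetical, not the existing spark8t code):

```python
from types import SimpleNamespace


def get_kube_interface(args, defaults):
    """Sketch of a process_settings-style factory (names are hypothetical).

    CLI arguments win over defaults; the precedence decision lives here, in
    one place, instead of inside the interface classes themselves.
    """
    kubectl_cmd = getattr(args, "kubectl_cmd", None) or defaults["kubectl_cmd"]
    kube_config = getattr(args, "kubeconfig", None) or defaults["kube_config"]
    # Stand-in for constructing KubeInterface(...); only the resolved
    # settings matter for this sketch.
    return SimpleNamespace(kubectl_cmd=kubectl_cmd, kube_config_file=kube_config)
```

The classes then receive fully resolved values and never look at env vars themselves.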

I feel a bit uncomfortable that the defaults of the class depends on environment variables.

The same module is using Defaults all across, "sticking" it onto each LightKube or SparkInterface object that's instantiated... So all those instantiations depend on env vars... (Sorry, I may have got this comment wrong if it's not what you meant.)

Contributor

@deusebio deusebio Jun 1, 2023


Uhm, I'm not sure about this. The idea of the cli submodule is to implement the command line interface, centralizing the business logic for accessing the functionalities of the package via the command line. Ideally, one could expose the same functionalities through a REST API. In that case, beside cli, we could also have a submodule rest where we centralize argument parsing and the structure of the endpoints (and therefore -- similarly to here -- all logic to parse URL endpoint arguments should live within the rest submodule).

Therefore, I'd really rather keep the business logic that parses the inputs from the command line in the cli submodule, and not move it into another, more general lib submodule.

So all those instantions depend on env vars

Not really. I believe there is a subtle but relevant difference: they depend on the Defaults class, not on env vars. One could even create Defaults from configuration files or a hard-coded dict (as one may need to do in tests).

In the CLI, it makes sense to create a Defaults object from env vars. And this is indeed what we do here. Using the REST API example as before, in that case Defaults should probably be parsed from a config file, e.g.

defaults = Defaults(read_yaml("my-config-file.yaml"))

ps. Actually, here we provide the default to read this from the env, but I actually don't like this, and we should probably remove the default value of environ there as well.
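The distinction can be illustrated with a toy Defaults that takes an explicit mapping, so the caller decides where the values come from (the constructor signature here is an assumption for illustration, not spark8t's actual one):

```python
import os


class Defaults:
    """Toy Defaults: depends only on the mapping it is handed, never on
    os.environ directly -- the caller chooses the source."""

    def __init__(self, environ):
        self._environ = dict(environ)

    @property
    def kubectl_cmd(self):
        return self._environ.get("SPARK_KUBECTL", "kubectl")


# The caller picks the source:
cli_defaults = Defaults(os.environ)                              # CLI: the environment
file_defaults = Defaults({"SPARK_KUBECTL": "microk8s.kubectl"})  # e.g. values parsed from YAML
test_defaults = Defaults({})                                     # tests: fully hard-coded
```

Classes receiving a Defaults instance are then insulated from *how* it was populated.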

I hope this clarifies the thoughts behind some of the choices.

Contributor Author

@juditnovak juditnovak Jun 2, 2023


Splitting the response into two chapters, addressing both ongoing threads.

Structure:

I see your point regarding the cli submodule, and I get the very valid rationale behind it. However, I'm missing a response on the original points I've raised, which thus still remain open. I believe there should be a solution that satisfies both objectives. (As I guess "dumping" everything altogether into a cli/ and separately a rest/ folder would not take us closer to a clean, organized structure either :-) )

Refreshing my original points: I believe that it's a mistake to

  1. Mix executable "endpoint" scripts with helper/library modules (even if related).
     This works against a clear structure within the repository.
  2. Mix different functional aspects within the same module.
     (In our case: CLI parameter parsing vs. initialization of core components of the application.)

Addressing both your points and mine, we could also go with this structure:

spark8t/
   cli/
      lib/
      bin/

or, less clean/organized:

spark8t/
   cli/
      lib/
      <executable1>
      <executable2>
      ...

...but if I get it right, we're trying to hold on to a mostly flat structure, which contradicts further embedded levels of directory hierarchy. However, I overlooked the point you've raised in your other comment, so we may not be able to keep the structure simple and flat. (Especially if in the future we'd like to have further functionalities such as rest/...)

I agree that a list of solutions could be good here, and we shouldn't forget about the point you've raised. I'm just emphasizing the importance of a clear structure both within the repository (1) and on the level of individual modules (2). I'm glad to find any solution complying with these objectives.

Defaults:

I'm not sure I get your point. I assume that we have a fixed structure for our Defaults, as an interface schema: a clear specification of the potential configuration options (regardless of what source the configuration comes from). An interface definition (typically like Defaults().spark_home, Defaults().kubectl_cmd) that we can safely refer to in all modules using Defaults.

In this context, I may be overlooking something (I'm not being sarcastic, I honestly mean it!!! :-) ), but at first glance I don't see the difference, or why the originally proposed solution would depend more on env vars (the concern you raised originally) than the following, already existing references within the same module (the LightKube class):

https://github.com/canonical/spark-k8s-toolkit-py/blob/main/spark8t/services.py#L278
-> the same interface is then used directly in create() -- where ALL these references may depend on SPARK_HOME (just as the proposed solution may or may not depend on the SPARK_KUBECTL variable)

There are similar references all across the services.py module, also including the SparkInterface class.

I honestly mean that I may be missing a point, or design objective here, and that there may be an underlying difference that didn't come clear to me.

Of course I agree that, for consistency, we should rather have all Defaults passed into KubeInterface initialization as well (as we do for its equal-level sibling LightKube, the other child of AbstractKubeInterface), keeping the two implementations as close to each other as possible. In this respect, instead of the originally proposed solution, this may be better:

# We already have this
class LightKube(AbstractKubeInterface):
    def __init__(self, defaults: Defaults, ...):
        self.defaults = defaults
        [..]

    def create(...):
        with open(self.defaults.template_serviceaccount):
            [..]

# This could be adjusted
class KubeInterface(AbstractKubeInterface):
    def __init__(self, defaults: Defaults, ...):
        self.defaults = defaults
        [..]

    def create(...):
        # self.defaults.kubectl_cmd used here as needed
        [..]

with the proposal of get_kube_interface() instantiating KubeInterface by passing the complete spark8t.cli.defaults object.

@juditnovak juditnovak marked this pull request as ready for review May 30, 2023 16:05
@juditnovak juditnovak force-pushed the DPE-2023_fixing_hardcoded_default branch from f761ccf to 55eb2cf Compare May 31, 2023 08:38
@juditnovak juditnovak force-pushed the DPE-2023_fixing_hardcoded_default branch from 55eb2cf to e300d7c Compare May 31, 2023 10:14
@juditnovak juditnovak changed the title [DPE-1939][BUGFIX] Fixing hardcoded default [DPE-1939][BUGFIX] Fixing hardcoded default + refactor (spark8t/lib) May 31, 2023
from spark8t.services import AbstractKubeInterface, KubeInterface, LightKube


def get_kube_interface(args: Namespace) -> AbstractKubeInterface:
Contributor

@deusebio deusebio Jun 1, 2023


todo: not sure this is still relevant given the discussion here, but we should probably also provide the defaults as an argument of the function here. In general, we should really not import this from cli, as it creates a circular dependency (cli depends on lib, which depends on cli)
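One way to avoid such a cycle is dependency injection: lib-level code takes the defaults as a parameter and never imports from cli, so the import arrow only points one way. A sketch, with module names and the `EnvDefaults` class purely hypothetical:

```python
# --- spark8t/lib/process_settings.py (sketch) ---
def get_kube_interface(args, defaults):
    # Everything this function needs arrives as arguments, so there is no
    # "from spark8t.cli import defaults" here and hence no import cycle.
    return {"kubectl_cmd": getattr(args, "kubectl_cmd", None) or defaults.kubectl_cmd}


# --- spark8t/cli/<script> (sketch) -- cli depends on lib, never the reverse ---
class EnvDefaults:
    """Hypothetical defaults holder the CLI would build from env vars."""

    kubectl_cmd = "microk8s.kubectl"


def main(args):
    return get_kube_interface(args, EnvDefaults())
```

The CLI layer constructs the defaults object and injects it downward, which also makes the lib function trivially testable with a stubbed defaults.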

Contributor Author

@juditnovak juditnovak Jun 2, 2023


That's a valid point, my bad, I overlooked something on this level. I got it right now, after following up on your comment. Sorry for that.

This may encourage one of the proposed structures I've listed in response to the above-mentioned thread.

@deusebio
Contributor

deusebio commented Jun 1, 2023

BTW, following the comment here, we should probably also remove the resource/conf/spark-defaults.conf file within this ticket

@deusebio
Contributor

deusebio commented Jun 7, 2023

shall we close this?

@deusebio
Contributor

Closing as stale

@deusebio deusebio closed this Jun 27, 2023
@deusebio deusebio deleted the DPE-2023_fixing_hardcoded_default branch June 29, 2023 14:35