add ml_model_settings parameter #434

Merged: 5 commits into master from feature/MLModel_settings on May 28, 2024

Conversation

@mafrahm mafrahm (Contributor) commented May 21, 2024

This PR adds the 'ml_model_settings' parameter that writes key-value pairs directly into the self.parameters attribute.

Example:

import law

from columnflow.ml import MLModel


class MyModel(MLModel):
    def __init__(
        self,
        *args,
        **kwargs,
    ):
        # initialize the MLModel base class (fills self.parameters, e.g. from --ml-model-settings)
        super().__init__(*args, **kwargs)

        # we cannot cast to dict on the command line, but one can do this by hand
        ml_process_weights = self.parameters.get("ml_process_weights", {"st": 1, "tt": 2})
        if isinstance(ml_process_weights, tuple):
            ml_process_weights = {proc: int(weight) for proc, weight in [s.split(":") for s in ml_process_weights]}

        # store parameters of interest in the ml_model_inst, e.g. via the parameters attribute
        self.parameters = {
            "batchsize": int(self.parameters.get("batchsize", 1024)),
            "layers": tuple(int(layer) for layer in self.parameters.get("layers", (64, 64, 64))),
            "ml_process_weights": ml_process_weights,
        }

        # create representation of ml_model_inst
        self.parameters_repr = law.util.create_hash(sorted(self.parameters.items()))

These parameters can then be changed on the command line:

law run cf.MLTraining --version v1 --ml-model MyModel --ml-model-settings "batchsize=2048,layers=32;32;32,ml_process_weights=st:1;tt:4"
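
For illustration only, a settings string like the one above could be split into key-value pairs as sketched below; the helper name and the exact splitting rules are assumptions for this example and not columnflow's actual parsing code:

def parse_ml_model_settings(settings: str) -> dict:
    # hypothetical helper (not part of columnflow): split "a=1,b=2;3" into
    # {"a": "1", "b": ("2", "3")}; values containing ";" become tuples of strings
    params = {}
    for item in settings.split(","):
        key, _, value = item.partition("=")
        params[key] = tuple(value.split(";")) if ";" in value else value
    return params

parse_ml_model_settings("batchsize=2048,layers=32;32;32,ml_process_weights=st:1;tt:4")
# -> {"batchsize": "2048", "layers": ("32", "32", "32"), "ml_process_weights": ("st:1", "tt:4")}

In this sketch all values arrive as strings (or tuples of strings), which mirrors why MyModel.__init__ above casts them to int/tuple/dict itself.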

Open questions:

1.) At the moment, ml_model_settings is not implemented for the MLModelsMixin, so we can only use the "default" model (or create a new model via derive) when e.g. creating histograms. We could add an ml_models_settings parameter to this mixin, but I'm not sure whether it would be used much.
2.) Since the output of the model is now hashed, we might also want to automatically produce an output of the parameters or the parameters_repr so that ML trainings can always be reproduced if necessary (a rough sketch of doing this by hand is shown below).
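
As a purely illustrative sketch (the helper name, file format, and call site are assumptions, not an existing columnflow mechanism), such a dump could be done by hand at the end of the training:

import json

def dump_parameters(ml_model_inst, path: str) -> None:
    # illustrative helper: persist the resolved parameters and their hash so that
    # a training can later be reproduced from the written file
    with open(path, "w") as f:
        json.dump(
            {
                "parameters_repr": ml_model_inst.parameters_repr,
                "parameters": ml_model_inst.parameters,
            },
            f,
            indent=4,
            default=str,  # fall back to str() for values json cannot serialize
        )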

@mafrahm mafrahm added the enhancement (New feature or request) and ml (ML pipeline related things) labels May 21, 2024
@mafrahm mafrahm self-assigned this May 21, 2024
@pkausw pkausw (Member) left a comment

Nice PR, thanks! Just some minor questions.

I agree that having a mechanism that automatically dumps the current parameter settings would be nice. However, I think the user can do this themselves at the end of e.g. the training loop. We might want to think about providing e.g. a decorator that can be used directly for this purpose and that wraps e.g. the training function of the model inst. But imho, this isn't critical for this PR. (Though it also shouldn't be super hard, so if you have some spare time, feel free to add it ;) )
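
A rough sketch of such a decorator, assuming a train(self, task, input, output) signature where output is a directory-like law target; the decorator name and these assumptions are illustrative, not an existing columnflow API:

import functools

def dumps_parameters(train_func):
    # hypothetical decorator: write the model parameters next to the training
    # outputs before running the wrapped train method
    @functools.wraps(train_func)
    def wrapper(self, task, input, output):
        # assumes "output" is a directory target providing child() and dump()
        output.child("parameters.json", type="f").dump(
            {"parameters_repr": self.parameters_repr, "parameters": self.parameters},
            formatter="json",
        )
        return train_func(self, task, input, output)
    return wrapper

A model could then opt in by decorating its train method with @dumps_parameters.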

Resolved review threads: columnflow/ml/__init__.py, columnflow/tasks/framework/mixins.py
@pkausw pkausw (Member) left a comment

LGTM, thanks!

Resolved review threads: columnflow/tasks/framework/mixins.py, columnflow/ml/__init__.py
@pkausw pkausw merged commit 3253c97 into master May 28, 2024
8 checks passed
@pkausw pkausw deleted the feature/MLModel_settings branch May 28, 2024 16:55