Feature: RecursiveDict.compress() to shorten paths to steps and their hyperparams #486

guillaume-chevalier · 2021-05-16T14:05:12Z

Is your feature request related to a problem? Please describe.
Hyperparameter names get too long in nested steps.

Describe the solution you'd like
A way to compress the names to make them shorter. More specifically, I think an automated algorithm that works for all existing ML pipelines could be built. Usage would look something like this:

from pprint import pprint

all_hps = pipeline.get_hyperparams()
all_hps_shortened = all_hps.compress()  # proposed method, does not exist yet
pprint(all_hps_shortened)

Then we'd see something like this in the pprint:

{
    "*__MetaStep__*__SKLearnWrapper_LinearRegression__C": 1000,
    "*__SomeStep__hyperparam3": value,
    "*__SKLearnWrapper_BoostedTrees__count": 10
}

That is, the unique paths to some steps are compressed using the star (*) operator, which means "one or more steps in between". The compression would be lossless: the original names could ALWAYS be retrieved given the original pipeline's tree structure.
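To make the "lossless" claim concrete, here is a minimal sketch of how a compressed name could be expanded back to its full name. It assumes that names are split on "__" and that "*" stands for one or more whole segments; the `restore` helper and the full key names are hypothetical and only illustrate the idea, they are not Neuraxle's API.

```python
from typing import List, Sequence

SEP = "__"


def _match(pattern: Sequence[str], path: Sequence[str]) -> bool:
    """True if `path` matches `pattern`, where "*" stands for one or more segments."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Let the star consume 1..len(path) segments, then match the remainder.
        return any(_match(rest, path[i:]) for i in range(1, len(path) + 1))
    return bool(path) and path[0] == head and _match(rest, path[1:])


def restore(compressed_key: str, full_keys: List[str]) -> str:
    """Expand a compressed key using the full key list taken from the pipeline tree."""
    pattern = compressed_key.split(SEP)
    candidates = [k for k in full_keys if _match(pattern, k.split(SEP))]
    if len(candidates) != 1:
        raise KeyError(f"{compressed_key!r} matches {len(candidates)} keys, expected 1")
    return candidates[0]


# Hypothetical full names, as a pipeline tree might produce them.
full_keys = [
    "Pipeline__MetaStep__Inner__SKLearnWrapper_LinearRegression__C",
    "Pipeline__SomeStep__hyperparam3",
    "Pipeline__Branch__SKLearnWrapper_BoostedTrees__count",
]

restored = restore("*__SomeStep__hyperparam3", full_keys)
```

Raising when a pattern matches zero or several keys is what keeps the scheme lossless: a compressed name is only valid if it resolves to exactly one full name.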

Describe alternatives you've considered
Using custom rules to drop or abbreviate words. That seems good, but it doesn't seem to generalize to all pipelines that could exist.

Additional context
Hyperparameter names were also reported as too long in #478.

Additional idea
Since every model may need to declare the names of its expected hyperparams in the future, it may be possible to use a hyperparam's name alone when no other step has a hyperparam with the same name. If another step uses the same hyperparam name, the "*" compression could go up the tree until it finds the first parent name that is not shared, or something similar.

More ideas are needed to be sure we do this the right way.


Rohith295 · 2021-06-25T22:02:25Z

I would like to work on this issue.

guillaume-chevalier · 2021-06-26T15:43:43Z

Idea 1:

A list of dicts, each containing the step name, its hyperparameter(s), and the list of parents. The parents can be removed for printing. The list keeps the steps in their original order:

def compressed(self: HyperparameterSamples) -> CompressedHyperparameterSamples
class CompressedHyperparameterSamples(List[Dict[str, Any]])
def CompressedHyperparameterSamples.remove_parents()

Idea 2:

def CompressedHyperparameterSamples.to_wildcards():

Before: 
pipeline__a__predictor1__IncrementalFitCausalityModel__hp1
pipeline__a__predictor2__IncrementalFitCausalityModel__hp1
pipeline__b__predictor1__IncrementalFitCausalityModel__hp1

After: 
*a__predictor1*hp1: 435
*a__predictor2*hp1: 435
*b*hp1: 234

Or this alternative version of After, for instance:
*a*predictor1*hp1: 435
*predictor2*hp1: 435
*b*hp1: 234
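Both After variants only work if each wildcard name still points at exactly one Before name. A quick way to check that is sketched below, using the standard library's `fnmatch.fnmatchcase` as a stand-in for whatever matching `to_wildcards()` would end up using; the `resolve` helper is hypothetical. Note that glob matching is character-level, so a short fragment like "a" could also match inside an unrelated step name; a real implementation would probably match whole segments.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def resolve(patterns: List[str], full_keys: List[str]) -> Dict[str, str]:
    """Map each wildcard name to the single full name it matches, or fail loudly."""
    resolved: Dict[str, str] = {}
    for pattern in patterns:
        hits = [key for key in full_keys if fnmatchcase(key, pattern)]
        if len(hits) != 1:
            raise ValueError(f"{pattern!r} matches {len(hits)} names, expected 1")
        resolved[pattern] = hits[0]
    return resolved


before = [
    "pipeline__a__predictor1__IncrementalFitCausalityModel__hp1",
    "pipeline__a__predictor2__IncrementalFitCausalityModel__hp1",
    "pipeline__b__predictor1__IncrementalFitCausalityModel__hp1",
]

after = ["*a__predictor1*hp1", "*a__predictor2*hp1", "*b*hp1"]
after_alternative = ["*a*predictor1*hp1", "*predictor2*hp1", "*b*hp1"]

mapping = resolve(after, before)
mapping_alternative = resolve(after_alternative, before)
```

With these three names, both variants resolve each wildcard to a distinct Before name, so either could be used without losing information.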

Rohith295 · 2021-06-26T15:44:08Z

One possible compressed format:

all_hps_shortened = all_hps.compress()
print(all_hps_shortened)
[
    {
        "step_name": "step1",
        "hp": {},
        "parents": [""] or "_"
    },
    {
        "step_name": "step1",
        "hp": {},
        "parents": [""] or "_"
    },
]

If trim_parents is True:
all_hps_shortened = all_hps.compress(trim_parents=True)
print(all_hps_shortened)
[
    {
        "step_name": "step1",
        "hp": {},
    },
    {
        "step_name": "step1",
        "hp": {},
    },
]
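The format above could be produced from a flat hyperparameter dict roughly as follows. This is only a sketch under assumptions: the standalone `compress` function is hypothetical, names are split on "__", the last segment is taken to be the hyperparameter name, the one before it the step name, and everything earlier the parents.

```python
from typing import Any, Dict, List, Tuple

SEP = "__"


def compress(flat: Dict[str, Any], trim_parents: bool = False) -> List[Dict[str, Any]]:
    """Group flat hyperparams by step, keeping the order in which steps first appear."""
    entries: Dict[Tuple[str, ...], Dict[str, Any]] = {}
    for key, value in flat.items():
        *step_path, hp_name = key.split(SEP)
        entry = entries.setdefault(tuple(step_path), {
            "step_name": step_path[-1] if step_path else "",
            "hp": {},
            "parents": step_path[:-1],
        })
        entry["hp"][hp_name] = value
    result = list(entries.values())
    if trim_parents:
        # Drop the "parents" field, e.g. for more compact printing.
        result = [{k: v for k, v in e.items() if k != "parents"} for e in result]
    return result


# Hypothetical flat hyperparams.
flat = {
    "Pipeline__a__Model__hp1": 1,
    "Pipeline__a__Model__hp2": 2,
    "Pipeline__b__Model__hp1": 3,
}
with_parents = compress(flat)
without_parents = compress(flat, trim_parents=True)
```

Keeping "parents" as a list rather than a joined string lets two steps with the same name stay distinguishable until the parents are trimmed.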

guillaume-chevalier · 2021-06-26T15:45:51Z

@Rohith295 perfect! I like compress(trim_parents=True), which would call remove_parents right away, as in Idea 1 😃

guillaume-chevalier · 2021-06-26T15:46:53Z

It would be interesting to have this as well: CompressedHyperparameterSamples.restore() -> HyperparameterSamples
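A restore step for the list-of-dicts format could look roughly like this. It is a sketch only: the standalone `restore` function is hypothetical, it returns a plain flat dict rather than a HyperparameterSamples, it assumes "__" as the separator, and it can only work while the "parents" field is still present.

```python
from typing import Any, Dict, List

SEP = "__"


def restore(compressed: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rebuild flat "parent__step__hp" names from step_name / hp / parents entries."""
    flat: Dict[str, Any] = {}
    for entry in compressed:
        if "parents" not in entry:
            # Without parents, steps sharing a name can no longer be told apart.
            raise ValueError("cannot restore: parents were trimmed")
        prefix = [*entry["parents"], entry["step_name"]]
        for hp_name, value in entry["hp"].items():
            flat[SEP.join([*prefix, hp_name])] = value
    return flat


# Hypothetical compressed entries.
compressed = [
    {"step_name": "Model", "hp": {"hp1": 1, "hp2": 2}, "parents": ["Pipeline", "a"]},
    {"step_name": "Model", "hp": {"hp1": 3}, "parents": ["Pipeline", "b"]},
]
restored = restore(compressed)
```

This also shows why trimming parents is a one-way operation meant for display: once they are gone, the full names cannot be rebuilt from the list alone.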

guillaume-chevalier · 2022-06-15T18:44:27Z

Completed with the "use_wildcards" argument, as in RecursiveDict.to_flat_dict(use_wildcards=True).

guillaume-chevalier added the enhancement, help wanted, good first issue, and question labels May 16, 2021

This was referenced Jun 27, 2021

Added feature to compress hps #498

Closed

Recursive dict compress feature #502

Closed

guillaume-chevalier added this to the 0.6.1 milestone Jun 29, 2021

guillaume-chevalier removed this from the 0.6.1 milestone Oct 17, 2021

guillaume-chevalier closed this as completed Jun 15, 2022
