Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: help message, type checker and input restrictions #7

Closed
dc3ea9f opened this issue Jun 14, 2023 · 24 comments
Closed

Feature Request: help message, type checker and input restrictions #7

dc3ea9f opened this issue Jun 14, 2023 · 24 comments

Comments

@dc3ea9f
Copy link

dc3ea9f commented Jun 14, 2023

When using this library, I typically write a yaml or json file as a configuration and load it using chanfig.Config. However, I have encountered two issues that make the process a bit uncomfortable:

  1. When running the program with the --help option, only the parameters added with config.add_argument are displayed, which is inconvenient. It would be better if the full help message, similar to what argparse provides, could be shown.
  2. After writing the configuration file, I always need to add very long codes to manually verify the input type and check for any other restrictions to ensure that the user hasn't entered any invalid input (e.g., using --type and --choices in argparse). It would be more convenient if we can use a comment (use JSONC for JSON) to restrict input type and other conditions. For example, a configuration file in YAML format could look like this:
model:
  activation: 'ReLU' // **choices: ['ReLU', 'ELU', 'GeLU']; type: str**
exp:
  seed: 42 // **type: int**

Comments quoted with ** would be parsed into input checker by CHANfiG. When there is no comment, CHANfiG parser would roll back to the current version. Although it is a bit ugly, I haven't found any other better solutions so far.

I believe addressing these points would greatly enhance the development experience with CHANfiG. I would appreciate any comments or suggestions you may have on these matters.

@ZhiyuanChen
Copy link
Owner

When running the program with the --help option, only the parameters added with config.add_argument are displayed, which is inconvenient. It would be better if the full help message, similar to what argparse provides, could be shown.

We will look into this.
It could be more default than it appears, for example, what do we do to those arguments with dest specified?
Also, should we continue to use defaultdict, or should we ensure the configure parameters are appeared in Config class/file to avoid typos?
Anyway, this will be tracked in #8.

After writing the configuration file, I always need to add very long codes to manually verify the input type and check for any other restrictions to ensure that the user hasn't entered any invalid input (e.g., using --type and --choices in argparse). It would be more convenient if we can use a comment (use JSONC for JSON) to restrict input type and other conditions. For example, a configuration file in YAML format could look like this:

Hmmmm, I'm not sure if this kind of restriction is "pythonic", though I don't find including jsonc support could be harmful.
We will be tracking the progress in #9.

Thank you for reporting these!

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 14, 2023

Also, should we continue to use defaultdict, or should we ensure the configure parameters are appeared in Config class/file to avoid typos?

Sorry, I cannot get your point, could you explain more?

Hmmmm, I'm not sure if this kind of restriction is "pythonic"

I think it is not "Pythonic" at all. But writing too many type checks or input restriction codes is really annoying. I expect a more convenient way to do this.

During my development, I need to manually check if some configs are missing. especially for those configurations that need user inputs. After that, I considered adding type hint/checker/choices to my configuration file. Now, I am not sure if it is necessary since some types are conventional (e.g. batch_size must be integer). Do you have any suggestions or ideas?

@ZhiyuanChen
Copy link
Owner

Also, should we continue to use defaultdict, or should we ensure the configure parameters are appeared in Config class/file to avoid typos?

The reason why we make Config a defaultdict is that we cannot determine the if a key is valid or not. So we assume all user input are valid keys.
If we have a file that indicate the possible keys, we wouldn't need a defaultdict and we can check if a key is valid or not.

During my development, I need to manually check if some configs are missing. especially for those configurations that need user inputs. After that, I considered adding type hint/checker/choices to my configuration file. Now, I am not sure if it is necessary since some types are conventional (e.g. batch_size must be integer). Do you have any suggestions or ideas?

Could you elaborate more on why you need these checks?
I mean, if some value has incorrect type, they will raise an Error.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 15, 2023

OK. I know what you mean. I have tested those configurations added by --dest. The values could be added to the right place. I think we can just use the config file as standard input for the help message. If you think it ignores some "valid" keys from the user, we can consider another argument like --help-chanfig.

I would add type checks for programs that need many resources. In deep learning experiments, we usually define the dataset, dataloader, model sequentially and run model.forward(). If the dataset/model is too big, we must wait a long time to get the runtime error. A pre-check before these would be better to avoid wasting time.

@ZhiyuanChen
Copy link
Owner

ZhiyuanChen commented Jun 16, 2023

Thank you for suggestion and clarification.

I think we can just use the config file as standard input for the help message. If you think it ignores some "valid" keys from the user, we can consider another argument like --help-chanfig.

In the next few versions, we will include the pre-defined keys in the --help message.

I would add type checks for programs that need many resources.

This is very convincing and I believe this behaviour is mandatory to chanfig.

We will include the support of jsonc before duanwu festival.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 19, 2023

Hello, I found another more pythonic way for type hints.

The config can be a Python file rather than yaml/json, as your usage example does. Type hints could be added through typing. Though typing doesn't support runtime check, I think we can enable it though typeguard or other libraries.

from chanfig import Config
from rich import print

class TestConfig(Config):
    def __init__(self):
        super().__init__()
        self.name: str = "CHANfiG"
        self.seed: int = 1013
        self.data.batch_size: int = 64
        self.model.encoder.num_layers: int = 6
        self.model.decoder.num_layers: int = 6
        self.activation: str = "GELU"
        self.optim.lr: float = 1e-3    


if __name__ == '__main__':
    config = TestConfig()
    config = config.parse()
    config.freeze()
    print(config)

Or we can define a wrapper on the value, e.g.:

from chanfig import Value
# ...
        self.data.batch_size = Value(6, type=int, validator=lambda e: 4 < e < 128, required=False)
        self.activation = Value('GELU', type=str, choices=['GELU', 'RELU', 'TanH'], required=True)
# ...

I think the second way won't hurt default user behaviors and does fewer modifications to our code.

@ZhiyuanChen
Copy link
Owner

ZhiyuanChen commented Jun 21, 2023

Hello, I found another more pythonic way for type hints.

The config can be a Python file rather than yaml/json, as your usage example does. Type hints could be added through typing. Though typing doesn't support runtime check, I think we can enable it though typeguard or other libraries.

from chanfig import Config
from rich import print

class TestConfig(Config):
    def __init__(self):
        super().__init__()
        self.name: str = "CHANfiG"
        self.seed: int = 1013
        self.data.batch_size: int = 64
        self.model.encoder.num_layers: int = 6
        self.model.decoder.num_layers: int = 6
        self.activation: str = "GELU"
        self.optim.lr: float = 1e-3    


if __name__ == '__main__':
    config = TestConfig()
    config = config.parse()
    config.freeze()
    print(config)

This is an excellent idea, I'm working on it.

Or we can define a wrapper on the value, e.g.:

from chanfig import Value
# ...
        self.data.batch_size = Value(6, type=int, validator=lambda e: 4 < e < 128, required=False)
        self.activation = Value('GELU', type=str, choices=['GELU', 'RELU', 'TanH'], required=True)
# ...

I think the second way won't hurt default user behaviors and does fewer modifications to our code.

Using Variable could be dangerous, as it can only pretend to be a str / int but actually is not. This could results in unexpected behaviour.

Although it won't hurt to add this feature, I'll work on it.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 21, 2023

Thank you for your hard working!

While the first one can only do type checking and weak in the data validator part. For the second implementation, maybe we can save two copy of configs? One for current, one for the new Values, the Value only build a data validator entry when calling __set__ and won’t hold the actual data. I think the memory cost can be ignored and this may not such dangerous.

@ZhiyuanChen
Copy link
Owner

e4d179f add support for strong data validation for Variable. You may call NestedDict/FlatDict/Variable.validate() to validate values now~

The first one is much harder than it appears and requires deep hack into the underling implementation. I'll continue work on it.

For the second implementation, maybe we can save two copy of configs? One for current, one for the new Values, the Value only build a data validator entry when calling set and won’t hold the actual data. I think the memory cost can be ignored and this may not such dangerous.

Considering two Config, a ValueConfig that stores actual value, and a VariableConfig that stores type check information. Since ValueConfig stores all values python built-in types which are tooooo dangerous to hack, we can only set value on VariableConfig and make it sync value to ValueConfig. But we want to get value, we must access the ValueConfig. This behaviour is very counter-intuitive and is possibly not an ideal solution.
One way to get around this is to use another Wrapper class which wraps both configs and access them through descriptor protocol. But it would make the whole structure much more complicated.

Maybe I could was too cautious, I use Variable in most of my projects and they hardly fail. Would you mind try it and see if it works?

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 21, 2023

Awesome! So quick update, I would try to use Variable in my projects later.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 21, 2023

we can only set value on VariableConfig and make it sync value to ValueConfig. But we want to get value, we must access the ValueConfig. This behaviour is very counter-intuitive and is possibly not an ideal solution.

maybe the VariableConfig is confusing, let's call it ValidatorConfig, it would save type, validator function and other data validation related information. It would impact on we set data, and the data in ValueConfig is assumed to be validated (we set a good initial value and all modifications are through validator). We get data directly from ValueConfig since the other config only guides what data is 'good'. I think this is not counter-intuitive.

@ZhiyuanChen
Copy link
Owner

maybe the VariableConfig is confusing, let's call it ValidatorConfig, it would save type, validator function and other data validation related information. It would impact on we set data, and the data in ValueConfig is assumed to be validated (we set a good initial value and all modifications are through validator). We get data directly from ValueConfig since the other config only guides what data is 'good'. I think this is not counter-intuitive.

This would be straightforward for experienced engineers and researcher.
But considering most python users barely understand the differences between objects and classes…

Although we could do modify the get/set method of Wrapper. If the value to set is a Validator, we set the ValidatorConfig, otherwise, we validate the value and then set the ValueConfig.

@ZhiyuanChen
Copy link
Owner

Awesome! So quick update, I would try to use Variable in my projects later.

No worries. I promised duanwu festival and I must honor it.
I understand the deadline can be tight and how frustrating it is when upstream library is slow in response.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 21, 2023

This would be straightforward for experienced engineers and researcher.
But considering most python users barely understand the differences between objects and classes…

To be honest and a little offense, I think those python users may not be our target users.

Although we could do modify the get/set method of Wrapper. If the value to set is a Validator, we set the ValidatorConfig, otherwise, we validate the value and then set the ValueConfig.

Maybe you misunderstand the logic or I haven't gotten your point? There is no situation that "the value to set is a Validator". For the user side, they would use:

# ...
        self.exp.seed = 42
        self.data.batch_size = Wrapper(6, type=int, validator=lambda e: 4 < e < 128, required=False)
        self.activation = Wrapper('GELU', type=str, choices=['GELU', 'RELU', 'TanH'], required=True)
# ...

and in chanfig, it would create two flatten dict (I am not familiar with the implementation):

data = {"exp.seed": 42, "data.batch_size": 6, "activation": "GELU"}
validators = {
    "exp.seed": Validator(), # default
    "data.batch_size": Validator(type=int, validator=lambda e: 4 < e < 128, required=False),
    "activation": Validator(type=str, choices=['GELU', 'RELU', 'TanH'], required=True),
}

and when we call __setattr__(self, key, value), it would be like:

def __setattr__(self, key, value):
    if self.validators[key].check(value):
        self.data[key] = value
    else:
        raise ValueError(...)

Users won't access the validator directly, the Wrapper would define the validator.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 21, 2023

Please just ignore me... After re-check your commit e4d179f, I found it actually does what I want. Maybe the brain is ruined at night...

The current implementation meets my demands. There is no reason to discuss how we implement the Wrapper (you have finished).

Sorry and hope you have a good duanwu festival.

@ZhiyuanChen
Copy link
Owner

ZhiyuanChen commented Jun 25, 2023

To be honest and a little offense, I think those python users may not be our target users.

Actually, this package (and also DanLing) is targeted for those who has minimum knowledge in programming. Therefore, every APIs are defined following intuitive. It should just works.

Maybe you misunderstand the logic or I haven't gotten your point? There is no situation that "the value to set is a Validator". For the user side

Actually our thoughts are the same.
I meant, a Config is a python dict (it actually inherits from dict). To store the validator information, we need two dicts.
So, we need a Wrapper class that has access to both dicts (data and validators) for easier accessing. And the rest are the same as what you wrote above.
Please let me know if you think this kind of Wrapper class is necessary.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 25, 2023

Actually, this package (and also DanLing) is targeted for those who has minimum knowledge in programming. Therefore, every APIs are defined following intuitive. It should just works.

Sorry, my bad. Please feel free to keep your opinion.

Actually our thoughts are the same. I meant, a Config is a python dict (it actually inherits from dict). To store the validator information, we need two dicts. So, we need a Wrapper class that has access to both dicts (data and validators) for easier accessing. And the rest are the same as what you wrote above. Please let me know if you think this kind of Wrapper class is necessary.

Yes, I have found our thoughts are the same. Currently, I'm trying to use Variable in my programs and have no ideas for now.

@ZhiyuanChen
Copy link
Owner

We just released v0.0.83 this~

@ZhiyuanChen
Copy link
Owner

ZhiyuanChen commented Jun 27, 2023

  1. When running the program with the --help option, only the parameters added with config.add_argument are displayed, which is inconvenient. It would be better if the full help message, similar to what argparse provides, could be shown.

We just pushed a fix, please let me know if the help message can be improved.

@dc3ea9f
Copy link
Author

dc3ea9f commented Jun 29, 2023

Glad to hear that. I would try the newest version later.

I apologize for my delayed response as I was sick.

@ZhiyuanChen
Copy link
Owner

Glad to hear that. I would try the newest version later.

I apologize for my delayed response as I was sick.

No worries, thank you so much for your kind and quick feedback

Wish you a speedy recovery

@dc3ea9f
Copy link
Author

dc3ea9f commented Aug 4, 2023

The first one is much harder than it appears and requires deep hack into the underling implementation. I'll continue work on it.

I think the Python Cookbook 3 section 9.20 (page 394) is helpful for us to do something with the type hinting.

@ZhiyuanChen
Copy link
Owner

I have implemented corresponding code for annotation validation.
However, many of its features require Python 3.10+

@ZhiyuanChen
Copy link
Owner

I have fixed the compatibilities issue with previous python version.
Could you try if it works as you expected.

Note that due to the limitation of Python, we can only check annotations defined in class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants