Support dynamic configuration #316
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```
@@             Coverage Diff              @@
##           devendorize     #316   +/-   ##
================================================
- Coverage      97.86%    87.18%   -10.69%
================================================
  Files             18        18
  Lines           2249      2263       +14
================================================
- Hits            2201      1973      -228
- Misses            48       290      +242
```
At this point, I haven't checked all the details for correctness, but I can see that with this approach, the calling-order restriction of `dandischema.conf.set_instance_config()` in relation to the import of `dandischema.models` has indeed been lifted. However, to achieve that, the validations affected by the value of the instance config in `dandischema.conf` have been moved into custom Pydantic validators, i.e., the `field_validator`s. The behaviors of those validators exist only at Python runtime and are not encoded in the corresponding JSON Schemas of the models.
I think, in general, we want to move validation out of custom Pydantic validators and into validation that can be encoded in JSON Schema. @yarikoptic, your input?
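The distinction can be illustrated with a small sketch (assuming pydantic v2; the model and field names here are invented for illustration): a `pattern` constraint declared on a `Field` is emitted into the generated JSON Schema, while an equivalent custom `field_validator` exists only at Python runtime.

```python
import re

from pydantic import BaseModel, Field, field_validator


class Declarative(BaseModel):
    # The regex is part of the field's metadata, so it lands in the JSON Schema.
    doi: str = Field(pattern=r"^10\.\d{4,}/.+$")


class Imperative(BaseModel):
    doi: str

    # The same regex, but enforced only at Python runtime.
    @field_validator("doi")
    @classmethod
    def check_doi(cls, value: str) -> str:
        if re.match(r"^10\.\d{4,}/.+$", value) is None:
            raise ValueError("value is not a DOI")
        return value


print("pattern" in Declarative.model_json_schema()["properties"]["doi"])  # True
print("pattern" in Imperative.model_json_schema()["properties"]["doi"])   # False
```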
```python
doi: str = Field(
    title="DOI",
    json_schema_extra={"readOnly": True, "nskey": DANDI_NSKEY},
    default="",
)
```
We require a `doi` value when the DANDI DOI pattern is not available. Defaulting to `""` will fail the `check_id` method below, but the feedback the user gets will be different.
So what should `default` be set to here? Prior to this, it seemed that either `pattern` was set to `DANDI_DOI_PATTERN`, or it was set to the default pattern and `default` was set to `""`. Should `default` be set dynamically to match that?
I am actually talking about the default value of the `doi` field. Without this proposed change, the `doi` field doesn't have a default when `DANDI_DOI_PATTERN` is `None`, which means that the `doi` field is required in that situation. With this proposed change, the `doi` field always has a default, which means it is never required.
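A minimal sketch (assuming pydantic v2; the model names are invented) of why adding a default changes required-ness and, with it, the validation feedback the user sees:

```python
from pydantic import BaseModel, ValidationError


class WithDefault(BaseModel):
    doi: str = ""  # always has a default, so never required


class Required(BaseModel):
    doi: str  # no default, so omitting it is a "missing field" error


print(WithDefault().doi == "")  # True: validates using the default
try:
    Required()
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # missing
```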
Ah I see, that's an oversight on my part.
Good point. This could be addressed using
I don't think they're necessarily opposed to one another. The custom validators I added are simply using regexes, which can be re-encoded into strings once the model is serialized.
You are right. We can do that, but that will make the eventual transition to LinkML more complex, and from then on we would have to manage the serialization of the involved elements ourselves instead of leaving it to Pydantic.
I like the idea of separating the model from code, so I am hesitant to move more of the model specs into code by using those custom validators and their respective serializers. I want to know why we want these customizations in the first place. I see that these changes allow
re
and related
Although I am with you on the ultimate desires/design, an immediate target is to provide support for multiple instances with the currently existing setup of pydantic + jsonschema, with a minimal amount of "user-visible changes" (i.e., not changing much, if anything, in the "DANDI vendorized schema"). So I would say: we can go ahead and move to a few more Python validations and serializers for now. It would also help to identify such points better for when we re-approach expressing this in LinkML.
re
We should not need it, but did we check, after relaxing all the regexes, that the client would still work fine?
We merged other developments into
I'm okay handling conflicts, but are we going forward with this branch? It seemed that @candleindark had major objections.
Yes, we are going forward with this. Thanks for bringing up the idea and the PR. After some consideration, we think it's best to avoid the "brittle" setup that depends on import order. You can rebase this PR and make all the tests pass, or I can send a PR to this PR, whichever works better for you. Let me know which way you prefer.
```python
@staticmethod
def get_id_pattern():
    conf = get_instance_config()
    sub_pattern = conf.id_pattern + "|" + conf.id_pattern.lower()
    pattern = rf"^({sub_pattern}):\d{{6}}(/(draft|\d+\.\d+\.\d+))$"
    return pattern

@field_validator("id")
@classmethod
def check_id(cls, value: str) -> str:
    pattern = cls.get_id_pattern()
    if re.match(pattern, value) is None:
        raise ValueError(f"ID does not match pattern {pattern}")
    return value
```
What I really wanted to do was the following:

```python
@field_pattern("id")
@staticmethod
def get_id_pattern():
    conf = get_instance_config()
    sub_pattern = conf.id_pattern + "|" + conf.id_pattern.lower()
    pattern = rf"^({sub_pattern}):\d{{6}}(/(draft|\d+\.\d+\.\d+))$"
    return pattern
```

where `field_pattern` would be a new decorator that does two things:

- Creates a `field_validator` for the specified fields (`"id"`) that just runs `re.match`, with the pattern returned from the decorated function `get_id_pattern`, on the supplied fields (`"id"`).
- Tags the function/class in a way that allows `pattern` to be injected into the rendered schema automatically.
I tried to implement this but couldn't find a way to do so. Perhaps in the future this can be updated.
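For what it's worth, one half of the idea (a reusable, dynamically evaluated pattern validator) can be sketched as below; `pattern_validator`, `get_id_pattern`, and `Dandiset` are hypothetical names, the schema-injection half remains unsolved (matching the conclusion above), and pydantic v2 is assumed:

```python
import re
from typing import Callable

from pydantic import BaseModel, ValidationError, field_validator


def pattern_validator(field: str, get_pattern: Callable[[], str]):
    """Build a reusable field_validator that re-evaluates its regex on
    every validation, so a changed config is picked up at runtime."""

    @field_validator(field)
    @classmethod
    def _check(cls, value: str) -> str:
        pattern = get_pattern()
        if re.match(pattern, value) is None:
            raise ValueError(f"{field!r} does not match pattern {pattern}")
        return value

    return _check


def get_id_pattern() -> str:
    # Stand-in for the config-derived pattern in the diff above.
    return r"^DANDI:\d{6}$"


class Dandiset(BaseModel):
    id: str
    check_id = pattern_validator("id", get_id_pattern)


print(Dandiset(id="DANDI:000004").id)
try:
    Dandiset(id="bogus")
except ValidationError:
    print("rejected")
```

This only enforces the pattern at runtime; the JSON Schema would still have to be patched separately, which is the part the comment above could not find a clean hook for.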
for more information, see https://pre-commit.ci
@candleindark I unfortunately don't have time to get this PR to 100%. However, it is almost there (90%). As far as I can tell, the big remaining item is this comment you left regarding the `doi` default value. I believe a default could be supplied in the
Do you think you can take this on?
Sure. I will take it from this point on. Thanks for helping out.
OK, so it seems that just the vendorized CI runs (where we pretend to be running on a specific vendorized instance) are failing, e.g. in the tests. @jjnesbitt, you didn't try to run the tests while having specified an instance "outside" of the environment, right?
…g isort requirement
The use of the `|` operator to express optional types was introduced in Python 3.10. The lowest Python version currently supported by this project is 3.9. Let's delay using `|` for optional types until after Python 3.9 is dropped.
So that it behaves the same as `dandischema.models.DANDI_INSTANCE_URL_PATTERN`, which it is replacing. Incidentally, this commit also renames the local variable `pattern` to `instance_url` to reflect the nature of the assigned value.
Rename the property `published_version_pattern` to `published_version_url_pattern`. The new name is more consistent with the `PUBLISHED_VERSION_URL_PATTERN` constant that existed in `dandischema.models`, which the property is replacing.
Realign the definition of `Config.dandi_doi_pattern` with `dandischema.models.DANDI_DOI_PATTERN`, which it is replacing.
Remove the special handling of importing `dandischema.models` before `set_instance_config()` is called. The whole point of the containing PR is to remove the reliance on import order, so such handling is no longer needed.
```diff
-[(license_.name, license_.value) for license_ in _INSTANCE_CONFIG.licenses],
+[
+    (license_.name, license_.value)
+    for license_ in get_instance_config().licenses
+],
```
This line prevents dandischema from becoming truly dynamic (the purpose of this PR), and there is no way around it as long as `LicenseType` is defined as an `Enum` at the module level. Once this definition of `LicenseType` is executed, changes in the value returned by `get_instance_config()` do not alter the value of `LicenseType`.
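The freezing behavior described above can be demonstrated with a stdlib-only sketch (the `config` dict here is a stand-in for what `get_instance_config()` returns):

```python
from enum import Enum

# Stand-in for the instance config at import time.
config = {"licenses": {"CC0": "spdx:CC0-1.0"}}

# Built once, from the config as it exists *now* (Enum functional API).
LicenseType = Enum("LicenseType", config["licenses"])

# A later config change is not reflected in the already-built Enum.
config["licenses"]["CCBY"] = "spdx:CC-BY-4.0"

print([member.name for member in LicenseType])  # ['CC0']
```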
A way to make `LicenseType` dynamic, as suggested by ChatGPT, is to define it as a custom type with hooks for Pydantic to validate and generate JSON schema, such as the following:

```python
# types.py (or nearby)
from pydantic import GetJsonSchemaHandler
from pydantic.json_schema import JsonSchemaValue
from pydantic_core import core_schema, PydanticCustomError

from dandischema.conf import get_instance_config


def _current_license_values() -> list[str]:
    # your config objects already have .value equal to "scheme:id"
    return [lic.value for lic in get_instance_config().licenses]


class DynamicLicense(str):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source, _handler):
        def validate(v):
            s = str(v)
            allowed = _current_license_values()
            if s not in allowed:
                raise PydanticCustomError(
                    'license_value',
                    'Invalid license: {val}. Allowed: {allowed}',
                    {'val': s, 'allowed': allowed},
                )
            return s

        return core_schema.no_info_after_validator_function(
            validate, core_schema.str_schema()
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, core_schema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        schema = handler(core_schema)
        schema['enum'] = _current_license_values()
        schema['title'] = 'License'
        return schema
```

Though I have yet to test this approach fully, it looks viable to me but very messy and opaque. However, the more crucial question for me is whether we should redefine `LicenseType` as a custom type in order to achieve the goal of this PR, or whether we should just accept the current state of #294, which has a restriction on import order. If we take this approach, and there are other enum classes that need to be made dynamic in the future, they will have to be redefined the same way as well. Should we take this approach, @yarikoptic?
Couldn't this alternative work for immediate needs/use-cases:

- Instead of directly assigning a list here, we come up with a function like `def assign_enums(instance_config)`, which we call here after getting the `instance_config`.
- Inside that function we set all desired enums, like this one, to config-based values.
- In `set_instance_config` we add a flag `assign_enums: bool = False`, and `if assign_enums: import dandischema.models` and call `assign_enums` with that new config.

This way we

- would not have a circular import
- would be able to adjust all those enums (if more than just `LicenseType` would need to be set).
WDYT @jjnesbitt about this situation and how to solve it?
Overall, @candleindark and I feel that the added complexity outweighs the benefit we might get from "dynamic configuration" at the moment, and we would prefer to go with the much simpler original solution of doing the instance setup once at import time (which is what is in the devendorize branch) for now, to avoid all the possible gotchas due to the added complexity of the config life cycle. WDYT?
The other point I want to bring up, after realizing it while handling this `LicenseType` issue, is that in making `models.py` fully dynamic, we have turned the use of `set_instance_config()` into a potential pitfall. One now has to be extremely careful with it: when using `set_instance_config()` to define the models, one has to ensure that it is always called in a function that is executed each time a model entity is evaluated. Any failure to do so will lead to a `models.py` that is only partially dynamic and holds definitions inconsistent with the instance config. Case in point: the error in the current definition of `LicenseType` was overlooked.
The way I can think to solve this problem within the framework of this PR is to change the type of `license` to `str` and define the following field validator for `license`:

```python
@field_validator("license")
@staticmethod
def check_license(value: str) -> str:
    license_values = [x.value for x in get_instance_config().licenses]
    if value not in license_values:
        raise ValueError(f"License {value} not valid")
    return value
```

and then add this in `DandiBaseModel.__get_pydantic_json_schema__`:

```python
if prop == "License":
    value["items"]["enum"] = [
        x.value for x in get_instance_config().licenses
    ]
```

This solution is messy and has its issues, so if you'd rather go with a static import approach, you're free to do so. However, I'll just point out that there's already loads of arbitrary logic in the `__get_pydantic_json_schema__` method, as well as in many other places.
The larger issue is that you're trying to take a 2000-line file filled with pydantic type definitions and make it configurable with minimal changes. As far as I can tell, this is not something that's really done in pydantic. You'd be better off creating a function that returns the properly configured classes at runtime, but that has its own issues.
🤷
Wouldn't it also affect the meditor, so we lose the drop-down and require users to enter that string?
Indeed, the task we are trying to do has a big "footprint", but IMHO it is quite easy to achieve with import-time customization. After we achieve the desired effects of allowing multiple instances and making the backend and frontend configurable with just a few new settings, we will look into overhauling this setup, likely with a switch to LinkML as the source of the model. Potentially, again, just keeping it a singular version customized at import time.
I think that solution wouldn't affect the meditor, since the JSON schema for the field is customized in `DandiBaseModel.__get_pydantic_json_schema__`. However, this is less desirable than the automated JSON schema generation we would get if we were to define `LicenseType` as an `Enum`.
In validation errors of published dandiset
OK, per my comment above, we will postpone the attempt to make this work "properly" via dynamic configuration and stick to import-time configuration. I will close this for now, but it should be kept in mind whenever we re-approach this.
This allows support for setting `Config` dynamically (vendorization, for example), without needing to rely on module import order.

@yarikoptic I believe this removes the need for the `clear_dandischema_modules_and_set_env_vars` conftest function, as config no longer relies on module import, but I'm still looking into that, which is why this is a draft.