
Conversation

@jjnesbitt
Member

This adds support for setting Config dynamically (for vendorization, for example), without needing to rely on module import order.

@yarikoptic I believe this removes the need for the clear_dandischema_modules_and_set_env_vars conftest function, as config no longer relies on module import, but I'm still looking into that, which is why this is draft.
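For context, the import-order-free pattern being described can be sketched roughly as follows. This is a hypothetical simplification, not the actual dandischema.conf code; the single `id_pattern` field and the `DANDISCHEMA_ID_PATTERN` env var name are illustrative assumptions:

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Config:
    # Hypothetical field; falls back to an env var, then a built-in default
    id_pattern: str = field(
        default_factory=lambda: os.environ.get("DANDISCHEMA_ID_PATTERN", "DANDI")
    )


_instance_config: Optional[Config] = None


def set_instance_config(config: Config) -> None:
    """May be called before *or* after importing the models module."""
    global _instance_config
    _instance_config = config


def get_instance_config() -> Config:
    """Resolve the config lazily, at validation time, not at import time."""
    global _instance_config
    if _instance_config is None:
        _instance_config = Config()  # falls back to env vars / defaults
    return _instance_config
```

Because the models would call `get_instance_config()` inside validators rather than reading a module-level constant, the import order of `dandischema.conf` and `dandischema.models` stops mattering.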

@codecov

codecov bot commented Jul 31, 2025

Codecov Report

❌ Patch coverage is 96.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.18%. Comparing base (b78da9e) to head (fbf75b1).

Files with missing lines          Patch %   Lines
dandischema/models.py             93.75%    4 Missing ⚠️
dandischema/conf.py               96.42%    1 Missing ⚠️
dandischema/tests/conftest.py     95.65%    1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (b78da9e) and HEAD (fbf75b1). Click for more details.

HEAD has 6 uploads less than BASE
Flag        BASE (b78da9e)   HEAD (fbf75b1)
unittests   54               48
Additional details and impacted files
@@               Coverage Diff                @@
##           devendorize     #316       +/-   ##
================================================
- Coverage        97.86%   87.18%   -10.69%     
================================================
  Files               18       18               
  Lines             2249     2263       +14     
================================================
- Hits              2201     1973      -228     
- Misses              48      290      +242     
Flag        Coverage Δ
unittests   87.18% <96.00%> (-10.69%) ⬇️


Member

@candleindark candleindark left a comment

At this point, I haven't checked all the details for correctness, but I can see that with this approach, the calling-order restriction of dandischema.conf.set_instance_config() in relation to the import of dandischema.models has indeed been lifted. However, to achieve that, the validations affected by the value of the instance config in dandischema.conf have been moved into custom Pydantic validators, i.e., the field_validators. The validation behaviors of those validators exist only at Python runtime and are not encoded in the corresponding JSON Schema schemas of the models.

I think, in general, we want to move validation out of custom Pydantic validators into validation that can be encoded into JSON Schema schemas. @yarikoptic, your input?

doi: str = Field(
title="DOI",
json_schema_extra={"readOnly": True, "nskey": DANDI_NSKEY},
default="",
Member

We require a doi value when the DANDI DOI pattern is not available. Defaulting to `""` will fail the check_id method below, but the feedback the user gets will be different.

Member Author

So what should default be set to here? Prior to this, it seemed that either pattern was set to DANDI_DOI_PATTERN, or it was set to the default pattern and default was set to "". Should default be set dynamically to match that?

Member

I am actually talking about the default value of the doi field. Without this proposed change, the doi field doesn't have a default when DANDI_DOI_PATTERN is None which means that the doi field is required in that situation. With this proposed change, the doi field always has a default which means the doi field is never required.
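The distinction being made here can be reproduced with a minimal pydantic sketch (illustrative model names, not the real Dandiset classes):

```python
from pydantic import BaseModel, ValidationError


class WithDefault(BaseModel):
    doi: str = ""  # always has a default, so the field is never required


class WithoutDefault(BaseModel):
    doi: str  # no default, so the field is required


# The defaulted model accepts an empty construction...
print(WithDefault().doi)  # falls back to ""

# ...while the default-less model rejects it as missing a required field.
try:
    WithoutDefault()
except ValidationError as e:
    print("doi" in str(e))
```

So always supplying `default=""` silently changes `doi` from "required when no DANDI DOI pattern is configured" to "never required".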

Member Author

Ah I see, that's an oversight on my part.

@jjnesbitt
Member Author

jjnesbitt commented Jul 31, 2025

The validation behaviors of those validators exist only in Python runtime and are not encoded in the corresponding JSON Schema schemas of the models.

Good point. This could be addressed using field_serializer. I can take a stab at that.

I think, in general, we want to move validation out of custom Pydantic validators into validation that can be encoded into JSON

I don't think they're necessarily opposed to one another. The custom validators I added are simply using regex, which can be re-encoded into strings once the model is serialized.
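As a sketch of that re-encoding (a hypothetical standalone model, with the config lookup replaced by a module-level constant), the regex a validator enforces at runtime can be written back into the generated schema via a model-level __get_pydantic_json_schema__ hook, similar to the one dandischema already overrides:

```python
import re

from pydantic import BaseModel, field_validator


ID_PATTERN = r"^(DANDI|dandi):\d{6}$"  # stand-in for a config-derived pattern


class Sketch(BaseModel):
    id: str

    @field_validator("id")
    @classmethod
    def check_id(cls, value: str) -> str:
        # Runtime half: enforce the regex during validation
        if re.match(ID_PATTERN, value) is None:
            raise ValueError(f"ID does not match pattern {ID_PATTERN}")
        return value

    @classmethod
    def __get_pydantic_json_schema__(cls, core_schema, handler):
        # Schema half: re-encode the same regex as a plain "pattern" string
        json_schema = handler(core_schema)
        json_schema = handler.resolve_ref_schema(json_schema)
        json_schema["properties"]["id"]["pattern"] = ID_PATTERN
        return json_schema
```

In the real models the constant would instead be computed from get_instance_config() at schema-generation time, which is what keeps the schema in sync with the runtime validator.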

@candleindark
Member

candleindark commented Jul 31, 2025

The validation behaviors of those validators exist only in Python runtime and are not encoded in the corresponding JSON Schema schemas of the models.

Good point. This could be addressed using field_serializer. I can take a stab at that.

You are right. We can do that, but it will make the eventual transition to LinkML more complex, and from then on we would have to manage the serialization of the elements involved ourselves instead of letting Pydantic do it.

I think, in general, we want to move validation out of custom Pydantic validators into validation that can be encoded into JSON

I don't think they're necessarily opposed to one another. The custom validators I added are simply using regex, which can be re-encoded into strings once the model is serialized.

I like the idea of separating the model from the code, so I am hesitant to move more of the model specs into the code via those custom validators and their respective serializers.

I want to know why we want these customizations in the first place. I see that these changes allow dandischema.conf.set_instance_config() to be called after, as well as before, the import of dandischema.models. Do we need this ability, though? Without these changes, dandischema.models can be initialized correctly as long as the env vars corresponding to the fields of dandischema.conf.Config are set when dandi-archive is launched, and dandischema.models doesn't need to change for the duration of a dandi-archive run, i.e., dandischema.conf.set_instance_config() doesn't need to be called in dandi-archive at all. As for dandi-cli, per a recent discussion with @yarikoptic, we no longer need an instance-specific dandischema.models. Thus, at this point, dandischema.conf.set_instance_config() doesn't need to be called within dandi-cli either.

@yarikoptic
Member

re

I think, in general, we want to move validation out of custom Pydantic validators into validation that can be encoded into JSON Schema schemas. @yarikoptic, your input?

and related

that will make the eventual transitioning to LinkML more complex, and from now on we have to manage the serialization of those involved elements instead of being done by Pydantic.
...
I like the idea of separation of model from code, so I am hesitant in moving more of the model specs to the code by using those custom validators and respective serializers.

Although I am with you on the ultimate desires/design, an immediate target is to provide support for multiple instances with the current setup of pydantic + jsonschema, with a minimal amount of "user visible changes" (i.e. not changing much, if anything, in the "DANDI vendorized schema"). So I would say -- we can go ahead and move to a few more Python validations and serializers for now. It would also help to identify such points better for when we re-approach expressing this in LinkML again.

@yarikoptic
Member

re

As for dandi-cli, per recent discussion with @yarikoptic, we no longer need an instance specific dandischema.models

We should not need it, but did we check, after relaxing all the regexes, that the client works fine?

@candleindark candleindark force-pushed the devendorize branch 2 times, most recently from 261f0f4 to c5f6327 on August 4, 2025 03:37
@yarikoptic
Member

We merged other developments into devendorize (which should also be passing tests now) and now conflicts have come up. @jjnesbitt - would you prefer to update this branch yourself, or would you be OK with @candleindark attempting that? If OK, would you prefer a rebase or a merge?

@jjnesbitt
Member Author

We merged other developments into devendorize (which should also be passing tests now) and now conflicts have come up. @jjnesbitt - would you prefer to update this branch yourself, or would you be OK with @candleindark attempting that? If OK, would you prefer a rebase or a merge?

I'm okay handling conflicts, but are we going forward with this branch? It seemed that @candleindark had major objections.

@candleindark
Member

candleindark commented Aug 8, 2025

We merged other developments into devendorize (which should also be passing tests now) and now conflicts have come up. @jjnesbitt - would you prefer to update this branch yourself, or would you be OK with @candleindark attempting that? If OK, would you prefer a rebase or a merge?

I'm okay handling conflicts, but are we going forward with this branch? It seemed that @candleindark had major objections.

Yes. We are going forward with this. Thanks for bringing up the idea and the PR. After some consideration, we think it's best to avoid the "brittle" setup that depends on the import order.

You can rebase this PR and make all the tests pass, or I can send a PR to this PR, whichever works better for you. Let me know which way you prefer.

Comment on lines +1628 to +1643
    @staticmethod
    def get_id_pattern():
        conf = get_instance_config()
        sub_pattern = conf.id_pattern + "|" + conf.id_pattern.lower()
        pattern = rf"^({sub_pattern}):\d{{6}}(/(draft|\d+\.\d+\.\d+))$"

        return pattern

    @field_validator("id")
    @classmethod
    def check_id(cls, value: str) -> str:
        pattern = cls.get_id_pattern()
        if re.match(pattern, value) is None:
            raise ValueError(f"ID does not match pattern {pattern}")

        return value
Member Author

What I really wanted to do was the following:

    @field_pattern("id")
    @staticmethod
    def get_id_pattern():
        conf = get_instance_config()
        sub_pattern = conf.id_pattern + "|" + conf.id_pattern.lower()
        pattern = rf"^({sub_pattern}):\d{{6}}(/(draft|\d+\.\d+\.\d+))$"

        return pattern

Where field_pattern would be a new decorator that does two things:

  1. Creates a field_validator for the specified fields ("id"), that just runs re.match with the pattern returned from the decorated function get_id_pattern on the supplied fields ("id").
  2. Tags the function/class in a way that allows for pattern to be injected into the rendered schema automatically

I tried to implement this but couldn't find a way to do so. Perhaps in the future this can be updated.
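While not the decorator envisioned above, a partial approximation of the same idea is possible with pydantic's callable form of json_schema_extra, which is evaluated at schema-generation time; a single pattern-producing function can then feed both halves. The names here are illustrative, and the config lookup is replaced by a static pattern:

```python
import re

from pydantic import BaseModel, Field, field_validator


def get_id_pattern() -> str:
    # Stand-in for the config-driven pattern function discussed above
    return r"^(DANDI|dandi):\d{6}$"


class Sketch(BaseModel):
    # Half 2: inject the current pattern into the rendered schema;
    # the lambda runs each time the JSON schema is generated
    id: str = Field(
        json_schema_extra=lambda schema: schema.update(pattern=get_id_pattern())
    )

    # Half 1: enforce the same pattern at validation time
    @field_validator("id")
    @classmethod
    def check_id(cls, value: str) -> str:
        pattern = get_id_pattern()
        if re.match(pattern, value) is None:
            raise ValueError(f"ID does not match pattern {pattern}")
        return value
```

This still requires wiring the two halves by hand per field, rather than the single `field_pattern` decorator the comment describes.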

@jjnesbitt
Member Author

@candleindark I unfortunately don't have time to get this PR to 100%. However, it is almost there (90%). As far as I can tell, the big remaining item is this comment you left regarding the doi default value. I believe a default could be supplied in the check_doi method, but the issue is that other parts of the testing code make use of a statically defined instance config. To properly fix how DOI is handled, I think those constants (and anywhere they're used) need to be updated to use a dynamic instance config.

Do you think you can take this on?

@candleindark
Member

Do you think you can take this on?

Sure. I will take it from this point on. Thanks for helping out.

@yarikoptic
Member

ok, so it seems just the vendorized CI runs (where we pretend to be running on a specific vendorized instance) are failing, e.g. in these tests:

FAILED dandischema/tests/test_metadata.py::test_requirements[obj1-PublishedDandiset-missingfields1]
FAILED dandischema/tests/test_metadata.py::test_requirements[obj2-PublishedDandiset-missingfields2]
FAILED dandischema/tests/test_metadata.py::test_requirements[obj3-PublishedDandiset-missingfields3]
FAILED dandischema/tests/test_models.py::test_dandimeta_1 - assert 5 == 6
FAILED dandischema/tests/test_models.py::test_vendorization[config_dict0-[A-Z][-A-Z]*-10\\.\\d{4,}-valid_vendored_fields0-invalid_vendored_fields0]
FAILED dandischema/tests/test_models.py::test_vendorization[config_dict2-DANDI-10\\.\\d{4,}-valid_vendored_fields2-invalid_vendored_fields2]
FAILED dandischema/tests/test_models.py::test_vendorization[config_dict3-[A-Z][-A-Z]*-10\\.\\d{4,}-valid_vendored_fields3-invalid_vendored_fields3]

@jjnesbitt you didn't try to run the tests while having specified an instance "outside" of the environment, right?

  • The support of the use of the `|` operator to express optional types was introduced in Python 3.10. The lowest supported Python in this project is currently 3.9. Let's delay the use of `|` to express optional types until after dropping Python 3.9.
  • So that it behaves the same as `dandischema.models.DANDI_INSTANCE_URL_PATTERN`, which it is replacing. Incidentally, this commit also renames the local variable `pattern` to `instance_url` to reflect the nature of the assigned value.
  • Rename property `published_version_pattern` to `published_version_url_pattern`. The new name is more consistent with the `PUBLISHED_VERSION_URL_PATTERN` constant that existed in `dandischema.models`, which the property is replacing.
  • Realign the definition of `Config.dandi_doi_pattern` to `dandischema.models.DANDI_DOI_PATTERN`, which it is replacing.
  • Remove the special handling of importing `dandischema.models` before `set_instance_config()` is called. The whole point of the containing PR is to remove the reliance on import order, so such handling will no longer be needed.
[(license_.name, license_.value) for license_ in _INSTANCE_CONFIG.licenses],
[
    (license_.name, license_.value)
    for license_ in get_instance_config().licenses
]
Member

@candleindark candleindark Aug 19, 2025

This line prevents dandischema from becoming truly dynamic, which is the purpose of this PR, and there is no way around it as long as LicenseType is defined as an Enum at the module level. Once this definition of LicenseType has executed, changes in the value returned by get_instance_config() do not alter the value of LicenseType.

A way to make LicenseType dynamic, as suggested by ChatGPT, is to define it as a custom type with hooks for Pydantic to validate and generate JSON schema, such as the following.

# types.py (or nearby)
from pydantic import GetJsonSchemaHandler
from pydantic.json_schema import JsonSchemaValue
from pydantic_core import core_schema, PydanticCustomError
from dandischema.conf import get_instance_config

def _current_license_values() -> list[str]:
    # your config objects already have .value equal to "scheme:id"
    return [lic.value for lic in get_instance_config().licenses]

class DynamicLicense(str):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source, _handler):
        def validate(v):
            s = str(v)
            allowed = _current_license_values()
            if s not in allowed:
                raise PydanticCustomError(
                    'license_value',
                    'Invalid license: {val}. Allowed: {allowed}',
                    {'val': s, 'allowed': allowed},
                )
            return s
        return core_schema.no_info_after_validator_function(
            validate, core_schema.str_schema()
        )

    @classmethod
    def __get_pydantic_json_schema__(cls, core_schema, handler: GetJsonSchemaHandler) -> JsonSchemaValue:
        schema = handler(core_schema)
        schema['enum'] = _current_license_values()
        schema['title'] = 'License'
        return schema

Though I have yet to test this approach fully, it looks viable to me, but very messy and opaque. However, the more crucial question for me is whether we should redefine LicenseType as a custom type in order to achieve the goal of this PR, or whether we should just accept the current state of #294, which has a restriction on import order. If we take this approach, and other enum classes need to be made dynamic in the future, they will have to be redefined in the same way as well. Should we take this approach, @yarikoptic?

Member

Couldn't this alternative work for immediate needs/use-cases:

  • instead of directly assigning a list here, we come up with a function like `def assign_enums(instance_config)`, which we call here after getting the instance_config.
  • inside that function we set all desired enums, like this one, to config-based values
  • in `set_instance_config` we add a flag `assign_enums: bool = False`, and if `assign_enums` is set: import dandischema.models and call `assign_enums` with that new config

This way we

  • would not have circular import
  • would be able to adjust all those enums (if more than just LicenseType would need to be set).

WDYT @jjnesbitt about this situation and how to solve it?

Overall, @candleindark and I feel that the added complexity outweighs the benefit we might get from "dynamic configuration" at the moment, and we would prefer to go with the much simpler original solution of doing the instance setup once at import time (which is what is in the devendorize branch) for now, to avoid all possible gotchas due to the added complexity in the config life cycle. WDYT?

Member

The other point I want to bring up, after realizing it while handling this LicenseType issue, is that in making models.py fully dynamic, we have made the use of set_instance_config() a potential pitfall. One now has to be extremely careful with it: for set_instance_config() to take effect, the instance config must always be read from a function that is executed each time a model entity is evaluated, and any failure to do so will lead to a models.py that is only partially dynamic and holds definitions inconsistent with the instance config. Case in point: the error in the current definition of LicenseType was overlooked.

Member Author

The way I can think of to solve this problem within the framework of this PR is to change the type of license to a str and define the following field validator for license:

    @field_validator("license")
    @classmethod
    def check_license(cls, value: str) -> str:
        license_values = [x.value for x in get_instance_config().licenses]
        if value not in license_values:
            raise ValueError(f"License {value} not valid")

        return value

and then add this in DandiBaseModel.__get_pydantic_json_schema__:

            if prop == "License":
                value["items"]["enum"] = [
                    x.value for x in get_instance_config().licenses
                ]

This solution is messy and has its issues, so if you'd rather go with a static import approach, you're free to do so. However, I'll just point out that there's already loads of arbitrary logic in the __get_pydantic_json_schema__ method, as well as many other places.

The larger issue is that you're trying to take a 2000-line file filled with pydantic type definitions and make it configurable with minimal changes. As far as I can tell, this is not something that's really done in pydantic. You'd be better off creating a function that returns the properly configured classes at runtime, but that has its own issues.

🤷

Member

Wouldn't it also affect the meditor, so we lose the drop-down and require users to enter that string?

Indeed, the task we are trying to do has a big "footprint", but IMHO it is quite easy to achieve with import-time customization. After we achieve the desired effects of allowing multiple instances and making the backend and frontend configurable with just a few new settings, we will look into overhauling this setup, likely with a switch to LinkML as the source of the model. Potentially, again, just keeping it a singular version customized at import time.

Member

I think that solution wouldn't affect the meditor, since the JSON schema for the field is customized in DandiBaseModel.__get_pydantic_json_schema__. However, it is less desirable than the automated JSON schema generation we would get if we defined LicenseType as an Enum.

@yarikoptic
Member

ok, per my comment above, we will postpone the attempt to make it work "properly" via dynamic configuration and stick to import-time configuration. I will close this for now, but it should be kept in mind whenever we reapproach this.

@yarikoptic yarikoptic closed this Aug 22, 2025