Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional schema #104

Closed
wants to merge 8 commits into from
Closed

Add optional schema #104

wants to merge 8 commits into from

Conversation

kazk
Copy link
Contributor

@kazk kazk commented Jun 27, 2021

  • Schema is generated by re-serializing, so any fixups are included.
  • $ref is inlined by calling T::schema() and appending any other values, including Kubernetes extensions.
  • Doesn't depend on schemars

Closes #86

Output of pretty printed Pod::schema(): https://gist.github.com/kazk/e4a56b7ff9431416a1c29a1c602a7c8a

kazk added 2 commits June 26, 2021 18:34
- Schema is generated by re-serializing, so any fixups are included.
- `$ref` is inlined by calling `T::schema()` and overriding any
  description if specified.
@Arnavion
Copy link
Owner

The "Files changed" tab lags both Firefox and Chromium to death, so this is going to be a great PR to review :D

writeln!(writer)?;
writeln!(writer, r#"#[cfg(feature = "schema")]"#)?;
writeln!(writer, r#"impl {type_name} {{"#, type_name = type_name)?;
writeln!(writer, r" pub fn schema() -> serde_json::Value {{")?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to implement the thing for expanding $refs anyway, is there a reason to return serde_json::Value here instead of schemars::JsonSchema ?

Also, that would allow us to emit code that directly creates a schemars::SchemaObject, instead of round-tripping through serde_json::json!. Then the user doesn't have to compile every invocation of that proc macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean, why not impl schemars::JsonSchema?

  1. Making k8s_openapi depend on schemars felt awkward to me. I like how it doesn't restrict users to a specific implementation. kube currently uses schemars, but there are some missing features that might not make sense to add to a general JSON schema library, so this might change in the future. By keeping the schema serde_json::Value, it can be also useful for other use cases (as you wrote in Interop with schemars crate - impl schemars::JsonSchema for resource types #86 (comment)). Also, it should be pretty easy to make use of this information for schemars.
  2. impl schemars::JsonSchema seems like a lot of work if we want to avoid serde_json::json!. Methods used in derived JsonSchema to help are private too. Maybe I'm missing something, though. I haven't manually implemented JsonSchema for complex types. We'll need to generate code to build SchemaObject from swagger20::Schema. For each $ref, we'll need to do <T as schemars::JsonSchema>::add_schema_as_property, but this is not part of the public API.

Copy link
Owner

@Arnavion Arnavion Jun 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If #[derive(schemars::JsonSchema] can do it, then it's public API. Unless it's #[doc(hidden)] because the maintainer doesn't want other people to use it. Looking at what #[derives(schemars::JsonSchema)] expands to should answer that.

But the point about being agnostic to the crate is taken. How does a user who does want to use #[derive(schemars::JsonSchema)] on their type that contains k8s_openapi types use this serde_json::Value ?

Copy link
Contributor Author

@kazk kazk Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's #[doc(hidden)] in the latest published version, and it was then removed (GREsau/schemars@55b8604#diff-88895b0c150fd6f777b3df6591cbb00c7b88b100f8d99b769babe68315fd2cd0).

How does a user who does want to use #[derive(schemars::JsonSchema)] on their type that contains k8s_openapi types use this serde_json::Value ?

The easiest is by using #[schemars(schema_with = "schema_fn")], which is already necessary when adding validations and Kubernetes extensions. The original example from #86 is possible with:

#[derive(CustomResource, Debug, Clone, Deserialize, Serialize, JsonSchema)]
#[kube(group = "example.com", version = "v1", kind = "Example")]
#[kube(shortname = "ex", namespaced)]
pub struct ExampleSpec {
    #[schemars(schema_with = "optional_pvc_spec_schema")]
    pub pvcspec: Option<PersistentVolumeClaimSpec>,
}

fn optional_pvc_spec_schema(_: &mut schemars::gen::SchemaGenerator) -> schemars::schema::Schema {
    let mut schema = PersistentVolumeClaimSpec::schema();
    // for `Option`
    schema
        .as_object_mut()
        .expect("schema object")
        .insert("nullable".to_owned(), serde_json::json!(true));
    serde_json::from_value(schema).unwrap()
}

(UPDATE: can ignore the rest of this comment, and see the next one)

If we add k8s_openapi::Schema trait, we can do something like this too:

#[derive(CustomResource, Debug, Clone, Deserialize, Serialize, JsonSchema)]
#[kube(group = "example.com", version = "v1", kind = "Example")]
#[kube(shortname = "ex", namespaced)]
pub struct ExampleSpec {
    #[schemars(with = "Nullable<PersistentVolumeClaimSpec>")]
    pub pvcspec: Option<PersistentVolumeClaimSpec>,
}

with

struct Nullable<'a, T> {
    phantom: std::marker::PhantomData<&'a T>,
}

impl<T> schemars::JsonSchema for Nullable<'_, T>
where
    T: k8s_openapi::Schema,
{
    fn schema_name() -> String {
        // Doesn't guarantee uniqueness, but it doesn't matter for us because
        // we're not using `definitions`.
        std::any::type_name::<T>().to_owned()
    }

    fn json_schema(_: &mut schemars::gen::SchemaGenerator) -> schemars::schema::Schema {
        let mut schema = T::schema();
        schema
            .as_object_mut()
            .expect("schema object")
            .insert("nullable".to_owned(), serde_json::json!(true));
        serde_json::from_value(schema).unwrap()
    }
}

Copy link
Contributor Author

@kazk kazk Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the trait because it can be just:

pub struct ExampleSpec {
    #[schemars(with = "Option<SchemaValue<PersistentVolumeClaimSpec>>")]
    pub pvcspec: Option<PersistentVolumeClaimSpec>,
}

with

struct SchemaValue<'a, T> {
    phantom: std::marker::PhantomData<&'a T>,
}

impl<T> schemars::JsonSchema for SchemaValue<'_, T>
where
    T: k8s_openapi::Schema,
{
    fn schema_name() -> String {
        std::any::type_name::<T>().to_owned()
    }

    fn json_schema(_: &mut schemars::gen::SchemaGenerator) -> schemars::schema::Schema {
        serde_json::from_value(T::schema()).unwrap()
    }
}

k8s-openapi-codegen-common/src/swagger20/definitions.rs Outdated Show resolved Hide resolved
k8s-openapi-codegen-common/src/lib.rs Outdated Show resolved Hide resolved
@Arnavion
Copy link
Owner

I know this has been blocked on me for some time, but I've been busy and will probably be busy for another two weeks due to IRL stuff. Sorry.

@Arnavion
Copy link
Owner

Okay, sorry for the delay. I should have time more consistently now.

I'll continue the discussion from #104 (comment) here so that it's easier to find.


You said:

Methods used in derived JsonSchema to help are private too. Maybe I'm missing something, though. I haven't manually implemented JsonSchema for complex types. We'll need to generate code to build SchemaObject from swagger20::Schema. For each $ref, we'll need to do <T as schemars::JsonSchema>::add_schema_as_property, but this is not part of the public API.

But AFAICT this isn't a problem. JsonSchema::json_schema needs to return a Schema, specifically a Schema::Object which contains a SchemaObject, for which all the fields we would set are public.

And specifically, for $refs you'd create schemars::schema::SchemaObject { reference: Some("io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta".to_owned()), ..Default::default() }


Anyway, going through your change, I remembered why I suggested implementing schemars::JsonSchema instead of returning a serde_json::Value:

https://github.com/kazk/k8s-openapi/blob/c624cce2148ace17a5728b6edc9c4c595dc819fb/src/v1_21/apiextensions_apiserver/pkg/apis/apiextensions/v1/json_schema_props.rs#L725-L741

Each of these is an infinite recursion. Currently this is only a problem for JsonSchemaProps (which is why there's also a circular-reference workaround for this type in another place), but in the future there might be more.

Impling schemars::JsonSchema allows us to use gen.subschema_for for the field types, which solves the circular reference problem.

If we want something generic that isn't tied to schemars but nevertheless leaves refs unflattened to allow recursive types, we can have our custom traits to have an agnostic equivalent of subschema_for:

trait SchemaGenerator {
    // Unlike schemars::gen::SchemaGenerator::subschema_for, this doesn't need to return a value.
    fn visit_subschema<T: Schema>(&mut self);
}

impl Schema for Pod {
    fn schema(gen: &mut SchemaGenerator) -> (&'static str, serde_json::Value, BTreeMap<&'static str, fn() -> serde_json::Value>) {
        let schema_name = "io.k8s.api.core.v1.Pod";

        gen.visit_subschema::<crate::apimachinery::pkg::apis::meta::v1::ObjectMeta>();

        let schema = serde_json! {{
            "properties": {
                "metadata": { "$ref": "io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta" }
            },
        }};
    }
}

But even with this, the impl of k8s_openapi::SchemaGenerator for schemars::gen::SchemaGenerator has to come from k8s_openapi itself (because of orphan rules), so k8s_openapi will end up having a dep on schemars anyway.

@kazk
Copy link
Contributor Author

kazk commented Jul 25, 2021

Okay, sorry for the delay. I should have time more consistently now.

No problem, thanks for reviewing.

JsonSchema::json_schema needs to return a Schema, specifically a Schema::Object which contains a SchemaObject, for which all the fields we would set are public.

And specifically, for $refs you'd create schemars::schema::SchemaObject { reference: Some("io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta".to_owned()), ..Default::default() }

Can that SchemaObject resolve those $refs? I didn't think so, but haven't looked into it much. My comment was based on looking at the output of #[derive(JsonSchema)] for structs with JsonSchema impled fields ($refs), and finding uses of #[doc(hidden)] methods.


Yeah, the circular reference in JSONSchemaProps* is a problem.

Impling schemars::JsonSchema allows us to use gen.subschema_for for the field types, which solves the circular reference problem.

I originally thought infinite recursion is still a problem even if we use gen.subschema_for because the schema cannot have $refs. And with this restriction, calling T::schema() is basically the same as gen.subschema_for::<T>(). But I noticed inline_subschemas documentation has "Some references may still be generated in schemas for recursive types.", so I guess it does prevent infinite recursion (the schema will be invalid for us, though).
For the use case of generating a schema for CRD, circular references should never be a problem because it's impossible to describe with the limited schema.

If we go with schemars route, it'll prevent infinite recursion and produce invalid schema for us instead when that happens. If what you described works, then it might be worth the effort. We should change the feature to schemars.

If it's not as easy to do, maybe replace self referencing fields in JSONSchemaProps* (and anything else in the future) with {} (any type)? Or skip them entirely? Not great, but I don't think anyone is interested in supporting them anyway because the main use case is CRD schema.

@Arnavion
Copy link
Owner

and finding uses of #[doc(hidden)] methods.

Right, they're #[doc(hidden)] but they're just convenience wrappers for the most part. Essentially the flow is:

https://github.com/GREsau/schemars/blob/master/schemars_derive/src/lib.rs#L121

-> https://github.com/GREsau/schemars/blob/master/schemars_derive/src/schema_exprs.rs#L12

-> https://github.com/GREsau/schemars/blob/master/schemars_derive/src/schema_exprs.rs#L436

-> https://github.com/GREsau/schemars/blob/master/schemars/src/_private.rs#L29

-> https://github.com/GREsau/schemars/blob/master/schemars/src/_private.rs#L36

... and for our codegen we can just do the same expansion ourselves.


I originally thought infinite recursion is still a problem even if we use gen.subschema_for because the schema cannot have $refs.

The way it works with schemars is:

impl JsonSchema for Metadata {
    fn json_schema(gen) -> Schema {
        // (1)
        Schema::Object(SchemaObject {
            object: Some(...),
        })
    }
}

impl JsonSchema for Pod {
    fn json_schema(gen) -> Schema {
        Schema::Object(SchemaObject {
            object: Some(ObjectValidation {
                properties: {
                    "metadata": gen.subschema_for("io...Metadata"), // (2)
                },
            }),
        })
    }
}

(1) is the actual schema of Metadata, and as part of (2) gen.subschema_for() does get this actual Schema and persist it in its internal list of subschemas, but the Schema it returns from that function is a synthesized one that is a Schema::Object(SchemaObject { reference: Some("io...Metadata") }) instead. So the Schema returned by Pod::json_schema does contain $refs, but that's fine, because the targets of the refs have been registered with the generator inside subschema_for.


If we go with schemars route, it'll prevent infinite recursion and produce invalid schema for us instead when that happens.

Can you clarify why it would be invalid? Do you mean that you'd consider a schema with $refs to be invalid? Why is that?

@kazk
Copy link
Contributor Author

kazk commented Jul 25, 2021

Thanks for expanding on schemars. I'll look into it. They seem to be considering making #[doc(hidden)] in methods in JsonSchema public, so that helps (https://github.com/GREsau/schemars/blob/55b860428e4526ef34886cb877af58256de6b046/schemars/src/lib.rs#L364-L374).

Do you mean that you'd consider a schema with $refs to be invalid? Why is that?

It's a valid JSON Schema/Open API, but it's invalid for our use case because $ref is not allowed in CRD schema. kubernetes/kubernetes#62872
We use inline_subschemas when generating.

@Arnavion
Copy link
Owner

inline_subschemas will still work for your use case, then. It's the SchemaGenerator's job to handle that.

@kazk
Copy link
Contributor Author

kazk commented Jul 26, 2021

inline_subschemas will still work for your use case, then.

No, I knew that. I meant the recursive types won't be supported (for our use case) either way. The current version has infinite recursion, and schemars version will have $ref.

@kazk
Copy link
Contributor Author

kazk commented Jul 26, 2021

For Pod, the generated code should look like the following:

impl crate::schemars::JsonSchema for Pod {
    fn schema_name() -> String {
        "io.k8s.api.core.v1.Pod".to_owned()
    }

    fn json_schema(__gen: &mut crate::schemars::gen::SchemaGenerator) -> crate::schemars::schema::Schema {
        crate::schemars::schema::Schema::Object(crate::schemars::schema::SchemaObject {
            metadata: Some(Box::new(crate::schemars::schema::Metadata {
                description: Some("Pod is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.".to_owned()),
                ..Default::default()
            })),
            object: Some(Box::new(crate::schemars::schema::ObjectValidation {
                properties: std::array::IntoIter::new([
                    (
                        "apiVersion".to_owned(),
                        crate::schemars::schema::Schema::Object(crate::schemars::schema::SchemaObject {
                            metadata: Some(Box::new(crate::schemars::schema::Metadata {
                                description: Some("APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources".to_owned()),
                                ..Default::default()
                            })),
                            instance_type: Some(std::convert::Into::into(crate::schemars::schema::InstanceType::String)),
                            ..Default::default()
                        }),
                    ),
                    (
                        "kind".to_owned(),
                        crate::schemars::schema::Schema::Object(crate::schemars::schema::SchemaObject {
                            metadata: Some(Box::new(crate::schemars::schema::Metadata {
                                description: Some("Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds".to_owned()),
                                ..Default::default()
                            })),
                            instance_type: Some(std::convert::Into::into(crate::schemars::schema::InstanceType::String)),
                            ..Default::default()
                        }),
                    ),
                    (
                        "metadata".to_owned(),
                        __gen.subschema_for::<crate::apimachinery::pkg::apis::meta::v1::ObjectMeta>(),
                    ),
                    (
                        "spec".to_owned(),
                        __gen.subschema_for::<crate::api::core::v1::PodSpec>(),
                    ),
                    (
                        "status".to_owned(),
                        __gen.subschema_for::<crate::api::core::v1::PodStatus>(),
                    ),
                ]).collect(),
                required: std::array::IntoIter::new([
                    "metadata",
                ]).map(std::borrow::ToOwned::to_owned).collect(),
                ..Default::default()
            })),
            ..Default::default()
        })
    }
}
JSON
{
  "type": "object",
  "description": "Pod is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.",
  "properties": {
    "apiVersion": {
      "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources",
      "type": "string"
    },
    "kind": {
      "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds",
      "type": "string"
    },
    "metadata": {
      "$ref": "#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta",
      "description": "Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata"
    },
    "spec": {
      "$ref": "#/definitions/io.k8s.api.core.v1.PodSpec",
      "description": "Specification of the desired behavior of the pod. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status"
    },
    "status": {
      "$ref": "#/definitions/io.k8s.api.core.v1.PodStatus",
      "description": "Most recently observed status of the pod. This data may not be up to date. Populated by the system. Read-only. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status"
    }
  }
}

gen.subschema_for::<T>() should work, but we'll lose the overrides (description and extensions) set in the reference object in the upstream spec. To be fair, that's the correct behavior.

When an object contains a $ref property, the object is considered a reference, not a schema. Therefore, any other properties you put there will not be treated as JSON Schema keywords and will be ignored by the validator.

https://json-schema.org/understanding-json-schema/structuring.html#ref

This object cannot be extended with additional properties and any properties added SHALL be ignored.
https://swagger.io/specification/#reference-object

@Arnavion
Copy link
Owner

I started a thing here if you want to build off of it. k8s-openapi-codegen-common/src/templates/impl_schema.rs needs to be filled out to handle all the SchemaKinds.

@kazk
Copy link
Contributor Author

kazk commented Jul 28, 2021

I got it mostly working locally. Output of JSONSchemaProps with inline_subschemas: https://gist.github.com/kazk/0de1996d49018d53557dab4b22b0c5f7 (some types are still missing)

@kazk
Copy link
Contributor Author

kazk commented Jul 28, 2021

I was able to override description for $refs (e.g., fixes port field having the description of IntOrString). PodSpec output: https://gist.github.com/kazk/33bb9f9b22da93349372748ae84a1ede

Extensions can be added similarly, so there's no downsides now. I'll open a new PR later.

@kazk kazk mentioned this pull request Jul 29, 2021
1 task
@kazk
Copy link
Contributor Author

kazk commented Jul 29, 2021

Superseded by #105.

@kazk kazk closed this Jul 29, 2021
@kazk kazk deleted the add-schema branch August 9, 2021 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interop with schemars crate - impl schemars::JsonSchema for resource types
2 participants