-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec: Initial OpenAPI Spec for a Rest Catalog #3770
Conversation
4ee9289
to
15ffde3
Compare
15ffde3
to
1d659ba
Compare
rest_docs/rest-catalog-open-api.yaml
Outdated
description: Wrapper for the response of a successful request | ||
example: { "data": { ... } } | ||
|
||
ResponseErrorObject: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these responses go under responses
? Is there a distinction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There really isn't much of a distinction. It's possible to define a bit more things from within the schema
section, but I'll check if I actually am using any of those anymore or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So TIL it seems that originally there was just schemas
. So it has some of the most support for JSON schema and things (like pattern checks that can be added as javax annotations etc).
Other things are newer and less feature rich. I've seen several tutorials with basic error classes, and they've mostly done it via schemas
. I'm not sure if that's just to lessen the overhead for the article, but since we're getting rid of the data
wrapper, I'm going to keep the error
wrapper in schemas for now.
Keeping the ResponseErrorObject
in schemas allows it to have much more control over what is and is not excepted, what's required to be defined, etc.
I'm open to moving it, but for now it seems like just getting rid of the data
wrapper and the top level wrapper and keeping ResponseErrorObject
would be good.
|
||
components: | ||
####################################################### | ||
# Common Parameter Definitions Used In Several Routes # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have this deduplication!
rest_docs/rest-catalog-open-api.yaml
Outdated
description: whether the underlying data was purged or is being purged | ||
|
||
GetNamespaceResponse: | ||
description: Returns a namespace and the properties stored on it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably mention that if properties are not supported the properties are optional? Or maybe they must be sent but can be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I have it currently, on the response, it's defaulted to { }
. As within the Iceberg code, we don't really have the concept of a Namespace without properties. It's just implicit in some places that it's not supported.
I'll throw the words optional or when supported on there though to be more precise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. I also indicated that namespaces
is a required field, as well as gave a default value of an empty object to properties
and marked is as the equivalent of @Nullable
.
When the properties
field is null, it indicates the server doesn't support namespace properties. If it returns an empty object, then there are simply no properties presently stored on this namesapce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to keep that way of distinguishing null
vs {}
, especially as there's no way to describe it in the OpenAPI document that I'm aware of, but it seemed like the easiest way to indicate truly empty vs not supported (and thus empty).
delete: | ||
tags: | ||
- Catalog API | ||
summary: Drop a namespace from the catalog. Namespace must be empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we may want to add a cascade option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I hesitated to add this in (especially as I've been working on adding cascade support), but it is on the interface definition (as indicated by the @throws
section of the javadoc):
iceberg/api/src/main/java/org/apache/iceberg/catalog/SupportsNamespaces.java
Lines 98 to 105 in c5d3d1c
/** | |
* Drop a namespace. If the namespace exists and was dropped, this will return true. | |
* | |
* @param namespace a namespace. {@link Namespace} | |
* @return true if the namespace was dropped, false otherwise. | |
* @throws NamespaceNotEmptyException If the namespace is not empty | |
*/ | |
boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we wanted to avoid boolean parameters on the catalog API where we could. The problem here is that it would be really inefficient to use the rest of this API to do a cascading drop because the client would need potentially hundreds of requests to drop each table and namespace individually, and the requests have relatively high latency. We may want to add a cascade flag to the catalog API as well, or a new method (dropNamespaceCascade
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good to me. I think that one of the main things to do is add consistency since there are a lot of differences that make it hard to consume as the yaml spec. I think that part of the motivation is to not have a huge number of schemas and response types so this is embedding them in some of the routes. But that ends up being a little confusing overall.
4871563
to
2f27baa
Compare
2b48812
to
7ef36d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great there's a spec, had a few small questions , from a first glance.
$ref: '#/components/responses/DropTableResponse' | ||
example: { "dropped": true, "purged": false } | ||
202: | ||
description: Accepted - for use if purgeRequested is implemented as an asynchronous action. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, any way to check if async action finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. At present we've left it out for simplicity (as there's already been a lot of discussion around just this), so it will be include it in a follow up PR, but there can be a callback URL that a client can subscribe to to get notification messages etc.
401: | ||
description: Unauthorized | ||
404: | ||
description: Not Found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below we have NoSuchTable and NoSuchNamespace, curious, should we follow it? (Sorry I guess it was already commented but I didn't follow the resolution)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's ok. Here, where there's just a response code and a description, in OpenAPI terms that means that there is no response body.
I've done it here for HEAD
requests, which aren't supposed to return a response body. However, several well-known systems such as S3 still do have a response body with HEAD requests, so we might allow for that anyway.
Check if a table exists within a given namespace. This request does not return a response body. | ||
responses: | ||
200: | ||
description: OK - Table Exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rymurr and @jackye1995, what do you think about adding a body to the HEAD responses? I think it would be nice to do for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case, I feel it's better to use POST /v1/tables/head
to express such operation, otherwise I am not sure what is going to happen for an open-api generated client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to use a GET
request to /v1/namespaces/{namespace}/tables/{table}/exists
instead?
But I don't think that's worth it. Let's just leave this as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine as-is for now. But a lot of libraries seem to allow bodies for HEAD requests. We should consider it in the future though. I know there are "the standard(s)" but quite a number of well used libraries do seem to deviate from them now and then for practical purposes.
rest_docs/rest-catalog-open-api.yaml
Outdated
examples: | ||
NoSuchNamespaceExample: | ||
$ref: '#/components/examples/NoSuchNamespaceError' | ||
head: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we actually want this. It makes sense to have a HEAD route for tables since loading them can be expensive (sending the full metadata JSON). But here you may as well run the GET route because it's simple to look up the database params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we keep it for now but consider dropping it. Possibly we can mark it as optional somehow in a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, since we are not going to add bodies to the HEAD responses, I think that we should remove this. There isn't a significant benefit that I can think of. We can always add it later if someone wants it.
I'm +1 for committing this. I think it's a great start on the REST catalog spec and that the namespace basics are ready to go. |
rest_docs/rest-catalog-open-api.yaml
Outdated
description: Unauthorized | ||
404: | ||
description: Not Found | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I realize some places have extra lines for each HTTP verb section, and some don't, just make sure it's consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for consistency. I like the empty line between verbs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated to be consistent with empty lines between verbs, including an empty line after the path and parameters and the first verb.
Check if a table exists within a given namespace. This request does not return a response body. | ||
responses: | ||
200: | ||
description: OK - Table Exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case, I feel it's better to use POST /v1/tables/head
to express such operation, otherwise I am not sure what is going to happen for an open-api generated client.
NoSuchNamespaceExample: | ||
$ref: '#/components/examples/NoSuchNamespaceError' | ||
|
||
/v1/namespaces/{namespace}/properties: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed the discussion last time, but I still don't understand why we cannot use PUT /v1/namespaces/{namespace}
for this. I would assume if we have some special properties for namespace in the future, we still want to have the ability to update that anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My takeaway from the discussion was that PUT
is typically used to set the entire state and should be idempotent. If we were specifically setting the properties to a specific set, PUT
would be appropriate. Since we are sending changes and not the complete state, POST
is better so that it is more clear what is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having this new PR out! I have mostly nitpicking comments, but overall this looks like a very great start!
The big questions I have are:
- namespace property, can we just use
PUT namespace
for that /v1/tables/rename
, can we use/v1/table-ops/rename
or something else to make the path more self-explanatory- I think we do need 5xx and 400 error definitions in the spec based on RFC 7231, and we can simplify the error response using a universal error schema.
404: | ||
description: Not Found | ||
|
||
/v1/tables/rename: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can use a different word from tables
for action paths, because tables
already means a table resource in the REST-based path. Using another word like table-ops
might be more self-explanatory for the purpose of the route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see a downside to using tables
here. I don't think it is confusing to have some operations under tables
and still use a namespace path for individual table operations.
Thanks for your review @jackye1995
My one concern with using an unqualified
I'm presently against using anything with Can we can mentally mark it as potentially re-namable to I also feel that
Yes we do. I have seen them with exactly |
|
a4a3f5c
to
d1f3e23
Compare
Sure. In that case the only big thing I see currently missing is the 400 response, I will approve when that is added. Some pending discussions in my mind after this PR are:
|
Thanks for the pending thoughts @jackye1995! I'll respond a bit, and then we'll keep these for later discussion as well (assuming there will be more discussion groups / follow up PRs etc)
Agreed.
So currently, there's no set of fields in the response for HEAD requests. That's what the response code and just the string
I had brought up pagination for one or two routes, but it was deemed as somewhat premature. I think eventually we will definitely need some kind of pagination for sure.
Will be coming in a follow up.
That one is not currently implemented, but I agree. Once we finish implementing it, it should be easy to add in. We can start considering where to put it in the API here sooner. Probably as a query parameter I would think. |
Yes I mean to have another API with a |
7278319
to
118fc52
Compare
Another thing I only recently discovered is that many of these things can be split across several files. A CLI tool is then used to bundle them together if need be, but it might be good to consider that in the future to make consuming an ever-growing document a lot simpler. Not going to do that for now as that's probably something to discuss (it does add complexity as well for the reader), but it might be something we can find a trade off for. |
Ahh that makes sense. I'm still not personally opposed to having a response body in a HEAD request, as users can likely ignore it given that it's a HEAD request. And given that there are a number of well used APIs that do have them. But for now I think it's probably best to put that into the backlog as several people have had strong-ish opinions about it. The "backlog" meaning the work I'll be returning to more or less right after this PR as well as whatever side work I also have going 😄 |
@jackye1995, it looks like we're mostly in agreement now:
For 1, I think we're just trying to decide on the path for the For 2, I think we should remove the HEAD request for namespaces and leave the tables HEAD request without response bodies. We're probably agreed? 3, 4, and 5 are not strictly required so I like the idea of adding them in follow-ups in parallel. That way we can start iterating on other areas with this as the basis. |
I'm in agreement on 1, that I've gone ahead and removed the HEAD request for namespaces. We can always add it back in in the future if need be. I agree that returning the whole table object for an existence check might be relatively expensive, and so it makes sense to keep the HEAD check for tables. I'm still not opposed to adding response bodies to the HEAD requests eventually, but that can be a topic for another day if need be. And then agreed on 3,4,5. The sooner we can get the initial document in, the sooner we can make smaller changes that are more digestible and can also work more in parallel 😄 |
There's a failing Hive test that by definition can't be related to this PR (as this PR only adds an initial YAML document). |
Hey all, I just got back today. Will review this asap tomorrow morning. Hope I am not holding anyone up? |
Should be fine. Thanks for taking a look, @rymurr! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, thanks for all the work!
df8cf5f
to
ce8c924
Compare
@rymurr, any comments? We'd like to start working on the follow ups if you have a chance to take a look. |
cc @nastra as well @rymurr in case you have any final comments on the doc as it is. I know it's a lot to ingest, but I think if we can get this initial doc in, then we can hopefully make things smaller and begin to follow a more consistent pattern that we'll adapt to as we all work more with OpenAPI. Please keep in mind this is just the first of many patches and things to get this off the ground, so that work can begin somewhat more parallelized. Also, once we're all working on this API, if things need to be changed for practical reasons etc, I'm not opposed to that. I think those things will become more apparent as we make more progress. 😄 |
…for table resource for now. Also update the list namespaces example description.
ce8c924
to
e09e509
Compare
Please don't wait for me. I will get to it next week and we can do a follow
up if I raise anything.
…On Thu, Jan 6, 2022 at 6:21 AM Kyle Bendickson ***@***.***> wrote:
@rymurr <https://github.com/rymurr>, any comments? We'd like to start
working on the follow ups if you have a chance to take a look.
cc @nastra <https://github.com/nastra> as well @rymurr
<https://github.com/rymurr> in case you have any final comments on the
doc as it is. I know it's a lot to ingest, but I think if we can get this
initial doc in, then we can hopefully make things smaller and begin to
follow a more consistent pattern that we'll adapt to as we all work more
with OpenAPI.
Please keep in mind this is just the first of many patches and things to
get this off the ground, so that work can begin somewhat more parallelized.
Also, once we're all working on this API, if things need to be changed for
practical reasons etc, I'm not opposed to that. I think those things will
become more apparent as we make more progress. 😄
—
Reply to this email directly, view it on GitHub
<#3770 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPNXIL4GEPQKYMPM3KLX53UUURGTANCNFSM5KJ7LIWA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I merged this and we can pick up changes from @rymurr's review as we extend this. Thanks @jackye1995 and everyone on the original PR for all the input, and especially to @kbendick for getting this ready! |
This is a follow up to the original PR, #3561, as the discussion there got too large.
We mostly handle namespace related concerns here. Table related objects will come in a follow up PR.
I have followed up on comments and suggestions that were brought up on the other PR, as well as some conclusions that we've come to in the meeting.
The easiest way to view this document is by importing the YAML directly into editor.swagger.io