Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/versions header Content-Type value should be granularized according to RFC requirements in validator #669

Closed
CasperWA opened this issue Jan 11, 2021 · 10 comments · Fixed by #670
Assignees
Labels
bug Something isn't working priority/medium Issue or PR with a consensus of medium priority validator Related to the OPTIMADE validator

Comments

@CasperWA
Copy link
Member

Section 5.2 only specifies that what is returned is a subset of the MIME type described by text/csv; header=present, I don't see anything I would specifically read as a formal requirement on the content-type parameter in the http headers. The CSV RFC does have a lowercase "should" when discussing the header, and other parts seem to suggest the content-type parameter is optional.

Originally posted by @rartino in Materials-Consortia/providers#58 (comment)

The validation check for the header value for Content-Type for the /versions-endpoint seems to not be on a MUST level and should be downgraded in the validator.

@CasperWA CasperWA added priority/medium Issue or PR with a consensus of medium priority validator Related to the OPTIMADE validator labels Jan 11, 2021
@ml-evs
Copy link
Member

ml-evs commented Jan 11, 2021

I would say that the text/csv part is a MUST though right? Otherwise browsers won't even know what to do with it. The header could be optional, sure, though automated scrapers will prefer it to be there, I'm sure

@CasperWA
Copy link
Member Author

CasperWA commented Jan 11, 2021

I would say that the text/csv part is a MUST though right? Otherwise browsers won't even know what to do with it. The header could be optional, sure, though automated scrapers will prefer it to be there, I'm sure

I don't know that it makes sense. If it's just returning a 404 Not Found, the Content-Type could be anything, also text/plain or anything, right?
But for the actual working endpoint, yes. Mandatory I would also say. What say you @rartino? :)

ml-evs added a commit that referenced this issue Jan 11, 2021
- Make checks on content-type for '/versions' optional (closes #669)
- Report correct base URL on /versions validation failure (closes #668)
@ml-evs
Copy link
Member

ml-evs commented Jan 11, 2021

But for the actual working endpoint, yes. Mandatory I would also say. What say you @rartino? :)

Well, that's the bit that is now optional...

@ml-evs
Copy link
Member

ml-evs commented Jan 11, 2021

The 404 was a red herring, it was always successfully ignoring it (as reported by the log lines following the error). The misleading bit was that the line above showed the wrong request URL (as it is a bit of a fudge to go up one level in base URL inside the validator).

If we want content-type to check text/csv properly on real endpoints, I'll have to update #670 to what I had previously.

@CasperWA
Copy link
Member Author

The 404 was a red herring, it was always successfully ignoring it (as reported by the log lines following the error). The misleading bit was that the line above showed the wrong request URL (as it is a bit of a fudge to go up one level in base URL inside the validator).

If we want content-type to check text/csv properly on real endpoints, I'll have to update #670 to what I had previously.

Hmm. I think we should check that the content type is indeed text/csv, take it as mandatory, but all other content type values are optional? Isn't this also the definition in the RFC?

@ml-evs
Copy link
Member

ml-evs commented Jan 11, 2021

Yeah, I agree, I guess I was misinterpreting your comment at the top of the issue:

The validation check for the header value for Content-Type for the /versions-endpoint seems to not be on a MUST level and should be downgraded in the validator.

where "header" in your comment refers to header=present and not the response header generally 🙃

@CasperWA
Copy link
Member Author

Something like that... 😅 But yeah. Misleading title - will fix.

@CasperWA CasperWA changed the title /versions header Content-Type is OPTIONAL/SHOULD /versions header Content-Type value should be granularized according to RFC requirements Jan 11, 2021
@CasperWA CasperWA changed the title /versions header Content-Type value should be granularized according to RFC requirements /versions header Content-Type value should be granularized according to RFC requirements in validator Jan 11, 2021
ml-evs added a commit that referenced this issue Jan 11, 2021
- Make checks on header parameter in content-type for '/versions' optional (closes #669)
- Report correct base URL on /versions validation failure (closes #668)
@rartino
Copy link
Contributor

rartino commented Jan 12, 2021

 I would say that the text/csv part is a MUST though right? Otherwise browsers won't even know what to do with it. The header could be optional, sure, though automated scrapers will prefer it to be there, I'm sure

But for the actual working endpoint, yes. Mandatory I would also say. What say you @rartino? :)

I'm not sure for what use-case we really need to worry about scrapers. If you think you are connecting to an OPTIMADE endpoint, you know exactly what format you expect from /versions, regardless of the headers.

Lets try to be very formal about the requirements.

For all other endpoints, our header requirements are inherited from the JSON Api specification (MUST be Content-Type: application/vnd.api+json, except for index meta-dbs where we explicitly lift this to SHOULD to allow certain static hosting setups.) There are no corresponding statements about the /versions endpoint.

However, one may argue that OPTIMADE must adhere to the HTTP protocol (e.g., section 2 under "Response format"), so as far as I can see, the HTTP RFC is the only thing that formally sets these requirements, and it sets the presence of content-type on SHOULD level.

Of course, this doesn't exactly specify whether, IF IT IS INCLUDED, the header MUST be text/csv. I cannot really find a strict expression of "how bad" it is to mislabel HTTP content, but it is a stretch to read the formulations in the HTTP RFC as "allowing this". Nonetheless, it is my impression that we "in the wild" often allow formats that are more strict to be embedded in a more general media-type. In fact, we already allow such "upconversion" of the media-type in the OPTIMADE specification for serving JSON as text/plain from index meta-dbs to accommodate inflexible hosting. I think it is reasonable to extend this also to the /versions endpoint to allow text/csv to be served as text/plain.

Hence, I'd argue:

  • Presence of content-type in /versionsis a requirement on SHOULD level.
  • If present, the content-type on /versions SHOULD be text/csv, but if it is not, it MUST be text/plain.

@CasperWA
Copy link
Member Author

CasperWA commented Jan 12, 2021

I honestly don't like this loose approach and the expectation that any client/user simply "knows" the format of endpoints because it's/they've read the specification. I prefer the more automatized approach, where the protocol and standard web content negotiation let's the client/user derive as much information as possible. Especially because the /versions endpoint introduces a different content format than the rest of the endpoints...

Already using text/plain for the JSON endpoints should be disallowed, but I can understand practical limitations on why this might be difficult to comply with (e.g., setting headers for GH Pages and similar). I think at the very least we should lift this requirement to a MUST level on the standard implementations (i.e., keeping it on a SHOULD level for the index meta-databases, which prove to be the ones most hit by this requirement).

However, in the end, I respect the references you've mentioned above, @rartino, and as such this comment represents more my opinion than anything else.

@ml-evs
Copy link
Member

ml-evs commented Jan 12, 2021

For the sake of progress, I've demoted validation of any of the headers for /versions to SHOULD/OPTIONAL, though I agree with most of @CasperWA's last comment.

The /versions endpoint is already a bit funny in that it prompts a file download in the browser when the text/csv content-type is used (any just displays it when it is text/plain).

@rartino is correct that the specification does not really enforce any requirements on content-type, and I don't think it's worth updating the specification with this at this stage.

ml-evs added a commit that referenced this issue Jan 12, 2021
- Make checks on type and header parameter in content-type for '/versions' optional additionally allowing 'text/plain' (closes #669)
- Report correct base URL on /versions validation failure (closes #668)
ml-evs added a commit that referenced this issue Jan 15, 2021
- Make checks on type and header parameter in content-type for '/versions' optional additionally allowing 'text/plain' (closes #669)
- Report correct base URL on /versions validation failure (closes #668)
ml-evs added a commit that referenced this issue Jan 15, 2021
- Make checks on type and header parameter in content-type for '/versions' optional additionally allowing 'text/plain' (closes #669)
- Report correct base URL on /versions validation failure (closes #668)
- Added mock tests for /versions in various scenarios
ml-evs added a commit that referenced this issue Jan 15, 2021
- Make checks on type and header parameter in content-type for '/versions' optional additionally allowing 'text/plain' (closes #669)
- Report correct base URL on /versions validation failure (closes #668)
- Added mock tests for /versions in various scenarios
@ml-evs ml-evs added the bug Something isn't working label Jan 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority/medium Issue or PR with a consensus of medium priority validator Related to the OPTIMADE validator
Projects
None yet
3 participants