Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNG-6975] Wagon-HTTP, set content-type when determinable from file extension #72

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

chrisbeckey
Copy link

@chrisbeckey chrisbeckey commented Aug 6, 2020

Set the HTTP content-type header,

  • using the file extension to determine value, when uploading a file.

  • using the header when uploading a Stream

<profile>
<id>jdk11</id>
<activation>
<jdk>[1.11,)</jdk>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really 1.11? If yes, then this is a bug in Maven.

* <b>disabled by default</b>
* This flag is only effective when uploading from a File, not directly from a Stream.
*/
private static final boolean SET_CONTENT_TYPE_FROM_FILE_EXTENSION =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether this should be a per-server config rather than global. In general, sys props are either globally or for a technical reason. WDYT?

@chrisbeckey
Copy link
Author

chrisbeckey commented Aug 6, 2020 via email

@elharo elharo changed the title Wagon-HTTP, set content-type when determinable from file extension (MNG-6975) [MNG-6975] Wagon-HTTP, set content-type when determinable from file extension Aug 6, 2020
@pzygielo
Copy link
Contributor

pzygielo commented Aug 6, 2020

@chrisbeckey

Just to be sure I tested it with "1.11" and with "11" and they both worked.

Yeah, but why it did?

I think that 1.11 works only because when

[DEBUG] Detected Java String: '11.0.8'
[DEBUG] Normalized Java String: '11.0.8'

normalized string is tested against range [1.11,) - it is accepted due to 11 (of 11.0.8) being greater than 1 of (1.11).

I am pretty sure it shall be [11,), as you see, with [1.11,) for example JDK 10 is also accepted (as 10 > 1).

@chrisbeckey
Copy link
Author

Regarding the version 1.11 vs 11: in following up on the suggestion by @elharo to use URLConnection.guess... the need for javax.activation has been negated. Update to the PR is coming soon.

@chrisbeckey
Copy link
Author

Updated to use URLConnection.guess***, which also allows this to work with streaming. Added a second property to control behavior when an error in content identification occurs. Default to creating content-type and not failing if determining content-type fails. Thanks to @elharo for the suggestion, this is better than before.

@michael-o
Copy link
Member

The stream guessing is almost pointless, look at its code: https://github.com/battleblow/openjdk-jdk8u/blob/bsd-port/jdk/src/share/classes/java/net/URLConnection.java

Path based approach looks better:

#sun.net.www MIME content-types table
#
# Property fields:
#
#   <description> ::= 'description' '=' <descriptive string>
#    <extensions> ::= 'file_extensions' '=' <comma-delimited list, include '.'>
#         <image> ::= 'icon' '=' <filename of icon image>
#        <action> ::= 'browser' | 'application' | 'save' | 'unknown'
#   <application> ::= 'application' '=' <command line template>
#

#
# The "we don't know anything about this data" type(s).
# Used internally to mark unrecognized types.
#
content/unknown: description=Unknown Content
unknown/unknown: description=Unknown Data Type

#
# The template we should use for temporary files when launching an application
# to view a document of given type.
#
temp.file.template: c:\\temp\\%s

#
# The "real" types.
#
application/octet-stream: \
	description=Generic Binary Stream;\
	file_extensions=.saveme,.dump,.hqx,.arc,.obj,.lib,.bin,.exe,.zip,.gz

application/oda: \
	description=ODA Document;\
	file_extensions=.oda

application/pdf: \
	description=Adobe PDF Format;\
	file_extensions=.pdf

application/postscript: \
	description=Postscript File;\
	file_extensions=.eps,.ai,.ps;\
	icon=ps

application/rtf: \
	description=Wordpad Document;\
	file_extensions=.rtf;\
	action=application;\
	application=wordpad.exe %s

application/x-dvi: \
	description=TeX DVI File;\
	file_extensions=.dvi

application/x-hdf: \
	description=Hierarchical Data Format;\
	file_extensions=.hdf;\
	action=save

application/x-latex: \
	description=LaTeX Source;\
	file_extensions=.latex

application/x-netcdf: \
	description=Unidata netCDF Data Format;\
	file_extensions=.nc,.cdf;\
	action=save

application/x-tex: \
	description=TeX Source;\
	file_extensions=.tex

application/x-texinfo: \
	description=Gnu Texinfo;\
	file_extensions=.texinfo,.texi

application/x-troff: \
	description=Troff Source;\
	file_extensions=.t,.tr,.roff

application/x-troff-man: \
	description=Troff Manpage Source;\
	file_extensions=.man

application/x-troff-me: \
	description=Troff ME Macros;\
	file_extensions=.me

application/x-troff-ms: \
	description=Troff MS Macros;\
	file_extensions=.ms

application/x-wais-source: \
	description=Wais Source;\
	file_extensions=.src,.wsrc

application/zip: \
	description=Zip File;\
	file_extensions=.zip;\
	icon=zip;\
	action=save

application/x-bcpio: \
	description=Old Binary CPIO Archive;\
	file_extensions=.bcpio;\
	action=save

application/x-cpio: \
	description=Unix CPIO Archive;\
	file_extensions=.cpio;\
	action=save

application/x-gtar: \
	description=Gnu Tar Archive;\
	file_extensions=.gtar;\
	icon=tar;\
	action=save

application/x-shar: \
	description=Shell Archive;\
	file_extensions=.sh,.shar;\
	action=save

application/x-sv4cpio: \
	description=SVR4 CPIO Archive;\
	file_extensions=.sv4cpio;\
	action=save

application/x-sv4crc: \
	description=SVR4 CPIO with CRC;\
	file_extensions=.sv4crc;\
	action=save

application/x-tar: \
	description=Tar Archive;\
	file_extensions=.tar;\
	icon=tar;\
	action=save

application/x-ustar: \
	description=US Tar Archive;\
	file_extensions=.ustar;\
	action=save

audio/basic: \
	description=Basic Audio;\
	file_extensions=.snd,.au;\
	icon=audio

audio/x-aiff: \
	description=Audio Interchange Format File;\
	file_extensions=.aifc,.aif,.aiff;\
	icon=aiff

audio/x-wav: \
	description=Wav Audio;\
	file_extensions=.wav;\
	icon=wav;\
	action=application;\
	application=mplayer.exe %s

image/gif: \
	description=GIF Image;\
	file_extensions=.gif;\
	icon=gif;\
	action=browser

image/ief: \
	description=Image Exchange Format;\
	file_extensions=.ief

image/jpeg: \
	description=JPEG Image;\
	file_extensions=.jfif,.jfif-tbnl,.jpe,.jpg,.jpeg;\
	icon=jpeg;\
	action=browser

image/tiff: \
	description=TIFF Image;\
	file_extensions=.tif,.tiff;\
	icon=tiff

image/vnd.fpx: \
	description=FlashPix Image;\
	file_extensions=.fpx,.fpix

image/x-cmu-rast: \
	description=CMU Raster Image;\
	file_extensions=.ras

image/x-portable-anymap: \
	description=PBM Anymap Image;\
	file_extensions=.pnm

image/x-portable-bitmap: \
	description=PBM Bitmap Image;\
	file_extensions=.pbm

image/x-portable-graymap: \
	description=PBM Graymap Image;\
	file_extensions=.pgm

image/x-portable-pixmap: \
	description=PBM Pixmap Image;\
	file_extensions=.ppm

image/x-rgb: \
	description=RGB Image;\
	file_extensions=.rgb

image/x-xbitmap: \
	description=X Bitmap Image;\
	file_extensions=.xbm,.xpm

image/x-xwindowdump: \
	description=X Window Dump Image;\
	file_extensions=.xwd

image/png: \
	description=PNG Image;\
	file_extensions=.png;\
	icon=png;\
	action=browser

image/bmp: \
	description=Bitmap Image;\
	file_extensions=.bmp;

text/html: \
	description=HTML Document;\
	file_extensions=.htm,.html;\
	icon=html

text/plain: \
	description=Plain Text;\
	file_extensions=.text,.c,.cc,.c++,.h,.pl,.txt,.java,.el;\
	icon=text;\
	action=browser

text/tab-separated-values: \
	description=Tab Separated Values Text;\
	file_extensions=.tsv

text/x-setext: \
	description=Structure Enhanced Text;\
	file_extensions=.etx

video/mpeg: \
	description=MPEG Video Clip;\
	file_extensions=.mpg,.mpe,.mpeg;\
	icon=mpeg

video/quicktime: \
	description=QuickTime Video Clip;\
	file_extensions=.mov,.qt

application/x-troff-msvideo: \
	description=AVI Video;\
	file_extensions=.avi;\
	icon=avi;\
	action=application;\
	application=mplayer.exe %s

video/x-sgi-movie: \
	description=SGI Movie;\
	file_extensions=.movie,.mv

message/rfc822: \
	description=Internet Email Message;\
	file_extensions=.mime

application/xml: \
	description=XML document;\
	file_extensions=.xml

You can even supply a custom file with -Dcontent.types.user.table=.....

* or the stream header to determine the type, <b>disabled by default</b>
*/
private static final boolean AUTOSET_CONTENT_TYPE =
Boolean.valueOf( System.getProperty( "maven.wagon.http.autocontenttype", "true" ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it's pattern copied from existing code - but why in new code not use parseBoolean instead of valueOf and skip unboxing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that in OSS contributions, at least others I've made, consistency is valued highly. No other reason and, FWIW and IMHO, unboxing/autoboxing should be banished from decent society.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of Maven code is 10+ years old and dates back to Java 1.4 and earlier. It can be quite crufty, and consistency alone is not a string argument in this context. The things we really care about are written down in the docs. Everything else should assume best current practices for Java 1.7.

@michael-o
Copy link
Member

@chrisbeckey What is your stance on my previous comment?

@chrisbeckey
Copy link
Author

@michael-o I'm not sure I understand the comment. "guessContentTypeFromStream" is invoked if the source is an InputStream, not when reading from a File, there is no extension available to determine the content from.

@michael-o
Copy link
Member

@chrisbeckey
Copy link
Author

@michael-o Is there a better way to determine content from a Stream (not from a file name, which is handled separately)?

Boolean.valueOf( System.getProperty( "maven.wagon.http.autocontenttype", "true" ) );

/**
* If enabled, then an when determining the content type will result in a fatal exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an --> an error
will result --> results

@michael-o
Copy link
Member

@chrisbeckey Not with external libraries.

@chrisbeckey
Copy link
Author

chrisbeckey commented Aug 7, 2020

@michael-o assuming you meant "Not WITHOUT external libraries." ?
IMHO, URLConnection functions over the activation library has the benefit of simplicity. Changing the POM to add the JDK11 profile (and the activation within that) has maintenance implications that are not commensurate with the benefit, considering that this feature has not been previously requested.
In effect using URLConnection will address the majority of use cases with about a dozen lines of code and little expected maintenance.
Of course, reasonable opinions may differ ...

? URLConnection.guessContentTypeFromName( this.source.getName() )
: this.stream != null
? URLConnection.guessContentTypeFromStream( this.stream )
: DEFAULT_CONTENT_TYPE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this like to be redundant: RFC 7231, section 3.1.1.5:

If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is redundant, and I think most of the interest in this change is from people using client software that's bespoke, naive, or maybe completely ignorant of the RFC.

That said, I'd recommend doing it anyway. It's becoming apparent to me that there is a lot of client software out there that doesn't know/implement RFC 7231, section 3.1.1.5. It's perfectly correct to not do this, but it's also a good way to keep getting asked the question by newbies (like I was). Unless you really want to keep educating people about it, I'd recommend just returning the default that the caller should assume.

Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced that it is the entity's task to set the content type. It is the caller's task The HttpEntity is solely designed to carry data w/o any logic at best.

@michael-o
Copy link
Member

I will pick this up by the end of the week.

@michael-o
Copy link
Member

There are still statements which have not been responded to. Until then this PR will remain open.

@michael-o
Copy link
Member

Still interested in completing this?

@booniepepper
Copy link

booniepepper commented Dec 8, 2021

As a note: My (minor) issue could have benefited from this: https://issues.apache.org/jira/browse/MDEPLOY-289

The summary is our enterprise (amazon) used custom maven repository proxies. It would be nice to not default/guess content type when missing.

@michael-o michael-o closed this Dec 9, 2021
@michael-o
Copy link
Member

Closed by accident

@michael-o michael-o reopened this Dec 9, 2021
@marcelstoer
Copy link

I am still interested in seen this land.

In our case the WAF complained about the absence of a content-type header when uploading a JAR. I guess it would have been fine seeing application/octet-stream - had there been a header at all. (sorry, if there's a separate issue for that, didn't find one)

[DEBUG] http-outgoing-0 >> PUT /artifactory/the/rest/is/redacted.jar HTTP/1.1
[DEBUG] http-outgoing-0 >> Cache-control: no-cache
[DEBUG] http-outgoing-0 >> Pragma: no-cache
[DEBUG] http-outgoing-0 >> User-Agent: Apache-Maven/3.8.5 (Java 11.0.9.1; Mac OS X 10.16)
[DEBUG] http-outgoing-0 >> Content-Length: 17856154
[DEBUG] http-outgoing-0 >> Host: my.artifactory.host
[DEBUG] http-outgoing-0 >> Connection: Keep-Alive
[DEBUG] http-outgoing-0 >> Expect: 100-continue
[DEBUG] http-outgoing-0 >> Cookie: SCDID_S=redacted$$
[DEBUG] http-outgoing-0 >> Accept-Encoding: gzip,deflate
[DEBUG] http-outgoing-0 >> Authorization: Basic *redacted*
[DEBUG] http-outgoing-0 << HTTP/1.1 100 Continue

P.S. Maybe the description of this PR should mention that it'll fix https://issues.apache.org/jira/browse/WAGON-599.

@michael-o
Copy link
Member

This is a bug in your WAF: https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type
Nothing wrong with the request

@marcelstoer
Copy link

I didn't claim there be anything "wrong" with the request. Of course we can adjust the security profile in the WAF such that it allows PUT requests without content-type (which we did). Still, I see this PR as a useful addition to wagon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants