This repository has been archived by the owner on May 26, 2022. It is now read-only.

xlsx mime type #149

Closed
garak opened this issue Nov 19, 2015 · 12 comments

Contributor

garak commented Nov 19, 2015

I'm creating an xlsx file (e.g. using Type::XLSX).
I expect to get a file with mime type application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, but instead I'm getting one with application/zip.

Collaborator

adrilo commented Nov 19, 2015

Hmmm this is weird. Are you calling the openToBrowser() function?

Contributor Author

garak commented Nov 19, 2015

No, I'm just using openToFile and close to create the file.
I checked the mime type using file -b --mime mygeneratedfile.xlsx

Collaborator

adrilo commented Nov 19, 2015

Oh I see. Then it makes sense, as an XLSX is simply a ZIP file with the "xlsx" extension.
Mime types only make sense from a browser perspective. In a filesystem, the OS tries to guess. Even when you create a file with Excel, the mime type is application/zip, so I guess there is not much I can do.

Contributor Author

garak commented Nov 19, 2015

My use case is a generated xlsx file, that I have to serve afterwards.
I'd like to rely on the mime type detected from the file itself, instead of forcing it with an HTTP header.
There must be a way, since running the same command (file -b --mime) on other xlsx files gives me the correct type.
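
Until detection works, one possible stopgap when serving the file is to fall back on the extension whenever detection only yields a generic type. This is a minimal sketch, not part of the library: the helper name resolveContentType and the constant XLSX_MIME are hypothetical, and the list of "generic" types is an assumption based on the results reported in this thread.

```php
<?php
// Hypothetical constant: the specific MIME type expected for xlsx files.
const XLSX_MIME = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';

// Hypothetical helper: pick the Content-Type to send for a file.
// $detectedMime is whatever the detector (e.g. finfo) returned for it.
function resolveContentType(string $detectedMime, string $filename): string
{
    // Generic results that say nothing about the real format (assumption).
    $genericTypes = ['application/zip', 'application/octet-stream'];
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    // Only override when detection was generic AND the name ends in .xlsx.
    if ($extension === 'xlsx' && in_array($detectedMime, $genericTypes, true)) {
        return XLSX_MIME;
    }

    // Otherwise trust the detected type as-is.
    return $detectedMime;
}
```

The returned string could then be passed to header('Content-Type: ...') before streaming the file. A specific detected type is never overridden, so files that are already recognized correctly are unaffected.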

Collaborator

adrilo commented Nov 20, 2015

How did you create the other xlsx files? Excel? Which version of Excel?
With Excel 2010 on Mac/Windows, file -b --mime returns application/zip.

Contributor Author

garak commented Nov 20, 2015

I know that application/zip is also a valid mime type for xlsx. Still, a generic user would expect an Excel file to be served/opened as a spreadsheet, not as an archive.
Unfortunately, I don't own a copy of Excel to try with, but you can find many sample files online (like this one), all with the "correct" mime type.

Collaborator

adrilo commented Nov 30, 2015

Hmmm even with the file you sent me I have:

$ file -b --mime ..../1.xlsx
application/zip; charset=binary

Looks like detecting the mime type is environment specific.

Contributor Author

garak commented Nov 30, 2015

Of course it is, it depends on the configuration of your magic.
The point is that the library is generating files with a magic number that is not the same as the "official" one, so even with the correct magic configured (e.g. on a standard Ubuntu), you get a generic mime type instead of a specific one.
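
To compare what a detector sees at the start of two files, the first bytes can be inspected directly. The following is a minimal sketch that assumes the standard ZIP local file header layout (signature, then the name length at offset 26 and the name at offset 30). The function name firstZipEntryName is hypothetical, and this is not a description of what libmagic does internally.

```php
<?php
// Sketch: return the name of the first entry stored in a ZIP byte string,
// or null if the data does not start with a complete local file header.
function firstZipEntryName(string $bytes): ?string
{
    // A ZIP local file header starts with the signature "PK\x03\x04"
    // and has a fixed part of 30 bytes before the file name.
    if (strlen($bytes) < 30 || substr($bytes, 0, 4) !== "PK\x03\x04") {
        return null;
    }

    // The file name length is a little-endian 16-bit integer at offset 26.
    $nameLength = unpack('v', substr($bytes, 26, 2))[1];

    // The file name itself starts right after the fixed part, at offset 30.
    $name = substr($bytes, 30, $nameLength);

    // Reject truncated input where the full name is not available.
    return strlen($name) === $nameLength ? $name : null;
}
```

Reading only the beginning of a file is enough, for example firstZipEntryName(file_get_contents($path, false, null, 0, 512)), where $path stands for either of the two files being compared.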

Contributor Author

garak commented Nov 30, 2015

Here is a different try. Write a script called fileinfo.php:

<?php
// Print the mime type that the fileinfo extension detects for the file given as first argument.
$finfo = new \finfo(FILEINFO_MIME_TYPE);
echo $finfo->file($argv[1]).PHP_EOL;

With such a script, php fileinfo.php 1.xlsx returns application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, while php fileinfo.php file_generated_with_spout.xlsx returns application/zip.

Collaborator

adrilo commented Dec 1, 2015

Cool! I can reproduce the issue :)

$ php file.php spout-test/test-numeric2.xlsx
application/octet-stream

I get application/octet-stream rather than application/zip... Anyway, I'll take a look at it to figure out how the mime type is being determined.

Collaborator

adrilo commented Dec 4, 2015

Hey @garak,

After pulling my hair out for a few hours, I finally figured out why the mime type is wrong. It turns out that the order in which the XML files get added to the final ZIP file (an XLSX file being a ZIP file with the xlsx extension) matters for the heuristics used to detect types.

Currently, files are added in this order:

[Content_Types].xml
_rels/.rels
docProps/app.xml
docProps/core.xml
xl/_rels/workbook.xml.rels
xl/sharedStrings.xml
xl/styles.xml
xl/workbook.xml
xl/worksheets/sheet1.xml

The problem comes from inserting the "docProps" files. The heuristic seems to look at the first few bytes of the file and check whether they contain Content_Types and xl. With the "docProps" files inserted in between, the first xl occurrence presumably falls outside the bytes the algorithm inspects, so it concludes the file is a plain ZIP.

I'll try to fix this nicely
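
A reordering along these lines could be sketched as follows. This is only an illustration of the hypothesis above, not the fix that ended up in the library: the function name orderEntriesForDetection is hypothetical, and the chosen order ([Content_Types].xml first, then everything under xl/, then the rest) is an assumption that has not been verified against any detector here.

```php
<?php
// Hypothetical helper: reorder ZIP entry names so that "[Content_Types].xml"
// comes first, followed by entries under "xl/", followed by everything else.
// The relative order inside each group is preserved.
function orderEntriesForDetection(array $entryNames): array
{
    // Map an entry name to its group: 0 = content types, 1 = xl/, 2 = other.
    $priority = function (string $name): int {
        if ($name === '[Content_Types].xml') {
            return 0;
        }
        if (strpos($name, 'xl/') === 0) {
            return 1;
        }
        return 2;
    };

    // Distribute the names into the three groups, keeping input order.
    $groups = [0 => [], 1 => [], 2 => []];
    foreach ($entryNames as $name) {
        $groups[$priority($name)][] = $name;
    }

    // Concatenate the groups in priority order.
    return array_merge($groups[0], $groups[1], $groups[2]);
}
```

Applied to the list above, this would move the docProps files and _rels/.rels to the end, so the first xl/ entry directly follows [Content_Types].xml.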

Contributor Author

garak commented Dec 5, 2015

Great! Hope to see a new tag released soon.
Thank you.


2 participants