Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug-57342] Excel compatible Zip64 implementation #154

Closed
wants to merge 2 commits into from

Conversation

@rzymek
Copy link

@rzymek rzymek commented Jun 12, 2019

https://bz.apache.org/bugzilla/show_bug.cgi?id=57342

I did an in depth analysis of this issue. Turns out the problem is not with the OOXML data generated by POI. The problem has to do with the ZIP format. Specifically with ZIP64 extension. That's why it's all OK up until sheet1.xml reaches over 4GB (uncompressed).
I have all the details written up in a blog post: https://rzymek.github.io/post/excel-zip64/
Short story: Excel will want to repair the file if uncompressed size of a zip entry exceeds 4GB and ZIP's Local File Header (LFH) does not specify zip spec version 4.5

This pull request uses custom (Excel compatible) Zip64 implementation when Zip64Mode is set to Always.

@asfgit
Copy link

@asfgit asfgit commented Jun 12, 2019

Can one of the admins verify this patch?

Loading

@asfgit asfgit closed this in cb64b73 Jun 12, 2019
@pjfanning
Copy link
Contributor

@pjfanning pjfanning commented Jun 12, 2019

Loading

asfgit pushed a commit that referenced this issue Oct 6, 2019
Alain-Bearez pushed a commit to cuali/poi that referenced this issue Dec 12, 2019
@bosofelipe
Copy link

@bosofelipe bosofelipe commented May 29, 2020

@rzymek Could you give me an example because i try generate a large excel with 37000 rows and 2500 coluns and file still corrupted using apache 4.1.2

Loading

@bosofelipe
Copy link

@bosofelipe bosofelipe commented May 29, 2020

@rzymek my current code failed

`import java.io.File;
import java.io.FileOutputStream;

import org.apache.commons.compress.archivers.zip.Zip64Mode;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class TestExcel {

public static void main(String[] args) throws Exception {
	SXSSFWorkbook workbook = null;
	workbook = new SXSSFWorkbook(null, 10000, true, false);
	workbook.setZip64Mode(Zip64Mode.Always);
	int ROWS_COUNT = 37000;
	int COLS_COUNT = 2500;

	org.apache.poi.ss.usermodel.Sheet sheet = workbook.createSheet("1");
	for (int i = 1; i <= ROWS_COUNT; i++) {
		Row row = sheet.createRow(i);
		for (int j = 1; j <= COLS_COUNT; j++) {
			row.createCell(j).setCellValue("1");
			System.out.println(i + " " + j);
		}
	}

	FileOutputStream out = new FileOutputStream("~/Downloads/Excel.xlsx");
	workbook.write(out);
	out.close();
	workbook.close();
	workbook.dispose();

	File file = new File("~/Downloads/Excel.xlsx");
	System.out.println(file.length());

}

}`

Loading

@pjfanning
Copy link
Contributor

@pjfanning pjfanning commented May 29, 2020

Zip64Mode.AsNeeded is more correct - always may mean you use zip64 mode when you don't need it

Loading

@rzymek
Copy link
Author

@rzymek rzymek commented May 31, 2020

Are you getting "corrupted file" error from Excel or OpenOffice or something else? OpenOffice Calc has a limit of 1024 columns (Excel's limit is 16k columns). Other than that, the code looks ok.

Zip64Mode needs to be Always in this case to enable ZIP64 handling compatible with Excel.

Loading

@pjfanning
Copy link
Contributor

@pjfanning pjfanning commented May 31, 2020

@rzymek thanks for clarifying - do you know what effect setting Zip64Mode.Always has if you create a small spreadsheet - will this file cause problems for Excel?

Loading

@rzymek
Copy link
Author

@rzymek rzymek commented May 31, 2020

As far as I checked, Zip64Mode.Always does not cause problem with Excel even in small files. When it comes to Excel and big files (XML over 4Gb), then ZIP64 must be declared in the zip entry header before the actual zip entry contents.

Loading

@pjfanning
Copy link
Contributor

@pjfanning pjfanning commented May 31, 2020

Thanks @rzymek - we might want to make Zip64Mode.Always the default - needs some experimentation before we'd make that change though

Loading

@rzymek
Copy link
Author

@rzymek rzymek commented May 31, 2020

Exactly. I think that custom zip64 implementation should sit as an option for a few versions (it's only enabled when Zip64Mode.Always).

Loading

@bosofelipe
Copy link

@bosofelipe bosofelipe commented Jun 1, 2020

@rzymek I tested with Libre Office... now i tested with MS Excel and the problem was solved, Is a limitation of Libre office, with you told us?

Tks a lot!

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants