Stream does not support seeking. #17

Closed
Ian1971 opened this issue Aug 27, 2014 · 4 comments

Comments

@Ian1971
Collaborator

Ian1971 commented Aug 27, 2014

https://exceldatareader.codeplex.com/workitem/11988

"I am using a shell that calls a program that decrypts a file to StandardOutput.

Process process = new Process();
//... setup to run decrypter
process.Start();
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(process.StandardOutput.BaseStream);

Can there be a constructor that will handle non-seekable (StreamReader) Streams as well?

My constraints...
I don't want to write the sensitive data to file and the solution must handle large files (reading to memory first will not work)."

@Ian1971
Collaborator Author

Ian1971 commented Aug 27, 2014

Because of the way Excel binary files are organised, the data is not necessarily stored in sequential order; this is why the stream has to be seekable.

It would be good, however, to remove the need for a seekable stream, as it has affected a few people (for example, those processing HTTP streams, who currently have to save to a file first).

So, I am open to a discussion on how to achieve this.
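For the HTTP case, the usual workaround is to copy the download to a temporary file before opening it. The sketch below assumes .NET's `FileOptions.DeleteOnClose` handles cleanup; `SeekableStreams.ToSeekableTempFile` is a hypothetical helper, not part of ExcelDataReader. It writes the data to disk, so it does not satisfy the constraints in the original request.

```csharp
using System.IO;

static class SeekableStreams
{
    // Hypothetical helper: copies a forward-only stream into a temporary file
    // that the OS deletes when the returned stream is disposed.
    public static Stream ToSeekableTempFile(Stream source)
    {
        if (source.CanSeek) return source; // already usable as-is

        string path = Path.GetTempFileName();
        var file = new FileStream(path, FileMode.Create, FileAccess.ReadWrite,
                                  FileShare.None, 81920, FileOptions.DeleteOnClose);
        try
        {
            source.CopyTo(file);
            file.Position = 0; // rewind so the reader starts at the beginning
            return file;
        }
        catch
        {
            file.Dispose(); // also deletes the temp file
            throw;
        }
    }
}
```

The returned stream can then be passed to `ExcelReaderFactory.CreateBinaryReader` as usual.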

@jas-on

jas-on commented Aug 29, 2014

I thought that data in Excel files was organized sequentially, i.e., when I unzip an .xlsx file and look at an individual XML worksheet, it is clearly structured. If there are strings, those are stored in sharedStrings.xml and aren't necessarily in order, but they can be looked up and the data can still be retrieved sequentially from the original worksheet.
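The sequential reading described here can be sketched with `System.Xml.XmlReader`, which streams a worksheet part forward-only. This assumes the shared-string table has already been loaded into a list and the sheet XML is available as a string; `SheetScan.ReadCellValues` is hypothetical, and cells without a `<v>` element (such as inline strings) are skipped.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SheetScan
{
    // Hypothetical helper: walks <c> cells in document order and resolves
    // t="s" cells through the shared-string table; other values pass through raw.
    public static List<string> ReadCellValues(string sheetXml, IReadOnlyList<string> sharedStrings)
    {
        var values = new List<string>();
        using var reader = XmlReader.Create(new StringReader(sheetXml));
        string? cellType = null;

        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "v")
            {
                // ReadElementContentAsString also moves the reader past </v>.
                string raw = reader.ReadElementContentAsString();
                values.Add(cellType == "s" ? sharedStrings[int.Parse(raw)] : raw);
                continue;
            }
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "c")
                cellType = reader.GetAttribute("t"); // null for plain numeric cells
            reader.Read();
        }
        return values;
    }
}
```

The worksheet is consumed strictly front to back; only the shared-string table has to be available up front for lookups.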

@Ian1971
Collaborator Author

Ian1971 commented Aug 29, 2014

Yes, but that is xlsx. The binary format is not stored sequentially; in fact, it's quite an (overly) complicated file format.
http://www.openoffice.org/sc/excelfileformat.pdf
http://www.openoffice.org/sc/compdocfileformat.pdf

@andersnm
Collaborator

XLS compound document streams are stored in sectors, 512-byte chunks that can be physically located anywhere in the file. The order of the sectors/chunks is dictated by the compound document FAT (file allocation table), which itself is not always stored sequentially. (Slightly simplified :) )

This makes it really hard to parse anything but specially crafted XLS files as forward-only streams (without loading everything into memory).
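To illustrate why random access is needed, here is a minimal sketch of following one stream's sector chain through the FAT. It assumes the FAT has already been read into an `int[]`, version-3 files with 512-byte sectors, and that sector *n* starts at file offset `(n + 1) * 512` because the 512-byte header comes first. `CompoundFile.SectorOffsets` is a hypothetical helper, not ExcelDataReader's implementation.

```csharp
using System.Collections.Generic;
using System.IO;

static class CompoundFile
{
    const int EndOfChain = -2;  // 0xFFFFFFFE read as a signed int
    const int SectorSize = 512; // assumption: version-3 compound file

    // Hypothetical helper: returns the absolute file offset of every sector
    // in the chain that starts at firstSector.
    public static List<long> SectorOffsets(int[] fat, int firstSector)
    {
        var offsets = new List<long>();
        var seen = new HashSet<int>(); // guards against cyclic (corrupt) chains
        for (int s = firstSector; s != EndOfChain; s = fat[s])
        {
            if (s < 0 || s >= fat.Length || !seen.Add(s))
                throw new InvalidDataException($"Corrupt FAT chain at sector {s}");
            offsets.Add((long)(s + 1) * SectorSize); // header precedes sector 0
        }
        return offsets;
    }
}
```

For example, `SectorOffsets(new[] { 3, -2, 0, 1 }, 2)` yields the offsets 1536, 512, 2048 and 1024: the reader has to jump backwards and forwards through the file, which a forward-only stream cannot do without buffering.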

XLSX is essentially a .zip file, which has a file system inside; there is no guarantee that the compressed files are stored in the order ExcelDataReader expects, and the ZIP central directory that lists the entries is stored at the end of the archive. Additionally, System.IO.Compression.ZipArchive does not support forward-only unzipping well.

Likewise, this makes it really hard to parse anything but specially crafted xlsx as forward-only streams.
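A ZIP reader normally starts at the end of the archive: the End of Central Directory record (signature `PK\x05\x06`, at least 22 bytes, optionally followed by a comment of up to 65,535 bytes) points to the central directory that lists every entry. The sketch below locates that record; `ZipLayout.FindEndOfCentralDirectory` is a hypothetical name. It relies on `Length` and `Position`, so it only works on a seekable stream.

```csharp
using System;
using System.IO;

static class ZipLayout
{
    // Hypothetical helper: returns the offset of the End of Central Directory
    // record, or -1 if none is found. Searches backwards from the end.
    public static long FindEndOfCentralDirectory(Stream zip)
    {
        if (!zip.CanSeek) throw new NotSupportedException("A seekable stream is required.");

        const int MinEocdSize = 22;
        long searchStart = Math.Max(0, zip.Length - MinEocdSize - 0xFFFF);
        int len = (int)(zip.Length - searchStart);
        var buf = new byte[len];

        zip.Position = searchStart;
        int read = 0;
        while (read < len)
        {
            int n = zip.Read(buf, read, len - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // The last match is the real record; earlier ones could be data.
        for (int i = len - MinEocdSize; i >= 0; i--)
            if (buf[i] == 0x50 && buf[i + 1] == 0x4B && buf[i + 2] == 0x05 && buf[i + 3] == 0x06)
                return searchStart + i;
        return -1;
    }
}
```

Until this record has been read, the reader does not know where any entry (such as a worksheet part) begins, which is why forward-only input is a poor fit.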

For these reasons I think supporting forward-only streams is not within the scope of ExcelDataReader. ExcelDataReader should continue to focus on low memory footprint and thus require seekable streams for input.

(it might be worth considering an option to support forward-only streams by loading everything into memory, but that was explicitly not requested here)
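If such an opt-in mode were added, its simplest form would buffer the input in a `MemoryStream` before handing it to the existing seekable reader. A sketch with a hypothetical `Buffering.ToSeekable` helper; it holds the entire file in memory, which is exactly what the original request ruled out.

```csharp
using System.IO;

static class Buffering
{
    // Hypothetical helper: returns the stream unchanged if it can seek,
    // otherwise copies all of it into memory and rewinds the copy.
    public static Stream ToSeekable(Stream source)
    {
        if (source.CanSeek) return source;
        var buffer = new MemoryStream();
        source.CopyTo(buffer);
        buffer.Position = 0;
        return buffer;
    }
}
```

Usage would look like `ExcelReaderFactory.CreateBinaryReader(Buffering.ToSeekable(process.StandardOutput.BaseStream))`, at the cost of memory proportional to the file size.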


3 participants