[QUESTION] Reading huge excel file consumes too much memory. Caused by simplexml_load_string #2793

Closed
vroad opened this issue Aug 10, 2020 · 5 comments


vroad commented Aug 10, 2020

Versions

  • PHP version: 7.4.9
  • Laravel version: 7.9.2
  • Package version: 3.1.21

Description

Reading a huge Excel file (over 10 MB) causes the PHP process to be killed. The sheet causing the issue is over 75 MB once the file is uncompressed.

I tracked down where the error occurs and found that simplexml_load_string consumes around 2 GB of memory.
Is there a way to reduce the memory usage?

https://github.com/PHPOffice/PhpSpreadsheet/blob/9683e5be189080a9c75298e00efb72be3f5c866a/src/PhpSpreadsheet/Reader/Xlsx.php#L621

This only occurs with XLSX files, because the other file types do not use SimpleXML for reading.
Chunked reading does not help, because the problem occurs before Laravel Excel starts reading rows.

The problem is happening inside the PhpSpreadsheet library, and SimpleXML is not suited for reading huge files, as it tries to load the entire tree into memory.
PhpSpreadsheet probably does not support other XML parsers that use less memory, so there is nothing I can do as long as I use Laravel Excel?
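For context, chunked reading in Laravel Excel 3.x is normally enabled like this (a sketch with a hypothetical `UsersImport` class; as noted above, it does not help here because the whole sheet XML is handed to SimpleXML before the first chunk is delivered):

```php
<?php

namespace App\Imports;

use App\Models\User;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithChunkReading;

class UsersImport implements ToModel, WithChunkReading
{
    // Map each spreadsheet row to a model instance.
    public function model(array $row)
    {
        return new User([
            'name'  => $row[0],
            'email' => $row[1],
        ]);
    }

    // Rows are handed to model() in chunks of 1000, but for XLSX the
    // sheet XML is still parsed in full before the first chunk arrives.
    public function chunkSize(): int
    {
        return 1000;
    }
}
```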

@vroad vroad added the question label Aug 10, 2020

patrickbrouwers commented Aug 10, 2020

> so there is nothing I can do as long as I use Laravel Excel?

Depending on your server's memory size, you can increase the memory limit of the process (ini_set).
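For example (a sketch; pick a limit that actually fits the server):

```php
// At the start of the import command, before Excel::import() runs.
// '-1' removes the limit entirely; use with care.
ini_set('memory_limit', '4G');
```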

> The problem is happening inside the PhpSpreadsheet library

As you said, it happens inside PhpSpreadsheet, which means there's nothing we can do in this package unfortunately.


vroad commented Aug 10, 2020

I'm running my import command on a server with a small amount of memory to save costs. It's possible to temporarily switch to a server with a large amount of memory; it's just not very convenient to do so every time I run the import command.

Limiting memory with PHP's memory_limit option does not work for memory allocations made by SimpleXML.
PHP does not seem to take the memory used by SimpleXML into account.

If PHP is running inside a container, the memory limit enforced by the container runtime does apply, of course, and it kills the process to enforce that limit.
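A quick way to observe this (a sketch assuming a Linux host, so the resident set size can be read from /proc; `huge-sheet.xml` is a placeholder for the unzipped sheet XML):

```php
<?php

// Compare what PHP's own memory manager reports against the process RSS.
// Per the observation above, a large part of the SimpleXML/libxml tree
// does not show up in PHP's figure and is not capped by memory_limit.
function rssMb(): float
{
    $status = file_get_contents('/proc/self/status');
    preg_match('/VmRSS:\s+(\d+) kB/', $status, $m);

    return (int) $m[1] / 1024;
}

$before = memory_get_usage(true);
$xml = simplexml_load_string(file_get_contents('huge-sheet.xml'));

printf("Zend MM usage: %.1f MB\n", (memory_get_usage(true) - $before) / 1048576);
printf("Process RSS:   %.1f MB\n", rssMb());
```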

I just wanted to confirm that there isn't a way to solve the problem without rewriting my program to use another package.
Thanks for the answer anyway.


mtx-z commented Sep 26, 2023

Hello @nbyloff,
thank you for the details you added. I'm running into the same issue with XLSX files that have millions of rows.
I'm experiencing the same as you: PHP silently fails after 60k rows (no matter the batch size: 2k, 5k, 10k...) and the instance goes out of memory.

Did you find another way to achieve this?
Can this be "fixed" by using XLS or CSV files with Laravel-Excel?

It seems that fast-excel also has issues with large imports.

Thanks.

EDIT: it seems that I have the same issue using CSV. Unfortunately, XLS is limited to ~65k rows.


nbyloff commented Sep 26, 2023

@mtx-z I came up with a solution that didn't involve spreadsheets that large; they are much shorter now. However, the only way I got it to work (before I abandoned it) was to make it a two-step process. First, I would write all my data to a DB table that tracked an Excel row id and an Excel column id. Step two was to use a cursor to query the DB (sorting by row id and column id) and write everything in sequence (A1, B1, C1, A2, B2, C2, etc.).

I had also been doing it with this library, as it was the fastest I found. But that library is now archived.

If you write the data into Excel sequentially (instead of jumping around to different cells), it might work with a different library. I essentially learned that the memory issues came from jumping around to different cells too much in these massive spreadsheets. Writing in order meant there was nothing to track in memory.
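A minimal sketch of that two-step approach, using box/spout as the streaming writer (presumably the archived library linked above; the `excel_cells` staging table and its `row_id`/`column_id`/`value` columns are assumptions, not the exact code used):

```php
<?php

use Box\Spout\Writer\Common\Creator\WriterEntityFactory;
use Illuminate\Support\Facades\DB;

// Step 2: stream the staged cells out of the database in row/column order
// and emit each row exactly once, in sequence, so the writer never has to
// revisit earlier cells and only the current row is held in memory.
$writer = WriterEntityFactory::createXLSXWriter();
$writer->openToFile('export.xlsx');

$currentRowId = null;
$currentRow   = [];

$cells = DB::table('excel_cells')     // hypothetical staging table
    ->orderBy('row_id')
    ->orderBy('column_id')
    ->cursor();                       // lazy cursor: one record at a time

foreach ($cells as $cell) {
    if ($currentRowId !== null && $cell->row_id !== $currentRowId) {
        $writer->addRow(WriterEntityFactory::createRowFromArray($currentRow));
        $currentRow = [];
    }

    $currentRowId = $cell->row_id;
    $currentRow[] = $cell->value;
}

if ($currentRow !== []) {
    $writer->addRow(WriterEntityFactory::createRowFromArray($currentRow));
}

$writer->close();
```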


mtx-z commented Sep 26, 2023

@nbyloff thank you for your answer.

I ended up doing the same as you: I imported my Excel file into the database and then processed it from a custom job using $query->chunk().
My large Excel import is a "one-time" event, and the "daily imported files" do not exceed 5k rows, so I'll just keep using Laravel-Excel for the small files.
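Roughly what that job's handle() method looks like (a sketch; `staged_rows` and the per-row handling are placeholders):

```php
use Illuminate\Support\Facades\DB;

// Process the staged rows in fixed-size chunks so only a few thousand
// records are hydrated in memory at any one time.
DB::table('staged_rows')
    ->orderBy('id')
    ->chunk(5000, function ($rows) {
        foreach ($rows as $row) {
            // transform / persist each staged row here
        }
    });
```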

Fast-Excel is a Laravel wrapper for the Spout library you linked, so my guess is that the large-file import issue I linked occurs with both.

Thank you again for those details and your tests.
