Export-Excel extremely slow on large spreadsheets #506
@worc4021 if you just do the Import-Excel, does it return all the data? Would it be possible to scrub the xlsx file and post it? |
Hi! |
Ok, if you only do an Import-Excel, does it work? What's the response time like? So 1700x300 of empty, string and numeric data? |
Import-Excel is fine, takes about 1-2 seconds. 1700x300 was the one I tried; naturally, since each row corresponds to one sim and each column to one metric, the size can vary, but that's the ballpark. |
I created a sample set of objects; it took 22 minutes to export to Excel. There may be some low-hanging perf improvements to be made - not sure how much effort, or how much improvement, it would bring. |
Thank you for looking into this!
- So it is nothing I'm doing particularly wrong on my side.
- Not sure whether this would be worthwhile.
I just stumbled upon ImportExcel and thought it would be a great post-processing tool for when I compile my regressions to compare different sets of sims. The alternative is literally clicking 'Add rows' in Spotfire instead. So if you don't see something that obviously makes it scale poorly AND know an alternative implementation, I would not advise you to change the existing version.
Sad times.. I prefer scripting over clicking :D
Thank you,
Manuel |
I had the same issue, but it seems to happen only when using the -Verbose switch; if you remove that, it should be faster. I work with Excel files of 50,000 rows x 25 columns: with -Verbose the export takes 20-30 minutes, without it only 2 minutes or less. |
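A quick way to see the -Verbose overhead for yourself is a timing harness along these lines (a sketch with synthetic data and throwaway temp-file paths, not taken from the thread):

```powershell
# Sketch: time the same export with and without -Verbose.
# $rows is synthetic sample data; the output paths are throwaway temp files.
$rows = 1..5000 | ForEach-Object { [pscustomobject]@{ Id = $_; Name = "Row$_" } }

(Measure-Command { $rows | Export-Excel "$env:TEMP\plain.xlsx" }).TotalSeconds
(Measure-Command { $rows | Export-Excel "$env:TEMP\verbose.xlsx" -Verbose }).TotalSeconds
```

Redirecting the verbose stream (`4>$null`) does not remove the cost; the Write-Verbose calls themselves are the overhead, as discussed further down the thread.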
@worc4021 Yeah, friends don't let friends click. Exactly why I built this :) @pkarunkar Good point, you're right. -Verbose slows things down a lot. Unfortunately, I wasn't using -Verbose, so this is probably the fastest it can be for now. Maybe @jhoneill has some insight. |
:) I was just saying, Doug. I'm happy without verbose. I mean, who cares about the data scrolling up the host like the Matrix movie, unless I want to show off to some newbie that I'm doing magic in PowerShell.. I just need the end result, and it works great for me. But some things are really magical.. |
Maybe just a defence of the -Verbose switch: when you run a command at the prompt and nothing happens for over 5 minutes, you might try -Verbose to see whether it produces any output... ;) |
Agreed, that's how I used to use verbose too; once I'm happy my script is working, I mostly convert it to a background job. So it differs from person to person. I still use verbose where it shows a few lines of information.. I just don't use it with Excel. Thanks for bringing it up, though. |
First, on Write-Verbose: it slows things down quite a lot, and I removed it from the tightest looping part of Export-Excel because (in this case) it would output over 500K messages. That's just too many to be helpful. I just tried `1..500000 | foreach {$x = $_ }` - takes 7.7 seconds; `1..500000 | foreach {$x = $_; write-verbose -Message $_ }` takes 45 seconds when it is *not outputting anything*. @pkarunkar - are you getting these time differences with the current version, or are you getting lots of verbose output? I just did `ps | export-excel -now` and `ps | export-excel -now -verbose`, and verbose was faster (but within experimental error).
Second: don't ever, ever do `Import-Excel | Export-Excel` or `$var = import ; $var | export` - unless the data is very, very small and you want to lose any formatting you have on the spreadsheet. I wrote Copy-ExcelWorksheet because you can copy a whole sheet within and between files VERY quickly. I also wrote Join-Worksheet to copy whole sheets to a common destination - this might not be exactly what @worc4021 wants, but it may be possible to use some of that code.
In this case, I'm not sure if there are perf implications to a PSCustomObject with 1700 properties, but that's more properties than I've seen before. If you know that your numbers are in numeric types, not strings, you'll get a decent speed improvement from -NoNumberConversion, but ultimately 1700x300 is 510,000 cells, and Doug's test did them at about 400 cells/sec. Each value is checked to see if it is a number, a hyperlink, a date, a timespan or a formula (and, without -NoNumberConversion, a number in the form of a string) before assuming it is a string. It is then poked into a cell, and if it is a date, timespan or hyperlink, formatting is applied (formatting cells 1 by 1 is an expensive operation).
Where there are huge numbers of rows, it may be faster to build a datatable in memory and use Send-SQLDataToExcel - and you can get Excel data with a SQL query - but it should be faster to use the copy functionality which is there. |
@jhoneill: you caught me.. I experienced these in older versions of ImportExcel and totally stopped using verbose after that. Will try it tomorrow at the office and reply back. |
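For the merge scenario that started this thread, the copy-based route jhoneill recommends might look like this sketch (file and sheet names are placeholders, and the parameter names are from my reading of the module's help, so check them against your version):

```powershell
# Sketch: copy whole worksheets between workbooks instead of
# round-tripping the data through Import-Excel | Export-Excel.
Copy-ExcelWorkSheet -SourceWorkbook 'C:\data\runA.xlsx' -SourceWorkSheet 'Sheet1' `
                    -DestinationWorkbook 'C:\data\merged.xlsx' -DestinationWorksheet 'runA'
```

This preserves formatting and formulas, which a round trip through Import-Excel loses.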
@jhoneill: OK, I faked the test at home by copy-pasting data into cells, making it 70k rows and 32 cols, and using -NoHeader since a duplicate column header popped in while copy-pasting.
`Measure-Command {Get-Service | Export-Excel $env:userprofile\Desktop\Newfolder\VerboseTest.xlsx}`
`Measure-Command {Import-Excel $env:userprofile\Desktop\Newfolder\VerboseTestlargedata.xlsx -NoHeader}`
`Measure-Command {$Var = Import-Excel $env:userprofile\Desktop\Newfolder\VerboseTestlargedata.xlsx -NoHeader}`
`Measure-Command {$Var | Export-Excel $env:userprofile\Desktop\Newfolder\VerboseTest.xlsx}`
The last 2 commands took 8 minutes, with or without -Verbose. That is still very fast compared with my past experiences. At the office I have different data types - string, date, numeric... I do not know if that would make any difference. |
I don't know if the EPPlus library gets slower to insert into the worksheet XML as the sheet gets bigger. I made a 21,000-cell sheet to test. (Hence my advice not to do import | export - it also loses formulas, which copy preserves.) |
OK, I tried a couple of things, then exported.
Double the rows and the time doubles, so EPPlus scales linearly: 1.3 seconds for 10K, 2.6 for 20K, 5.4 for 40K, 10.31 for 80K, 21.3 for 160K. Next I switched to 200 columns x 50 rows; 10K cells should take ~1.3 secs. It took over 10 seconds. So I tried again:
7.4 seconds! So looking at each property of a PSCustomObject with hundreds of properties gets slow. (Although it didn't seem to get any worse when I changed to 25 rows by 400 columns. [Edit] There seems to be a step change somewhere in the region of 75 to 80 properties.) All the more reason to use copy, and not import | export. |
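The linear-scaling observation above is easy to reproduce with a loop like this (a sketch; row counts, columns and paths are arbitrary):

```powershell
# Sketch: measure how Export-Excel time grows with row count.
foreach ($n in 10000, 20000, 40000) {
    $data = 1..$n | ForEach-Object { [pscustomobject]@{ A = $_; B = "x$_" } }
    $t = (Measure-Command { $data | Export-Excel "$env:TEMP\scale_$n.xlsx" }).TotalSeconds
    '{0,7} rows: {1:n1}s' -f $n, $t
}
```

If the times roughly double as $n doubles, the per-cell cost is constant and EPPlus itself is not the bottleneck.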
Off the top of my head: what if there were an -AsIs switch on the export that bypassed all the property inspection and just pushed the values into the sheet? Don't know without testing. |
IIRC - no. Everything goes via Add-CellValue. One can try it, but this is very slow to save. There are other LoadFrom methods which may also be worth looking at. |
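For reference, the EPPlus `LoadFrom*` methods mentioned here fill a whole range in one call, bypassing Export-Excel's per-cell type handling entirely. A sketch (the sheet name, columns and path are made up):

```powershell
# Sketch: push a System.Data.DataTable into a sheet in one call via EPPlus.
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('Name',  [string])
[void]$table.Columns.Add('Value', [double])
[void]$table.Rows.Add('alpha', 1.5)
[void]$table.Rows.Add('beta',  2.5)

$pkg = Open-ExcelPackage -Path "$env:TEMP\bulk.xlsx" -Create
$ws  = $pkg.Workbook.Worksheets.Add('Data')
[void]$ws.Cells['A1'].LoadFromDataTable($table, $true)   # $true writes the header row
Close-ExcelPackage $pkg
```

The trade-off, as noted, is that you lose the number/date/hyperlink detection that Export-Excel performs per cell.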
Hi, |
I still have performance issues with Export-Excel in version 5.4.4. Edit:
Any ideas? The Excel file is created, but the conversions are also not as I wish. Edit 2: Edit 3: |
So just to kick this dead horse for a while longer.
To me, that means the save time is excessive. We must be doing a lot of chunking through the data in memory (since I don't see -AutoSize and -AutoFilter having any effect on write speed). Am I crazy in hoping for an improvement of 5-10 times here? Thoughts? |
Expected. There is no bulk export; each value in the data is inspected for a number of things as it is set in the worksheet cell. |
I think we need to be clear that Export-Excel is not making any attempt to be optimized for speed. It is trying to be general purpose and to export small-to-medium volumes of data. It adds each cell individually, which you would expect to be slower than (for example) adding a database table in one block. It checks every value to see if it is a formula, a hyperlink, a date, or text that needs to be treated as a number, and processes it accordingly. On my machine, with 500 items in a folder, this takes 1.2 seconds.
The rate is 1,000-2,000 cells per second. It will vary a little, and obviously with machine speed, but that's the order of magnitude. edit: my test on the train this morning seems to have run at about half normal speed; at home I'm getting around 3,000-4,000 cells/sec. By comparison, this runs in 0.2 seconds
If speed of execution is more important than getting something produced quickly (etc.), AND you have the knowledge of the data to produce it, then there are optimizations you can do for your data … However, I have found a significant optimization which I'm going to talk to @dfinke about. |
The DataTable workaround works well in my case. I'll refactor my stuff to use DataTables instead, via ConvertTo-DataTable, which should speed it up by perhaps 10x. Thanks for suggesting it. Anything you can do with a "significant optimization" would also be welcome, of COURSE. |
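The DataTable route described here ends in Send-SQLDataToExcel, which ships with ImportExcel. A sketch of the hand-off (the data, path and sheet name are invented, and -WorksheetName is passed through to Export-Excel as far as I can tell from the help):

```powershell
# Sketch: hand a pre-built System.Data.DataTable to Send-SQLDataToExcel,
# avoiding the per-object property inspection of a plain Export-Excel call.
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('Name',     [string])
[void]$table.Columns.Add('Duration', [double])
[void]$table.Rows.Add('run-01', 12.5)
[void]$table.Rows.Add('run-02', 11.9)

Send-SQLDataToExcel -DataTable $table -Path "$env:TEMP\runs.xlsx" -WorksheetName 'Results'
```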
@ericleigh007 You asked for a 5-10x improvement. I give you ... ~10x improvement - there was something I looked at somewhere else which I thought was too hard to change. It makes the script harder to read/maintain, but the perf change is clear - getting up to 70K cells per second on my machine. There's some variability - the difference between runs is greater than the change with -NoNumberConversion and -AutoSize. @dfinke this passes all the build tests, and I should have it on my repo shortly; a couple of final things to clear up. It is worse from a reading/maintaining point of view, but not massively so. |
Cool! Reminds me of the below, which never sat well with me. I'd prefer a different function so it can "more easily" be recognized, both in discovery and maintenance. Haven't looked at what it would take to refactor, though. `Get-Process | Get-Member` vs `Get-Member -InputObject (Get-Process)` |
That opens up a whole can of worms.... In a lot of places, if $x is an array, passing it via -InputObject causes the command (e.g. Get-Member) to look at the array itself; in others (e.g. Format-Table) the members get processed.
I think `command2 $(command1)` and `command1 | command2` are about equal - it depends whether you're thinking step-by-step or end-product-first. `command1 | command2 | command3 | command4` is easier than `command4 (command3 (command2 (command1)))`. Though that's often how my Excel formulas look - I'm a recovering LISP programmer, so that's to be expected.
What I've done is:
1. Renamed -TargetData to -InputObject (but with an alias, so existing use of it doesn't break) and then do foreach ($targetData in $inputObject). This gives a small speed improvement over piping.
2. Moved the Add-CellValue function into the main body - cutting out the function call gives a significant speed improvement.
3. Moved the handling of "simple" types so the moved code doesn't need to be duplicated.
4. Taken the "autoformat only the first X rows" idea from #555 and also added special-case handling for a table (if $inputObject is a table, insert it in the begin block and remove it before the process block) - there was something about that in #555 as well. |
Admittedly, sometimes PowerShell trying to help you with arrays vs objects gets sort of confusing. I use [array]$obj in my code to make sure that things I want to be arrays are always arrays. Thanks, guys. This is awesome. |
@jhoneill Thank you for implementing some of the stuff from my fork. The changes sound good; I hope it goes well so I can finally use the official version and stop needing to maintain my fork. |
@jhoneill Knew you were a LISP coder. What's next, cdr car? Probably going to need to float this as a preview, so hopefully we can get it exercised and shake out as many issues as possible. |
@dfinke ah yes, "You can do anything with car, cdr, cond and setq". I haven't touched LISP this century - or Pascal, or Occam, or Forth, or Assembler. But "everything's a list" … er … hash-table, and "we don't need no stinking variables, nest everything!" is still with me. |
@ili101, you're welcome. Your ideas were all good; I was just nervous about getting them all in without breaking other things. I've also committed some changes to make Send-SQLDataToExcel a lot more compact: it now brings the parameters from Export-Excel in as dynamic parameters rather than declaring everything. That was a reminder of why I hate both dynamic parameters and parameter sets. |
The zip of the module and the updates can be downloaded here: http://bit.ly/2OTxoaO |
Wow, this is so much faster. Just yesterday I timed the export of 950 custom PS objects to Excel and it took 18 seconds. Since the update: nearly instantaneous. |
Curious, how many properties on each? |
10-ish, not much. Actual time went from 18 secs to sub-3 secs. On an SSD.
\\\\Greg |
First of all - thanks for this great package! Here is a quick summary of my experience, in case anybody comes here for advice on how to speed up their script, as this is basically the only post around the web which discusses this topic.
I had a long-running script where Export-Excel took a significant part of the total run-time. The script focuses on regression testing and has to process and inspect all exported data. The whole, relatively complex data-wrangling part took only about 5% of the total runtime, whereas Export-Excel took the other 95%. After several trials (like converting to csv, exporting and TextToColumns, etc.) I tried the solution via Send-SQLDataToExcel by @jhoneill, which solved my issue. With this, the export to Excel takes a time of the same order as the data-processing part, and this brought my runtime from 20 minutes down to about 1 minute.
I just wanted to highlight this point, as the statement by jhoneill reads as "not so great", which motivated me to my own wicked trials ... in fact, the Send-SQLDataToExcel approach is great for performance. |
It might help to say how large the data set is, whether any special export features are being used, and what version of ImportExcel you have. I routinely process 10,000 lines with little impact from the export. Good tip on Send-SQLDataToExcel. |
@davidhigh Thanks for the comments. |
I ran some further tests and here are the timings: For 5000 rows and 55 columns:
For 30'000 rows and 150 columns (about a factor of 16 larger in the number of rows times columns):
I took care to measure only the runtimes of the pure commands, i.e. without the time needed to generate the data; for this I called the Export-Excel cmdlet from the pipe. Two conclusions:
In my case, as I generate the comparison data during program execution, I could simply switch to [DataTable] instead of [PSObject] as the results object, with no significant impact on the runtime. But even if the data is only available as standard PowerShell objects, converting it to a [DataTable] first and then doing the export to Excel seems advantageous, which is basically trading memory against runtime cost. My suggestion for improvement is thus to incorporate the pattern "convert to DataTable + export" into Export-Excel, possibly via a switch.
The |
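The convert-then-export pattern proposed above can be sketched roughly as follows; ConvertTo-SimpleDataTable is a made-up, deliberately naive helper (everything becomes a string, and nulls or mixed types are not handled):

```powershell
# Sketch: naive PSObject -> DataTable conversion, then a one-shot export.
function ConvertTo-SimpleDataTable {
    param([object[]]$InputObject)
    $table = New-Object System.Data.DataTable
    foreach ($p in $InputObject[0].PSObject.Properties) {
        [void]$table.Columns.Add($p.Name)      # all columns typed as string, for simplicity
    }
    foreach ($row in $InputObject) {
        [void]$table.Rows.Add(@($row.PSObject.Properties.Value))
    }
    $table
}

$data  = Get-Process | Select-Object Name, Id
$table = ConvertTo-SimpleDataTable $data
Send-SQLDataToExcel -DataTable $table -Path "$env:TEMP\report.xlsx"
```

A production helper would infer column types from the first row so numbers stay numeric in the sheet.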
Keep in mind that Export-Excel works like this.
The time to execute the PowerShell part should scale linearly. Poking cell by cell is going to get slow when you get to big numbers; I'm a little surprised that you got 50,000 cells per second with your first sample. Perhaps writing something to create tables before sending them is the way forward for people with requirements for very large amounts of data. |
Hi,
I've been trying to merge two Excel spreadsheets. The source xlsx files are just over 3MB, but when I call
Export-Excel -Verbose
it seems that it flies through the headers and then stalls. These spreadsheets have about 1700 columns and 300 rows, i.e. their union has slightly more, but these numbers are not unheard of. Is this expected to scale poorly, or am I doing something wrong? It stalls saving only one already: I can see that it is trying to do something in Task Manager, but it's not progressing.
Cheers,
Manuel