Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find and Replace #565

Closed
TardisBeaubien opened this issue Mar 20, 2019 · 4 comments
Closed

Find and Replace #565

TardisBeaubien opened this issue Mar 20, 2019 · 4 comments

Comments

@TardisBeaubien
Copy link

First of all, thank you so much for this great tool - it's a lifesaver.

I have a use case where I have a dozen xlsx with enormous amount of data (about 500K cells total) and I need to look for about 2000 strings and replace them with other strings. Some times, the string is the only thing in a cell, but most often a whole sentence was written and somewhere in it there's the string I'm looking for. Basically find and replace times 2000.

I couldn't find a built in way to search or replace so my solution involves iterating thru every cell and drawing of every sheet of every file. On top on that, some cells are RichText, forcing me to iterate thru the richtext collection to find and replace.

Running my script takes about 75 minutes right now and it feels inefficient. I'm looking into hyper threading with invoke-parallel, but before going there : Did I miss something ? Is there a better way to look up and replace a value than looking at every cell and drawing ?

@dfinke
Copy link
Owner

dfinke commented Mar 20, 2019

I took a quick look, I didn't see a built-in find/replace, plus there is not a "bulk" find/replace that I know of. Operating on all the sheets at once seems the way to go.

@jhoneill Do you know of anything off the top of your head?

@jhoneill
Copy link
Contributor

There's nothing in the module.
2000 search and replace operations in 75 minutes is about 2 seconds each.
I'd write it as

$xl = Open-ExcelPackage <<path>>
$xl.<<sheet>>.Cells.where({$_.value -like "*find*"}).foreach({$_.value = $value -replace "find","Replace"}) 
close-excelPackage $xl 

with something to loop through the 2000 find / replace pairs between the open and close.
Whether you can get the time per search down much that way I don't know because with 500K cells, you're always going to have to 500K comparisons to do. (x 2000 is a billion!)
My Start-Parallel module, and others like invoke-Parallel , rely on runspaces - you can spawn runspaces quickly but there's no data transfer between them so you can't have a thread for each column of the worksheet. What you could do to make it thread better is to have multiple threads looking for cells which need to be changed, feeding one which makes the changes.

@TardisBeaubien
Copy link
Author

Thanks for the feedback. I ended up using invoke-Parallel with one xlsx per runspace and cut my time in half.

I didn't think about using where-object, I was using foreach Cells. I'm sure that's much faster. I'll try it out tomorrow.

@jonscott
Copy link

I had to use the following format to change the value in matching cells:
$xl.<<sheet>>.Cells.where({$_.value -like "*find*"}).foreach({$_.Value = "Replace"})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants