Get-Content is slow on large text files. Could it have a parameter to speed it up by not adding NoteProperties? #7537
Comments
I'm willing to take this if the PS team OKs it, although I'm unsure what to name the parameter. Someone suggested |
I have decided to go ahead and work on this issue. @powershell/powershell can I get an assignment? |
I've marked it as an enhancement and up-for-grabs. You should just be able to assign it to yourself. We added |
#7481 will make this a bit better - haven't tried it yet to see how much, and it will always be slower than just creating the strings. |
@BrucePay maybe I'm just being daft but I don't see a way to assign this to myself |
See #7501 |
Someone just brought PR #7502 to my attention, would that make solving this issue unnecessary? It seems to solve the same issue of |
Only individuals marked as The WIP PR is still under review as it is a breaking change. However, although that change will improve things if accepted, it may still make sense to add a parameter to |
Ohhh I didn't imagine it had already been acted on. That's partly what I meant about "ideally I would want a parameter name to communicate "this is faster" to people who see it". I've used |
@SteveL-MSFT ok sounds good. Given that it looks like that PR will affect mine I will wait until it's merged. |
As for what to name the parameter: I suggest (Conversely, a more sensibly named parameter alias for Using |
using -raw and it flies for me. |
@ZackInMA, yes, However, this won't help you if you want line-by-line streaming, which is the typical use case, and that is the one that's painfully slow. If the individual lines are needed, there are two ways of speeding up the operation, though both make the line output non-streaming:
# Read all lines into an array that is then output *as a whole*
# To use this in a pipeline, enclose in (...) to force enumeration
Get-Content -ReadCount 0 file.txt
# Slower alternative, but still much faster than Get-Content with neither -ReadCount nor -Raw:
# Read into a single string, then split by newlines.
# Note: If the last line has a trailing newline, as is typical,
# the resulting array will have an empty last element.
(Get-Content -Raw file.txt) -split '\r?\n'
Bypassing the line-by-line streaming in itself speeds up these commands, but in both cases only one object is decorated with the NoteProperties: the array object as a whole with |
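A minimal sketch of the two approaches above, run against a throwaway file. The file name `gc-demo.txt` and its contents are assumptions made up for illustration; the trailing-element trim is one possible way to handle the empty last element mentioned in the comment, not the only one.

```powershell
# Create a small sample file (hypothetical content, for illustration only).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-demo.txt'
Set-Content -Path $path -Value 'alpha', 'beta', 'gamma'

# -ReadCount 0 emits ONE object: an array holding all lines.
# Wrapping the call in (...) enumerates that array into the pipeline.
$upper = (Get-Content -ReadCount 0 $path) | ForEach-Object { $_.ToUpper() }

# -Raw returns a single string. Splitting it leaves an empty last element
# when the file ends with a newline, so drop that element here.
$lines = (Get-Content -Raw $path) -split '\r?\n'
if ($lines[-1] -eq '') { $lines = $lines[0..($lines.Count - 2)] }

Remove-Item $path
```

Both variants read the whole file into memory before anything reaches the next pipeline stage, which is the non-streaming trade-off described above.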
Wow, thanks for taking the time man. |
Bringing this up to Cmdlets WG to discuss |
The WG has reviewed this and believes that an appropriate approach may be to change the default value of -ReadCount to 0, which will essentially improve the performance for all users while possibly causing a small number of users to use $PSDefaultParameterValues['get-content:readcount'] = 1. We also believe this should be provided as an experimental feature. |
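For context, the same `$PSDefaultParameterValues` mechanism can already be used in the other direction, to opt in to `-ReadCount 0` for a session. The following is only a sketch under that assumption; the temp file name is hypothetical, and it also shows why a changed default could surprise existing pipelines: downstream commands see one array instead of individual lines.

```powershell
# Hypothetical sample file for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-default-demo.txt'
Set-Content -Path $path -Value 'one', 'two', 'three'

# Opt in to whole-array output for every Get-Content call in this session.
$PSDefaultParameterValues['Get-Content:ReadCount'] = 0

# The pipeline now receives a single object (the array of all lines),
# so the script block runs once rather than once per line.
$countWithDefault = @(Get-Content $path | ForEach-Object { 1 }).Count

# An explicitly passed -ReadCount takes precedence over the session default.
$countExplicit = @(Get-Content $path -ReadCount 1 | ForEach-Object { 1 }).Count

# Clean up: restore the built-in default and remove the sample file.
$PSDefaultParameterValues.Remove('Get-Content:ReadCount')
Remove-Item $path
```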
The proposed change would be massively breaking:
|
At first look, we can improve the cmdlet using a trick we use in FileSystemProvider - use cached NoteProperty object for all current outputs. |
@JamesWTruher when we discussed this in the WG I don't think anyone picked up that @mklement0's "massively breaking" sounds like hyperbole, but it may not be in this case
|
@jhoneill We should bring this back up to WG discussion. I believe the ask is for line-by-line reading, but no extra decoration. I'm thinking maybe just |
@PowerShell/wg-powershell-cmdlets reviewed this and agree that the use case to have the string objects not have additional decoration makes sense. Considering that this parameter may be used by other cmdlets, we suggest a switch called |
This issue has not had any activity in 6 months, if this is a bug please try to reproduce on the latest version of PowerShell and reopen a new issue and reference this issue if this is still a blocker for you. |
This issue has been marked as "No Activity" as there has been no activity for 6 months. It has been closed for housekeeping purposes. |
Ping to keep alive |
Using Get-Content to read an example 170,000 line wordlist text file.
The reason for the slow version is explained here, apparently by Bruce Payette in 2006:
I think it's a shame that the default usage of Get-Content is the slow version, but that's likely not going to change. But, 12 years on from this posting, is it time to add a way to suppress adding this extra information?
e.g. a parameter to Get-Content which switches off the NoteProperties. I have no good parameter name suggestion - ideally I would want it to communicate "this is faster" to people who see it in written code, or who read the documentation wondering how they can speed up Get-Content on large files.
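To make the "extra information" concrete, here is a rough sketch that lists the NoteProperties attached to a line returned by Get-Content and compares it with a line read through .NET directly. The file name is hypothetical, and `[System.IO.File]::ReadAllLines` is shown only as one existing way to get undecorated strings, not as the parameter this issue asks for.

```powershell
# Hypothetical sample file for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-props-demo.txt'
Set-Content -Path $path -Value 'first line', 'second line'

# Each string emitted by Get-Content carries extra NoteProperties
# (for example PSPath and ReadCount).
$line = @(Get-Content $path)[0]
$decorated = $line.PSObject.Properties |
    Where-Object MemberType -eq 'NoteProperty' |
    ForEach-Object Name

# Reading through .NET yields plain strings with no NoteProperties.
# Note: .NET resolves relative paths against the process working directory,
# so an absolute path is used here.
$plain = [System.IO.File]::ReadAllLines($path)[0]
$plainProps = @($plain.PSObject.Properties |
    Where-Object MemberType -eq 'NoteProperty')

Remove-Item $path
```

Inspecting `$decorated` shows the per-line decoration that the requested parameter would skip, while `$plainProps` stays empty.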