Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced Performance in PowerShell 7.1? #14087

Closed
hl2guide opened this issue Nov 13, 2020 · 13 comments
Closed

Reduced Performance in PowerShell 7.1? #14087

hl2guide opened this issue Nov 13, 2020 · 13 comments
Labels
Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a Resolution-Fixed The issue is fixed. WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance

Comments

@hl2guide
Copy link

hl2guide commented Nov 13, 2020

A script of mine that reliably took 26 or 27 minutes to run in 7.0 now does not complete within 2 hours in 7.1.

The script uses "ForEach-Object" and "Where-Object" on many lines of text to remove lines matching a criteria in a 85MB text file.

Maybe a performance bottleneck in 7.1?


Repro script and profiling traces (updated by @daxian-dbw)

@hl2guide hl2guide added the Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a label Nov 13, 2020
@o-l-a-v
Copy link

o-l-a-v commented Nov 14, 2020

Try to use .ForEach{} and .Where{} methods, seems to often be faster than piping in to Where-Object or ForEach-Object.

@iSazonov
Copy link
Collaborator

@hl2guide We need a demo script and txt files to investigate the issue. Could you please share more info?

@iSazonov iSazonov added the WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance label Nov 14, 2020
@adamsitnik
Copy link
Contributor

to remove lines matching a criteria

@hl2guide if you can't share the source of the script, could you please tell us exactly which methods do you use to match the criteria and to remove the line?

@hl2guide
Copy link
Author

@adamsitnik I'll upload a demo file very soon.

@hl2guide
Copy link
Author

Please find the demo script file attached as a ZIP file.

DEMO.zip

The demo script sorts and filters out lines in a text file (filter_blocklist.txt).

My PC Info

CPU: Intel Core i5-4690 CPU @ 3.5GHz [quad core]
RAM: 32GB
OS: 64-bit Windows 10 Pro (20H2 - 19042.630)

My Results of Demo Script

(Obtained from Windows Task Manager, with all other applications closed off, just a single cmd.exe window)

PowerShell 5.1 (powershell.exe) : Max 5.3 GB of RAM usage, Time Taken: 37.6663 minutes
PowerShell 7.1 (pwsh.exe) : Max 7.7 GB of RAM usage, Time Taken: did not complete after 2 hours

@iSazonov
Copy link
Collaborator

@hl2guide Thanks for sharing demo script and data!

It is great case I see.

I was never able to get results from original script on my old notebook so I truncated the data file to 100000 lines and modify the script to measure parts of the script - (1) Get-Content and (2) "Removing duplicate lines..." block.
I did measure (in PowerShell itself and in PerfView) on PS 7.0 GA, PS 7.1 GA and PS 7.1 Preview-s from 1 to 7.

Get-Content works the same on all versions but I wonder how many allocations it does and how slow it is on large files. We could investigate this as separate issue and maybe request new .Net API.

Second block demonstrates the same regression from 7.1 Preview1 to Preview7:

  • On PowerShell 7.0
Read file..
Read file time Taken: 2.651 seconds.
Removing duplicate lines, please wait..
Read file time Taken: 12.863 seconds.
Time Taken: 15.519 seconds.
  • On PowerShell 7.1 Preview1-7
Read file..
Read file time Taken: 1.865 seconds.
Removing duplicate lines, please wait..
Read file time Taken: 110.019 seconds.
Time Taken: 111.889 seconds.
  • On PowerShell 7.1 GA
Read file..
Read file time Taken: 2.126 seconds.
Removing duplicate lines, please wait..
Read file time Taken: 305.829 seconds.
Time Taken: 307.958 seconds.

This requires a deeper investigations.

@iSazonov
Copy link
Collaborator

iSazonov commented Nov 17, 2020

PerfView data files for 7.0 GA and 7.1 GA https://ru.files.fm/u/nnz6vztgj

PS 7.1 GA
image

PS 7.0 GA
image

Looks like an issue with GC.

If I drill into the first line
image

Perhaps it was a change in Regex - now it allocates too many arrays.

Files.Fm
Visit this link to download: PerfViewData-7.0-GA.etl.zip and 1 other file

@daxian-dbw
Copy link
Member

daxian-dbw commented Nov 17, 2020

@iSazonov Thanks for the profiling and analysis!
It looks .NET 5 has memory issues when dealing with large amount of Regex.Replace(string, string) calls. Below is the "GC Heap Alloc stack" from your profiling, and it shows nearly 28 GB of ReadOnlyMemory<Char> was allocated during your script execution, which resulted in the process spending 93.9% of the CPU time in GC.

@iSazonov Can you please open an issue in the dotnet/runtime repo? It looks to me they should use ReadOnlySpan instead of ReadOnlyMemory as I don't see async/await is being used.

image
image
image

@adamsitnik
Copy link
Contributor

@iSazonov thanks for sharing the trace files! I've used PerfView to compare both trace files and it looks like the regression is caused by Regex.Replace

obraz

Overweight report for symbols common between both files

Base (old) Time: 9166.0
Test (new) Time: 328872.0
Delta: 319706.0
Delta %: 3488.0

In this report, overweight is ratio of actual growth compared to 3488.0%.
Interest level attempts to identify smaller methods which changed a lot. These are likely the most interesting frames to sart investigating
An overweight of greater than 100% indicates the symbol grew in cost more than average.
High overweights are a likely source of regressions and represent good places to investigate.
Only symbols that have at least 2% impact are shown.

Name Base Test Delta Responsibility % Overweight % Interest Level
coreclr!Ref_ScanDependentHandlesForPromotion 869.0 95834.0 94965.0 29.70 313.31 5
coreclr!WKS::gc_heap::mark_phase 1817.0 191375.0 189558.0 59.29 299.10 5
coreclr!ScanConsecutiveHandlesWithUserData 2407.0 239698.0 237291.0 74.22 282.64 5
coreclr!WKS::gc_heap::relocate_phase 1386.0 113743.0 112357.0 35.14 232.42 5
coreclr!WKS::gc_heap::plan_phase 1570.0 115493.0 113923.0 35.63 208.04 5
system.text.regularexpressions.il!System.Text.RegularExpressions.RegexReplacement.Replace(class System.Text.RegularExpressions.Regex,class System.String,int32,int32) 1112.0 301582.0 300470.0 93.98 774.69 4
system.text.regularexpressions.il!System.Text.RegularExpressions.Regex.Replace(class System.String,class System.String) 1314.0 302280.0 300966.0 94.14 656.68 4
coreclr!HndScanHandlesForGC 1607.0 298745.0 297138.0 92.94 530.12 4
coreclr!JIT_NewArr1 1823.0 303327.0 301504.0 94.31 474.17 4
coreclr!BlockScanBlocksEphemeral 2476.0 272908.0 270432.0 84.59 313.14 4
coreclr!TableScanHandles 2760.0 298768.0 296008.0 92.59 307.48 4
coreclr!BlockScanBlocksEphemeralWorker 2454.0 265189.0 262735.0 82.18 306.95 4
coreclr!WKS::gc_heap::gc1 3484.0 308883.0 305399.0 95.52 251.32 4
coreclr!WKS::gc_heap::garbage_collect 3493.0 308474.0 304981.0 95.39 250.32 4
coreclr!WKS::GCHeap::GarbageCollectGeneration 3506.0 308884.0 305378.0 95.52 249.72 4
ntdll!LdrpDispatchUserCallTarget 692.0 36440.0 35748.0 11.18 148.11 4
coreclr!WKS::GCHeap::Relocate 519.0 22638.0 22119.0 6.92 122.19 4
coreclr!UpdateDependentHandle 455.0 19846.0 19391.0 6.07 122.19 4
coreclr!PromoteDependentHandle 594.0 24543.0 23949.0 7.49 115.59 4
System.Management.Automation.il!System.Management.Automation.ParserOps.ReplaceOperatorImpl(class System.Management.Automation.ExecutionContext,class System.String,class System.Text.RegularExpressions.Regex,class System.Object) 1981.0 313677.0 311696.0 97.49 451.10 3
coreclr!SegmentScanByTypeMap 91.0 10005.0 9914.0 3.10 312.35 3
coreclr!WKS::GCHeap::Alloc 3341.0 318985.0 315644.0 98.73 270.86 3
System.Management.Automation.il!System.Management.Automation.ScriptBlock.InvokeWithPipeImpl(value class System.Management.Automation.ScriptBlockClauseToInvoke,bool,class System.Collections.Generic.Dictionary2,class System.Collections.Generic.List1,value class ErrorHandlingBehavior,class System.Object,class System.Object,class System.Object,class System.Management.Automation.Internal.Pipe,class System.Management.Automation.InvocationInfo,class System.Object[]) 6889.0 324992.0 318103.0 99.50 132.39 3
System.Management.Automation.il!System.Management.Automation.ScriptBlock.InvokeWithPipe(bool,value class ErrorHandlingBehavior,class System.Object,class System.Object,class System.Object,class System.Management.Automation.Internal.Pipe,class System.Management.Automation.InvocationInfo,bool,class System.Collections.Generic.List1,class System.Collections.Generic.Dictionary2,class System.Object[]) 6904.0 325026.0 318122.0 99.50 132.11 3
System.Management.Automation.il!System.Management.Automation.ScriptBlock.InvokeUsingCmdlet(class System.Management.Automation.Cmdlet,bool,value class ErrorHandlingBehavior,class System.Object,class System.Object,class System.Object,class System.Object[]) 6943.0 325146.0 318203.0 99.53 131.40 3
System.Management.Automation.il!Microsoft.PowerShell.Commands.ForEachObjectCommand.ProcessScriptBlockParameterSet() 6963.0 325210.0 318247.0 99.54 131.04 3
coreclr!ClearDependentHandle 317.0 12885.0 12568.0 3.93 113.67 3
System.Management.Automation.il!System.Management.Automation.CommandProcessorBase.DoExecute() 8107.0 327441.0 319334.0 99.88 112.93 3
System.Management.Automation.il!System.Management.Automation.CommandProcessor.ProcessRecord() 8107.0 327441.0 319334.0 99.88 112.93 3
System.Management.Automation.il!System.Management.Automation.Interpreter.ActionCallInstruction`6[System.__Canon,System.Boolean,System.__Canon,System.__Canon,System.__Canon,System.__Canon].Run(class System.Management.Automation.Interpreter.InterpretedFrame) 8113.0 327447.0 319334.0 99.88 112.85 3
System.Management.Automation.il!System.Management.Automation.PipelineOps.InvokePipeline(class System.Object,bool,class System.Management.Automation.CommandParameterInternal[][],class System.Management.Automation.Language.CommandBaseAst[],class System.Management.Automation.CommandRedirection[][],class System.Management.Automation.Language.FunctionContext) 8113.0 327447.0 319334.0 99.88 112.85 3
System.Management.Automation.il!System.Management.Automation.Interpreter.LightLambda.RunVoid1(!!0) 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.Interpreter.Interpreter.Run(class System.Management.Automation.Interpreter.InterpretedFrame) 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.DlrScriptCommandProcessor.Complete() 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(class System.Management.Automation.Interpreter.InterpretedFrame) 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.CommandProcessorBase.DoComplete() 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore(class System.Management.Automation.CommandProcessorBase) 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(class System.Object) 8114.0 327448.0 319334.0 99.88 112.83 3
System.Management.Automation.il!System.Management.Automation.DlrScriptCommandProcessor.RunClause(class System.Action`1,class System.Object,class System.Object) 8114.0 327448.0 319334.0 99.88 112.83 3
coreclr!CallDescrWorkerInternal 8119.0 327612.0 319493.0 99.93 112.82 3
System.Management.Automation.il!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper() 8115.0 327449.0 319334.0 99.88 112.82 3
System.Management.Automation.il!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc() 8115.0 327449.0 319334.0 99.88 112.82 3
coreclr!ThreadNative::KickOffThread 8117.0 327507.0 319390.0 99.90 112.81 3
system.private.corelib!System.Threading.ThreadHelper.ThreadStart_Context(System.Object) 8117.0 327507.0 319390.0 99.90 112.81 3
system.private.corelib!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) 8117.0 327507.0 319390.0 99.90 112.81 3
coreclr!ThreadNative::KickOffThread_Worker 8117.0 327507.0 319390.0 99.90 112.81 3
system.management.automation!? 8117.0 327494.0 319377.0 99.90 112.81 3
coreclr!MethodDescCallSite::CallTargetWorker 8119.0 327571.0 319452.0 99.92 112.81 3
system.private.corelib!System.Threading.ThreadHelper.ThreadStart() 8117.0 327453.0 319336.0 99.88 112.79 3
coreclr!ManagedThreadBase_DispatchOuter 8964.0 327612.0 318648.0 99.67 101.92 2
coreclr!ManagedThreadBase_DispatchMiddle 8964.0 327612.0 318648.0 99.67 101.92 2
ntdll!RtlUserThreadStart 9029.0 328461.0 319432.0 99.91 101.43 2
kernel32!BaseThreadInitThunk 9029.0 328461.0 319432.0 99.91 101.43 2
ROOT 9166.0 328872.0 319706.0 100.00 100.00 2
coreclr!JIT_New 1447.0 15618.0 14171.0 4.43 28.08 2

@iSazonov
Copy link
Collaborator

iSazonov commented Nov 17, 2020

@stephentoub For the PerfVew results I used a data file with 100000 lines and the test script has 54 replace operators called per line. Line size is < 70 chars.
Original data file is over 3000000 lines. I would not say that this is an edge case - I use such scripts sometimes, although somewhat simpler and on more small data.

@daxian-dbw daxian-dbw added Resolution-External The issue is caused by external component(s). and removed Resolution-External The issue is caused by external component(s). labels Nov 17, 2020
@hl2guide hl2guide changed the title Slower performance in 7.1? Reduced Performance in PowerShell 7.1? Nov 18, 2020
@iSazonov
Copy link
Collaborator

iSazonov commented Nov 19, 2020

@daxian-dbw Please help with understanding:
From 7.0 GA trace:
image

We allocate a lot of System.Object[].

image

Is this creating a script block? If so I wonder why our script block cache does not work?
Can an optimization of the cache from #13673 helps?

@daxian-dbw
Copy link
Member

Close the issue as @iSazonov has confirmed that the issue has been fixed: dotnet/runtime#44808 (comment)

@daxian-dbw
Copy link
Member

@iSazonov The call stack from your screenshot suggests that the arrays are allocated from the generated code when running

try
{
codeToInvoke(functionContext);
}

I haven't looked at #13673 closely (will do so), but I don't think the allocation in the screenshot is resulted by creating script blocks.

@iSazonov iSazonov added the Resolution-Fixed The issue is fixed. label Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a Resolution-Fixed The issue is fixed. WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance
Projects
None yet
Development

No branches or pull requests

5 participants