-
Notifications
You must be signed in to change notification settings - Fork 7.8k
csv parser used in import alias refactor #9091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@SytzeAndr the
|
reuse of the old implementation
Thanks for the comment, I'll then reuse the part from the old implementation for when to return an empty collection. |
@SytzeAndr Thanks for your contribution! We are working on enhancing the cmdlet and have plans to improve performance. |
@iSazonov I haven't tested it yet, but performance is one of the reasons I think refactoring this code would make sense. I haven't set up the testing environment locally (so therefore WIP), but when I do and the tests are passing I'll do a measure in terms of performance against the old implementation aswell. |
Are you aware that this method is only called from the The implementation of |
@SytzeAndr If you want to improve performance, I would suggest a different approach. Start with a scenario you know is slow. Then run it under a profiler, and try to understand if there are any bottlenecks. Then try to understand the code around the bottleneck - think about it, why is it written the way it is, how could it be done instead etc. Keep recordings of the original scenario. Try out changes, and measure them, while at the same time making sure that the tests that covers your changes still passes. The area you have selected here is very hard to improve significantly upon, especially since it is not in the hot path. |
removed old commented code and implemented a version using the System.IO.Stringreader
some updates on this PR: I've implemented the method by using the StringReader and StringBuilder. It currently passes all CI tests, so I assume my implementation is fine. I assume use of the Stringbuilder has a positive impact on performance since the I've reduced the complexity of this method according to CodeFactor (and is regarded as being a fixed issue to the repo). Although I still think it can be improved by splitting some things and might do so. @powercode I used this PR aswell to get a bit used to contributing to powershell, and I chose to tackle an issue which could be fixed with some refactoring. Therefore, I didn't start with analyzing the bottlenecks since I had to set up things yet and wanted to start small by just refactoring some code. I agree with you in the sense that it would make more sense to analyze from a bit higher perspective to find the true bottlenecks. So the next step for me is to test it for performance against the old implementation, after that I'll remove the WIP tag (assuming performance is not worse). |
src/Microsoft.PowerShell.Commands.Utility/commands/utility/Csv.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/Csv.cs
Outdated
Show resolved
Hide resolved
@Geweldig makes sense, thanks for the suggestions! |
… values somehow. That throws an error instead of skipping the line
Sorry for the delay. I managed to get rid of the |
src/Microsoft.PowerShell.Commands.Utility/commands/utility/ImportAliasCommand.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/ImportAliasCommand.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/ImportAliasCommand.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/ImportAliasCommand.cs
Outdated
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/Import-Alias.Tests.ps1
Outdated
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/Import-Alias.Tests.ps1
Outdated
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/Import-Alias.Tests.ps1
Outdated
Show resolved
Hide resolved
…ortAliasCommand.cs Co-Authored-By: Ilya <darpa@yandex.ru>
@iSazonov @daxian-dbw do you see any pending issues with this PR? |
src/Microsoft.PowerShell.Commands.Utility/commands/utility/ImportAliasCommand.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.PowerShell.Commands.Utility/commands/utility/ImportAliasCommand.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SytzeAndr Please fix Codacy issues in the new code: specifically about converting 'CreateItemOptions','ConstructAlias','IsValidParsedLine' to be static methods. Thank you.
…ortAliasCommand.cs Co-Authored-By: Ilya <darpa@yandex.ru>
Hi, sorry for the radio silence. I fixed the
Regarding static methods, I didn't made them static due to the following reasons. I'm not sure why codefactor doesn't pick these up. Let me know if you agree, if so I suspect it is fine for merging.
|
while (i < csvTrimmed.Length && inQuotes) | ||
{ | ||
i++; | ||
nextChar = csvTrimmed[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boundary after i++ can be = csvTrimmed.Length:
while (++i < csvTrimmed.Length && inQuotes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that introduce a bug? Replacing i
with ++i
would increment i
when the conditional statement is checked, regardless whether inQuotes
is true or false. In other words, when i
is at the index of the second "
(the escape quote), it will increment i
one additional time, skipping the first character after we escape the quote sequence.
For example, a"b"cd
would then be parsed to abd
, which is incorrect as it should be parsed to abcd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov , can you please looks at this again? thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is only about that there can be boundary overflow because of i++
without follow check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true but would require some different solution, which is replacing line 147 with i + 1 < csvTrimmed.Length && inQuotes
.
I just commited and pushed this change to this branch
@daxian-dbw Can you please review this again? thank you. |
|
||
private static Collection<string> ParseCsvLine(string csv) | ||
{ | ||
ReadOnlySpan<char> csvTrimmed = csv.Trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
csv should be ReadOnlySpan to avoid allocation here.
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
Since there is no activity for a long time and this is too edge case, we can close the PR. |
PR Summary
The old implementation used a for loop with lots of if statements to parse a csv string, which is used in the
import-alias
cmdlet. I think the old implementation is bug sensitive and hard to read. A new implementation can have a reduced complexity and uses theStringReader
andStringBuilder
to convert a string in csv format to a collection of elements, while keeping functionality intact. Experimental analysis shows this new implementation performs better, especially for large Strings.PR Checklist
.h
,.cpp
,.cs
,.ps1
and.psm1
files have the correct copyright headerWIP:
or[ WIP ]
to the beginning of the title (theWIP
bot will keep its status check atPending
while the prefix is present) and remove the prefix when the PR is ready.[feature]
to your commit messages if the change is significant or affects feature tests