RFC for String Manipulating Cmdlets #127
- is technically insufficient from a performance perspective.
- is unintuitive and inconvenient for the end user.
This motivation should convince me that the new cmdlets are very useful, but it convinces me of the opposite.
Existing tools to manipulate strings, the most common data type manipulated in PowerShell, are lacking in usability.
In all the years PowerShell has existed, no one (neither MSFT nor anyone else) has created a popular module with similar cmdlets. This suggests that the existing capabilities have always been sufficient.
The .NET tools available through string and regex are inconvenient to use.
That's a controversial statement, too. Each capability has its own field of application. None of the suggested cmdlets replaces regexes. If we talk about performance, .Net methods will be faster, and we can even use C# and assemblies by means of Add-Type.
String-manipulating operators (such as -split) are unintuitive to explore, as their options are not easily discoverable.
This worked for PowerShell version 1.0. Now I have thousands of cmdlets on my computer and it is no longer working.
In addition, any language must be studied before use. If the documentation has a section on manipulating strings that describes the operators and methods for working with strings, that is quite enough. If you think Get-Command *-String is understandable, then try to find ping using Get-Command *-Ping. Someday you will find Test-Connection and will be puzzled that there is also Test-NetConnection - it's so easy to find, isn't it? :-)
Neither option allows use on the pipeline, forcing interruption of the pipeline or use of inefficient ForEach-Object calls.
The pipeline itself is very slow. Replacing one cmdlet with another will not give us a noticeable gain.
If you proposed a solution with performance comparable to sed/awk, that would be very welcome.
Currently, to get good performance we are forced to use C#/Add-Type.
is technically insufficient from a performance perspective.
As stated above, this just convinces me of the opposite.
is unintuitive and inconvenient for the end user.
To the above I would add that many languages have been created specifically for string processing, and none of them has solved this problem. Like other languages, PowerShell works with strings; it does so slowly and has other drawbacks, but it has everything you need to handle strings.
Previously, I asked about scenarios. As a user, I need to understand clearly when I can/should/must use these cmdlets and when I cannot/should not/must not. This should be in the documentation, in the section about manipulating strings, and in the RFC.
Each cmdlet raises many questions. We have a powerful format system. Why should we have Format-String? If the formatting system does not do something, it is better to enhance it.
It is possible that Out-String could replace the other cmdlets if this is so important.
Why do we need so many cmdlets? I have thousands of cmdlets installed on my computer, and this is becoming a problem. As a user, I would prefer to know a single Convert-String cmdlet with the necessary parameters, such as -Add.
Here I have gathered what has already been discussed. This is not meant to prolong the discussion but to summarize the arguments for the PowerShell Committee.
Now that we have a rapidly increasing number of new PowerShell users, discoverability is a significantly more important issue.
Operators have always been difficult to discover, and have had limited documentation.
Personally, I find -f to be extremely un-PowerShell. It doesn't clearly state what it does or how it works.
As to the statement "I haven't seen any modules that do this": we don't always release all of our modules to the public. There are probably dozens of people maintaining similar modules with cmdlets like those described here.
I think there are multiple groups of users who use PowerShell. For the "programmers", I am sure that a plethora of cmdlets is a bad thing; they also probably know all the quirks of every PowerShell command. However, I would strongly suggest there is a group of users, who may be termed "casual users" or "system admins", for whom a plethora of cmdlets is great. They mainly write one- or two-line scripts that just pump data through a pipeline. For these users, the cmdlets described here are great. Are there other ways to get there? Sure. But as PowerShell branches out to serve more OSs and more user groups, I think ease of use and discoverability are key. My 2 cents.
@iSazonov - just because there's no PowerShell module that you can find publicly doesn't mean that:
(1) Folks don't write their own janky implementations internally
(2) Folks have time to write something that presumably might better be handled by the PowerShell team
(3) Folks don't need it or wouldn't use it
It's always painful using some of the workarounds in code: it looks ugly af, and it's harder to follow for many folks.
It is simply NOT TRUE that nobody has created a "popular module" with similar cmdlets. PSCX was the original super-popular module. It was the one we told everyone to get, and many people still use it as a matter of course. It still has compiled Join-String and Split-String commands (and Edit-File for regex replace, as well as related commands like ConvertTo-Base64 and Get-Clipboard, etc.).
In fact, @FriedrichWeinmann, if PowerShell Core is going to stomp on yet another set of PSCX commands, I would like to suggest that this time you make sure your implementation matches what's in PSCX (or is additive to it) instead of creating another source of frustration like Expand-Archive.
It's also worth pointing out that, setting aside PSCX, there are other implementations of Join-String, Split-String, Update-String (actually Update-StringInFile and Edit-File) and Format-String (no fewer than four of these, two by MVPs and two by MSFT) on the PowerShell Gallery...
Frankly, though, I'm not sure that argues for including this in PowerShell rather than shipping it externally.
If we talk about performance, .Net methods will be faster, and we can even use C# and assemblies by means of Add-Type.
Following your logic, if C# is better, why should PowerShell even exist? The answer is that the vast majority of PowerShell users aren't programmers. Telling them to use Add-Type is completely unreasonable. In terms of relative performance, this:
@('"a","b","c"') * 1000 | Split-String '","' -trim '"'
is both easier to use and faster than the alternative:
@('"a","b","c"') * 1000 | foreach { $_.Trim('"') -split '","' }
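To make the comparison concrete, here is a minimal sketch of how a pipeline-aware Split-String could behave. It is not the RFC's or the Gallery module's implementation: the name `Split-StringSketch` and its `-Separator` and `-Trim` parameters are hypothetical, chosen only to mirror the example above. Each input string is trimmed and then split on the literal separator (no regex).

```powershell
function Split-StringSketch {
    # Hypothetical sketch; not the RFC's Split-String implementation.
    [CmdletBinding()]
    param(
        # Literal separator to split on (no regex semantics).
        [Parameter(Mandatory, Position = 0)]
        [string] $Separator,

        # Characters to trim from both ends of each input before splitting.
        [string] $Trim,

        [Parameter(ValueFromPipeline)]
        [string] $InputObject
    )
    process {
        $text = $InputObject
        if ($PSBoundParameters.ContainsKey('Trim')) {
            $text = $text.Trim($Trim.ToCharArray())
        }
        # String.Split(string[], StringSplitOptions) splits on the literal text;
        # the resulting array is unrolled into the pipeline one item at a time.
        $text.Split([string[]] @($Separator), [System.StringSplitOptions]::None)
    }
}

# Same input as the comment above: 1000 copies, each yielding a, b, c.
$parts = @('"a","b","c"') * 1000 | Split-StringSketch '","' -Trim '"'
```

Because the trim characters and separator are plain parameters, the user never has to write a script block, which is the usability point being argued.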
Now I have thousands of cmdlets on my computer and it is no longer working.
And .NET has tens of thousands of properties/methods, etc. Is that not working for you either? How about the PowerShell repo? It has somewhere around 6000 files (including build artifacts). Too many? Anyway, I agree that we need a better way to search for relevant commands, but that is orthogonal to this discussion (though I would certainly be happy to look at an RFC proposing ways to improve the current state of affairs).
We have a powerful format system. Why should we have Format-String?
Per the RFC text, this cmdlet has nothing to do with output and formatting. It's a wrapper around String.Format() (or -f) that works in the pipeline on a stream of objects. Right now you have to do:
gci | ForEach-Object { 'name: {0} length: {1}' -f $_.name, $_.length }
With Format-String you just do:
gci | Format-String -FormatString 'name: {0} length: {1}' -property name,length
This is easier, more discoverable and certainly faster, as there is no ScriptBlock dispatch.
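The following is a minimal sketch of the pipeline behaviour described above, written as an advanced function. The name `Format-StringSketch` is hypothetical, and its `-FormatString` and `-Property` parameters simply mirror the example; this is not the RFC's implementation. For each incoming object it looks up the named properties in order and passes their values to `-f`.

```powershell
function Format-StringSketch {
    # Hypothetical sketch; not the RFC's Format-String implementation.
    [CmdletBinding()]
    param(
        # .NET composite format string, e.g. 'name: {0} length: {1}'.
        [Parameter(Mandatory, Position = 0)]
        [string] $FormatString,

        # Property names whose values fill {0}, {1}, ... in order.
        [string[]] $Property,

        [Parameter(ValueFromPipeline)]
        [object] $InputObject
    )
    process {
        if ($Property) {
            # Look up each named property on the current pipeline object.
            $values = foreach ($name in $Property) { $InputObject.$name }
        }
        else {
            # Without -Property, the object itself fills {0}.
            $values = , $InputObject
        }
        # -f expects an argument array on its right-hand side.
        $FormatString -f @($values)
    }
}

# Usage equivalent to the gci example above:
# Get-ChildItem -File | Format-StringSketch 'name: {0} length: {1}' -Property Name, Length
```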
- `Join-String` (proposed alias: `join`)
- `Set-String` (proposed alias: `replace`)
- `Split-String` (proposed alias: `split`)
Re substring automatically resolving to Get-SubString: this comes with a hefty performance penalty (and the implementation is incomplete), leading @lzybkr to ponder removing the feature altogether.
Additionally, we have to be careful not to clash with preinstalled external utilities, which on Unix-like platforms applies to join and split.
Unfortunately, using the sanctioned verb alias prefixes and abbreviating the noun String as s does not make for descriptive names (and can also clash):
- Add-String -> as (clashes with /usr/bin/as, the assembler, on Unix)
- Format-String -> fs
- Get-Substring -> gss
- Join-String -> js
- Set-String -> ss (clashes with /bin/ss, the socket utility, on Linux)
- Split-String -> sls (yes, that would clash with Select-String's alias, which is irregularly named and should be scs)
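The alias derivation above can be checked mechanically. This sketch assumes PowerShell 6 or later, where `Get-Verb` reports an `AliasPrefix` for each approved verb; the noun abbreviations (`s` for String, `ss` for Substring) are taken from the comment. It builds each candidate alias and reports any existing command (external tool, alias, function) with the same name on the current machine, so results vary by platform.

```powershell
# Requires PowerShell 6+, where Get-Verb exposes the AliasPrefix property.
# Noun abbreviations follow the comment above: 's' = String, 'ss' = Substring.
$proposed = [ordered]@{
    'Add-String'    = 's'
    'Format-String' = 's'
    'Get-Substring' = 'ss'
    'Join-String'   = 's'
    'Set-String'    = 's'
    'Split-String'  = 's'
}

$candidates = foreach ($command in $proposed.Keys) {
    $verb  = $command.Split('-')[0]
    # Sanctioned verb prefix + noun abbreviation.
    $alias = (Get-Verb -Verb $verb).AliasPrefix + $proposed[$command]
    # @() guarantees an empty array when nothing is found.
    $existing = @(Get-Command -Name $alias -ErrorAction SilentlyContinue)
    [pscustomobject]@{
        Command     = $command
        Alias       = $alias
        ClashesWith = ($existing | ForEach-Object { "$($_.CommandType): $($_.Name)" }) -join ', '
    }
}
$candidates | Format-Table -AutoSize
```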
Not sure what the right answer is; stick str in the names? addstr, formatstr, substr, joinstr, setstr, splitstr?
Well, sls => Select-String and cfs => ConvertFrom-String clearly establish a precedent for s as string, but you might use str to disambiguate when necessary...
Thanks for all the feedback on this. I've come to the conclusion that it would probably be best to drop the aliases completely for now - potential conflicts are just a killer - but without the aliases, what I've seen so far is mostly agreement with introducing these cmdlets. So how does this work from a governance perspective: what needs to be done for this RFC to move forward and get the implementation voted on?
In case this is overlooked, here's a link back to the discussion and the many favorable impressions. I may have pestered @FriedrichWeinmann earlier, after seeing yet-another-ugly-piece-of-code that would have been solved by this :) Cheers!
- `Format-String`: Implements full `-f` functionality (equivalently, the `String.Format()` method). Supports gathering multiple items from the pipeline before formatting.
- `Get-SubString`: Implements the `.Substring()` method as well as the `.Trim()`, `.TrimStart()` and `.TrimEnd()` methods.
- `Join-String`: Implements the `-join` operator, allowing the number of items joined in each batch to be specified.
- `Set-String`: Implements the `-replace` operator as well as the `.Replace()` method.
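The batching behaviour listed for Join-String can be illustrated with a minimal sketch. `Join-StringSketch` and its `-Separator` and `-BatchSize` parameters are hypothetical names for this illustration, not the RFC's signature. Items are buffered as they arrive; each full batch is emitted as one joined string, and any remainder is flushed at the end.

```powershell
function Join-StringSketch {
    # Hypothetical sketch; not the RFC's Join-String implementation.
    [CmdletBinding()]
    param(
        [Parameter(Position = 0)]
        [string] $Separator = '',

        # Items per joined string; 0 or less joins the whole input into one string.
        [int] $BatchSize = 0,

        [Parameter(ValueFromPipeline)]
        [object] $InputObject
    )
    begin {
        $buffer = [System.Collections.Generic.List[string]]::new()
    }
    process {
        $buffer.Add([string] $InputObject)
        if ($BatchSize -gt 0 -and $buffer.Count -eq $BatchSize) {
            # Batch is full: emit it and start collecting the next one.
            $buffer -join $Separator
            $buffer.Clear()
        }
    }
    end {
        # Flush the remainder (or everything, when no batch size was given).
        if ($buffer.Count -gt 0) { $buffer -join $Separator }
    }
}

# 1..5 | Join-StringSketch ',' -BatchSize 2   # emits '1,2', '3,4', '5'
```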
I would think that Update-String would be a little more intuitive, given that this would work on existing strings?
Hmm ... Update-String seems a little weird to me given that .NET strings are immutable...
I would think a more centralised Edit-String would be more apropos for both Format and Replace operations:
$String | Edit-String -Format $ItemsToInsert
$String | Edit-String -Replace $FindValue [-With $ReplaceValue] [-Literal]
Set-String seems improper, as does Update-String; both imply things that aren't really happening. And both -f and -replace are effectively just different methods of accomplishing rather similar goals in many cases, though each has its strengths and weaknesses.
And both -f and -replace are effectively just different methods of accomplishing rather similar goals
Very different methods. The -replace operator is a primitive (though it uses regex and works on collections), whereas -f is a higher-order operation (but doesn't work on collections) for one specific purpose, formatting strings, so it has a lot of functionality that specifically targets that domain, e.g.:
gci -file | foreach{ 'Name: {0,25} Length: 0x{1,-7:x}' -f $_.name, $_.length }
With the -f operator, the only replacement you can do requires specific, prearranged spots in the source string. With -replace, you can replace anything anywhere.
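The difference described above can be seen in a few lines. This sketch uses only the built-in operators: `-replace` rewrites matches anywhere in every element of a collection, while `-f` fills prearranged `{index[,alignment][:format]}` slots for one set of arguments at a time, using the same alignment and hex specifiers as the `gci` example.

```powershell
# -replace: regex-based, applied to every element of a collection.
$names   = 'report1.txt', 'report2.log'
$renamed = $names -replace '\.\w+$', '.bak'
# $renamed is 'report1.bak', 'report2.bak'

# -f: fills prearranged slots.
#   {0,10}   right-aligns argument 0 in a 10-character field
#   {1,-7:x} left-aligns argument 1 in 7 characters, as lowercase hex
$line = 'Name: {0,10} Length: 0x{1,-7:x}|' -f 'a.txt', 255
# 'a.txt' is padded to 10 characters and 'ff' to 7.

# -f formats one argument list per call, so a collection needs a loop.
$lines = foreach ($n in $names) { '{0,-12}|' -f $n }
```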
$String | Edit-String -Format $ItemsToInsert
I expect that it's actually:
$listOfStrings | Format-String -Format "formatString" -property prop1, prop2, prop3
Good points, aye.
Hmm. I would think both our scenarios for Format are potentially viable. Might be worth supporting both, perhaps?
Totally love the idea as a second (and frankly, the primary) option on Format-String, Bruce.
I disagree with rolling them together into a single command, though, @vexx32: they are too different, as Bruce said, but also... it breaks user expectations, both on the developer side and on the admin side.
Is it worth summing up the three main problems with this RFC, as I see them?
All of that said, I'm not entirely opposed; I'm just not sure it's worth including in core.
Given the pipeline context, what you have to compare these new cmdlets to is the use of language features in a script block passed to ForEach-Object. In this scenario, cmdlets are generally faster.
Are you arguing that we shouldn't include these cmdlets because they are popular? And (at least part of) the idea for PSCX, right from the start, was that it would be a proving ground for things we might want to build into PowerShell at some later date.
@FriedrichWeinmann I think the main things the Committee would be looking for are full cmdlet signatures (which you can generate from your code), a bit more documentation and some examples illustrating their use; feel free to use any of the examples I've posted if you think they're useful. Finally, since you have code, including performance data would be good, as the motivation cites performance improvements.
We see that you've published an experimental implementation on the PS Gallery, which is awesome. Given our push to experiment with new modules/cmdlets via the Gallery and to demonstrate usefulness and demand before deciding whether to deliver them in PowerShell itself, would you mind retargeting this as withdrawn, as we did with #117? Then we can re-evaluate how we want to refactor it (if at all) when it comes time to decide whether or not it should ship in PowerShell.
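One way to gather such performance data is a small `Measure-Command` harness. This sketch times only built-in approaches on the Split-String input used earlier in the thread: a script block dispatched per item through `ForEach-Object`, and a plain `foreach` statement with no pipeline. Once a candidate implementation is installed, it could be added as another entry; no parameters of the Gallery module are assumed here. Absolute numbers depend on the machine and PowerShell version.

```powershell
# Sample input from the Split-String example earlier in the thread.
$data = @('"a","b","c"') * 10000

$timings = [ordered]@{
    # Pipeline with a script block invoked once per item.
    'ForEach-Object' = Measure-Command {
        $null = $data | ForEach-Object { $_.Trim('"') -split '","' }
    }
    # Language-only loop: no pipeline, no per-item script block dispatch.
    'foreach statement' = Measure-Command {
        $null = foreach ($s in $data) { $s.Trim('"') -split '","' }
    }
}

foreach ($entry in $timings.GetEnumerator()) {
    '{0,-20} {1,10:N1} ms' -f $entry.Key, $entry.Value.TotalMilliseconds
}
```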
Agreed (and thanks for the assessment of that little project :) ). For reference:
Install-Module string
One of the core pain issues is
Thanks, @FriedrichWeinmann!