Add -EndOfLine character and -DiscardBlankLines parameters to Get-Content #9345

Closed
Liturgist opened this issue Apr 11, 2019 · 14 comments
Labels
Issue-Enhancement the issue is more of a feature request than a bug Resolution-No Activity Issue has had no activity for 6 months or more WG-Cmdlets-Management cmdlets in the Microsoft.PowerShell.Management module

Liturgist commented Apr 11, 2019

Summary of the new feature/enhancement

It would be helpful to have an -EndOfLine character parameter in order to read files which use a character to indicate the start of comments.

I want to read a file containing a list of computer names grouped by department. I want comment lines that name the groups, and I want to be able to comment out servers that are known to be unavailable.

# sales
server1
server2  #owned by Bill Smith
# shipping
#server3  #down for hardware failure
server4

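With this list, the following does not work well, because Get-Content also passes the comment lines and trailing comments to Test-Connection as computer names.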

Test-Connection -Computername (Get-Content -Path './hosts.txt') -Count 1

I realize that I could filter them explicitly.

Test-Connection -Computername (Get-Content -Path '.\hosts.txt' | ForEach-Object { if (($_ -ne '') -and ($_[0] -ne '#')) { $_ }})

I would like to use:

Test-Connection -Computername (Get-Content -Path '.\hosts.txt' -EndOfLine '#' -DiscardBlankLines)
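Until something like this exists, the intended result can be approximated with existing operators. The following is only a sketch: it assumes that every '#' in the file starts a comment, and it uses an inline copy of the sample list instead of reading hosts.txt. Note that the explicit filter shown earlier only drops lines that begin with '#', so the trailing comment on the server2 line would still get through.

```powershell
# Inline copy of the sample hosts.txt content from this post.
$lines = @(
    '# sales'
    'server1'
    'server2  #owned by Bill Smith'
    '# shipping'
    '#server3  #down for hardware failure'
    'server4'
)

# Keep only the text before the first '#', trim surrounding whitespace,
# then drop anything that ended up empty (comment-only and blank lines).
$computers = $lines |
    ForEach-Object { ($_ -split '#', 2)[0].Trim() } |
    Where-Object { $_ -ne '' }

$computers   # server1, server2, server4
```

With a real file, `$lines` would be replaced by `Get-Content -Path '.\hosts.txt'`.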

@Liturgist Liturgist added the Issue-Enhancement the issue is more of a feature request than a bug label Apr 11, 2019
@iSazonov
Collaborator

Dup #3855

@iSazonov iSazonov added the Resolution-Duplicate The issue is a duplicate. label Apr 12, 2019
@Liturgist
Author

The #3855 issue is about "delimiters" between lines. That is not what this is about.

@iSazonov
Collaborator

EndOfLine assumes "delimiter". If you want "line filter" please correct the PR header and description.

@iSazonov iSazonov removed the Resolution-Duplicate The issue is a duplicate. label Apr 12, 2019
@Liturgist
Author

EndOfLine does not assume "delimiter." "Delimiter," for the reasons mklement0 has noted in #3855, is problematic. It is probably a terminology choice that is regretted. But, it was made, so there you are.

Using LineFilter would not imply the desired result here.

Perhaps there is a better terminology than EndOfLine, but this matches up and would produce the same result as the eol setting in cmd.exe FOR statements.

@mklement0
Contributor

I definitely like the feature, but I agree that "end of line" would cause confusion with "newline", the terminology of cmd.exe's for statement notwithstanding.

Omitting the word comment is not an option, I think, but it's tricky to come up with the right name.


Let's look at cmd.exe's for /f behavior first - note that I'm assuming "delims=" to read each line in full, as Get-Content does:

  • for /f always, invariably ignores empty lines, but never blank lines (at least 1 char., but all-whitespace)

  • If you add eol=<char>:

    • A line is omitted if and only if it immediately starts with <char>

      • Note, however, that without delims= and with either the default behavior or with tokens=..., <char> lines preceded by whitespace only would be ignored.
    • Therefore, any other occurrence of <char> is not special: with eol=#, for instance, the line "server2 #owned by Bill Smith" would be read as-is, as would "  # no mas".

In short: eol=<char> in the context of for /f essentially means "ignore lines that start with [comment character] <char>".
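To make those rules concrete, here is a rough PowerShell sketch of the for /f behavior described above with "delims=" in effect. Select-ForFLine and its -Eol parameter are hypothetical names made up for illustration; they are not existing cmdlets or parameters.

```powershell
# Hypothetical helper: mimics the line selection of
# cmd.exe's for /f "eol=<char> delims=" as described above.
function Select-ForFLine {
    param(
        [Parameter(ValueFromPipeline)] [AllowEmptyString()] [string] $Line,
        [Parameter(Mandatory)] [char] $Eol
    )
    process {
        # Empty lines are always ignored.
        if ($Line.Length -eq 0) { return }
        # A line is omitted only if <char> is its very first character.
        if ($Line[0] -eq $Eol) { return }
        # Everything else passes through unchanged, including blank
        # (all-whitespace) lines and lines with <char> further in.
        $Line
    }
}

# Example input covering each case discussed above.
$result = '', '   ', '#server3', 'server2 #owned by Bill Smith', '  # no mas' |
    Select-ForFLine -Eol '#'
# $result contains three lines: '   ', 'server2 #owned by Bill Smith', '  # no mas'
```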


Given that single-character comment sigils virtually always have consider-everything-through-the-end-of-the-current-line-a-comment semantics, I think we can actually omit that aspect in the naming, and use something like -CommentChar instead.

  • To simplify matters, with -CommentChar present, we could invariably skip lines that start with the specified character (optionally preceded by whitespace).

    • That said, separately, a switch for omitting empty and blank lines could be useful, but I'd more simply call it -NoBlankLines (mutually exclusive with -Raw).
  • However, the bigger question is whether to strip suffixes from lines that contain the comment characters inside a line (that doesn't start with the comment char.); that is, should a line such as server2 #owned by Bill Smith with -CommentChar '#' be read as just server2?

    • The problematic aspect is that unconditional stripping assumes that a comment char. always, unequivocally represents the start of a comment, which cannot generally be assumed without knowing a given file's specific format; e.g., with a line such as foo = "bar#none" # true comment, the result may be undesired.
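As a rough illustration of the line-skipping part of this proposal (not the suffix stripping), the behavior could be prototyped as a wrapper function. Get-ContentNoComment is a hypothetical name, and -CommentChar and -NoBlankLines are only the names floated in this thread; Get-Content itself has no such parameters.

```powershell
# Hypothetical wrapper around Get-Content; parameter names follow the
# proposal in this thread and do not exist on Get-Content itself.
function Get-ContentNoComment {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [char] $CommentChar = '#',
        [switch] $NoBlankLines
    )
    # Matches lines whose first non-whitespace character is the comment char.
    $pattern = '^\s*' + [regex]::Escape([string] $CommentChar)

    Get-Content -Path $Path | Where-Object {
        # Always skip whole-line comments; skip empty and
        # all-whitespace lines only when -NoBlankLines is given.
        ($_ -notmatch $pattern) -and
        ((-not $NoBlankLines) -or ($_.Trim() -ne ''))
    }
}
```

A call such as `Get-ContentNoComment -Path '.\hosts.txt' -NoBlankLines` would still return a line like server2 #owned by Bill Smith in full, since only lines that start with the comment character are skipped.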

@Liturgist
Author

Excellent analysis @mklement0. I would be good with both -CommentChar and -NoBlankLines.

Could there be a switch -CommentForce that would strip from the CommentChar on? Perhaps there is a better name. Yes, the format of the file must be known and not subject to change. We should not violate the "know your data" maxim.
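For reference, a minimal sketch of what such forced stripping would do, assuming it simply cuts each line at the first occurrence of the comment character. The variable names are illustrative only, and -CommentForce is just the name suggested here.

```powershell
$commentChar = '#'
$lines = 'server2  #owned by Bill Smith', 'foo = "bar#none" # true comment'

# Cut each line at the first comment character, regardless of context.
$stripped = $lines | ForEach-Object {
    $i = $_.IndexOf($commentChar)
    if ($i -ge 0) { $_.Substring(0, $i).TrimEnd() } else { $_ }
}

# $stripped[0] is 'server2'
# $stripped[1] is 'foo = "bar', because the '#' inside the quotes is hit first
```

The second result shows why the file format has to be known in advance: the quoted value is truncated along with the real comment.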

@Liturgist
Author

Does this merit further work? What is the process to make such a change?

@iSazonov
Collaborator

@Liturgist The example in your initial post is not common enough (I mean it is not a standard like CSV or tabular data). @mklement0 also pointed out this problem, so I don't see how we could go forward.
I think the best thing we could do is Tabular data: #3692 (comment).
From this point of view, Get-Content is not the most suitable place for parsing and filtering.

@Liturgist
Author

@iSazonov, perhaps it would be good if this were uncommon. There are many files in the world that are processed in this way by cmd.exe FOR loops.

WRT @mklement0's comment about unconditional stripping from a -CommentChar, that could be a limitation of the mechanism. I am not sure as to the difficulty of parsing quoted text since I would not call myself a language guy. I would think that it is well-known.
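For a simple case, quote-aware stripping could look something like the sketch below. It assumes that only double quotes delimit strings and that there are no escape sequences, which will not hold for every file format. Remove-TrailingComment is a hypothetical name used for illustration.

```powershell
# Hypothetical helper: strips a trailing comment unless the comment
# character sits inside a double-quoted section. Assumes no escaped quotes.
function Remove-TrailingComment {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Line,
        [char] $CommentChar = '#'
    )
    $inQuotes = $false
    for ($i = 0; $i -lt $Line.Length; $i++) {
        $c = $Line[$i]
        if ($c -eq '"') {
            # Toggle the quoted state on every double quote.
            $inQuotes = -not $inQuotes
        }
        elseif (($c -eq $CommentChar) -and (-not $inQuotes)) {
            # First comment char outside quotes: drop it and everything after.
            return $Line.Substring(0, $i).TrimEnd()
        }
    }
    # No comment found outside quotes: return the line unchanged.
    $Line
}

Remove-TrailingComment -Line 'foo = "bar#none" # true comment'   # foo = "bar#none"
Remove-TrailingComment -Line 'server2  #owned by Bill Smith'     # server2
```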

What would doing "tabular data" mean? Are you saying that only tab-delimited files could be used?

@iSazonov
Collaborator

What would doing "tabular data" mean? Are you saying that only tab-delimited files could be used?

In short, the standard mentioned above generalizes the CSV format so that you can use any delimiters for "fields" and "lines". I believe it solves part of what you ask.

@iSazonov iSazonov added the WG-Cmdlets-Management cmdlets in the Microsoft.PowerShell.Management module label Dec 1, 2021
Contributor

This issue has not had any activity in 6 months, if this is a bug please try to reproduce on the latest version of PowerShell and reopen a new issue and reference this issue if this is still a blocker for you.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Resolution-No Activity Issue has had no activity for 6 months or more label Nov 16, 2023
3 participants