Add Get-FileEncoding cmdlet or function. #2290

Closed
thezim opened this issue Sep 17, 2016 · 21 comments
Labels
Committee-Reviewed PS-Committee has reviewed this and made a decision In-PR Indicates that a PR is out for the issue Issue-Enhancement the issue is more of a feature request than a bug Resolution-No Activity Issue has had no activity for 6 months or more Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors WG-Cmdlets general cmdlet issues

Comments

@thezim
Contributor

thezim commented Sep 17, 2016

This is a common task I see across many PowerShell modules, and I think it would add value for cross-platform tasks.

@SteveL-MSFT SteveL-MSFT added the WG-Cmdlets general cmdlet issues label Sep 19, 2016
@adityapatwardhan adityapatwardhan added Issue-Enhancement the issue is more of a feature request than a bug Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors labels Sep 19, 2016
@iSazonov
Collaborator

iSazonov commented Sep 28, 2016

Do you mean this?
http://poshcode.org/2059
https://gist.github.com/jpoehls/2406504

This suggests that we would also need the following cmdlets: Convert-FileEncoding and Convert-StringEncoding.

An RFC would also be required.

@thezim
Contributor Author

thezim commented Sep 28, 2016

@iSazonov Yes. The additional cmdlets are nice to haves as well.

@iSazonov
Collaborator

This is a common task I see across many PowerShell modules
@thezim Could you give examples of such modules?

@iSazonov
Collaborator

iSazonov commented Oct 6, 2016

I looked into this area. There is no obviously correct approach, so we need a reference algorithm from experts in the field.
For example: http://gnuwin32.sourceforge.net/packages/file.htm

@iSazonov
Collaborator

iSazonov commented Dec 7, 2016

For compatibility, we need to use a port of the file utility. Can we rewrite it in C# and include it in the repo as a cmdlet?

@lzybkr lzybkr added the Review - Committee The PR/Issue needs a review from the PowerShell Committee label Dec 7, 2016
@joeyaiello
Contributor

Posted by @sdwheeler in our Community Call, this is a version from Lee: http://poshcode.org/2153

@SteveL-MSFT
Member

@PowerShell/powershell-committee discussed this, and the recommendation is to have a cmdlet that supports this capability instead of adding it to FileInfo. Usage will be more common now that we are cross-platform, and the cmdlet should be part of the Utility module. Get-FileEncoding and Convert-FileEncoding make sense from a discovery standpoint. It seems we can just review the parameters at PR time rather than requiring an RFC for this one.

@HemantMahawar HemantMahawar added Committee-Reviewed PS-Committee has reviewed this and made a decision and removed Review - Committee The PR/Issue needs a review from the PowerShell Committee labels Dec 8, 2016
@iSazonov
Collaborator

iSazonov commented Dec 8, 2016

@joeyaiello If we use a different algorithm than file does, it may mislead Unix users.

@SteveL-MSFT Could you please clarify whether porting the file utility is possible?

@SteveL-MSFT
Member

@iSazonov Porting file as a cmdlet makes sense (assuming appropriate licensing). Alternatively, since I see that file has already been ported to Windows, perhaps it's not worth the effort to port it to C#, and we could instead just wrap it in a cmdlet?

@lzybkr
Member

lzybkr commented Dec 8, 2016

Our conclusion on this issue was specifically about wanting better support for encodings, nothing more.

I think we also questioned the value in porting file to PowerShell because extensions are the primary way of understanding file types on Windows.

@iSazonov
Collaborator

iSazonov commented Dec 9, 2016

@SteveL-MSFT We cannot expect the file utility to be present on every Unix system, especially on OS X.

Today I looked more deeply into how the file utility works. Encoding detection is very simple (yes, file type detection is overkill for us) and can easily be ported to C#. Thus we can easily achieve compliance with the de facto Unix standard. The bad news is that the code is very old and should be brought into line with modern standards (from FSS-UTF (1992) / UTF-8 (1993) to UTF-8 (2003)).
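
As a rough illustration of how little is needed for BOM-less detection, here is a minimal Python sketch. It is not the file utility's algorithm and not a proposed cmdlet implementation; the function name `classify_text_bytes` and the result labels are hypothetical. It assumes the whole file content is already in memory, and it relies on Python's strict UTF-8 decoder, which follows the 2003 definition of UTF-8 (overlong forms and surrogates are rejected).

```python
def classify_text_bytes(data: bytes) -> str:
    """Classify bytes as 'empty', 'ascii', 'utf-8', or 'unknown-8bit'.

    Hypothetical helper for illustration only; the labels are not
    taken from the file utility or from any PowerShell cmdlet.
    """
    if not data:
        return "empty"
    # Pure 7-bit content is valid in ASCII and in every ASCII superset.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        # Strict decoding rejects overlong sequences, surrogates,
        # and truncated multi-byte sequences.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # High bytes that are not valid UTF-8: some legacy code page,
        # which this sketch does not try to identify.
        return "unknown-8bit"
    return "utf-8"
```

Anything reported as `unknown-8bit` would need the code page heuristics discussed in the next paragraph.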

More bad news: this utility does not detect code pages. Do we want to detect code pages? If so, do we want high-speed heuristics (sample), or would we use simpler but slower methods?

Now about the conversion. Simple test:

[text.encoding]::GetEncodings().count

returns:
in PowerShell 5.1 - 140 code pages
in PowerShell 6.0 (alpha 13) - 8 code pages
(Unix iconv - ~300 code pages)

Should we rely completely on .NET Core, in the expectation that it will support multiple charsets? Or should we write our own implementation?

@thezim
Contributor Author

thezim commented Dec 9, 2016

@SteveL-MSFT For my part, I was just looking for detection of the encodings that existing cmdlets such as Out-File currently accept. No code page usage. I do see the value in a full set of encoding cmdlets, though.

@iSazonov
Collaborator

iSazonov commented Feb 6, 2017

Opened: Initial discussion about encoding cmdlets (PowerShell/PowerShell-RFC#67)

@mklement0
Contributor

@iSazonov: As an aside re:

We cannot expect the file utility to be present on every Unix system, especially on OS X.

file is a POSIX-mandated utility and is therefore available on most (all?) modern Unix platforms, including macOS (OS X).

That said, the focus of the POSIX file utility spec is on classifying files by content - encodings aren't even mentioned.

In practice, however, both the GNU and the BSD/macOS implementations do report a text file's encoding, including the presence/absence of the UTF-8 pseudo-BOM.
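
For readers unfamiliar with BOM reporting, the following Python sketch shows one way to detect a BOM from a file's first bytes. It is an illustration under stated assumptions, not how GNU or BSD file implements it; the function name `detect_bom` and the returned labels are hypothetical. Only the BOM constants from Python's standard `codecs` module are used.

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None.

    Hypothetical helper: None means "no BOM found", which says
    nothing about the actual encoding of the content.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

One known ambiguity of this approach: a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE.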

@iSazonov
Collaborator

iSazonov commented Mar 2, 2017

@mklement0 Thank you for pointing out that this utility is mandated by POSIX. In most cases, however, it is installed as part of a separate package. That would force us to require installing this utility when installing PowerShell Core, which I believe is unacceptable for us.
I recently did a brief review of the GNU file utility and found that its code is too out of date.
I suppose we should not rely on it. Perhaps there is a more modern version, but I don't know of one.

You are also welcome to join the discussion in PowerShell/PowerShell-RFC#67.

@roysubs

roysubs commented Nov 21, 2020

I'm not (nearly) as advanced a PowerShell user as you guys, and I have a weak understanding of file encoding (I don't have a clue what the point of a BOM is honestly) but once every year or two, I get stung by file encoding, and the last time (a few days ago), cost us a Production migration as we were scratching our heads why our automation tool could not run batch scripts (the reason was that the batch scripts were generated by PowerShell which defaults to UTF-8 which made the batch scripts broken, but the errors made us think that it was the automation tool that was failing in some way). Such a scenario might all be very trivial/obvious to you guys, but it is not to most users (a "text file" has no deeper complexity than "text file" to most people, most of the time).

Both required tools (Get-FileEncoding and Convert-FileEncoding in PowerShell/PowerShell-RFC#67) are long-overdue as core components of PowerShell. Get- would greatly enhance appreciation of file encoding issues (and the more information the better in my mind, codepages etc), while Convert- becomes more and more important in making PowerShell a useful cross-platform tool. Would really appreciate if this two-years-since-last-comment thread was un-mothballed?
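
To make the Convert- idea concrete, here is a minimal Python sketch of re-encoding a file. It is only an illustration, not the design from the RFC; the names `reencode` and `convert_file_encoding` are hypothetical, and it assumes the source encoding is already known (for example from a detection step) and that the file fits in memory.

```python
from pathlib import Path


def reencode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode data from source_encoding and encode it as target_encoding.

    Raises UnicodeDecodeError or UnicodeEncodeError if the content does
    not fit either encoding, rather than silently replacing characters.
    """
    return data.decode(source_encoding).encode(target_encoding)


def convert_file_encoding(path, source_encoding: str, target_encoding: str) -> None:
    """Rewrite the file at path in place with the target encoding.

    Working on raw bytes keeps line endings (such as CRLF in batch
    scripts) exactly as they were.
    """
    file_path = Path(path)
    converted = reencode(file_path.read_bytes(), source_encoding, target_encoding)
    file_path.write_bytes(converted)
```

For example, `convert_file_encoding("deploy.bat", "utf-8-sig", "cp1252")` would strip a UTF-8 BOM and write the script in Windows-1252; the file name here is made up for illustration.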

@iSazonov
Collaborator

Would really appreciate if this two-years-since-last-comment thread was un-mothballed?

@roysubs This was approved and you can grab the work.

@roysubs

roysubs commented Nov 21, 2020

I really wish that I had the ability to do that @iSazonov !

I know that @mklement0 has a very deep understanding of file encoding, I'm hoping that he might have the time to build this... 🙂

@iSazonov
Collaborator

@mklement0 is a great analyst but not a fan of coding :-)

The implementation is simple using StreamReader.CurrentEncoding. Of course, later we could make the cmdlet smarter, in a more "PowerShell-y" way, with heuristics.

@roysubs

roysubs commented Nov 21, 2020

Sounds great, and I'll help if I can, but presumably you'd have to do this in C# (I'm more of just a SysAdmin / DevOps type scripter, I just use PowerShell and Python to manage some tasks on my work environments). I want to see PowerShell take over on Linux though, it's just a much better language imo 🙂.

@microsoft-github-policy-service microsoft-github-policy-service bot added the In-PR Indicates that a PR is out for the issue label Oct 27, 2023
Contributor

This issue has not had any activity in 6 months, if there is no further activity in 7 days, the issue will be closed automatically.

Activity in this case refers only to comments on the issue. If the issue is closed and you are the author, you can re-open the issue using the button below. Please add more information to be considered during retriage. If you are not the author but the issue is impacting you after it has been closed, please submit a new issue with updated details and a link to this issue and the original.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Resolution-No Activity Issue has had no activity for 6 months or more label May 9, 2024

9 participants