Need a full set of encoding tests #3488

Closed
6 tasks
iSazonov opened this issue Apr 5, 2017 · 5 comments
Labels
Issue-Enhancement the issue is more of a feature request than a bug WG-Quality-Test issues in a test or in test infrastructure

Comments

@iSazonov
Collaborator

iSazonov commented Apr 5, 2017

Currently we don't have a full set of encoding tests (only for redirections). We need to create them as part of the future Encoding RFC implementation (?).

In #3467 (discussed in issue #3248) we fixed the Default/OEM encoding behavior of PowerShell Core on Windows (to match Windows PowerShell) but didn't add tests (we are waiting for the RFC). The simplest test (from @mklement0) is:

# Setup:
# Create a no-BOM UTF-8 file with the following literal content:
#   'ö' 
# which, when executed as a PS script, should echo 'ö' back, IF the script
# was correctly decoded from UTF-8 by PS.
# UTF-8 bytes: 0x27 (single quote), 0xc3 0xb6 (encoding of 'ö', U+00F6), 0x27 (single quote)
[byte[]] (0x27, 0xc3, 0xb6, 0x27) | Set-Content -Encoding Byte /tmp/$PID.ps1

# Test: See if the 'ö' is echoed back correctly.
#       Should return $True
'ö' -eq (& /tmp/$PID.ps1)
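
The snippet above could be packaged as a Pester test along the following lines. This is only a sketch under stated assumptions: it assumes Pester 4 or later (`Should -Be` syntax and the `$TestDrive` scratch location), the test names and the file name are hypothetical, and it writes the bytes through .NET rather than `Set-Content -Encoding Byte` so that the cmdlets under discussion are not part of the setup.

```powershell
# Sketch only: assumes Pester 4+ is installed; names are illustrative.
Describe 'Script source decoding (hypothetical test name)' {
    It 'decodes a BOM-less UTF-8 script file' {
        # $TestDrive is Pester's per-run scratch directory (a full file-system path).
        $scriptPath = Join-Path $TestDrive 'bomless.ps1'

        # 0x27 = single quote, 0xC3 0xB6 = UTF-8 encoding of U+00F6 ('ö').
        $bytes = [byte[]] (0x27, 0xC3, 0xB6, 0x27)
        [System.IO.File]::WriteAllBytes($scriptPath, $bytes)

        # Build the expected string from the code point, so that the encoding
        # of this test file itself cannot influence the comparison.
        $expected = [string] [char] 0x00F6

        # Execute the script; it should echo the single character back.
        & $scriptPath | Should -Be $expected
    }
}
```

Building the expected value from `[char] 0x00F6` rather than a literal `'ö'` is a deliberate choice: if the test file were itself misread, a literal on both sides of the comparison could be garbled in the same way and mask the failure.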

We need tests for:

@iSazonov iSazonov added the WG-Quality-Test issues in a test or in test infrastructure label Apr 5, 2017
@mklement0
Contributor

@iSazonov:

The test snippet tests how PowerShell itself reads source code that has no BOM, so can you please add that as a separate action item (check box) to the list of tests?
(Perhaps that behavior is implied by the other tests, but I think it should still be a separate one.)

(And, on a more quibbly note: Can you add the word "character" before "encoding" to the title, so that it's clear what kind of encoding is being referred to?)

And here are the Set-Content / Get-Content test snippets with BOM-less UTF-8 encoding in effect:

# Setup: *create* a BOM-less UTF-8 file.
'ö' | set-content -nonewline /tmp/$pid.txt

# Tests: Both should output $True

# Compare the raw bytes of the new file to the UTF-8 encoding of 'ö' (0xc3 0xb6)
# With the current alpha17, this would return $False, because Set-Content creates an
# ISO-8859-1 file.
$null -eq (Compare-Object (Get-Content -Encoding Byte -Raw /tmp/$pid.txt) (0xc3, 0xb6))

# See if the BOM-less UTF-8 file is *read* correctly.
'ö' -eq (Get-Content -Raw /tmp/$pid.txt)
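
As a rough illustration, the two checks above might be written as Pester tests like this. It is a sketch, not the project's actual test code: it assumes Pester 4 or later, the test names and file name are hypothetical, and the raw bytes are read and written through .NET so that each test exercises only one cmdlet and does not depend on the `-Encoding Byte` parameter value.

```powershell
# Sketch only: assumes Pester 4+ is installed; names are illustrative.
Describe 'Set-Content / Get-Content default encoding (hypothetical test name)' {
    It 'writes BOM-less UTF-8 by default' {
        $path = Join-Path $TestDrive 'written.txt'
        $text = [string] [char] 0x00F6   # 'ö', built from its code point

        Set-Content -NoNewline -Path $path -Value $text

        # Read the raw bytes independently of Get-Content and compare them as a
        # hex string: exactly C3 B6, with no BOM (EF BB BF) in front.
        $actual = [System.IO.File]::ReadAllBytes($path)
        [System.BitConverter]::ToString($actual) | Should -Be 'C3-B6'
    }

    It 'reads BOM-less UTF-8 by default' {
        $path = Join-Path $TestDrive 'read.txt'

        # Create the file independently of Set-Content.
        [System.IO.File]::WriteAllBytes($path, [byte[]] (0xC3, 0xB6))

        Get-Content -Raw -Path $path | Should -Be ([string] [char] 0x00F6)
    }
}
```

Comparing a hex string instead of using `Compare-Object` makes a failure message show the bytes that were actually written, for example `F6` if the file came out as ISO-8859-1.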

@iSazonov
Collaborator Author

iSazonov commented Apr 5, 2017

We have never used "character" before, so it could confuse us even more (?).

@mklement0
Contributor

mklement0 commented Apr 5, 2017

@iSazonov:

Not that it will matter much in this instance, but just for the record and for future instances where the distinction may matter:

We are discussing character encodings here.

Encoding is a far more generic term, of which character encoding is just one instance.

Given that the issues in this repo span all sorts of topics, giving sufficient context is preferable.

P.S.: In light of the above, I would have preferred the title "Default Character Encoding" for the RFC.

@iSazonov
Collaborator Author

iSazonov commented Apr 5, 2017

@mklement0 In the code we use "encoding". So let us defer the question to the discretion of the maintainers.

@mklement0
Contributor

@iSazonov:

Sounds good.

It's perfectly understandable to shorten "character encoding" to "encoding" in a given, narrow context, where there's no risk of ambiguity.

My only point is that in a larger context the added specificity can be helpful - both for searching topics by keywords and for quick comprehension.

@TravisEz13 TravisEz13 added the Issue-Enhancement the issue is more of a feature request than a bug label Apr 16, 2018

3 participants