Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConvertTo-Csv / Export-Csv do not create valid CSV with new-lines and double-quotes #9284

Open
jsnape opened this issue Apr 3, 2019 · 11 comments
Labels
Issue-Enhancement the issue is more of a feature request than a bug Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module

Comments

@jsnape
Copy link

jsnape commented Apr 3, 2019

Steps to reproduce

Using the following as a test (this is valid CSV).

label,testvalue
"blank",""
"null",
"comma","the quick, brown, fox"
"single-quote","the quick 'brown' fox"
"double-quote","the quick ""brown"" fox"
"newline","the quick brown
     fox"
"newline after double-quote","the quick ""brown""
     fox"
"newline before double-quote","the quick ""brown
    "" fox"
"double-quote comma newline","the quick ""brown"",
     fox"
PS> (gc test.csv | convertFrom-csv) | Export-csv -Path test2.csv -NoTypeInformation
# or
PS> (gc test.csv | convertFrom-csv) | ConvertTo-Csv -NoTypeInformation | Out-File -Path test2.csv -NoTypeInformation

Expected behavior

The output file should look the same as the input file

Actual behavior

Its OK that there are quotes around everything now but the formatter has closed the double-quote before the line break and as a result introduced new records into the file.

"label","testvalue"
"blank",""
"null",
"comma","the quick, brown, fox"
"single-quote","the quick 'brown' fox"
"double-quote","the quick ""brown"" fox"
"newline","the quick brown"
"fox""",
"newline after double-quote","the quick ""brown"""
"fox""",
"newline before double-quote","the quick ""brown"
" fox""",
"double-quote comma newline","the quick ""brown"","
"fox""",

Environment data

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.17763.316
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.17763.316
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
@jsnape jsnape added the Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a label Apr 3, 2019
@jsnape jsnape changed the title ConvertTo-Csv / Export-Csv do not create valid CSV with blank lines and double-quotes ConvertTo-Csv / Export-Csv do not create valid CSV with new-lines and double-quotes Apr 3, 2019
@SteveL-MSFT SteveL-MSFT added the Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors label Apr 3, 2019
@banoresaurabh
Copy link

Hi there, I would like to work on this issue.

@vexx32
Copy link
Collaborator

vexx32 commented Apr 7, 2019

It's Up-for-Grabs, so feel free to have at it. 😄

@rspears74
Copy link

I've been looking at this. The -UseQuotes parameter can be used to specify whether or not to use quotes. Default is Always. I tried specifying Never and sure enough the output has mostly no quotes, with a few odd extras left over. The AsNeeded option, which seems like what should be the default, returns a null object error and the output csv stops right after the comma after blank, which probably explains why it's not the default.

@bobbytreed
Copy link

I was preparing to document this and ran into the same issue as @rspears74. The error is not very descriptive at all and really doesn't tell you what you need to do to resolve it.

@vexx32
Copy link
Collaborator

vexx32 commented May 31, 2019

Given that typically newlines are record separators in a CSV file, it almost seems like we should be encoding newlines somehow in the file. Is there an established standard for doing so?

@jsnape
Copy link
Author

jsnape commented May 31, 2019

@vexx32 No there isn't a standard for escaping control characters. That would be more like CTX: http://www.creativyst.com/Doc/Std/ctx/ctx.htm

There is a secondary standard which uses an escape character instead e.g. backslash before the newline; a bit like unix pipe delimited files. However this is not supposed to be mixed with quoted fields. i.e. you either quote or escape. A lot of csv parsers don't handle the quoted version though (they only handle the RFC4180 https://tools.ietf.org/html/rfc4180).

If you are asking about converting the newlines to something else e.g. \r\n then, no there is nothing standard about this. That is outside the scope of CSV since really it depends on the reader and writer of these files. Given PowerShell is such a good toolbelt then we could be looking to read and write csv written by many other applications as long as it conforms to RFC4180. Handling escapes would be nice but extra.

@iSazonov
Copy link
Collaborator

We could look how Excel output multi line cells to csv.

@iSazonov iSazonov added the WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module label Jan 15, 2021
@lselden
Copy link
Contributor

lselden commented Jun 15, 2021

Is this too stale to re-comment on? This is still an issue as of Powershell v7.2-preview6. If you use -UseQuotes AsNeeded then fields with newlines or quotes won't be formatted properly. As per @jsnape There is an accepted standard for escaping fields (RFC4180) that has the following relevant rule (link):

  1. Fields containing line breaks (CRLF), double quotes, and commas
    should be enclosed in double-quotes.

The Import-Csv and ConvertFrom-Csv both correctly handle CSVs formatted in this standard. Additionally, it's the standard used by both Excel and Google Sheets, as well as the popular libraries CsvHelper and fast-csv.

Relevant code:

To be RFC-4180 compliant you'd quote if you detect any newline characters (\r or \n), the escape character (i.e. ") or the delimiter (i.e. " or \t).

@iSazonov iSazonov added Issue-Enhancement the issue is more of a feature request than a bug and removed Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a labels Jun 15, 2021
lselden pushed a commit to lselden/PowerShell that referenced this issue Jul 13, 2021
Fix for issue PowerShell#9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter
@kganjam
Copy link

kganjam commented Dec 15, 2021

I have a file that is tab delimited and includes quoted strings. Python's csv library has an option to not interpret quotes (quoting=csv.QUOTE_NONE). It would be nice if PowerShell had an option for this, otherwise I don't have a straightforward way to properly read these files.

e.g., the following loses quotes even though I am using simple tab delimited format:

"`"foo`"`t`"bar`", a`ta, `"baz`"`n`"foo`"`t`"bar`", a`ta, `"baz`"" | convertfrom-csv -Delimiter "`t"
foo bar, a a, "baz"
--- ------ --------
foo bar, a a, "baz"

@lselden
Copy link
Contributor

lselden commented Dec 16, 2021 via email

@microsoft-github-policy-service microsoft-github-policy-service bot added Resolution-No Activity Issue has had no activity for 6 months or more labels Nov 16, 2023
@jsnape
Copy link
Author

jsnape commented Nov 17, 2023

This issue still exists on PowerShell 7.3:

PS>  $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.3.9
PSEdition                      Core
GitCommitId                    7.3.9
OS                             Microsoft Windows 10.0.19045
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

The output is still incorrect and unchanged from above:

"label","testvalue"
"blank",""
"null",
"comma","the quick, brown, fox"
"single-quote","the quick 'brown' fox"
"double-quote","the quick ""brown"" fox"
"newline","the quick brown"
"fox""",
"newline after double-quote","the quick ""brown"""
"fox""",
"newline before double-quote","the quick ""brown"
" fox""",
"double-quote comma newline","the quick ""brown"","
"fox""",

When using -UseQuotes AsNeeded the behaviour is still incorrect:

label,testvalue
blank,
null,
comma,"the quick, brown, fox"
single-quote,the quick 'brown' fox
double-quote,"the quick ""brown"" fox"
newline,the quick brown
"fox""",
newline after double-quote,"the quick ""brown"""
"fox""",
newline before double-quote,"the quick ""brown"
" fox""",
double-quote comma newline,"the quick ""brown"","
"fox""",

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Resolution-No Activity Issue has had no activity for 6 months or more label Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Enhancement the issue is more of a feature request than a bug Up-for-Grabs Up-for-grabs issues are not high priorities, and may be opportunities for external contributors WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module
Projects
None yet
Development

No branches or pull requests

9 participants