The last cell of an empty column read by ConvertFrom-Csv is inconsistently $Null
#17702
Comments
Not valid CSV?
$Data = ConvertFrom-Csv @'
Id,Name,Note
01,John
02,Jack
03,Ryan,Just a note
04,Luke
05,Noah
'@
$Data.Note.ForEach{ $_ ? 'String' : 'Null' }
<#
Null
Null
String
Null
Null
#> |
The example you show is in fact less "valid" than the one in the issue, because its data rows have fewer delimiters than the header.
ConvertFrom-Csv @'
Id,Name,Note
01,John
02,Jack
03,Ryan,Just a note
04,Luke
05,Noah
'@ |ConvertTo-Csv -UseQuotes AsNeeded
Id,Name,Note
01,John,
02,Jack,
03,Ryan,Just a note
04,Luke,
05,Noah, |
The problem also shows up with |
A comment in the code explicitly says this is "by-design" (for all rows that have no data for the trailing columns): PowerShell/src/Microsoft.PowerShell.Commands.Utility/commands/utility/CsvCommands.cs Lines 1747 to 1751 in c978d46
Changing this would be a breaking change, so it is a question for the WG whether we should change the design and accept the breaking change. |
@iSazonov, |
@dkaszews You are talking about an edge case. I am asking about the common case: should we write null or an empty string for absent data in any cell?
$Data = ConvertFrom-Csv @'
Id,Name,Note
01,John,
02,Jack
03,Ryan,Just a note
04,Luke,
'@
$Data.Note.ForEach{ if ($Null -eq $_) { 'Null' } else { 'String' } } |
@iSazonov Based on the original example, it should be:
$Data = ConvertFrom-Csv @'
Id,Note,Name
01,,John
02,
03,Just a note,Ryan
04,,Luke
'@
The way I see it, first you split on the delimiter, which gives you an array of potentially empty values, then you pad with nulls to the correct length. With the examples presented, the result somehow also depends on the very presence of another row after it, so reordering rows or truncating the table can change types. |
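The split-then-pad approach described above can be sketched as follows. This is only an illustration of the idea, not how `ConvertFrom-Csv` is implemented: `Split-CsvLine` is a hypothetical helper, and it assumes unquoted fields and a single-character delimiter.

```powershell
# Hypothetical helper (not part of PowerShell): split one unquoted CSV line,
# then pad cells that are absent from the line with $null.
function Split-CsvLine {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Line,
        [Parameter(Mandatory)] [int] $ColumnCount,
        [char] $Delimiter = ','
    )
    $cells = [System.Collections.Generic.List[object]]::new()
    # Split keeps empty entries, so a cell that is present but empty stays ''.
    foreach ($cell in $Line.Split($Delimiter)) { $cells.Add($cell) }
    # Only cells missing from the line altogether become $null.
    while ($cells.Count -lt $ColumnCount) { $cells.Add($null) }
    # The leading comma keeps PowerShell from unrolling the list on output.
    , $cells
}

# '02,' has two cells ('02' and '') and is padded to three with $null.
$row = Split-CsvLine -Line '02,' -ColumnCount 3
foreach ($cell in $row) {
    if ($null -eq $cell) { 'Null' } else { "String '$cell'" }
}
```

With this rule the type of a cell depends only on its own line, so reordering or truncating rows cannot change it.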
I tend to think that nulls make no sense at all for CSV.
$a = @{q = $null}
$a | ConvertTo-Csv
"q"
ConvertTo-Csv: Object reference not set to an instance of an object.
That is a bug we must fix. But there are no null literals in the CSV format, so we have to output an empty string. From that we could conclude that it makes no sense to distinguish between nulls and empty strings on read either. This looks more consistent. |
Fair, but I'm afraid that changing those nulls may be a breaking change 😕 . That's why I would rather limit the fix to just consistency, so that shuffling, truncating or separating rows does not change their output. |
That's what I originally thought too. But I can't think of a scenario where that would be appropriate. The main thing here is that CSV has no explicit constant for null. If someone wants to handle nulls reliably, they will obviously have to come up with such a constant themselves, and it will only work for a specific application or script, nowhere else. (So technically it's a bucket 3 breaking change: it is very unlikely that anything will break.) |
I guess it only comes up in malformed CSVs with an inconsistent number of columns. That's what we in the C++ world call Undefined Behavior: the program can do whatever it wants. And if somebody has to deal with malformed CSVs, they should program defensively. Source: I used to process malformed CSVs in my previous job. |
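As one possible form of the defensive programming mentioned above, a script can normalize the parsed rows before using them. This is a minimal sketch, assuming the script does not need to tell absent cells apart from empty ones; the sample data is made up for illustration.

```powershell
$Data = ConvertFrom-Csv @'
Id,Name,Note
01,John
02,Jack,
03,Ryan,Just a note
'@

# Replace every $null cell with an empty string so later code sees one type.
foreach ($row in $Data) {
    foreach ($property in $row.PSObject.Properties) {
        if ($null -eq $property.Value) { $property.Value = '' }
    }
}

# Every Note is now a string, whatever the parser returned for it.
$Data.Note.ForEach{ $_ -is [string] }
```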
For reading, it might break a simple condition such as:
if ($Null -eq $Data[4].Note) { ...
or:
if ($Null -eq $Data.Note) { ...
Or (incorrectly written):
if ($Data.Note -eq $Null) { ... |
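A condition that does not depend on whether the cell comes back as `$Null` or as an empty string could look like the sketch below. It uses the .NET method `[string]::IsNullOrEmpty`, and the sample data is made up for illustration.

```powershell
$Data = ConvertFrom-Csv @'
Id,Name,Note
01,John
02,Jack,
03,Ryan,Just a note
'@

# IsNullOrEmpty is true for both $null and '', so the result is the same
# whichever of the two the cmdlet returns for a missing or empty cell.
$withoutNote = $Data.Where{ [string]::IsNullOrEmpty($_.Note) }
$withoutNote.Id   # 01 and 02
```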
@iRon7 These are obvious code snippets. The right question is whether there are popular applications or services that create such corrupted files that could lead to code like yours. I don't know of any. And I can't think of a reason why they would be created. From this I conclude that we should get rid of nulls. |
@dkaszews Is the source of these files in common use? And what did you do with the nulls? |
@iSazonov It was random data downloaded from all over the internet, mostly about how words are related to each other in different languages. Much of it was poorly exported spreadsheets. I had to safeguard against things like random mismatched quotes or unescaped commas and discard rows just to avoid corrupting everything else; otherwise we ended up with things like 10k rows parsed as a single value. But I was in a different situation because I could discard the rows I did not like, as the entire project was about statistical analysis. |
@dkaszews Thanks! It is not a scenario we should care about. |
Still reproduces in:
Name       Value
PSVersion  7.3.9 |
This issue has been marked as "No Activity" as there has been no activity for 6 months. It has been closed for housekeeping purposes. |
If the last cell of an unquoted Csv file is empty, the cmdlets ConvertFrom-Csv and Import-Csv inconsistently return $Null rather than an empty string.
Steps to reproduce
Expected behavior
Actual behavior
As shown above, all other cells in the Note column are interpreted as strings, except for the last cell, which appears to be $Null. This would have been correct if the last comma had been omitted, but that is not the case.
Error details
No response
Environment data
See also: how to filter empty values in a column in powershell