Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export-Csv and ConvertTo-Csv fail when the first object is empty #15273

Open
iRon7 opened this issue Apr 19, 2021 · 11 comments
Open

Export-Csv and ConvertTo-Csv fail when the first object is empty #15273

iRon7 opened this issue Apr 19, 2021 · 11 comments
Labels
Needs-Triage The issue is new and needs to be triaged by a work group. WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module

Comments

@iRon7
Copy link

iRon7 commented Apr 19, 2021

Export-Csv and ConvertTo-Csv fail when the first object is empty, yet the display output (and other cmdlets like Format-* cmdlets) appear to work fine.

Steps to reproduce

@(
    [pscustomobject]@{}             # Remvoe this line to "fix" the issue
    [pscustomobject]@{a=2; b=5}
    [pscustomobject]@{a=4; b=6}
) | ConvertTo-Csv

Expected behavior

"a","b"
"2","5"
"4","6"

Actual behavior

No results

Environment data

Name                           Value
----                           -----
PSVersion                      7.1.3
PSEdition                      Core
GitCommitId                    7.1.3
OS                             Microsoft Windows 10.0.19042
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Related: ConvertTo-Csv and Export-Csv fail when -usequotes asneeded option is used and input object has null properties #12279

@iRon7 iRon7 added the Needs-Triage The issue is new and needs to be triaged by a work group. label Apr 19, 2021
@mklement0
Copy link
Contributor

mklement0 commented Apr 19, 2021

ConvertTo-Csv / Export-Csv, like (possibly implicit) -Format-Table (normally), lock in the set of properties to serialize / display in columns based on the first object only.

Based on that, were' seeing two inconsistencies:

  • ConvertTo-Csv should output 4 empty lines , as Export-Csv does (in the latter case followed by a trailing newline, as usual); that is, an empty line for the header and 3 empty data rows; it currently outputs an empty string.

  • (Implicit) Format-Table should similarly output empty lines only (the behavior just happens to be more helpful in this case, but I suspect that just's a lucky fallout from the implementation rather than by design).

We could consider emitting a warning in this case.

See also:

@iRon7
Copy link
Author

iRon7 commented Apr 19, 2021

@mklement0,

You have a point, I indeed didn't look at it this way.
Yet:

ConvertTo-Csv should output 4 empty lines

If the first empty object, is used to define the columns, the output should be a table with zero columns where you might question whether any of its rows (with zero columns) should be represented as an empty line or just nothing...
Meaning that it differs a little from #13906 where the first object has at least some data.
Rather than emitting a warning in this case, I think it might be safe to consider to skip any empty objects and start with defining the properties (columns) and processing data when the pipeline hits the first object that isn't empty.

@mklement0
Copy link
Contributor

mklement0 commented Apr 19, 2021

Personally, I think emitting an empty line for each input object is more consistent, but it's definitely a pathological case, which is why I think a warning is called for.

To me, the question is whether (public) property-less objects among the input objects occur frequently enough to warrant implementing their getting ignored as a fallback - when do such empty objects occur, and is ignoring them helpful, or are they the result of an upstream error that users had better be alerted to?

And even if ignoring is the way to go, I'd be inclined to also emit a warning.

@iRon7
Copy link
Author

iRon7 commented Apr 20, 2021

@mklement0,

when do such empty objects occur

Sorry, I think I wasn't clear, it only concerns the first consecutive empty objects.
To be more clear, taken the following object list:

@(
    [pscustomobject]@{}         # Row 1
    [pscustomobject]@{}         # Row 2
    [pscustomobject]@{a=1; b=5} # Row 3
    [pscustomobject]@{a=2; b=6} # Row 4
    [pscustomobject]@{}         # Row 5
    [pscustomobject]@{a=3; b=7} # Row 6
) | ConvertTo-Csv

I would expect the output to be:

"a","b"
"1","5"
"2","6"
,
"3","7"

... and not to be empty (or any list of empty lines).
As a known PowerShell behavior, the ConvertTo-Csv cmdlet (along some other cmdlets) takes the first object to define the properties of the following object in the pipeline. But if the first object (or consecutive objects, as in the above example Row 1 and Row 2) is an empty object, it is quiet pointless to take that as model to define the column header names (as there aren't any, causing any void/column-less table with void/column-less rows).
Meaning, you might as well skip (omit) just the first empty objects and begin outputting (starting with a header row "a","b") at the first filled object: "1","5" (Row 3).
In this case, a warning might make sense as a scripter might expect the following output instead (which will not happen because PowerShell needs to use the first empty object to define the header for that):

"a","b"
,
,
"1","5"
"2","6"
,
"3","7"

the warning could be something like:

The first consecutive objects have been omitted because they didn't contain any property information.

For the empty object at Row 5. This is a different situation, the properties (column names) are already known and defined. If it concerns a single column it might just produce an empty line, if it concerns multiple lines, it should return a (column count - 1) number of consecutive delimiters (commas). I think, in this case it is clear from the output that the object is empty (and therefore doesn't require any additional warnings).

@iRon7
Copy link
Author

iRon7 commented Apr 20, 2021

Second/other thought (quoting myself):

In this case, a warning might make sense as a scripter might expect the following output instead ...

Considering the fact that it might hardly use any memory just to count the preceding empty objects ($EmptiesCount).
The $EmptiesCount can be used at the moment the first non-empty object arrives from the pipeline, and than:

  • Output the csv header: "a","b"
  • Output the lines with empty properties: @(',' * (@("a","b").count - 1)) * $EmptiesCount
  • Output the first non-empty object "1","5"

After that, treat each following pipeline object (Output "2","6" etc.) as usual.

(if in the end no objects where found with properties, return nothing: AutomationNull)

Caveat: If the list of input objects also contains non-unified objects (#13906) it might get somewhat confusing as it will take the first object that has properties to define the column names (rather than the first possible empty object).

@mklement0
Copy link
Contributor

it only concerns the first consecutive empty objects.

Understood, but the questions in my mind still are:

  • When do such objects occur?
  • When they do, are they a legitimate byproduct of intended activity, or are they indicative of a problem?

In the latter case, I don't think it is worth special-casing this scenario, and a warning - as a courtesy - would alert users to the fact that the input results in no meaningful output.

@iSazonov
Copy link
Collaborator

Situations like this cannot be solved in general. Specific scenarios need to be considered. Is there such a scenario?

Example, I read an information from an service, then send to ConvertTo-Csv

  1. For [pscustomobject]@{} - what is output I expect?
"a","b"
,
  1. I re-run and get [pscustomobject]@{a=1; b=5} - what is output I expect?
"a","b"
1,5

This obviously says I need to pre-process my input before send to ConvertTo-Csv to get expected results.

@iSazonov iSazonov added the WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module label Apr 20, 2021
@iRon7
Copy link
Author

iRon7 commented Apr 20, 2021

My used case is actually stackoverflow answer: PowerShell: Extracting HTML table as CSV.
In the earlier function version (below)

Read-HtmlTable (old)
Function Read-HtmlTable {
    [CmdletBinding()][OutputType([Object[]])] param(
        [Parameter(ValueFromPipeLine = $True, Mandatory = $True)][String]$Html,
        [Int[]]$TableIndex
    )
    Begin {
        Function Get-Elements { # Get closed descendants by tag name
            [CmdletBinding()][OutputType([__ComObject[]])] param(
                [Parameter(Mandatory = $True)][String]$TagName,
                [Parameter(Mandatory = $True, ValueFromPipeLine = $True)]$Element
            )
            if ($Element.tagName -eq $TagName) { $Element }
            else { $Element.Children | Foreach-Object { $_ | Get-Elements $TagName } } 
        }
    }
    Process {
        $Unicode = [System.Text.Encoding]::Unicode.GetBytes($Html)
        $Document = New-Object -Com 'HTMLFile'
        if ($Document.IHTMLDocument2_Write) { $Document.IHTMLDocument2_Write($Unicode) } else { $Document.write($Unicode) }
        $Index = 0 
        foreach($table in ($Document.Body | Get-Elements 'table')) {
            if (!$PSBoundParameters.ContainsKey('TableIndex') -or $Index++ -In $TableIndex) {
                $Names = $Null
                foreach ($tr in ($table | Get-Elements 'tr')) {
                    if (!$Names) { $Names = ($tr | Get-Elements 'th').innerText }
                    if (!$Names) { $Names = ($tr | Get-Elements 'td').innerText }
                    else {
                        $Values = ($tr | Get-Elements 'td').innerText
                        $Properties = [Ordered]@{}
                        $Count = [Math]::Min($Names.Count, $Values.Count)
                        for ($i = 0; $i -lt $Count; $i++) { $Properties[$Names[$i]] = $Values[$i] }
                        if ($Properties.Count -gt 0) { [pscustomobject]$Properties }
                        [pscustomobject]$Properties
                    }
                }
            }
        }
    }
}

$url = "https://winreleaseinfoprod.blob.core.windows.net/winreleaseinfoprod/en-US.html"
$webClient = New-Object System.Net.Webclient
$HTML = $webClient.DownloadString($url)

There was a bug where it produced an unwanted empty object ([pscustomobject]@{}) at start.
The pitfall in this, is that if you look at the display output:

$HTML | Read-HtmlTable -Table 0 | Format-Table

It appears to run just fine, yet if you output the results to a csv file (using Export-Csv or ConvertTo-Csv):

$HTML | Read-HtmlTable -Table 0 | ConvertTo-Csv

the output completely disappears.

Maybe there is too much weight on this issue and should just be noted and closed...

@iSazonov
Copy link
Collaborator

I can only think about -Header parameter (as in Import-Csv). Although it too complicates the cmdlet. A simple filter is much simpler and clearer.

@mklement0
Copy link
Contributor

If we were to take action here, my recommendation is as follows:

  • Fix the inconsistent Format-Table and ConvertTo-Csv behaviors: output empty lines for each input object if the first one has no (public) properties, as Export-Csv already does.

  • Emit a warning in this case.

@microsoft-github-policy-service microsoft-github-policy-service bot added Resolution-No Activity Issue has had no activity for 6 months or more labels Nov 16, 2023
@iRon7
Copy link
Author

iRon7 commented Nov 16, 2023

The Issue-Bug still exists:

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.3.9
PSEdition                      Core
GitCommitId                    7.3.9
OS                             Microsoft Windows 10.0.19045
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Resolution-No Activity Issue has had no activity for 6 months or more label Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs-Triage The issue is new and needs to be triaged by a work group. WG-Cmdlets-Utility cmdlets in the Microsoft.PowerShell.Utility module
Projects
None yet
Development

No branches or pull requests

3 participants