Consider removing the default -Depth value from ConvertTo-Json #8393

Open
mklement0 opened this Issue Dec 4, 2018 · 25 comments

@mklement0
Contributor

mklement0 commented Dec 4, 2018

Summary of the proposal:

  • Remove the default value for -Depth

    • The hard-coded internal limit of 100, which reports an error when exceeded, is sufficient to prevent "runaway" JSON strings stemming from object trees with cyclic dependencies.
    • Typical input objects will then be fully serialized by default, which is typically the intent.
  • Use -Depth solely at the user's discretion in order to:

    • Intentionally truncate the input object tree at the specified depth.
    • On rare occasions, allow serialization of object trees that are deeper than 100 levels (this could also be a solution to #3181).

Motivation

-Depth defaulting to 2 in ConvertTo-Json has caused much confusion and frustration over the years; @iRon7 has recently tried to create a "canonical" post on SO, which also shows how frequently the issue arises.

Currently, an input object tree that exceeds the (default) depth doesn't cause an error, but results in near-useless .psobject.ToString() serialization of property values that exceed the depth (see #8381 for a proposal to visualize the cut-off differently).

In combination with the low default -Depth value of 2, that makes for frequent user frustration, because the behavior often amounts to quiet de-facto failure that may not be discovered until later.

The seemingly arbitrary and quiet truncation is surprising to most users, and having to account for it in every ConvertTo-Json call is an unnecessary burden.
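
For illustration, a minimal example of the quiet truncation (output shown approximately; exact formatting differs between PowerShell versions):

    # With the default -Depth of 2, the innermost hashtable is not serialized as JSON;
    # its .psobject.ToString() value is emitted instead - quietly.
    @{ level1 = @{ level2 = @{ level3 = @{ level4 = 'value' } } } } | ConvertTo-Json
    # {
    #   "level1": {
    #     "level2": {
    #       "level3": "System.Collections.Hashtable"
    #     }
    #   }
    # }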

Backward-compatibility impact

The only way I can see existing code being impacted is that payloads generated with ConvertTo-Json could now increase in depth (and size) if users previously relied on the implicit cut-off at depth 2 - that strikes me as a Bucket 3: Unlikely Grey Area change.

@iRon7

iRon7 commented Dec 4, 2018

@mklement0, thanks for picking this up.
It is not completely clear to me what happens with circular references.

  • If you leave them intact up to the hard-coded internal limit of 100, the backward-compatibility impact might be even higher than expected. Take, for example, Get-Service | Where-Object {$_.DisplayName -match 'tcp/ip*'} | ConvertTo-Json -Depth 10, which concerns just a single object (not even all services) at a depth of only 10 - yet it already takes several seconds and produces 22 MB of output (and for -Depth 12: 40 seconds / 160 MB).

  • On the other hand, if you cut them off at the point where the circular reference is detected, users might lose circular properties (e.g. $Object.parent.parent.parent.parent), which might be a backward-compatibility concern too if the property lies within the originally used depth (2).

Maybe a new (user) parameter, something like -RecurringDepth (default: 2), in addition to your proposal might help. In this idea, -RecurringDepth would cut off only recurring references at the given depth (or below).
But I would also understand if additional parameters are generally unwanted.

@SteveL-MSFT

Member

SteveL-MSFT commented Dec 4, 2018

@iRon7 I don't know the history of -Depth 2, but I also suspect it has to do with recursion where .NET objects reference other instances. The obvious case, as you noted, is with services, whose DependentServices and ServicesDependedOn properties create circular references. A -RecurringDepth parameter seems like a reasonable solution, along with a warning if the depth gets hit. New parameters are fine as long as they make sense and generally help avoid a breaking change.
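
For illustration, one way to observe such a cycle interactively (an ad-hoc probe, assuming at least one service with dependent services is present; DependentServices and ServicesDependedOn are the real ServiceController properties):

    # Pick a service that has dependents, then inspect a dependent's own dependencies:
    $svc = Get-Service | Where-Object { $_.DependentServices } | Select-Object -First 1
    $svc.DependentServices[0].ServicesDependedOn.Name   # includes $svc.Name again - a cycle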

@mklement0

Contributor

mklement0 commented Dec 4, 2018

@iRon7:

I see the problem with the depth of 100, but my guess is that it's fine to lower this to a more reasonable number that covers the majority of use cases while preventing infinite recursion.
(Again, only people with excessively deep trees who rely on default truncation at depth 2 would be affected, which doesn't strike me as likely.)

As for circular references and the default value preventing "runaway" strings when serializing arbitrary .NET types:
It doesn't really make sense to use ConvertTo-Json for that - use Export-CliXml instead.
So I'm not sure that -RecurringDepth is needed, but, as @SteveL-MSFT says, adding parameters is an option.

However, the gist of my proposal is to bring sane default behavior to ConvertTo-Json, and that won't be possible without what is technically a breaking change - fingers crossed for bucket 3, though.

The typical use case is to use custom-made data-transfer objects ("property bags"; hashtables or custom objects) that you want serialized in full - the internal hard limit is there as a safety belt.

Cutting off the input object tree at a given depth should always be an explicit decision, not quietly applied default behavior that may initially even go unnoticed.

That concerns about incautious, atypical use dictate default behavior that:

  • defies user expectations to begin with,
  • is a challenge to remember,
  • is a nuisance to work around, because you need to figure out the actual depth of your data and keep the parameter in sync with later changes (unless you go for -Depth 100, which shouldn't be necessary)

has led to longstanding frustration with the cmdlet.

@SteveL-MSFT

Member

SteveL-MSFT commented Dec 5, 2018

@mklement0 assuming your assertion on the use case is correct, it would be possible to treat PSCustomObject and HashTables differently from other .NET types, but then we have asymmetric behavior, which could be confusing to some users.

It would be great if someone could script out some tests to see if other types besides Service have circular references. If that is the case, it may make more sense to special-case that one (maybe output a warning) and increase the depth by default.

@mklement0

Contributor

mklement0 commented Dec 6, 2018

@SteveL-MSFT:

As for what use cases are typical:

My observations are based on what I came across on Stack Overflow and what makes sense to me, but you could use your Twitter clout to solicit feedback from a wider audience.

On a similar note, @iRon7: If you support this proposal at least in essence, please update your SO self-answer to point to my answer, which promotes this proposal. With the current vote tally, users are likely to focus on your justification of the status quo.

@mklement0

Contributor

mklement0 commented Dec 6, 2018

As for treating [hashtable] and [pscustomobject] instances differently:

Yes, that could be confusing, so we should avoid it. That said, if too much legacy code turns out to rely on the implicit -Depth 2 truncation, it may be an option.

It would be great if someone could script out some tests to see if other types besides Service have circular references.

Given that it generally makes no sense to blindly use ConvertTo-Json with arbitrary .NET types, I wouldn't be too concerned about that - except with respect to legacy code, in order to assess whether the -Depth 2 default should be retained selectively for non-hashtable / non-custom-object instances.

Here are the ca. 20,000 hits for PowerShell code that calls ConvertTo-Json on GitHub:

https://github.com/search?p=5&q=language%3Apowershell+ConvertTo-Json&type=Code

@markekraus

Collaborator

markekraus commented Dec 7, 2018

I have long suspected the default Depth was too low. That was until I experimented with setting the Depth to 100 in $PSDefaultParameterValues. The result was a marked increase in execution times for a bunch of automation I was responsible for. In some cases, the scripts would execute indefinitely. Large numbers of deep and wide (and sometimes infinitely deep) objects can account for that. Changing this could result in user complaints that scripts are taking longer after upgrading to a pwsh version where the depth was set too high. In some cases, they may even have scripts that never finish executing.

While I definitely feel the pain of the default depth being low and silently truncating, I disagree with the assertion that this would not be an impactful change.

I think the default behavior from the beginning should have been to error instead of silently truncate. I agree that 2 is too low. I disagree that increasing the default to 100 is a good idea.

As was discussed in another issue or PR, there are 2 concepts at play here: depth and the action taken when that depth is reached. I think for most uses, slightly increasing the default depth and changing the default behavior to error would be sufficient to solve the majority of pain points. This would need to come with the ability to truncate as an option, as without it infinitely deep objects could never be serialized.

I believe that the assertion that users rarely serialize arbitrary .NET types is false. I have seen plenty of logging that lazily throws any and all objects through ConvertTo-Json -Compress. I would caution that this assertion be investigated thoroughly before any decisions are made.

@mklement0

Contributor

mklement0 commented Dec 10, 2018

I have seen plenty of logging that lazily throws any and all objects through ConvertTo-Json -Compress

Yeah, looking at some of the code here on GitHub seems to confirm the unfortunate reality of lazy ConvertTo-Json use being not uncommon.

In light of this, @SteveL-MSFT's suggestion is well worth considering; to flesh it out some more:

  • If an input object is a deliberately constructed DTO (property bag), do not apply the default depth (let the internal max. depth prevent infinite recursion); specifically, this applies to (see the detection sketch after this list):

    • Hashtables and ordered hashtables
    • [pscustomobject] instances whose .pstypenames array contains only System.Management.Automation.PSCustomObject and System.Object (to rule out arbitrary deserialized .NET types)
    • Classes created with PowerShell's class construct.
    • Treat the entries / properties of these DTO types as follows:
      • Serialize primitive .NET types as usual ([string], [bool], [int], ...)
      • Recurse on entries / properties of the same type (nested hashtable / custom object), applying only the overall max. depth.
      • Serialize any other types with a hard-coded per-object depth of 1 - that is, serialize only the immediate properties of such a type, and represent nested objects as their .ToString() values.
        • In general, assume that all properties that matter in the resulting JSON should be spelled out via hashtable / [pscustomobject] instances (ultimately) containing properties that have scalar JSON representations - i.e., deliberately constructed as DTOs.
        • The hard-coded per-object depth of 1 ensures that old code that wrapped arbitrary .NET types in hashtables / custom objects continues to work as before (e.g.,
          @{ foo = Get-Item / } | ConvertTo-Json)
  • With any other input, retain the current behavior.
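
To make the DTO test above concrete, here is a minimal sketch; the function name Test-IsDto and the exact checks (in particular, using a dynamic-assembly test to recognize instances of PowerShell class types) are illustrative assumptions, not part of the proposal itself:

    function Test-IsDto {
        param($InputObject)

        if ($null -eq $InputObject) { return $false }

        # Hashtables and ordered hashtables count as DTOs.
        if ($InputObject -is [System.Collections.IDictionary]) { return $true }

        # "Pure" [pscustomobject] instances: .pstypenames contains nothing beyond
        # System.Management.Automation.PSCustomObject and System.Object, which rules
        # out arbitrary (deserialized) .NET types carrying additional type names.
        if ($InputObject -is [System.Management.Automation.PSCustomObject]) {
            $allowed = 'System.Management.Automation.PSCustomObject', 'System.Object'
            return -not ($InputObject.pstypenames | Where-Object { $_ -notin $allowed })
        }

        # Instances of PowerShell 'class' types would also qualify; one (assumed) way to
        # recognize them is that such types live in dynamic, in-memory assemblies.
        return $InputObject.GetType().Assembly.IsDynamic
    }

    Test-IsDto @{ foo = 1 }                    # $true  - hashtable
    Test-IsDto ([pscustomobject]@{ foo = 1 })  # $true  - pure custom object
    Test-IsDto (Get-Item /)                    # $false - arbitrary .NET type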

This would alleviate the pain based on straightforward exception rules (the inevitable price to pay for remaining mostly backward-compatible):

  • Lazy piping of arbitrary .NET types to ConvertTo-Json would work as before.

  • Deliberately constructed DTOs would no longer be subject to the quiet truncation at depth 2.

@markekraus

Collaborator

markekraus commented Dec 10, 2018

@mklement0 How would you handle mixed scenarios? For example, depth 1-5 is occupied by property bags, 6 has a .NET Type but 8-11 are property bags again (possible due to ETS)?

@mklement0

Contributor

mklement0 commented Dec 10, 2018

@markekraus: What I'm proposing amounts to 3 distinct scenarios:

  • If -Depth is present, the behavior will be as it is now, irrespective of the type of input object (quiet truncation at specified depth).

    • Additionally, for symmetry with #8199, it should be possible to use -Depth as an opt-in to depths greater than the internal max. depth of 100.
  • Otherwise, in the absence of -Depth:

    • If an object is a DTO, as described above:

      • No overall -Depth value is implied; only the internal max. depth is in effect (which can be increased with an explicit -Depth argument).
      • On a per-nested-non-DTO basis, a fixed, non-configurable depth of 1 is applied (note that if it weren't for backward compatibility, it might be better to use 0, i.e., to only include the immediate properties).
    • If an object is of any other type, the behavior will be as it is now (default depth 2).

To put it differently: in the absence of -Depth, it is the root of a given input object's tree that selects one of two fundamental behaviors (illustrated after the list below):

  • A DTO root is not limited depth-wise (except by the max. depth), except for nested non-DTOs, if any, which themselves are limited to depth 1.
  • A non-DTO root is limited to overall depth 2, as before.
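
Under the proposal - hypothetical behavior, not what ConvertTo-Json does today; the sample data is made up for illustration - these two calls would therefore behave differently:

    # DTO root: serialized in full, limited only by the internal max. depth.
    @{ name = 'server1'; config = @{ tls = @{ minVersion = '1.2' } } } | ConvertTo-Json

    # Non-DTO root: current behavior retained, i.e., implicit truncation at depth 2.
    Get-Item / | ConvertTo-Json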

Note that I've updated the definition of DTO in the previous comment to include classes created with PowerShell's class construct.

As for the presence of ETS properties: only the base object's type should govern the behavior.

@markekraus

Collaborator

markekraus commented Dec 10, 2018

I think that proposal of silent asymmetrical behavior will cause more problems than it will solve.

I really do think there is no decent solution to this that does not involve both a depth parameter and an action to be taken at that depth. I also agree that the default depth could be increased.

@mklement0

Contributor

mklement0 commented Dec 11, 2018

@markekraus:

I also agree that the default depth could be increased.

Given that lazy to-JSON conversion isn't uncommon, that's not really an option if you want to avoid the "marked increase in execution times" you've described.
Attempting a lukewarm compromise along the lines of increasing the default depth to, say, 4 is not worth the effort, in my estimation: it can still substantially increase the runtime/memory requirements of lazy conversions while still running the risk of unexpected silent truncation for purposefully constructed DTOs.

I think that proposal of silent asymmetrical behavior will cause more problems than it will solve.

Let's start with the problems that it does solve or avoids:

Deliberately constructed DTOs will serialize as expected - with no depth restriction other than the "safety belt" of 100 levels (which you'll be able to override with -Depth).
This will remove the major pain point of the current behavior.

Backward-compatibility:

  • Old code with explicit -Depth will continue to work as-is.
  • Old code that lazily serializes non-DTOs without -Depth will continue to work as-is.
  • The only old code potentially affected is code with explicitly constructed DTOs that relied on implicit truncation at level 2, which strikes me as unlikely.

Which problems do you think it will cause?

@markekraus

Collaborator

markekraus commented Dec 11, 2018

Which problems do you think it will cause?

"Why is my object cut off? this tree of the object goes 20 deep, but this tree stops at 5"

@mklement0

Contributor

mklement0 commented Dec 11, 2018

@markekraus:

Fundamentally: The basic DTO/non-DTO dichotomy needs to be well-documented, then there's no mystery.

As for your specific example: If you know that your object tree is 20 levels deep, the implication is that you've constructed a DTO, and no truncation will occur.

By contrast, if you lazily pass arbitrary types to ConvertTo-Json without worrying about (potentially infinite) serialization depth, you're probably not concerned with details such as serialization depth to begin with.

@markekraus

Collaborator

markekraus commented Dec 11, 2018

@mklement0 I'm referring to a mix of the two, which is common: the top level is a property bag, but down one branch there is a .NET type with a depth of its own, and within that depth there may be another property bag, while another branch may be property bags all the way down. This is asymmetrical and silent. The user will be very confused no matter how much documentation you write.

Again, I think the only real solution to this problem is to add a behavior toggle that affects what happens at depth and to change the default behavior to error when the depth is reached.

The solution in this issue solves one "why did this happen" and adds a few others. The behavior is inconsistent. Inconsistent behavior leads to a constant slew of bug reports and user issues. If the goal is to reduce SO posts, this solution will not accomplish that.

@mklement0

Contributor

mklement0 commented Dec 11, 2018

The behavior with such mixed objects is explained with this simple rule:

  • if a property is a DTO, serialize without restriction (except the overall max. depth)
  • if it's not, serialize that object with depth 1 (irrespective of whether DTOs happen to be among the properties).

This is reasonable, because you generally can't assume that arbitrary non-DTOs convert meaningfully to JSON. Thus, for a predictable conversion, deliberately constructed DTOs shouldn't have nested non-DTOs, and the depth 1 behavior is again a safety feature that prevents "runaway" serialization.

Also, remember that the proposed change only applies to calls where -Depth isn't specified - and your scenario requires -Depth with the current behavior anyway - so existing code isn't affected.

For code going forward, the current confusion would be significantly lessened:

  • Deliberately and properly constructed DTOs will then serialize as expected - it "just works".

  • Only if you throw in non-DTOs and don't also specify -Depth, i.e., "lazy" conversion, can confusion arise - and that confusion can be cleared up with the simple rule above.

Complementarily, the ConvertTo-Json help topic should be updated with guidance as to when use of the cmdlet is appropriate, the pitfalls of non-DTO serialization, ...

The guidance could be summed up as follows (it would have to be worded differently):

  • To get predictable to-JSON serialization, deliberately construct your input as DTO trees (a DTO whose descendants are also all DTOs).

    • If your tree is more than 100 levels deep and you want to serialize beyond that, use -Depth to bypass the built-in max. depth.
  • If you supply non-DTOs, we'll protect you from "runaway" serialization by silently truncating their object trees at a default depth.

    • Truncation means that non-primitive property values at a given depth are serialized as their .ToString() values instead of further descending into their properties.
    • The default depth is 2 for a non-DTO input object (truncation at the grandchild level), and 1 for non-DTO objects nested inside a DTO input object (truncation at the child level).
  • To explicitly truncate input object trees at a specified depth, use the -Depth parameter.


I think the only real solution to this problem is to add a behavior toggle

That's not a solution, because it doesn't address the problematic default behavior.


As an aside:

I can easily see how "lazy" piping to ConvertTo-Json of non-DTOs stored in a hashtable or custom object (i.e., a DTO root with non-DTO properties) happens - and with the rule above, that would result in effective -Depth 2, i.e., the current behavior is retained.

I personally haven't come across the mixed scenario you describe, where the non-DTO properties in the scenario above in turn have DTOs as properties and where users would have the expectation that they'll serialize without limitation by default.
Can you explain the use case and perhaps give a few examples?

@vexx32

Contributor

vexx32 commented Dec 11, 2018

That's not a solution, because it doesn't address the problematic default behavior.

Yes, it does. It informs the user that there is a problem, what caused it, and how to correct it.

Attempting to futz it and make it half work is bound to be full of little idiosyncrasies that just make it too unpredictable for the majority of people to use.

If we were to recommend a thorough solution to a problem like this, I would imagine the code would probably end up needing to do something like this (a rough PowerShell sketch follows below):

  1. Iterate through properties, looking for simple value types. For these, simply write JSON properties directly.
  2. Create an empty list to hold object references.
  3. Recurse through each reference-type complex object or property bag.
  4. Store a reference to each reference-type object to check for recursion, calling ReferenceEquals() on each and if it matches a previously found one, simply call .ToString() or something similar.
  5. If it doesn't match a previously discovered object in the tree, store the reference and proceed to recurse through that object as well.

I don't know whether that would be considered worthwhile, though. 😄
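
In PowerShell terms, a rough sketch of that approach could look like the following; the function name is made up, and collections/arrays as property values are deliberately not handled:

    function Convert-TreeWithCycleCheck {
        param(
            $InputObject,
            [System.Collections.Generic.List[object]] $Seen = [System.Collections.Generic.List[object]]::new()
        )

        # Simple values are emitted as-is (step 1).
        if ($null -eq $InputObject -or $InputObject -is [string] -or $InputObject.GetType().IsValueType) {
            return $InputObject
        }

        # If this exact instance was already visited, fall back to .ToString() (step 4).
        foreach ($prev in $Seen) {
            if ([object]::ReferenceEquals($prev, $InputObject)) { return $InputObject.ToString() }
        }
        $Seen.Add($InputObject)   # remember the reference (steps 2 and 5)

        # Otherwise recurse through the object's properties (steps 3 and 5), building a
        # property bag that could then be handed to ConvertTo-Json.
        $result = [ordered]@{}
        foreach ($prop in $InputObject.psobject.Properties) {
            $result[$prop.Name] = Convert-TreeWithCycleCheck -InputObject $prop.Value -Seen $Seen
        }
        return $result
    }

A real implementation would additionally need to handle collections and emit JSON directly rather than building an intermediate property bag.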

@mklement0

Contributor

mklement0 commented Dec 11, 2018

Yes, it does. It informs the user that there is a problem, and how to correct it, and the cause of the problem.

I missed the part about defaulting to an error; yes, that would help, but, given the practice of lazy ConvertTo-Json use, it would be a serious breaking change - my understanding is that this is considered unacceptable. Even just increasing the default depth is problematic.

full of little idiosyncrasies that just make it too unpredictable for the majority of people to use.

What about the simple rules above strikes you as unpredictable?

Also note that they only apply to haphazard use of the cmdlet; someone who deliberately constructs DTOs as input won't have any problems.

@vexx32

Contributor

vexx32 commented Dec 11, 2018

The inconsistency arises because there are too many rules for too many similar things. Most users don't distinguish heavily between DTOs and regular objects when it comes to serialising them, and would probably[citation needed] want them and expect them to behave about the same.

@mklement0

Contributor

mklement0 commented Dec 11, 2018

Most users don't distinguish heavily between DTOs and regular objects when it comes to serialising them

They have to, if they want sensible results.

If they don't care about the specifics of the results - such as when simply piping Get-ChildItem output - (a) nothing will change and (b) they won't need to learn the rules.

would probably want them and expect them to behave about the same.

They shouldn't expect that, and that's where guidance in the help topic comes in.

There's a clear dichotomy between:

  • DTOs you deliberately construct, whose depth is readily obvious, and whose composition of only primitive values and nested DTOs gives you the desired JSON output.

  • Arbitrary (non-primitive) non-DTOs, whose depth you typically don't even know and whose serialization may result in useless JSON values, excessive output or even (potentially) infinite loops.

The latter, "lazy" use may not be uncommon now - hence the need to preserve backward compatibility - but it should be discouraged going forward.

To intentionally incorporate non-DTOs - say, Get-ChildItem output - into DTOs, pass them to Select-Object first, selecting the properties of interest; the result is a DTO that serializes predictably and in full.
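
For instance (using a few representative file properties):

    # Each Select-Object result is a flat [pscustomobject] - a DTO - containing only the
    # properties of interest, so it serializes in full even at the current default depth.
    Get-ChildItem -File |
        Select-Object Name, Length, LastWriteTime |
        ConvertTo-Json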

In short:

  • Users who deliberately construct their DTOs - which should be the primary use case - won't be affected - they'll get the expected behavior by default, which solves the primary problem.

  • Only users who use ConvertTo-Json "lazily" with non-DTOs - if they're even unhappy with the default truncation - need to read the help to understand (a) why the truncation is applied and (b) that they must use the -Depth parameter (which existing users are likely already familiar with) to serialize in greater (or lesser) depth.

As an aside, speaking of lesser depth: arguably, ConvertTo-Json should support -Depth 0 also, so that serialization can be limited to the immediate properties.

@vexx32

Contributor

vexx32 commented Dec 11, 2018

🤷‍♂️ I offered one potential solution to avoid infinite recursion, and I'm sure there are likely several others. I don't think this is the answer. 😄

@mklement0

Contributor

mklement0 commented Dec 11, 2018

If you want sensible default behavior while preserving backward compatibility, I don't see an alternative.

Aside from that, sure, detecting infinite recursion early is preferable to waiting for the max. depth to kick in (you may run out of memory first) and it's worth implementing, but it doesn't help the issue at hand:

You don't need infinite recursion to get into trouble: try Get-Item file.txt | ConvertTo-Json -Depth 5, for instance, which takes a good 30 seconds on my machine.

Thus, having a default depth makes sense - for non-DTOs. For DTOs, the depth is finite by definition.
Yes, reporting an error on exceeding the default depth rather than performing quiet truncation would be the more sensible default behavior, but that ship has sailed.

@vexx32

Contributor

vexx32 commented Dec 11, 2018

Maybe, maybe not. I think that while, by and large, PowerShell tends toward backwards compatibility, there are cases where it makes sense to break a few eggs to make a better omelet. 😄

Perhaps this is one of those cases. 🙂

@mklement0

Contributor

mklement0 commented Dec 11, 2018

are cases where it makes sense to break a few eggs to make a better omelet.

Amen to that. There are many examples where sticking with backward compatibility gets in the way of sensible behavior.

In the case at hand, however, I think it's not necessary to break backward compatibility in order to get sensible behavior for the primary use case.

That said, if a breaking change is acceptable after all (which is fine by me, but I suspect not by many others), then we could do the following:

  • Perform truncation only on request, via -Depth.

  • By default, error out if the fixed max. depth is exceeded; going beyond the fixed max. depth requires -Depth (which may still involve truncation).

  • Choose the fixed max. depth carefully (much lower than now) so that it (a) works for most deliberately constructed DTOs, while (b) preventing excessive serialization times / sizes for non-DTOs.

  • I then wouldn't bother with infinite-loop detection, as the fixed max. depth would catch the problem. Alternatively, implement it and error out (early) if -Depth isn't also passed.

The above would make the special-casing of non-DTOs unnecessary, but the fixed max. depth may need to be so large - to support the primary use case - that non-DTOs blindly sent to ConvertTo-Json may still be problematic.
