Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing pattern constraint causes "cue export" to spin indefinitely #1940

Closed
seh opened this issue Sep 16, 2022 · 4 comments
Closed

Introducing pattern constraint causes "cue export" to spin indefinitely #1940

seh opened this issue Sep 16, 2022 · 4 comments

Comments

@seh
Copy link
Contributor

seh commented Sep 16, 2022

What version of CUE are you using (cue version)?

cue version v0.4.4-0.20220915174651-ad253ed099e9

       -compiler gc
     CGO_ENABLED 1
          GOARCH amd64
            GOOS darwin
         GOAMD64 v1

Does this issue reproduce with the latest release?

Yes, both in version 0.4.3 and the tip as of today (ad253ed).

What did you do?

I wrote a few definitions as described in preceding discussion in the "language" channel of the "CUE" Slack workspace, involving a recursive definition as part of describing Terraform's JSON output format. Building up from there, I wrote definitions for types, values of any type, values of a particular type, and documents with fields bearing values of those types.

Most of this works fine, up until when I try to use a pattern constraint to define that general document schema, where each of the field must bear one of these values. When I try, the cue export command spins using 2.5 CPUs for as long as twenty minutes. That's the longest I've let it run so far before killing it.

I'll share the definitions that I've written so far, together with a small JSON file that shows input data against which I'd unify these definitions.

CUE definitions
#ValueType:
	// See https://github.com/zclconf/go-cty/blob/3792a7b5328c364b10c6d642e3ea0d9761338740/cty/json.go#L16-L89.
	"bool" | // Primitive types
	"number" |
	"string" |
	["list", #ValueType] |
	["map", #ValueType] | // Complex types
	["object", {[string]: #ValueType}] |
	["object", {[string]: #ValueType}, [...string]] |
	["set", #ValueType] |
	["tuple", [...#ValueType]] |
	"dynamic" // Pseudo-type

#OutputValue: {
	type:         #ValueType
	sensitive:    bool
	description?: string
	value:        _
}

// Make defining schemas more convenient.
#BoolTypedOutputValue: #OutputValue & {
	type:  "bool"
	value: bool
}
#NumberTypedOutputValue: #OutputValue & {
	type:  "number"
	value: number
}
#StringTypedOutputValue: #OutputValue & {
	type:  "string"
	value: string
}
let compositeTypedOutputValue = #OutputValue & {
	#collectionType: "list" | "map" | "object" | "set" | "tuple"
	{
		type: [#collectionType, ...]
	}
}
#ListTypedOutputValue: compositeTypedOutputValue & {
	#collectionType: "list"
	value: [...]
}
#MapTypedOutputValue: compositeTypedOutputValue & {
	#collectionType: "map"
	value: [string]: _
}
#ObjectTypedOutputValue: compositeTypedOutputValue & {
	#collectionType: "object"
	value: [string]: _
}
#SetTypedOutputValue: compositeTypedOutputValue & {
	#collectionType: "set"
	value: [...]
}
#TupleTypedOutputValue: compositeTypedOutputValue & {
	#collectionType: "tuple"
	value: [...]
}

// Output defines the schema for data emitted by the "terraform output -json" command.
// THIS IS THE PROBLEM.
//#Output: [string]: #OutputValue
// Loosen the schema just to compare the behavior.
#Output: [string]: _

_#AWSTFOutput: #Output & {
	let hostedZones = #TupleTypedOutputValue & {
		value: [...{
			id:   string // More elaborate in actual code.
			name: string // More elaborate in actual code.
		}]
	}

	acme_dns_challenge_prover_iam_role:   #ObjectTypedOutputValue
	assume_dns_record_publisher_iam_role: #ObjectTypedOutputValue
	external_dns_route_53_hosted_zones:   hostedZones
	iam_users:                            #ObjectTypedOutputValue
	iam_users: value: [string]: {
		arn:       string // More elaborate in actual code.
		unique_id: string // More elaborate in actual code.
	}
	route_53_hosted_zones: hostedZones
}

// Example use of these definitions unified with real input data
example: _#AWSTFOutput & {
	// NB: This is a JSON object pasted in, redacted in places.
	"acme_dns_challenge_prover_iam_role": {
		"sensitive": false
		"type": [
			"object",
			{
				"arn":       "string"
				"name":      "string"
				"unique_id": "string"
			},
		]
		"value": {
			"arn":       "arn:aws:iam::123456789012:role/some/path/acme-dns-challenge-prover"
			"name":      "acme-dns-challenge-prover"
			"unique_id": "AROAZWV55UJHHXTZOPWHI"
		}
	}
	"assume_dns_record_publisher_iam_role": {
		"sensitive": false
		"type": [
			"object",
			{
				"arn":       "string"
				"name":      "string"
				"unique_id": "string"
			},
		]
		"value": {
			"arn":       "arn:aws:iam::123456789012:role/some/path/dns-record-publisher"
			"name":      "dns-record-publisher"
			"unique_id": "AROAZWV55UJHO5BPDUMOB"
		}
	}
	"external_dns_route_53_hosted_zones": {
		"sensitive": false
		"type": [
			"tuple",
			[
				[
					"object",
					{
						"id":   "string"
						"name": "string"
					},
				],
				[
					"object",
					{
						"id":   "string"
						"name": "string"
					},
				],
			],
		]
		"value": [
			{
				"id":   "ZBIUQ9J8XHK3M"
				"name": "example.com"
			},
			{
				"id":   "Z08962273OP4H9PT589NQ"
				"name": "example-1.com"
			},
		]
	}
	"iam_users": {
		"sensitive": false
		"type": [
			"object",
			{
				"ExternalDNS": [
					"object",
					{
						"arn":       "string"
						"unique_id": "string"
					},
				]
				"cert-manager": [
					"object",
					{
						"arn":       "string"
						"unique_id": "string"
					},
				]
			},
		]
		"value": {
			"daExternalDNS": {
				"arn":       "arn:aws:iam::123456789012:user/some/path/ExternalDNS"
				"unique_id": "AIDAZWV55UJ0123456789"
			}
			"dacert-manager": {
				"arn":       "arn:aws:iam::123456789012:user/some/path/cert-manager"
				"unique_id": "AIDAZWV55UJ0123456789"
			}
		}
	}
	"route_53_hosted_zones": {
		"sensitive": false
		"type": [
			"tuple",
			[
				[
					"object",
					{
						"id":   "string"
						"name": "string"
					},
				],
			],
		]
		"value": [
			{
				"id":   "ZBI0123456789"
				"name": "example.com"
			},
		]
	}
}

Note the comment in the source that reads "THIS IS THE PROBLEM." If I uncomment the line after it to read as follows, cue export gets stuck:

#Output: [string]: #OutputValue

The same problem arises if I write the following:

_#AWSTFOutput: [string]: #OutputValue

That is, I'd like to express that _#AWSTFOutput is a JSON document produced by terraform output -json with a particular set of output values, such that each field in the JSON object must bear a value that conforms to #OutputValue.

Relaxing that pattern constraint to the following allows cue export to proceed:

#Output: [string]: _

However, that fails to satisfy my goal of describing a valid JSON object, as these values could be of any type.

What did you expect to see?

cue export should emit the JSON document as it was, very quickly, without complaining about any schema violations.

What did you see instead?

cue export consumes several CPUs for as long as I'll allow it to run, while emitting no output.

@seh seh added NeedsInvestigation Triage Requires triage/attention labels Sep 16, 2022
@seh
Copy link
Contributor Author

seh commented Sep 16, 2022

Note that replacing unification with embedding also induces the same problem. That is, if I write the following—with the aforementioned pattern constraint in place—this problem still occurs:

_#AWSTFOutput: {
	#Output

	// ...
}

Likewise, an inline pattern constraint here induces the problem too:

_#AWSTFOutput: {
	[string]: #OutputValue

	// ...
}

@rogpeppe
Copy link
Member

Here's a shorter version of the above example that seems to exhibit the same problem:

#ValueType:
	["list", #ValueType] |
	["map", #ValueType] |
	["set", #ValueType] |
	["tuple", [...#ValueType]]

#OutputValue: type: #ValueType

#Output: [string]: #OutputValue

#AWSTFOutput: #Output & {
	x: #OutputValue
}

@mpvl
Copy link
Member

mpvl commented Sep 20, 2022

Analysis: it seems that when there is a "cross product" of a disjunction with itself, the cycle detection allows progressing one level deeper, which has the effect of causing quite significant expansion with each "cross product". The issue seems to be that the cycle context is not chained when computing disjunction cross products.

It also seems related to another problem where tracked references (for cycle detection) were inadvertently deleted. This would compound the issue. A fix for that has been sent out.

@mpvl
Copy link
Member

mpvl commented Sep 20, 2022

This will fix the performance issue by detecting cycles on the earliest possible point, preventing the combinatorial computation in the first place:

https://review.gerrithub.io/c/cue-lang/cue/+/543675/1

@mpvl mpvl removed NeedsInvestigation Triage Requires triage/attention labels Sep 20, 2022
cueckoo pushed a commit that referenced this issue Sep 23, 2022
This is no longer necessary with the latest tweaks in the
cycle detection algorithm. Also, it could cause a hang
in some cases.

Oddly, this seems to deepen some of the cycles. Unity shows
no performance impact, though.

This is related to, although not a full solution for,

Issue #1940

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: Iaeee44a7652fa1dbf2e52dc91a1f62e582bf93c6
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/543656
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Unity-Result: CUEcueckoo <cueckoo@cuelang.org>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants