Added a framework for creating PostgreSQL functions #82

Hydrocharged · 2023-12-23T12:34:22Z

This is the 2nd version of my attempt at implementing functions.

The first iteration hijacked *tree.FuncExpr and special-cased names of functions that matched the built-in ones for Postgres. Then it returned a new node that approximated the functionality. In some cases this was as simple as returning a different *vitess.FuncExpr node that pointed to a different function, however some cases did not have direct analogues in MySQL. Originally I was going to ignore those, but since I was working on functions anyway, I decided to tackle them to get it over with, similar to my approach with the entire AST conversion. Well that's when I started running into two key issues:

No straightforward conversion for some functions
Different behavior for similar functions

The second was born from the first, as I needed to extend my tests to make sure my massive nodes actually worked for most cases. However, extending the tests showed some issues that are somewhat fundamentally different to how MySQL approaches functions, with a big one being overloading.

The first was a major issue though. I'll post an example in another comment, but some of the nodes became almost unreadable, and they also took forever to create (multiple hours per function, hence the continual delays). The only real maintainable alternative would be to skip the AST conversion pipeline and jump straight to the GMS expressions, but that posed its own issues. Not only is there a lot of boilerplate for expressions, but some of the more sophisticated aspects (such as the origin of values, which I'm calling their Source) are specific to PostgreSQL, So I instead decided to modify the entire set of functions by replacing all of the built-ins (Postgres doesn't want MySQL's built-ins anyway) with a custom "compiled" function structure.

So now, this mimics a bit how overloaded stored procedures work. We define a set of functions under a single name, and those functions become overloads for that name. Whenever a function is called, we evaluate all of the parameters, and use their types to look for an exact match. If one is not found, then we allow some types to cast to others, and check while rotating type casts. If we still don't find anything, then we assume that the function does not exist. In addition, some functions allow for some casts and others don't, so we also store the original type before the cast. This way, a function only needs to worry about its own inputs, and the rest is handled automatically.

This does create the consequence of needing to use specific types (Integer, Float, etc.) for parameters and the return value. I considered allowing native Go types and using a context to host the additional information, but that made it a bit more difficult to handle the overloading and automatic casting, but I could be convinced that it's a better alternative (and probably easier to upgrade too as we add more functions).

On the note of functions, this only adds a few as a proof-of-concept. It's astronomically quicker to write these compared to the AST approach, and I'm going to use that test generation framework to get better testing coverage for functions, since the testing scope is orders of magnitude smaller compared to permutations of every statement, and the think-write-eval loop of coming up with tests manually takes quite a bit of time.

Last thing, all of the files that are related to the function compilation are prefixed with 0_. I wanted to prefix them with a single underscore instead, but apparently Go ignores files that start with an underscore. I didn't want to just dump them together with the functions without any special indication, because they'll end up lost in the sea of files. For example, which files in server/ast aren't nodes? Only way to know is to scroll through all 141 files, so I wanted to avoid that. I also tried separating packages, but then that required either a lone file that would get lost, or an init() in every file that adds functions to the catalog, and both of those seemed worse.

Hydrocharged · 2023-12-23T12:37:04Z

Not planning to actually merge this until I'm completely done, otherwise we'll probably have some regressions in the sqllogictests.

For fun, here's what the lcm function looked like in the node implementation:

func lcm(exprs vitess.Exprs) (vitess.Expr, bool, error) {
	if len(exprs) != 2 {
		return nil, false, ErrFunctionDoesNotExist
	}
	num1Expr := &vitess.FuncExpr{
		Name: vitess.NewColIdent("ABS"),
		Exprs: vitess.SelectExprs{
			&vitess.AliasedExpr{
				Expr: &vitess.ConvertExpr{
					Name: "CAST",
					Expr: exprs[0],
					Type: &vitess.ConvertType{
						Type: "SIGNED",
					},
				},
			},
		},
	}
	num2Expr := &vitess.FuncExpr{
		Name: vitess.NewColIdent("ABS"),
		Exprs: vitess.SelectExprs{
			&vitess.AliasedExpr{
				Expr: &vitess.ConvertExpr{
					Name: "CAST",
					Expr: exprs[1],
					Type: &vitess.ConvertType{
						Type: "SIGNED",
					},
				},
			},
		},
	}
	const (
		cteName    = "cte"
		num1Name   = "num1"
		num2Name   = "num2"
		num12Name   = "num12"
		num22Name   = "num22"
		nums2Name   = "nums2"
		resultName = "lcm"
	)
	return &vitess.ConvertExpr{
		Name: "CAST",
		Expr: &vitess.FuncExpr{
			Name: vitess.NewColIdent("IF"),
			Exprs: vitess.SelectExprs{
				&vitess.AliasedExpr{
					Expr: &vitess.AndExpr{
						Left:  &vitess.ComparisonExpr{
							Operator: vitess.EqualStr,
							Left:     num1Expr,
							Right:    vitess.NewIntVal([]byte("0")),
						},
						Right: &vitess.ComparisonExpr{
							Operator: vitess.EqualStr,
							Left:     num2Expr,
							Right:    vitess.NewIntVal([]byte("0")),
						},
					},
				},
				&vitess.AliasedExpr{
					Expr: vitess.NewIntVal([]byte("0")),
				},
				&vitess.AliasedExpr{
					Expr: &vitess.Subquery{
						Select: &vitess.Select{
							With: &vitess.With{
								Ctes: vitess.TableExprs{
									&vitess.CommonTableExpr{
										AliasedTableExpr: &vitess.AliasedTableExpr{
											Expr: &vitess.Subquery{
												Select: &vitess.SetOp{
													Type: vitess.UnionAllStr,
													Left: &vitess.Select{
														SelectExprs: vitess.SelectExprs{
															&vitess.AliasedExpr{
																Expr: num1Expr,
																As: vitess.NewColIdent(num1Name),
															},
															&vitess.AliasedExpr{
																Expr: num2Expr,
																As: vitess.NewColIdent(num2Name),
															},
														},
													},
													Right: &vitess.Select{
														SelectExprs: vitess.SelectExprs{
															&vitess.AliasedExpr{
																Expr: vitess.NewColName(num2Name),
															},
															&vitess.AliasedExpr{
																Expr: &vitess.BinaryExpr{
																	Operator: vitess.ModStr,
																	Left:     vitess.NewColName(num1Name),
																	Right:    vitess.NewColName(num2Name),
																},
															},
														},
														From: vitess.TableExprs{
															&vitess.AliasedTableExpr{
																Expr: vitess.TableName{
																	Name: vitess.NewTableIdent(cteName),
																},
															},
														},
														Where: &vitess.Where{
															Type: vitess.WhereStr,
															Expr: &vitess.ComparisonExpr{
																Operator: vitess.GreaterThanStr,
																Left:     vitess.NewColName(num2Name),
																Right:    vitess.NewIntVal([]byte("0")),
															},
														},
													},
												},
											},
											As: vitess.NewTableIdent(cteName),
										},
										Columns: vitess.Columns{
											vitess.NewColIdent(num1Name),
											vitess.NewColIdent(num2Name),
										},
									},
								},
								Recursive: true,
							},
							SelectExprs: vitess.SelectExprs{
								&vitess.AliasedExpr{
									Expr: &vitess.BinaryExpr{
										Operator: vitess.DivStr,
										Left:     &vitess.BinaryExpr{
											Operator: vitess.MultStr,
											Left:     vitess.NewColName(num12Name),
											Right:    vitess.NewColName(num22Name),
										},
										Right: vitess.NewColName(num1Name),
									},
									As: vitess.NewColIdent(resultName),
								},
							},
							From: vitess.TableExprs{
								&vitess.JoinTableExpr{
									LeftExpr: &vitess.AliasedTableExpr{
										Expr: vitess.TableName{
											Name: vitess.NewTableIdent(cteName),
										},
									},
									Join:      vitess.JoinStr,
									RightExpr: &vitess.AliasedTableExpr{
										Expr: &vitess.Subquery{
											Select: &vitess.Select{
												SelectExprs: vitess.SelectExprs{
													&vitess.AliasedExpr{
														Expr:  num1Expr,
														As:    vitess.NewColIdent(num12Name),
													},
													&vitess.AliasedExpr{
														Expr:  num2Expr,
														As:    vitess.NewColIdent(num22Name),
													},
												},
											},
										},
										As:   vitess.NewTableIdent(nums2Name),
									},
								},
							},
							Where: &vitess.Where{
								Type: vitess.WhereStr,
								Expr: &vitess.ComparisonExpr{
									Operator: vitess.EqualStr,
									Left:     vitess.NewColName(num2Name),
									Right:    vitess.NewIntVal([]byte("0")),
								},
							},
						},
					},
				},
			},
		},
		Type: &vitess.ConvertType{
			Type: "SIGNED",
		},
	}, true, nil
}

Compared to the current implementation...it's night and day. The current is also way more robust and accurate.

zachmu

Generally quite sensible, few overarching comments:

There are places in GMS where we use reflect.DeepEquals to compare expressions. We don't do this as part of analysis anymore, but it's used in various test frameworks. Because function fields never return true for DeepEquals, that's not gonna work with these. I don't think you need to do anything about this yet but keep it in mind.
Really strongly opposed to clearing out the MySQL functions at this stage. Leave them alone until we get this further along.
You should convert this to a non-draft and get it checked in ASAP so we can avoid stomping on eachother. I'm doing quite a bit of refactoring for prepared statements and I would rather have the conflicts sooner.

Nice work!

server/functions/0_catalog.go

server/functions/0_compiled_function.go

server/functions/0_overload_deduction.go

server/functions/0_parameters.go

server/functions/gcd.go

server/functions/round.go

server/functions/0_catalog.go

Hydrocharged requested a review from zachmu December 23, 2023 12:34

Hydrocharged marked this pull request as draft December 23, 2023 12:34

Hydrocharged force-pushed the daylon/functions2 branch from 3cb1ac6 to 2624078 Compare December 23, 2023 12:57

zachmu approved these changes Jan 1, 2024

View reviewed changes

Added a framework for creating PostgreSQL functions

5dd1b94

Hydrocharged force-pushed the daylon/functions2 branch from 2624078 to 5dd1b94 Compare January 3, 2024 11:02

Hydrocharged marked this pull request as ready for review January 3, 2024 11:29

Hydrocharged merged commit 9262b49 into main Jan 3, 2024
7 checks passed

Hydrocharged deleted the daylon/functions2 branch January 3, 2024 11:29

Porkepix mentioned this pull request Feb 15, 2024

doltgres 0.4.0 Homebrew/homebrew-core#162807

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a framework for creating PostgreSQL functions #82

Added a framework for creating PostgreSQL functions #82

Hydrocharged commented Dec 23, 2023

Hydrocharged commented Dec 23, 2023

zachmu left a comment

Added a framework for creating PostgreSQL functions #82

Added a framework for creating PostgreSQL functions #82

Conversation

Hydrocharged commented Dec 23, 2023

Hydrocharged commented Dec 23, 2023

zachmu left a comment

Choose a reason for hiding this comment