Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a framework for creating PostgreSQL functions #82

Merged
merged 1 commit into from
Jan 3, 2024

Conversation

Hydrocharged
Copy link
Collaborator

This is the 2nd version of my attempt at implementing functions.

The first iteration hijacked *tree.FuncExpr and special-cased names of functions that matched the built-in ones for Postgres. Then it returned a new node that approximated the functionality. In some cases this was as simple as returning a different *vitess.FuncExpr node that pointed to a different function, however some cases did not have direct analogues in MySQL. Originally I was going to ignore those, but since I was working on functions anyway, I decided to tackle them to get it over with, similar to my approach with the entire AST conversion. Well that's when I started running into two key issues:

  1. No straightforward conversion for some functions
  2. Different behavior for similar functions

The second was born from the first, as I needed to extend my tests to make sure my massive nodes actually worked for most cases. However, extending the tests showed some issues that are somewhat fundamentally different to how MySQL approaches functions, with a big one being overloading.

The first was a major issue though. I'll post an example in another comment, but some of the nodes became almost unreadable, and they also took forever to create (multiple hours per function, hence the continual delays). The only real maintainable alternative would be to skip the AST conversion pipeline and jump straight to the GMS expressions, but that posed its own issues. Not only is there a lot of boilerplate for expressions, but some of the more sophisticated aspects (such as the origin of values, which I'm calling their Source) are specific to PostgreSQL, So I instead decided to modify the entire set of functions by replacing all of the built-ins (Postgres doesn't want MySQL's built-ins anyway) with a custom "compiled" function structure.

So now, this mimics a bit how overloaded stored procedures work. We define a set of functions under a single name, and those functions become overloads for that name. Whenever a function is called, we evaluate all of the parameters, and use their types to look for an exact match. If one is not found, then we allow some types to cast to others, and check while rotating type casts. If we still don't find anything, then we assume that the function does not exist. In addition, some functions allow for some casts and others don't, so we also store the original type before the cast. This way, a function only needs to worry about its own inputs, and the rest is handled automatically.

This does create the consequence of needing to use specific types (Integer, Float, etc.) for parameters and the return value. I considered allowing native Go types and using a context to host the additional information, but that made it a bit more difficult to handle the overloading and automatic casting, but I could be convinced that it's a better alternative (and probably easier to upgrade too as we add more functions).

On the note of functions, this only adds a few as a proof-of-concept. It's astronomically quicker to write these compared to the AST approach, and I'm going to use that test generation framework to get better testing coverage for functions, since the testing scope is orders of magnitude smaller compared to permutations of every statement, and the think-write-eval loop of coming up with tests manually takes quite a bit of time.

Last thing, all of the files that are related to the function compilation are prefixed with 0_. I wanted to prefix them with a single underscore instead, but apparently Go ignores files that start with an underscore. I didn't want to just dump them together with the functions without any special indication, because they'll end up lost in the sea of files. For example, which files in server/ast aren't nodes? Only way to know is to scroll through all 141 files, so I wanted to avoid that. I also tried separating packages, but then that required either a lone file that would get lost, or an init() in every file that adds functions to the catalog, and both of those seemed worse.

@Hydrocharged Hydrocharged marked this pull request as draft December 23, 2023 12:34
@Hydrocharged
Copy link
Collaborator Author

Not planning to actually merge this until I'm completely done, otherwise we'll probably have some regressions in the sqllogictests.

For fun, here's what the lcm function looked like in the node implementation:

func lcm(exprs vitess.Exprs) (vitess.Expr, bool, error) {
	if len(exprs) != 2 {
		return nil, false, ErrFunctionDoesNotExist
	}
	num1Expr := &vitess.FuncExpr{
		Name: vitess.NewColIdent("ABS"),
		Exprs: vitess.SelectExprs{
			&vitess.AliasedExpr{
				Expr: &vitess.ConvertExpr{
					Name: "CAST",
					Expr: exprs[0],
					Type: &vitess.ConvertType{
						Type: "SIGNED",
					},
				},
			},
		},
	}
	num2Expr := &vitess.FuncExpr{
		Name: vitess.NewColIdent("ABS"),
		Exprs: vitess.SelectExprs{
			&vitess.AliasedExpr{
				Expr: &vitess.ConvertExpr{
					Name: "CAST",
					Expr: exprs[1],
					Type: &vitess.ConvertType{
						Type: "SIGNED",
					},
				},
			},
		},
	}
	const (
		cteName    = "cte"
		num1Name   = "num1"
		num2Name   = "num2"
		num12Name   = "num12"
		num22Name   = "num22"
		nums2Name   = "nums2"
		resultName = "lcm"
	)
	return &vitess.ConvertExpr{
		Name: "CAST",
		Expr: &vitess.FuncExpr{
			Name: vitess.NewColIdent("IF"),
			Exprs: vitess.SelectExprs{
				&vitess.AliasedExpr{
					Expr: &vitess.AndExpr{
						Left:  &vitess.ComparisonExpr{
							Operator: vitess.EqualStr,
							Left:     num1Expr,
							Right:    vitess.NewIntVal([]byte("0")),
						},
						Right: &vitess.ComparisonExpr{
							Operator: vitess.EqualStr,
							Left:     num2Expr,
							Right:    vitess.NewIntVal([]byte("0")),
						},
					},
				},
				&vitess.AliasedExpr{
					Expr: vitess.NewIntVal([]byte("0")),
				},
				&vitess.AliasedExpr{
					Expr: &vitess.Subquery{
						Select: &vitess.Select{
							With: &vitess.With{
								Ctes: vitess.TableExprs{
									&vitess.CommonTableExpr{
										AliasedTableExpr: &vitess.AliasedTableExpr{
											Expr: &vitess.Subquery{
												Select: &vitess.SetOp{
													Type: vitess.UnionAllStr,
													Left: &vitess.Select{
														SelectExprs: vitess.SelectExprs{
															&vitess.AliasedExpr{
																Expr: num1Expr,
																As: vitess.NewColIdent(num1Name),
															},
															&vitess.AliasedExpr{
																Expr: num2Expr,
																As: vitess.NewColIdent(num2Name),
															},
														},
													},
													Right: &vitess.Select{
														SelectExprs: vitess.SelectExprs{
															&vitess.AliasedExpr{
																Expr: vitess.NewColName(num2Name),
															},
															&vitess.AliasedExpr{
																Expr: &vitess.BinaryExpr{
																	Operator: vitess.ModStr,
																	Left:     vitess.NewColName(num1Name),
																	Right:    vitess.NewColName(num2Name),
																},
															},
														},
														From: vitess.TableExprs{
															&vitess.AliasedTableExpr{
																Expr: vitess.TableName{
																	Name: vitess.NewTableIdent(cteName),
																},
															},
														},
														Where: &vitess.Where{
															Type: vitess.WhereStr,
															Expr: &vitess.ComparisonExpr{
																Operator: vitess.GreaterThanStr,
																Left:     vitess.NewColName(num2Name),
																Right:    vitess.NewIntVal([]byte("0")),
															},
														},
													},
												},
											},
											As: vitess.NewTableIdent(cteName),
										},
										Columns: vitess.Columns{
											vitess.NewColIdent(num1Name),
											vitess.NewColIdent(num2Name),
										},
									},
								},
								Recursive: true,
							},
							SelectExprs: vitess.SelectExprs{
								&vitess.AliasedExpr{
									Expr: &vitess.BinaryExpr{
										Operator: vitess.DivStr,
										Left:     &vitess.BinaryExpr{
											Operator: vitess.MultStr,
											Left:     vitess.NewColName(num12Name),
											Right:    vitess.NewColName(num22Name),
										},
										Right: vitess.NewColName(num1Name),
									},
									As: vitess.NewColIdent(resultName),
								},
							},
							From: vitess.TableExprs{
								&vitess.JoinTableExpr{
									LeftExpr: &vitess.AliasedTableExpr{
										Expr: vitess.TableName{
											Name: vitess.NewTableIdent(cteName),
										},
									},
									Join:      vitess.JoinStr,
									RightExpr: &vitess.AliasedTableExpr{
										Expr: &vitess.Subquery{
											Select: &vitess.Select{
												SelectExprs: vitess.SelectExprs{
													&vitess.AliasedExpr{
														Expr:  num1Expr,
														As:    vitess.NewColIdent(num12Name),
													},
													&vitess.AliasedExpr{
														Expr:  num2Expr,
														As:    vitess.NewColIdent(num22Name),
													},
												},
											},
										},
										As:   vitess.NewTableIdent(nums2Name),
									},
								},
							},
							Where: &vitess.Where{
								Type: vitess.WhereStr,
								Expr: &vitess.ComparisonExpr{
									Operator: vitess.EqualStr,
									Left:     vitess.NewColName(num2Name),
									Right:    vitess.NewIntVal([]byte("0")),
								},
							},
						},
					},
				},
			},
		},
		Type: &vitess.ConvertType{
			Type: "SIGNED",
		},
	}, true, nil
}

Compared to the current implementation...it's night and day. The current is also way more robust and accurate.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally quite sensible, few overarching comments:

  1. There are places in GMS where we use reflect.DeepEquals to compare expressions. We don't do this as part of analysis anymore, but it's used in various test frameworks. Because function fields never return true for DeepEquals, that's not gonna work with these. I don't think you need to do anything about this yet but keep it in mind.

  2. Really strongly opposed to clearing out the MySQL functions at this stage. Leave them alone until we get this further along.

  3. You should convert this to a non-draft and get it checked in ASAP so we can avoid stomping on eachother. I'm doing quite a bit of refactoring for prepared statements and I would rather have the conflicts sooner.

Nice work!

server/functions/0_catalog.go Outdated Show resolved Hide resolved
server/functions/0_compiled_function.go Outdated Show resolved Hide resolved
server/functions/0_overload_deduction.go Outdated Show resolved Hide resolved
server/functions/0_parameters.go Outdated Show resolved Hide resolved
server/functions/0_parameters.go Outdated Show resolved Hide resolved
server/functions/0_parameters.go Outdated Show resolved Hide resolved
server/functions/0_parameters.go Outdated Show resolved Hide resolved
server/functions/gcd.go Outdated Show resolved Hide resolved
server/functions/round.go Outdated Show resolved Hide resolved
server/functions/0_catalog.go Outdated Show resolved Hide resolved
@Hydrocharged Hydrocharged marked this pull request as ready for review January 3, 2024 11:29
@Hydrocharged Hydrocharged merged commit 9262b49 into main Jan 3, 2024
7 checks passed
@Hydrocharged Hydrocharged deleted the daylon/functions2 branch January 3, 2024 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants