Detect parameter nullability from query and suggest correct parameter functions #7
Comments
The 'best' way to do this would be to make bindings against the native C PostgreSQL API and just call the parser it provides: https://wiki.postgresql.org/wiki/Query_Parsing This is what Node.js tools like https://github.com/zhm/pg-query-parser do, and it wouldn't be incredibly hard to do. Then your task just changes to a tree traversal. To do this, you'd need:
This could be put into a separate NuGet package (NuGet supports native dependencies), or bundled in this library. |
@baronfel Last time we checked, the native parser was not cross-platform. It wasn't working on Windows, and the fork that made it work isn't up to date anymore. I would love to use it if that is possible now |
To clarify, all types are nullable in PostgreSQL without any exclusion, so it correctly detects them. What you're trying to achieve is to get information about applied constraints. From the documentation:
To support the feature at least partially you can determine nullability for domain types since you can take a value of the |
@YohDeadfall There are many cases where we know exactly when an input parameter should not be null. The easiest example is when inserting values into tables for non-nullable columns. There are many other cases when comparing values of tables against values of parameters: Detecting a bunch of those cases would make all the difference in this analyzer. Even though null is allowed from the database perspective, in many cases the database clients don't want to provide nulls by mistake. |
libpg_query#33 Windows Support has been stale for quite some time and there doesn't seem to be interest in making it work out of the box. My knowledge of C is too limited to know what the library needs for Windows support |
Yeah, I just pointed out that nullability isn't a type property (except for domains), but a constraint on a field. So what you need is to detect relationships between supplied parameters and their usages. But there is one thing which makes your life easier. You can't compare any value with null using operators, so in those cases you can be sure that a value isn't nullable. For example, the following query will return you
-- @p is new NpgsqlParameter<int?>("p", null)
SELECT @p = NULL::integer;
But Therefore, if a parameter doesn't participate in a function, not a part of |
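The comparison behavior described here is easy to check. The sketch below uses Python's built-in sqlite3 module as a stand-in, on the assumption that SQLite and PostgreSQL agree on these basic NULL rules for `=`, `+` and `IS NULL`. The `eval_scalar` helper is invented for this example, and the PostgreSQL-specific `::integer` cast and Npgsql `@p` syntax are not reproduced.

```python
import sqlite3

def eval_scalar(sql, params=()):
    """Run a single-value SELECT on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, params).fetchone()[0]
    finally:
        conn.close()

# A NULL parameter compared with an operator yields NULL (None in Python),
# not true or false.
eq_null = eval_scalar("SELECT ? = NULL", (None,))

# IS NULL is the exception: it always yields a definite answer.
is_null = eval_scalar("SELECT ? IS NULL", (None,))

# Arithmetic with NULL propagates NULL as well.
plus_null = eval_scalar("SELECT 100 + NULL")

# With a non-NULL parameter the comparison gives a definite result.
eq_value = eval_scalar("SELECT ? = 42", (42,))
```

In this sketch only the `IS NULL` query and the non-NULL comparison produce a usable boolean; the other two come back as `None`.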
Thanks @YohDeadfall I think that does make life easier. I think this project can start with parsing simple queries and try to detect nullability when possible, then fall back to "I don't know whether this is nullable or not" for more complicated queries. Let's see how it goes first because detecting the usages will be the hard part 😄 |
If you would like to implement a full parser, take a look at https://github.com/cockroachdb/cockroach written in Go (see contents of /pkg/sql/parser). It's a PostgreSQL-like database engine, so it should be mostly compatible. It's okay to ping me and ask for help. |
I will check it out, thanks a lot @YohDeadfall ❤️ |
This is the SQL behavior. If you compare NULL (stored in a field, in a variable, or written as a literal) with anything else, and the comparison is not IS NULL or IS NOT NULL, the expression returns NULL. Likewise 100 + NULL returns NULL, and so on. Calling standard functions with NULL parameters might or might not return NULL: sin(NULL) returns NULL, but a custom function might not. What do you do if you have expressions that combine multiple parameters, or case expressions? Such a parser would have to compute the types/nullability of an expression in the same way PostgreSQL does.
I would say parsing is the easy part. The harder part is establishing the rules that compute the nullability. Imo, we should define the rules for the computation first and go from there.
re: Parsers, sorry, it might be an unpopular opinion in this context, but I am a fan of parser generators such as ANTLR. I haven't seen anything (i.e. articles) that compares parser generators against FP parser combinators. I know that the T-SQL parser produced by Microsoft uses ANTLR. Quite a few years back, I was curious about how PostgreSQL does the parsing (I think it was version 7), and they used a bison/yacc flavor grammar. I would definitely not write SQL parsers by hand. |
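One way to make "define the rules first" concrete is a small rule table over a toy expression tree. This is only a sketch under assumptions: the node types (`Literal`, `Column`, `Param`, `Call`, `IsNull`), the `STRICT_FUNCTIONS` set and the three-way answer are invented for illustration and are not the analyzer's or PostgreSQL's actual model. An unknown function falls back to "I don't know", matching the point that a custom function might not return NULL for NULL input.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Literal:
    value: Optional[object]   # None models the SQL NULL literal

@dataclass(frozen=True)
class Column:
    name: str
    not_null: bool            # taken from the schema (NOT NULL constraint)

@dataclass(frozen=True)
class Param:
    name: str                 # a caller may always pass NULL for a parameter

@dataclass(frozen=True)
class Call:
    func: str                 # operator or function name
    args: Tuple[object, ...]

@dataclass(frozen=True)
class IsNull:
    operand: object

# Hypothetical list: functions/operators assumed to return NULL exactly
# when at least one argument is NULL.
STRICT_FUNCTIONS = {"=", "<", ">", "+", "-", "sin"}

def can_be_null(expr) -> Optional[bool]:
    """Return True or False when known, None for "I don't know"."""
    if isinstance(expr, Literal):
        return expr.value is None
    if isinstance(expr, Column):
        return not expr.not_null
    if isinstance(expr, Param):
        return True
    if isinstance(expr, IsNull):
        return False          # IS NULL always yields true or false
    if isinstance(expr, Call):
        if expr.func not in STRICT_FUNCTIONS:
            return None       # e.g. a custom function: result unknown
        answers = [can_be_null(arg) for arg in expr.args]
        if any(answer is True for answer in answers):
            return True
        if any(answer is None for answer in answers):
            return None
        return False
    raise TypeError(f"unsupported node: {expr!r}")
```

For example, `can_be_null(Call("+", (Literal(100), Literal(None))))` follows the `100 + NULL` case from the comment, while a call to a function outside the assumed strict set yields `None`. Joins, CASE expressions and functions that can return NULL for non-NULL input are deliberately left out.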
@costa100 I don't want to write the parser by hand if there is something usable in .NET targeting
Yeah that is the plan, but in the beginning I don't plan on supporting all possibilities of null detection. Starting with commonly used expressions and queries at first so I don't really need the full language parser but if it is available, I would love to use it 😄 |
My question is this. Look at the Microsoft Transact-SQL parser: https://docs.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.transactsql.scriptdom.tsqlparser?view=sql-dacfx-150. Is this where you want to go ultimately for your PostgreSQL parser, or perhaps a subset of the SQL statements? Anyway, I found this thread: There are links to ANTLR grammars. This seems to be the most up-to-date: https://github.com/pgcodekeeper/pgcodekeeper/blob/master/apgdiff/antlr-src/SQLParser.g4. ANTLR can generate C# code; you can configure the grammar to target C#. And interestingly enough, this is the link to the PostgreSQL yacc grammar: https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y. Side note, I love the code in the PostgreSQL yacc file - it is very readable. |
Thanks a lot for the links @costa100 💯 I will look into the code-gen story for C#, I haven't worked with it before but it looks doable 😄 |
Alright, so to come up with a proof of concept, I started using FParsec and oh boy it is good! This short parser implementation can already parse
testSelect """
    SELECT username, email
    FROM users
    WHERE user_id IN (SELECT id FROM user_ids WHERE id IS NOT NULL)
""" {
    SelectExpr.Default with
        Columns = [Expr.Ident "username"; Expr.Ident "email"]
        From = Some (Expr.Ident "users")
        Where = Some (Expr.In(Expr.Ident "user_id", Expr.Query(TopLevelExpr.Select {
            SelectExpr.Default with
                Columns = [Expr.Ident "id"]
                From = Some (Expr.Ident "user_ids")
                Where = Some(Expr.Not(Expr.Equals(Expr.Null, Expr.Ident "id")))
        })))
}
I think building a simplified AST in F# makes the type inference and pattern matching easier later on. The parser doesn't need to actually check and validate the query, we let the database do that, so here we only use it for finding parameters and detecting which ones may be null. I think things will start getting real complicated when I have to take table joins into account and how that affects nullability of parameters. If you guys want to help out with the implementation, feel free to join in and hack away 😄 |
PostgreSQL is a very flexible and extensible database engine which is built around itself, so all required information about its own functions and operators can be obtained from system tables. There are only three types of objects: types, attributes and functions. For all types except some domain types To understand does some type supports Expressions consist of operators and functions. An operator is simply a name plus references to the functions which actually get executed. As @costa100 said, the result of a function depends on its inputs. Some functions allow Having this information, the analyzer can query the server for the interesting functions and types, or for all data at once to cache it. To understand what function, operator, type or table the user means in the query
If there is no |
Don't know F#, but if there are simple issues where I can learn the language and help, ping me. |
@Zaid-Ajaj that sounds odd... I may not be understanding what you're trying to do here, but it seems entirely reasonable to try to compare a nullable value to a non-nullable value. As has been noted above, when the nullable value is actually null, the result of the expression is null (which evaluates to false if it's the WHERE predicate) - but that's fine, isn't it? That's very different from trying to insert a null into a non-nullable column, in which case an error is triggered... Stepping back a bit, I'll just say that in Entity Framework Core we deal with nullability calculation a lot, and it is an extremely difficult subject to get right (and to do it efficiently). I don't really know what's being attempted here - maybe a bit more context would be helpful. |
It's just an analyzer which helps with hints and warnings, like the Rider you love (: It may not cover edge cases. |
@roji First we have to understand that this project simply provides embedded static SQL analysis when using the library Npgsql.FSharp, so it validates the usage from the F# code against the structure of the query, the database schema definitions and the used parameters. The analyzer is already able to properly validate F# code that incorrectly reads values from a result set (see gif on README). However, when it comes to providing parameters, the analyzer doesn't know whether a parameter should be nullable, so it cannot properly correct the corresponding F# code usage.
Scenario nr. 1: checking equality with a non-nullable column
CREATE TABLE users (user_id serial primary key, username text not null)
And the query
SELECT * FROM users WHERE user_id = @user_id
What should the type of
Sql.int 42
Sql.intOrNone <Some int | None>
Sql.dbnull
I want to extend the analyzer to eliminate the incorrect suggestions of
Scenario nr. 2: inserting values into non-nullable columns
Take the query
INSERT INTO users (username) VALUES (@input_username)
Again here, the parameter
Sql.string
Sql.stringOrNone
Sql.dbnull
For this case, I want to eliminate the suggestions
I don't plan on supporting every single scenario and every single edge case. I just want to obtain more information about what values are allowed for parameters and give the users of Npgsql.FSharp more correct code usage hints. |
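The second scenario can be sketched as a lookup from each INSERT column to its NOT NULL flag. Everything here is assumed for illustration: the `suggestions` and `insert_suggestions` helpers, the hand-written `schema` dictionary and the positional column-to-parameter pairing stand in for what the analyzer would get from the database schema and a parsed query. Only the `Sql.string`, `Sql.stringOrNone` and `Sql.dbnull` names come from the thread.

```python
from typing import Dict, List, Tuple

def suggestions(sql_type: str, nullable: bool) -> List[str]:
    """Parameter functions to suggest for a single parameter."""
    if nullable:
        return [f"Sql.{sql_type}", f"Sql.{sql_type}OrNone", "Sql.dbnull"]
    # Non-nullable target: only the plain function makes sense.
    return [f"Sql.{sql_type}"]

# Hypothetical schema information: table -> column -> (type name, NOT NULL?)
schema: Dict[str, Dict[str, Tuple[str, bool]]] = {
    "users": {"user_id": ("int", True), "username": ("string", True)},
}

def insert_suggestions(table: str, columns: List[str],
                       params: List[str]) -> Dict[str, List[str]]:
    """Pair the INSERT column list with the VALUES parameters by position."""
    result = {}
    for column, param in zip(columns, params):
        sql_type, not_null = schema[table][column]
        result[param] = suggestions(sql_type, nullable=not not_null)
    return result

# INSERT INTO users (username) VALUES (@input_username)
hints = insert_suggestions("users", ["username"], ["@input_username"])
```

Under these assumptions `hints` contains only `Sql.string` for `@input_username`, because `username` is declared `not null`. A nullable column would keep all three suggestions.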
@Zaid-Ajaj thanks for the detailed description! So as I wrote above, the second scenario makes a lot more sense to me than the first, since a null will definitely cause an error. For the first case, if your user happens to have an intOrNone (which I assume is similar to a nullable int?; I'm an F# noob...), isn't it perfectly reasonable to pass that as a parameter in that query? Passing a dbnull indeed doesn't make much sense, even if it still doesn't seem incorrect in the same sense as the second scenario (a warning does seem right in this case). |
@roji I admit that in the 1st scenario, I am making some assumptions, namely that someone wouldn't want to test a non-nullable column against a nullable value. In my opinion I find it more reasonable than suggesting to use Yeah
All hints and suggestions triggered by this analyzer are warnings |
OK, really interesting stuff - I'd be very interested in seeing where it goes. Note that Npgsql itself contains a basic lexical parser for the PG SQL dialect, since it needs to convert from named |
@roji @YohDeadfall @costa100 in case you are interested, today I went live on stream and implemented an initial prototype of what this would look like. I managed to make the analyzer give correct suggestions for INSERT queries that use non-nullable columns. You can watch the stream here on Twitch |
Right now, when we derive the required parameters for a query, we only know what the type of the parameter is. However, we don't know whether a parameter is allowed to be null or not. Currently, when the analyzer detects incorrect usage of a SQL parameter, it suggests Sql.{type}, Sql.{type}orNone and Sql.dbnull. For non-nullable parameter types, it should only suggest Sql.{type} to use as parameter.
According to Npgsql#3115 Unable to detect parameter nullability when using DeriveParameters, this is a limitation of PostgreSQL itself when it describes the types of the available parameters.
Our best bet is to build a SQL parser for the query and detect parameter nullability ourselves by using the definitions from the schema of the database, as well as detecting untyped parameters and inferring their types. For example, WHERE @p > 10 should infer that @p is int or bigint.
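Before any real parser exists, the `WHERE @p > 10` case could be approximated by looking at the literal on the other side of the comparison. This is a rough sketch: the regular expression, the `infer_param_types` helper and the int/bigint cutoff at the 32-bit range are assumptions made for this example, not how the analyzer works, and it only handles a parameter compared directly against an integer literal on its right-hand side.

```python
import re

# Matches "@name <comparison operator> <integer literal>", e.g. "@p > 10".
# The lookahead rejects literals such as 10.5 or 10abc.
PARAM_VS_INT = re.compile(
    r"(@\w+)\s*(?:<=|>=|<>|!=|=|<|>)\s*(-?\d+)(?![\w.])"
)

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def infer_param_types(query: str) -> dict:
    """Map each parameter compared to an integer literal to candidate types."""
    inferred = {}
    for name, literal in PARAM_VS_INT.findall(query):
        value = int(literal)
        if INT32_MIN <= value <= INT32_MAX:
            # The literal fits in 32 bits, so both types stay possible.
            inferred[name] = ["int", "bigint"]
        else:
            inferred[name] = ["bigint"]
    return inferred

candidates = infer_param_types("SELECT * FROM users WHERE @p > 10")
```

A regex like this breaks down quickly (string literals, comments, a parameter on the right-hand side), which is one reason the thread leans toward a proper parser and AST.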