Better support for byte arrays #228
Update: for varbinary column

@ChrisJollyAU Please post some test names that demonstrate these issues, so I can tinker with them.

For the contains:
For the one with length on varbinary:
You can pretty much just filter on
So, it's not pretty, but converting the binary data to a Unicode string first and then using the normal string functions on it seems to work. For example, the result of this query is wrong:

```sql
SELECT `s`.`Id`, `s`.`Banner`, `s`.`Banner5`, `s`.`InternalNumber`, `s`.`Name`
FROM `Squads` AS `s`
WHERE INSTR(1, `s`.`Banner`, 0x01, 0) > 0
```

But the result of this one is correct:

```sql
SELECT `s`.`Id`, `s`.`Banner`, `s`.`Banner5`, `s`.`InternalNumber`, `s`.`Name`
FROM `Squads` AS `s`
WHERE INSTR(1, STRCONV(`s`.`Banner`, 64), 0x01, 0) > 0
```

The docs state:
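So the following translation should work:

```csharp
// Translates byteArray.Contains(value) to INSTR(1, STRCONV(source, 64), value, 0) > 0,
// i.e. the binary data is converted to a Unicode string before searching it.
return _sqlExpressionFactory.GreaterThan(
    _sqlExpressionFactory.Function(
        "INSTR",
        new[]
        {
            // start position
            _sqlExpressionFactory.Constant(1),
            // string to search in: the binary data converted with STRCONV(..., 64) (vbUnicode)
            _sqlExpressionFactory.Function(
                "STRCONV",
                new[] { source, _sqlExpressionFactory.Constant(64) },
                nullable: true,
                argumentsPropagateNullability: new[] { true, false },
                typeof(string)),
            // value to search for
            value,
            // compare mode 0: binary comparison
            _sqlExpressionFactory.Constant(0)
        },
        nullable: true,
        // one entry per INSTR argument: only the searched string and the value propagate nulls
        argumentsPropagateNullability: new[] { false, true, true, false },
        typeof(int)),
    _sqlExpressionFactory.Constant(0));
```

I haven't checked whether we should force a specific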
Would it not be better to try to convert the
That shouldn't work. Remember, the issue is that there seems to be no function in Jet to search a binary/blob/byte array, so we need to use a function that searches strings instead. Strings in Jet are Unicode (basically UCS-2), so every character is represented by two bytes. Therefore, in the clause
If you used
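The two-bytes-per-character point can be illustrated with a small C# simulation. It is a sketch under stated assumptions: a byte array handed to a string function is modeled as UCS-2 text (two bytes per character, little-endian, odd lengths padded with `0x00`), and `STRCONV(..., 64)` is modeled as turning each byte into its own character, which assumes a code page that maps byte values to the same code points (true for `0x00` to `0x7F` in common ANSI code pages). `AsUcs2`, `Widen` and `Contains` are hypothetical helpers, not Jet or EF Core functions.

```csharp
using System;

// Hypothetical model: a byte array passed to a Jet string function is read as UCS-2 text,
// two bytes per character (little-endian), with odd-length data padded by one 0x00 byte.
string AsUcs2(byte[] data)
{
    var padded = new byte[data.Length + data.Length % 2];
    Array.Copy(data, padded, data.Length);
    var chars = new char[padded.Length / 2];
    for (var i = 0; i < chars.Length; i++)
        chars[i] = (char)(padded[2 * i] | (padded[2 * i + 1] << 8));
    return new string(chars);
}

// Hypothetical model of STRCONV(x, 64): every byte becomes its own character.
// ASSUMPTION: the code page maps each byte value to the same code point.
string Widen(byte[] data)
{
    var chars = new char[data.Length];
    for (var i = 0; i < data.Length; i++)
        chars[i] = (char)data[i];
    return new string(chars);
}

// Model of INSTR(1, text, search, 0) > 0 with a binary (ordinal) comparison.
bool Contains(string text, string search) =>
    text.IndexOf(search, StringComparison.Ordinal) >= 0;

var banner = new byte[] { 0x02, 0x01 };  // contains 0x01, but at an odd offset
var needle = Widen(new byte[] { 0x01 }); // the search value as a single character

Console.WriteLine(Contains(AsUcs2(banner), needle)); // False: 0x02, 0x01 forms U+0102
Console.WriteLine(Contains(Widen(banner), needle));  // True: each byte is its own character
```

In this model, a raw `0x01` only matches when it sits at an even offset and is followed by `0x00`; after widening, any occurrence matches.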
I'm going to play around with it a bit more. You will only ever be looking for 1 byte. There are byte versions of most string-related functions.

Also, just a comment on the length with

Same mechanism as explained here in this post. The solution should be to convert to a Unicode string first, then use the non-B version (which is the same as …). There don't seem to be any functions targeting binary/blob/byte arrays at all in Jet.
Yes and no. Jet has no dedicated byte type, so all such data is read into strings. Since strings are Unicode-based, their size is allocated in as many Unicode (double-byte) characters as the data needs to fit. But the data within them is still single-byte data. Most string-related functions have a
The functions of The opposite of
Now that I'm typing this, it occurs to me that the value to search FOR is still a string type, and since it is created using Unicode conventions, even a single byte ultimately reaches the function as a one-character (2-byte) Unicode parameter. I was hoping to only need to convert the small amount of data to search for, rather than the bigger string to search IN.

Slightly off topic, but helpful if you need it for debugging: you can access two nice functions that are also supported in Jet:
`Expr6` produces `String`, and `Expr7` produces `8` (`vbString`).
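The point above about the search value can be made concrete with a minimal C# model of a byte-wise search over the stored data. It assumes, as described in that comment, that a single byte to search for arrives as one 2-byte Unicode character (`0x01, 0x00`), and that odd-length data is padded with `0x00`. `PadToChars` and `InStrBModel` are hypothetical helpers for illustration, not Jet functions.

```csharp
using System;

// Hypothetical model of the stored data: odd-length data is padded with one 0x00 byte.
byte[] PadToChars(byte[] data)
{
    var padded = new byte[data.Length + data.Length % 2];
    Array.Copy(data, padded, data.Length);
    return padded;
}

// Hypothetical model of a byte-wise search: returns the 1-based byte position of
// `pattern` in `bytes`, or 0 if it is absent (mirroring an INSTR-style result).
int InStrBModel(byte[] bytes, byte[] pattern)
{
    for (var i = 0; i + pattern.Length <= bytes.Length; i++)
    {
        var match = true;
        for (var j = 0; j < pattern.Length && match; j++)
            match = bytes[i + j] == pattern[j];
        if (match)
            return i + 1;
    }
    return 0;
}

// ASSUMPTION: the single byte 0x01 reaches the function as one Unicode character,
// i.e. the two bytes 0x01, 0x00.
var search = new byte[] { 0x01, 0x00 };

Console.WriteLine(InStrBModel(PadToChars(new byte[] { 0x01, 0x02 }), search));       // 0: 0x01 is followed by 0x02
Console.WriteLine(InStrBModel(PadToChars(new byte[] { 0x02, 0x03, 0x01 }), search)); // 3: 0x01 is followed by the pad byte
```

In this model, whether the byte is found depends on the byte that follows it, which is why converting only the small search value is not enough.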
@ChrisJollyAU Well, that's good. If most of the B-functions work as you expect, then use them that way. The fact that

The following test would currently fail:

```csharp
[ConditionalTheory]
[MemberData(nameof(IsAsyncData))]
public virtual async Task Byte_array_filter_by_length_literal2(bool async)
{
    await AssertQuery(
        async,
        ss => ss.Set<Squad>().Where(w => w.Banner5.Length == 5),
        ss => ss.Set<Squad>().Where(w => w.Banner5 != null && w.Banner5.Length == 5));

    AssertSql(
"""
SELECT `s`.`Id`, `s`.`Banner`, `s`.`Banner5`, `s`.`InternalNumber`, `s`.`Name`
FROM `Squads` AS `s`
WHERE LENB(`s`.`Banner5`) = 5
""");
}
```

(It would succeed with

This would also be the case if we translated it to this:

```sql
SELECT `s`.`Id`, `s`.`Banner`, `s`.`Banner5`, `s`.`InternalNumber`, `s`.`Name`
FROM `Squads` AS `s`
WHERE LEN(STRCONV(`s`.`Banner5`, 64)) = 5
```

Looks like

The same is true for the
@lauxjpn It's Jet. Weirdness abounds. `Byte_array_contains_literal` now works, but it still fails when using a parameter or another column (`Byte_array_contains_parameter`). Edit:
Got a theory for the length:
So anything with a length query becomes something like
This does work for the current test cases. The ONLY exception I can see is if your byte data has an even number of bytes and is MEANT to end in a 0 (0x00). Then your length will be 1 short of what you want. This is probably the best we can do under these limitations, and probably good enough for the simple stuff. If you have any large data, it's probably better to load it into the client first and then do whatever you want with it.

Slight tangent: at least in SQL Server, you can use the string functions on
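The length theory and its one exception can be modeled in a few lines of C#. This is a sketch that assumes, consistent with the observations in this thread, that odd-length data is stored padded with a single `0x00` byte; `Stored` and `EstimatedLength` are hypothetical names, and a query translation would have to express the same check in SQL.

```csharp
using System;

// ASSUMPTION: odd-length data is stored padded with one 0x00 byte, so the reported
// byte count (as with LENB) is always even.
byte[] Stored(byte[] data)
{
    var padded = new byte[data.Length + data.Length % 2];
    Array.Copy(data, padded, data.Length);
    return padded;
}

// The length theory: the reported byte count, minus one if the last stored byte is 0x00.
int EstimatedLength(byte[] stored) =>
    stored.Length > 0 && stored[^1] == 0x00 ? stored.Length - 1 : stored.Length;

var odd = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05 };
var evenEndingInZero = new byte[] { 0x01, 0x00 };

Console.WriteLine(EstimatedLength(Stored(odd)));              // 5: the pad byte is discounted
Console.WriteLine(EstimatedLength(Stored(evenEndingInZero))); // 1: one short of the real length, 2
```

Odd-length data that ends in `0x00` is still counted correctly, because only the pad byte is discounted; the failure is limited to even-length data whose last real byte is `0x00`.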
I think there are two ways to deal with this issue:
BTW, in theory, cases that use a fixed binary column length that is even can be translated normally, in case you want to implement it.
I think if I'm going to do anything, I would add an option in
How do you mean? In
I think that is inferior to using an extension method, and kind of dangerous. With an extension method, users can apply the workaround precisely where needed, which is what we should support for a translation that has quirks. I would not want to globally enable this workaround in my user code, because it could change the behavior of queries that I add to my code in the future, without any warning. That could easily lead to subtle, very hard-to-trace bugs again (that we would probably only hit once the app runs in production), which we should avoid at all costs.
You should generally be able to access the type mapping, which contains the size of the field if the field has a fixed size. If it is not available in the expression visitor you currently use to translate the expression, you can check it later once it is available.
What about having both options? If users want to turn it on for all cases, they can do that, while `EF.Functions` support remains available if they just want to apply it selectively.
I'll find a test case where I can look at this scenario.
While that is of course technically possible, it still does not solve the issue that using the global option has a high chance of introducing subtle bugs into code that I, as a user, have not even written at the time I enable that option. I would avoid this at all costs. We should set up our users for success, not for failure. Throwing an exception to prompt the user to rewrite a query or use the extension method is cheap.
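The approach argued for in this exchange (translate normally when a fixed-length column has an even size, use the heuristic only when the user explicitly opts in for a specific query, and throw otherwise) can be sketched as a small decision function. This is a hypothetical illustration in plain C#: `ChooseLengthTranslation`, its parameters, the returned strategy names and the exception message are not EFCore.Jet APIs, and the fixed-size input stands in for the size available from the type mapping.

```csharp
using System;

// Hypothetical decision for translating byteArray.Length; all names are illustrative only.
// fixedColumnSize: the size of a fixed-length binary column (e.g. from the type mapping), else null.
// explicitOptIn: the user asked for the workaround for this query (e.g. via an extension method).
string ChooseLengthTranslation(int? fixedColumnSize, bool explicitOptIn)
{
    // ASSUMPTION (suggested in this thread): a fixed, even size leaves no pad byte,
    // so the reported byte count is already exact.
    if (fixedColumnSize is int size && size % 2 == 0)
        return "LENB";

    // The user explicitly accepted the trailing-0x00 heuristic for this query only.
    if (explicitOptIn)
        return "LENB minus trailing 0x00";

    // Otherwise fail loudly instead of silently returning a possibly wrong length.
    throw new InvalidOperationException(
        "The length of this byte array cannot be translated reliably for Jet. " +
        "Rewrite the query or opt into the workaround explicitly.");
}

Console.WriteLine(ChooseLengthTranslation(4, explicitOptIn: false));   // LENB
Console.WriteLine(ChooseLengthTranslation(null, explicitOptIn: true)); // LENB minus trailing 0x00
```

Throwing in the last branch keeps the problem visible during development rather than silently returning a wrong length in production.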
…ength due to certain situations with unicode strings
I was thinking of it more as a way to give users a choice of the default behaviour. After thinking about it, it's not going to be used much, so there's not much need for it.
Is the wording of the error message fine, or do you have any further suggestions?
I split the information in two.
src/EFCore.Jet/Query/ExpressionTranslators/Internal/JetByteArrayMethodTranslator.cs
LGTM
Fixes some long-standing issues with byte arrays
Some comments:

- `Contains` still doesn't work.
- `LENB` seems to only work properly if the column is a `longbinary`. If it has a max length set, the column type becomes `varbinary(5)` (if the length is 5). In this case, the return value is `6`. Is this length + 1, or the closest multiple of 2 (because of the double-byte character set, as `LEN` returns `3`)?
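The two explanations for `LENB` returning `6` on a `varbinary(5)` column (length + 1 versus rounding up to whole 2-byte characters) predict the same values for 5 bytes but differ for even lengths. This hypothetical C# sketch only compares what each hypothesis predicts; it does not query Jet, so it cannot show which one is correct.

```csharp
using System;

// Hypothesis A (hypothetical): LENB reports the stored byte count plus one.
int LenbPlusOne(int byteCount) => byteCount + 1;

// Hypothesis B (hypothetical): LENB reports the byte count rounded up to whole
// 2-byte characters, and LEN reports the number of those characters.
int LenbRoundedUp(int byteCount) => byteCount + byteCount % 2;
int LenChars(int byteCount) => (byteCount + 1) / 2;

foreach (var n in new[] { 4, 5 })
    Console.WriteLine($"{n} bytes: A={LenbPlusOne(n)}, B={LenbRoundedUp(n)}, LEN={LenChars(n)}");
// 4 bytes: A=5, B=4, LEN=2
// 5 bytes: A=6, B=6, LEN=3
```

Checking `LENB` on a column holding an even number of bytes, for example 4, would tell the two apart.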