Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Do not lift types unless explicitly asked for #209

Open
rogeralsing opened this issue Aug 22, 2017 · 28 comments
Open

SQL: Do not lift types unless explicitly asked for #209

rogeralsing opened this issue Aug 22, 2017 · 28 comments

Comments

@rogeralsing
Copy link

When running aggregating SQL queries over numeric fields, SC will lift the underlying type to a bigger one.

Int32 is lifted to Int64, Double is lifted to Decimal.
This is unintuitive and feels weird, especially lifting Double to Decimal which is not even the same kind of floating point mechanics.

This has implications for e.g. the Linq provider as when you do .Sum(x => x.Int32Prop),
Starcounter will actually return a Int64, we have to have special hacks that check if the generic T returned is of int32, then run the query expecting an int64, cast and return.

I get that operations like Sum can overflow their type, but this still applies to whatever type we lift to, its just a matter of volume.
I'd much rather see that we get the type of the property itself. and add the ability to specify a larger type if you need.

@rogeralsing
Copy link
Author

This is also an issue due to X6Decimal: #169 (comment)

@miyconst
Copy link
Member

Int32 is lifted to Int64.

This is not a big deal, and can be omitted if difficult to implement.

Double is lifted to Decimal

This one is a real problem, since Double and Decimal are different data types, and when the result is not able to fit into current X6Decimal precision it crashes.

@rogeralsing
Copy link
Author

This is not a big deal, and can be omitted if difficult to implement.

It sort of is, when it comes to build generic methods ontop of the API.
e.g. in Linq, any operation like Sum, Count, etc. we need to have hacks to solve this right now.
As the code expects int32 while the SC api will give you int64 under those conditions.

@Mackiovello
Copy link
Contributor

From your description, it seems like the type is just lifted one level, when running aggregates in SQL queries, it seems like it lifts the type several levels:

using Starcounter;
using System.Linq;

[Database]
public class Person
{
    public int Age { get; set; }
}

class Program
{
    static void Main()
    {
        Db.Transact(() =>
        {
            new Person { Age = 1 };
            new Person { Age = 2 };
            new Person { Age = 4 };
        });

        Db.SQL("SELECT AVG(p.Age) FROM Person p").FirstOrDefault(); // ScErrCLRDecToX6DecRangeError 
    }
}

Is there any workaround for this? It would be good to document

@miyconst miyconst changed the title SQL: Do not lift types unless explicitly asked for ⏰ SQL: Do not lift types unless explicitly asked for Oct 9, 2017
@un-tone
Copy link

un-tone commented Feb 23, 2018

Double is lifted to Decimal

What the case for this? Looks that there is no issues with Double type - AVG for it returns a double type value.

@un-tone
Copy link

un-tone commented Feb 23, 2018

Shouldn't AVG return double type result, not decimal? Enumerable.Average works exactly so, for example. It will solve issues like in @Mackiovello's comment above.

cc @miyconst @bigwad @k-rus

@un-tone
Copy link

un-tone commented Feb 23, 2018

Also, I propose to leave Int64 result type for SUM operator for all integer types. It looks reasonable for me.

@miyconst
Copy link
Member

Shouldn't AVG return double type result, not decimal? Enumerable.Average works exactly so, for example. It will solve issues like in @Mackiovello's comment above.

Sounds like the right decision for me. @bigwad?

Also, I propose to leave Int64 result type for SUM operator for all integer types. It looks reasonable for me.

Same here.

@bigwad
Copy link
Member

bigwad commented Feb 26, 2018

We can prefer to adhere to SQL standard here.

p 6.5.9 says:

        b) If SUM is specified and DT is exact numeric with scale
         S, then the data type of the result is exact numeric with
          implementation-defined precision and scale S.

         c) If AVG is specified and DT is exact numeric, then the data
          type of the result is exact numeric with implementation-
          defined precision not less than the precision of DT and
          implementation-defined scale not less than the scale of DT.

in other words,
(b) says that if we SUM Int32 we can promote return type to Int64
(c) says, that for AVG we should never change an exact numeric type (i.e. int, decimal) to approximate type (i.e. double, float)

@bigwad
Copy link
Member

bigwad commented Feb 26, 2018

Shouldn't AVG return double type result, not decimal? Enumerable.Average works exactly so, for example.

Seems it doesn't: https://msdn.microsoft.com/en-us/library/bb354760(v=vs.110).aspx

public static decimal Average(
	this IEnumerable<decimal> source
)

which makes perfect sense as overwise it would ruin decimal's accuracy.

@un-tone un-tone assigned un-tone and unassigned miyconst Mar 1, 2018
@un-tone
Copy link

un-tone commented Mar 1, 2018

I have added tests for each aggregation operator (except COUNT) and input type combination here https://github.com/Starcounter/level1/compare/v2.3-Home209.
I also made a summary table with result types:

Operation Decimal Single Double SByte Int16 Int32 Int64 Byte UInt16 UInt32 UInt64
SUM Decimal Double Double Int64 Int64 Int64 Int64 n/a n/a n/a n/a
SUM changes - - - - - - - - - - -
MAX Decimal Double Double Int64 Int64 Int64 Int64 UInt64 UInt64 UInt64 UInt64
MAX changes - Single - SByte Int16 Int32 - Byte UInt16 UInt32 -
MIN Decimal Double Double Int64 Int64 Int64 Int64 UInt64 UInt64 UInt64 UInt64
MIN changes - Single - SByte Int16 Int32 - Byte UInt16 UInt32 -
AVG Decimal Double Double Decimal Decimal Decimal Decimal Decimal Decimal Decimal Decimal
AVG changes - - - Double Double Double Double Double Double Double Double
COUNT Int64

In the "* changes" rows I wrote result types which I suppose should be implemented for the case.

@bigwad @miyconst please correct me or approve the proposal.

@un-tone
Copy link

un-tone commented Mar 2, 2018

After talking with Kostiantyn, I'm going to update and fix the table with required changes.

@bigwad
Copy link
Member

bigwad commented Mar 2, 2018

  • SUM - why it's not applicable for unsigned datatypes?
  • AVG - implemented in accordance with the standard. no changes needed

@un-tone
Copy link

un-tone commented Mar 2, 2018

SUM - why it's not applicable for unsigned datatypes?

I was getting an exception when did the tests with unsigned integers. I'll re-check this and do updates here or will create a new GH issue for that

@bigwad
Copy link
Member

bigwad commented Mar 2, 2018

AVG - implemented in accordance with the standard. no changes needed

Except the fact, that our decimal range is less than int64, then AVG should be int64 also. At least when the result exceeds decimal limit.

@un-tone
Copy link

un-tone commented Mar 2, 2018

@miyconst agree that SUM should not be changed.
@bigwad doesn't agree that AVG should be changed.

So, the things we should change is only MIN and MAX operators?

@miyconst
Copy link
Member

miyconst commented Mar 2, 2018

Except the fact, that our decimal range is less than int64...

Implement full .Net CLR Decimal support.

@un-tone
Copy link

un-tone commented Mar 2, 2018

Created an issue #385 for the error with unsigned integer types.

@un-tone
Copy link

un-tone commented Mar 10, 2018

So, the things we should change is only MIN and MAX operators?

I need an approval to this conclusion before starting the work.

Operation Decimal Single Double SByte Int16 Int32 Int64 Byte UInt16 UInt32 UInt64
MAX Decimal Double Double Int64 Int64 Int64 Int64 UInt64 UInt64 UInt64 UInt64
MAX changes - Single - SByte Int16 Int32 - Byte UInt16 UInt32 -
MIN Decimal Double Double Int64 Int64 Int64 Int64 UInt64 UInt64 UInt64 UInt64
MIN changes - Single - SByte Int16 Int32 - Byte UInt16 UInt32 -

@miyconst
Copy link
Member

@bigwad what do you say? I think the final table looks correct, but we need an option to cast smaller data types to bigger ones, in case of the overflow exception.

@bigwad
Copy link
Member

bigwad commented Mar 12, 2018

I think the final table looks correct, but we need an option to cast smaller data types to bigger ones, in case of the overflow exception.

Don't get your point on overflow. Are we discussing the latest table with MIN and MAX functions?

@miyconst
Copy link
Member

Don't get your point on overflow.

Apparently I was not fully awake, MIN/MAX will never overflow.

@bigwad
Copy link
Member

bigwad commented May 3, 2018

Spent some time on this upon @un-tone's request to figure out the best way to refactor Anton's latest changes. My conclusion is simple - let's keep away from this. Unless we have guts to redesign half of the query processor, including Prolog parser.

Why is that?
First, this PR solves only part of typing problems, all of which have the same roots: incomplete type inferring in Prolog and limited abilities to process typed expressions in c# part. Because of that given such class:

    [Database]
    public class Bar
    {
        public float f { get; set; }
        public int i { get; set; }
        public Foo b { get; set; }
    }

Following queries will also return double instead of float and they are not fixed by Anton's PR
select MAX(b.b.f) from Bar b
select b.b.f from Bar b
select b.f from Bar b group by b.f having max(b.f)=1.0

Second, this PR has broken some queries that worked before:
select MAX(b.b) from Bar b group by i
select b.f from Bar b group by b.f having max(b.f)=1.0

So, ad hoc fixes don't work as they just add a mess to the query processor and miss the whole picture. And I'm sure I've missed a lot of corner cases myself.

@un-tone
Copy link

un-tone commented May 4, 2018

Thank you, @bigwad, for the investigation. Agree with your conclusion.

@bigwad
Copy link
Member

bigwad commented May 4, 2018

We still shouldn't forget about the issue reported by @Mackiovello earlier in this thread: #209 (comment)

Though it is not about inappropriate type conversion - it's about we not respecting our own decimal limitations when performing calculations. I think we should do it in a separate issue.

Also, timebomb icon can be removed I think, as the most severe issue (converting decimal to double) is not confirmed.

Also2, while working on this I found out we don't have tests for HAVING clause in SqlTest.

@un-tone
Copy link

un-tone commented May 4, 2018

We still shouldn't forget about the issue reported by @Mackiovello earlier in this thread

https://github.com/Starcounter/level1/issues/4631

we don't have tests for HAVING clause in SqlTest

https://github.com/Starcounter/level1/issues/4630

@miyconst miyconst changed the title ⏰ SQL: Do not lift types unless explicitly asked for SQL: Do not lift types unless explicitly asked for May 7, 2018
@un-tone
Copy link

un-tone commented Jul 12, 2018

@miyconst can we specify the status of the issue? Should we move it to another project board or a release? Maybe even close it?

@miyconst
Copy link
Member

@un-tone I move it to later, but let's keep it around for now.

@un-tone un-tone removed their assignment Apr 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants