Duplicating non-stable field expr. in query - is completely wrong and can produce wrong results #682

mvorisek · 2020-07-28T11:35:25Z

Of course, I am thinking about optimal data modelling, not about implementation. Implementation is always possible.

Because of separation of Model and Persistence. Each persistence handles the JOIN in a child class

that is the same as with Union - we need universal nesting support from Model which could result in join, union, wrap.

The current design is actually completely wrong as presented a few days ago:
Imagine field with expr RAND() > 0. This field is currently rendered multiple like:

SELECT RAND() > 0 FROM DUAL WHERE RAND() > 0

it is not only rand, any expression like SELECT name FROM t LIMIT 1 is unstable as it has no explicit ORDER. Even if we (in theory, we normally do no even know them) add all table columns as a fallback order, queries like SELECT name, COUNT(*) FROM t ORDER BY name LIMIT 1 will be still unstable because of grouping first.

This is major bug which must be addressed.

Any non-trivial expression must SELECT under an alias ONCE and then this alias must be used anywhere else - https://stackoverflow.com/questions/35695171/remove-duplicate-sub-query

The text was updated successfully, but these errors were encountered:

DarkSide666 · 2020-07-30T17:41:12Z

This is important to think about and also this is not that simple to change and fix. That's because there can be anything in expression field. It can be subquery, some fieldname, function call etc.

What resulting select you would like to see instead of this ?

-- foo = field expression: RAND()
SELECT RAND() as foo FROM DUAL WHERE RAND() > 0

mvorisek · 2020-07-30T17:55:52Z

-- foo = field expression: RAND()
SELECT RAND() as foo FROM DUAL WHERE RAND() > 0

should ALWAYS be queried like

SELECT RAND() as foo FROM DUAL HAVING foo > 0

non trivial fields (or to be precise fields that are non-constant and not using constant functions) that are used multiple times must be firstly selected and then reused/conditioned using HAVING

sql render using nesting is wrong, we must render to some intermediate form and then build sql once

DarkSide666 · 2020-07-30T18:34:39Z

Nested queries should be much more faster that using having which executes on full result-set.
On other hand, in some cases (like mentioned above) using having is a solution.

But again, it will perform much slower. For example, this simple case:

SELECT
    (select name from person where id=e.person_id) AS emp_name
FROM employee e
WHERE 
    (select name from person where id=e.person_id) = 'John'

This is probably faster than using having in case we have millions of employees.

Even more complexity shows up if we use both - where and having in same query. And what if conditions are nested?

mvorisek · 2020-07-30T18:36:47Z

@DarkSide666 this should use join if 1:1, if 1:N, your query will fail (bacause subquery will return more than 1 row)

MySQL and other DBs like Oracle or MSSQL engines can probably optimize some HAVING clauses the same like subqueries

nested conditions - you must query data and then you can use constant functions/junctions/operators like you want

DarkSide666 · 2020-07-30T18:37:33Z

Yes, this example is for simple hasOne case and standard title_field.

mkrecek234 · 2022-12-20T06:01:22Z

Can confirm that - if you have more complex hasOne/hasMany table structures, Atk4/Data gets very slow in case you add fields to child model from parent, grand-parent and grand-grand-parent tables via hasOne->addFields(...). This requires optimization, otherwise for performance reason it is much faster to load parent/grand-parent tables only where needed for single records, rather than adding those (grand-)parent fields straight into the model. If we render SQL correctly, this should not slow down SQL performance though.

mvorisek · 2022-12-20T07:53:13Z

@mkrecek234 can you please post here a repro code like in #1078 (comment) for your nested usecase?

mkrecek234 · 2022-12-20T08:39:45Z

@mvorisek Here you go - sorry not designed as a test but normal code:

<?php

use Atk4\Data\Persistence;
use \Atk4\Data\Model;
use \Atk4\Data\Model\AggregateModel;

include '../vendor/autoload.php';

$db = new \Atk4\Data\Persistence\Sql('mysql:host=localhost;dbname=atk4;port=3306;user=root;password=root');

/* Create database 'atk4' in localhost MySql
 *

CREATE TABLE `account_group` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(32) DEFAULT NULL,
  `sign` float DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `account_group` (`id`, `name`, `sign`)
VALUES
	(1, 'Acount Group', -1);

CREATE TABLE `account` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(32) DEFAULT NULL,
  `account_group_id` int unsigned DEFAULT NULL,
  `fx_rate` float DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `account` (`id`, `name`, `account_group_id`, `fx_rate`)
VALUES
	(1, 'Test Account', 1, 1.2);

CREATE TABLE `booking` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `account_id` int unsigned DEFAULT NULL,
  `booking_group` varchar(32) DEFAULT NULL,
  `value` float DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `booking` (`id`, `account_id`, `booking_group`, `value`)
VALUES
	(1, 1, 'Group1', 123),
	(2, 1, 'Group1', 456),
	(3, 1, 'Group2', 789);

 */
class AccountGroup extends \Atk4\Data\Model {
    public $table = 'account_group';

    protected function init(): void
    {
        parent::init();
        $this->addFields(['name', 'sign' => ['type' => 'float']]);;
    }
}
class Account extends \Atk4\Data\Model {
	public $table = 'account';
    
    protected function init(): void
    {
		parent::init();
		$this->addFields(['name', 'fx_rate' => ['type' => 'float']]);
        $this->hasOne('account_group_id', ['model' => [AccountGroup::class]])->addField('sign');
	}
}

class Booking extends \Atk4\Data\Model {
	public $table = 'booking';

    protected function init(): void
    {
		parent::init();
		$this->addFields(['booking_group', 'value' => ['type' => 'float']]);
		$this->hasOne('account_id', ['model' => [Account::class]])->addFields(['fx_rate', 'sign']);
		
	}
}

    $accountModel = new Account($db);
    $bookingModel = new Booking($db);

    $bookingAggregate = new AggregateModel($bookingModel);
    $bookingAggregate->setGroupBy(['booking_group'], [
        'total' => ['expr' => 'sum([value])'], ]);

    foreach ($bookingAggregate as $booking) {
        echo 'Result: ' . $booking->get('booking_group') . ' | ' . $booking->get('total');
    }

    echo 'That worked..';

    $bookingAggregate = new AggregateModel($bookingModel);
    $bookingAggregate->setGroupBy(['booking_group'], [
        'total' => ['expr' => 'sum([sign] * [value])'], ]);

    foreach ($bookingAggregate as $booking) {
        echo 'Result: ' . $booking->get('total');
    }
    echo 'That did not work..';

mvorisek · 2022-12-20T11:50:42Z

it seems you somewhat copied the AggregateModel testcase from aggregate issue

subquery like:

      (
        select
          `fx_rate`
        from
          `account` `_a_8a089c2a7e6c`
        where
          `id` = `booking`.`account_id`
      ) `fx_rate`

should be quite fast as long as the selection is done on unique index, any reasonable SQL optimizer should detect it is a 1:1 join query

In chat you have written:

Just sonething I noticed: When you have multi-layered hasOne relations (so parents tables, grand-parent tables and grand-grandparents tables) and if you always addFields of the respective higher levels, so in your child table you have a lot of fields from grand/grand/grand-tables, Atk4/data gets terribly slow. I stopped doing this and even load fields later, instead of adding them theough hasOne relations. In SQL this would not be so slow IMHO, because you do: SELECT child.field, parent.field, grand-parent.field from child join parent on (…) join grand-parent on (…) join grand-grand-parent on (…)
Atk4/data seems to create separate sub-query selects for each added field.

It would be good if you can provide the actual slow query and table metrics

mkrecek234 · 2022-12-20T12:04:41Z

@mvorisek The example above shows a nesting, as we cascade the field down from AccountGroup via Account to Booking. If this was not what you meant, please specify what you need. On the performance I do not have performance metrics, but I can tell that complex models with down-cascaded fields slow down simple load and save model commands to a good extent.

mvorisek · 2022-12-20T12:40:47Z

so no AggregateModel above is needed and (new Booking($db))->executeCountQuery() has the same performance problem?

mvorisek · 2023-11-06T13:47:08Z

⚠ even more conterintuitive: https://sqlite.org/forum/forumpost/8202db3916

mvorisek added bug MAJOR labels Jul 28, 2020

mvorisek mentioned this issue Jul 28, 2020

Move Join to Model\Join #680

Merged

DarkSide666 added the Refactoring label Jul 30, 2020

mvorisek added the help wanted label Sep 19, 2020

mvorisek removed the Refactoring label Oct 11, 2021

mvorisek changed the title ~~Current design - duplicating non-stable field expr. in query - is completely wrong and can produce wrong results~~ Duplicating non-stable field expr. in query - is completely wrong and can produce wrong results Nov 12, 2021

mvorisek linked a pull request Nov 26, 2021 that will close this issue

Never select twice, fix same row field stability #934

Draft

mvorisek mentioned this issue May 29, 2022

Cross-persistence traversal #158

Open

mvorisek mentioned this issue Dec 19, 2022

AggregateModel with referenced field in expression throwing error #1078

Closed

mvorisek mentioned this issue Jan 6, 2023

MSSQL: Cannot perform an aggregate function on an expression containing an aggregate or a subquery. #976

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicating non-stable field expr. in query - is completely wrong and can produce wrong results #682

Duplicating non-stable field expr. in query - is completely wrong and can produce wrong results #682

mvorisek commented Jul 28, 2020 •

edited

DarkSide666 commented Jul 30, 2020 •

edited

mvorisek commented Jul 30, 2020

DarkSide666 commented Jul 30, 2020 •

edited

mvorisek commented Jul 30, 2020 •

edited

DarkSide666 commented Jul 30, 2020 •

edited

mkrecek234 commented Dec 20, 2022

mvorisek commented Dec 20, 2022

mkrecek234 commented Dec 20, 2022

mvorisek commented Dec 20, 2022

mkrecek234 commented Dec 20, 2022

mvorisek commented Dec 20, 2022

mvorisek commented Nov 6, 2023

Duplicating non-stable field expr. in query - is completely wrong and can produce wrong results #682

Duplicating non-stable field expr. in query - is completely wrong and can produce wrong results #682

Comments

mvorisek commented Jul 28, 2020 • edited

DarkSide666 commented Jul 30, 2020 • edited

mvorisek commented Jul 30, 2020

DarkSide666 commented Jul 30, 2020 • edited

mvorisek commented Jul 30, 2020 • edited

DarkSide666 commented Jul 30, 2020 • edited

mkrecek234 commented Dec 20, 2022

mvorisek commented Dec 20, 2022

mkrecek234 commented Dec 20, 2022

mvorisek commented Dec 20, 2022

mkrecek234 commented Dec 20, 2022

mvorisek commented Dec 20, 2022

mvorisek commented Nov 6, 2023

mvorisek commented Jul 28, 2020 •

edited

DarkSide666 commented Jul 30, 2020 •

edited

DarkSide666 commented Jul 30, 2020 •

edited

mvorisek commented Jul 30, 2020 •

edited

DarkSide666 commented Jul 30, 2020 •

edited