Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Connection: Upsert #2939

Closed
wants to merge 1 commit into from
Closed

[WIP] Connection: Upsert #2939

wants to merge 1 commit into from

Conversation

jnvsor
Copy link
Contributor

@jnvsor jnvsor commented Dec 15, 2017

This proposes a method that would allow consistent cross-platform upserts, with the downside that you have to explicitly provide the fields to check for existence of the row (It doesn't automatically handle unique constraint violations)

$db->upsert('tablename', ['fielda' => 'valuea'], ['fieldb' => 'valueb']);
// INSERT INTO tablename (fielda, fieldb) VALUES(?, ?) ['valuea', 'valueb']
$db->upsert('tablename', ['fielda' => 'valuea2'], ['fieldb' => 'valueb']);
// UPDATE tablename SET fielda = ? WHERE fieldb = ? ['valuea2', 'valueb']
$db->upsert('tablename', ['fielda' => 'valuea3', 'fieldb' => 'valueb2'], ['fieldb' => 'valueb']);
// UPDATE tablename SET  fielda = ?, fieldb = ? WHERE fieldb = ? ['valuea3', 'valueb2', 'valueb']
  • Do we want to allow upserts that affect multiple rows?
    • If not, should we make a new exception class to handle this edge case?
  • Any other suggestions?

@morozov
Copy link
Member

morozov commented Dec 15, 2017

It's not really an UPSERT, it's a combination of SELECT, INSERT and UPDATE. A proper implementation could be built on ON DUPLICATE KEY UPDATE for MySQL or MERGE for other platforms which support it.

@jnvsor
Copy link
Contributor Author

jnvsor commented Dec 15, 2017

@morozov Is it really that easy? I was under the impression there were some insurmountable issues with that, given that it's been high on the wishlist for years now

@morozov
Copy link
Member

morozov commented Dec 15, 2017

@jnvsor you can try and see if it is. There indeed are some serious issues. You can try approaching it and seeing where you end up. The last time I tried it, I ended up with INSERT ON DUPLICATE KEY UPDATE and MERGE requiring a different number of parameters bound to the query. So the eventual implementation will spread across the platform and connection layers.

public function upsert($tableExpression, array $data, array $identifier, array $types = array())
{
$qb = $this->createQueryBuilder()
->select($this->getDatabasePlatform()->getCountExpression('*'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting COUNT(*) with arbitrary WHERE may be expensive depending on the table size and filtering conditions. We don't need the count, we only need if there's a single row matching the criteria. Maybe, WHERE EXISTS (SELECT *) would be better than SELECT COUNT(*).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need the count if you wanted to prevent upsert accidentally updating multiple rows here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a slippery slope. If the UPSERT query is not specific enough so that it can match more than one record, then it's invalid for this use case, no matter if zero, one or more matching records exist.

Could you give an example of a table and/or query where INSERT would succeed but UPDATE would affect multiple rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also a good point

@jnvsor
Copy link
Contributor Author

jnvsor commented Dec 15, 2017

@morozov Is there a code-style rule that says AbstractDriver can't return multiple values? If it could return both the SQL and the values to bind to it that would solve the problem, but it would be a first so I'm guessing that's not gonna fly.

@morozov
Copy link
Member

morozov commented Dec 15, 2017

both the SQL and the values to bind to it

Isn't it what QueryBuilder is for?

@jnvsor
Copy link
Contributor Author

jnvsor commented Dec 15, 2017

That... Is actually a really good point. So we'd pass a querybuilder and an array of fields into the driver and the driver would return a new querybuilder with the query completed? (Or should we just alter the qb passed in in the first place?)

@morozov
Copy link
Member

morozov commented Dec 16, 2017

@jnvsor it doesn't matter that much at this point. You can make whatever properly working and tested PoC first and then we can polish the details.

@jnvsor
Copy link
Contributor Author

jnvsor commented Dec 16, 2017

@morozov Actually querybuilder won't work well either. The reason the drivers all return strings is because they tend to be non-standard. We'd have to shove the entire ON DUPLICATE KEY UPDATE clause into the orderBy on the QB for mysql for example. I mean, we can but... Seems really ugly.

@jnvsor
Copy link
Contributor Author

jnvsor commented Jan 9, 2018

Would require a more flexible querybuilder first. I've added an issue for that: #2972

@jnvsor
Copy link
Contributor Author

jnvsor commented Jan 9, 2018

Since @Ocramius has said we can't make the querybuilder more flexible, I don't see any way upsert can be implemented in DBAL. Closing.

@jnvsor jnvsor closed this Jan 9, 2018
@Ocramius
Copy link
Member

Ocramius commented Jan 9, 2018

@jnvsor a public one cannot be implemented, but an upsert emulation API can be provided.

@jnvsor
Copy link
Contributor Author

jnvsor commented Jan 9, 2018

@Ocramius Is there any way to get the fields from the constraint in question from a UniqueConstraintViolationException? What if the upsert is violating multiple unique constraints?

@Ocramius
Copy link
Member

Ocramius commented Jan 9, 2018 via email

@jnvsor
Copy link
Contributor Author

jnvsor commented Jan 9, 2018

@Ocramius We'd need to fetch all of the unique constraints one by one before insert/updating. Worst case scenario we'd have multiple structure lookups followed by multiple fetches before even deciding whether to insert or update

@Ocramius
Copy link
Member

Ocramius commented Jan 9, 2018 via email

@jnvsor
Copy link
Contributor Author

jnvsor commented Jan 9, 2018

Given that that could be better performed when structure information is cached, perhaps we should bounce this back to orm and leave it out of DBAL?

@Ocramius
Copy link
Member

Ocramius commented Jan 9, 2018 via email

@jnvsor
Copy link
Contributor Author

jnvsor commented Jan 9, 2018

Sorry, I presumed the orm layer cached structure information and probably unique constraints too

@Ocramius
Copy link
Member

The ORM does that, but DBAL should still provide an API for performing the operation. The problem is that it cannot be done via query builder, but only as atomic transactional operation to deal with emulation.

@jnvsor
Copy link
Contributor Author

jnvsor commented Jun 7, 2019

@morozov I don't feel the kind of complex emulation discussed above is worth implementing, since it's very complex and very slow. My original idea was simpler, but rejected since it's not a true upsert.

I don't intend to work on this, and I'm fine with leaving this as a wontfix personally. Perhaps someone will redesign the querybuilder/drivers to allow upserts some day.

Maybe this should be moved to a more general 'Upsert' issue?

@morozov
Copy link
Member

morozov commented Jun 7, 2019

Perhaps someone will redesign the querybuilder/drivers to allow upserts some day.

I don't believe it's going to happen any time soon because of the complexity of the issue and absence of real demand. In my experience, upserts are mostly used when a DB table is used as a key-value storage which is usually way slower than other alternatives.

@gjdanis
Copy link
Contributor

gjdanis commented Mar 17, 2020

@morozov this issue is linked to a similar issue about supporting custom entity persisters.

If we can't fix this issue, can we at least look at making the entity persistence strategy configurable? It seems like we should be able to inject this somewhere in the UnitOfWork or perhaps provide it as metadata on the entity: https://github.com/doctrine/orm/blob/99b7d1e9d4bfa6efb86f0a77b1e5522ec8e9bf9a/lib/Doctrine/ORM/UnitOfWork.php#L2562

The specific use case I'm asking about is for MS SQL Server where one of my tables has a trigger that inserts into another another table. For this entity I want to use OUTPUT in my INSERT statements to get the identity rather than @@IDENTITY or SCOPED_IDENTITY(), neither of which seem to work correctly.

@morozov
Copy link
Member

morozov commented Mar 19, 2020

If we can't fix this issue, can we at least look at making the entity persistence strategy configurable?

This would be a question to ask in doctrine/orm.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants