Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for conditionally inserting a single row. #2528

Closed
wants to merge 1 commit into from
Closed

Add support for conditionally inserting a single row. #2528

wants to merge 1 commit into from

Conversation

mkmoisen
Copy link

@mkmoisen mkmoisen commented Feb 18, 2022

Hello,

This PR enables conditional insertion of a single row, with default values automatically provided, in a single trip to the database.

Would you be willing to support this syntactic sugar?

In this example, we are conditionally inserting a Tweet, so long as the User has not already tweeted in the last hour, via a single trip to the DB:

from datetime import datetime, timedelta
from peewee import SQL, fn

user = User.create(username='Matthew')

Tweet.insert(
    content='my tweet',
    user=user,
).where(
    ~fn.EXISTS(
        Tweet.select(SQL('1')).where(
            Tweet.user == user,
            Tweet.timestamp > datetime.now() - timedelta(hours=1)
        ) 
    )
)

This would translate into the following SQL (Note the replacement of VALUES with SELECT)

INSERT INTO tweet (
    timestamp,  -- note that timestamp default value is automatically provided
    content,
    user_id
) SELECT 
    %s,
    %s,
    %s
WHERE NOT EXISTS (
    SELECT 1
    FROM tweet
    WHERE user_id = %s
        AND timestamp > %s
)

[datetime.datetime(2022, 2, 17, 23, 49, 57, 299578), 'my tweet', 1, 1, datetime.datetime(2022, 2, 17, 22, 49, 57, 299578)]

This PR is a POC with minimal code changes to get this working. You could either use this PR or redo it yourself according to your preference. I could add some documentation or tests to beef up this PR later if you are on board with the idea.

I've tested both cases where it successfully inserts and returns the PK or when it does not insert and returns None instead.


You might wonder what the point of this is since you could just do it like the following:

if not Tweet.select(Tweet.id).where(Tweet.user == user, Tweet.timestamp > datetime.now() - timedelta(hours=1)).exists():
    Tweet.insert(...)  # or Tweet.create()

The main problem with this is that it requires two remote calls to the database.

The secondary issue is that it has a race condition when another thread modifies the User row after the exists() check and before the Tweet.insert. (Actually, my solution also has a race condition - I think - but it is a much smaller window, by an order of magnitude, compared to using two remote calls).


The only other way to accomplish this right now that I've figured out is a bit ugly and doesn't compute default values.

Tweet.insert_from(
    peewee.Select(
        columns=['my_tweet', user, datetime.now()]  # Necessary to manually add default values like datetime.now()
    ).where(
        ~fn.EXISTS(
            Tweet.select().where(
                Tweet.user == user,
                Tweet.timestamp > datetime.now() - timedelta(hours=1)
            ) 
        )
    ),
    [Tweet.content, Tweet.user, Tweet.timestamp]
)

# get the PK back
results = list(_.execute())
if results:
    print(results[0]))

@coleifer
Copy link
Owner

I'm going to pass for now, thanks.

@coleifer coleifer closed this Feb 18, 2022
@mkmoisen
Copy link
Author

Hi @coleifer would you have another suggestion for being able to accomplish the insert myself?

I came up with this but do not like it. Is there a better way?

Tweet.insert_from(
    peewee.Select(
        columns=['my_tweet', user, datetime.now()]  # Necessary to manually add default values like datetime.now()
    ).where(
        ~fn.EXISTS(
            Tweet.select().where(
                Tweet.user == user,
                Tweet.timestamp > datetime.now() - timedelta(hours=1)
            ) 
        )
    ),
    [Tweet.content, Tweet.user, Tweet.timestamp]
)

@coleifer
Copy link
Owner

I'd suggest just wrapping it up in a helper method on a BaseModel class you keep in your library:

class BaseModel(Model):
    @classmethod
    def insert_where(cls, data, *where):
        fields, values = zip(*data.items())
        sq = Select(columns=values).where(~fn.EXISTS(cls.select().where(*where)))
        return cls.insert_from(sq, fields)

ins = Tweet.insert_where({
    Tweet.content: 'my_tweet',
    Tweet.user: user,
    Tweet.timestamp: datetime.now()},
    (Tweet.user == user) & (Tweet.timestamp > datetime.now() - timedelta(hours=1)))

@mkmoisen
Copy link
Author

Sweet, thank you.

coleifer added a commit that referenced this pull request Feb 19, 2022
@coleifer
Copy link
Owner

I've gone ahead and thrown this into the playhouse.shortcuts module, which would result in:

from playhouse import shortcuts

class BaseModel(Model):
    @classmethod
    def insert_where(cls, data, *where):
        return shortcuts.insert_where(cls, data, *where)

@mkmoisen
Copy link
Author

Hi @coleifer

Thank you. I have a few suggestions. Please let me know what you think.

  1. Automatically provide default values like .insert
  2. Change the API to resemble .insert(**insert).
  3. Change the API to to return the PK like .insert().
  4. Allow the user to specify the entire WHERE clause (to do EXISTS or NOT EXISTS on same or different table)

Here is the function that I've been using this weekend for my application:

    @classmethod
    def insert_where(cls, where=None, **insert):
        for field, default in cls._meta.defaults.items():
            if field.name not in insert:
                val = default
                if callable_(val):
                    val = val()
                insert[field.name] = val

        fields, values = zip(*insert.items())

        cursor = cls.insert_from(
            Select(columns=values).where(
                where
            ),
            fields
        ).execute()

        return cls._meta.database.last_insert_id(cursor, Insert.SIMPLE)

This let's me use it in the following way:

# Only allow insert to Tweet if the User is a paid subscriber and not a free user:
tweet_id = Tweet.insert_where(
    content="my tweet", user=user, 
    where=fn.EXISTS(User.select().where(User.id == user, User.subscriber_type == 'PAID'))
)

# Same thing but using a NOT EXISTS
tweet_id = Tweet.insert_where(
    content="my tweet", user=user,
    where=~fn.EXISTS(User.select().where(User.id == user, User.subscriber_type == 'FREE'))
)

# Show how a empty `where` still works as expected:
where = None
if user_type != 'admin':
    where = ~fn.EXISTS(Tweet.select().where(Tweet.timestamp > now() - timedelta(hours=1)))

tweet_id = Tweet.insert_where(
    content="my tweet", user=user,
    where=where
)

I like this API because it closely resembles Tweet.insert, insofar as you can pass the column=value pairs via **insert, and it also returns the primary key just like Tweet.insert().execute().

This also automatically provides default values like Tweet.insert.

The where=None is nice because sometimes you might want to conditionally build up the where clause, but have nothing to build. For example if the user is an admin, you can still call insert_where with where=None and it will just do a normal insert.

@coleifer
Copy link
Owner

Adding the defaults and providing more flexibility definitely seems fine to me. Check out 23ae5d2 -- I'm still returning the query rather than executing it, since that is the common pattern in Peewee. But most of the other changes you described are present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants