feat(git): consolidate repository webhook #2667
…ook created by bytebase in that repository
…t one to use this webhook
…en we receive a code host webhook push event
Also, with this logic, can we also declare a unique key on web_url? |
Co-authored-by: tianzhou <t@bytebase.com>
Co-authored-by: tianzhou <t@bytebase.com>
The |
LGTM
You are right. On the other hand, we can try using the EXCLUDE constraint: https://www.postgresql.org/docs/current/ddl-constraints.html
This essentially means if any of the two rows have the same Or If any of the two rows have different |
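To make the EXCLUDE semantics concrete, here is a minimal Python model of the rule PostgreSQL enforces: two rows conflict only when every listed (column, operator) comparison returns true. The column names `web_url` and `webhook_id` are assumptions for illustration, not the columns from this PR; a real constraint of this shape would pair `=` with `<>`, which is what requires btree_gist on scalar columns.

```python
import operator
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Spec = List[Tuple[str, Callable[[object, object], bool]]]


def violates_exclude(rows: List[Row], new_row: Row, spec: Spec) -> bool:
    """Return True if new_row conflicts with any existing row.

    Models EXCLUDE: a pair of rows is rejected only when ALL
    (column, operator) comparisons in spec return true.
    NULL handling is deliberately left out of this sketch.
    """
    return any(
        all(op(existing[col], new_row[col]) for col, op in spec)
        for existing in rows
    )


# Hypothetical rule: same web_url AND different webhook_id is a conflict.
SPEC: Spec = [("web_url", operator.eq), ("webhook_id", operator.ne)]

rows = [{"web_url": "https://git.example.com/a", "webhook_id": "1"}]
same_hook = {"web_url": "https://git.example.com/a", "webhook_id": "1"}
other_hook = {"web_url": "https://git.example.com/a", "webhook_id": "2"}
other_repo = {"web_url": "https://git.example.com/b", "webhook_id": "2"}
```

Under this assumed rule, a second row that reuses the same webhook is accepted, while a second row that points the same URL at a different webhook is rejected.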
I will try to add this constraint in another pull request. I'm more worried about the token than the |
I try to add it. But it seems that we need Although |
We are creating a separate database to store our metadata, the installed extension scope is the schema under the database. As long as we have the right permission, it should be fine. |
Yes. On the other hand, we are not sure about external PostgreSQL. Say the user is on RDS: this extension may not be available there. |
btree_gist is a basic extension that is included in pretty much every pg distribution. It's viewed more as a built-in feature that happens to be implemented as an extension.
AWS RDS: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html
It's a fair ask to require btree_gist. GitLab: https://docs.gitlab.com/ee/install/postgresql_extensions.html
The reason why I would push this is shit happens, and the database constraint is our last defense. As our schema becomes more complex, it's unavoidable to rely on btree_gist or we just run more risks of having data issues. |
Especially for this one, without db constraints, we can create multiple webhooks if the timing is right. So our 1 webhook per VCS project is not an invariant. |
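A minimal sketch of that timing problem, assuming a hypothetical `webhook` table keyed by `web_url`. SQLite and a plain UNIQUE constraint stand in for PostgreSQL and the EXCLUDE constraint here, only to show that an application-level check-then-insert is not an invariant while a database constraint is.

```python
import sqlite3


def make_db(with_unique: bool) -> sqlite3.Connection:
    """Create an in-memory table, optionally with a UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    unique = " UNIQUE" if with_unique else ""
    conn.execute(f"CREATE TABLE webhook (web_url TEXT NOT NULL{unique})")
    return conn


def webhook_exists(conn: sqlite3.Connection, web_url: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM webhook WHERE web_url = ?", (web_url,)
    ).fetchone()
    return row is not None


def racy_create(conn: sqlite3.Connection, web_url: str) -> int:
    """Simulate two requests whose checks both run before either insert.

    Returns the number of webhook rows that exist afterwards.
    """
    # Both requests check first, and both see "no webhook yet".
    seen_a = webhook_exists(conn, web_url)
    seen_b = webhook_exists(conn, web_url)
    for seen in (seen_a, seen_b):
        if not seen:
            try:
                conn.execute(
                    "INSERT INTO webhook (web_url) VALUES (?)", (web_url,)
                )
            except sqlite3.IntegrityError:
                pass  # the constraint rejected the duplicate
    return conn.execute(
        "SELECT COUNT(*) FROM webhook WHERE web_url = ?", (web_url,)
    ).fetchone()[0]


URL = "https://git.example.com/a"  # hypothetical repository URL
```

Without the constraint the interleaving leaves two rows; with it, the second insert fails and only one row survives.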
I’m more worried about the privileges that our users grant us, as CREATE EXTENSION typically requires superuser privilege. But I think that works on GCP and AWS. We may have to try out other platforms and update the documentation if needed. https://www.postgresql.org/docs/current/sql-createextension.html |
https://www.postgresql.org/docs/current/btree-gist.html "This module is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database." |
I thought of another way to solve the inconsistency.
The sequence of operations:
In this case, we will create two identical webhooks like this: This will degenerate into our previous logic.
In this case, the token of the newly inserted project is expired, and the user will find this project unavailable until the next token refresh.
I think the fundamental reason is that after we reuse webhooks, the records in the repository will have redundancy in many fields. We can move some columns to a new table (e.g. external_repository), and then this part of the schema will look like this
For the problem1, insert the second record that will encounter the error because the
For the problem2, since we only need to update one field in external_repo, this problem should disappear. |
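The schema from this comment is not reproduced here, so the following is only a hypothetical illustration of the idea, with invented column names (`web_url`, `webhook_id`, `access_token`, `external_repository_id`) and SQLite standing in for PostgreSQL. It sketches how moving the shared fields into one `external_repository` row could address both problems: a duplicate external row is rejected, and a token refresh touches a single row.

```python
import sqlite3

URL = "https://git.example.com/a"  # hypothetical repository URL

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Shared, per-VCS-repository fields live in exactly one row.
    CREATE TABLE external_repository (
        id INTEGER PRIMARY KEY,
        web_url TEXT NOT NULL UNIQUE,
        webhook_id TEXT NOT NULL,
        access_token TEXT NOT NULL
    );
    -- Each project-level repository only references the shared row.
    CREATE TABLE repository (
        id INTEGER PRIMARY KEY,
        project TEXT NOT NULL,
        external_repository_id INTEGER NOT NULL
            REFERENCES external_repository(id)
    );
    """
)
conn.execute(
    "INSERT INTO external_repository (id, web_url, webhook_id, access_token) "
    "VALUES (1, ?, 'hook-1', 'old-token')",
    (URL,),
)
conn.executemany(
    "INSERT INTO repository (project, external_repository_id) VALUES (?, 1)",
    [("project-a",), ("project-b",)],
)

# Problem 1: a second external row for the same URL is rejected.
try:
    conn.execute(
        "INSERT INTO external_repository (web_url, webhook_id, access_token) "
        "VALUES (?, 'hook-2', 'other-token')",
        (URL,),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Problem 2: refreshing the token updates one row, and every project sees it.
conn.execute("UPDATE external_repository SET access_token = 'new-token' WHERE id = 1")
tokens = [
    row[0]
    for row in conn.execute(
        "SELECT e.access_token FROM repository r "
        "JOIN external_repository e ON e.id = r.external_repository_id "
        "ORDER BY r.id"
    )
]
```

The trade-off raised later in the thread still applies: a layout like this needs a new table and a data migration, whereas a constraint on the existing table does not.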
Of course, it looks like using a GiST index can solve both problems as well. We just need to weigh whether to introduce an extension. |
The external repository is my preference as that’s a perfect design. We are less vulnerable because the # of repositories and projects isn’t quite large and updates are not frequent (except for tokens). |
Yes, it's easy to have trouble representing N:1 mappings with redundancy within a table. |
What do you think? @tianzhou |
@h3n4l well, my intention is to fix the data consistency issue at a low cost. And I think introducing gist extension is a low cost. But looks like we don't feel the same way on this. And TBH, I think the new proposed solution to introduce a new table is overkill to just solve that data inconsistency. Introducing a new table bears a high cost, and for this one, it requires data migration. I don't feel it justifies the effort. In summary, my preference:
Introducing a new table is probably a no for me at this point. And if you do think "Use gist extension" is problematic, then I am OK with "Do nothing". |
Doing nothing is not suitable. Let's use a GiST index to solve this. We can reconsider introducing a new table if users run into errors with the GiST index. We should make our best effort. @tianzhou @d-bytebase |
What will happen if we do nothing and how could it happen? |
As #2667 (comment) said, the main problems we will encounter are:
|
The updates are rare, so it’s okay to do nothing. Even with the EXCLUDE rule, we’re preventing bugs from damaging data in the struct, but we are still losing the updated data. |
EXCLUDE rule can prevent those problems. We need something like |
We used to create a webhook for every VCS project, which brought some problems:
We try the following solutions:
Some implementation details: