Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL changes workover #169

Merged
merged 20 commits into from
May 28, 2018
Merged

SQL changes workover #169

merged 20 commits into from
May 28, 2018

Conversation

a-tze
Copy link
Collaborator

@a-tze a-tze commented Apr 28, 2018

See issue #166 .

Functionality testing still outstanding. Some minor, mostly cosmetic differences to current maser exist.

Feel free to add commits to this branch, I'll squash them in before removing WIP status. Accordingly, expect history rewrites in this branch as long as it is in WIP status.

The current last migration file will be replaced in a final commit when removing WIP state. If there are other migrations in the meantime, then a suitable change will be added.

@a-tze a-tze added this to To do in Documentation via automation Apr 28, 2018
@jjeising jjeising removed this from To do in Documentation Apr 30, 2018
@pegro
Copy link
Member

pegro commented May 14, 2018

First of all: thanks for (re-)working (on) this.
Second: I would prefer keeping all these commits separate in the history and don't squash them. I think we aren't really bisectable anyway, so that shouldn't be a concern.

I'll try to comment on the other things inline.

@@ -27,11 +27,20 @@ BEGIN
t.id = param_ticket_id;

SELECT
state >= p.dependent_ticket_trigger_state INTO satisfaction
dependent_ticket_state.sort >= configured_trigger_state.sort INTO satisfaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with some careful migration patches it would be possible to get the ticket states into a useful native order.
But not sure if it'd be worth the effort.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this breaks the possibility of later changes without deleting all tickets, e.g. not being able to REPLACE the enum when there are references to it.
It is quite some code to use the sort property, but may be better in the long run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inplace replacing is difficult, that's true. But one could introduce the new enum with a temporary name, add temporary columns to all related tables, copy the value from the old column, remove the old one and rename the enum column back to the old name. Should™ work ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.. but please create an issue for that. I like the idea, but dont want to do it in this PR...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, see #173

JOIN
tbl_ticket_state configured_trigger_state ON
t.ticket_type = 'encoding' AND
configured_trigger_state.ticket_state = p.dependent_ticket_trigger_state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always takes a while for me to understand that code.
I think what confuses me, is the use of dependent and depending "tickets". One could read "dependent_ticket_state" as "the state of the ticket the given ticket is depending on" or as "the state of the given ticket which is depending on another ticket" (making it a dependent ticket).

I would propose to use "dependee" (https://en.wiktionary.org/wiki/dependee) and "depender" (https://en.wiktionary.org/wiki/depender), or just use "master ticket", whenever you mean the master ticket.
If you absolutely detest that, maybe I'm ok with more comments explaining the relationships.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and you should probably mention in the commit message, that you switched from using the param_ticket_id as the depender-ticket-id to the master (or dependee) ticket-id.
Which is confusing, since ticket_depending_encoding_ticket_state still asks for the depender-ticket-id.

Or did I miss anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just use "master ticket", whenever you mean the master ticket.

I'm probably the reason we're not currently doing this. As we set depends_on for encoding profiles, we have no notion of a master or an informal subticket anywhere else in the UI or code. Therefor i suggested something like dependent. I like dependee and maybe dependent? The last is a little more common. This would also free us from the language issues around master/slave.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependee and dependent is fine with me.
Regarding the "master" notion, in #171 it's about to be called that, if I saw that correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in #171 it's about to be called that, if I saw that correctly.

I think this is only a comment and a UI label – I already requested changing this some time ago.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantical change of the parameter is likely an error, I will investigate that.

LEFT JOIN
tbl_ticket_state masterstate ON
masterstate.ticket_type = 'encoding' AND
masterstate.ticket_state = COALESCE(ticket_depending_encoding_ticket_state(t.id),pj.dependent_ticket_trigger_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I see the use of "master" here, I'd be in favor to use it above as well. See comment above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last commit in this PR, the view is reworked again and the selection of the master ticket is much clearer. So I guess the same should be done in ticket_depending_encoding_ticket_state_satisfied, if the change of parameter semantics is reverted.

@@ -36,7 +36,7 @@ CREATE OR REPLACE VIEW view_serviceable_tickets AS

WHERE
pj.read_only = false AND
t.ticket_type != 'meta' AND
t.ticket_type IN ('recording','encoding','ingest') AND
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious: why is that faster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because for some reasons Postgres doesnt get it that this is actually the same. The newer line will result in using the ANY operation which will in turn make use of a hash index, while using != will result in a "simple filter" on a full index scan or even table scan.

@@ -616,7 +616,7 @@ public function assignNextUnassignedForState($ticketType = '', $ticketState = ''
->scoped([
'virtual_property_filter' => [$propertyFilters]
])
->orderBy('ticket_priority(id) DESC');
->orderBy('calculated_priority DESC');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason why we order here again, when the view already returns ordered results?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably just legacy and/or a case of "better safe than sorry".

@@ -136,6 +136,9 @@ WITHOUT OIDS;
CREATE INDEX tbl_ticket_project_id_idx ON tbl_ticket USING btree(project_id);
CREATE INDEX tbl_ticket_fahrplan_id_idx ON tbl_ticket USING btree(fahrplan_id);
CREATE INDEX tbl_ticket_handle_id_idx ON tbl_ticket USING btree(handle_id);
CREATE INDEX tbl_ticket_parent_id_idx ON tbl_ticket USING hash(parent_id);
CREATE INDEX tbl_ticket_project_id_idx ON tbl_ticket USING hash(project_id);
CREATE INDEX tbl_ticket_view_servicable_idx ON tbl_ticket USING btree(failed, service_executable, ticket_type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those indexes are missing in the migration file.

Copy link
Member

@pegro pegro May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh I guess the whole migration patch file is not updated.
Edit: Oh you mentioned that in the PR comment. Sorry!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and tbl_ticket_project_id_idx already exists. See line 136.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the duplicate index, will change the name.

@a-tze
Copy link
Collaborator Author

a-tze commented May 15, 2018

@pegro The "squashing in" did not mean I want to squash the commits/ the PR once it is approved. It's the solely purpose of this PR to introduce multiple commits with more explanation.
What I would like to squash into other commits are the corrections that are done now, e.g. renaming the duplicate index or stuff like that, just to have a super clean history.
Does that make sense?

@pegro
Copy link
Member

pegro commented May 15, 2018

Yes, makes sense. ;)

a-tze added 12 commits May 15, 2018 23:54
… comparison

Belongs to feature #157. Enums do not have correct native order, instead a dedicated sort property exists, so this must be used to check if the configured state is reached by the dependent ticket.
Using a function for ordering seems to be not optimized by Postgres and therefore very expensive. Additionally, it seems with older Postgres versions this makes the use of LIMIT impossible when querying the view.
Eliminate another function call in a JOIN clause by incorporating the logic. Limit performance impact by using LATERAL JOIN.
@a-tze
Copy link
Collaborator Author

a-tze commented May 15, 2018

Ok.. rebased to current master, added migration and did some testing. The branch now includes a somewhat consistent renaming to dependee and depender.

@pegro @jjeising please do a final review. Note: Github seems to get the order of commits wrong.

@a-tze a-tze changed the title WIP: SQL changes workover SQL changes workover May 15, 2018
Copy link
Member

@pegro pegro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@@ -137,7 +137,6 @@ CREATE INDEX tbl_ticket_project_id_idx ON tbl_ticket USING btree(project_id);
CREATE INDEX tbl_ticket_fahrplan_id_idx ON tbl_ticket USING btree(fahrplan_id);
CREATE INDEX tbl_ticket_handle_id_idx ON tbl_ticket USING btree(handle_id);
CREATE INDEX tbl_ticket_parent_id_idx ON tbl_ticket USING hash(parent_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing indexes were btree indexes, the added ones are hash indexes. According to the PostgreSQL documentation they are of limited use: Hash indexes can only handle simple equality comparisons [...]. Maybe change them to btree indexes?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional. You usually do not query "less than" or "greater than" for things like IDs. Unfortunately in my tests those indexes were not used according to the execution plan when they were of type btree. Makes some sense in my head for JOINS. I can test that again if you think btree should be used by the query optimizer in every case a hash index is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so this was a conscious choice. Just wanted to make sure.

wantedstate.ticket_type = t2.ticket_type AND
wantedstate.ticket_state = p.dependent_ticket_trigger_state
tbl_ticket_state configured_trigger_state ON
depender_ticket.ticket_type = 'encoding' AND
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that the JOIN conditions in line 24 and 28 filter for encoding tickets, but since they don't refer to the table joined, there might be multiple ticket states with the same ticket_state value?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ticket_state is an enum and therefore unique, isnt it? The JOIN is meant to deliver the state attributes for encoding tickets and NULLs for other ticket types. So is your concern that these JOINS produce more tupels? I cannot think of a case where this might happen.

If your concern is that dependee_ticket_state and configured_trigger_state might contain the same tupel of the state table, then yes this is possible of course.

Copy link
Member

@pegro pegro May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ticket_state is an enum, but tbl_ticket_state is a table (which you are joining here) and it contains multiple rows with e.g. ticket_state = 'removing'. I know, it's not really likely that the trigger state would be one of those, but technically it could happen.

The JOIN is meant to deliver the state attributes for encoding tickets

But the ticket_type of tbl_ticket_state is not part of any condition... That's what I'm referring to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I know what you mean. I had in memory that the states are unique, but they aren't. So this clash could happen for example with the state "gone". You are absolutely right, there needs to be an additional or changed condition. For easy review, I will put a commit for this on top of the branch (and another one for truncating old migration file).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, although I'd have used dependee_ticket_state.ticket_type = depender_ticket.ticket_type just to have the hard-coded value only once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And since you don't do any left joins and such, I'd suggest to move the depender_ticket.ticket_type = 'encoding' condition to the where clause, to only have it once. The function should still return NULL, if the ticket is not a encoding ticket.

@@ -3,17 +3,17 @@ BEGIN;
SET ROLE TO postgres;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this migration file could just be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No thats a bigger change, because it contains ADD COLUMN dependent_ticket_trigger_state, while the newer migration file contains a RENAME of that column.

I wanted to leave it in there in this PR because some installations are in that state currently. But it could be shrunk to the ADD COLUMN part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd be in favor of shrinking.

@a-tze
Copy link
Collaborator Author

a-tze commented May 16, 2018

@pegro please have a look at the last 3 commits from the latest batch. Another round of testing is still outstanding.

wantedstate.ticket_type = t2.ticket_type AND
wantedstate.ticket_state = p.dependent_ticket_trigger_state
tbl_ticket_state configured_trigger_state ON
depender_ticket.ticket_type = 'encoding' AND
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, although I'd have used dependee_ticket_state.ticket_type = depender_ticket.ticket_type just to have the hard-coded value only once.

wantedstate.ticket_type = t2.ticket_type AND
wantedstate.ticket_state = p.dependent_ticket_trigger_state
tbl_ticket_state configured_trigger_state ON
depender_ticket.ticket_type = 'encoding' AND
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And since you don't do any left joins and such, I'd suggest to move the depender_ticket.ticket_type = 'encoding' condition to the where clause, to only have it once. The function should still return NULL, if the ticket is not a encoding ticket.

LEFT JOIN
tbl_ticket_state wantedstate ON wantedstate.ticket_type = 'encoding' AND wantedstate.ticket_state = pj.dependent_ticket_trigger_state
tbl_ticket_state configured_trigger_state ON
configured_trigger_state.ticket_type = 'encoding' AND
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess configured_trigger_state.ticket_type = t.ticket_type would do the same, but it always be encoding as long we don't allow other ticket types as encoding to have a encoding_profile_version_id set.
I'm ok with this. It's just I like to avoid as much hard-coding as possible without performance impact.

@a-tze
Copy link
Collaborator Author

a-tze commented May 25, 2018

@pegro good idea, I made this change for the function. Ommitting the change for the view right now.

Also @jjeising: If no objections exist, I will merge this in the next 3 days so that the other PRs can move forward.

@pegro
Copy link
Member

pegro commented May 25, 2018

Fine with me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants