New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, wherever applicable. #975
AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, wherever applicable. #975
Conversation
Refer to this link for build results (access rights to CI server needed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure to run DDLTests
for some sanity checks on the DDL/JPA.
Also, there are some additional unit tests failures compared to the base branch.
CONSTRAINT UQ_service_id UNIQUE (id), | ||
CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id)); | ||
CONSTRAINT PK_clusterservices PRIMARY KEY (id), | ||
CONSTRAINT UK_clusterservices_id UNIQUE (id, cluster_id, service_group_id, service_name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since id
is the primary key, this is guaranteed to be unique even with duplicate values for the rest of the columns. Shouldn't it constrain (cluster_id, service_group_id, service_name)
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
CONSTRAINT FK_servicegroups_stack_id FOREIGN KEY (stack_id) REFERENCES stack (stack_id), | ||
CONSTRAINT UQ_TEMP_UNTIL_REAL_PK UNIQUE(id)); | ||
CONSTRAINT PK_servicegroups PRIMARY KEY (id), | ||
CONSTRAINT UK_servicegroups_id UNIQUE (id, cluster_id, service_group_name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unique constraint is not needed unless you want to remove the ID value so that a service group name is unique in a cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want SG name to be unique across cluster, as the name is the only thing distinguishable from API perspective. CC @jayush
CONSTRAINT UQ_service_id UNIQUE (id), | ||
CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id)); | ||
CONSTRAINT PK_clusterservices PRIMARY KEY (id), | ||
CONSTRAINT UK_clusterservices_id UNIQUE (id, cluster_id, service_group_id, service_name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, don't use a PK as part of a unique clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
return query.getSingleResult(); | ||
} catch (NoResultException ignored) { | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this change is a good idea. Moving away from a PK for this table, it means that any lookup must hit the database instead of the L1 cache. Why was this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change was made as there is a way to simplify what constitutes as primary here. Servceid will be unique/primary here.
With my current change of using service_id as primary key, and querying it as :
return entityManagerProvider.get().find(ServiceDesiredStateEntity.class, serviceId);
That will get the query serviced via cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But your final code here does not do this. Instead, to lookup a ServiceDesiredStateEntity via a NamedQuery. That means anytime you need the state of a service, it's hitting the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was not clear on my earlier comment. Given that we were discussing on whether I should change PK for the servicedesiredstate table or not and given that I had changed it, but with the change the change in PK, I was not using EntityManager.
So, I had posted the change in comment that if we want to go changing the Pk construct (which I had done earlier), and by using the EntityManager (suggested in comments), we can leverage the caching mechanism.
So, I have incorporated the EntityManager change now.
TypedQuery<ServiceGroupEntity> query = entityManagerProvider.get() | ||
.createNamedQuery("serviceGroupByClusterAndServiceGroupIds", ServiceGroupEntity.class); | ||
query.setParameter("clusterId", clusterId); | ||
.createNamedQuery("serviceGroupById", ServiceGroupEntity.class); | ||
query.setParameter("serviceGroupId", serviceGroupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very bad idea - you're not actually looking up the SG by a PK, so EclipseLink will always make a DB query. You should use the EntityManager to findByPK ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Fetching it like this now.
return entityManagerProvider.get().find(ServiceGroupEntity.class, serviceGroupId);
Similar change done for ServiceDAO as well.
uniqueConstraints = @UniqueConstraint( | ||
name = "UK_clusterservices_id", | ||
columnNames = {"id" , "service_name", "service_group_id", "cluster_id"})) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No PK ID fields in unique clauses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@Table( | ||
name = "servicedesiredstate", | ||
uniqueConstraints = @UniqueConstraint(name = "UQ_servicedesiredstate", | ||
columnNames = {"service_id"})) | ||
@Entity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spacing here looks off - 4 spaces instead of 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
uniqueConstraints = @UniqueConstraint( | ||
name = "UK_servicegroups_id", | ||
columnNames = { "id" , "cluster_id", "service_group_name" })) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No PK ID fields in unique constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
fdacaf8
to
7b5c378
Compare
Updated code based on above review comments. |
Refer to this link for build results (access rights to CI server needed): |
return query.getSingleResult(); | ||
} catch (NoResultException ignored) { | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But your final code here does not do this. Instead, to lookup a ServiceDesiredStateEntity via a NamedQuery. That means anytime you need the state of a service, it's hitting the DB.
"JOIN clusterService.clusterEntity clusterEntity " + | ||
"WHERE clusterService.serviceName=:serviceName " + | ||
"AND serviceGroup.serviceGroupName=:serviceGroupName " + | ||
"AND clusterEntity.clusterName=:clusterName") | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing the JOINS
yourself, can you do
SELECT clusterService FROM ClusterServiceEntity clusterService WHERE
clusterService.serviceName=:serviceName AND
clusterService.serviceGroupEntity.serviceGroupName=:serviceGroupName AND
clusterService.clusterEntity.clusterName:=clusterName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Updated this part.
uniqueConstraints = @UniqueConstraint( | ||
name = "UK_clusterservices_id", | ||
columnNames = {"service_name", "service_group_id", "cluster_id"})) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is cluster_id necessary in this unique clause? Aren't service group IDs unique anyway and only able to be associated with a single cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Removed cluster_id here and in DB table for UQ constraint.
7b5c378
to
4af4f8d
Compare
Updated code based on above review comments. CC @jonathan-hurley |
Refer to this link for build results (access rights to CI server needed): |
@Table( | ||
name = "servicedesiredstate", | ||
uniqueConstraints = @UniqueConstraint(name = "UQ_servicedesiredstate", | ||
columnNames = {"service_id"})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this unique constraint actually exist in the SQL files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Doesnt exist.
In fact given that 'service_id' is the Primary key, Unique wouldn't be required. SO removed it from Entity.
@Table( | ||
name = "servicegroups", | ||
uniqueConstraints = @UniqueConstraint( | ||
name = "UK_servicegroups_id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be UQ_ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
name = "clusterservices", | ||
uniqueConstraints = @UniqueConstraint( | ||
name = "UK_clusterservices_id", | ||
columnNames = {"service_name", "service_group_id"})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be UQ_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
4af4f8d
to
4adcb38
Compare
Refer to this link for build results (access rights to CI server needed): |
@swapanshridhar Please do not force-push to the source branch of pull requests, since it makes it harder to see what previous comments were about and how the comments were addressed. Instead, please push additional commits. They can be squashed during merge if a single commit on the target branch is desired. |
Is that why the discussion threads were lost? Yeah, you can just make incremental commits. Once you're approved, then Squash and Merge to preserve the 1:1 relationship betweeen Jira and PR. |
Sure Will take care onwards. CC @jonathan-hurley |
What changes were proposed in this pull request?
Problem Statement:
The {{clusterservices}} table was given a new surrogate, auto-incrementing primary key:
However, the table doesn't use this for its PK. Instead, it combines it with 2 other columns. This would allow a single Service Group to be a part of 2 clusters and still be considered unique (which is incorrect). Compound PKs also present a problem in slower cloud-based databases as they can cause table locks on read which lead to deadlocks in the database:
By not using the surrogate PK, we also cause other tables, like {{serviceconfig}} to have to create compound FKs as well:
CONSTRAINT FK_serviceconfig_clstr_svc FOREIGN KEY (service_id, service_group_id, cluster_id) REFERENCES clusterservices (id, service_group_id, cluster_id),
This should just be a single FK to the surrogate ID.
Same for some other other tables, too, like {{servicegroups}}:
It uses a surrogate auto-incrementing ID, but it's PK is a compound.
Fix:
Update code to used Surrogate for PK, and updated the relevant foreign references to them from other tables.
How was this patch tested?