Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, wherever applicable. #975

Conversation

swapanshridhar
Copy link
Contributor

@swapanshridhar swapanshridhar commented Apr 11, 2018

What changes were proposed in this pull request?

  • Problem Statement:

The {{clusterservices}} table was given a new surrogate, auto-incrementing primary key:

@Entity
@TableGenerator(name = "service_id_generator",
  table = "ambari_sequences", pkColumnName = "sequence_name", valueColumnName = "sequence_value"
  , pkColumnValue = "service_id_seq"
  , initialValue = 1
)

However, the table doesn't use this for its PK. Instead, it combines it with 2 other columns. This would allow a single Service Group to be a part of 2 clusters and still be considered unique (which is incorrect). Compound PKs also present a problem in slower cloud-based databases as they can cause table locks on read which lead to deadlocks in the database:

CREATE TABLE clusterservices (
  id BIGINT NOT NULL,
  service_name VARCHAR(255) NOT NULL,
  service_type VARCHAR(255) NOT NULL,
  cluster_id BIGINT NOT NULL,
  service_group_id BIGINT NOT NULL,
  service_enabled INTEGER NOT NULL,
  CONSTRAINT PK_clusterservices PRIMARY KEY (id, service_group_id, cluster_id),
  CONSTRAINT UQ_service_id UNIQUE (id),
  CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id));

By not using the surrogate PK, we also cause other tables, like {{serviceconfig}} to have to create compound FKs as well:

CONSTRAINT FK_serviceconfig_clstr_svc FOREIGN KEY (service_id, service_group_id, cluster_id) REFERENCES clusterservices (id, service_group_id, cluster_id),

This should just be a single FK to the surrogate ID.

Same for some other other tables, too, like {{servicegroups}}:

CREATE TABLE servicegroups (
  id BIGINT NOT NULL,
  service_group_name VARCHAR(255) NOT NULL,
  cluster_id BIGINT NOT NULL,
  CONSTRAINT PK_servicegroups PRIMARY KEY (id, cluster_id),
  CONSTRAINT FK_servicegroups_cluster_id FOREIGN KEY (cluster_id) REFERENCES clusters (cluster_id));

It uses a surrogate auto-incrementing ID, but it's PK is a compound.

  • Fix:

Update code to used Surrogate for PK, and updated the relevant foreign references to them from other tables.

How was this patch tested?

  • Fixed few UTs
  • Ran UTs corresponding to the changes.
  • Tested on cluster.

@asfgit
Copy link

asfgit commented Apr 12, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1795/
Test FAILed.
Test FAILured.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to run DDLTests for some sanity checks on the DDL/JPA.

Also, there are some additional unit tests failures compared to the base branch.

CONSTRAINT UQ_service_id UNIQUE (id),
CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id));
CONSTRAINT PK_clusterservices PRIMARY KEY (id),
CONSTRAINT UK_clusterservices_id UNIQUE (id, cluster_id, service_group_id, service_name),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since id is the primary key, this is guaranteed to be unique even with duplicate values for the rest of the columns. Shouldn't it constrain (cluster_id, service_group_id, service_name) instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

CONSTRAINT FK_servicegroups_stack_id FOREIGN KEY (stack_id) REFERENCES stack (stack_id),
CONSTRAINT UQ_TEMP_UNTIL_REAL_PK UNIQUE(id));
CONSTRAINT PK_servicegroups PRIMARY KEY (id),
CONSTRAINT UK_servicegroups_id UNIQUE (id, cluster_id, service_group_name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unique constraint is not needed unless you want to remove the ID value so that a service group name is unique in a cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want SG name to be unique across cluster, as the name is the only thing distinguishable from API perspective. CC @jayush

CONSTRAINT UQ_service_id UNIQUE (id),
CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id));
CONSTRAINT PK_clusterservices PRIMARY KEY (id),
CONSTRAINT UK_clusterservices_id UNIQUE (id, cluster_id, service_group_id, service_name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, don't use a PK as part of a unique clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return query.getSingleResult();
} catch (NoResultException ignored) {
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this change is a good idea. Moving away from a PK for this table, it means that any lookup must hit the database instead of the L1 cache. Why was this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change was made as there is a way to simplify what constitutes as primary here. Servceid will be unique/primary here.

With my current change of using service_id as primary key, and querying it as :

return entityManagerProvider.get().find(ServiceDesiredStateEntity.class, serviceId);

That will get the query serviced via cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But your final code here does not do this. Instead, to lookup a ServiceDesiredStateEntity via a NamedQuery. That means anytime you need the state of a service, it's hitting the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was not clear on my earlier comment. Given that we were discussing on whether I should change PK for the servicedesiredstate table or not and given that I had changed it, but with the change the change in PK, I was not using EntityManager.

So, I had posted the change in comment that if we want to go changing the Pk construct (which I had done earlier), and by using the EntityManager (suggested in comments), we can leverage the caching mechanism.

So, I have incorporated the EntityManager change now.

TypedQuery<ServiceGroupEntity> query = entityManagerProvider.get()
.createNamedQuery("serviceGroupByClusterAndServiceGroupIds", ServiceGroupEntity.class);
query.setParameter("clusterId", clusterId);
.createNamedQuery("serviceGroupById", ServiceGroupEntity.class);
query.setParameter("serviceGroupId", serviceGroupId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very bad idea - you're not actually looking up the SG by a PK, so EclipseLink will always make a DB query. You should use the EntityManager to findByPK ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Fetching it like this now.

return entityManagerProvider.get().find(ServiceGroupEntity.class, serviceGroupId);

Similar change done for ServiceDAO as well.

uniqueConstraints = @UniqueConstraint(
name = "UK_clusterservices_id",
columnNames = {"id" , "service_name", "service_group_id", "cluster_id"}))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No PK ID fields in unique clauses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Table(
name = "servicedesiredstate",
uniqueConstraints = @UniqueConstraint(name = "UQ_servicedesiredstate",
columnNames = {"service_id"}))
@Entity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing here looks off - 4 spaces instead of 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

uniqueConstraints = @UniqueConstraint(
name = "UK_servicegroups_id",
columnNames = { "id" , "cluster_id", "service_group_name" }))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No PK ID fields in unique constraints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@swapanshridhar swapanshridhar force-pushed the AMBARI-23552-branch-feature-AMBARI-14714 branch from fdacaf8 to 7b5c378 Compare April 12, 2018 21:38
@swapanshridhar
Copy link
Contributor Author

AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, where… …
…ver applicable.

Updated code based on above review comments.

@asfgit
Copy link

asfgit commented Apr 12, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1827/
Test FAILed.
Test FAILured.

return query.getSingleResult();
} catch (NoResultException ignored) {
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But your final code here does not do this. Instead, to lookup a ServiceDesiredStateEntity via a NamedQuery. That means anytime you need the state of a service, it's hitting the DB.

"JOIN clusterService.clusterEntity clusterEntity " +
"WHERE clusterService.serviceName=:serviceName " +
"AND serviceGroup.serviceGroupName=:serviceGroupName " +
"AND clusterEntity.clusterName=:clusterName")
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing the JOINS yourself, can you do

SELECT clusterService FROM ClusterServiceEntity clusterService WHERE 
  clusterService.serviceName=:serviceName AND 
  clusterService.serviceGroupEntity.serviceGroupName=:serviceGroupName AND 
  clusterService.clusterEntity.clusterName:=clusterName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Updated this part.

uniqueConstraints = @UniqueConstraint(
name = "UK_clusterservices_id",
columnNames = {"service_name", "service_group_id", "cluster_id"}))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is cluster_id necessary in this unique clause? Aren't service group IDs unique anyway and only able to be associated with a single cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Removed cluster_id here and in DB table for UQ constraint.

@swapanshridhar swapanshridhar force-pushed the AMBARI-23552-branch-feature-AMBARI-14714 branch from 7b5c378 to 4af4f8d Compare April 13, 2018 23:12
@swapanshridhar
Copy link
Contributor Author

swapanshridhar commented Apr 13, 2018

AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, where… …
…ver applicable.

Updated code based on above review comments. CC @jonathan-hurley

@asfgit
Copy link

asfgit commented Apr 14, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1854/
Test FAILed.
Test FAILured.

@Table(
name = "servicedesiredstate",
uniqueConstraints = @UniqueConstraint(name = "UQ_servicedesiredstate",
columnNames = {"service_id"}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this unique constraint actually exist in the SQL files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Doesnt exist.
In fact given that 'service_id' is the Primary key, Unique wouldn't be required. SO removed it from Entity.

@Table(
name = "servicegroups",
uniqueConstraints = @UniqueConstraint(
name = "UK_servicegroups_id",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be UQ_ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

name = "clusterservices",
uniqueConstraints = @UniqueConstraint(
name = "UK_clusterservices_id",
columnNames = {"service_name", "service_group_id"}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be UQ_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@swapanshridhar swapanshridhar force-pushed the AMBARI-23552-branch-feature-AMBARI-14714 branch from 4af4f8d to 4adcb38 Compare April 16, 2018 21:34
@asfgit
Copy link

asfgit commented Apr 16, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1875/
Test FAILed.
Test FAILured.

@adoroszlai
Copy link
Contributor

@swapanshridhar Please do not force-push to the source branch of pull requests, since it makes it harder to see what previous comments were about and how the comments were addressed. Instead, please push additional commits. They can be squashed during merge if a single commit on the target branch is desired.

@jonathan-hurley
Copy link
Member

Is that why the discussion threads were lost? Yeah, you can just make incremental commits. Once you're approved, then Squash and Merge to preserve the 1:1 relationship betweeen Jira and PR.

@swapanshridhar
Copy link
Contributor Author

@swapanshridhar Please do not force-push to the source branch of pull requests, since it makes it harder to see what previous comments were about and how the comments were addressed. Instead, please push additional commits. They can be squashed during merge if a single commit on the target branch is desired.

Sure Will take care onwards. CC @jonathan-hurley

@swapanshridhar swapanshridhar merged commit d99514d into apache:branch-feature-AMBARI-14714 Apr 17, 2018
@swapanshridhar swapanshridhar deleted the AMBARI-23552-branch-feature-AMBARI-14714 branch April 17, 2018 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants