AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, wherever applicable. #975

swapanshridhar · 2018-04-11T22:21:13Z

What changes were proposed in this pull request?

Problem Statement:

The {{clusterservices}} table was given a new surrogate, auto-incrementing primary key:

@Entity
@TableGenerator(name = "service_id_generator",
  table = "ambari_sequences", pkColumnName = "sequence_name", valueColumnName = "sequence_value"
  , pkColumnValue = "service_id_seq"
  , initialValue = 1
)

However, the table doesn't use this for its PK. Instead, it combines it with 2 other columns. This would allow a single Service Group to be a part of 2 clusters and still be considered unique (which is incorrect). Compound PKs also present a problem in slower cloud-based databases as they can cause table locks on read which lead to deadlocks in the database:

CREATE TABLE clusterservices (
  id BIGINT NOT NULL,
  service_name VARCHAR(255) NOT NULL,
  service_type VARCHAR(255) NOT NULL,
  cluster_id BIGINT NOT NULL,
  service_group_id BIGINT NOT NULL,
  service_enabled INTEGER NOT NULL,
  CONSTRAINT PK_clusterservices PRIMARY KEY (id, service_group_id, cluster_id),
  CONSTRAINT UQ_service_id UNIQUE (id),
  CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id));

By not using the surrogate PK, we also cause other tables, like {{serviceconfig}} to have to create compound FKs as well:

CONSTRAINT FK_serviceconfig_clstr_svc FOREIGN KEY (service_id, service_group_id, cluster_id) REFERENCES clusterservices (id, service_group_id, cluster_id),

This should just be a single FK to the surrogate ID.

The same applies to some other tables, too, like {{servicegroups}}:

CREATE TABLE servicegroups (
  id BIGINT NOT NULL,
  service_group_name VARCHAR(255) NOT NULL,
  cluster_id BIGINT NOT NULL,
  CONSTRAINT PK_servicegroups PRIMARY KEY (id, cluster_id),
  CONSTRAINT FK_servicegroups_cluster_id FOREIGN KEY (cluster_id) REFERENCES clusters (cluster_id));

It uses a surrogate auto-incrementing ID, but its PK is a compound key.

Fix:

Updated the code to use the surrogate ID as the PK, and updated the relevant foreign key references to it from other tables.

How was this patch tested?

Fixed a few UTs.
Ran UTs corresponding to the changes.
Tested on cluster.

asfgit · 2018-04-12T00:56:18Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1795/
Test FAILed.
Test FAILured.

adoroszlai

Please make sure to run DDLTests for some sanity checks on the DDL/JPA.

Also, there are some additional unit test failures compared to the base branch.

adoroszlai · 2018-04-12T10:22:18Z

ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql

-  CONSTRAINT UQ_service_id UNIQUE (id),
-  CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id));
+  CONSTRAINT PK_clusterservices PRIMARY KEY (id),
+  CONSTRAINT UK_clusterservices_id UNIQUE (id, cluster_id, service_group_id, service_name),


Since id is the primary key, this is guaranteed to be unique even with duplicate values for the rest of the columns. Shouldn't it constrain (cluster_id, service_group_id, service_name) instead?
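
The point above can be illustrated with a minimal, self-contained sketch. It uses plain Java collections as a stand-in for the database; the `UniqueKeyDemo` class, the `Row` type, and the `countAccepted` helper are hypothetical names invented for this illustration and are not part of Ambari. The sketch assumes two rows that describe the same service but carry different surrogate ids, and counts how many of them each candidate unique key would let through.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class UniqueKeyDemo {

  /** Hypothetical stand-in for a row of the clusterservices table. */
  static final class Row {
    final long id;
    final long clusterId;
    final long serviceGroupId;
    final String serviceName;

    Row(long id, long clusterId, long serviceGroupId, String serviceName) {
      this.id = id;
      this.clusterId = clusterId;
      this.serviceGroupId = serviceGroupId;
      this.serviceName = serviceName;
    }
  }

  /** Counts the rows accepted when keyOf is treated as a unique constraint. */
  static int countAccepted(List<Row> rows, Function<Row, List<Object>> keyOf) {
    Set<List<Object>> seen = new HashSet<>();
    int accepted = 0;
    for (Row row : rows) {
      // Set.add returns false when an equal key was already inserted.
      if (seen.add(keyOf.apply(row))) {
        accepted++;
      }
    }
    return accepted;
  }

  /** Models UNIQUE (id, cluster_id, service_group_id, service_name). */
  static List<Object> keyWithId(Row r) {
    return Arrays.<Object>asList(r.id, r.clusterId, r.serviceGroupId, r.serviceName);
  }

  /** Models UNIQUE (cluster_id, service_group_id, service_name). */
  static List<Object> keyWithoutId(Row r) {
    return Arrays.<Object>asList(r.clusterId, r.serviceGroupId, r.serviceName);
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row(1L, 10L, 100L, "HDFS"),
        new Row(2L, 10L, 100L, "HDFS")); // same service, different surrogate id

    System.out.println(countAccepted(rows, UniqueKeyDemo::keyWithId));    // prints 2
    System.out.println(countAccepted(rows, UniqueKeyDemo::keyWithoutId)); // prints 1
  }
}
```

With the id in the key, both rows are accepted because the id alone already makes every key distinct; only the key without the id rejects the duplicate service.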

jonathan-hurley · 2018-04-12T12:22:11Z

ambari-server/src/main/resources/Ambari-DDL-Derby-CREATE.sql

-  CONSTRAINT FK_servicegroups_stack_id FOREIGN KEY (stack_id) REFERENCES stack (stack_id),
-  CONSTRAINT UQ_TEMP_UNTIL_REAL_PK UNIQUE(id));
+  CONSTRAINT PK_servicegroups PRIMARY KEY (id),
+  CONSTRAINT UK_servicegroups_id UNIQUE (id, cluster_id, service_group_name),


This unique constraint is not needed unless you want to remove the ID value so that a service group name is unique in a cluster.

We want the SG name to be unique across the cluster, as the name is the only distinguishing attribute from an API perspective. CC @jayush

jonathan-hurley · 2018-04-12T12:23:26Z

ambari-server/src/main/resources/Ambari-DDL-Derby-CREATE.sql

-  CONSTRAINT UQ_service_id UNIQUE (id),
-  CONSTRAINT FK_clusterservices_cluster_id FOREIGN KEY (service_group_id, cluster_id) REFERENCES servicegroups (id, cluster_id));
+  CONSTRAINT PK_clusterservices PRIMARY KEY (id),
+  CONSTRAINT UK_clusterservices_id UNIQUE (id, cluster_id, service_group_id, service_name),


Same as above, don't use a PK as part of a unique clause.

jonathan-hurley · 2018-04-12T12:25:41Z

ambari-server/src/main/java/org/apache/ambari/server/orm/dao/ServiceDesiredStateDAO.java

+      return query.getSingleResult();
+    } catch (NoResultException ignored) {
+      return null;
+    }


I'm not sure that this change is a good idea. Moving away from a PK for this table means that any lookup must hit the database instead of the L1 cache. Why was this necessary?

The change was made because there is a way to simplify what constitutes the primary key here: service_id will be unique and serve as the primary key.

With my current change, service_id is the primary key and the entity is looked up as:

return entityManagerProvider.get().find(ServiceDesiredStateEntity.class, serviceId);

That way the lookup is served from the cache.

But your final code here does not do this. Instead, it looks up a ServiceDesiredStateEntity via a NamedQuery. That means any time you need the state of a service, it's hitting the DB.

Sure, I was not clear in my earlier comment. We were discussing whether I should change the PK for the servicedesiredstate table; I had changed it, but even with the changed PK I was not using the EntityManager.

So I posted the change in my comment to show that if we change the PK (which I had done earlier) and look the entity up through the EntityManager (as suggested in the comments), we can leverage the caching mechanism.

So, I have incorporated the EntityManager change now.

jonathan-hurley · 2018-04-12T12:26:45Z

ambari-server/src/main/java/org/apache/ambari/server/orm/dao/ServiceGroupDAO.java

    TypedQuery<ServiceGroupEntity> query = entityManagerProvider.get()
-      .createNamedQuery("serviceGroupByClusterAndServiceGroupIds", ServiceGroupEntity.class);
-    query.setParameter("clusterId", clusterId);
+            .createNamedQuery("serviceGroupById", ServiceGroupEntity.class);
    query.setParameter("serviceGroupId", serviceGroupId);


This is a very bad idea: you're not actually looking up the SG by a PK, so EclipseLink will always make a DB query. You should use the EntityManager to findByPK ...

Got it. Fetching it like this now.

return entityManagerProvider.get().find(ServiceGroupEntity.class, serviceGroupId);

Similar change done for ServiceDAO as well.
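
The difference between the two lookup styles discussed in this thread can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyEntityManager`, `ServiceGroup`, `insertRow`, `find`, and `queryById` are hypothetical stand-ins written with plain Java maps, not EclipseLink or JPA classes, and the model ignores optional features such as query-result caching. It shows a find-by-PK call being answered from an identity map after the first load, while a query-style lookup goes to the backing store every time.

```java
import java.util.HashMap;
import java.util.Map;

class ToyEntityManager {

  /** Hypothetical stand-in for ServiceGroupEntity. */
  static final class ServiceGroup {
    final long id;
    final String name;

    ServiceGroup(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  // Simulated database table, keyed by primary key.
  private final Map<Long, ServiceGroup> database = new HashMap<>();
  // Simulated first-level cache (identity map), also keyed by primary key.
  private final Map<Long, ServiceGroup> identityMap = new HashMap<>();
  private int databaseHits = 0;

  /** Writes a row straight into the simulated database, bypassing the cache. */
  void insertRow(ServiceGroup serviceGroup) {
    database.put(serviceGroup.id, serviceGroup);
  }

  /** Find-by-PK: consult the identity map first, load from the database on a miss. */
  ServiceGroup find(long id) {
    ServiceGroup cached = identityMap.get(id);
    if (cached != null) {
      return cached;
    }
    ServiceGroup loaded = loadFromDatabase(id);
    if (loaded != null) {
      identityMap.put(id, loaded);
    }
    return loaded;
  }

  /** Query-style lookup: in this toy model it always goes to the database. */
  ServiceGroup queryById(long id) {
    return loadFromDatabase(id);
  }

  private ServiceGroup loadFromDatabase(long id) {
    databaseHits++;
    return database.get(id);
  }

  int getDatabaseHits() {
    return databaseHits;
  }

  public static void main(String[] args) {
    ToyEntityManager em = new ToyEntityManager();
    em.insertRow(new ServiceGroup(7L, "core"));

    em.find(7L);
    em.find(7L);
    em.find(7L);
    System.out.println(em.getDatabaseHits()); // prints 1

    em.queryById(7L);
    em.queryById(7L);
    em.queryById(7L);
    System.out.println(em.getDatabaseHits()); // prints 4
  }
}
```

In this model, three find-by-PK calls cost one database hit, while each query-style call costs one hit of its own, which is the concern raised in the review.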

jonathan-hurley · 2018-04-12T12:27:19Z

ambari-server/src/main/java/org/apache/ambari/server/orm/entities/ClusterServiceEntity.java

+    uniqueConstraints = @UniqueConstraint(
+        name = "UK_clusterservices_id",
+        columnNames = {"id" , "service_name", "service_group_id", "cluster_id"}))
+


No PK ID fields in unique clauses

jonathan-hurley · 2018-04-12T12:27:55Z

...ri-server/src/main/java/org/apache/ambari/server/orm/entities/ServiceDesiredStateEntity.java

+@Table(
+        name = "servicedesiredstate",
+        uniqueConstraints = @UniqueConstraint(name = "UQ_servicedesiredstate",
+                                              columnNames = {"service_id"}))
 @Entity


Spacing here looks off - 4 spaces instead of 2?

jonathan-hurley · 2018-04-12T12:28:20Z

ambari-server/src/main/java/org/apache/ambari/server/orm/entities/ServiceGroupEntity.java

+    uniqueConstraints = @UniqueConstraint(
+                name = "UK_servicegroups_id",
+                columnNames = { "id" , "cluster_id", "service_group_name" }))
+


No PK ID fields in unique constraints.

swapanshridhar · 2018-04-12T21:40:16Z

AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, where… …
…ver applicable.

Updated code based on above review comments.

asfgit · 2018-04-12T23:42:39Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1827/
Test FAILed.
Test FAILured.

jonathan-hurley · 2018-04-13T14:00:04Z

ambari-server/src/main/java/org/apache/ambari/server/orm/dao/ServiceDesiredStateDAO.java

+      return query.getSingleResult();
+    } catch (NoResultException ignored) {
+      return null;
+    }


But your final code here does not do this. Instead, it looks up a ServiceDesiredStateEntity via a NamedQuery. That means any time you need the state of a service, it's hitting the DB.

jonathan-hurley · 2018-04-13T14:08:26Z

ambari-server/src/main/java/org/apache/ambari/server/orm/entities/ClusterServiceEntity.java

+    "JOIN clusterService.clusterEntity clusterEntity " +
+    "WHERE clusterService.serviceName=:serviceName " +
+    "AND  serviceGroup.serviceGroupName=:serviceGroupName " +
+    "AND clusterEntity.clusterName=:clusterName")
 })


Instead of doing the JOINS yourself, can you do

SELECT clusterService FROM ClusterServiceEntity clusterService WHERE clusterService.serviceName=:serviceName AND clusterService.serviceGroupEntity.serviceGroupName=:serviceGroupName AND clusterService.clusterEntity.clusterName=:clusterName

Yep. Updated this part.

jonathan-hurley · 2018-04-13T14:09:29Z

ambari-server/src/main/java/org/apache/ambari/server/orm/entities/ClusterServiceEntity.java

+  uniqueConstraints = @UniqueConstraint(
+  name = "UK_clusterservices_id",
+  columnNames = {"service_name", "service_group_id", "cluster_id"}))
+


Is cluster_id necessary in this unique clause? Aren't service group IDs unique anyway and only able to be associated with a single cluster?

Correct. Removed cluster_id here and in DB table for UQ constraint.

swapanshridhar · 2018-04-13T23:12:56Z

AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, where… …
…ver applicable.

Updated code based on above review comments. CC @jonathan-hurley

asfgit · 2018-04-14T00:38:46Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1854/
Test FAILed.
Test FAILured.

jonathan-hurley · 2018-04-16T16:18:22Z

...ri-server/src/main/java/org/apache/ambari/server/orm/entities/ServiceDesiredStateEntity.java

+@Table(
+  name = "servicedesiredstate",
+  uniqueConstraints = @UniqueConstraint(name = "UQ_servicedesiredstate",
+                                        columnNames = {"service_id"}))


Does this unique constraint actually exist in the SQL files?

Nope, it doesn't exist.
In fact, given that 'service_id' is the primary key, a unique constraint isn't required, so I removed it from the entity.

jonathan-hurley · 2018-04-16T16:19:22Z

ambari-server/src/main/java/org/apache/ambari/server/orm/entities/ServiceGroupEntity.java

+@Table(
+    name = "servicegroups",
+    uniqueConstraints = @UniqueConstraint(
+                name = "UK_servicegroups_id",


Shouldn't this be UQ_ ?

jonathan-hurley · 2018-04-16T16:19:36Z

ambari-server/src/main/java/org/apache/ambari/server/orm/entities/ClusterServiceEntity.java

+  name = "clusterservices",
+  uniqueConstraints = @UniqueConstraint(
+  name = "UK_clusterservices_id",
+  columnNames = {"service_name", "service_group_id"}))


Shouldn't this be UQ_?

asfgit · 2018-04-16T23:07:58Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/Ambari-Github-PullRequest-Builder/1875/
Test FAILed.
Test FAILured.

adoroszlai · 2018-04-17T07:05:01Z

@swapanshridhar Please do not force-push to the source branch of pull requests, since it makes it harder to see what previous comments were about and how the comments were addressed. Instead, please push additional commits. They can be squashed during merge if a single commit on the target branch is desired.

jonathan-hurley · 2018-04-17T11:22:55Z

Is that why the discussion threads were lost? Yeah, you can just make incremental commits. Once you're approved, then Squash and Merge to preserve the 1:1 relationship between Jira and PR.

swapanshridhar · 2018-04-17T17:43:54Z

@swapanshridhar Please do not force-push to the source branch of pull requests, since it makes it harder to see what previous comments were about and how the comments were addressed. Instead, please push additional commits. They can be squashed during merge if a single commit on the target branch is desired.

Sure, will take care of this going forward. CC @jonathan-hurley

swapanshridhar self-assigned this Apr 11, 2018

swapanshridhar requested review from jayush, mradha25, d0zen1 and vbrodetskyi April 11, 2018 22:22

jayush requested review from ncole and jonathan-hurley April 12, 2018 01:15

adoroszlai added JPA SQL labels Apr 12, 2018

adoroszlai suggested changes Apr 12, 2018

jonathan-hurley suggested changes Apr 12, 2018

swapanshridhar force-pushed the AMBARI-23552-branch-feature-AMBARI-14714 branch from fdacaf8 to 7b5c378 Compare April 12, 2018 21:38

jonathan-hurley suggested changes Apr 13, 2018

swapanshridhar force-pushed the AMBARI-23552-branch-feature-AMBARI-14714 branch from 7b5c378 to 4af4f8d Compare April 13, 2018 23:12

jonathan-hurley suggested changes Apr 16, 2018

AMBARI-23552. Switch to using Surrogate PK in Ambari DB tables, where…

4adcb38

…ver applicable.

swapanshridhar force-pushed the AMBARI-23552-branch-feature-AMBARI-14714 branch from 4af4f8d to 4adcb38 Compare April 16, 2018 21:34

jonathan-hurley approved these changes Apr 17, 2018

adoroszlai approved these changes Apr 17, 2018

swapanshridhar merged commit d99514d into apache:branch-feature-AMBARI-14714 Apr 17, 2018

swapanshridhar deleted the AMBARI-23552-branch-feature-AMBARI-14714 branch April 17, 2018 17:44
