Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple-create: Order determination for insertions #2056

Merged
merged 195 commits into from
Apr 22, 2024

Conversation

ayush3797
Copy link
Contributor

@ayush3797 ayush3797 commented Feb 22, 2024

Why make this change?

Using multiple-create, we want to perform insertions in two related entities. The insertion is first performed on the referenced entity followed by insertion in the referencing entity, and we want to determine exactly this - which entities for the mutation act as the referencing/referenced entity, respectively. This PR's changes adds this very logic to determine the referencing/referenced entities.

What is this change?

  • Added a new static class MultipleCreateOrderHelper which will expose public method GetReferencingEntityName() to get the referencing entity name for a related pair of source -> target entities for which insertion is to be performed in both the source and target entities.
  • The method GetReferencingEntityName() tries to determine the referencing entity name based on the metadata collected at startup using a call to the method DetermineReferencingEntityBasedOnEntityRelationshipMetadata(). This is achievable for 1:N relationships and for relationships backed by an FK constraint.
  • If the DetermineReferencingEntityBasedOnEntityRelationshipMetadata() method fails to determine the referencing entity name, a subsequent call is made to the DetermineReferencingEntityBasedOnRequestBody() method which tries to determine the referencing entity name using the input data for the source and target entities.
  • An appropriate exception is thrown if the DetermineReferencingEntityBasedOnRequestBody() method fails to determine the referencing entity name.

How was this tested?

  • Unit Tests- Added for Ms/Pg/MySql.

NOTE:

  1. Don't review SqlMetadatProvider.cs changes as those are from a different PR : Removing wrong FK definition  #2051 but were needed for correct functioning of the code changes done in this PR.
  2. [From Sean]: To use debugger with this PR's code, enter by stepping through unit tests. Engine does not support these requests end 2 end yet. at SqlInsertQueryStructure.cs::73

severussundar and others added 7 commits March 26, 2024 14:35
…ue (#2116)

## Why make this change?

- Closes #1951
- PR #1983,
#2103 add CLI options to
enable or disable multiple mutation/multiple create operation through
CLI. With the changes introduced in the mentioned PRs, the configuration
properties successfully gets written to the config file. Also, during
deserialization, the properties are read and the
`MultipleMutationOptions`, `MultipleCreateOptions`,
`GraphQLRuntimeOptions` objects are created accordingly.
- The above-mentioned PRs do not introduce any change in DAB engine
behavior depending on the configuration property values.
- This PR introduces changes to read these fields and enable/disable
multiple create operation depending on whether the feature is
enabled/disabled through the config file. This is achieved by
introducing behavior changes in the schema generation.

## What is this change?
- This PR builds on top of a) Schema generation PR
#1902 b) Rename
nested-create to multiple create PR
#2103
- When multiple create operation is disabled, 
> i) Fields belonging to the related entities are not created in the
input object type are not created.
> ii) Many type multiple create mutation nodes (ex: `createbooks`,
`createpeople_multiple` ) are not created.
> iii) ReferencingField directive is not applied on relationship fields,
so they continue to remain required fields for the create mutation
operation.
> iv) Entities for linking objects are not created as they are relevant
only in the context of multiple create operations.

## How was this tested?

- [x] Unit Tests and Integration Tests 
- [x] Manual Tests 

**Note:** At the moment, multiple create operation is disabled in the
config file generated for integration tests. This is because of the plan
to merge in the Schema generation, AuthZ/N branches separately to the
main branch. With just these 2 PRs, a multiple create operation will
fail, hence, the disabling multiple create operation. At the moment,
tests that perform validations specific to multiple create feature
enable it by i) updating the runtime object (or) ii) creating a custom
config in which the operation is enabled.

## Sample Request(s)

### When Multiple Create operation is enabled - MsSQL

#### Related entity fields are created in the input object type

![image](https://github.com/Azure/data-api-builder/assets/11196553/7a3a8bbe-2742-43e0-98d7-9412ed05db33)

#### Multiple type create operation is created in addition to point
create operation

![image](https://github.com/Azure/data-api-builder/assets/11196553/c6513d9a-5b49-44cc-8fcc-1ed1f44f5f58)


#### Querying related entities continue to work successfully

![image](https://github.com/Azure/data-api-builder/assets/11196553/4c1a61b8-0cbb-4a1e-afaa-1849d710be27)



### When Multiple Create operation is disabled - MsSQL

#### Only fields belonging to the given entity are created in the input
object type

![image](https://github.com/Azure/data-api-builder/assets/11196553/a3b6beb2-7245-4345-ba13-29d8905d859e)

#### Multiple type create operation is not created


### When Multiple Create operation is enabled - Other relational
database types

#### Only fields belonging to the given entity are created in the input
object type

![image](https://github.com/Azure/data-api-builder/assets/11196553/b2a4f7f6-b121-410d-806d-8c5772253080)

#### Multiple type create operation is not created

---------

Co-authored-by: Ayush Agarwal <ayusha083@gmail.com>
Base automatically changed from dev/NestedMutations to main March 29, 2024 02:56
seantleonard added a commit that referenced this pull request Apr 1, 2024
## Why make this change?
If there is a relationship defined for a (source,target) in the database
with right cardinality as `ONE`, we add the relationship in both the
directions (source->target and target->source) to the source's
`RelationshipMetadata` corresponding to the target entity's entry in
`RelationshipMetadata.TargetEntityToFkDefinitionMap` dictionary.

We do this because the relationship can be 1:1 or N:1. In case of N:1,
the first FK (from source->target) would be present. In case of 1:1,
either or both of them can be present. But all this is known to us only
at a later stage when we collect metadata from the database and actually
infer which of the two relationships (or both) is/are valid.

After validating the correct relationship, we don't remove the other
relationship which we added earlier. The redundant relationship jadds a
redundant `JOIN` condition (I think this is something which is visible
in the sql query generated by DAB highlighted in this issue:
#1940).

## What is this change?
The functioning of `SqlMetadatProvider.FillInferredFkInfo()` is changed
such that for all the FK definitions present for a target entity in the
source entity's definition, the FK definitions for a target entity in
the source entity's definition contain:
1. One entry for relationships defined between source to target in the
database (via FK constraint) in the right order of
referencing/referenced entity.
2. Two entries for relationships defined in the config between source to
target (which don't exist in the database).

The above is achieved via a call to a newly added method
`SqlMetadataProvider.GetValidatedFKs()` which loops over all the foreign
key definitions defined for the target entity in the source entity's
definition and adds to the set of validated FK definitions:
1. All the FK definitions which actually map to a foreign key constraint
defined in the database. In such a case, if the source/target fields are
also provided in the config, they are given precedence over the FK
constraint.
2. FK definitions for custom relationships defined by the user in the
configuration file where no FK constrain t exists between the pair of
(source, target) entities.

## How was this tested?
- Added tests to `SqlMetadataProviderUnitTests` for Pg/Ms/MySql for the
cases when an FK constraint exists between the source/target entities.


## Manual Testing
1. Did manual testing to ensure the redundant join condition is not
added in the query.
For a nested query like:

<img width="422" alt="image"
src="https://github.com/Azure/data-api-builder/assets/34566234/e49c98ab-5c6a-49d9-8cfd-97382dd1bf94">

where the configuration for Book entity looks like:

<img width="299" alt="image"
src="https://github.com/Azure/data-api-builder/assets/34566234/ab83e96e-8d13-4d71-b6d1-b89d20b13612">

the query generated before the fix (in main):

```tsql
SELECT TOP 100 
[table0].[title] AS [title], 
JSON_QUERY ([table1_subq].[data]) AS [publishers] 
FROM [dbo].[books] AS [table0] 
OUTER APPLY (
    SELECT TOP 1
    [table1].[name] AS [name] 
    FROM [dbo].[publishers] AS [table1] 
    WHERE [table0].[publisher_id] = [table1].[id] 
    AND [table1].[id] = [table0].[publisher_id] 
    ORDER BY [table1].[id] ASC 
    FOR JSON PATH, INCLUDE_NULL_VALUES,WITHOUT_ARRAY_WRAPPER
) AS [table1_subq]([data]) 
WHERE 1 = 1 
ORDER BY [table0].[id] ASC 
FOR JSON PATH, INCLUDE_NULL_VALUES
```

after the fix:

```tsql
SELECT TOP 100 
[table0].[title] AS [title],
JSON_QUERY ([table1_subq].[data]) AS [publishers] 
FROM [dbo].[books] AS [table0] 
OUTER APPLY (
    SELECT TOP 1 
    [table1].[name] AS [name] 
    FROM [dbo].[publishers] AS [table1] 
    WHERE [table0].[publisher_id] = [table1].[id] 
    ORDER BY [table1].[id] ASC 
    FOR JSON PATH, INCLUDE_NULL_VALUES,WITHOUT_ARRAY_WRAPPER
) AS [table1_subq]([data]) 
WHERE 1 = 1 
ORDER BY [table0].[id] ASC FOR JSON PATH, INCLUDE_NULL_VALUES
```
<Observe the removal of redundant join condition in the 4th line in the
second query>

Similar observations for MySql:
```mysql
SELECT COALESCE(JSON_ARRAYAGG(JSON_OBJECT(@param3, `subq6`.`title`, @param4, `subq6`.`publishers`)), JSON_ARRAY()) AS `data`
FROM ( SELECT `table0`.`title` AS `title`, `table1_subq`.`data` AS `publishers` FROM `books` AS `table0` LEFT OUTER JOIN LATERAL (SELECT JSON_OBJECT(@Param2, `subq5`.`name`) AS `data` 
FROM ( SELECT `table1`.`name` AS `name` FROM `publishers` AS `table1` 
WHERE `table0`.`publisher_id` = `table1`.`id` 
ORDER BY `table1`.`id` ASC LIMIT 1 ) AS `subq5`) AS `table1_subq` ON TRUE WHERE 1 = 1 ORDER BY `table0`.`id` ASC LIMIT 100 ) AS `subq6`
```

For pgSql:
```psql
SELECT COALESCE(jsonb_agg(to_jsonb("subq6")), '[]') AS "data" FROM ( SELECT "table0"."title" AS "title", "table1_subq"."data" AS "publishers" FROM "public"."books" AS "table0" 
LEFT OUTER JOIN LATERAL (SELECT to_jsonb("subq5") AS "data" FROM ( SELECT "table1"."name" AS "name" 
FROM "public"."publishers" AS "table1" 
WHERE "table0"."publisher_id" = "table1"."id" 
ORDER BY "table1"."id" ASC LIMIT 1 ) AS "subq5") AS "table1_subq" 
ON TRUE 
WHERE 1 = 1 ORDER BY "table0"."id" ASC LIMIT 100 ) AS "subq6"
```

2. Did manual testing to confirm that the relationship fields provided
by user take precedence:
(Creating issue here to add a test
later:#2104)
Config:
<img width="242" alt="image"
src="https://github.com/Azure/data-api-builder/assets/34566234/3bf60082-309c-4fc2-bd6b-d307061d61ba">

GraphQL query:
<img width="282" alt="image"
src="https://github.com/Azure/data-api-builder/assets/34566234/6f9bf50d-f679-4c6c-87a3-377fc86c423b">

Generated SQL query:
<img width="954" alt="image"
src="https://github.com/Azure/data-api-builder/assets/34566234/9b082354-1217-4ecf-b05b-263d5e2e5a13">

3. Manually tested that for relationships which are not backed by an FK
constraint, we infer both the source/target entities as the referencing
entity. Test to be added post the merge of :
#2056

---------

Co-authored-by: Shyam Sundar J <shyamsundarj@microsoft.com>
Co-authored-by: Sean Leonard <sean.leonard@microsoft.com>
src/Core/Resolvers/MultipleCreateOrderHelper.cs Outdated Show resolved Hide resolved
src/Core/Resolvers/MultipleCreateOrderHelper.cs Outdated Show resolved Hide resolved
src/Core/Resolvers/MultipleCreateOrderHelper.cs Outdated Show resolved Hide resolved
src/Core/Resolvers/MultipleCreateOrderHelper.cs Outdated Show resolved Hide resolved
@severussundar
Copy link
Contributor

/azp run

@Aniruddh25 Aniruddh25 modified the milestones: 0.12rc, 0.12rc2 Apr 4, 2024
Copy link
Contributor

@seantleonard seantleonard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, please incorporate suggestions where you deem appropriate. Clarifying comments and such.

@ayush3797
Copy link
Contributor Author

/azp run

@ayush3797
Copy link
Contributor Author

/azp run

@ayush3797 ayush3797 merged commit 91ed92d into main Apr 22, 2024
7 checks passed
@ayush3797 ayush3797 deleted the dev/agarwalayush/nestedInsertionOrderHelper branch April 22, 2024 21:42
ayush3797 added a commit that referenced this pull request Apr 25, 2024
## Why make this change?
With support for multiple-create, we introduce additional flexibility in
the GraphQL schema where we make the referencing fields in an entity
optional because their value can be derived from insertions in the
referenced entity. So to say, _the values of referencing fields in a
referencing entity can have 3 sources of truth_. The value for a
referencing field can be derived via:
1. User input data
2. Insertion in the parent referenced entity (if current entity is a
referencing entity for a relationship with its parent entity)
3. Insertion in the child referenced entity (if the current entity is a
referencing entity for a relationship with its child entity)

**_They can all exist at the same time, or none of them can. Presence of
more than one sources of truth leads to possible conflicting values for
the referencing column. Thus, at a time, we want exactly one of them to
be present - so that we exactly one source of truth_**. This needs us to
do request validation during request execution, so that only valid GQL
mutations reach the query generation/query execution stage. This PR adds
the logic for the same.


## What is this change?

- Added class `MultipleMutationInputValidator.cs` to perform the
validations for a multiple-create mutation.
- Added class `MultipleMutationEntityInputValidationContext` to store
relevant information while performing validations on a particular
entity.
- A call to the method
`MultipleMutationInputValidator.ValidateGraphQLValueNode()` will be made
_just after Authorization_ - request validation only occurs when the
user is authorized to execute the mutation.
- The method `MultipleMutationInputValidator.ValidateGraphQLValueNode()`
makes recursive call to itself to parse out the entire create mutation
request body based on an entity or relationship basis. Meaning, this
method will be called on the top-level entity, followed by all the
relationships for that entity, followed by all the relationships for the
entity's relationships and so on....
- The actual validation for the input data for an entity (top-level
entity or a nested entity) which includes column fields and relationship
fields is done by the method
`GraphQLRequestValidator.ValidateObjectFieldNodes()` which is called
from the method `ValidateGraphQLValueNode()`.
- The method `ValidateObjectFieldNodes()` contains logic to ensure that
when the current entity acts as the referencing entity for the
relationships with its target entity, then the current entity should not
specify a value for the referencing fields - as the values for the
referencing fields will be derived from the insertion in the referenced
target entity. Note: The determination of referencing entity is done via
a call to `MultipleCreateOrderHelper.GetReferencingEntityName()` method
added via: #2056.
- The method `ValidateObjectFieldNodes()` makes calls to
1. `ValidateAbsenceOfReferencingColumnsInTargetEntity()`: To validate
that the current entity does not specify value for referencing columns
to its parent entity (when parent entity is the referenced entity).
2. `ProcessRelationshipField()`: To process a relationship field and
does several things. Please refer to method summary.
3. `ValidateAbsenceOfRepeatedReferencingColumn()`: This method is called
for all the relationships with the current entity which are included in
the request body. This ensures that referencing columns in the
referencing entity (whichever entity, either source or target turns out
as the referencing entity) does not hold references to multiple columns
in the referenced entity - to avoid multiple sources of truth for the
referencing column's value. (Issue:
#2019)
4. `ValidatePresenceOfRequiredColumnsForInsertion()`: To ensure that we
have one source of truth for values for referencing fields.
5. `ValidateRelationshipFields()`: Recursively perform the same
validations for all the relationships with the current entity included
in the mutation request body.
`
## How was this tested?
- [x] Integration Tests - Added tests to
`MultipleMutationIntegrationTests.cs` to validate:
1. Throwing exception for absence of source of truth for referencing
column for create one mutations
2. Throwing exception for absence of source of truth for referencing
column for create multiple mutations.
3. Throwing exception for presence of multiple sources of truth for
referencing column for create one mutations.
4. Throwing exception for presence of multiple sources of truth for
referencing column for create one mutations.

## Sample Request(s)
1. Request:

![image](https://github.com/Azure/data-api-builder/assets/34566234/b6ef03bf-016d-4a45-9adf-dbe422683f73)
    Response:
    

![image](https://github.com/Azure/data-api-builder/assets/34566234/cbec3e22-3a9e-4bfd-abc8-0fb4fd45cfdd)

2. Request:

![image](https://github.com/Azure/data-api-builder/assets/34566234/6c58f6b9-54b8-42b3-8945-39272023c2bf)

Response:

![image](https://github.com/Azure/data-api-builder/assets/34566234/b0d5f689-d029-4ba9-9aeb-40fca7eb1b85)

3. Request and response:

![image](https://github.com/Azure/data-api-builder/assets/34566234/c39ad294-412c-45d9-9fdd-5efb69908836)

---------

Co-authored-by: Shyam Sundar J <shyamsundarj@microsoft.com>
Co-authored-by: Sean Leonard <sean.leonard@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Multiple mutations Fixes/enhancements related to nested mutations.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Determine order of insertion for relationships: Database Vs Config
4 participants