Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Known Issue] Detect JSON columns in Azure SQL #444

Open
Tracked by #177
yorek opened this issue May 23, 2022 · 12 comments
Open
Tracked by #177

[Known Issue] Detect JSON columns in Azure SQL #444

yorek opened this issue May 23, 2022 · 12 comments
Labels
improvement Let's make this better known-issue Known issues linked from https://learn.microsoft.com
Milestone

Comments

@yorek
Copy link
Contributor

yorek commented May 23, 2022

We should be able to automatically detect if a column contains JSON data and avoid escaping it in the output. Right now if I have the following table:

create table s003b.todos
(
	id int not null constraint pk__s003b_todos primary key default (next value for s003b.globalId),
	category_id varchar(3) collate Latin1_General_BIN2 not null constraint fk__s003b_todos__s003b_categories references s003b.categories(id),
	title nvarchar(100) not null,
	[description] nvarchar(1000) null,
	tags nvarchar(max) null check ( isjson(tags)= 1 ),
	completed bit not null default(0)
)

and I store JSON data in the "tags" column, I get the following output:

{ data": {
		"todos": {
			"items": [
				{
					"id": 10000,
					"title": "item-001",
					"completed": false,
					"tags": "[{\"tag\":\"red\"}]",
					"category": {
						"id": "f",
						"category": "Family"
					}
				}

where tags contains encoded JSON...even if the content is valid JSON itself. Right now Azure SQL DB doesn't have a native JSON data type, but we can check if a column contains JSON data by checking the check constraint that should have been created to allow only JSON data to be inserted (see table definition above). This query can return which columns should be treated as JSON:

select 
	s.[name] as [schema_name],
	t.[name] as [table_name],
	c.[name] as [columne_name],
	ck.[definition],
	case when (ck.[definition] like '%isjson(/[' + trim(c.[name]) + '/])=(1)%' escape '/') then 1 else 0 end as [isjson]
from 
	sys.check_constraints ck
inner join
	sys.tables t on ck.[parent_object_id] = t.[object_id]
inner join
	sys.columns c on t.[object_id] = c.[object_id] and ck.parent_column_id = c.[column_id]
inner join
	sys.schemas s on t.[schema_id] = s.[schema_id]
@yorek yorek added this to the M1.5 milestone May 23, 2022
@yorek yorek added the improvement Let's make this better label May 23, 2022
@jarupatj
Copy link
Contributor

@gledis69 can you please take a look at this issue?

@gledis69
Copy link
Contributor

@yorek what would be the desired output for "tags" in this scenario?
The " need to be escaped otherwise they will terminate the string that represents the json.
Another question that comes to mind is what the use case scenario is here. Most libraries that work with string should be able to handle escaped quotes.

@yorek
Copy link
Contributor Author

yorek commented Aug 4, 2022

@gledis69 given that in the tags columns there can only be valid JSON content, the expected output would be:

{
  "data": {
    "todos": {
      "items": [
        {
          "id": 10000,
          "title": "item-001",
          "completed": false,
          "tags": [
            {
              "tag": "red"
            }
          ],
          "category": {
            "id": "f",
            "category": "Family"
          }
        }
      }
    }
  }
}

basically: just pass the JSON as is, without encoding it.

@Aniruddh25
Copy link
Contributor

Not need to have support for Filter, Order but for Mutation for JSON type

@Aniruddh25
Copy link
Contributor

From @gledis69 : This turned out to be more involved that I imagined, The core problem is that our DbDataAdapter approach that we use to detect column types just doesn't detect json-s, so we'd have to rewrite column type detection to querying the db and analyzing the response similarly to what we do to detect fk-s.

@Aniruddh25 Aniruddh25 modified the milestones: Nov2022, Jan2023 Dec 3, 2022
@Aniruddh25
Copy link
Contributor

Moving to Jan2023 with regards to the new finding

@Aniruddh25
Copy link
Contributor

From @gledis69: json columns need to be handled either by:

  1. changing the way we determine query types by directly querying the db vs using the C# interface we are using rn since that interface doesn't detect the json. This is a bunch of work since the queries for each db need to be written and then the types need to be mapped since the types have different names across the db-s.
  2. "hack it" and only write db queries to get the json columns from the tables and override the results we get from the data interface for the json columns.  

2 is probably more feasible. Part of me doesn't like that since it is very particular and not a general scalable solution, but then realistically we already cover most of the commonly used types so it is not very likely we would need to do this for other types in the future hopefully

@yorek
Copy link
Contributor Author

yorek commented Jan 12, 2023

I think that 1 is something that will be needed anyway at some point. AFAIK we're using the ADO.NET native support for discovering metadata, but that will not work, fo example, on Synapse. Maybe we can start with 2, but let's plan for 1

@Aniruddh25 Aniruddh25 modified the milestones: Jan2023, Mar2023 Jan 13, 2023
@Aniruddh25
Copy link
Contributor

Not enough time to do this with competing deadlines in Jan203. I agree let's do 1 since we need to support Synapse anyway and its unfortunate that ADO.NET doesn't have that support yet.

@aaronpowell
Copy link
Contributor

@Aniruddh25 do you know if adding this support is on the roadmap for the ADO.NET team? I can try and find a contact if needed.

@aaronpowell
Copy link
Contributor

I was reflecting on this while working on some Blazor samples and how we'd best go about representing the JSON type in GraphQL.

Presently, we end up with a generated GraphQL type like so:

type Todos {
  id: Int!
  title: String!
  description: String!
  completed: Boolean!
  tags: String!
}

Looking at the original structure of tags we want it to map to a type like this:

type Tag {
  tag: String!
}

And that'd see us with a field tags: [Tag]! on the Todos type.

This isn't practical though, as we can't deserialize every JSON field to work out what the "shape" is to then know what GraphQL type to generate should look like, and what happens when you've got optional fields in the JSON, do we have to deserialize a subset of them to work out what the possibilities look it?

Proposal - Custom Scalar types

Back to my idea, we could use custom scalar types for these JSON fields, generating a "unique type" for each JSON field, meaning out Todos type would now look like this:

scalar Todo_Tags

type Todos {
  id: Int!
  title: String!
  description: String!
  completed: Boolean!
  tags: Todo_Tags!
}

Admittedly, this won't really do anything different at a server level, that'll still return a string field to the client, but what it means is that as a client we can intercept it and handle it.

If you're building a .NET client with Strawberry Shake you can define custom serialization for a particular scalar type. urql can also support it and it's possible with Apollo (although not well supported). But also it won't have any real breaking downstream impacts as the change is only masking that it's a String.

Alternatively it could be set as a JSON scalar type, but that would make it harder to have different serialization settings for different fields, in the case where you have multiple JSON columns in a SQL database.

@aaronpowell
Copy link
Contributor

Because curiosity meant that I needed to dig into this more I decided to look at what we need to do to identify JSON columns.

PostgreSQL (and I expect MySQL but didn't test) should be reasonably easy to detect as they have a column type of json that is returned when using the conn.GetSchema("Columns") query:

JSON column data type

Azure SQL is a little harder as it isn't a data type but it's a constraint on the table that we can query for and then find the column that the constraint is applicable to. Here's a SQL query that we can execute:

select schema_name(t.schema_id) + '.' + t.[name]  as [table],
    col.[name] as column_name,
    con.[definition]
from sys.check_constraints as con
    left outer join sys.objects as t
        on con.parent_object_id = t.object_id
    left outer join sys.all_columns as col
        on con.parent_column_id = col.column_id
        and con.parent_object_id = col.object_id
where lower(con.[definition]) like '%isjson%'
and con.is_disabled = 0

With this we could add another property to ColumnDefinition of IsJson (or similar) and use that as part of the codegen for the GraphQL engine.

@Aniruddh25 Aniruddh25 modified the milestones: Mar2023, May2023 Mar 29, 2023
@Aniruddh25 Aniruddh25 removed this from the 0.8 milestone Jun 27, 2023
@seantleonard seantleonard added this to the 0.12 milestone Nov 16, 2023
@seantleonard seantleonard modified the milestones: 0.12rc, 1.0 Feb 7, 2024
@seesharprun seesharprun changed the title Detect JSON columns [Known Issue] Detect JSON columns in Azure SQL Apr 3, 2024
@seesharprun seesharprun added the known-issue Known issues linked from https://learn.microsoft.com label Apr 3, 2024
@seantleonard seantleonard modified the milestones: 1.1rc, 1.2rc Apr 30, 2024
@seantleonard seantleonard modified the milestones: 1.2rc, Feature Backlog May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Let's make this better known-issue Known issues linked from https://learn.microsoft.com
Projects
None yet
Development

No branches or pull requests

7 participants