Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide the native OID of PG type in pg_type #11746

Merged
merged 8 commits into from
Apr 30, 2024

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Apr 20, 2024

Description

DuckDB provide the internal OID of a type in pg_type.oid. However, the type OID is a constant in the native Postgres. Some pg driver will keep a cache of type info, such as JDBC. To avoid incorrect type mapping, we should provide the native OID in pg_type.

What's Changed

  • Create the mapping for the native PG type.
  • Provide native pg name in typname if it can be mapped to the native pg type.
  • Provide numeric type.
  • Filter the duplicate record if its type_oid is null.

Sample Result

D select oid, typname from pg_type;
┌───────┬──────────────┐
│  oid  │   typname    │
│ int64 │   varchar    │
├───────┼──────────────┤
│    20 │ int8         │
│    17 │ bytea        │
│  1560 │ bit          │
│    16 │ bool         │
│  1043 │ varchar      │
│  1082 │ date         │
│  1114 │ timestamp    │
│  1700 │ numeric      │
│   701 │ float8       │
│       │ enum         │
│   700 │ float4       │
│  2950 │ uuid         │
│       │ hugeint      │
│    23 │ int4         │
│       │ tinyint      │
│    21 │ int2         │
│  1186 │ interval     │
│       │ list         │
│       │ map          │
│       │ null         │
│       │ struct       │
│  1083 │ time         │
│  1184 │ timestamptz  │
│       │ timestamp_ms │
│       │ timestamp_ns │
│       │ timestamp_s  │
│  1266 │ timetz       │
│       │ ubigint      │
│       │ uhugeint     │
│       │ usmallint    │
│       │ uinteger     │
│       │ utinyint     │
│       │ union        │
├───────┴──────────────┤
│ 33 rows    2 columns │
└──────────────────────┘

Other DuckDB types have the oid field set to null in pg_type.

If there are custom enums, they will be set to the internal OID in DuckDB.

D CREATE TYPE greeting AS ENUM('hi', 'bonjour', 'konnichiwa', 'howdy')
  ;
D select oid, typname from pg_type where oid is not null;
┌───────┬─────────────────────────────┐
│  oid  │           typname           │
│ int64 │           varchar           │
├───────┼─────────────────────────────┤
...
│  1369 │ greeting                    │
...
├───────┴─────────────────────────────┤
│ 17 rows                   2 columns │
└─────────────────────────────────────┘
D select * from pg_enum;
┌───────┬───────────┬───────────────┬────────────┐
│  oid  │ enumtypid │ enumsortorder │ enumlabel  │
│ int32 │   int64   │     int32     │  varchar   │
├───────┼───────────┼───────────────┼────────────┤
│       │      1369 │             4 │ howdy      │
│       │      1369 │             3 │ konnichiwa │
│       │      1369 │             2 │ bonjour    │
│       │      1369 │             1 │ hi         │
└───────┴───────────┴───────────────┴────────────┘

@@ -62,6 +62,7 @@ static const DefaultMacro internal_macros[] = {
{"pg_catalog", "pg_get_expr", {"pg_node_tree", "relation_oid", nullptr}, "pg_node_tree"},
{"pg_catalog", "format_pg_type", {"logical_type", "type_name", nullptr}, "case when logical_type='FLOAT' then 'real' when logical_type='DOUBLE' then 'double precision' when logical_type='DECIMAL' then 'numeric' when logical_type='ENUM' then lower(type_name) when logical_type='VARCHAR' then 'character varying' when logical_type='BLOB' then 'bytea' when logical_type='TIMESTAMP' then 'timestamp without time zone' when logical_type='TIME' then 'time without time zone' else lower(logical_type) end"},
{"pg_catalog", "format_type", {"type_oid", "typemod", nullptr}, "(select format_pg_type(logical_type, type_name) from duckdb_types() t where t.type_oid=type_oid) || case when typemod>0 then concat('(', typemod//1000, ',', typemod%1000, ')') else '' end"},
{"pg_catalog", "map_to_pg_oid", {"oid", "logical_type", nullptr}, "case logical_type when 'ENUM' then oid else case oid when 10 then 16 when 12 then 21 when 13 then 23 when 14 then 20 when 15 then 1082 when 16 then 1083 when 19 then 1114 when 22 then 700 when 23 then 701 when 25 then 1043 when 26 then 17 when 27 then 1186 when 32 then 1184 when 34 then 1266 when 36 then 1560 when 54 then 2950 else null end end"}, // map duckdb_oid to pg_oid. If no corresponding type, return null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not robust - the oids in DuckDB are not guaranteed to be constants and might change over time. Could you rework this to use the names of the types instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments. I have addressed it in the latest commits.

@Mytherin
Copy link
Collaborator

Thanks for the PR! I agree we should match Postgres on the oids here - but this needs to be done robustly, see my comment.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 22, 2024 16:59
@goldmedal
Copy link
Contributor Author

goldmedal commented Apr 22, 2024

I found the extra 2 issue here:

  • The typname isn't the native pg type.
    I think that we also provide native name here is better.
  • numeric type is missing
    Previously, we filter duckdb_types() with type_size is not null. However, numeric should be provided actually. This condition only filter the two record, numeric and enum. I added extra condition to exclude enum, then added type_oid is not null to filter the duplicate record.

I also updated the PR description.

@goldmedal goldmedal marked this pull request as ready for review April 22, 2024 17:23
@Tishj
Copy link
Contributor

Tishj commented Apr 22, 2024

In the PR you state:

Provide numeric type

But in the old code, this was already present?

logical_type='DECIMAL' then 'numeric'

@Tishj
Copy link
Contributor

Tishj commented Apr 22, 2024

I think we should use ILIKE here, so this doesn't happen

D select pg_catalog.format_pg_type('DECIMAL', 'test');
┌─────────────────────────────┐
│ pg_type2('DECIMAL', 'test') │
│           varchar           │
├─────────────────────────────┤
│ numeric                     │
└─────────────────────────────┘
D select pg_catalog.format_pg_type('decimal', 'test');
┌─────────────────────────────┐
│ pg_type2('decimal', 'test') │
│           varchar           │
├─────────────────────────────┤
│ decimal                     │
└─────────────────────────────┘

@goldmedal
Copy link
Contributor Author

In the PR you state:

Provide numeric type

But in the old code, this was already present?

logical_type='DECIMAL' then 'numeric'

It's weird. I tried the v0.10.2 version

v0.10.2 1601d94f94
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select oid, typname from pg_type where typname = 'numeric';
┌───────┬─────────┐
│  oid  │ typname │
│ int64 │ varchar │
├─────────────────┤
│     0 rows      │
└─────────────────┘

I can't find any result.

The definition of the original pg_type, it includes a filter

FROM duckdb_types() WHERE type_size IS NOT NULL

However, I found the type_size of DECIMAL is always null.

D select type_name, type_size from duckdb_types() where type_name = 'numeric';
┌───────────┬───────────┐
│ type_name │ type_size │
│  varchar  │   int64   │
├───────────┼───────────┤
│ numeric   │           │
│ numeric   │           │
│ numeric   │           │
└───────────┴───────────┘

I'm not sure about that but I guess it's a bug.

I think we should use ILIKE here, so this doesn't happen

@Tishj Many thanks! I'll fix them today.

@goldmedal goldmedal marked this pull request as draft April 23, 2024 02:07
@goldmedal
Copy link
Contributor Author

I think we should use ILIKE here, so this doesn't happen

Instead of multiple WHEN logical_type ILIKE xxx THEN, I chose upper(logical_type to make the code cleaner.

In the PR you state:

Provide numeric type

But in the old code, this was already present?

logical_type='DECIMAL' then 'numeric'

I think I found why DuckDB handle DECIMAL here previously. In test/sql/pg_catalog/sqlalchemy.test, The # get_columns test will use format_type which invokes format_pg_type internally. The test case asserts that numeric should be provided. However, it doesn't query pg_type but get type from the table metadata. That's why DuckDB handled numeric but no one found it's missing in pg_type.

@goldmedal goldmedal marked this pull request as ready for review April 23, 2024 11:25
@goldmedal goldmedal requested a review from Mytherin April 24, 2024 01:36
@Mytherin Mytherin merged commit f8715b2 into duckdb:main Apr 30, 2024
47 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 6, 2024
Merge pull request duckdb/duckdb#11746 from goldmedal/feature/pg-type
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants