
NotImplementedException - Querying JSON file with duplicate column names crashes CLI #10751

Closed
1 task done
hello-world-bfree opened this issue Feb 19, 2024 · 3 comments · Fixed by #10881 or #11271

Comments

@hello-world-bfree

hello-world-bfree commented Feb 19, 2024

What happens?

When querying a newline-delimited JSON file with read_json_auto, DuckDB crashes with an uncaught NotImplementedException:

libc++abi: terminating due to uncaught exception of type duckdb::NotImplementedException: {"exception_type":"Not implemented","exception_message":"Error while casting - duplicate name \"platform\" in struct"}

The column platform is only the first of several duplicates of this kind. They are duplicates in the sense that their lower-cased forms match ("Platform" == "platform"). This is consistent with how DuckDB treats struct field names elsewhere, e.g.:

create or replace table test as select 1 as id, struct_pack(platform := 2, "Platform" := 3);
Error: Binder Error: Duplicate struct entry name "Platform"

Rejecting these names is therefore consistent. The bug is the missing error handling or validation, which lets the exception terminate the CLI instead of being reported as an error.
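Until the reader validates this itself, affected files can be pre-checked outside DuckDB. The following Python sketch is a hypothetical helper, not part of DuckDB or of this report: it flags keys in each NDJSON object (nested objects included) whose lower-cased form collides with an earlier key in the same object. It assumes that Python's str.lower() is a close enough approximation of DuckDB's case-insensitive name comparison.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def case_insensitive_duplicates(pairs: List[Tuple[str, object]]) -> List[str]:
    """Return keys whose lower-cased form already appeared earlier in the same object.

    Assumption: str.lower() approximates DuckDB's case-insensitive name matching.
    """
    seen: Dict[str, str] = {}
    dups: List[str] = []
    for key, _value in pairs:
        folded = key.lower()
        if folded in seen:
            dups.append(key)  # collides with an earlier key such as "Platform"
        else:
            seen[folded] = key
    return dups


def scan_ndjson(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, colliding_keys) for every NDJSON line with collisions."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        found: List[str] = []

        def hook(pairs: List[Tuple[str, object]]) -> dict:
            # object_pairs_hook runs for every object on the line, nested ones too,
            # and sees all key/value pairs before they are merged into a dict.
            found.extend(case_insensitive_duplicates(pairs))
            return dict(pairs)

        json.loads(line, object_pairs_hook=hook)
        if found:
            yield lineno, found


if __name__ == "__main__":
    sample = [
        '{"id": 1, "Platform": "google", "platform": "google"}',
        '{"id": 2, "platform": "ios"}',
    ]
    for lineno, keys in scan_ndjson(sample):
        print(lineno, keys)  # prints: 1 ['platform']
```

Exact duplicates (platform and platform) are flagged as well, which matches the Invalid Input Error DuckDB already reports for that case.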

If the file is altered so that the field names are truly identical (platform and platform instead of Platform and platform), we get the expected behavior: a proper error and no crash:

select * from read_json_auto('bug_report.json', format = 'newline_delimited');
Error: Invalid Input Error: Duplicate key "platform" in object {"time":1707878164,"distinct_id":"20110941","$app_version_string":"9.10.1","$city":"Cookstown","$insert_id":"244b811e-378a-5444-b373-d1a5d6e9bba0","$os":"Android","platform":"google","platform":"google","event_id":1037203}

To Reproduce

Create a table from the attached file:

create table json_test as select * from read_json_auto('bug_report.json', format = 'newline_delimited');

Query the newly created table:

select * from json_test;

OS:

macOS

DuckDB Version:

0.10.0

DuckDB Client:

CLI

Full Name:

Brandon Freeman

Affiliation:

Hallow

Have you tried this on the latest nightly build?

I have not tested with any build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@hello-world-bfree hello-world-bfree changed the title NotImplementedException - Table created from JSON file with duplicate column names crashes CLI NotImplementedException - Querying JSON file with duplicate column names crashes CLI Feb 19, 2024
@szarnyasg
Collaborator

Thanks! Reproduced, we'll take a look.

@Mytherin
Collaborator

I have pushed a partial fix for this in #10881 that correctly renders an exception in this case instead of aborting. The JSON issue still needs to be fixed separately, however.

Mytherin added a commit that referenced this issue Feb 28, 2024
Partially fix #10751: correctly catch exceptions in sqlite3_print_duckbox
@szarnyasg szarnyasg reopened this Feb 29, 2024
@dforsber

Might be related: #11152

5 participants