Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to Import Nested JSON Objects to Dolt can Cause Dolt to Crash #5143

Closed
davidshq opened this issue Jan 13, 2023 · 5 comments · Fixed by #5302
Closed

Attempting to Import Nested JSON Objects to Dolt can Cause Dolt to Crash #5143

davidshq opened this issue Jan 13, 2023 · 5 comments · Fixed by #5302
Assignees
Labels
bad error message bug Something isn't working panic

Comments

@davidshq
Copy link

Attempting to run an import to dolt using the following command:
dolt table import -c -s dolt-schema.sql books oreilly.json

Error:

panic: interface conversion: interface {} is string, not map[string]interface {}

goroutine 91 [running]:
github.com/dolthub/dolt/go/libraries/doltcore/table/typed/json.(JSONReader).ReadSqlRow(0xc000f3acc0, {0x5?, 0x234cc00?})
/src/libraries/doltcore/table/typed/json/reader.go:114 +0x150
github.com/dolthub/dolt/go/cmd/dolt/commands/tblcmds.moveRows({0x2330068, 0xc0010ff280}, 0xc0010e6460, {0x2330570, 0xc000f3acc0}, 0xc000f02c80, 0xc0004547e0, 0xc001124420)
/src/cmd/dolt/commands/tblcmds/import.go:596 +0x14e
github.com/dolthub/dolt/go/cmd/dolt/commands/tblcmds.move.func2()
/src/cmd/dolt/commands/tblcmds/import.go:548 +0x7a
golang.org/x/sync/errgroup.(Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5

SQL:

CREATE TABLE books (
    `id` LONGTEXT NOT NULL,
    `ourn` LONGTEXT,
    `isbn` INTEGER,
    `last_modified_time` DATETIME,
    `issued` DATETIME,
    `format` TEXT,
    `content_format` TEXT,
    `authors` TEXT,
    `publishers` TEXT,
    `language` TEXT,
    `title` LONGTEXT,
    `description` LONGTEXT,
    `url` LONGTEXT,
    `web_url` TEXT,
    `virtual_pages` INTEGER,
    `timestamp` DATETIME,
    `average_rating` INTEGER,
    `number_of_followers` INTEGER,
    `number_of_items` INTEGER,
    `number_of_reviews` INTEGER,
    `popularity` INTEGER,
    `report_score` INTEGER,
    `cover_url` LONGTEXT,
    `date_added` DATETIME,
    `topics` LONGTEXT,
    `topics_payload` LONGTEXT,
    `minutes_required` INTEGER
)

See oreilly.txt for JSON
oreilly.txt

For repo with code that pulls from O'Reilly API see: https://github.com/davidshq/oreillyas

@max-hoffman noted in Discord:

"looks like you are trying to import a json dict into a TEXT type, i am not sure we support the implicit conversion. one thing that would probably resolve that error is converting the dict/TEXT fields with something like json.loads
and we do have a schema type JSON, which is similar to TEXT but does some structural validation and has access to json-specific SQL functions
we definitely shouldn't panic though, that is a bug. if you would be kind enough to copy the repro and error message into a Github issue, we could have someone look at that soon,"

@davidshq
Copy link
Author

davidshq commented Jan 13, 2023

I changes the fields containing JSON from TEXT/LONGTEXT to JSON in an updated dolt-schema.sql file, specifically: authors, publishers, topics, topics_payload ...The line on which the error is occurring has changed but the base error has not.

panic: interface conversion: interface {} is string, not map[string]interface {}

goroutine 75 [running]:
github.com/dolthub/dolt/go/libraries/doltcore/table/typed/json.(*JSONReader).ReadSqlRow(0xc000566b40, {0x5?, 0x234cc00?})
/src/libraries/doltcore/table/typed/json/reader.go:114 +0x150
github.com/dolthub/dolt/go/cmd/dolt/commands/tblcmds.moveRows({0x2330068, 0xc0000d3300}, 0xc000a068c0, {0x2330570, 0xc000566b40}, 0xc000fc6500, 0xc0000b8cc0, 0xc0000d5080)
/src/cmd/dolt/commands/tblcmds/import.go:596 +0x14e
github.com/dolthub/dolt/go/cmd/dolt/commands/tblcmds.move.func2()
/src/cmd/dolt/commands/tblcmds/import.go:548 +0x7a
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5

@timsehn timsehn added bug Something isn't working panic labels Jan 13, 2023
@max-hoffman
Copy link
Contributor

I debugged a bit and this has something to do with the expected format of a json file. I think the scanner is expecting every line to be a standalone json object, rather than the file itself being a json array. We still should not panic if the file is not formatted the way we expect.

The following example works for me locally

test.json:

{"a": 1}
{"b": 2}

test.sql:

create table t (x varchar(10), y int);
dolt table import -c -s test.sql t test.json
Rows Processed: 0, Additions: 0, Modifications: 0, Had No Effect: 0
Import completed successfully.

I forked a PR workaround that imports the table with the CSV interface davidshq/oreillyas#1.

@davidshq
Copy link
Author

Interesting. I tried removing the enclosing array and running it through that way, but no luck. I'm guessing it is catching on the nested fields that have arrays with objects inside them.

Sounds like the scanner in general doesn't support nested objects?

Thanks for the fork, I was able to run it successfully.

@max-hoffman
Copy link
Contributor

The scanner splits on line, and tries force casting the line to map[string][interface{}, which it expects to match the schema of the table defined. As far as I know we expect JSON objects to be strings or byte arrays on import, but I am not particularly informed for JSON object lifecycle handling. Others might be able to speak more to features/our JSON compatibility compared to other databases.

@jennifersp jennifersp self-assigned this Feb 3, 2023
@jennifersp jennifersp linked a pull request Feb 3, 2023 that will close this issue
@fulghum
Copy link
Contributor

fulghum commented Feb 7, 2023

Thanks again for taking the time to report this @davidshq. We've fixed the panic and added a better error message; we also made some documentation improvements to better document the require input JSON file structure.

Thanks for helping us find this problem! Let us know if there's anything else we can help out with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bad error message bug Something isn't working panic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants