Malformed JSON error, suggestion is to increase "maximum_object_size" #6569
Comments
Thanks for the bug report! There are a few options that you could try. If these do not work, would you be open to emailing the file directly so that we can troubleshoot?
Let us know what you find!
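One option that lines up with the error's own suggestion is raising maximum_object_size on the JSON reader. A minimal sketch, with a placeholder file name and an arbitrary higher limit:

```sql
-- Sketch only: 'structured_log.json' and the 100 MB limit are placeholders.
-- maximum_object_size caps how many bytes a single JSON value may span while reading.
SELECT *
FROM read_json_auto('structured_log.json', maximum_object_size = 104857600);
```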
@Alex-Monahan Thanks for the ideas
Seems to get a count if I boost the size to something ridiculous:
But when trying to read the data:
Not sure off the bat which field it may be struggling with, and it's interesting that I don't see this internal error when reading the JSONL.
Should not be a problem. This output is serialized from a zap logger, but I'll need to obfuscate some things.
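Roughly, the two shapes of statement involved look like this (the file name and the inflated limit are placeholders; the real statements and error output were snipped above):

```sql
-- Counting the records completes once the limit is raised far enough...
SELECT count(*)
FROM read_json_auto('structured_log.json', maximum_object_size = 1073741824);

-- ...but actually reading the data back still hits an internal error on this file.
SELECT *
FROM read_json_auto('structured_log.json', maximum_object_size = 1073741824);
```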
Very interesting that the count worked! Would you mind trying `message: 'JSON'`? I wonder if it is coming in as something other than a VARCHAR and we aren't falling back to it. Thanks for being open to sending the file! It truly is the best way to solve these kinds of errors.
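That suggestion would slot into the manual schema roughly like this (the other field names are made up; only the message: 'JSON' entry reflects the suggestion):

```sql
-- Hypothetical schema: ts and level are stand-ins for the real log fields;
-- mapping message to the JSON type is the part being suggested above.
SELECT *
FROM read_json('structured_log.json',
               columns = {ts: 'TIMESTAMP', level: 'VARCHAR', message: 'JSON'},
               maximum_object_size = 104857600);
```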
Thanks for the bug report! The issue here is that […]
@lnkuiper Cool. To confirm, the file does indeed contain one array of many objects. The first snippet I showed is just one record of thousands. Similar to the […]
Closed via #7478. We now support streaming reads of JSON arrays.
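For anyone landing here later: with that change, an array-of-objects file like the one described here can be read directly. A sketch against a current release (the file name is a placeholder):

```sql
-- The array layout is detected automatically in releases with the streaming
-- array reader; it can also be requested explicitly via format = 'array'.
SELECT *
FROM read_json_auto('structured_log.json', format = 'array');
```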
What happens?
I was reading the latest blog post on shredding deeply nested JSON [1] and could not read JSON (but could read newline-delimited JSON) as per the examples.
The file I am attempting to read is a structured log file. The original output is actually newline-delimited, but I used `jq --slurp` to switch it over to an array.
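The conversion was roughly of this shape (file names here are placeholders, not the real paths):

```bash
# --slurp reads the whole stream of newline-delimited JSON values
# and wraps them in a single top-level array.
jq --slurp '.' app.log.jsonl > app.log.json
```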
Here's the head of the file, showing one typical record (snipped, of course):

When trying to do the simple query provided in the blog post, I get the following:
I thought maybe the actual error was being obfuscated by this. Since I need a manual schema to read this file (the keys and nesting are not consistent), I tried a simple schema, but got the same result:
No luck! There is nothing really of interest at byte 2097149; it's likely where DuckDB gave up on the record (I assume the issue here is that DuckDB sees everything up to byte 2097149 as one record). The file is 147M.

So, I tried the newline-delimited JSON file, providing a schema:
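(The call was roughly of this shape; the field names and types are illustrative, not the real schema:)

```sql
-- Illustrative sketch only: the real log has different (and many more) fields.
SELECT *
FROM read_ndjson('app.log.jsonl',
                 columns = {ts: 'TIMESTAMP', level: 'VARCHAR', message: 'VARCHAR'});
```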
It worked! But why was my JSON-array-formatted file not working? I tried the `todos.json` [2] from the blog post and had no problem parsing that, as per the blog post:

Perhaps this has to do with the varying structure of my JSON file, maybe some of the deep nesting within, or am I missing something obvious and silly on this Saturday morning?
Regardless, super impressed with the speed and intuitive nature of the tool. Reading the JSONL data was snappy and I was able to produce the view I wanted in seconds. That's awesome.
[1] https://duckdb.org/2023/03/03/json.html
[2] https://jsonplaceholder.typicode.com/todos
To Reproduce
Provided in What Happens
OS:
macOS
DuckDB Version:
v0.7.1 b00b93f
DuckDB Client:
Shell (duckdb invoke)
Full Name:
Bartek Ciszkowski
Affiliation:
Myself!
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?