Possibly add a function that's similar to pandas json_normalize #95

MrPowers · 2023-04-13T21:29:43Z

huynguyent · 2024-02-01T14:20:08Z

Kind of cheating, but a naive solution is to use pandas `json_normalize` to parse the JSON and then convert the resulting pandas DataFrame into a Spark DataFrame. The logic seems a bit too simple to justify a dedicated helper function, though.

MrPowers · 2024-02-01T14:26:49Z

@huynguyent - would be nice to create an implementation that's really performant and doesn't depend on pandas!
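For reference, the core flattening step can be sketched without pandas. This is a minimal pure-Python illustration of the logic only, not a Spark implementation and not a claim about performance. The name `flatten_record` and the `sep` parameter are hypothetical. Lists are left untouched here, which is roughly what `json_normalize` does when no `record_path` is given.

```python
import json
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single-level dict with joined key names.

    Hypothetical helper for illustration; lists are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Build the full column name from the parent prefix and the current key.
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the prefix along.
            flat.update(flatten_record(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


raw = '{"id": 1, "user": {"name": "ann", "address": {"city": "Oslo"}}, "tags": ["a", "b"]}'
flat = flatten_record(json.loads(raw))
print(flat)
```

A Spark version would need the same walk over the schema rather than over each record, which is why the schema question matters.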

SemyonSinchenko · 2024-02-01T18:58:40Z

It is possible only if you know the final schema. Otherwise, you need to infer the schema first somehow. And even with a known schema, the simplest solution is still to use UDFs. My first question: do we know the schema in such a case? If not, I would suggest starting with a function like `infer_json_schema(col)`.
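To make the schema-inference idea concrete, here is a rough pure-Python sketch. It is an assumption about what `infer_json_schema` could do, not an existing function: it takes an iterable of JSON strings instead of a Spark column, and it returns a nested dict of type names instead of a Spark `StructType`. The helpers `_type_of` and `_merge` are hypothetical, and the widening rules (long and double merge to double, any other conflict falls back to string) are illustrative choices.

```python
import json
from typing import Any, Iterable


def _type_of(value: Any) -> Any:
    """Describe one parsed JSON value as a nested schema of type names."""
    if isinstance(value, dict):
        return {k: _type_of(v) for k, v in value.items()}
    if isinstance(value, list):
        element = None
        for item in value:
            element = _merge(element, _type_of(item))
        return [element]
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if value is None:
        return None  # unknown until another sample provides a type
    return "string"


def _merge(a: Any, b: Any) -> Any:
    """Merge two schemas, widening on conflict."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        # Union of keys; a field missing on one side keeps the other side's type.
        return {k: _merge(a.get(k), b.get(k)) for k in {**a, **b}}
    if isinstance(a, list) and isinstance(b, list):
        return [_merge(a[0], b[0])]
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"long", "double"}:
        return "double"
    return "string"  # illustrative fallback for incompatible types


def infer_json_schema(samples: Iterable[str]) -> Any:
    """Hypothetical sketch: infer a merged schema from sample JSON strings."""
    schema = None
    for raw in samples:
        schema = _merge(schema, _type_of(json.loads(raw)))
    return schema


samples = [
    '{"id": 1, "user": {"name": "ann"}}',
    '{"id": 2.5, "user": {"age": 30}, "tags": ["x"]}',
]
schema = infer_json_schema(samples)
print(schema)
```

In Spark itself the inferred structure would have to be turned into a schema before parsing the column, and inferring it requires a pass over the data (or a sample of it) first.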
