Possibly add a function that's similar to pandas json_normalize #95

Open
MrPowers opened this issue Apr 13, 2023 · 3 comments

Comments

@MrPowers
Owner

Suggestion from this Reddit thread.

@huynguyent

Kind of cheating, but a naive solution is to use pandas json_normalize to parse the JSON and then convert the resulting pandas DataFrame into a Spark DataFrame. The logic seems a bit too simple to justify a dedicated helper function, though.
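A minimal sketch of that naive approach, assuming the records are already parsed into Python dicts and are small enough to fit in driver memory. The helper names `normalize_records` and `to_spark` are hypothetical, and `spark` is assumed to be an existing SparkSession passed in by the caller.

```python
import pandas as pd


def normalize_records(records, sep="_"):
    """Flatten nested JSON-like dicts into a flat pandas DataFrame.

    An underscore separator is used instead of the default "." so the
    resulting column names are easier to reference from Spark.
    """
    return pd.json_normalize(records, sep=sep)


def to_spark(spark, records, sep="_"):
    """Hypothetical helper: flatten with pandas, then convert to Spark.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.createDataFrame(normalize_records(records, sep=sep))


records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "Oslo"}}},
    {"id": 2, "user": {"name": "Bob", "address": {"city": "Lima"}}},
]

# Nested keys become columns such as user_name and user_address_city.
flat = normalize_records(records)
```

Everything here runs on the driver, so this is more of a convenience for small inputs than a scalable solution.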

@MrPowers
Owner Author

MrPowers commented Feb 1, 2024

@huynguyent - would be nice to create an implementation that's really performant and doesn't depend on pandas!

@SemyonSinchenko
Collaborator

This is possible only if you know the final schema; otherwise you first need to infer the schema somehow. Even with a known schema, the simplest solution is still to use UDFs. So my first question: do we know the schema in this case? If not, I would suggest starting with a function like infer_json_schema(col).
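As a rough illustration of what such an inference step might compute, here is a plain-Python sketch that derives a Spark-style DDL schema string from a sample of JSON strings. It is not an existing library function: the name `infer_json_schema_ddl` and the type-merging rules (int and float widen to DOUBLE, any other conflict falls back to STRING, nulls are treated as unknown) are assumptions. It also assumes each row is a JSON object whose field names are simple identifiers, and that a sample of the column has already been collected to the driver.

```python
import json


def _infer(value):
    """Describe one decoded JSON value: str type name, dict (struct), list (array), or None (unknown)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        return {key: _infer(val) for key, val in value.items()}
    if isinstance(value, list):
        elem = None
        for item in value:
            elem = _merge(elem, _infer(item))
        return [elem]
    return None  # JSON null: type not known yet


def _merge(a, b):
    """Combine two descriptions; conflicting types fall back to STRING."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, val in b.items():
            merged[key] = _merge(merged.get(key), val)
        return merged
    if isinstance(a, list) and isinstance(b, list):
        return [_merge(a[0], b[0])]
    if a == b:
        return a
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "STRING"


def _render(schema):
    """Render a description as a DDL type; unknown types default to STRING."""
    if schema is None:
        return "STRING"
    if isinstance(schema, str):
        return schema
    if isinstance(schema, list):
        return f"ARRAY<{_render(schema[0])}>"
    fields = ", ".join(f"{name}: {_render(t)}" for name, t in schema.items())
    return f"STRUCT<{fields}>"


def infer_json_schema_ddl(json_strings):
    """Hypothetical helper: infer a DDL schema string from sample JSON objects."""
    schema = None
    for text in json_strings:
        record = json.loads(text)
        if not isinstance(record, dict):
            raise ValueError("expected one JSON object per row")
        schema = _merge(schema, _infer(record))
    return ", ".join(f"{name} {_render(t)}" for name, t in (schema or {}).items())


samples = [
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2.5, "user": {"address": {"city": "x"}}, "tags": ["p"]}',
]
ddl = infer_json_schema_ddl(samples)
```

A string in this form could then be supplied as the schema when parsing the column, after which the nested fields can be selected out into flat columns. Inferring from a sample means fields that only appear in unsampled rows would be missed.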


3 participants