pySpark-flatten-dataframe

A PySpark function that flattens any complex nested DataFrame structure loaded from JSON, CSV, SQL, or Parquet.

For example, with nested JSON:

Flattens all nested items:

{ "human": { "name": { "first_name": "Jay Lohokare" } } }

is converted to a DataFrame with the column 'human-name-first_name'. The '-' connector can be changed by setting the connector variable.

Explodes arrays:

{ "array": ["one", "two", "three"] }

is converted to a DataFrame with the column 'array' and 3 rows, one per element.

The function can handle any level of nesting.

The function can NOT handle arrays within arrays; this limitation keeps the code dynamic and generic. To support arrays within arrays, modify the isinstance check in the for loop of the flattenSchema function.

Repository files: README.md, SparkRelationalizeDF.ipynb
Repository: JayLohokare/pySpark-flatten-dataframe