adich23/XmltoCsv_StackExchange

XmltoCsv_StackExchange

A PySpark notebook that converts XML files to CSV. I needed the Stack Exchange Data Dump in CSV format for my project. The converters available online work well for small files, but they took too long on XML files ranging from hundreds of MB to several GB.

This one works in parallel, using Spark's RDDs, and completes the conversion in a few minutes on a minimal Spark setup with 2 cores and 2 GB of RAM.
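The notebook's code is not reproduced here. As a rough sketch of the general approach, assuming the dump files store each record as a single `<row ... />` line with the fields as attributes, the per-line conversion and the RDD pipeline could look like this. The names `COLUMNS`, `row_to_csv`, and `convert_with_spark` and the column subset are hypothetical and chosen for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column subset for illustration; real dump files carry more attributes.
COLUMNS = ["Id", "PostTypeId", "CreationDate", "Score", "Title"]


def row_to_csv(line, columns=COLUMNS):
    """Turn one '<row ... />' line into one CSV record, or None for other lines."""
    line = line.strip()
    if not line.startswith("<row"):
        return None  # XML declaration, root open/close tags, blank lines
    attrs = ET.fromstring(line).attrib
    buf = io.StringIO()
    # csv.writer handles quoting of commas and quotes inside field values.
    csv.writer(buf).writerow([attrs.get(c, "") for c in columns])
    return buf.getvalue().rstrip("\r\n")


def convert_with_spark(spark, src, dst, columns=COLUMNS):
    """Parse lines in parallel on an RDD and write CSV part files under dst."""
    (spark.sparkContext.textFile(src)
        .map(lambda line: row_to_csv(line, columns))
        .filter(lambda record: record is not None)
        .saveAsTextFile(dst))
```

With PySpark installed, a session matching the 2-core setup could be created with `SparkSession.builder.master("local[2]").getOrCreate()` and passed to `convert_with_spark` together with an input path and an output directory. Because every line is parsed independently, Spark can split the file across cores without loading the whole XML document into memory. Note that `saveAsTextFile` writes a directory of part files rather than a single CSV.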
