EilinLux/XmlPySparkParser

XmlPySparkParser

What it involves

Extensible Markup Language

Extensible Markup Language (XML) is a markup language for storing and exchanging structured data. XML files are a common primary source for ETL pipelines. ETL stands for extract, transform, load: a three-phase process in which data is extracted from a source, transformed (cleaned, sanitized, scrubbed), and loaded into a new data container (e.g. a database).
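
To make the three ETL phases concrete, here is a minimal sketch that uses only the Python standard library, not this repository's code or PySpark. The sample XML, the element names (`orders`, `order`, `customer`, `amount`), the table layout, and the function names are all hypothetical and chosen purely for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
SAMPLE_XML = """
<orders>
  <order id="1"><customer> Alice </customer><amount>10.50</amount></order>
  <order id="2"><customer>Bob</customer><amount>7</amount></order>
</orders>
"""


def extract(xml_text):
    """Extract: parse the XML and return one raw (all-string) dict per <order>."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": order.get("id"),
            "customer": order.findtext("customer"),
            "amount": order.findtext("amount"),
        }
        for order in root.iter("order")
    ]


def transform(records):
    """Transform: trim whitespace and cast the string fields to proper types."""
    return [
        (int(r["id"]), r["customer"].strip(), float(r["amount"]))
        for r in records
    ]


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(xml_text, conn):
    """Run extract -> transform -> load, then read the table back."""
    load(transform(extract(xml_text)), conn)
    return conn.execute(
        "SELECT id, customer, amount FROM orders ORDER BY id"
    ).fetchall()
```

Calling `run_pipeline(SAMPLE_XML, sqlite3.connect(":memory:"))` runs all three phases against an in-memory SQLite database. Keeping each phase in its own function means the extraction step could later be swapped for a distributed reader without touching the transform or load logic.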

PySpark

PySpark is the Python API for Apache Spark, an open source, distributed computing framework and set of libraries for real-time, large-scale data processing.