CirsteanPaul/pyspark-project

E.ON Course - Big Data with PySpark

Description

This project was the final assignment for the Hands-On Advanced Analytics with Apache Spark course, a five-week training focused on big data technologies. The project was completed at FII Practic.

Languages and Utilities Used

  • Python
  • PySpark
  • Jupyter Notebook

Implementation Details

  • The dataset contained roughly 3.5 million entries (3,549,246).
  • The primary objective was to clean the dataset, resolving both the inconsistencies our trainers introduced on purpose and the more realistic inconsistencies found in real-world data.
  • Once the cleaning was complete, we aggregated the cleaned data.

Project Task

For the detailed task list, see the Tasks document.