- Task 1 - Install Spark on Google Colab and load datasets in PySpark
- Task 2 - Change column datatype, remove whitespaces and drop duplicates
- Task 3 - Remove columns with Null values higher than a threshold
- Task 4 - Group, aggregate and create pivot tables
- Task 5 - Rename categories and impute missing numeric values
- Task 6 - Create visualizations to gather insights
forked from AndreBluhm/Project_Data-Analysis-PySpark
-
Notifications
You must be signed in to change notification settings - Fork 0
Hamzi275/Project_Data-Analysis-PySpark
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
About
Guided Project using PySpark for Data-Analysis from Coursera.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Jupyter Notebook 100.0%
