This project integrates Apache NiFi, PySpark, and Power BI into a single big data workflow: NiFi ingests the data, PySpark processes and analyzes it, and Power BI visualizes the results. The goal is to show how these tools work together on large datasets, from ingestion through to reporting.
- PySpark: For large-scale data processing and analysis.
- Power BI: For data visualization and reporting.
- Apache NiFi: For data ingestion and flow management.
- data_ingestion/: Contains NiFi flow configurations for data ingestion.
- data_processing/: Contains PySpark scripts for data processing and analysis.
- visualization/: Contains Power BI reports and dashboards.
- Python 3.x
- Apache Spark
- Apache NiFi
- Power BI Desktop
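
A quick way to confirm some of these prerequisites from Python is sketched below. It is not part of the repository: the function name `check_prerequisites` is hypothetical, and it only covers what can be detected from a Python process (the interpreter version, whether the `pyspark` package is importable, and whether a `java` executable is on the PATH, which Spark needs to run). Apache NiFi and Power BI Desktop are standalone applications and are not checked here.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each detectable prerequisite to True/False.

    Hypothetical helper for illustration; NiFi and Power BI Desktop
    must be verified by hand.
    """
    return {
        # The project lists Python 3.x as a requirement.
        "python3": sys.version_info.major == 3,
        # True if the pyspark package can be located without importing it.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # Spark runs on the JVM, so a java executable should be on PATH.
        "java": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any item reported as missing should be installed before continuing with the setup steps.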
- Clone the repository
- Install PySpark: pip install pyspark
- Set up Apache NiFi
- Set up Power BI
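
Once PySpark is installed, a small local job can confirm that it works before running anything in data_processing/. The sketch below is an assumption-laden illustration rather than one of the project's scripts: the sample rows, the column names `region` and `amount`, and the function name `run_smoke_test` are all made up for the example. It starts a local Spark session, builds a tiny DataFrame, and sums one column per group.

```python
# Made-up sample data for illustration: (region, amount) pairs.
SAMPLE_ROWS = [("north", 120.0), ("south", 80.0), ("north", 30.0)]


def run_smoke_test():
    """Run a tiny local Spark job and return total amount per region.

    Hypothetical helper; PySpark is imported inside the function so the
    module can still be loaded on machines where it is not yet installed.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # local[*] runs Spark in-process using all available cores.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("install-check")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(SAMPLE_ROWS, ["region", "amount"])
        totals = df.groupBy("region").agg(F.sum("amount").alias("total"))
        # Collect the small result back to the driver as a plain dict.
        return {row["region"]: row["total"] for row in totals.collect()}
    finally:
        # Always release the session, even if the job fails.
        spark.stop()
```

Calling `run_smoke_test()` from a Python shell should return one total per region if Spark and Java are set up correctly; an import error or a Java gateway error points to a missing prerequisite.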