burhanmaseel/bigDataTermProject

Academic Project: Data Processing and Visualization with PySpark, Power BI, and NiFi

Overview

This project demonstrates the integration of PySpark, Power BI, and Apache NiFi to process, analyze, and visualize large datasets. The goal is to showcase the capabilities of these tools in handling big data workflows from data ingestion to visualization.

Technologies Used

  • PySpark: For large-scale data processing and analysis.
  • Power BI: For data visualization and reporting.
  • Apache NiFi: For data ingestion and flow management.

Project Structure

  • data_ingestion/: Contains NiFi flow configurations for data ingestion.
  • data_processing/: Contains PySpark scripts for data processing and analysis.
  • visualization/: Contains Power BI reports and dashboards.
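
As a rough illustration of how a script in data_processing/ might hand results to Power BI, here is a minimal sketch. It is not taken from the repository: the function name summarize_counts, its parameters, and the assumption that NiFi lands CSV files for PySpark to read are all hypothetical.

```python
def summarize_counts(input_csv, output_dir, group_col):
    """Count rows per value of group_col and write the result as CSV.

    Hypothetical sketch: the paths and column name are supplied by the
    caller and are not defined by the project itself.
    """
    # Imported lazily so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("term-project-sketch").getOrCreate()
    try:
        # Read the ingested data, treating the first row as the header.
        df = spark.read.csv(input_csv, header=True, inferSchema=True)

        # Aggregate: one output row per distinct value of group_col.
        counts = df.groupBy(group_col).count()

        # Write a single CSV part file into output_dir, which Power BI
        # Desktop can load through its folder or CSV connectors.
        counts.coalesce(1).write.mode("overwrite").csv(output_dir, header=True)
    finally:
        spark.stop()
```

Calling summarize_counts with an input file, an output folder, and a column name would produce a small aggregated dataset. Writing plain CSV keeps the hand-off to Power BI simple, though other formats may suit larger outputs better.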

Setup Instructions

Prerequisites

  • Python 3.x
  • Apache Spark
  • Apache NiFi
  • Power BI Desktop

Installation

  1. Clone the repository.
  2. Install PySpark: pip install pyspark
  3. Set up Apache NiFi.
  4. Set up Power BI Desktop.
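
After installation, a quick check like the one below can confirm that the Python-side prerequisites are visible. This is only a sketch: the function name check_prerequisites is hypothetical, and the Java check is included on the assumption that Spark will run locally, since Spark needs a Java runtime. NiFi and Power BI Desktop are not checked here.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # The README asks for Python 3.x.
        "python3": sys.version_info.major == 3,
        # True if the pyspark package can be found by the import system.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # True if a java executable is on PATH.
        "java": shutil.which("java") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```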

About

Big data academic term project using Apache Spark, HDFS, NiFi, and Power BI.
