Maham802/data-engineering

data-engineering

This repository contains two data engineering projects that use Apache Spark for big data processing:

A classification pipeline with MLlib + SparkSQL: an end-to-end data classification pipeline built on Apache Spark's MLlib and SparkSQL. It covers large-scale data processing, model training, and evaluation.

Tweet analysis with Spark Streaming: a real-time pipeline for analyzing tweets with Spark Streaming. It ingests live data streams and applies transformations and analysis as the data arrives.
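For orientation, a classification pipeline of the kind described above might be assembled roughly as follows. This is a minimal sketch and not the repository's actual code: the column names, the `label_idx` and `features` output names, the choice of `LogisticRegression`, and the helper names `feature_columns` and `build_pipeline` are all illustrative assumptions.

```python
from typing import List


def feature_columns(columns: List[str], label: str) -> List[str]:
    """Treat every column except the label as a numeric feature (an assumption)."""
    return [c for c in columns if c != label]


def build_pipeline(columns: List[str], label: str):
    """Assemble a three-stage MLlib pipeline: index label, build vector, classify.

    Hypothetical helper; call it only after a SparkSession has been created,
    because MLlib stages need an active Spark context.
    """
    # Imported lazily so feature_columns() works without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Map string class labels to numeric indices.
    indexer = StringIndexer(inputCol=label, outputCol="label_idx")
    # Pack the feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(
        inputCols=feature_columns(columns, label), outputCol="features"
    )
    # Any MLlib classifier could be swapped in here; this one is a placeholder.
    classifier = LogisticRegression(featuresCol="features", labelCol="label_idx")
    return Pipeline(stages=[indexer, assembler, classifier])
```

With a SparkSession active and a DataFrame `df` split into `train` and `test`, `build_pipeline(df.columns, "label").fit(train)` would return a fitted model whose `transform(test)` output can then be queried or evaluated.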
