data-engineering

This repository contains two data engineering projects that use Apache Spark for big data processing:

- A classification pipeline with MLlib + SparkSQL: a complete data classification pipeline built on Apache Spark's MLlib and SparkSQL. It demonstrates large-scale data processing, model training, and evaluation with Spark.
- Tweet analysis – Spark streaming: a real-time pipeline for analyzing tweets with Spark Streaming. It ingests and processes live data streams, applying transformations and analysis in real time.
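To illustrate the kind of streaming transformation the tweet analysis project describes, here is a minimal sketch rather than the project's actual code. It assumes PySpark's Structured Streaming API and a plain text socket as a stand-in for a live tweet feed; the repository's own ingestion method is not documented here. The names `extract_hashtags` and `run_hashtag_counts`, the host, and the port are all hypothetical.

```python
import re
from typing import List, Optional

# Matches a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: Optional[str]) -> List[str]:
    """Return the lowercase hashtags found in one tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text or "")]


def run_hashtag_counts(host: str = "localhost", port: int = 9999) -> None:
    """Keep a running hashtag count over lines read from a text socket.

    Assumption: the socket is a hypothetical stand-in for a tweet feed,
    delivering one tweet per line.
    """
    # Imported lazily so the pure helper above works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, udf
    from pyspark.sql.types import ArrayType, StringType

    spark = SparkSession.builder.appName("tweet-hashtag-counts").getOrCreate()

    # The socket source yields one row per line, in a column named "value".
    lines = (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )

    # Wrap the pure function as a UDF, then emit one row per hashtag.
    hashtags_udf = udf(extract_hashtags, ArrayType(StringType()))
    counts = (
        lines.select(explode(hashtags_udf("value")).alias("hashtag"))
        .groupBy("hashtag")
        .count()
    )

    # "complete" mode reprints the full aggregate table on every trigger.
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()
```

To try it locally, one option is to start a text server with `nc -lk 9999`, call `run_hashtag_counts()`, and type sample tweets into the `nc` session. Keeping the parsing logic in a plain function means it can be checked without a Spark cluster.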
