
Spark Examples

Some simple, kinda introductory projects based on Apache Spark, to be used as guides that make DataFrame data management look a little less weird or complex.

Preparations & Prerequisites

  • The latest stable version of Spark, or at least the one used here, 3.0.1 (in case you're old school).
  • A single-node setup is enough. You can also run the applications on a local cluster or in a cloud service, with the necessary changes to anything that needs to be parallelized, of course.
  • Of course, having (a somewhat recent version of) Scala (and Java) installed (oh, you're really old school).
  • The most casual and convenient way to run the projects is to import them into an IDE as shown here.

Projects

Each project comes with its very own input data (.csv, .tsv, or simple text files in the project folder, ready to be used or copied to HDFS), and its execution results are either stored as a single file in an /output directory or printed to the console.

The projects featured in this repo are:

Calculating the average price of houses for sale by zipcode.
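
A minimal sketch of the idea, assuming a spark-shell session (so `spark` and the `toDF` implicits are already in scope) and made-up column names (`zipcode`, `price`), since the actual input schema may differ:

```scala
import org.apache.spark.sql.functions.avg

// Hypothetical schema: one row per house listing
val houses = Seq(
  ("10001", 500000.0),
  ("10001", 650000.0),
  ("94105", 900000.0)
).toDF("zipcode", "price")

// Average listing price per zipcode
houses.groupBy("zipcode")
  .agg(avg("price").alias("avg_price"))
  .show()
```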

A typical "sum-it-up" example where for each bank we calculate the number and the sum of its transfers.

A typical case of finding the max recorded temperature for every city.
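
One more aggregation sketch, with assumed `city` and `temperature` columns:

```scala
import org.apache.spark.sql.functions.max

// Hypothetical schema: one row per temperature reading
val temps = Seq(
  ("Athens", 38.2),
  ("Athens", 41.0),
  ("Oslo",   22.5)
).toDF("city", "temperature")

// Highest recorded temperature per city
temps.groupBy("city")
  .agg(max("temperature").alias("max_temp"))
  .show()
```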

An interesting application working on Olympic Games stats to find the total gold, silver, and bronze medal wins of every athlete.
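
A sketch of how the totals could be summed up, assuming the medal counts already sit in separate columns (the real input layout may differ):

```scala
import org.apache.spark.sql.functions.sum

// Hypothetical schema: one row per athlete per event
val medals = Seq(
  ("Phelps", 2, 0, 1),
  ("Phelps", 3, 1, 0),
  ("Biles",  4, 1, 1)
).toDF("athlete", "gold", "silver", "bronze")

// Total medal wins of each kind, per athlete
medals.groupBy("athlete")
  .agg(sum("gold").alias("total_gold"),
       sum("silver").alias("total_silver"),
       sum("bronze").alias("total_bronze"))
  .show()
```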

Just a plain old normalization example for a bunch of students and their grades.
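
A min-max normalization sketch (one way to normalize; the project itself may scale differently), with assumed `student` and `grade` columns:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, max, min}

val grades = Seq(
  ("Alice", 72.0),
  ("Bob",   95.0),
  ("Carol", 60.0)
).toDF("student", "grade")

// Grab the global min and max, then scale every grade into [0, 1]
val Row(lo: Double, hi: Double) = grades.agg(min("grade"), max("grade")).head()
grades.withColumn("normalized", (col("grade") - lo) / (hi - lo)).show()
```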

Finding the oldest tree per city district. Child's play.
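
Child's play indeed, though keeping the whole row of the oldest tree takes a window function. A sketch with assumed `district`, `species`, and `planted` columns:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val trees = Seq(
  ("Downtown", "oak",  1923),
  ("Downtown", "elm",  1956),
  ("Harbor",   "pine", 1987)
).toDF("district", "species", "planted")

// Rank trees within each district by planting year; the earliest wins
val byAge = Window.partitionBy("district").orderBy(col("planted").asc)
trees.withColumn("rn", row_number().over(byAge))
  .filter(col("rn") === 1)
  .drop("rn")
  .show()
```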

The most challenging and abstract one. Every key character (A-E) has 3 numbers as values, two negative and one positive. We just calculate the score for every character based on the following expression: `character_score = pos / (-1 * (neg_1 + neg_2))`.
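
The expression translates to a one-liner once the three values are split into columns. A sketch assuming they already are (`pos`, `neg_1`, and `neg_2` are made-up names):

```scala
import org.apache.spark.sql.functions.col

val values = Seq(
  ("A", 10.0, -2.0, -3.0),
  ("B",  8.0, -1.0, -1.0)
).toDF("character", "pos", "neg_1", "neg_2")

// character_score = pos / (-1 * (neg_1 + neg_2))
values.withColumn("score", col("pos") / ((col("neg_1") + col("neg_2")) * -1))
  .show()
```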

A simple way to calculate the symmetric difference between the records of two files, based on each record's ID.
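
One way to get a symmetric difference is a pair of anti joins, sketched here with hypothetical `id` and `value` columns:

```scala
val left  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val right = Seq((2, "x"), (3, "y"), (4, "z")).toDF("id", "value")

// Records whose ID shows up in only one of the two files
val onlyInLeft  = left.join(right, Seq("id"), "left_anti")
val onlyInRight = right.join(left, Seq("id"), "left_anti")

onlyInLeft.union(onlyInRight).show()
```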

Filtering out patients' records where their PatientCycleNum column is equal to 1 and their Counseling column is equal to No.
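
A filter sketch; the `PatientID` column and sample values are made up, while the other two column names come straight from the project description:

```scala
import org.apache.spark.sql.functions.col

val patients = Seq(
  ("p1", 1, "No"),
  ("p2", 1, "Yes"),
  ("p3", 2, "No")
).toDF("PatientID", "PatientCycleNum", "Counseling")

// Drop rows where PatientCycleNum == 1 AND Counseling == "No", keep the rest
patients.filter(!(col("PatientCycleNum") === 1 && col("Counseling") === "No"))
  .show()
```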

Reading a number of files with multiple lines and storing each of them as records in a DataFrame consisting of two columns, filename and content.
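
A sketch of one way to do this in spark-shell, using the built-in text source and its `wholetext` option (the input path is hypothetical):

```scala
import org.apache.spark.sql.functions.input_file_name

// wholetext makes each file a single record instead of one record per line
val docs = spark.read
  .option("wholetext", "true")
  .text("input/*.txt") // hypothetical path; point it at the actual input files
  .withColumn("filename", input_file_name())
  .withColumnRenamed("value", "content")
  .select("filename", "content")

docs.show(truncate = false)
```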

The most challenging one yet. Term frequency is calculated from 5 input documents. The goal is to find the document with the max TF for each word, as well as how many documents contain said word.
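
A rough sketch of the two aggregations, starting from an assumed (doc, word) tokenized form rather than the raw documents:

```scala
import org.apache.spark.sql.functions._

// Hypothetical starting point: one row per word occurrence per document
val tokens = Seq(
  ("doc1", "spark"), ("doc1", "spark"), ("doc1", "data"),
  ("doc2", "spark"), ("doc2", "data"),  ("doc2", "data")
).toDF("doc", "word")

// Term frequency of each word inside each document
val counts   = tokens.groupBy("doc", "word").count()
val docSizes = tokens.groupBy("doc").agg(count("*").alias("doc_size"))
val tf = counts.join(docSizes, "doc")
  .withColumn("tf", col("count") / col("doc_size"))

// Per word: the document holding the max TF, plus the document frequency.
// max over a struct compares by its first field (tf), so the doc rides along.
tf.groupBy("word")
  .agg(max(struct(col("tf"), col("doc"))).alias("best"),
       countDistinct("doc").alias("doc_freq"))
  .select(col("word"),
          col("best.doc").alias("max_tf_doc"),
          col("best.tf").alias("max_tf"),
          col("doc_freq"))
  .show()
```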

A simple merge of WordCount and TopN examples to find the 10 most used words in 5 input documents.
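
A compact sketch of the combination in spark-shell (the input path is hypothetical):

```scala
import org.apache.spark.sql.functions.{col, desc, explode, lower, split}

// Each line of every input file becomes a row in the "value" column
val lines = spark.read.text("input/*.txt") // hypothetical path

lines.select(explode(split(lower(col("value")), "\\s+")).alias("word"))
  .filter(col("word") =!= "")     // drop empty tokens
  .groupBy("word").count()        // the WordCount part
  .orderBy(desc("count"))         // ...and the TopN part
  .limit(10)
  .show()
```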


Check out the equivalent Hadoop Examples here.
