HIT-FP-Spark-Project

An Apache Spark analytics project built with Scala for functional programming coursework.

Prerequisites

JDK 11 or higher
Scala 2.12.19
sbt 1.9+
Apache Spark 3.5.1 (for runtime execution)

Project Structure

HIT-FP-Spark-Project/
├── build.sbt                  # Project build configuration
├── project/
│   └── build.properties       # sbt version configuration
├── src/
│   ├── main/
│   │   ├── scala/             # Main Scala source code
│   │   │   └── Main.scala     # Application entry point
│   │   └── resources/         # Application resources
│   └── test/
│       └── scala/             # Test source code
│           └── MainSpec.scala # Unit tests
├── docs/                      # Project documentation
├── data/                      # Data files for analytics
└── README.md                  # This file

Running the Project

To run the analytics pipeline, execute the following command in PowerShell:

# Windows PowerShell
$env:SBT_OPTS="-Xmx4g"; sbt "runMain Driver --movies data/ml-25m/movies.csv --ratings data/ml-25m/ratings.csv --minRating 4.0 --minCount 1000 --topN 10"

This command will:

Start the Spark application.
Load movie and rating data from the data/ml-25m/ directory.
Filter for movies with a minimum rating of 4.0 and at least 1000 ratings.
Calculate the top 10 movies based on these criteria.
Write the results to the output/ directory in both CSV and JSON formats.

Expected Processing Time:

With the provided parameters, the job should take approximately 2-5 minutes.

Verifying the Output

After the run completes, you can find the results in the output/ directory.

CSV results: output/top_movies.csv/
JSON results: output/top_movies.json/

You can view the top movies by inspecting the part-files inside these directories.

Development

Testing

To run the test suite:

sbt test

Building the JAR

To build a fat JAR for deployment:

sbt assembly

Dependencies

Apache Spark Core 3.5.1 - Core Spark functionality (provided scope)
Apache Spark SQL 3.5.1 - Spark SQL API (provided scope)
ScalaTest 3.2.18 - Testing framework (test scope)

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.bsp		.bsp
docs		docs
project		project
src		src
target		target
.gitignore		.gitignore
README.md		README.md
build.sbt		build.sbt
code_review.md		code_review.md
summary.md		summary.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

HIT-FP-Spark-Project

Prerequisites

Project Structure

Running the Project

Verifying the Output

Development

Testing

Building the JAR

Dependencies

About

Uh oh!

Releases

Packages

Languages

bensky103/functional_programming_project

Folders and files

Latest commit

History

Repository files navigation

HIT-FP-Spark-Project

Prerequisites

Project Structure

Running the Project

Verifying the Output

Development

Testing

Building the JAR

Dependencies

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages