ICP_02 : Spark Programming

acikgozmehmet edited this page Sep 4, 2020 · 3 revisions

Objectives:

We will focus on installation and on getting familiar with the programming concepts used in Big Data Analytics and Applications.

Spark

Spark is an open-source cluster computing environment similar to Hadoop, developed at the University of California, Berkeley. It supports:
- Machine Learning
- Spark Streaming
- Faster Batch
Spark enables in-memory distributed datasets that optimize iterative workloads in addition to interactive queries.
Spark is complementary to Hadoop and can run side by side over the Hadoop file system.
Spark supports building large-scale, low-latency data analytics applications.
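
To illustrate how an in-memory dataset helps an iterative workload, here is a minimal PySpark sketch. It is not part of the original assignment: the toy problem (fitting a slope `w` in `y ≈ w*x` by gradient descent) and the names `gradient` and `fit_slope` are assumptions chosen for illustration. It assumes the `pyspark` package is installed.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pair; hypothetical toy data


def gradient(w: float, point: Point) -> float:
    """Gradient of the squared error (w*x - y)^2 with respect to w for one point."""
    x, y = point
    return 2.0 * (w * x - y) * x


def fit_slope(points, iterations: int = 20, learning_rate: float = 0.01) -> float:
    """Fit y ~ w*x by gradient descent over an RDD kept in memory."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]").appName("CacheSketch").getOrCreate()
    )
    # cache() asks Spark to keep the partitions in memory after the first pass,
    # so later iterations reuse them instead of rebuilding the dataset.
    data = spark.sparkContext.parallelize(points).cache()
    n = data.count()
    w = 0.0
    for _ in range(iterations):
        # Each iteration is one distributed pass over the cached dataset.
        total = data.map(lambda p: gradient(w, p)).sum()
        w -= learning_rate * total / n
    spark.stop()
    return w
```

Calling `fit_slope([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])` in a notebook cell would run the loop on a local Spark session. Without `cache()`, Spark may recompute the dataset from its source on every iteration, which is the cost that in-memory datasets are meant to avoid.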

In Class Programming

1. Integrate Spark with Colab (or the IDE that you are using).

2. Create a well-commented Spark program that produces the correct results and writes them to an output file.
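
The two tasks above can be sketched as follows. The page does not say which program was written, so this example assumes a word count, the customary first Spark program. The function names `tokenize` and `run_word_count` and the file paths are hypothetical. In Colab, the setup would typically be a single cell running `!pip install pyspark`, after which a local Spark session can be created from Python.

```python
import re
from typing import List


def tokenize(line: str) -> List[str]:
    """Lowercase a line and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", line.lower())


def run_word_count(input_path: str, output_dir: str) -> None:
    """Count words in input_path and write 'word<TAB>count' lines to output_dir."""
    # Imported lazily so tokenize() can be used without Spark installed.
    from pyspark.sql import SparkSession

    # local[*] runs Spark inside the notebook/IDE process using all available cores.
    spark = (
        SparkSession.builder.master("local[*]").appName("WordCount").getOrCreate()
    )
    lines = spark.sparkContext.textFile(input_path)
    counts = (
        lines.flatMap(tokenize)                  # one record per word
        .map(lambda word: (word, 1))             # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)         # sum the counts per word
        .sortBy(lambda pair: pair[1], ascending=False)  # most frequent first
    )
    # saveAsTextFile writes a directory of part files and fails if it already exists.
    counts.map(lambda pair: f"{pair[0]}\t{pair[1]}").saveAsTextFile(output_dir)
    spark.stop()
```

A call such as `run_word_count("input.txt", "output")` would then produce an `output` directory containing one or more `part-*` files with the counts.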

Results

Recording

Please click on the link to see the recording

References: