IBM_Intro_Spark

Spark’s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). A subsequent lab exercise covers RDDs in more detail. RDDs support two kinds of operations: actions, which return values, and transformations, which return pointers to new RDDs.
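The split between transformations and actions can be sketched in plain Python. The `ToyRDD` class below is a hypothetical single-machine stand-in written for this illustration, not Spark's API: it assumes that a transformation only records work to be done and that an action is what triggers the computation.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD (illustration only, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterator]):
        # A zero-argument function that yields the elements when called.
        self._source = source

    @classmethod
    def from_items(cls, items: Iterable) -> "ToyRDD":
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the recorded pipeline and return a plain value.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls = []  # records every element that square() actually processes


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = ToyRDD.from_items(range(1, 6))

# Two chained transformations: nothing is computed at this point.
even_squares = numbers.map(square).filter(lambda v: v % 2 == 0)
calls_before_action = len(calls)

# Actions: each one runs the whole pipeline from the start.
result = even_squares.collect()
total = even_squares.count()

print(calls_before_action)  # 0
print(result)               # [4, 16]
print(total)                # 2
print(len(calls))           # 10
```

The method names mirror PySpark's `RDD.map`, `RDD.filter`, `RDD.collect`, and `RDD.count`, but the toy runs on one machine with no partitioning or fault tolerance. Note that `square` runs ten times in total, because each action replays the pipeline; avoiding that repeated work is what caching is for.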

This set of labs uses Skills Network (SN) Labs to provide an interactive environment for developing applications and analyzing data. Either a Scala or a Python shell can be used. Scala runs on the Java VM, which makes it a good way to use existing Java libraries.

In this lab exercise, we will set up our environment in preparation for the later labs. After completing this set of hands-on labs, you should be able to:

1. Perform basic RDD actions and transformations
2. Use caching to speed up repeated operations

NkgoloL/IBM_Intro_Spark
