NkgoloL/IBM_Intro_Spark

IBM_Intro_Spark

Spark’s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). A subsequent lab exercise covers RDDs in more detail. RDDs support two kinds of operations: actions, which return values, and transformations, which return pointers to new RDDs.
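To make the distinction between transformations and actions concrete, here is a minimal pure-Python sketch. `ToyRDD` is a hypothetical stand-in written for illustration only; it is not Spark code and is not distributed. It borrows the method names `map`, `filter`, `collect`, and `count` from the RDD API to show that transformations hand back a new dataset without computing anything, while actions run the pipeline and return plain values.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD; not part of Spark."""

    def __init__(self, source: Callable[[], Iterator]):
        # A zero-argument function that produces the items on demand.
        self._source = source

    @classmethod
    def from_list(cls, items: Iterable) -> "ToyRDD":
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole pipeline and return a plain value.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


numbers = ToyRDD.from_list(range(1, 11))
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(isinstance(evens_squared, ToyRDD))  # True: still a dataset, not a value
print(evens_squared.collect())            # [4, 16, 36, 64, 100]
print(evens_squared.count())              # 5
```

In this sketch no element is touched until `collect` or `count` is called, which mirrors the lazy evaluation of transformations in Spark.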

This set of labs uses Skills Network (SN) Labs to provide an interactive environment for developing applications and analyzing data. The environment is available through either a Scala or a Python shell. Scala runs on the Java VM and is thus a good way to use existing Java libraries.

In this lab exercise, we will set up our environment in preparation for the later labs. After completing this set of hands-on labs, you should be able to:

1. Perform basic RDD actions and transformations
2. Use caching to speed up repeated operations
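The second objective, caching, can be previewed with a small pure-Python sketch. `CachedDataset` and `expensive_load` are hypothetical names invented for this illustration and are not Spark code. The sketch assumes a simplified model in which `cache()` only marks the dataset, the first action stores the computed result, and later actions reuse it instead of recomputing.

```python
from typing import Callable, List, Optional


class CachedDataset:
    """Hypothetical toy dataset illustrating caching; not Spark's API."""

    def __init__(self, compute: Callable[[], List]):
        self._compute = compute
        self._use_cache = False
        self._cached: Optional[List] = None

    def cache(self) -> "CachedDataset":
        # Only marks the dataset; nothing is computed until an action runs.
        self._use_cache = True
        return self

    def collect(self) -> List:
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()  # first action fills the cache
            return list(self._cached)
        return self._compute()  # no cache: recompute on every action

    def count(self) -> int:
        return len(self.collect())


calls = {"n": 0}


def expensive_load() -> List[int]:
    # Counts how often the underlying computation actually runs.
    calls["n"] += 1
    return [x * x for x in range(5)]


uncached = CachedDataset(expensive_load)
uncached.count()
uncached.count()
print(calls["n"])  # 2: each action recomputed the data

calls["n"] = 0
cached = CachedDataset(expensive_load).cache()
cached.count()
cached.count()
print(calls["n"])  # 1: the second action reused the cached result
```

The benefit grows with the cost of the computation and the number of repeated actions on the same dataset.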
