
# RDD Transformations and Actions

In this lecture we take a closer look at using Spark with Python. Please view the video lecture for a full explanation.

## Important Terms

Let's quickly go over some important terms:

|Term |Definition|
|----|------|
|RDD |Resilient Distributed Dataset|
|Transformation |Spark operation that produces an RDD|
|Action |Spark operation that produces a local object|
|Spark Job |Sequence of transformations on data with a final action|

## Creating an RDD

There are two common ways to create an RDD:

|Method   |Result|
|----|------|
|sc.parallelize(array) |Create RDD of elements of array (or list)|
|sc.textFile(path/to/file) |Create RDD of lines from file|

## RDD Transformations

We can use transformations to build up a set of instructions that we want to perform on the RDD. Nothing is actually executed until we call an action.

|Transformation Example |Result|
|----|-------|
|filter(lambda x: x % 2 == 0) |Discard non-even elements|
|map(lambda x: x * 2) |Multiply each RDD element by 2|
|map(lambda x: x.split()) |Split each string into a list of words|
|flatMap(lambda x: x.split()) |Split each string into words and flatten the result into a single sequence|
|sample(withReplacement=True, fraction=0.25) |Create a sample of roughly 25% of the elements, with replacement|
|union(rdd) |Combine rdd with the existing RDD (duplicates are kept)|
|distinct() |Remove duplicates in RDD|
|sortBy(lambda x: x, ascending=False) |Sort elements in descending order|

## RDD Actions

Once your 'recipe' of transformations is ready, you execute it by calling an action. Here are some common actions:

|Action |Result|
|----|----|
|collect() |Convert RDD to in-memory list|
|take(3) |First 3 elements of RDD|
|top(3) |3 largest elements of RDD, in descending order|
|takeSample(withReplacement=True, num=3) |Create sample of 3 elements with replacement|
|sum() |Find element sum (assumes numeric elements)|
|mean() |Find element mean (assumes numeric elements)|
|stdev() |Find element standard deviation (assumes numeric elements)|
----
## Examples

The best way to show all of this is by going through examples! We'll first review a bit by creating and working with a simple text file, then move on to more realistic data, such as customer and sales data.

### Creating an RDD from a text file:


**Creating the text file**