# ⚡ Data Transformations Using Spark RDDs

Apache **Spark RDDs (Resilient Distributed Datasets)** support transformations such as **map** and **filter**, and actions such as **reduce**, for large-scale data processing. Transformations are lazy: Spark computes nothing until an action runs.

---

## 📂 Dataset (`numbers.txt`)

```text
10
20
30
40
50
```

---

## 🔹 Step 1: Load Dataset into an RDD

Start the **Spark shell** (`spark-shell`), where `sc` is predefined, and load the data, converting each line to an integer:

```scala
val rdd = sc.textFile("numbers.txt").map(_.toInt)
```
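
Note that `_.toInt` throws on a blank or non-numeric line, and because loading is lazy the failure only appears once an action runs. The sketch below illustrates a more defensive parsing step in plain Python rather than Spark; `parse_ints` is a hypothetical helper, not part of any Spark API, and silently dropping bad lines is an assumption that may not suit every dataset.

```python
def parse_ints(lines):
    """Convert text lines to ints, skipping blank or non-numeric lines."""
    values = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            values.append(int(text))
        except ValueError:
            continue  # assumption: malformed lines are dropped, not fatal
    return values


# Same values as numbers.txt, plus a blank line and a bad line for illustration
sample_lines = ["10", "20", "", "30", "abc", "40", "50"]
numbers = parse_ints(sample_lines)
print(numbers)
```

In Spark, the same idea would typically be expressed as a filter or a `flatMap` before the conversion.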

---

## 🔹 Step 2: Apply `map` Operation

Multiply each number by 2:

```scala
val mappedRDD = rdd.map(x => x * 2)
```

---

## 🔹 Step 3: Apply `filter` Operation

Keep only the values greater than 50:

```scala
val filteredRDD = mappedRDD.filter(x => x > 50)
```

---

## 🔹 Step 4: Apply `reduce` Operation

Find the sum of the filtered values:

```scala
val result = filteredRDD.reduce((a, b) => a + b)
```
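
Spark applies `reduce` within each partition first and then combines the partial results, which is why the function passed to it should be associative and commutative. The following is a plain-Python sketch of that idea, not Spark code; the two-partition split is an arbitrary assumption made for illustration.

```python
from functools import reduce

# Values that survive the filter in Step 3
filtered = [60, 80, 100]

# Assumed split into two partitions (Spark decides the real partitioning)
partitions = [filtered[:1], filtered[1:]]

# Reduce inside each partition, then combine the partial sums
partials = [reduce(lambda a, b: a + b, part) for part in partitions]
total = reduce(lambda a, b: a + b, partials)

print(partials)  # [60, 180]
print(total)     # 240
```

Because addition is associative and commutative, any other split of the same values gives the same total.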

---

## ✅ Output Verification

```scala
result
```

```text
240
```
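
To double-check the arithmetic without starting Spark, the same three steps can be traced with Python built-ins. This is a local analogue that assumes `numbers.txt` holds exactly the five values listed earlier; it is not how Spark executes the job.

```python
from functools import reduce

numbers = [10, 20, 30, 40, 50]

mapped = list(map(lambda x: x * 2, numbers))       # [20, 40, 60, 80, 100]
filtered = list(filter(lambda x: x > 50, mapped))  # [60, 80, 100]
result = reduce(lambda a, b: a + b, filtered)      # 60 + 80 + 100 = 240

print(mapped, filtered, result)
```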


## **In PySpark:**

```python
from pyspark.sql import SparkSession

# Create Spark Session
spark = SparkSession.builder \
    .appName("RDD_Data_Transformations") \
    .getOrCreate()

sc = spark.sparkContext

# -----------------------------------
# Load dataset directly into RDD
data = ["Hello Spark",
        "RDD Transformations",
        "Spark Map Filter Reduce",
        "Big Data Processing"]

rdd = sc.parallelize(data)

print("Original Data:", rdd.collect())

# -----------------------------------
# 1️⃣ MAP Operation
# Convert each line to lowercase
mapped_rdd = rdd.map(lambda line: line.lower())
print("After map (lowercase):", mapped_rdd.collect())

# -----------------------------------
# 2️⃣ FILTER Operation
# Keep only lines containing 'spark'
filtered_rdd = mapped_rdd.filter(lambda line: "spark" in line)
print("After filter (contains 'spark'):", filtered_rdd.collect())

# -----------------------------------
# 3️⃣ REDUCE Operation
# Count total number of lines
line_count = rdd.map(lambda line: 1).reduce(lambda a, b: a + b)
print("Total Number of Lines:", line_count)

# Stop Spark
spark.stop()
```
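
For reference, the same logic can be traced on the four sample strings with plain Python built-ins, which makes the expected values easy to check by hand. This sketch does not use Spark; it only mirrors what the three operations compute, and, as in the PySpark code, the count is taken over the original data rather than the filtered lines.

```python
from functools import reduce

data = ["Hello Spark",
        "RDD Transformations",
        "Spark Map Filter Reduce",
        "Big Data Processing"]

# map step: lowercase every line
lowered = [line.lower() for line in data]

# filter step: keep lines containing 'spark'
with_spark = [line for line in lowered if "spark" in line]

# reduce step: emit 1 per original line and sum
line_count = reduce(lambda a, b: a + b, (1 for _ in data))

print(with_spark)  # ['hello spark', 'spark map filter reduce']
print(line_count)  # 4
```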
