**Resilient Distributed Datasets (RDDs)**
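- RDDs are the foundational data abstraction in Apache Spark: immutable collections of records, partitioned across a cluster so they can be processed in parallel.
- They serve as building blocks for higher-level abstractions, such as the Pair RDDs described below and Spark's DataFrames.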

**Pair RDDs**
- Pair RDDs are RDDs whose elements are *key-value pairs* (in PySpark, 2-element tuples such as `("spark", 1)`).
- They resemble the entries of a Python dictionary, except that a key can have *multiple occurrences*: a Pair RDD may hold both `("spark", 1)` and `("spark", 3)`.
- They support key-based operations, such as aggregating, sorting, or grouping by key, which are the main reason to structure data as pairs.
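The difference from a dictionary can be seen without Spark at all. The sketch below is a plain-Python model, assuming a list of tuples stands in for a Pair RDD's contents (the sample data is hypothetical); the PySpark line in the comment assumes an existing SparkContext named `sc`.

```python
# A plain list of (key, value) tuples models the contents of a Pair RDD.
pairs = [("apple", 1), ("banana", 2), ("apple", 3)]

# A dict keeps one entry per key: ("apple", 3) overwrites ("apple", 1).
as_dict = dict(pairs)
print(as_dict)  # {'apple': 3, 'banana': 2}

# The list of pairs keeps every occurrence of a repeated key.
apple_values = [v for k, v in pairs if k == "apple"]
print(apple_values)  # [1, 3]

# In PySpark the same data would become a Pair RDD with:
#   pair_rdd = sc.parallelize(pairs)   # sc: an existing SparkContext
```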

**Key Functions in Pair RDDs**
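1. `sortBy`: Works on any RDD, not only Pair RDDs. Like Python's `sorted` with a `key` argument, it orders the elements by the result of a function applied to each one (ascending by default; pass `ascending=False` to reverse).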

![sortBy](<attachment:Screenshot 2024-12-01 130249.png>)
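To make the ordering concrete, the sketch below uses Python's `sorted` as a stand-in for what `sortBy` returns. It is a plain-Python model, not Spark code; the data is hypothetical, and the PySpark equivalents in the comments assume existing RDDs named `rdd` and `nums_rdd`.

```python
# Hypothetical (word, count) pairs standing in for the elements of an RDD.
word_counts = [("the", 5), ("of", 3), ("it", 7), ("was", 6)]

# sorted() with a key function models what sortBy returns.
# PySpark equivalent: rdd.sortBy(lambda pair: pair[1], ascending=False)
by_count_desc = sorted(word_counts, key=lambda pair: pair[1], reverse=True)
print(by_count_desc)  # [('it', 7), ('was', 6), ('the', 5), ('of', 3)]

# sortBy is not limited to pairs: any key function works, e.g. absolute value.
# PySpark equivalent: nums_rdd.sortBy(abs)
nums = [3, -10, 2]
by_abs = sorted(nums, key=abs)
print(by_abs)  # [2, 3, -10]
```

The sample counts avoid ties on purpose: Spark does not promise any particular order among elements with equal sort keys.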

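2. `sortByKey`: Sorts a Pair RDD by its keys using their natural ordering, ascending by default (`ascending=False` for descending). String keys sort alphabetically (lexicographically) and numeric keys sort numerically; converting numbers to strings would instead give lexicographic order, in which `"10"` comes before `"9"`.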

![sortByKey](<attachment:Screenshot 2024-12-01 130302.png>)
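The effect of key type on the order can be checked in plain Python. This is a sketch that models `sortByKey` with `sorted` on the key (not Spark's implementation); the pairs are hypothetical and the PySpark call in the comment assumes an existing Pair RDD named `rdd`.

```python
pairs = [(10, "ten"), (9, "nine"), (100, "hundred")]

# sortByKey uses the keys' natural ordering, ascending by default.
# PySpark equivalent: rdd.sortByKey()   (or rdd.sortByKey(ascending=False))
numeric_order = sorted(pairs, key=lambda kv: kv[0])
print(numeric_order)  # [(9, 'nine'), (10, 'ten'), (100, 'hundred')]

# Turning the keys into strings switches to lexicographic order.
string_keys = [(str(k), v) for k, v in pairs]
string_order = sorted(string_keys, key=lambda kv: kv[0])
print(string_order)  # [('10', 'ten'), ('100', 'hundred'), ('9', 'nine')]
```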

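3. `reduceByKey`:
- Merges all values that share a key using a function that takes two values and returns one, producing a new RDD with one pair per key.
- The function should be associative and commutative, because Spark combines values within each partition first and then merges those partial results.
- A common application is summing the values for duplicate keys, e.g. `reduceByKey(lambda a, b: a + b)`.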

![reduceByKey](<attachment:Screenshot 2024-12-01 130313.png>)
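The result `reduceByKey` computes can be modeled in a few lines of plain Python. The helper `reduce_by_key` below is hypothetical: it groups values by key and folds each group with the given function, which matches the output but not the way Spark distributes the work across partitions.

```python
from functools import reduce

def reduce_by_key(func, kv_pairs):
    """Hypothetical plain-Python model of reduceByKey's result."""
    grouped = {}
    for key, value in kv_pairs:
        grouped.setdefault(key, []).append(value)
    # Fold each key's values pairwise with func, e.g. (1 + 1) + 1.
    return [(key, reduce(func, values)) for key, values in grouped.items()]

pairs = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1)]

# PySpark equivalent: rdd.reduceByKey(lambda a, b: a + b)
totals = reduce_by_key(lambda a, b: a + b, pairs)
print(totals)  # [('a', 3), ('b', 2)]
```

Any two-argument associative and commutative function works the same way; for example, passing `max` keeps the largest value per key. In Spark the order of the output pairs is not guaranteed.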

![Word count example](<attachment:Screenshot 2024-12-01 130328.png>)

**Worked Example: Word Count**
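1. Step 1: Read the input text file (e.g., `data_totc.txt`) into an RDD, with one element per line.
2. Step 2: Use `flatMap` to split each line into words and flatten the results into a single RDD of words.
3. Step 3: Use `map` to turn each word into a key-value pair consisting of the word and the number 1.
4. Step 4: Apply `reduceByKey` to aggregate the counts for each word:
    - Pairs with the same key (word) are combined by summing their values.
    - The lambda `lambda a, b: a + b` acts like an accumulator: `a` and `b` are partial counts for the same word (starting from the individual 1s), and their sum becomes the new running total.
5. Step 5: Sort the resulting RDD by count in descending order, e.g. with `sortBy(lambda pair: pair[1], ascending=False)`.
6. Step 6: Retrieve the 10 most frequent words and their counts with an action such as `take(10)`; the earlier transformations are only recorded until an action runs.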
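The six steps can be traced end to end with a plain-Python model of the pipeline. The sketch assumes two hypothetical sample lines in place of `data_totc.txt` so it runs without Spark; each step's PySpark counterpart is shown in a comment and assumes an existing SparkContext `sc`.

```python
from collections import defaultdict

# Step 1: hypothetical sample lines standing in for data_totc.txt.
# PySpark: lines = sc.textFile("data_totc.txt")
lines = [
    "it was the best of times",
    "it was the worst of times",
]

# Step 2: split each line into words and flatten into one list.
# PySpark: words = lines.flatMap(lambda line: line.split())
words = [word for line in lines for word in line.split()]

# Step 3: pair each word with the count 1.
# PySpark: pairs = words.map(lambda word: (word, 1))
pairs = [(word, 1) for word in words]

# Step 4: sum the 1s for each word (models reduceByKey's result).
# PySpark: counts = pairs.reduceByKey(lambda a, b: a + b)
totals = defaultdict(int)
for word, one in pairs:
    totals[word] += one
counts = list(totals.items())

# Step 5: sort by count, highest first.
# PySpark: ranked = counts.sortBy(lambda pair: pair[1], ascending=False)
ranked = sorted(counts, key=lambda pair: pair[1], reverse=True)

# Step 6: keep the top 10; in Spark, take() is the action that runs the job.
# PySpark: top10 = ranked.take(10)
top10 = ranked[:10]
print(top10)
```

As an alternative to steps 5 and 6, PySpark's `takeOrdered(10, key=lambda pair: -pair[1])` returns the same top 10 without producing a fully sorted RDD. Words with equal counts may come back in any order in Spark.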

**Additional Notes**
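- The focus is on the key Pair RDD functions (`sortByKey`, `reduceByKey`), together with the general-purpose `sortBy`, and their role in common data operations such as counting and sorting.
- The word count example shows how a Spark program is structured: a chain of lazy transformations (`flatMap`, `map`, `reduceByKey`, `sortBy`) followed by an action (`take`) that triggers the computation.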