Read "../data/quiz_Sample.txt" into an RDD, then write a mapper that returns the length of each word, one list per line, in the following format:
[[2, 3, 3, 4], [4, 3, 3, 5], [5]]

### Import Spark Configuration and Create SparkContext
This cell imports `SparkContext` and `SparkConf` from PySpark, then calls `SparkContext.getOrCreate` to return the active SparkContext or, if none exists, create one with the application name "QUIZ".

In [1]:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("QUIZ")
sc = SparkContext.getOrCreate(conf=conf)

### Read Quiz Sample File into RDD
This cell reads the contents of 'quiz_Sample.txt' into a Resilient Distributed Dataset (RDD) and displays the raw text data.

In [2]:
rdd = sc.textFile("../data/quiz_Sample.txt")
rdd.collect()

['Hi how are you?', 'Hope you are doing', 'great']

### Split Text Lines into Words
This cell applies a `map` transformation that splits each line on the space character, producing a new RDD in which each element is the list of words from one line.

In [3]:
rdd2 = rdd.map(lambda x: x.split(' '))
rdd2.collect()

[['Hi', 'how', 'are', 'you?'], ['Hope', 'you', 'are', 'doing'], ['great']]
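A note on the split step above: `split(' ')` breaks on every single space, so consecutive spaces would produce empty strings, which later count as words of length 0. The sketch below uses plain Python (no Spark) and a hypothetical input line with a double space to show the difference from the no-argument `split()`. The sample file has only single spaces, so either form gives the same result there.

```python
# Plain Python, no Spark needed: compare the two ways of splitting a line.
line = "Hi  how are you?"  # hypothetical input containing a double space

by_space = line.split(' ')     # splits on every single space character
by_whitespace = line.split()   # splits on runs of any whitespace

print(by_space)       # includes an empty string between the two spaces
print(by_whitespace)  # no empty strings
```

If the input may contain irregular spacing, `rdd.map(lambda x: x.split())` is likely the safer choice.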

### Calculate Word Lengths Using Custom Mapper
This cell defines a custom function `word_length_mapper` that takes a list of words and returns a list of their lengths. Applying it with a `map` transformation produces the desired output format: [[2, 3, 3, 4], [4, 3, 3, 5], [5]]. Punctuation counts toward a word's length, which is why 'you?' has length 4.

In [4]:
def word_length_mapper(words):
    # Return the length of every word in one line's list of words.
    return [len(word) for word in words]

# Pass the function directly; wrapping it in a lambda is unnecessary.
rdd3 = rdd2.map(word_length_mapper)
rdd3.collect()


[[2, 3, 3, 4], [4, 3, 3, 5], [5]]
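
The mapper logic can also be checked locally without a SparkContext. The following is a minimal sketch, assuming the three lines shown in the `rdd.collect()` output above; the names `word_lengths` and `lines` are illustrative and do not appear in the notebook. It folds the split and the length calculation into one function.

```python
def word_lengths(line):
    """Split one line on spaces and return the length of each word."""
    return [len(word) for word in line.split(' ')]

# Lines copied from the rdd.collect() output shown earlier.
lines = ['Hi how are you?', 'Hope you are doing', 'great']

result = [word_lengths(line) for line in lines]
print(result)
```

Because the function works on a raw line, it could be passed to a single transformation, `rdd.map(word_lengths)`, instead of chaining two `map` calls.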