#PySpark flatMap() Transformation


---

**PySpark flatMap() is an RDD transformation that applies a function to every element, flattens the results, and returns a new RDD. PySpark DataFrames have no flatMap() method; to flatten array or map columns, use the explode() SQL function instead. In this article, you will learn the syntax and usage of PySpark flatMap() with examples.**


---


**First, let’s create an RDD from a list.**

In [0]:
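# In a notebook, `spark` and `sc` are usually predefined; create them otherwise.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

data = [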
    "Project Gutenberg’s",
    "Alice’s Adventures in Wonderland",
    "Project Gutenberg’s",
    "Adventures in Wonderland",
    "Project Gutenberg’s"
]

rdd = sc.parallelize(data)

for element in rdd.collect():
    print(element)

Project Gutenberg’s
Alice’s Adventures in Wonderland
Project Gutenberg’s
Adventures in Wonderland
Project Gutenberg’s


##flatMap() Syntax


---
####flatMap(f, preservesPartitioning=False)
---
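
To make the signature concrete, here is a minimal plain-Python sketch of what flatMap() does with `f`. It models the semantics only and is not Spark's implementation; `flat_map` is a hypothetical helper name, no cluster is involved, and `preservesPartitioning` is not modeled. The key point is that `f` must return an iterable for each element, and that iterable may hold zero, one, or many items.

```python
from itertools import chain


def flat_map(f, elements):
    """Plain-Python model of flatMap: apply f, then concatenate the results."""
    # f must return an iterable for every element; the iterables are chained.
    return list(chain.from_iterable(f(x) for x in elements))


# f may emit zero, one, or many items per input element.
numbers = [0, 1, 2, 3]
repeated = flat_map(lambda n: [n] * n, numbers)
print(repeated)  # [1, 2, 2, 3, 3, 3]

# Pitfall: a string is itself an iterable, so returning it yields characters.
chars = flat_map(lambda s: s, ["ab", "cd"])
print(chars)  # ['a', 'b', 'c', 'd']
```

The second call shows why the function passed to flatMap() usually returns a list (for example, the result of `split`) rather than a bare string.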




---


##flatMap() Example



**Now, let’s apply a flatMap() transformation to the RDD. The example below splits each record on spaces and then flattens the result, so each record of the resulting RDD holds a single word.**

In [0]:
rdd2 = rdd.flatMap(lambda x: x.split(" "))

for element in rdd2.collect():
    print(element)


Project
Gutenberg’s
Alice’s
Adventures
in
Wonderland
Project
Gutenberg’s
Adventures
in
Wonderland
Project
Gutenberg’s
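
The difference from map() is easiest to see side by side. The sketch below uses plain Python lists as a stand-in for the RDD (an assumption made so that it runs without a Spark session); with a real RDD, the analogous calls would be `rdd.map(...)` and `rdd.flatMap(...)`. The helper name `split_words` is hypothetical.

```python
from itertools import chain

data = [
    "Project Gutenberg’s",
    "Alice’s Adventures in Wonderland",
    "Project Gutenberg’s",
    "Adventures in Wonderland",
    "Project Gutenberg’s",
]


def split_words(line):
    """Split one record into words on single spaces."""
    return line.split(" ")


# map-like: one output per input, so each record becomes a list of words.
mapped = [split_words(line) for line in data]

# flatMap-like: the per-record lists are concatenated into a single sequence.
flat_mapped = list(chain.from_iterable(split_words(line) for line in data))

print(len(mapped))       # 5
print(mapped[1])         # ['Alice’s', 'Adventures', 'in', 'Wonderland']
print(len(flat_mapped))  # 13
```

The map-like result keeps five nested lists, one per input record, while the flatMap-like result holds thirteen individual words.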


##Using flatMap() transformation on DataFrame

---


**PySpark DataFrame doesn’t have a flatMap() transformation. However, the explode() SQL function flattens an array or map column into one row per element. Below is a complete example.**

In [0]:
from pyspark.sql.functions import explode


arrayData = [
    ('James',['Java','Scala'],{'hair':'black','eye':'brown'}),
    ('Michael',['Spark','Java',None],{'hair':'brown','eye':None}),
    ('Robert',['CSharp',''],{'hair':'red','eye':''}),
    ('Washington',None,None),
    ('Jefferson',['1','2'],{})
]

df = spark.createDataFrame(data=arrayData, schema=['name', 'knownlanguages', 'properties'])
df2 = df.select(df.name, explode(df.knownlanguages))
df2.printSchema()
df2.show(truncate=False)

root
 |-- name: string (nullable = true)
 |-- col: string (nullable = true)

+---------+------+
|name     |col   |
+---------+------+
|James    |Java  |
|James    |Scala |
|Michael  |Spark |
|Michael  |Java  |
|Michael  |null  |
|Robert   |CSharp|
|Robert   |      |
|Jefferson|1     |
|Jefferson|2     |
+---------+------+



**This example flattens the array column “knownlanguages” and yields the output above. Note that “Washington” does not appear, because explode() skips rows whose array is null.**
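
For rows that should survive flattening even when the array is null or empty, `pyspark.sql.functions` also provides explode_outer(). The plain-Python sketch below models how the two functions differ on the name and array columns of the data above. It is an illustration of the semantics, not how Spark implements them, and `explode_rows` and `explode_outer_rows` are hypothetical helper names.

```python
rows = [
    ("James", ["Java", "Scala"]),
    ("Michael", ["Spark", "Java", None]),
    ("Robert", ["CSharp", ""]),
    ("Washington", None),
    ("Jefferson", ["1", "2"]),
]


def explode_rows(rows):
    """Model of explode(): one row per array element; null/empty arrays vanish."""
    return [(name, item) for name, items in rows for item in (items or [])]


def explode_outer_rows(rows):
    """Model of explode_outer(): null/empty arrays yield a single null row."""
    out = []
    for name, items in rows:
        if items:
            out.extend((name, item) for item in items)
        else:
            out.append((name, None))
    return out


exploded = explode_rows(rows)
exploded_outer = explode_outer_rows(rows)

print(len(exploded))                           # 9
print(("Washington", None) in exploded)        # False
print(("Washington", None) in exploded_outer)  # True
```

With a real DataFrame, the corresponding call would be `df.select(df.name, explode_outer(df.knownlanguages))` after importing `explode_outer` from `pyspark.sql.functions`.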

##Conclusion

**In conclusion, you have learned how to apply the PySpark flatMap() transformation to flatten the elements of an RDD, and how to use explode() as the alternative for array and map columns of a DataFrame.**