- toJSON() converts each row of a DataFrame into a JSON string.
- It is often used when you need to serialize data or prepare it for APIs or downstream systems that accept JSON format.
- Syntax: df.toJSON()

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("toJSONFunctionExample").getOrCreate()

In [None]:
# Create a sample DataFrame
data = [
    (1, "Manta", "UK", 3600),
    (2, "Dipankar", "India", 3000),
    (3, "Souvik", "Ireland", 10000),
    (4, "Soukarjya", "USA", 5000),
    (5, "Padma", "Ireland", 2400)
]

columns = ["id", "name", "country", "salary"]

df = spark.createDataFrame(data, schema=columns)
df.show()

In [None]:
json_rdd = df.toJSON()

- Each row in the DataFrame is converted into a JSON-formatted string.
- The result is not a DataFrame anymore, but an RDD (Resilient Distributed Dataset).
- Every element inside the RDD is a single JSON string, representing one row.
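
Because every element is a plain string, it has to be parsed before individual fields can be read. The sketch below uses only the standard `json` module. The `row_json` value is a hypothetical stand-in for one element of `json_rdd`, written by hand from the first sample row rather than captured from Spark, and its compact formatting is an assumption.

```python
import json

# Hypothetical stand-in for one element of json_rdd (first sample row).
# Hand-written for illustration, not captured Spark output.
row_json = '{"id":1,"name":"Manta","country":"UK","salary":3600}'

# The element is a str, so parse it to get a dict with named fields.
row = json.loads(row_json)

print(type(row_json).__name__, "->", type(row).__name__)
print(row["name"], row["salary"])
```

Parsing like this is only needed on the Python side; a downstream system that accepts JSON can usually take the string as it is.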

In [None]:
# collect() brings every JSON string to the driver; fine for this small sample
for item in json_rdd.collect():
    print(item)

- Summary:
    - toJSON() returns an RDD where each row is converted into a JSON string.
    - It is useful for:
        - Exporting data as JSON
        - Sending data to REST APIs
        - Working with systems that consume JSON format.
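
As a rough sketch of the export and API use cases, the collected strings can be joined into JSON Lines text or wrapped into a single JSON array payload. The `collected` list here is a hypothetical stand-in for `json_rdd.collect()`, hand-written from the first two sample rows. How the payload is actually sent (HTTP client, endpoint, batching) is left out, since the notebook does not specify it.

```python
import json

# Hypothetical stand-in for json_rdd.collect(): a list of JSON strings,
# hand-written from the first two sample rows.
collected = [
    '{"id":1,"name":"Manta","country":"UK","salary":3600}',
    '{"id":2,"name":"Dipankar","country":"India","salary":3000}',
]

# JSON Lines export: one JSON object per line, no parsing needed.
json_lines = "\n".join(collected)

# Single JSON array payload, e.g. as the body of a POST request:
# parse each string, then serialize the whole list once.
payload = json.dumps([json.loads(s) for s in collected])

print(json_lines)
print(payload)
```

Collecting to the driver only suits small results like this sample; for larger data it is probably better to keep the work distributed instead of calling `collect()`.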