# PySpark – Create an Empty DataFrame & RDD

**When working with files, we sometimes do not receive a file for processing; however, we still need to create a DataFrame manually with the schema we expect. If we don't create it with the same schema, operations and transformations on the DataFrame (such as unions) fail because they refer to columns that may not be present.**

**To handle situations like these, we need to create a DataFrame with the same schema, meaning the same column names and data types, regardless of whether the file exists or is empty.**

## 1. Create Empty RDD in PySpark

---

Create an empty RDD by using emptyRDD() of SparkContext, for example spark.sparkContext.emptyRDD().

In [0]:
# Create an empty RDD (sc is the SparkContext, i.e. spark.sparkContext)

emptyRDD = sc.emptyRDD()
print(emptyRDD)

EmptyRDD[0] at emptyRDD at NativeMethodAccessorImpl.java:0


**Alternatively, you can get an empty RDD by using spark.sparkContext.parallelize([]).**

In [0]:
# Create an empty RDD using parallelize
rdd2 = sc.parallelize([])
print(rdd2)

ParallelCollectionRDD[1] at readRDDFromInputStream at PythonRDD.scala:435


## 2. Create Empty DataFrame with Schema (StructType)

---

**To create an empty PySpark DataFrame manually with a schema (column names & data types), first create a schema using StructType and StructField.**

In [0]:
# Create schema

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
        StructField('firstname', StringType(), True),
        StructField('middlename', StringType(), True),
        StructField('lastname', StringType(), True)
])

**Now use the empty RDD created above and pass it to createDataFrame() of SparkSession along with the schema for column names & data types.**

In [0]:
# Create empty DataFrame from empty RDD

df = spark.createDataFrame(data=emptyRDD, schema=schema)
df.printSchema()

root
 |-- firstname: string (nullable = true)
 |-- middlename: string (nullable = true)
 |-- lastname: string (nullable = true)



## 3. Convert Empty RDD to DataFrame

---

**You can also create an empty DataFrame by converting an empty RDD to a DataFrame using toDF().**

In [0]:
# Convert empty RDD to DataFrame

df1 = rdd2.toDF(schema=schema)

df1.printSchema()


root
 |-- firstname: string (nullable = true)
 |-- middlename: string (nullable = true)
 |-- lastname: string (nullable = true)



## 4. Create Empty DataFrame with Schema (without RDD)

---

**Here we create the empty DataFrame manually with a schema, without using an RDD.**

In [0]:
# Create empty DataFrame directly

df2 = spark.createDataFrame(data=[], schema=schema)

df2.printSchema()

root
 |-- firstname: string (nullable = true)
 |-- middlename: string (nullable = true)
 |-- lastname: string (nullable = true)



## 5. Create Empty DataFrame without Schema (no columns)

---

**To create an empty DataFrame without a schema (no columns), just create an empty schema and use it while creating the PySpark DataFrame.**

In [0]:
# Create empty DataFrame with no schema (no columns)

df3 = spark.createDataFrame(data=[], schema=StructType([]))

df3.printSchema()

root

