## Drop Function

Syntax: `drop(how='any', thresh=None, subset=None)`

`how` – Accepts `any` or `all`. With `any`, a row is dropped if it contains a NULL in any column.
      With `all`, a row is dropped only if every column is NULL. Default is `any`.
`thresh` – Accepts an int. Rows with fewer than `thresh` non-null values are dropped. When set, it overrides `how`. Default is `None`.
`subset` – Restricts the NULL check to the listed column names. Default is `None`, which checks all columns.
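
How the three parameters interact can be easier to see in plain Python. The sketch below is only a model of the rules listed above, not Spark's implementation. `keep_row`, `kept_ids`, and the dictionary rows are hypothetical names introduced for illustration, and the sample values mirror the `small_zipcode.csv` rows shown in the cells that follow.

```python
COLUMNS = ["id", "zipcode", "type", "city", "state", "population"]
DATA = [
    (1, 704, "STANDARD", None, "PR", 30100),
    (2, 704, None, "PASEO COSTA DEL SUR", "PR", None),
    (3, 709, None, "BDA SAN LUIS", "PR", 3700),
    (4, 76166, "UNIQUE", "CINGULAR WIRELESS", "TX", 84000),
    (5, 76177, "STANDARD", None, "TX", None),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def keep_row(row, how="any", thresh=None, subset=None):
    """Return True if the row survives the drop (illustrative model only)."""
    columns = subset if subset is not None else list(row)
    non_null = sum(row[c] is not None for c in columns)
    if thresh is not None:
        # thresh takes precedence over how
        return non_null >= thresh
    if how == "any":
        # dropped when any checked column is NULL
        return non_null == len(columns)
    if how == "all":
        # dropped only when every checked column is NULL
        return non_null > 0
    raise ValueError("how must be 'any' or 'all'")


def kept_ids(**kwargs):
    """Ids of the sample rows that survive the drop for the given parameters."""
    return [r["id"] for r in rows if keep_row(r, **kwargs)]


print(kept_ids())                               # [4]
print(kept_ids(how="all"))                      # [1, 2, 3, 4, 5]
print(kept_ids(subset=["population", "type"]))  # [1, 4]
print(kept_ids(thresh=5))                       # [1, 3, 4]
```

The first three results match the Spark output in the sections below. The last call shows `thresh`: rows 2 and 5 have only four non-null values, so they fall below the threshold of five.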

In [1]:
from pyspark.sql import SparkSession

In [2]:
spark = SparkSession.builder.master("local[1]").appName("SparkByExamples.com").getOrCreate()

filePath = '../Example_Sources/small_zipcode.csv'
df = spark.read.options(header='true', inferSchema='true').csv(filePath)

df.printSchema()
df.show(truncate=False)

root
 |-- id: integer (nullable = true)
 |-- zipcode: integer (nullable = true)
 |-- type: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- population: integer (nullable = true)

+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+



## Remove rows that have a NULL value in any column

In [7]:
print('Remove rows that have a column with value NULL')
# how='any' is the default, so these two calls are equivalent
df.na.drop().show(truncate=False)
df.na.drop('any').show(truncate=False)

Remove rows that have a column with value NULL
+---+-------+------+-----------------+-----+----------+
|id |zipcode|type  |city             |state|population|
+---+-------+------+-----------------+-----+----------+
|4  |76166  |UNIQUE|CINGULAR WIRELESS|TX   |84000     |
+---+-------+------+-----------------+-----+----------+

+---+-------+------+-----------------+-----+----------+
|id |zipcode|type  |city             |state|population|
+---+-------+------+-----------------+-----+----------+
|4  |76166  |UNIQUE|CINGULAR WIRELESS|TX   |84000     |
+---+-------+------+-----------------+-----+----------+



## Remove rows that have NULL value in all columns

In [4]:
print('Remove rows that have NULL value in all columns')
# No row in this dataset is NULL in every column, so all five rows are kept
df.na.drop('all').show(truncate=False)

Remove rows that have NULL value in all columns
+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+



## Remove Rows with NULL Values in Selected Columns

In [5]:
print('Remove Rows with NULL Value of Selected Columns')
# Only 'population' and 'type' are checked; NULLs in other columns (e.g. city) are ignored
df.na.drop(subset=['population', 'type']).show(truncate=False)

Remove Rows with NULL Value of Selected Columns
+---+-------+--------+-----------------+-----+----------+
|id |zipcode|type    |city             |state|population|
+---+-------+--------+-----------------+-----+----------+
|1  |704    |STANDARD|null             |PR   |30100     |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS|TX   |84000     |
+---+-------+--------+-----------------+-----+----------+



## Remove Rows with NULL Values with dropna
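Syntax: `dropna(how='any', thresh=None, subset=None)`

In PySpark, `DataFrame.dropna()` and `DataFrame.na.drop()` are aliases of each other and accept the same parameters.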

In [8]:
print('Remove Rows with NULL Values with dropna')
df.dropna().show(truncate=False)

Remove Rows with NULL Values with dropna
+---+-------+------+-----------------+-----+----------+
|id |zipcode|type  |city             |state|population|
+---+-------+------+-----------------+-----+----------+
|4  |76166  |UNIQUE|CINGULAR WIRELESS|TX   |84000     |
+---+-------+------+-----------------+-----+----------+

