* ### String Manipulation Functions
  * Case Conversion - `lower`, `upper`
  * Getting Length - `length`
  * Extracting substrings - `substring`, `split`
  * Trimming - `trim`, `ltrim`, `rtrim`
  * Padding - `lpad`, `rpad`
  * Concatenating strings - `concat`, `concat_ws`
* ### Date Manipulation Functions
  * Getting current date and time - `current_date`, `current_timestamp`
  * Date Arithmetic - `date_add`, `date_sub`, `datediff`, `months_between`, `add_months`, `next_day`
  * Beginning and Ending Date or Time - `last_day`, `trunc`, `date_trunc`
  * Formatting Date - `date_format`
  * Extracting Information - `dayofyear`, `dayofmonth`, `dayofweek`, `year`, `month`
* ### Aggregate Functions
  * `count`, `countDistinct`
  * `sum`, `avg`
  * `min`, `max`
* ### Other Functions
  * `CASE` and `WHEN`
  * `CAST` for type casting
  * Functions to manage special types such as `ARRAY`, `MAP`, `STRUCT` type columns
  * Many others
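
Two of the string functions above have edge cases worth knowing. `lpad`/`rpad` also shorten a string that is already longer than the target length. `concat` returns null as soon as any input is null, whereas `concat_ws` skips null inputs. The snippet below is a minimal plain-Python model of those rules, not PySpark code. The helper names only mirror the Spark functions, `None` stands in for SQL NULL, and a non-negative target length is assumed.

```python
def lpad(s, length, pad):
    """Model of Spark's lpad: left-pad s with pad up to `length` characters."""
    if s is None or pad is None:
        return None  # NULL input gives NULL output
    if length <= len(s) or pad == "":
        # Too long (or nothing to pad with): keep only the first `length` characters
        return s[:length]
    needed = length - len(s)
    # Repeat the pad string enough times, then cut it to the exact size
    return (pad * (needed // len(pad) + 1))[:needed] + s


def rpad(s, length, pad):
    """Model of Spark's rpad: right-pad s with pad up to `length` characters."""
    if s is None or pad is None:
        return None
    if length <= len(s) or pad == "":
        return s[:length]
    needed = length - len(s)
    return s + (pad * (needed // len(pad) + 1))[:needed]


def concat(*parts):
    """Model of concat: any NULL input makes the whole result NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def concat_ws(sep, *parts):
    """Model of concat_ws: NULL inputs are skipped, the rest joined with sep."""
    return sep.join(p for p in parts if p is not None)


print(lpad("7", 3, "0"), rpad("hi", 5, "??"), concat_ws("-", "a", None, "b"))  # → 007 hi??? a-b
```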
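
The aggregate functions ignore nulls. `count(col)` counts only non-null values (while `count('*')` counts rows), `countDistinct` counts distinct non-null values, and `sum`, `avg`, `min` and `max` skip nulls and return null when a group contains nothing but nulls. Here is a minimal plain-Python model of those rules for a single column. The `aggregate` helper is hypothetical, and `None` plays the role of NULL.

```python
def aggregate(values):
    """Model how count/countDistinct/sum/avg/min/max treat NULL (None) values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Only nulls (or no rows): counts are 0, the other aggregates are NULL
        return {"count_star": len(values), "count": 0, "countDistinct": 0,
                "sum": None, "avg": None, "min": None, "max": None}
    return {
        "count_star": len(values),             # count('*') counts every row
        "count": len(non_null),                # count(col) skips nulls
        "countDistinct": len(set(non_null)),   # distinct non-null values only
        "sum": sum(non_null),
        "avg": sum(non_null) / len(non_null),  # nulls excluded from the denominator
        "min": min(non_null),
        "max": max(non_null),
    }
```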

In [0]:
# DATE FUNCTIONS
from pyspark.sql.functions import date_format, lit, to_date, to_timestamp

# to_date(), to_timestamp() - convert strings to dates/timestamps.
# Very convenient: the format pattern tells Spark how to read the string,
# so dates stored in almost any layout can be parsed.
df.select(to_date(lit('20250418'), 'yyyyMMdd').alias('to_date')).show()
df.select(to_timestamp(lit('20250418: 1940'), 'yyyyMMdd: HHmm').alias('to_timestamp')).show()


# date_format()
# Convenient way to render a date/timestamp column as a string in a custom format.
# Characters that are not pattern letters (_ : - space) are copied to the output as-is.
datetimesDF \
    .withColumn("date_ym", date_format("date", "yyyy_MM")) \
    .withColumn("time_ym", date_format("time", "yyyyMM")) \
    .withColumn("date_dt", date_format("date", "yyyyMMdd::HHmmss")) \
    .withColumn("date_ts", date_format("time", r"yyyyMMdd HH\mm-ss")) \
    .show(truncate=False)
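
The date-arithmetic functions listed at the top follow calendar rules that are easy to check by hand. Below is a plain-Python model of `datediff`, `add_months`, `last_day` and `next_day`. The helpers are hypothetical and are not the PySpark API. The model assumes Spark 3.x semantics: `add_months` clamps the day to the last valid day of the target month (Spark 2.4 and earlier additionally moved month-end dates to the target month's end), and `next_day` returns the first matching weekday strictly after the input date. Unlike Spark, which takes a day name such as `'Mon'`, this `next_day` takes a weekday index (0 = Monday).

```python
import calendar
from datetime import date, timedelta


def datediff(end, start):
    """Number of days from start to end, like datediff(end, start)."""
    return (end - start).days


def last_day(d):
    """Last day of the month that d falls in."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def add_months(d, n):
    """Shift d by n months, clamping the day to the target month's length."""
    months = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(months, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_day(d, weekday):
    """First date strictly after d that falls on weekday (0=Monday ... 6=Sunday)."""
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)


print(add_months(date(2019, 1, 31), 1))  # → 2019-02-28
```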

In [0]:
# FILLING NULL VALUES

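# fillna() and na.fill() are aliases; the fill value's type decides which columns are filled.
employeesDF.fillna(0.0).show()  # nulls in all numeric columns
employeesDF.na.fill('Empty').show()  # nulls in all string columns
employeesDF.na.fill('Empty', 'last_name').na.fill(0.0, 'salary').show()  # only the named columns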
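
`fillna`/`na.fill` only touches columns whose type matches the fill value. A numeric value fills numeric columns, a string fills string columns, and columns named in `subset` that have a different type are silently skipped. Here is a plain-Python sketch of that rule on rows stored as dicts. The `fill_nulls` helper and its two-kind schema are hypothetical simplifications: boolean fills and type casting are not modelled.

```python
def fill_nulls(rows, schema, value, subset=None):
    """Model of df.na.fill(value, subset) for rows stored as dicts.

    schema maps each column name to "string" or "numeric" (a simplification).
    """
    kind = "string" if isinstance(value, str) else "numeric"
    if isinstance(subset, str):
        subset = [subset]  # a single column name is accepted as well
    columns = subset if subset is not None else list(schema)
    # Columns whose type does not match the fill value are left untouched
    targets = {c for c in columns if schema[c] == kind}
    return [
        {c: (value if v is None and c in targets else v) for c, v in row.items()}
        for row in rows
    ]
```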

In [0]:
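# CASE/WHEN
# Replace null or empty-string bonuses with 0, first with a SQL expression,
# then with the equivalent DataFrame API calls.
from pyspark.sql.functions import col, expr, lit, when

employeesDF \
    .withColumn(
        'bonus',
        expr("""
            CASE WHEN bonus IS NULL OR bonus = '' THEN 0
            ELSE bonus
            END
        """)
    ) \
    .show()

employeesDF \
    .withColumn(
        'bonus',
        when(col('bonus').isNull() | (col('bonus') == lit('')), 0).otherwise(col('bonus'))
    ) \
    .show()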
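
Both forms evaluate the branches in order and return the value of the first condition that is true. If nothing matches and there is no `ELSE`/`otherwise`, the result is null. Below is a minimal plain-Python model of that rule, applied to the same bonus cleanup. The `case_when` and `clean_bonus` helpers are hypothetical, `None` stands for NULL, and Spark's coercion of the branch results to a common type is not modelled.

```python
def case_when(value, branches, otherwise=None):
    """Return the result of the first (predicate, result) pair whose predicate holds."""
    for predicate, result in branches:
        if predicate(value):
            return result
    return otherwise  # no ELSE / otherwise() means NULL (None)


def clean_bonus(bonus):
    # Mirrors: CASE WHEN bonus IS NULL OR bonus = '' THEN 0 ELSE bonus END
    return case_when(bonus, [(lambda b: b is None or b == "", 0)], otherwise=bonus)


print([clean_bonus(b) for b in [None, "", "1500"]])  # → [0, 0, '1500']
```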

In [0]:
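# DROPPING COLUMNS
# drop() returns a new DataFrame; `orders` itself is unchanged.
orders.drop("order_status")

# Column names as strings work on every Spark version; passing several
# Column objects to drop() raises a TypeError before Spark 3.4.
cols_to_drop = ['order_id', 'order_date']
orders.drop(*cols_to_drop)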


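# DROPPING ROWS

# Duplicates: distinct() compares all columns, dropDuplicates() only the listed ones.
orders.distinct()
orders.dropDuplicates(['order_date', 'order_customer_id'])

# Nulls: drop rows where any of the subset columns is null.
orders.na.drop(how='any', subset=['order_date', 'order_customer_id'])
# thresh overrides how: keep rows with at least 3 non-null values (here across all columns).
# With a subset, thresh counts only subset columns, so it must not exceed len(subset).
orders.na.drop(thresh=3)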
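
`na.drop` decides row by row. With `how='any'`, a row is dropped if any column in `subset` is null. With `how='all'`, it is dropped only if all of them are. A `thresh` value overrides `how` and keeps rows that have at least `thresh` non-null values among the considered columns. The plain-Python model below expresses `how` as an equivalent minimum count of non-null values. It shows why a `thresh` larger than the number of `subset` columns drops every row. The `drop_nulls` helper and the sample rows are hypothetical; rows are plain dicts.

```python
def drop_nulls(rows, how="any", thresh=None, subset=None):
    """Model of df.na.drop(how, thresh, subset) for rows stored as dicts."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    kept = []
    for row in rows:
        columns = subset if subset is not None else list(row)
        # Without thresh, 'any' needs every column non-null and
        # 'all' needs at least one non-null column
        required = thresh if thresh is not None else (len(columns) if how == "any" else 1)
        non_null = sum(row[c] is not None for c in columns)
        if non_null >= required:
            kept.append(row)
    return kept


# Hypothetical sample rows
sample_orders = [
    {"order_id": 1, "order_date": "2025-04-18", "order_customer_id": 7},
    {"order_id": 2, "order_date": None, "order_customer_id": 8},
]
print(len(drop_nulls(sample_orders, thresh=3, subset=["order_date", "order_customer_id"])))  # → 0
```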