In [0]:
# https://sparkbyexamples.com/pyspark/pyspark-window-functions/

row_number() => Returns a sequential number starting from 1 within a window partition.  
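rank() => Returns the rank of rows within a window partition, leaving gaps after ties.  
percent_rank() => Returns the relative rank of a row within a window partition, (rank - 1) / (rows in partition - 1), as a value between 0 and 1.  
dense_rank() => Returns the rank of rows within a window partition without any gaps (rank() leaves gaps after ties).  
ntile(n) => Splits the ordered rows of a window partition into n roughly equal buckets and returns the bucket number (1 to n).  
cume_dist() => Returns the cumulative distribution of values within a window partition, i.e. the fraction of rows whose value is less than or equal to the current row's.  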
lag(col, offset) => Retrieves the value of a column from the row that is offset rows before the current row within the same window partition; col may be a Column or a column name, and offset defaults to 1.  
lag(col, offset, default) => Same as above, but returns default instead of null when no such preceding row exists.  
lead(col, offset) => Retrieves the value of a column from the row that is offset rows after the current row within the same window partition.  
lead(col, offset, default) => Same as above, but returns default instead of null when no such succeeding row exists.  
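
The ranking functions above differ mainly in how they treat ties. As a rough illustration, the following plain-Python sketch (not Spark code; ranking_columns is a hypothetical helper name) computes the same columns for a single partition ordered ascending by value. It assumes the formula (rank - 1) / (n - 1) for percent_rank and "share of rows with a value <= the current one" for cume_dist.

```python
# Plain-Python model of the ranking functions for ONE partition,
# ordered ascending by value. This only mirrors the semantics; it is not Spark code.
def ranking_columns(values):
    ordered = sorted(values)
    n = len(ordered)
    rows = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real value
    for i, v in enumerate(ordered, start=1):
        if v != prev:
            rank = i      # rank jumps to the row position, so ties leave gaps
            dense += 1    # dense_rank always increases by exactly one
            prev = v
        rows.append({
            "value": v,
            "row_number": i,
            "rank": rank,
            "dense_rank": dense,
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
            "cume_dist": sum(1 for x in ordered if x <= v) / n,
        })
    return rows


for row in ranking_columns([3000, 4100, 4100, 4600]):
    print(row)
```

For the tied 4100 values, rank yields 1, 2, 2, 4 while dense_rank yields 1, 2, 2, 3; row_number stays 1, 2, 3, 4 regardless of ties.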

In [0]:
# https://sparkbyexamples.com/pyspark/pyspark-sql-functions/#partition

---- Window Functions ----
row_number() => Returns a sequential number starting from 1 within a window partition.
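rank() => Returns the rank of rows within a window partition, leaving gaps after ties.
percent_rank() => Returns the relative rank of a row within a window partition, (rank - 1) / (rows in partition - 1), as a value between 0 and 1.
dense_rank() => Returns the rank of rows within a window partition without gaps (rank() leaves gaps after ties).
ntile(n) => Splits the ordered rows of a window partition into n roughly equal buckets and returns the bucket number (1 to n).
cume_dist() => Returns the cumulative distribution of values within a window partition, i.e. the fraction of rows whose value is less than or equal to the current row's.
lag(column, offset, default) => Retrieves the value from the row that is offset rows before the current row in the same window partition, or default if there is no such row.
lead(column, offset, default) => Retrieves the value from the row that is offset rows after the current row in the same window partition, or default if there is no such row.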
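
ntile, lag, and lead are easier to read with concrete values. The sketch below is a plain-Python model of their behaviour over one already-ordered partition, not Spark code. It assumes the convention that when the rows do not divide evenly, the earlier ntile buckets receive the extra rows.

```python
# Plain-Python model of ntile, lag and lead for one ordered partition.
def ntile(num_rows, n):
    """Bucket id (1..n) for each of num_rows ordered rows."""
    base, extra = divmod(num_rows, n)
    ids = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets hold one more row than the rest.
        size = base + (1 if bucket <= extra else 0)
        ids.extend([bucket] * size)
    return ids


def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or default if there is none."""
    n = len(values)
    return [values[i - offset] if 0 <= i - offset < n else default
            for i in range(n)]


def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or default if there is none."""
    return lag(values, -offset, default)


prices = [10, 12, 15, 11, 14]
print(ntile(len(prices), 2))        # [1, 1, 1, 2, 2]
print(lag(prices))                  # [None, 10, 12, 15, 11]
print(lead(prices, 2, default=0))   # [15, 11, 14, 0, 0]
```

A common use is a row-over-row difference: subtracting lag(prices) from prices element by element gives the change since the previous row, with the first row left undefined.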

---- Aggregations ----
count(col) => Counts the rows in which col is not null.
countDistinct(col) => Counts distinct non-null values.
sum(col) => Returns the sum of values.
avg(col) => Returns the average of values.
min(col) => Returns the minimum value.
max(col) => Returns the maximum value.
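
These aggregates skip nulls, which matters most for count and avg. A plain-Python sketch of that behaviour for a single group (not Spark code; aggregate is a hypothetical helper name, and None stands in for SQL NULL) might look like this:

```python
# Plain-Python model of null handling in the basic aggregates for one group.
def aggregate(values):
    non_null = [v for v in values if v is not None]
    if not non_null:
        # With no non-null input, the counts are 0 and the rest are null.
        return {"count": 0, "countDistinct": 0,
                "sum": None, "avg": None, "min": None, "max": None}
    return {
        "count": len(non_null),
        "countDistinct": len(set(non_null)),
        "sum": sum(non_null),
        "avg": sum(non_null) / len(non_null),  # divides by the non-null count
        "min": min(non_null),
        "max": max(non_null),
    }


print(aggregate([10, None, 20, 20]))
```

Note that avg divides by the number of non-null values (3 here), not by the number of rows (4).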

---- String Functions ----
concat(col1, col2, ...) => Concatenates multiple columns into one; the result is null if any input is null.
concat_ws(sep, col1, ...) => Concatenates columns with a separator, skipping null inputs.
upper(col) => Converts string to upper case.
lower(col) => Converts string to lower case.
trim(col) => Removes both leading and trailing spaces.
ltrim(col) => Removes leading spaces.
rtrim(col) => Removes trailing spaces.
length(col) => Returns the length of a string.
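substring(col, pos, len) => Returns the substring that starts at position pos (1-based) and is len characters long; also available as the Column method col.substr(pos, len).
instr(col, substring) => Returns the 1-based position of the first occurrence of substring, or 0 if it is not found.
regexp_replace(col, pattern, replacement) => Replaces all substrings that match the regex pattern with replacement.
regexp_extract(col, pattern, idx) => Extracts the capture group idx matched by the regex pattern (idx 0 is the whole match); returns an empty string if there is no match.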
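
Two details of the string functions tend to surprise: positions are 1-based, and concat and concat_ws treat nulls differently. The following plain-Python sketch models those rules (it is not Spark code; None stands in for SQL NULL, and the substring model only covers pos >= 1):

```python
# Plain-Python model of null handling and 1-based positions in string functions.
def concat(*parts):
    # Any null input makes the whole result null.
    return None if any(p is None for p in parts) else "".join(parts)


def concat_ws(sep, *parts):
    # Null inputs are skipped instead of nulling the result.
    return sep.join(p for p in parts if p is not None)


def substring(s, pos, length):
    # pos is 1-based; this sketch assumes pos >= 1.
    return s[pos - 1 : pos - 1 + length]


def instr(s, sub):
    # 1-based position of the first match, 0 when sub is absent.
    return s.find(sub) + 1


print(concat("a", None, "c"))          # None
print(concat_ws("-", "a", None, "c"))  # a-c
print(substring("Spark SQL", 1, 5))    # Spark
print(instr("Spark SQL", "SQL"))       # 7
```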

---- Date/Time Functions ----
current_date() => Returns the current date.
current_timestamp() => Returns the current timestamp.
date_format(date, fmt) => Formats date to given pattern.
year(date) => Returns the year.
month(date) => Returns the month.
dayofmonth(date) => Returns the day of month.
dayofweek(date) => Returns the day of week as a number (1 = Sunday, 2 = Monday, ..., 7 = Saturday).
dayofyear(date) => Returns the day of year.
weekofyear(date) => Returns the ISO 8601 week number of the year (weeks start on Monday).
hour(ts) => Returns hour from timestamp.
minute(ts) => Returns minute from timestamp.
second(ts) => Returns second from timestamp.
add_months(date, n) => Adds n months to date.
date_add(date, days) => Adds days to date.
date_sub(date, days) => Subtracts days from date.
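last_day(date) => Returns the last day of the month that date falls in.
next_day(date, dayOfWeek) => Returns the first date after date that falls on the given day of week (e.g. "Mon").
unix_timestamp(col) => Converts a time string (default format yyyy-MM-dd HH:mm:ss) to a UNIX timestamp in seconds.
from_unixtime(ts) => Converts a UNIX timestamp in seconds to a formatted timestamp string.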
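
The numbering conventions of the date functions are easy to get wrong, so here is a plain-Python sketch of three of them using the standard library (not Spark code). It assumes dayofweek counts from Sunday = 1 and weekofyear follows ISO 8601 week numbering.

```python
# Plain-Python model of dayofweek, weekofyear and last_day.
import calendar
from datetime import date


def dayofweek(d):
    # isoweekday() is Monday=1..Sunday=7; shift so Sunday=1..Saturday=7.
    return d.isoweekday() % 7 + 1


def weekofyear(d):
    # ISO 8601 week number (weeks start on Monday).
    return d.isocalendar()[1]


def last_day(d):
    # Last calendar day of the month that d falls in.
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


d = date(2024, 2, 15)   # a Thursday
print(dayofweek(d))     # 5
print(weekofyear(d))    # 7
print(last_day(d))      # 2024-02-29
```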

---- Array Functions ----
array(col1, col2, ...) => Creates an array column from multiple columns.
size(col) => Returns the length of an array or map.
explode(col) => Returns a new row for each element of an array (or each key/value pair of a map); rows whose array is null or empty are dropped.
split(col, regex) => Splits string into array by regex.
array_contains(col, value) => Checks if array contains a value.
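
explode changes the number of rows, which is worth seeing on a small input. This plain-Python sketch (not Spark code; the explode helper here is a stand-in operating on (key, array) tuples) models the rule that rows with a null or empty array produce no output rows. Spark also provides explode_outer for cases where such rows should be kept.

```python
# Plain-Python model of explode over rows of (key, array_or_None).
def explode(rows):
    # One output row per array element; None and [] contribute nothing.
    return [(key, item) for key, arr in rows for item in (arr or [])]


rows = [("a", [1, 2]), ("b", []), ("c", None)]
print(explode(rows))   # [('a', 1), ('a', 2)]
```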

---- Map & Struct Functions ----
create_map(key, value, ...) => Creates a map from key/value pairs.
get_json_object(col, path) => Extracts JSON value by JSON path.
from_json(col, schema) => Parses JSON string to struct/array.
to_json(col) => Converts struct/array/map to JSON string.

---- Null Handling ----
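isnull(col) => Checks if column is NULL.
isnan(col) => Checks if column is NaN (not a number); NaN is a floating-point value and is distinct from NULL.
df.na.fill(value, subset) => DataFrame method (not a column function): replaces nulls in df with value, optionally only in the subset columns.
df.na.drop(subset=cols) => DataFrame method (not a column function): drops rows of df that have a null in any of the subset columns.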

---- Math Functions ----
pow(col, n) => Raises value to the power of n.
sqrt(col) => Returns square root.
log(col) => Returns natural log (base e).
log10(col) => Returns log base 10.
exp(col) => Returns e raised to the power of the value.

---- Conditional Functions ----
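when(condition, value) => Returns value for rows where condition is true; further .when() calls can be chained, and rows matching no condition get null unless otherwise() is given.
when(...).otherwise(value) => Column method chained after when(); supplies the value for rows where no when() condition is true.
greatest(col1, col2, ...) => Returns the largest value among the columns for each row, skipping nulls.
least(col1, col2, ...) => Returns the smallest value among the columns for each row, skipping nulls.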
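
A chained when/otherwise behaves like an if/elif/else evaluated per row, and greatest/least ignore nulls. The plain-Python sketch below models both (not Spark code; when_chain and grade are hypothetical helper names, and None stands in for SQL NULL):

```python
# Plain-Python model of a when/otherwise chain and of greatest/least.
def when_chain(branches, otherwise=None):
    """branches: list of (condition, value) pairs, checked in order."""
    for condition, value in branches:
        if condition:      # a null (None) condition counts as not matched
            return value
    return otherwise       # None when no otherwise value is supplied


def greatest(*values):
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def least(*values):
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


def grade(score):
    # The first matching branch wins, so 95 maps to "A" and not "B".
    return when_chain([(score >= 90, "A"), (score >= 75, "B")], otherwise="C")


print(grade(95), grade(80), grade(10))   # A B C
print(greatest(3, None, 7))              # 7
print(least(3, None, 7))                 # 3
```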

---- Window Aggregates ----
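collect_list(col) => Returns the values in the group/window as a list, duplicates included (nulls are skipped).
collect_set(col) => Returns the distinct values in the group/window, in no guaranteed order (nulls are skipped).
first(col, ignorenulls) => Returns the first value in the group/window; with ignorenulls=True, the first non-null value. The result depends on row order, so order the window explicitly.
last(col, ignorenulls) => Returns the last value in the group/window; with ignorenulls=True, the last non-null value. The result depends on row order, so order the window explicitly.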
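
The effect of ignorenulls, and the difference between collect_list and collect_set, shows up clearly on a short column with nulls and duplicates. This is a plain-Python sketch of one already-ordered group, not Spark code; None stands in for SQL NULL.

```python
# Plain-Python model of collect_list, collect_set, first and last for one group.
def collect_list(values):
    # Keeps duplicates and order, drops nulls.
    return [v for v in values if v is not None]


def collect_set(values):
    # Distinct non-null values; a set has no guaranteed order.
    return set(collect_list(values))


def first(values, ignorenulls=False):
    for v in values:
        if v is not None or not ignorenulls:
            return v
    return None


def last(values, ignorenulls=False):
    return first(list(reversed(values)), ignorenulls)


col = [None, "a", "b", "a", "c", None]
print(collect_list(col))              # ['a', 'b', 'a', 'c']
print(first(col))                     # None
print(first(col, ignorenulls=True))   # a
print(last(col, ignorenulls=True))    # c
```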