### PySpark Actions on DataFrames
In PySpark, actions on a DataFrame are operations that trigger execution of the underlying lazy transformations and return a result to the driver or write to storage.

| Action                  | Description                                                                      | Returns to Driver?  |
| ----------------------- | -------------------------------------------------------------------------------- | ------------------- |
| `show()` / `show(n)`    | Prints the first 20 rows by default (or the first `n`) in tabular format         | ✅ (for display)     |
| `collect()`             | Returns **all rows** as a list to the driver (use cautiously on large datasets!) | ✅                   |
| `take(n)`               | Returns the first `n` rows as a list                                             | ✅                   |
| `head(n)` / `first()`   | Alias for `take(n)` / `take(1)[0]`                                               | ✅                   |
| `count()`               | Returns the number of rows in the DataFrame                                      | ✅                   |
| `describe()`            | Returns count, mean, stddev, min, and max for numeric and string columns         | ✅ (as DataFrame)    |
| `toPandas()`            | Converts entire DataFrame to a Pandas DataFrame on the driver                    | ✅ (use cautiously!) |
| `write`                 | Writer property; `write.parquet(path)`, `write.csv(path)`, etc. save to storage  | ❌                   |
| `foreach()`             | Executes a function on each row (like a for-loop) – side effects only            | ❌                   |
| `foreachPartition()`    | Similar to `foreach()` but operates per partition – better for efficiency        | ❌                   |
| `cache()` / `persist()` | Not actions; they mark data for caching, which the next action materializes      | ❌ (until used)      |



In [0]:
# Sample rows: (name, age, role, location, salary)
row1 = ("Ram", 30, "Senior Engineer", "India", 100000)
row2 = ("Krishna", 25, "Junior Engineer", "India", 50000)
row3 = ("Sree", 35, "Senior Engineer", "India", 40000)
data = [row1, row2, row3]

# Define columns
columns = ["name", "age", "role", "location", "salary"]
df = spark.createDataFrame(data, columns)  # Create the DataFrame

# Add a new row by building a one-row DataFrame and appending it
new_row = [("Siva", 28, "Senior Engineer", "India", 60000)]
new_row_df = spark.createDataFrame(new_row, columns)
df = df.union(new_row_df)

display(df)

In [0]:
# Set location to Hyderabad for Sree and to Bangalore for everyone else
from pyspark.sql.functions import col, when

df = df.withColumn(
    "location",
    when(col("name") == "Sree", "Hyderabad").otherwise("Bangalore"),
)

display(df)

In [0]:
df.display()

In [0]:
df.printSchema()

In [0]:
df.select("age", "role").display()


In [0]:
# Name for rows with age > 25, NULL for all other rows
df.select(when(df["age"] > 25, df["name"])).display()

In [0]:
# Same selection, with the NULL rows dropped
df.select(when(df["age"] > 25, df["name"])).na.drop().display()

In [0]:
df.orderBy(df["salary"].desc()).display()