### 1. **Creating a DataFrame**
To create a DataFrame in Spark, you typically use the `SparkSession` object. Below is an example using a list of tuples and a schema.

**Q: How do you create a DataFrame from a list of tuples in Spark?**

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Initialize SparkSession
spark = SparkSession.builder.appName("DataFrameExample").getOrCreate()

# Sample data
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]

# Define schema
schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True)
])

# Create DataFrame
df = spark.createDataFrame(data, schema)
```

### 2. **Filtering Data**

**Q: How do you filter rows in a DataFrame where the age is greater than 30?**

```python
# Filter rows where Age > 30
filtered_df = df.filter(df.Age > 30)
```

### 3. **Selecting Specific Columns**

**Q: How do you select specific columns, such as "Name" only, from a DataFrame?**

```python
# Select the 'Name' column
name_df = df.select("Name")
```

### 4. **Adding a New Column**

**Q: How do you add a new column to a DataFrame that calculates the age in months?**

```python
from pyspark.sql.functions import col

# Add a new column 'AgeInMonths'
df_with_new_column = df.withColumn("AgeInMonths", col("Age") * 12)
```

### 5. **Renaming a Column**

**Q: How do you rename the column "Age" to "Years"?**

```python
# Rename the 'Age' column to 'Years'
df_renamed = df.withColumnRenamed("Age", "Years")
```

### 6. **Grouping and Aggregation**

**Q: How do you group by a column and perform an aggregation, such as counting the number of occurrences for each "Age"?**

```python
# Group by 'Age' and count the occurrences
grouped_df = df.groupBy("Age").count()
```

### 7. **Joining DataFrames**

**Q: How do you perform an inner join on two DataFrames based on a common column, such as "Name"?**

```python
# Sample data for another DataFrame
data2 = [("Alice", "F"), ("Bob", "M"), ("David", "M")]

# Create another DataFrame
df2 = spark.createDataFrame(data2, ["Name", "Gender"])

# Perform an inner join on 'Name' column
joined_df = df.join(df2, on="Name", how="inner")
```

### 8. **Sorting Data**

**Q: How do you sort a DataFrame by the "Age" column in descending order?**

```python
# Sort by 'Age' in descending order
sorted_df = df.orderBy(df.Age.desc())
```

### 9. **Dropping Duplicates**

**Q: How do you remove duplicate rows based on a specific column, such as "Name"?**

```python
# Drop duplicates based on 'Name'
distinct_df = df.dropDuplicates(["Name"])
```

### 10. **Handling Missing Data**

**Q: How do you fill missing values in a DataFrame's column with a default value, like filling missing "Age" values with 0?**

```python
# Fill missing values in 'Age' column with 0
filled_df = df.fillna({"Age": 0})
```

### 11. **Union of DataFrames**

**Q: How do you perform a union of two DataFrames with the same schema?**

```python
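# union() matches columns by position, so build a second DataFrame with df's schema (hypothetical rows)
more_people_df = spark.createDataFrame([("David", 41), ("Eve", 38)], schema)
union_df = df.union(more_people_df)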
```

### 12. **Collecting Data**

**Q: How do you collect the DataFrame's content as a list of rows?**

```python
# Collect every row to the driver as a list of Row objects (only safe for small DataFrames)
collected_data = df.collect()
```

### 13. **Writing Data to a File**

**Q: How do you write a DataFrame to a CSV file?**

```python
# Write the DataFrame as CSV; Spark creates a directory named "output.csv" containing part files
df.write.csv("output.csv", header=True)
```

### 14. **Reading Data from a File**

**Q: How do you read a CSV file into a DataFrame?**

```python
# Read a CSV file into a DataFrame
df_from_csv = spark.read.csv("input.csv", header=True, inferSchema=True)
```

These examples cover common Spark DataFrame operations, from creation and filtering through joins, aggregation, and file I/O. Adapt them to your own data processing needs.