### Task 1: Handling Schema Mismatches using Spark
**Description**: Use Apache Spark to address schema mismatches by transforming data to match
the expected schema.

**Steps**:
1. Create Spark session
2. Load dataframe
3. Define the expected schema
4. Handle schema mismatches
5. Show corrected data

In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Step 1: Create Spark session
spark = SparkSession.builder.appName("SchemaMismatchHandling").getOrCreate()

# Step 2: Load dataframe (example data)
data = [("Alice", 25), ("Bob", None), ("Charlie", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Step 3: Define the expected schema
expected_schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True)
])

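# Step 4: Handle schema mismatches by casting each column to its expected type
# (Age is inferred as a long from the Python ints, so it is cast to an integer here)
df_corrected = df.select(
    [df[field.name].cast(field.dataType).alias(field.name) for field in expected_schema.fields]
)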

# Step 5: Show corrected data
df_corrected.show()

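ModuleNotFoundError: No module named 'pyspark'

**Note**: This error means PySpark is not installed in the notebook environment. Install it (for example with `pip install pyspark`) and re-run the cell.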

### Task 2: Detect and Correct Incomplete Data in ETL
**Description**: Use Python and Pandas to detect incomplete data in an ETL process and fill
missing values with estimates.

**Steps**:
1. Detect incomplete data
2. Fill missing values
3. Report changes

In [2]:
import pandas as pd

# Example data for ETL process
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, None, 35], 'Gender': ['F', 'M', None]}
df_etl = pd.DataFrame(data)

# Step 1: Detect incomplete data
incomplete_data = df_etl[df_etl.isnull().any(axis=1)]
print("Incomplete Data:")
print(incomplete_data)

# Step 2: Fill missing values
df_filled = df_etl.fillna({'Age': df_etl['Age'].mean(), 'Gender': 'Unknown'})

# Step 3: Report changes
print("\nData after filling missing values:")
print(df_filled)

Incomplete Data:
      Name   Age Gender
1      Bob   NaN      M
2  Charlie  35.0   None

Data after filling missing values:
      Name   Age   Gender
0    Alice  25.0        F
1      Bob  30.0        M
2  Charlie  35.0  Unknown
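
The cell above prints the filled frame, but it does not say which cells were changed. A minimal sketch of a more explicit change report follows, assuming the same example data and fill rules. The helper name `summarize_fills` is hypothetical and is not part of pandas or of the original exercise.

```python
import pandas as pd


def summarize_fills(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Count, per column, how many missing cells in `before` were filled in `after`.

    Hypothetical helper for illustration only.
    """
    missing_before = before.isnull().sum()
    missing_after = after.isnull().sum()
    return pd.DataFrame({
        "missing_before": missing_before,
        "missing_after": missing_after,
        "filled": missing_before - missing_after,
    })


# Same example data and fill rules as the cell above
df_etl = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, None, 35],
    "Gender": ["F", "M", None],
})
df_filled = df_etl.fillna({"Age": df_etl["Age"].mean(), "Gender": "Unknown"})

# Per-column summary of what was filled
fill_report = summarize_fills(df_etl, df_filled)
print(fill_report)

# Cell-level detail: positions that were missing before and are populated now
filled_mask = df_etl.isnull() & df_filled.notnull()
for row, col in zip(*filled_mask.to_numpy().nonzero()):
    print(
        f"row {df_etl.index[row]}, column {df_etl.columns[col]}: "
        f"filled with {df_filled.iat[row, col]}"
    )
```

Comparing the null masks of the two frames, rather than their values, keeps the report independent of the fill strategy, so the same helper should work if the mean is later swapped for a median or another estimate.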
