**(A)**. To read from and write to Cassandra using PySpark, you need to set up the necessary configurations and dependencies. Below are the steps to do so, along with example code snippets.

### Prerequisites

1. **Cassandra Cluster**: Ensure you have a running Cassandra cluster.
2. **PySpark**: Make sure PySpark is installed.
3. **Cassandra Connector**: Use the `spark-cassandra-connector` to connect PySpark with Cassandra.

### Step-by-Step Guide

#### 1. Set up Dependencies

First, ensure you have the `spark-cassandra-connector` package. If you are using a notebook or a local PySpark setup, you can specify the package when initializing the Spark session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("CassandraSparkIntegration") \
    .config("spark.cassandra.connection.host", "your_cassandra_host") \
    .config("spark.jars.packages", "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1") \
    .getOrCreate()
```

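Replace `"your_cassandra_host"` with the IP address or hostname of your Cassandra node. The `spark.jars.packages` setting downloads the connector from Maven when the session starts, so it only takes effect if no SparkSession exists yet. If one already exists, `getOrCreate()` returns it without loading the new package.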

#### 2. Reading from Cassandra

To read data from a Cassandra table, you need to specify the keyspace and table name.

```python
# Read data from Cassandra
df = spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="your_table_name", keyspace="your_keyspace_name") \
    .load()

# Show the DataFrame
df.show()
```

Replace `"your_table_name"` and `"your_keyspace_name"` with your Cassandra table and keyspace names.

#### 3. Writing to Cassandra

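To write data to a Cassandra table, create a DataFrame and use its `.write` method. The target table must already exist in the keyspace, and the DataFrame's column names must match the table's column names.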

```python
# Example DataFrame to write
data = [("John", "Doe", 30), ("Jane", "Smith", 25)]
columns = ["first_name", "last_name", "age"]
df_to_write = spark.createDataFrame(data, columns)

# Write DataFrame to Cassandra
df_to_write.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="your_table_name", keyspace="your_keyspace_name") \
    .mode("append") \
    .save()
```

Again, replace `"your_table_name"` and `"your_keyspace_name"` with your Cassandra table and keyspace names.
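Note that `mode("append")` against Cassandra does not behave like appending to a file. Cassandra writes are upserts: writing a row whose primary key already exists overwrites that row's values instead of adding a second row. The pure-Python toy model below illustrates this. It is not connector code, and the primary key `(first_name, last_name)` is a hypothetical choice for the example table.

```python
# Toy model of Cassandra's upsert behaviour (illustration only, not connector code).
# Assumes a hypothetical table whose primary key is (first_name, last_name).

def upsert_rows(table, rows, key_columns=("first_name", "last_name")):
    """Write rows into a dict keyed by primary key, the way a CQL INSERT behaves."""
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        # Existing key: written columns overwrite stored values, other columns are kept
        table.setdefault(key, {}).update(row)
    return table


table = {}
upsert_rows(table, [
    {"first_name": "John", "last_name": "Doe", "age": 30},
    {"first_name": "Jane", "last_name": "Smith", "age": 25},
])

# Appending John Doe again updates his row instead of adding a duplicate
upsert_rows(table, [{"first_name": "John", "last_name": "Doe", "age": 31}])

print(len(table))                     # 2
print(table[("John", "Doe")]["age"])  # 31
```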

### Complete Example

Here is a complete example that combines reading from and writing to Cassandra.

```python
from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder \
    .appName("CassandraSparkIntegration") \
    .config("spark.cassandra.connection.host", "your_cassandra_host") \
    .config("spark.jars.packages", "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1") \
    .getOrCreate()

# Read data from Cassandra
df = spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="your_table_name", keyspace="your_keyspace_name") \
    .load()

# Show the DataFrame
df.show()

# Example DataFrame to write
data = [("John", "Doe", 30), ("Jane", "Smith", 25)]
columns = ["first_name", "last_name", "age"]
df_to_write = spark.createDataFrame(data, columns)

# Write DataFrame to Cassandra
df_to_write.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="your_table_name", keyspace="your_keyspace_name") \
    .mode("append") \
    .save()

# Stop the Spark session
spark.stop()
```

### Notes

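1. **Dependencies**: Make sure the `spark-cassandra-connector` version is compatible with your Spark and Scala versions. The `_2.12` suffix in the artifact name is the Scala version, and the connector's release line should match your Spark version (for example, connector 3.5.x for Spark 3.5).
2. **Cassandra Configurations**: If your cluster requires authentication, set `spark.cassandra.auth.username` and `spark.cassandra.auth.password` on the session builder, the same way as `spark.cassandra.connection.host`.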
3. **Error Handling**: Add appropriate error handling for production code.
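One way to keep the connection and authentication settings in one place is to build them as a dictionary and apply each entry to the session builder. This is a minimal sketch: `cassandra_conf` is a hypothetical helper and the credentials are placeholders, while the keys are the connector's `spark.cassandra.*` settings (9042 is the default CQL port). In production, read the password from an environment variable or a secret store rather than hard-coding it.

```python
def cassandra_conf(host, port=9042, username=None, password=None):
    """Hypothetical helper: Spark config entries for the spark-cassandra-connector."""
    conf = {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": str(port),
    }
    # Only add credentials when the cluster uses authentication
    if username is not None:
        conf["spark.cassandra.auth.username"] = username
        conf["spark.cassandra.auth.password"] = password
    return conf


conf = cassandra_conf("your_cassandra_host", username="your_user", password="your_password")
for key in sorted(conf):
    print(key)

# With PySpark available, apply the entries before the session is created:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("CassandraSparkIntegration")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```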

This should help you get started with reading from and writing to Cassandra using PySpark. 

**(B)**. To read stream data from a source and write it to Cassandra using PySpark, you need to integrate Spark Structured Streaming with the Cassandra connector. Here’s a step-by-step guide to help you achieve this.

### Prerequisites

1. **Cassandra Cluster**: Ensure you have a running Cassandra cluster.
2. **PySpark**: Make sure PySpark is installed.
3. **Cassandra Connector**: Use the `spark-cassandra-connector` to connect PySpark with Cassandra.
4. **Streaming Source**: Have a streaming source like Kafka, socket, or a file source set up.

### Step-by-Step Guide

#### 1. Set up Dependencies

Make sure you have both the `spark-cassandra-connector` package and Spark's Kafka integration package (`spark-sql-kafka-0-10`), which is not bundled with PySpark. If you are using a notebook or a local PySpark setup, specify both packages when initializing the Spark session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("CassandraSparkStreamingIntegration") \
    .config("spark.cassandra.connection.host", "localhost") \
    .config("spark.jars.packages",
            # The Kafka package version must match your Spark version
            "com.datastax.spark:spark-cassandra-connector_2.12:3.2.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0") \
    .getOrCreate()
```

Replace `"localhost"` with the IP address or hostname of your Cassandra node if it's not running locally.
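The Kafka source ships as a separate package (`org.apache.spark:spark-sql-kafka-0-10`) that PySpark does not bundle, and it is released together with Spark, so its version must equal the Spark version you run. It can help to build the `spark.jars.packages` value from your actual versions. A minimal sketch: `jars_packages` is a hypothetical helper, and the version numbers are examples to replace with your own.

```python
def jars_packages(spark_version, connector_version, scala_version="2.12"):
    """Hypothetical helper: comma-separated Maven coordinates for spark.jars.packages."""
    coordinates = [
        # Cassandra connector: choose the release line that matches your Spark version
        f"com.datastax.spark:spark-cassandra-connector_{scala_version}:{connector_version}",
        # Kafka source: released with Spark itself, so it uses the exact Spark version
        f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}",
    ]
    return ",".join(coordinates)


packages = jars_packages(spark_version="3.2.0", connector_version="3.2.0")
print(packages)

# Usage with PySpark (not executed here); pyspark.__version__ gives your Spark version:
# SparkSession.builder.config("spark.jars.packages", packages)
```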

#### 2. Read Streaming Data

For this example, let’s assume you are reading streaming data from Kafka.

```python
# Define Kafka source
kafka_df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "your_kafka_topic") \
    .load()

# Convert the value column from Kafka (which is in binary format) to string
from pyspark.sql.functions import col
streaming_df = kafka_df.selectExpr("CAST(value AS STRING)")

# Assuming the value column holds a JSON string, parse it into separate columns
from pyspark.sql.functions import from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema of JSON data
schema = StructType([
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("age", IntegerType(), True)
])

parsed_df = streaming_df.withColumn("data", from_json(col("value"), schema)).select("data.*")
```

Replace `"your_kafka_topic"` with the Kafka topic you are reading from and adjust the schema according to your data.
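For the parsing step to fill in the columns, each Kafka message value should be a JSON object whose keys match the schema's field names; keys that are missing come back as null. The plain-Python sketch below (no Kafka client involved) shows what such a message looks like as the raw bytes Kafka stores in `value`, and how `CAST(value AS STRING)` followed by JSON parsing recovers the fields. The record contents are illustrative.

```python
import json

# Example record matching the schema (first_name, last_name, age); values are illustrative
record = {"first_name": "John", "last_name": "Doe", "age": 30}

# A producer would send the JSON text encoded as UTF-8; Kafka stores it as the binary `value`
value_bytes = json.dumps(record).encode("utf-8")

# CAST(value AS STRING) corresponds to decoding the bytes as UTF-8 ...
value_string = value_bytes.decode("utf-8")
# ... and from_json(col("value"), schema) maps JSON keys to schema fields by name
parsed = json.loads(value_string)

print(value_string)
print(parsed["first_name"], parsed["age"])  # John 30
```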

#### 3. Write Streaming Data to Cassandra

To write the streaming data to Cassandra, use the `.writeStream` method along with the Cassandra connector.

```python
# Write stream to Cassandra
query = parsed_df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="your_table_name", keyspace="your_keyspace_name") \
    .option("checkpointLocation", "/path/to/checkpoint/dir") \
    .outputMode("append") \
    .start()

# Await termination of the streaming query
query.awaitTermination()
```

Replace `"your_table_name"` and `"your_keyspace_name"` with your Cassandra table and keyspace names. The checkpoint location is a directory on your file system or HDFS to store the checkpoint data.

### Complete Example

Here’s a complete example combining all the steps:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Initialize Spark session with Cassandra connector
spark = SparkSession.builder \
    .appName("CassandraSparkStreamingIntegration") \
    .config("spark.cassandra.connection.host", "localhost") \
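    .config("spark.jars.packages",
            # The Kafka package version must match your Spark version
            "com.datastax.spark:spark-cassandra-connector_2.12:3.2.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0") \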
    .getOrCreate()

# Define Kafka source
kafka_df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "your_kafka_topic") \
    .load()

# Convert the value column from Kafka (which is in binary format) to string
streaming_df = kafka_df.selectExpr("CAST(value AS STRING)")

# Define schema of JSON data
schema = StructType([
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Parse JSON data
parsed_df = streaming_df.withColumn("data", from_json(col("value"), schema)).select("data.*")

# Write stream to Cassandra
query = parsed_df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="your_table_name", keyspace="your_keyspace_name") \
    .option("checkpointLocation", "/path/to/checkpoint/dir") \
    .outputMode("append") \
    .start()

# Await termination of the streaming query
query.awaitTermination()
```

### Notes

1. **Dependencies**: Make sure the `spark-cassandra-connector` version is compatible with your Spark and Scala versions.
2. **Checkpointing**: Always provide a checkpoint location when working with streaming data to ensure fault tolerance and recovery.
3. **Schema**: Adjust the schema to match your actual data structure.
4. **Kafka Configuration**: Replace Kafka server and topic names with your actual Kafka configuration.

This should help you set up streaming data integration between Kafka and Cassandra using PySpark. If you have further questions or need specific adjustments, feel free to ask!

**(C)**. Using `foreach()` or `foreachBatch()` in PySpark gives you more control over how each row or each micro-batch of your streaming query is handled. These methods are particularly useful when you need to write to a sink that Structured Streaming does not support natively, or to run custom logic as part of the write.

### Using `foreach()`

The `foreach()` method applies your code to each row of the streaming DataFrame. It is useful for custom row-level operations, but it is usually less efficient than `foreachBatch()`: it works row by row, and your code must manage its own database connection.

#### Example with `foreach()`

```python
from pyspark.sql import SparkSession

# Initialize Spark session with Cassandra connector
spark = SparkSession.builder \
    .appName("CassandraSparkStreamingIntegration") \
    .config("spark.cassandra.connection.host", "localhost") \
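    .config("spark.jars.packages",
            # The Kafka package version must match your Spark version
            "com.datastax.spark:spark-cassandra-connector_2.12:3.2.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0") \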
    .getOrCreate()

# Define Kafka source
kafka_df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "your_kafka_topic") \
    .load()

# Imports for parsing the JSON payload
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema of JSON data
schema = StructType([
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Convert Kafka's binary value column to a string, then parse the JSON
streaming_df = kafka_df.selectExpr("CAST(value AS STRING)")
parsed_df = streaming_df.withColumn("data", from_json(col("value"), schema)).select("data.*")

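# Writer object for foreach(): Spark calls open() once per partition and epoch,
# process() for each row, and close() when the partition is finished
class CassandraRowWriter:
    def open(self, partition_id, epoch_id):
        from cassandra.cluster import Cluster

        # Open one connection per partition instead of one per row
        self.cluster = Cluster(['localhost'])
        self.session = self.cluster.connect('your_keyspace_name')

        # Prepare the CQL statement once; '?' marks the bind parameters
        self.insert = self.session.prepare(
            "INSERT INTO your_table_name (first_name, last_name, age) VALUES (?, ?, ?)"
        )
        return True  # process the rows of this partition

    def process(self, row):
        # Execute the prepared statement with the row data
        self.session.execute(self.insert, (row.first_name, row.last_name, row.age))

    def close(self, error):
        # Release the connection, whether or not processing failed
        self.cluster.shutdown()

# Use foreach to apply the writer to each row; set a checkpoint location as for any sink
query = parsed_df.writeStream \
    .foreach(CassandraRowWriter()) \
    .option("checkpointLocation", "/path/to/checkpoint/dir") \
    .outputMode("append") \
    .start()

query.awaitTermination()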
```

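Replace `"your_kafka_topic"`, `"your_keyspace_name"`, and `"your_table_name"` with your actual Kafka topic, keyspace, and table names. This approach uses the DataStax Python driver (`pip install cassandra-driver`), which must be installed on every executor, not just on the driver.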
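Besides a plain function, `foreach()` accepts an object with `open(partition_id, epoch_id)`, `process(row)`, and `close(error)` methods. Spark calls `open()` once per partition and epoch, `process()` for each row, and `close()` at the end; if `open()` returns `False`, the rows of that partition are skipped. This is what lets a writer hold one connection per partition rather than one per row. The sketch below imitates that calling sequence in plain Python so it runs without Spark or Cassandra: `drive_partition` is a hypothetical, simplified stand-in for Spark's executor loop, and `ListWriter` stands in for a Cassandra writer.

```python
class ListWriter:
    """Stand-in for a Cassandra writer: collects rows instead of inserting them."""

    def __init__(self, sink):
        self.sink = sink

    def open(self, partition_id, epoch_id):
        # Per-partition setup; a real writer would connect to Cassandra here
        self.buffer = []
        return True  # returning False would skip process() for this partition

    def process(self, row):
        self.buffer.append(row)

    def close(self, error):
        # Per-partition teardown; this sketch keeps rows only from successful partitions
        if error is None:
            self.sink.extend(self.buffer)


def drive_partition(writer, partition_id, epoch_id, rows):
    """Hypothetical, simplified version of how Spark calls the writer per partition."""
    should_process = writer.open(partition_id, epoch_id)
    error = None
    try:
        if should_process:
            for row in rows:
                writer.process(row)
    except Exception as exc:
        error = exc
    writer.close(error)
    if error is not None:
        raise error


sink = []
drive_partition(ListWriter(sink), partition_id=0, epoch_id=0, rows=[("John", "Doe", 30)])
drive_partition(ListWriter(sink), partition_id=1, epoch_id=0, rows=[("Jane", "Smith", 25)])
print(sink)  # [('John', 'Doe', 30), ('Jane', 'Smith', 25)]
```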

### Using `foreachBatch()`

The `foreachBatch()` method calls a function once per micro-batch and passes it the batch as a regular DataFrame. This is usually more efficient than `foreach()` because the data is written in bulk, and it lets you reuse any batch writer, including the Cassandra connector's DataFrame writer. This makes it well suited for writing to databases like Cassandra.

#### Example with `foreachBatch()`

```python
from pyspark.sql import SparkSession

# Initialize Spark session with Cassandra connector
spark = SparkSession.builder \
    .appName("CassandraSparkStreamingIntegration") \
    .config("spark.cassandra.connection.host", "localhost") \
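    .config("spark.jars.packages",
            # The Kafka package version must match your Spark version
            "com.datastax.spark:spark-cassandra-connector_2.12:3.2.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0") \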
    .getOrCreate()

# Define Kafka source
kafka_df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "your_kafka_topic") \
    .load()

# Imports for parsing the JSON payload
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema of JSON data
schema = StructType([
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Convert Kafka's binary value column to a string, then parse the JSON
streaming_df = kafka_df.selectExpr("CAST(value AS STRING)")
parsed_df = streaming_df.withColumn("data", from_json(col("value"), schema)).select("data.*")

# Define a function to write each batch to Cassandra
def write_batch_to_cassandra(batch_df, batch_id):
    batch_df.write \
        .format("org.apache.spark.sql.cassandra") \
        .options(table="your_table_name", keyspace="your_keyspace_name") \
        .mode("append") \
        .save()

# Use foreachBatch to apply the function to each batch
query = parsed_df.writeStream \
    .foreachBatch(write_batch_to_cassandra) \
    .option("checkpointLocation", "/path/to/checkpoint/dir") \
    .outputMode("append") \
    .start()

query.awaitTermination()
```

Replace `"your_kafka_topic"`, `"your_keyspace_name"`, and `"your_table_name"` with your actual Kafka topic, keyspace, and table names. Ensure you specify a checkpoint location to maintain the streaming state.
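By default `foreachBatch()` gives at-least-once guarantees: after a failure, Spark can re-run a micro-batch with the same `batch_id`. Plain Cassandra inserts are upserts, so a replay usually just rewrites the same rows, but non-idempotent writes (for example counter updates) would be applied twice. A common pattern is to skip batch IDs that were already committed. The sketch below shows the idea in plain Python; the in-memory set and the `written` list are assumptions standing in for durable storage and for `batch_df.write...save()`.

```python
# Assumption: committed batch IDs live in an in-memory set for illustration only.
# A real job must persist them (for example in a Cassandra table) so they survive restarts.
committed_batch_ids = set()
written = []  # stand-in for the rows that batch_df.write...save() would store


def write_batch_once(batch_rows, batch_id):
    if batch_id in committed_batch_ids:
        return  # this micro-batch was already written; skip the replay
    written.extend(batch_rows)          # stand-in for the actual Cassandra write
    committed_batch_ids.add(batch_id)   # record the ID only after the write succeeds


write_batch_once([("John", "Doe", 30)], 0)
write_batch_once([("Jane", "Smith", 25)], 1)
write_batch_once([("Jane", "Smith", 25)], 1)  # simulated replay of batch 1
print(len(written))  # 2
```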

### Summary

- **`foreach()`**: Use for custom row-level operations. Less efficient for large-scale operations because it processes data row-by-row.
- **`foreachBatch()`**: Use for custom batch-level operations. More efficient for large-scale operations because it processes data in batches.

Both methods give you the flexibility to handle streaming data in a way that suits your specific needs. If you need further assistance or have specific use cases, feel free to ask!

In Spark Structured Streaming, `foreachBatch` calls the function you provide and passes it two arguments automatically: the current micro-batch as a DataFrame (`batch_df`) and the batch ID (`batch_id`). That is why you pass the function itself, `foreachBatch(write_batch_to_cassandra)` without parentheses, instead of calling it with arguments in your streaming query.

Here’s a step-by-step explanation of how it works:

### Explanation

1. **Function Definition**:
   - `write_batch_to_cassandra(batch_df, batch_id)`: This function is defined to take two parameters:
     - `batch_df`: The DataFrame representing the current batch of streaming data.
     - `batch_id`: An identifier for the micro-batch. IDs increase with each batch, and a batch that is re-run after a failure keeps the same ID.

2. **Using `foreachBatch`**:
   - When you use `foreachBatch`, Spark Structured Streaming will call the provided function for each batch of data. It will automatically pass the `batch_df` and `batch_id` to the function.
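The mechanism is the ordinary Python callback pattern: you hand over a function object, and the caller invokes it later with arguments it supplies. The plain-Python analogy below is not Spark's implementation (`run_micro_batches` is hypothetical), but it shows why `write_batch_to_cassandra` is passed without parentheses.

```python
def run_micro_batches(batches, func):
    """Hypothetical stand-in for foreachBatch: calls func(batch, batch_id) per batch."""
    for batch_id, batch in enumerate(batches):
        func(batch, batch_id)


calls = []


def write_batch(batch_df, batch_id):
    # Record which batch arrived and how many rows it had
    calls.append((batch_id, len(batch_df)))


# Pass the function object itself; the caller supplies both arguments
run_micro_batches([["row1", "row2"], ["row3"]], write_batch)
print(calls)  # [(0, 2), (1, 1)]

# Calling the function yourself fails, because no batch exists yet at that point
try:
    run_micro_batches([["row1"]], write_batch())
except TypeError as exc:
    print("TypeError:", exc)
```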

### Complete Example

Here is a complete example illustrating how to use `foreachBatch` to write streaming data to Cassandra:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Initialize Spark session with Cassandra connector
spark = SparkSession.builder \
    .appName("CassandraSparkStreamingIntegration") \
    .config("spark.cassandra.connection.host", "localhost") \
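    .config("spark.jars.packages",
            # The Kafka package version must match your Spark version
            "com.datastax.spark:spark-cassandra-connector_2.12:3.2.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0") \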
    .getOrCreate()

# Define Kafka source
kafka_df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "your_kafka_topic") \
    .load()

# Define schema of JSON data
schema = StructType([
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Convert Kafka's binary value column to a string, then parse the JSON
streaming_df = kafka_df.selectExpr("CAST(value AS STRING)")
parsed_df = streaming_df.withColumn("data", from_json(col("value"), schema)).select("data.*")

# Define a function to write each batch to Cassandra
def write_batch_to_cassandra(batch_df, batch_id):
    batch_df.write \
        .format("org.apache.spark.sql.cassandra") \
        .options(table="your_table_name", keyspace="your_keyspace_name") \
        .mode("append") \
        .save()

# Use foreachBatch to apply the function to each batch
query = parsed_df.writeStream \
    .foreachBatch(write_batch_to_cassandra) \
    .option("checkpointLocation", "/path/to/checkpoint/dir") \
    .outputMode("append") \
    .start()

query.awaitTermination()
```

### Key Points:

- **`foreachBatch` Method**: This method will call the `write_batch_to_cassandra` function for each batch of data received from the streaming source.
- **Automatic Argument Passing**: Spark automatically passes `batch_df` (the DataFrame for the current batch) and `batch_id` (a unique ID for the batch) to the function.
- **Checkpointing**: The checkpoint location is specified to maintain the state of the streaming query. It’s necessary for fault tolerance.

By using `foreachBatch`, you gain the flexibility to perform batch-level operations efficiently, making it ideal for writing data to external systems like Cassandra in a structured and controlled manner. If you have further questions or specific requirements, feel free to ask!