---

# 1️⃣1️⃣ SparkContext (Very Important)

## What is SparkContext?

SparkContext is the entry point to Spark functionality.

It represents:

- Connection to cluster
- Configuration of application
- Resource coordination

In older versions (before Spark 2.0), you created it directly:

```python
from pyspark import SparkContext

sc = SparkContext(appName="MyApp")
```

In modern Spark (2.0 and later):

SparkSession creates the SparkContext internally, and you access it through the session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

sc = spark.sparkContext
```

---

## Responsibilities of SparkContext

- Connects to cluster manager
- Requests executors
- Creates RDDs
- Tracks application metadata
- Distributes tasks
- Manages broadcast variables
- Manages accumulators

---

## SparkContext Architecture View

```
Application Code
       ↓
SparkSession
       ↓
SparkContext
       ↓
Cluster Manager
       ↓
Executors
```

---

## Important SparkContext Concepts

### 1️⃣ Broadcast Variables

Used to ship read-only data (for example, a lookup table) to each executor once, instead of sending a copy with every task.

```python
broadcast_var = sc.broadcast(large_lookup_dict)
```

---

### 2️⃣ Accumulators

Used for counters and sums across executors. Tasks can only add to an accumulator; the driver reads the final value.

```python
counter = sc.accumulator(0)
```

---

### 3️⃣ Only One SparkContext Per JVM

Only one SparkContext can be active per JVM, so you cannot run multiple SparkContexts in the same application. Stop the existing one before creating another.

---

# 🚀 SparkSession vs SparkContext: Detailed Interview Guide

---

# 1️⃣ Quick Summary

| Feature | SparkContext | SparkSession |
|----------|--------------|--------------|
| Introduced In | Earliest Spark releases (the entry point throughout 1.x) | Spark 2.0 |
| Purpose | Core Spark connection to cluster | Unified entry point to Spark |
| API Type | RDD-based | DataFrame / SQL / Structured Streaming |
| Needed Today? | Yes (internally) | Yes (primary interface) |
| Replaces | N/A | SQLContext, HiveContext (it wraps SparkContext rather than replacing it) |

---

# 2️⃣ What is SparkContext?

SparkContext is the **original entry point** to Spark (before Spark 2.0).

It represents:

- Connection to cluster
- Resource coordination
- RDD creation
- Task scheduling

---

## 🔹 SparkContext Responsibilities

- Connects to Cluster Manager
- Requests executors
- Creates RDDs
- Distributes tasks
- Manages broadcast variables
- Manages accumulators

---

## 🔹 Example (Old Style)

```python
from pyspark import SparkContext

sc = SparkContext(appName="MyApp")

# textFile is lazy; count() is the action that actually reads the file
rdd = sc.textFile("data.txt")
print(rdd.count())
```

---

## 🔹 Important Facts

- Only **one active SparkContext per JVM**
- If the SparkContext stops → the application can no longer run jobs
- It is the core object behind every higher-level API

---

# 3️⃣ What is SparkSession?

SparkSession was introduced in Spark 2.0.

It is a **unified entry point** for:

- Spark SQL
- DataFrame API
- Structured Streaming
- Hive support

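It wraps or subsumes:

- SparkContext (accessible as `spark.sparkContext`)
- SQLContext functionality
- HiveContext functionality (enabled with `enableHiveSupport()` on the builder)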

---

## 🔹 Example (Modern Way)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

df = spark.read.csv("data.csv")
df.show()
```

---

# 4️⃣ Relationship Between SparkSession and SparkContext

Very important for interviews 👇

```
SparkSession
     |
     └── SparkContext
```

SparkSession creates the SparkContext internally, or reuses one that is already running.

You can access it like this:

```python
sc = spark.sparkContext
```

So:

> SparkSession is a wrapper around SparkContext.

---

# 5️⃣ Internal Architecture View

```
Application Code
       ↓
SparkSession
       ↓
SparkContext
       ↓
Cluster Manager
       ↓
Executors
```

---

# 6️⃣ Why Was SparkSession Introduced?

Before Spark 2.0, we had:

- SparkContext
- SQLContext
- HiveContext

Too many contexts.

SparkSession unified everything into one object.

So instead of:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()
sqlContext = SQLContext(sc)
```

Now we just use:

```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
```

---

# 7️⃣ When Do You Use SparkContext Directly?

Rare cases:

- RDD-based operations
- Broadcast variables
- Accumulators
- Low-level distributed logic

Example:

```python
broadcast_var = spark.sparkContext.broadcast([1,2,3])
```

---

# 8️⃣ Interview-Ready Explanation

If the interviewer asks:

### ❓ What is the difference between SparkSession and SparkContext?

Answer:

> SparkContext is the core connection to the cluster and is used mainly for RDD operations. SparkSession is the unified entry point introduced in Spark 2.0 that wraps SparkContext and provides APIs for DataFrames, SQL, and Structured Streaming.

---

# 9️⃣ Practical Rule

In modern Spark:

✅ Always create SparkSession  
❌ Do not manually create SparkContext  

SparkSession will handle it internally.

---

# 🔟 Common Interview Trap

Question:

Can we create multiple SparkContexts?

Answer:

❌ No. Only one active SparkContext per JVM.

But:

You can create multiple SparkSessions that share the same SparkContext, for example with `spark.newSession()`.

---

# 🎯 Final Comparison

| Aspect | SparkContext | SparkSession |
|--------|--------------|--------------|
| Level | Low-level | High-level |
| API | RDD | DataFrame/SQL |
| Introduced | Earliest Spark releases | Spark 2.0 |
| Used Today | Internally | Primary interface |

---

# 🚀 One-Line Memory Trick

SparkContext = Engine  
SparkSession = Dashboard + Engine

---

# 🧠 Advanced Follow-Up (If Asked)

The interviewer may ask:

- What happens if SparkContext crashes?
- Can SparkSession exist without SparkContext?
- How does SparkSession manage Hive?
- What does `getOrCreate()` do internally?



In [0]:
sc

In [0]:
print(spark)

In [0]:
# Create Spark Session
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Spark Fundamentals").getOrCreate()

In [0]:
spark