###Spark Context
SparkContext is the core entry point to Spark.

It:
- Connects your application to the Spark cluster
- Talks to the Cluster Manager (Standalone / YARN / Kubernetes)
- Manages executors, memory, and task scheduling
- Is required to create RDDs

Key points:
- Low-level API (older style)
- Works mainly with RDDs
- You can have only ONE SparkContext per JVM

In [0]:
%python
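from pyspark import SparkConf, SparkContext

# Reuse the running SparkContext if one exists (Databricks notebooks already provide `sc`);
# otherwise start a new one named "MyApp".
sc = SparkContext.getOrCreate(SparkConf().setAppName("MyApp"))

# Distribute a local list as an RDD, then bring the elements back to the driver.
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.collect())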


###Spark Session
SparkSession is a unified, higher-level entry point introduced in Spark 2.0.

It wraps or replaces:
- SparkContext (still available as spark.sparkContext)
- SQLContext
- HiveContext (Hive access is now enabled with enableHiveSupport())

So you don’t need to create them separately.

What it provides:
- DataFrame API
- SQL queries
- Table access
- Structured Streaming
- ML integration

In [0]:
%python
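from pyspark.sql import SparkSession

# Returns the existing session if there is one, otherwise creates it.
spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

# Build a small DataFrame from local rows and print it.
df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "value"])
df.show()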


####Spark Session in detail

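from pyspark.sql import SparkSession

# appName: name shown in the Spark UI
# master: cluster manager to connect to (YARN here)
# config: memory and cores for each executor
# enableHiveSupport: access to the Hive metastore and Hive tables
spark = SparkSession.builder \
    .appName("MyApp") \
    .master("yarn") \
    .config("spark.executor.memory", "4g") \
    .config("spark.executor.cores", "2") \
    .enableHiveSupport() \
    .getOrCreate()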



#####Builder
builder is a configuration object (a factory) used to configure SparkSession before it is created.

It says: “I’m defining how my Spark application should be set up.”

Internally, the builder:
- Collects all configs (app name, master, memory, extensions, etc.)
- Does not start Spark; nothing runs until getOrCreate() is called
- Only prepares settings

#####Application Name
Sets the Spark application name.
.appName("MyApp")

Why this matters:
The name appears in:
- Spark UI
- Databricks Jobs UI
- Cluster event logs

It helps identify which job or app is running.

Important notes
- Does NOT affect performance
- Purely for identification & monitoring

In Databricks, the session is already created for you, so the name you set here may be ignored or overridden by:
- Notebook name
- Job name

#####getOrCreate()
This is the MOST important part.

What it does:

IF SparkSession already exists:
    return the existing one
ELSE:
    create a new SparkSession

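Why this is critical:
- Spark allows only ONE active SparkContext per JVM
- Trying to create a second SparkContext fails with an error (a ValueError in PySpark) instead of giving you a new context
- getOrCreate() avoids this by reusing the existing session and its SparkContext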

**How getOrCreate() works internally:**

**Step 1: Check for an existing session:**

Is there already an active SparkSession? If so, return it.

**Step 2: Create SparkContext (if needed):**

If no SparkContext exists, one is created, which:
- Connects to the cluster manager
- Starts the driver-side services (scheduler, Spark UI)
- Requests executors

**Step 3: Create SparkSession:**

Wraps the SparkContext and initializes:
- SQL engine (Catalyst)
- Optimizer
- Catalog
- Session configs (executor memory and cores are applied in Step 2, when the SparkContext starts)
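
The three steps can be sketched as a toy model in plain Python. This is not Spark's actual implementation: `ToyContext` and `ToySession` are hypothetical stand-ins that only mirror the control flow described above.

```python
class ToyContext:
    """Hypothetical stand-in for SparkContext: at most one per process."""

    _active = None

    def __init__(self, conf):
        if ToyContext._active is not None:
            raise ValueError("only one context per process")
        self.conf = dict(conf)  # settings are fixed once the context starts
        ToyContext._active = self


class ToySession:
    """Hypothetical stand-in for SparkSession: wraps a context."""

    _active = None

    def __init__(self, context, session_conf):
        self.context = context
        self.session_conf = dict(session_conf)

    @classmethod
    def get_or_create(cls, conf):
        # Step 1: return the existing session if there is one.
        if cls._active is not None:
            return cls._active
        # Step 2: create the context only if none is running.
        context = ToyContext._active or ToyContext(conf)
        # Step 3: wrap the context in a new session.
        cls._active = cls(context, conf)
        return cls._active


a = ToySession.get_or_create({"app": "MyApp"})
b = ToySession.get_or_create({"app": "Other"})

print(a is b)                 # True: the second call reused the first session
print(b.context.conf["app"])  # MyApp: the context kept its original settings
```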