# Quickstart: Spark Connect

Spark Connect consists of a server and a client. This notebook walks newcomers to Spark Connect through a simple step-by-step example of how it works on both the server and the client.

## Launch Spark Connect server

Launch the server with the `start-connect-server.sh` script. The package name must include the proper Spark version (e.g. `org.apache.spark:spark-connect_2.12:3.4.0`). In the cell below, `$SPARK_VERSION` stands in for that version.

```
!./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:$SPARK_VERSION
```
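The `--packages` value is a Maven coordinate of the form `groupId:artifactId:version`, where the artifact name carries the Scala binary version (`_2.12` above) and the version is the Spark version. A minimal sketch of assembling it, assuming a hypothetical helper `connect_package` and a Scala 2.12 build:

```python
def connect_package(spark_version: str, scala_binary_version: str = "2.12") -> str:
    """Build the Maven coordinate passed to --packages.

    Hypothetical helper: assumes the artifact is published as
    org.apache.spark:spark-connect_<scala binary version>:<spark version>.
    """
    if not spark_version:
        raise ValueError("spark_version must be a non-empty string such as '3.4.0'")
    return f"org.apache.spark:spark-connect_{scala_binary_version}:{spark_version}"


# Assemble the launch command for a given Spark version (not executed here).
pkg = connect_package("3.4.0")
cmd = ["./sbin/start-connect-server.sh", "--packages", pkg]
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the Spark home directory would be one way to launch the server outside a notebook.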

## Connect to Spark Connect server

Now that the Spark Connect server is running, we can connect to it. The server is accessed through a client, namely a remote `SparkSession` configured to point at the server. Before creating the remote `SparkSession`, stop any existing regular `SparkSession` as shown below, because a regular `SparkSession` and a remote `SparkSession` cannot coexist.

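```python
from pyspark.sql import SparkSession

# Get (or create) the regular local SparkSession and stop it,
# so that a remote SparkSession can be created afterwards
SparkSession.builder.master("local[*]").getOrCreate().stop()
```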

The command used above starts the Spark Connect server at `localhost:15002`, so we can create a remote `SparkSession` with the following command.

```python
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
```
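The string passed to `remote()` is a Spark Connect URL with the `sc://` scheme, a host, and a port. The sketch below builds and parses such URLs; `build_remote_url` and `split_remote_url` are hypothetical helpers, and the optional `/;key=value;...` suffix (with `user_id` as the example key) is an assumption about the connection-string layout that should be checked against the Spark Connect documentation before use.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 15002  # the port used by the server started above


def build_remote_url(host: str = "localhost", port: int = DEFAULT_PORT, **params: str) -> str:
    """Build an sc:// URL (hypothetical helper).

    Assumes optional parameters are appended as '/;key=value;key=value'.
    """
    url = f"sc://{host}:{port}"
    if params:
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url


def split_remote_url(url: str) -> tuple:
    """Return (host, port) from an sc:// URL, falling back to DEFAULT_PORT."""
    parts = urlsplit(url)
    if parts.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got {url!r}")
    return parts.hostname, parts.port or DEFAULT_PORT


print(build_remote_url())
print(split_remote_url("sc://localhost:15002"))
```

With such a helper, the cell above could be written as `SparkSession.builder.remote(build_remote_url()).getOrCreate()`.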

## Create DataFrame

Once the remote `SparkSession` has been created, it can be used in the same way as a regular `SparkSession`. For example, a `DataFrame` can be created as follows.

```python
from datetime import datetime, date
from pyspark.sql import Row

df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])
df.show()
```

```
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+
```
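Because no schema is passed, Spark infers each column's type from the `Row` values. The sketch below approximates that mapping for the value types used above and emits a DDL schema string; `infer_ddl` is a hypothetical helper, not a PySpark API, and it ignores nulls, nested types, and any timestamp-type configuration.

```python
from datetime import date, datetime

# Approximate mapping from Python value types to Spark SQL DDL type names.
# Order matters: bool is a subclass of int and datetime is a subclass of date,
# so the more specific type is checked first.
_TYPE_ORDER = [
    (bool, "boolean"),
    (int, "bigint"),
    (float, "double"),
    (str, "string"),
    (datetime, "timestamp"),
    (date, "date"),
]


def infer_ddl(record: dict) -> str:
    """Return a DDL schema string such as 'a bigint, b double' (hypothetical helper)."""
    fields = []
    for name, value in record.items():
        for py_type, ddl_type in _TYPE_ORDER:
            if isinstance(value, py_type):
                fields.append(f"{name} {ddl_type}")
                break
        else:
            raise TypeError(f"no mapping for {name!r} of type {type(value).__name__}")
    return ", ".join(fields)


# Same values as the first Row in the example above.
sample = {"a": 1, "b": 2.0, "c": "string1", "d": date(2000, 1, 1), "e": datetime(2000, 1, 1, 12, 0)}
print(infer_ddl(sample))  # a bigint, b double, c string, d date, e timestamp
```

A DDL string like this can be passed as the `schema` argument of `spark.createDataFrame` to pin the column types instead of relying on inference.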



See 'Live Notebook: DataFrame' on [the quickstart page](https://spark.apache.org/docs/latest/api/python/getting_started/index.html) for more detailed usage of [PySpark DataFrame](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html?highlight=dataframe#pyspark.sql.DataFrame).