# Quickstart: Spark Connect

This is a short introduction and quickstart for Spark Connect.

Spark Connect consists of two parts: a server and a client.

This notebook walks through a simple step-by-step example of how it works on the server and on the client, for those new to Spark Connect.

## Create Remote Session (Server)

First, run a PySpark application on the server side.

A PySpark application starts by initializing a `SparkSession`, which is the entry point of PySpark.

**Note** that [SparkSession.Builder.remote](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.builder.remote.html#pyspark.sql.SparkSession.builder.remote) must be specified to make the session a remote `SparkSession`, which is what Spark Connect uses.

When running the PySpark shell via the `pyspark` executable with the `--remote` option, the shell automatically creates the remote SparkSession and stores it in the variable `spark`.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("local[*]").getOrCreate()
```
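Here `local[*]` makes PySpark start a Spark Connect server locally for convenience. To reach a server running elsewhere, `remote()` takes a connection string of the form `sc://host:port`, optionally followed by `/;key=value` parameters. The sketch below assumes the default Spark Connect port 15002; `build_connect_url` and `remote_session` are hypothetical helpers, and parameter names such as `user_id` or `use_ssl` should be checked against the Spark Connect connection-string documentation for your Spark version.

```python
DEFAULT_CONNECT_PORT = 15002  # assumption: default port of a Spark Connect server


def build_connect_url(host, port=DEFAULT_CONNECT_PORT, **params):
    """Compose a Spark Connect connection string: sc://host:port[/;key=value;...]."""
    url = f"sc://{host}:{port}"
    if params:
        # Optional parameters follow "/;" and are separated by ";".
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url


def remote_session(url):
    """Create (or reuse) a remote SparkSession for the given connection string."""
    # Imported lazily so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()
```

For example, `remote_session(build_connect_url("localhost"))` would connect to `sc://localhost:15002`. Keeping URL construction separate from session creation makes the connection target easy to change without touching the rest of the notebook.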

## Create DataFrame (Server)

Let's create a PySpark DataFrame using the remote session.

```python
from datetime import datetime, date
from pyspark.sql import Row

df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])
df.show()
```

```
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+
```
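The same DataFrame can also be built from plain tuples plus a DDL-style schema string, which spells out each column's type instead of letting Spark infer it from `Row` objects. A minimal sketch; `DATA`, `SCHEMA` and `create_sample_df` are illustrative names, and the function expects the remote `spark` session created earlier.

```python
from datetime import date, datetime

# The same three rows as above, as plain tuples in column order a, b, c, d, e.
DATA = [
    (1, 2.0, "string1", date(2000, 1, 1), datetime(2000, 1, 1, 12, 0)),
    (2, 3.0, "string2", date(2000, 2, 1), datetime(2000, 1, 2, 12, 0)),
    (4, 5.0, "string3", date(2000, 3, 1), datetime(2000, 1, 3, 12, 0)),
]

# DDL-formatted schema: one "name type" pair per column.
SCHEMA = "a long, b double, c string, d date, e timestamp"


def create_sample_df(spark):
    """Build the sample DataFrame with an explicit schema through the given session."""
    return spark.createDataFrame(DATA, schema=SCHEMA)
```

Calling `create_sample_df(spark).show()` should display the same rows; an explicit schema mainly helps when inference from the data would be ambiguous.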



## Create a table from Spark DataFrame (Server)

After creating the DataFrame, let's create a table named `spark_connect_test_table` from it.

```python
df.write.saveAsTable("spark_connect_test_table")
```
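By default, `saveAsTable` uses the `errorifexists` save mode, so re-running the cell above fails once the table exists. A minimal sketch, assuming you want re-runs to replace the table: the hypothetical helper `save_table` sets the mode explicitly through `DataFrameWriter.mode`.

```python
# Save modes accepted by DataFrameWriter.mode.
VALID_SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def save_table(df, table_name="spark_connect_test_table", mode="overwrite"):
    """Write df as a table, choosing what happens if the table already exists."""
    if mode not in VALID_SAVE_MODES:
        raise ValueError(f"unsupported save mode: {mode!r}")
    # "overwrite" replaces an existing table; "append" adds rows; "ignore" does nothing.
    df.write.mode(mode).saveAsTable(table_name)
```

With `mode="overwrite"`, the notebook can be re-run from the top without first dropping the table.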

## Create Remote Session (Client)

Now we're ready to connect from the client to the server. **Note** that the Spark application on the server must still be running.

As on the server, the client must also create a remote session to communicate with the server:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("local[*]").getOrCreate()
```
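In this notebook both sides run in the same Python process with `local[*]`. With a standalone server (for example, one launched with Spark's `sbin/start-connect-server.sh` script), the client instead points at that server's `sc://` address. The Spark Connect documentation describes the `SPARK_REMOTE` environment variable for this purpose; the sketch below reads it and falls back to `sc://localhost:15002`, which is an assumption about where the server listens. `resolve_remote` and `get_client_session` are hypothetical helpers.

```python
import os

# Assumption: a Spark Connect server listening on this machine at the default port.
DEFAULT_REMOTE = "sc://localhost:15002"


def resolve_remote(default=DEFAULT_REMOTE):
    """Return the SPARK_REMOTE connection string if set (and non-empty), else the default."""
    return os.environ.get("SPARK_REMOTE") or default


def get_client_session(remote=None):
    """Connect to a (possibly separately running) Spark Connect server."""
    from pyspark.sql import SparkSession  # lazy import: needs PySpark with Connect support

    return SparkSession.builder.remote(remote or resolve_remote()).getOrCreate()
```

Reading the address from the environment keeps the client code identical across a laptop, CI, and a shared cluster.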

## Create DataFrame from the table created on the server (Client)

The client and the server can now communicate through the remote session.

Let's create a DataFrame on the client side, using the remote session to load the table created on the server.

```python
df = spark.read.table("spark_connect_test_table")
df.show()
```

```
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
+---+---+-------+----------+-------------------+
```
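The rows above come back in a different order (1, 4, 2) from the one in which they were written, because Spark does not guarantee row order when reading a table. When order matters, sort explicitly. A minimal sketch; `read_table_sorted` is a hypothetical helper, and `spark.table(name)` is an equivalent shorthand for `spark.read.table(name)`.

```python
def read_table_sorted(spark, table_name="spark_connect_test_table", order_col="a"):
    """Load a table through the remote session and sort it so the row order is deterministic."""
    # spark.table(table_name) would work equally well here.
    return spark.read.table(table_name).orderBy(order_col)
```

Calling `read_table_sorted(spark).show()` should list the rows with `a` equal to 1, 2 and 4, in that order.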



Using Python's built-in `type` function, we can verify that the DataFrame was created through Spark Connect.

It should be `pyspark.sql.connect.dataframe.DataFrame`:

```python
type(df)
```

```
pyspark.sql.connect.dataframe.DataFrame
```
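To make this check programmatic without importing the Connect classes themselves (which may require the optional Spark Connect dependencies), one option is to inspect the module that defines the object's class. A minimal sketch; `is_connect_dataframe` is a hypothetical helper, and an `isinstance` check against `pyspark.sql.connect.dataframe.DataFrame` works too when those dependencies are installed.

```python
def is_connect_dataframe(obj):
    """True if obj's class is defined in the pyspark.sql.connect package."""
    module = type(obj).__module__ or ""
    return module == "pyspark.sql.connect" or module.startswith("pyspark.sql.connect.")
```

For the `df` above, whose type is `pyspark.sql.connect.dataframe.DataFrame`, `is_connect_dataframe(df)` returns `True`.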

**Note** that a DataFrame with Spark Connect is conceptually identical to a regular [PySpark DataFrame](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html?highlight=dataframe#pyspark.sql.DataFrame), so most of the examples in 'Live Notebook: DataFrame' on [the quickstart page](https://spark.apache.org/docs/latest/api/python/getting_started/index.html) can be reused directly.

However, Spark Connect does not yet support some key features, such as [RDD](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.html?highlight=rdd#pyspark.RDD) and [SparkSession.conf](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.conf.html#pyspark.sql.SparkSession.conf), so keep these limitations in mind when using it.
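Code that relies on `df.rdd` needs another route under Spark Connect. For small results, one option is to collect rows to the client with `collect()`, where each `Row` converts to a dictionary via `asDict()`; `toPandas()` is another. A minimal sketch; `rows_as_dicts` is a hypothetical helper, and its `limit` argument is there to avoid pulling a large table onto the client.

```python
def rows_as_dicts(df, limit=None):
    """Bring a (small) DataFrame back to the client as plain dicts, without using df.rdd."""
    if limit is not None:
        df = df.limit(limit)  # cap the number of rows transferred to the client
    return [row.asDict() for row in df.collect()]
```

The filtering and aggregation that RDD code often did by hand is usually better expressed with DataFrame operations, so that only the final, small result is collected.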