<div>
<table>
<tr>
<td><img src="https://spark.apache.org/docs/latest/api/python/_static/spark-logo-reverse.png" width="200"></td>
<td><img src="https://delta.io/static/delta-lake-logo-a1c0d80d23c17de5f5d7224cb40f15dc.svg" width="200"></td>
</tr>
</table>
</div>

# PySpark with Delta Tables

- Set Up PySpark
- Initialize SparkSession
- Create Spark DataFrame
- Export to CSV
- Export to Delta Table

#### Set Up PySpark

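```
pip3 install pyspark
pip3 install delta-spark
```

In a Jupyter notebook, run the same commands in a cell with a leading `!`, for example `!pip3 install delta-spark`.

`delta-spark` declares the range of `pyspark` versions it supports as a pip dependency, so the second command may replace the `pyspark` version installed by the first; the two must stay compatible. Launching a shell with `pyspark --packages io.delta:delta-core_2.11:0.4.0` is not needed here: that coordinate pins Delta Lake 0.4.0 built for Scala 2.11 (Spark 2.4), which does not match current `pyspark` releases, and `configure_spark_with_delta_pip` (used when creating the SparkSession) already adds the Delta Lake JARs that match the installed `delta-spark`. Inside a notebook, `!pyspark` would also start an interactive shell that blocks the cell.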
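
Because `delta-spark` only works with a matching `pyspark` release, it can help to confirm which versions pip actually installed. The sketch below uses only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name, and it reports `None` for any package that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pyspark", "delta-spark")):
    """Return {package: version string or None} for the given distributions.

    Hypothetical helper: it only reads pip metadata and never starts Spark.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Compare the printed pair against the Delta Lake release notes before creating the session.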

#### Initialize SparkSession

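Import the required packages:
- `pyspark`
- `delta` (the Python module installed by `delta-spark`)

The builder needs two settings for Delta Lake: `spark.sql.extensions` registers Delta's SQL extension, and `spark.sql.catalog.spark_catalog` makes `DeltaCatalog` the session catalog. `configure_spark_with_delta_pip` then adds the Delta Lake package matching the installed `delta-spark` version to `spark.jars.packages` before `getOrCreate()` starts the session.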


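```python
import pyspark
from delta import configure_spark_with_delta_pip

# Register Delta Lake's SQL extension and catalog on the session builder
builder = (
    pyspark.sql.SparkSession.builder.appName("PysparkDelta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
)

# Add the Delta Lake JARs that match the installed delta-spark version, then start the session
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```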
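
If several notebooks need the same Delta settings, they can be kept in one dictionary and applied in a loop. This is a sketch: `DELTA_CONFIGS` and `with_configs` are hypothetical names, and the helper works with any builder whose `config(key, value)` returns the builder, as `SparkSession.Builder.config` does.

```python
# Hypothetical constant: the two settings Delta Lake needs on the session builder
DELTA_CONFIGS = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def with_configs(builder, configs):
    """Apply each key/value pair with builder.config() and return the builder.

    Hypothetical helper; SparkSession.Builder.config returns the builder,
    so reassigning it in the loop chains the calls.
    """
    for key, value in configs.items():
        builder = builder.config(key, value)
    return builder
```

With it, the builder becomes `with_configs(pyspark.sql.SparkSession.builder.appName("PysparkDelta"), DELTA_CONFIGS)`, which is then passed to `configure_spark_with_delta_pip` exactly as in the cell above.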



#### Create a Spark DataFrame

```python
data = [
    ("James", "Educative", "Engg", "USA"),
    ("Michael", "Google", None, "Asia"),
    ("Robert", None, "Marketing", "Russia"),
    ("Maria", "Netflix", "Finance", "Ukraine"),
    (None, None, None, None),  # a row with every field missing
]

columns = ["empname", "company", "department", "country"]
# Passing only column names lets Spark infer each column's type from the data
df = spark.createDataFrame(data=data, schema=columns)
```



#### Check the DataFrame Schema

```python
df.printSchema()
```

```
root
 |-- empname: string (nullable = true)
 |-- company: string (nullable = true)
 |-- department: string (nullable = true)
 |-- country: string (nullable = true)
```



#### Print the DataFrame Contents

```python
df.show()
```

```
+-------+---------+----------+-------+
|empname|  company|department|country|
+-------+---------+----------+-------+
|  James|Educative|      Engg|    USA|
|Michael|   Google|      null|   Asia|
| Robert|     null| Marketing| Russia|
|  Maria|  Netflix|   Finance|Ukraine|
|   null|     null|      null|   null|
+-------+---------+----------+-------+
```



#### Export to CSV

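```python
csv_file_path = "temp/data.csv"
# Write a header row and use a comma as the field delimiter
df.write.option("header", True).option("delimiter", ",").csv(csv_file_path)
```

Spark writes `temp/data.csv` as a directory, not a single file: it holds one `part-*.csv` file per non-empty partition, each with its own header row, plus bookkeeping files such as `_SUCCESS`. The default save mode raises an error if the path already exists, so add `.mode("overwrite")` before `.csv(...)` to rerun the cell.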
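
Because Spark writes CSV output as a directory of part files rather than one file, tools that expect a single file need the parts stitched together. The standard-library sketch below assumes part files named `part-*.csv` (Spark's usual naming) that each start with the same header row; `read_spark_csv_dir` is a hypothetical helper and does not handle compressed output.

```python
import csv
import glob
import os


def read_spark_csv_dir(path):
    """Combine the part files of a Spark CSV output directory.

    Hypothetical helper. Assumes every part file starts with the same
    header row (Spark writes one per file when header=True) and returns
    (header, rows) with each row as a list of strings.
    """
    header = None
    rows = []
    # Sorting keeps part-00000, part-00001, ... in order; _SUCCESS is ignored
    for part in sorted(glob.glob(os.path.join(path, "part-*.csv"))):
        with open(part, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip an empty part file
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"header mismatch in {part}")
            rows.extend(reader)
    return header, rows
```

Calling `read_spark_csv_dir("temp/data.csv")` returns the header once followed by all rows; Spark writes `null` as an empty field by default, so missing values come back as empty strings.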

#### Export to Delta Table

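```python
# Write the DataFrame as a Delta table: Parquet data files plus a _delta_log transaction log
df.write.format("delta").save("temp/tmp/students_delta")
```

As with CSV, the default save mode fails if the path already exists; `.mode("overwrite")` replaces the table contents by committing a new table version instead.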
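
A Delta table directory holds Parquet data files plus a `_delta_log` folder of commit files named with a zero-padded 20-digit version number; each line of a commit is a JSON object holding one action, such as `protocol`, `metaData`, `commitInfo`, `add`, or `remove`. The standard-library sketch below replays those commits to list the data files in the latest version. It is a simplification for inspection only: `active_data_files` is a hypothetical helper that ignores checkpoint files, so it is correct only while every commit since version 0 is still present. To query the table, use Spark itself, for example `spark.read.format("delta").load("temp/tmp/students_delta").show()`.

```python
import json
import os
import re

# Commit files are named with a 20-digit, zero-padded version number
COMMIT_FILE = re.compile(r"^\d{20}\.json$")


def active_data_files(table_path):
    """List data files in the latest table version by replaying JSON commits.

    Hypothetical, simplified helper: it ignores checkpoints, so it is only
    correct while every commit file since version 0 is still present.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Zero padding means lexicographic order equals version order
    commits = sorted(name for name in os.listdir(log_dir) if COMMIT_FILE.match(name))
    files = set()
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)
```

After the write above, `active_data_files("temp/tmp/students_delta")` lists the Parquet files that make up the current version; paths are relative to the table directory.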