Founder: Dirk Derichsweiler, Contributors: Vincent Charbonnier and Isabelle Steinhauser, September 2023
# Spark
This chapter walks through creating a Spark application using HPE Ezmeral Unified Analytics Software (EZUA).

Ezmeral Unified Analytics offers Spark Interactive Sessions as well as Spark Applications. In this chapter, we walk through the deployment of a Spark Application. To begin, open the Spark Applications section of EZUA: ![image.png](attachment:7ad97dee-7bb6-4297-8d2d-ee48941eb7c0.png)

Click on Create Application in the upper right to create a new application, and give it a name of your choice.

![image.png](attachment:226e5c34-ef5e-46c2-a0c3-40c4123ea1cd.png)

Adapt the following example to your environment (edit __url__, __user__, __password__ as well as the database names in the __query__) and save it in EZUA as a .py file.
```python
from pyspark.sql import SparkSession

if __name__ == "__main__":
    print("pyspark session started..")
    # Create the Spark session for this application (or reuse an existing one).
    spark = SparkSession.builder \
        .appName("demo") \
        .getOrCreate()

    # Read the result of a federated query through the Presto JDBC driver.
    # Replace url, user, password and the database names with your own values.
    df = spark.read \
      .format("jdbc") \
      .option("driver", "com.facebook.presto.jdbc.PrestoDriver") \
      .option("url", "jdbc:presto://ezpresto.ezua-cb.ezmeral.demo.local:443") \
      .option("user", "demo-user") \
      .option("password", "Hpepoc@123") \
      .option("SSL", "true") \
      .option("IgnoreSSLChecks", "true") \
      .option("query", "SELECT * FROM czech_mysql_store1.discover.czech UNION ALL SELECT * FROM german_mysql_store1.discover.germany UNION ALL ( SELECT PRODUCTID , PRODUCT , TYPE , UNITPRICE , UNIT , QTY , TOTALSALES , CURRENCY , STORE , (CASE WHEN (country = 'Swiss') THEN 'Switzerland' ELSE country END) COUNTRY , YEAR FROM swiss_mariadb_store1.discover.swiss )") \
      .load()

    # Print the first rows of the combined result to the application log.
    df.show()

    # Release the cluster resources once the job is done.
    spark.stop()
 ```
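
The example above writes the connection details directly into the script. As a minimal sketch of one alternative, the helper below assembles the same JDBC options but reads the URL, user, and password from environment variables. The helper name `presto_jdbc_options` and the variable names `PRESTO_URL`, `PRESTO_USER`, and `PRESTO_PASSWORD` are hypothetical and not part of EZUA; whether environment variables reach your Spark application depends on how it is deployed.

```python
import os


def presto_jdbc_options(query, env=None):
    """Build the JDBC reader options used in the example above.

    Connection details are taken from environment variables
    (hypothetical names) instead of being written into the script.
    """
    env = os.environ if env is None else env
    required = ("PRESTO_URL", "PRESTO_USER", "PRESTO_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "driver": "com.facebook.presto.jdbc.PrestoDriver",
        "url": env["PRESTO_URL"],
        "user": env["PRESTO_USER"],
        "password": env["PRESTO_PASSWORD"],
        "SSL": "true",
        "IgnoreSSLChecks": "true",
        "query": query,
    }


# Usage inside the Spark job (requires an existing SparkSession named `spark`):
# df = spark.read.format("jdbc").options(**presto_jdbc_options("SELECT 1")).load()
```

Passing the options as one dictionary keeps the credentials out of the source file and makes the reader call shorter; the query itself is unchanged.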

Choose __Python__ as the Type. As the Source, you can choose between Shared Directory, User Directory, and S3. Use Browse to navigate these locations and select the Spark job file where you saved it.

![image.png](attachment:f70d9ddf-5700-43b9-b56a-b14218976fe8.png)

We don't have any dependencies, so you can leave that page empty and continue to the next one. For the driver configuration, 1 core and 1M of memory should be sufficient. ![image.png](attachment:8970ec6a-3a62-4094-8449-a95cb5e7c146.png)
<div class="alert alert-block alert-success">
    <b>Note:</b>  It's important to write M after the amount of memory you request.
    </div>
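
To illustrate the note above, the following sketch checks that a memory value carries the unit suffix before you type it into the form. The helper `has_memory_unit` is hypothetical, and the pattern only covers the `M` suffix mentioned in the note; the form may accept other units as well.

```python
import re

# A positive whole number followed by the unit "M", for example "1M".
_MEMORY_WITH_UNIT = re.compile(r"[1-9][0-9]*M")


def has_memory_unit(value):
    """Return True if `value` is a positive integer followed by M."""
    return _MEMORY_WITH_UNIT.fullmatch(value.strip()) is not None


print(has_memory_unit("1M"))  # True
print(has_memory_unit("1"))   # False: the unit is missing
```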

For the executor, the same number of cores and amount of memory should be sufficient. As __Number of Executors__, enter __2__. ![image.png](attachment:b4b16d8b-8b20-45e4-a0f3-66e95b416d7a.png)
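
For orientation, the driver and executor values entered in the form correspond conceptually to standard Spark configuration properties. The sketch below collects them in a dictionary. `spark.driver.cores`, `spark.driver.memory`, `spark.executor.cores`, `spark.executor.memory`, and `spark.executor.instances` are regular Spark property names, but it is an assumption, not confirmed by this chapter, that EZUA passes the form fields through in exactly this way. The helper `resource_properties` is hypothetical.

```python
def resource_properties(driver_cores, driver_memory, executor_cores,
                        executor_memory, num_executors):
    """Map resource settings to standard Spark property names (as strings)."""
    return {
        "spark.driver.cores": str(driver_cores),
        "spark.driver.memory": driver_memory,
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
        "spark.executor.instances": str(num_executors),
    }


# Values from this walkthrough: 1 core and 1M each, with 2 executors.
props = resource_properties(1, "1M", 1, "1M", 2)
for key, value in sorted(props.items()):
    print(f"{key}={value}")
```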

If you toggle Schedule to Run, you can enter Frequency Intervals as shown below. ![image.png](attachment:b8ab5147-3ed8-4d3f-8719-a2a79190e88a.png)

The job runs once right after deployment, which is enough for our demo, so we leave scheduling __Off__. ![image.png](attachment:beba4d6b-4178-468f-b37c-0ce2a3b0aab2.png)

The last page of the configuration shows an overview of the settings we just entered, and you can still make adjustments there. ![image.png](attachment:25bd88e2-f05e-44a6-a023-8aa7a08cbaec.png)

Click on Create Spark Application. This takes a moment. To follow the progress, go to View Logs. ![image.png](attachment:112cbb74-3c75-4de3-a45f-66a19ae49145.png)

# Video

%%HTML
<video width="1024" height="768" controls>
  <source src="../videos/6-fast.mp4" type="video/mp4">
</video>