# This is a test

**This notebook tests whether PySpark and all required packages are installed and working properly.**

Please run the cell below (select it and press <kbd>SHIFT+ENTER</kbd>). The last line of the output should read:

```
Congratulations, your PySpark stack is ready to go
```

## PySpark via Jupyter Notebook

In [1]:
```python
import sys

fail = False

# Require Python 3.6 or newer. Comparing sys.version_info tuples is more
# robust than parsing the sys.version string, and avoiding f-strings keeps
# this cell parseable on older interpreters, so the warning can be shown.
if sys.version_info >= (3, 6):
    print("""Your Python interpreter is ready. Your version: 
        {}
        """.format(sys.version))
else:
    print("""Your version of Python is older than required (3.6 or newer): 
        {}
        """.format(sys.version))
    fail = True

try:
    import pandas
except ImportError:
    print("Importing package failed: pandas")
    fail = True

try:
    # findspark locates the Spark installation (e.g. via SPARK_HOME)
    # and makes pyspark importable
    import findspark
    findspark.init()
except ImportError:
    print("Importing package failed: findspark")
    fail = True
except ValueError as err:
    # findspark raises ValueError if it cannot find a Spark installation
    print("findspark could not locate Spark: {}".format(err))
    fail = True

try:
    import pyspark
except ImportError:
    print("Importing package failed: pyspark")
    fail = True

print("")
if not fail:
    print("Congratulations, your PySpark stack is ready to go")
else:
    print("Your Python stack is not ready, please check error messages above")
```

```
Your Python interpreter is ready. Your version: 
        3.6.5 (default, Apr 25 2018, 14:23:58) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
        

Congratulations, your PySpark stack is ready to go
```
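
The three import checks above follow the same pattern, so they can also be written as a loop. This is a minimal sketch, assuming the same package list; the names `REQUIRED_PACKAGES` and `missing_packages` are hypothetical. Unlike the cell above, it does not call `findspark.init()`, so `pyspark` is only found if it is importable on its own (for example, installed with pip) or if `findspark.init()` has already run in this kernel.

```python
import importlib

# Hypothetical list: the same three packages the cell above checks
REQUIRED_PACKAGES = ["pandas", "findspark", "pyspark"]


def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Importing package(s) failed:", ", ".join(missing))
else:
    print("All required packages can be imported")
```

Adding a package to the check then only means extending the list.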


## PySpark Batch Jobs

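Now evaluate the two cells below. The first writes the script `scripts/spark_job_test.py` (the `scripts/` directory must already exist), and the second submits it to your Spark installation with `spark-submit`. Verify the output: among Spark's log messages, you should see something like this: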

    ##########################################
    PySpark uses Python version:  3.6.5 (default, Apr 25 2018, 14:23:58) 
    [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
    Congratulations, submitting a PySpark job is working
    ##########################################


Make sure PySpark is using the right Python version: the version printed by the job should match the interpreter you expect. If it does not, set the environment variable `PYSPARK_PYTHON` to the path of the appropriate Python binary.
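
One way to set this variable from within the notebook is sketched below. It assumes that the interpreter running this notebook (`sys.executable`) is the one Spark should use; substitute another path if not. Shell commands started with `!` inherit the kernel's environment, so a later `!spark-submit` call sees the value.

```python
import os
import sys

# Assumption: the notebook's own interpreter is the desired one.
# PYSPARK_PYTHON selects the Python binary for the PySpark driver and
# workers (the driver can be overridden separately via PYSPARK_DRIVER_PYTHON).
os.environ["PYSPARK_PYTHON"] = sys.executable
print("PYSPARK_PYTHON =", os.environ["PYSPARK_PYTHON"])
```

Setting the variable in your shell profile instead makes the choice persist beyond the current kernel session.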

In [2]:
```python
%%file scripts/spark_job_test.py

import sys
from contextlib import contextmanager
from pyspark import SparkContext, SparkConf

SPARK_APP_NAME = 'sparkjob_test'


@contextmanager
def use_spark_context(appName):
    """Create a SparkContext and make sure it is stopped afterwards."""
    conf = SparkConf().setAppName(appName)
    spark_context = SparkContext(conf=conf)

    try:
        print("starting", appName)
        yield spark_context
    finally:
        spark_context.stop()
        print("stopping", appName)


with use_spark_context(appName=SPARK_APP_NAME) as sc:
    # RDDs are evaluated lazily: without an action such as sum(), no Spark
    # job would actually run. 0 + 1 + ... + 99 = 4950.
    total = sc.range(100).sum()
    assert total == 4950, total
    print()
    print("##########################################")
    print("PySpark uses Python version: ", sys.version)
    print("Congratulations, submitting a PySpark job is working")
    print("##########################################")
    print()
```


```
Overwriting scripts/spark_job_test.py
```


In [3]:
```python
!spark-submit scripts/spark_job_test.py
```

```
2018-05-28 22:42:47 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-28 22:42:48 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-05-28 22:42:48 INFO  SparkContext:54 - Submitted application: sparkjob_test
2018-05-28 22:42:48 INFO  SecurityManager:54 - Changing view acls to: cls
2018-05-28 22:42:48 INFO  SecurityManager:54 - Changing modify acls to: cls
2018-05-28 22:42:48 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-28 22:42:48 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-28 22:42:48 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cls); groups with view permissions: Set(); users  with modify permissions: Set(cls); groups with modify permissions: Set()
2018-05-28 22:42:48 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 53583.
2018-05-28 22:42:48 INFO  Spar
```

---
_This notebook is licensed under a [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/). Copyright © 2018 [Point 8 GmbH](https://point-8.de)_