# Understanding SparkContext

- A `SparkContext` is the entry point to Spark functionality, much like a key is to a car. In the PySpark shell, a `SparkContext` is created for you automatically (so you don't have to create it yourself) and is exposed through the variable `sc`.

- In this simple exercise, you'll inspect the attributes of the `SparkContext` in your PySpark shell, which you'll be using for the rest of the course.

## Instructions
- Print the version of `SparkContext` in the PySpark shell.
- Print the Python version of `SparkContext` in the PySpark shell.
- Print the master of `SparkContext` in the PySpark shell.

In [1]:
# Initialization
import os
import sys

os.environ["SPARK_HOME"] = "/home/talentum/spark"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.6" 
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"
sys.path.insert(0, os.environ["PYLIB"] +"/py4j-0.10.7-src.zip")
sys.path.insert(0, os.environ["PYLIB"] +"/pyspark.zip")

# NOTE: List whichever packages you want to load here.
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0 pyspark-shell' 
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-avro_2.11:2.4.0 pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.3 pyspark-shell'
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.0 pyspark-shell'

In [2]:
# Entry point for Spark 2.x
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().getOrCreate()

# On YARN, specify .master("yarn"):
# spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().master("yarn").getOrCreate()

sc = spark.sparkContext

In [4]:
# script.py

# Print the version of SparkContext
print("The version of Spark Context in the PySpark shell is:", sc.version)

# Print the Python version of SparkContext
print("The Python version of Spark Context in the PySpark shell is:", sc.pythonVer)

# Print the master of SparkContext
print("The master of Spark Context in the PySpark shell is:", sc.master)


The version of Spark Context in the PySpark shell is: 2.4.5
The Python version of Spark Context in the PySpark shell is: 3.6
The master of Spark Context in the PySpark shell is: local[*]


In [2]:
items = [1,2,3,4]
lst = []

for i in items:
    if i%2 != 0:
        lst.append(i)
        
lst

[1, 3]

In [3]:
items = [1,2,3,4]
list(filter(lambda x: (x%2 != 0), items))

[1, 3]
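
The two cells above get the same result with an explicit loop and with `filter` plus a `lambda`. As a rough sketch for comparison, the same odd-number filter can also be written as a list comprehension. The helper name `is_odd` and the variable names below are illustrative and not part of the original exercise.

```python
# Same odd-number filter as in the cells above, written two more ways.
items = [1, 2, 3, 4]


def is_odd(x):
    """Return True when x is not divisible by 2 (hypothetical helper)."""
    return x % 2 != 0


# List comprehension: keep only the elements for which is_odd is True.
odds_comprehension = [x for x in items if is_odd(x)]

# filter() with a named function instead of a lambda; list() materializes
# the lazy filter object in Python 3.
odds_filter = list(filter(is_odd, items))

print(odds_comprehension)
print(odds_filter)
```

The `filter` form with a predicate function follows the same pattern as the `filter` transformation on Spark RDDs, which is why the lambda version is worth getting comfortable with.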