The first step in using Spark is connecting to a cluster. In practice, the cluster is hosted on a remote machine (the master) that's connected to all the other nodes and distributes data and computations among them. To create a connection, we create an instance of the `SparkContext` class. Its constructor takes a few optional arguments that let you specify the attributes of the cluster you're connecting to, and an object holding these attributes can be created with the `SparkConf` class.

### Examining the Spark Context

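In the interactive `pyspark` shell, a `SparkContext` named `sc` is already loaded into the workspace. In a plain Python script or notebook, we create it ourselves from a `SparkConf` object: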

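```python
from pyspark import SparkConf, SparkContext

# Describe the application: a name (shown in the Spark UI) and the master URL.
# "local" runs Spark in a single local process with one thread ("local[*]" uses all cores).
configure = SparkConf().setAppName("example-app").setMaster("local")
sc = SparkContext(conf=configure)
```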

```python
print(sc)          # the SparkContext object
print(sc.version)  # version of Spark this context is running
```

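```python
# parallelize() expects a Python collection; given a string it would build an RDD
# of the string's individual characters. textFile() loads a file as an RDD of lines.
flights = sc.textFile("flights_small.csv")
```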

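Spark's core data structure is the Resilient Distributed Dataset (RDD). This is a low-level object that lets Spark split data into partitions spread across the nodes of the cluster and process those partitions in parallel. Instead of working with RDDs directly, it is usually easier to use the Spark DataFrame, a higher-level abstraction built on top of RDDs that behaves a lot like a SQL table. To start working with Spark DataFrames, we have to create a `SparkSession`, which wraps the underlying `SparkContext`.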
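
To make the idea of partitioning concrete, here is a toy model in plain Python, not Spark's implementation: the data is split into slices, each slice is processed independently by a worker thread standing in for a node, and the partial results are combined. The helpers `split_into_partitions` and `process_partition` are hypothetical names chosen for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous slices of near-equal size."""
    size = len(data)
    return [
        data[i * size // num_partitions:(i + 1) * size // num_partitions]
        for i in range(num_partitions)
    ]


def process_partition(partition):
    """Work done independently on one slice: here, a sum of squares."""
    return sum(x * x for x in partition)


data = list(range(10))
partitions = split_into_partitions(data, 3)
print(partitions)

# Each partition is handled by its own worker thread, standing in for a node.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(process_partition, partitions))

# Combine the per-partition results into the final answer.
total = sum(partial_results)
print(partial_results, total)
```

In Spark itself the partitions live on different executors, and the per-partition work and the final combination are expressed through RDD or DataFrame operations rather than hand-written threads.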

You can think of `SparkContext` as your connection to the cluster and `SparkSession` as your interface with that connection.

### Creating a SparkSession

```python
from pyspark.sql import SparkSession

# Return the active SparkSession, or create one that reuses the running SparkContext `sc`
my_spark = SparkSession.builder.getOrCreate()
print(my_spark)
```

```python
# List the (key, value) configuration pairs of the underlying SparkContext
my_spark.sparkContext.getConf().getAll()
```

### Loading Data

```python
# Read the CSV into a DataFrame, using the first row as column names
flights = my_spark.read.csv("flights_small.csv", header=True)
```

```python
# Print the first 20 rows of the DataFrame
flights.show()
```

### Viewing Tables

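```python
# List the tables and temporary views registered in the session catalog;
# a DataFrame such as `flights` is not listed until it is registered as a view
my_spark.catalog.listTables()
```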