# Entry points to Spark
---

## Two entry points

* **SparkContext**: creates *RDDs* and broadcast variables on the cluster.
* **SparkSession**: creates *DataFrames* (`pyspark.sql.dataframe.DataFrame`).

## SparkContext

In [1]:
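from pyspark import SparkContext
# Stop any SparkContext that is already running; only one can be active at a time.
# getOrCreate() avoids a NameError when `sc` has not been defined yet.
SparkContext.getOrCreate().stop()
sc = SparkContext(master='local',
                  appName='my-sparkcontext')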

### Create RDD by loading a file

In [2]:
rdd = sc.textFile('data/mtcars.csv')
rdd.take(5)

[',mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb',
 'Mazda RX4,21,6,160,110,3.9,2.62,16.46,0,1,4,4',
 'Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4',
 'Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1',
 'Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1']
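
As the output shows, `textFile()` returns each line as a raw string, header included. One possible way to turn those lines into fields is sketched below, using the standard-library `csv` module. The helper names `parse_line` and `parse_mtcars` are hypothetical, and `rdd` is assumed to be the RDD loaded above.

```python
import csv


def parse_line(line):
    """Split one CSV line into a list of string fields."""
    return next(csv.reader([line]))


def parse_mtcars(rdd):
    """Drop the header line and split every remaining line into fields.

    `rdd` is assumed to be the RDD of strings returned by sc.textFile().
    """
    header = rdd.first()
    return rdd.filter(lambda line: line != header).map(parse_line)


# parse_line is plain Python, so it can be checked without a cluster.
print(parse_line('Mazda RX4,21,6,160,110,3.9,2.62,16.46,0,1,4,4'))
```

With the RDD from the cell above, `parse_mtcars(rdd).take(2)` would return the first two data rows as lists of strings. All values stay strings; converting them to numbers is a separate step.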

### Create RDD with `parallelize()`

In [3]:
rdd = sc.parallelize([1, 2, 3])
rdd.collect()

[1, 2, 3]

## SparkSession

In [4]:
from pyspark.sql import SparkSession
# Wrap the existing SparkContext in a SparkSession
spark = SparkSession(sparkContext=sc)

### Create DataFrame by loading a file

In [5]:
df = spark.read.csv('data/mtcars.csv', header=True, inferSchema=True)
df.show(5, truncate=False)

+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+
|_c0              |mpg |cyl|disp |hp |drat|wt   |qsec |vs |am |gear|carb|
+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+
|Mazda RX4        |21.0|6  |160.0|110|3.9 |2.62 |16.46|0  |1  |4   |4   |
|Mazda RX4 Wag    |21.0|6  |160.0|110|3.9 |2.875|17.02|0  |1  |4   |4   |
|Datsun 710       |22.8|4  |108.0|93 |3.85|2.32 |18.61|1  |1  |4   |1   |
|Hornet 4 Drive   |21.4|6  |258.0|110|3.08|3.215|19.44|1  |0  |3   |1   |
|Hornet Sportabout|18.7|8  |360.0|175|3.15|3.44 |17.02|0  |0  |3   |2   |
+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+
only showing top 5 rows



### Create DataFrame with `createDataFrame()`

In [6]:
import pandas as pd
pdf = pd.DataFrame({
    'x1': range(1,6),
    'x2': list('abcde')
})
df = spark.createDataFrame(pdf)
df.show()

+---+---+
| x1| x2|
+---+---+
|  1|  a|
|  2|  b|
|  3|  c|
|  4|  d|
|  5|  e|
+---+---+
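
`createDataFrame()` does not require pandas: it also accepts a plain list of tuples together with column names. A small sketch under that assumption follows; the helper name `make_df` is illustrative, and `spark` is assumed to be an active SparkSession such as the one created earlier.

```python
# Same data as the pandas example, built as a list of (x1, x2) tuples.
rows = list(zip(range(1, 6), "abcde"))
columns = ["x1", "x2"]


def make_df(spark):
    """Build a DataFrame from local Python rows.

    `spark` is assumed to be an active SparkSession.
    """
    return spark.createDataFrame(rows, schema=columns)


print(rows)
```

Calling `make_df(spark).show()` should display the five rows as a two-column table with columns `x1` and `x2`.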

