# WORK  IN PROGRESS


# Exploratory Data Analysis using Spark and Python  
Now that we have an idea of how to explore some data in Spark, the following content describes how to apply some of those principles to the __Exploratory Data Analysis__ methodology within Data Science. This document outlines some of the pitfalls and issues that one may encounter as they try to explore data in Spark.

__Note:__ The information within this document is based on the [Python Tutorials](https://www.codementor.io/python/tutorial) from __Code Mentor__. 


## Getting the Data  
### Getting the Data  
For this exercise, we will use the incidents derived from the [SFPD Crime Incident Reporting system](https://data.sfgov.org/Public-Safety/SFPD-Incidents-from-1-January-2003/tmnf-yvry).  

The data is formatted to show the following information:
- Incident Number
- Category of the Incident
- Description of the Incident
- Day of the Week
- Date
- Time
- Police Department District
- Resolution
- Address
- X map coordinates
- Y map coordinates
- Map location
- Police Department ID

The data has been exported to `.csv` format and copied to HDFS using the following procedure:

```
wget https://data..org/api/views/tmnf-yvry/rows.csv?accessType=DOWNLOAD -O SFPD_Incidents.csv
hdfs dfs -put SFPD_Incidents.csv /data/
hdfs dfs -ls /data/
```

### Importing the Data into Spark  
#### Manual Schema Preparation  
The first step is to isolate the headers of the data, which become the field names and help us understand what each field is for and hence what type it should have. Since this is a manual process, we will bring the data into Spark by hand using the `SQLContext`. We will not use any of the dataframe reader functionality or `spark-csv` here; the reason is to highlight how much easier this is with dataframes later, compared to the manual (or traditional) steps.

In [1]:
from pyspark.sql import SQLContext
from pyspark.sql.types import *
data = sc.textFile("hdfs://master:54310/data/SFPD_Incidents.csv")
data.take(1)

[u'IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId']

The first thing we do, as the [documentation](https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema) suggests, is to isolate the headings. We will use these headings to build the schema.

In [2]:
header = data.first()
header

u'IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId'

Next, we can construct the individual fields by splitting the header on the `,` delimiter. As a baseline, we force each field to be of type `string`.

In [3]:
fields = [StructField(field_name, StringType(), True) for field_name in header.split(',')]
fields

[StructField(IncidntNum,StringType,true),
 StructField(Category,StringType,true),
 StructField(Descript,StringType,true),
 StructField(DayOfWeek,StringType,true),
 StructField(Date,StringType,true),
 StructField(Time,StringType,true),
 StructField(PdDistrict,StringType,true),
 StructField(Resolution,StringType,true),
 StructField(Address,StringType,true),
 StructField(X,StringType,true),
 StructField(Y,StringType,true),
 StructField(Location,StringType,true),
 StructField(PdId,StringType,true)]

Now that we have individual fields, we can specify the exact type of data within each column based on the description from the original source. For example, according to the website, the `DayOfWeek` column is of __Plain Text__ type, but the `Date` column is of type __Date & Time__. So all we need to do is change the type of data in each of our fields to match the description from the website.  

Therefore, the only fields we need to change are:
- __Date__ from `StringType` to `DateType`
- __Time__ from `StringType` to `TimestampType`
- __X__ from `StringType` to `FloatType`
- __Y__ from `StringType` to `FloatType`
- __PdId__ from `StringType` to `LongType`

In [4]:
# Set the necessary fields to the proper type
fields[4].dataType = DateType() #Date
fields[5].dataType = TimestampType() #Time
fields[9].dataType = FloatType() #X
fields[10].dataType = FloatType() #Y
fields[12].dataType = LongType() #PdId
fields

[StructField(IncidntNum,StringType,true),
 StructField(Category,StringType,true),
 StructField(Descript,StringType,true),
 StructField(DayOfWeek,StringType,true),
 StructField(Date,DateType,true),
 StructField(Time,TimestampType,true),
 StructField(PdDistrict,StringType,true),
 StructField(Resolution,StringType,true),
 StructField(Address,StringType,true),
 StructField(X,FloatType,true),
 StructField(Y,FloatType,true),
 StructField(Location,StringType,true),
 StructField(PdId,LongType,true)]

As part of the data acquisition process, extracting the headers also provides an opportunity to clean them up. Although this is not necessary, we can change the headings to something that's more understandable. For example:

In [5]:
# Change `IncidntNum` to `Incident`
fields[0].name = "Incident"

# Change `DayOfWeek` to `Day`
fields[3].name = "Day"

# Change `Descript` to `Description`
fields[2].name = "Description"
fields

[StructField(Incident,StringType,true),
 StructField(Category,StringType,true),
 StructField(Description,StringType,true),
 StructField(Day,StringType,true),
 StructField(Date,DateType,true),
 StructField(Time,TimestampType,true),
 StructField(PdDistrict,StringType,true),
 StructField(Resolution,StringType,true),
 StructField(Address,StringType,true),
 StructField(X,FloatType,true),
 StructField(Y,FloatType,true),
 StructField(Location,StringType,true),
 StructField(PdId,LongType,true)]

Now that the data types have been changed, we can use the fields to construct the schema. This will be used later as we construct the dataframe. 

In [6]:
# Create the schema
schema = StructType(fields)

Before creating the dataframe manually, a good practice is to strip out the header row, using Spark's `subtract()` method, so that it does not conflict with the actual data.

In [8]:
dataHeader = data.filter(lambda x: "PdId" in x)
dataHeader.collect()

# Remove the header data collected
dataNoHeader = data.subtract(dataHeader)
dataNoHeader.first()

u'130294262,LARCENY/THEFT,THEFT FROM MERCHANT OR LIBRARY,Wednesday,04/10/2013,20:38,MISSION,NONE,4000 Block of 18TH ST,-122.434457353955,37.7609766090845,"(37.7609766090845, -122.434457353955)",13029426206394'
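As an alternative to `subtract()`, the header can also be dropped with a simple comparison against the `header` string captured earlier. The sketch below uses a plain Python list in place of the RDD so that it runs without a cluster; the helper name `is_data_row` and the shortened sample lines are assumptions made purely for illustration.

```python
# Header line, as returned by data.first() earlier
header = ("IncidntNum,Category,Descript,DayOfWeek,Date,Time,"
          "PdDistrict,Resolution,Address,X,Y,Location,PdId")

# Stand-in for the RDD: the header plus two shortened lines (illustration only)
lines = [
    header,
    "130294262,LARCENY/THEFT",
    "150027849,NON-CRIMINAL",
]

def is_data_row(line):
    """Hypothetical helper: keep every line that is not exactly the header."""
    return line != header

# Keep only the lines that pass the predicate
data_rows = [line for line in lines if is_data_row(line)]
print(len(data_rows))
```

On the real RDD the same predicate could be passed to `filter`, for example `data.filter(lambda line: line != header)`. Unlike the `"PdId" in x` test, an exact comparison cannot match a data row that merely contains that text, and a `filter` avoids the shuffle that `subtract()` typically needs.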

Now the first row starts with the actual data. So now that we have raw data and the schema, we can create a dataframe.

In [9]:
df = sqlContext.createDataFrame(dataNoHeader, schema)
df.head()

Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 25, slave-3): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/spark/python/pyspark/sql/types.py", line 546, in toInternal
    raise ValueError("Unexpected tuple %r with StructType" % obj)
ValueError: Unexpected tuple u'130294262,LARCENY/THEFT,THEFT FROM MERCHANT OR LIBRARY,Wednesday,04/10/2013,20:38,MISSION,NONE,4000 Block of 18TH ST,-122.434457353955,37.7609766090845,"(37.7609766090845, -122.434457353955)",13029426206394' with StructType

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212)
	at org.apache.spark.sql.execution.EvaluatePython$$anonfun$takeAndServe$1.apply$mcI$sp(python.scala:126)
	at org.apache.spark.sql.execution.EvaluatePython$$anonfun$takeAndServe$1.apply(python.scala:124)
	at org.apache.spark.sql.execution.EvaluatePython$$anonfun$takeAndServe$1.apply(python.scala:124)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
	at org.apache.spark.sql.execution.EvaluatePython$.takeAndServe(python.scala:124)
	at org.apache.spark.sql.execution.EvaluatePython.takeAndServe(python.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/spark/python/pyspark/sql/types.py", line 546, in toInternal
    raise ValueError("Unexpected tuple %r with StructType" % obj)
ValueError: Unexpected tuple u'130294262,LARCENY/THEFT,THEFT FROM MERCHANT OR LIBRARY,Wednesday,04/10/2013,20:38,MISSION,NONE,4000 Block of 18TH ST,-122.434457353955,37.7609766090845,"(37.7609766090845, -122.434457353955)",13029426206394' with StructType

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	... 1 more


As can be seen from the above result, one needs to have a very definite understanding of the type of data they are dealing with. In this case, each element of the RDD is still a single unsplit string rather than a tuple of typed values, so it cannot be matched against the `StructType`. Keeping in mind that we are working with __Big Data__, we will also see that not all of the raw data in the rows conforms to the specific schema we have created. So another option is to leverage `spark-csv`.
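One way to address the error above is to split each line and cast its fields before handing the RDD to `createDataFrame`. The following is a minimal sketch rather than a definitive implementation: `parse_row` is a hypothetical helper, it is written for Python 3 (the session above is Python 2, where the `csv` module handles unicode differently), and combining `Date` and `Time` into one value for the `TimestampType` column is an assumption.

```python
import csv
from datetime import datetime

def parse_row(line):
    """Hypothetical helper: split one raw CSV line and cast fields to the schema types."""
    # csv.reader honours the quotes around Location, which contains a comma
    f = next(csv.reader([line]))
    date = datetime.strptime(f[4], "%m/%d/%Y").date()
    # Assumption: merge Date and Time so the value fits a TimestampType column
    timestamp = datetime.strptime(f[4] + " " + f[5], "%m/%d/%Y %H:%M")
    return (f[0], f[1], f[2], f[3], date, timestamp, f[6], f[7], f[8],
            float(f[9]), float(f[10]), f[11], int(f[12]))

# The data row shown in the error message above
sample = ('130294262,LARCENY/THEFT,THEFT FROM MERCHANT OR LIBRARY,Wednesday,'
          '04/10/2013,20:38,MISSION,NONE,4000 Block of 18TH ST,'
          '-122.434457353955,37.7609766090845,'
          '"(37.7609766090845, -122.434457353955)",13029426206394')

row = parse_row(sample)
print(len(row))                 # number of fields after quote-aware parsing
print(len(sample.split(",")))   # a naive split also cuts inside Location
```

Under these assumptions, the conversion would look something like `sqlContext.createDataFrame(dataNoHeader.map(parse_row), schema)`. Rows that do not parse cleanly would still raise an error, so in practice they would need to be filtered out or handled first.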

#### Using Spark-csv  
The next procedure we will use to get the data into Spark is `spark-csv` from [__Databricks__](http://spark-packages.org/package/databricks/spark-csv). This package allows us to import `.csv` data into a Spark DataFrame, using the example below:

In [1]:
# HDFS location of the downloaded file
data = "hdfs://master:54310/data/SFPD_Incidents.csv"

# Use sqlContext to read and load the file, capturing the header and schema
df = sqlContext.read.load(data,
                          format="com.databricks.spark.csv",
                          header="true",
                          infereSchema="true")

# Take the first row
df.take(1)

[Row(IncidntNum=u'150027849', Category=u'NON-CRIMINAL', Descript=u'SEARCH WARRANT SERVICE', DayOfWeek=u'Friday', Date=u'01/09/2015', Time=u'22:02', PdDistrict=u'NORTHERN', Resolution=u'ARREST, BOOKED', Address=u'200 Block of LAGUNA ST', X=u'-122.425722704575', Y=u'37.7734305349811', Location=u'(37.7734305349811, -122.425722704575)', PdId=u'15002784975025')]

There are a few important things to note from the output above. __Firstly__, the raw formatting may not be helpful in describing the data. Therefore, another option to display this is shown below: 

In [2]:
# Show the first row
df.show(1)

+----------+------------+--------------------+---------+----------+-----+----------+--------------+--------------------+-----------------+----------------+--------------------+--------------+
|IncidntNum|    Category|            Descript|DayOfWeek|      Date| Time|PdDistrict|    Resolution|             Address|                X|               Y|            Location|          PdId|
+----------+------------+--------------------+---------+----------+-----+----------+--------------+--------------------+-----------------+----------------+--------------------+--------------+
| 150027849|NON-CRIMINAL|SEARCH WARRANT SE...|   Friday|01/09/2015|22:02|  NORTHERN|ARREST, BOOKED|200 Block of LAGU...|-122.425722704575|37.7734305349811|(37.7734305349811...|15002784975025|
+----------+------------+--------------------+---------+----------+-----+----------+--------------+--------------------+-----------------+----------------+--------------------+--------------+
only showing top 1 row



The `show()` function attempts to display the formatting better, but may not be the best display output if the number of columns exceeds the width of the Notebook. __Secondly__, the option in the cell above is spelled `infereSchema` rather than `inferSchema`. Assuming `spark-csv` silently ignores options it does not recognize, schema inference was never actually requested, which would explain why every column comes back as a string, as is seen from the output below.

In [3]:
# Show the Schema
df.printSchema()
df.dtypes

root
 |-- IncidntNum: string (nullable = true)
 |-- Category: string (nullable = true)
 |-- Descript: string (nullable = true)
 |-- DayOfWeek: string (nullable = true)
 |-- Date: string (nullable = true)
 |-- Time: string (nullable = true)
 |-- PdDistrict: string (nullable = true)
 |-- Resolution: string (nullable = true)
 |-- Address: string (nullable = true)
 |-- X: string (nullable = true)
 |-- Y: string (nullable = true)
 |-- Location: string (nullable = true)
 |-- PdId: string (nullable = true)



[('IncidntNum', 'string'),
 ('Category', 'string'),
 ('Descript', 'string'),
 ('DayOfWeek', 'string'),
 ('Date', 'string'),
 ('Time', 'string'),
 ('PdDistrict', 'string'),
 ('Resolution', 'string'),
 ('Address', 'string'),
 ('X', 'string'),
 ('Y', 'string'),
 ('Location', 'string'),
 ('PdId', 'string')]

As can be seen, every column in the resulting Schema is set to string. __Thirdly__, calling the `.csv` file from the local filesystem seems to produce errors stating that the file cannot be found. I'm assuming that this is because the file needs to be on all nodes of the Spark Cluster and not just the Master node. To circumvent this issue, the data file has been copied onto HDFS - as shown at the outset - to ensure that all nodes can access the data.
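To make the second point more concrete, the toy sketch below shows the general idea behind schema inference: try progressively looser types on each value and fall back to string. It is not the algorithm `spark-csv` uses; `guess_type` and the type labels are assumptions chosen for illustration, and the sample values are taken from the first row shown earlier.

```python
def guess_type(value):
    """Hypothetical helper: return the narrowest type label that accepts the value."""
    for caster, label in ((int, "long"), (float, "double")):
        try:
            caster(value)
            return label
        except ValueError:
            # This type does not fit, so try the next, looser one
            pass
    return "string"

# A few fields from the first row of the dataset
first_row = {
    "IncidntNum": "150027849",
    "Category": "NON-CRIMINAL",
    "Date": "01/09/2015",
    "X": "-122.425722704575",
    "PdId": "15002784975025",
}

guessed = {name: guess_type(value) for name, value in first_row.items()}
print(guessed)
```

A real implementation has to look at many rows, not one, and needs extra rules for dates and times, which is part of why knowing the source's own column descriptions remains useful.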

As a *side note*, once the data has been captured as a Spark DataFrame, it can be converted to a __Pandas__ dataframe by making use of the `toPandas()` function on the Spark DataFrame, as shown below. 

Pandas dataframes differ from Spark dataframes in a number of ways. For more information on this, see [6 differences between Pandas and Spark DataFrames](https://medium.com/@chris_bour/6-differences-between-pandas-and-spark-dataframes-1380cec394d2#.x2a9hwn4z).  

In [4]:
df = df.toPandas()

In [5]:
df.head()

| | IncidntNum | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Resolution | Address | X | Y | Location | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 150027849 | NON-CRIMINAL | SEARCH WARRANT SERVICE | Friday | 01/09/2015 | 22:02 | NORTHERN | ARREST, BOOKED | 200 Block of LAGUNA ST | -122.425722704575 | 37.7734305349811 | (37.7734305349811, -122.425722704575) | 15002784975025 |
| 1 | 150046504 | VANDALISM | MALICIOUS MISCHIEF, VANDALISM OF VEHICLES | Thursday | 01/15/2015 | 19:00 | SOUTHERN | NONE | 400 Block of THE EMBARCADEROSOUTH ST | -122.39670853026 | 37.7978728855933 | (37.7978728855933, -122.39670853026) | 15004650428160 |
| 2 | 150048817 | LARCENY/THEFT | PETTY THEFT SHOPLIFTING | Friday | 01/16/2015 | 16:26 | CENTRAL | NONE | 200 Block of GRANT AV | -122.405254463024 | 37.7892525040522 | (37.7892525040522, -122.405254463024) | 15004881706363 |
| 3 | 140009459 | ARSON | ARSON | Saturday | 01/04/2014 | 03:52 | NORTHERN | ARREST, BOOKED | SACRAMENTO ST / POLK ST | -122.420874632415 | 37.7914943051906 | (37.7914943051906, -122.420874632415) | 14000945926030 |
| 4 | 140042001 | FRAUD | CREDIT CARD, THEFT BY USE OF | Wednesday | 01/15/2014 | 12:30 | CENTRAL | NONE | SANSOME ST / BUSH ST | -122.400748631911 | 37.7911776792224 | (37.7911776792224, -122.400748631911) | 14004200109320 |


In [6]:
df.dtypes

IncidntNum    object
Category      object
Descript      object
DayOfWeek     object
Date          object
Time          object
PdDistrict    object
Resolution    object
Address       object
X             object
Y             object
Location      object
PdId          object
dtype: object

Unfortunately, converting to a Pandas dataframe does not help: because every Spark column was a string, every Pandas column now has the generic `object` dtype. So once again we still don't have a clear idea of the actual schema, and we will have to manually prepare the schema.
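If we stay in Pandas, the string columns can be cast by hand. The sketch below assumes Pandas is installed and uses a small hand-built frame (`pdf`, a hypothetical name) holding three of the columns from the first two rows above, rather than the full dataset.

```python
import pandas as pd

# Three columns from the first two rows shown above, all as strings
pdf = pd.DataFrame({
    "Date": ["01/09/2015", "01/15/2015"],
    "X": ["-122.425722704575", "-122.39670853026"],
    "PdId": ["15002784975025", "15004650428160"],
})

# Cast each column explicitly instead of relying on inference
pdf["Date"] = pd.to_datetime(pdf["Date"], format="%m/%d/%Y")
pdf["X"] = pd.to_numeric(pdf["X"])
pdf["PdId"] = pd.to_numeric(pdf["PdId"])

print(pdf.dtypes)
```

This works for data that fits in the memory of a single machine, so it is better suited to inspecting a sample than to preparing the full dataset.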



# Appendix A: Using Pandas and JSON
Pandas also provides a method of reading `.csv` files, the result of which can then be converted to a Spark DataFrame. For an example on how to work with a `.csv` file in Pandas, see [Chris Albon's](http://chrisalbon.com/python/pandas_dataframe_importing_csv.html) post.

In [None]:
import pandas as pd
pd_csv = pd.read_csv("CHI_Incidents.csv")
pd_df = sqlContext.createDataFrame(pd_csv)
pd_df.take(1)

In [None]:
#pd_df.printSchema()
pd_df.dtypes

In [None]:
pd_df.show(1)
pd_csv.head(1)

In [None]:
pd_csv.dtypes

Furthermore, Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. This conversion can be done using `SQLContext.read.json` on a JSON file.

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.
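The difference between the two layouts can be illustrated with the standard `json` module. In this sketch, `records` holds two cut-down incidents from the rows shown earlier and `to_json_lines` is a hypothetical helper; neither comes from the original dataset export.

```python
import json

# Two cut-down records, using values from the rows shown earlier
records = [
    {"IncidntNum": "150027849", "Category": "NON-CRIMINAL"},
    {"IncidntNum": "150046504", "Category": "VANDALISM"},
]

def to_json_lines(objs):
    """Hypothetical helper: serialize each object onto its own line."""
    return "\n".join(json.dumps(obj, sort_keys=True) for obj in objs)

# One self-contained JSON object per line
json_lines = to_json_lines(records)
print(json_lines)

# A regular pretty-printed JSON array, spread over many lines
pretty = json.dumps(records, indent=2)
print(pretty)
```

Each line of `json_lines` parses on its own, which is the layout described above, whereas `pretty` only parses when read as a whole.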

In [None]:
df = sqlContext.read.load("hdfs://master:54310/data/incidents.json", format='json')

In [None]:
df.printSchema()

In [None]:
input_csv = "hdfs://master:54310/data/incidents.csv"
df = sqlContext.read.load(input_csv, format='com.databricks.spark.csv', header='true', inferSchema='true')
#df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("hdfs://master:54310/data/incidents.csv")
#df.printSchema()
df.take(5)
