Get the earthquake data for January 2016 from the USGS API at http://earthquake.usgs.gov:
http://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2016-01-01&endtime=2016-01-31

Upload the file to your object store and name it earthquake.csv for this notebook.
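
The query URL above can also be assembled from its parameters, which makes it easier to change the date range. This is a minimal sketch in plain Scala; `usgsQueryUrl` is a hypothetical helper name, not part of any USGS or Spark API, and it only covers the three parameters that appear in this notebook.

```scala
// Hypothetical helper: builds the USGS event query URL used in this notebook.
// Only format, starttime and endtime are handled; values are not URL-encoded.
def usgsQueryUrl(startTime: String, endTime: String, format: String = "csv"): String = {
  val base = "http://earthquake.usgs.gov/fdsnws/event/1/query"
  val params = Seq("format" -> format, "starttime" -> startTime, "endtime" -> endTime)
  base + "?" + params.map { case (k, v) => k + "=" + v }.mkString("&")
}

val url = usgsQueryUrl("2016-01-01", "2016-01-31")
println(url)
```

Calling it with the dates above reproduces the URL shown at the top of this notebook.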

In [1]:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

In [2]:
val earthquakes = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("swift://notebooks.spark/earthquake.csv")

In [3]:
earthquakes.registerTempTable("earthquake")

In [4]:
earthquakes.printSchema

root
 |-- time: string (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- depth: double (nullable = true)
 |-- mag: double (nullable = true)
 |-- magType: string (nullable = true)
 |-- nst: integer (nullable = true)
 |-- gap: double (nullable = true)
 |-- dmin: double (nullable = true)
 |-- rms: double (nullable = true)
 |-- net: string (nullable = true)
 |-- id: string (nullable = true)
 |-- updated: string (nullable = true)
 |-- place: string (nullable = true)
 |-- type: string (nullable = true)
 |-- horizontalError: double (nullable = true)
 |-- depthError: double (nullable = true)
 |-- magError: double (nullable = true)
 |-- magNst: integer (nullable = true)
 |-- status: string (nullable = true)
 |-- locationSource: string (nullable = true)
 |-- magSource: string (nullable = true)



The query below fetches all earthquake records whose place contains "California".

In [5]:
val results = sqlContext.sql("select earthquake.* from earthquake where earthquake.place like '%California%'")
results.collect

In [6]:
results.show

The block below uses Spark's machine learning library (MLlib).
The two imports below bring in the KMeans clustering algorithm and the Vectors factory that Spark provides.

In [7]:
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

In [8]:
val earthquakeFile = sc.textFile("swift://notebooks.spark/earthquake.csv")

Check the CSV structure

In [9]:
earthquakeFile.take(1)

Array(time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource)

In [10]:
val californiaData=earthquakeFile.filter(_.contains("California"))

The statement above keeps only the lines of the file that contain the text "California".

In [11]:
val caliDataOn30th  = californiaData.filter(_.contains("reviewed"))

The statement above further filters the California lines down to those that contain the text "reviewed", which is the value of the status column for reviewed events.
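
Because `contains` matches anywhere in the line, the two filters above also accept lines where "California" or "reviewed" appears in an unintended position, for example a place in Baja California, Mexico. A stricter check could split each line into fields and test the place and status columns directly. The following is a rough sketch in plain Scala, not the notebook's method: `splitCsvLine`, `isReviewedCalifornia` and `sampleLine` are hypothetical names, the sample values are made up, the splitter does not handle escaped quotes, and the `endsWith("California")` test assumes in-state place strings end with the state name.

```scala
// Column positions taken from the header printed by earthquakeFile.take(1).
val PlaceIdx = 13
val StatusIdx = 19

// Simplified CSV splitter: commas inside double quotes do not split fields.
// Escaped quotes ("") are not handled.
def splitCsvLine(line: String): Vector[String] = {
  val fields = Vector.newBuilder[String]
  val current = new StringBuilder
  var inQuotes = false
  for (c <- line) {
    if (c == '"') inQuotes = !inQuotes
    else if (c == ',' && !inQuotes) { fields += current.toString; current.clear() }
    else current.append(c)
  }
  fields += current.toString
  fields.result()
}

// Assumption: place strings for in-state events end with "California".
def isReviewedCalifornia(line: String): Boolean = {
  val f = splitCsvLine(line)
  f.length > StatusIdx && f(PlaceIdx).endsWith("California") && f(StatusIdx) == "reviewed"
}

// Made-up sample rows with the same 22-column layout as the file.
def sampleLine(place: String, status: String): String =
  "2016-01-30T00:00:00.000Z,33.5,-116.5,5.2,1.1,ml,10,60.0,0.05,0.2,ci,ci000001," +
  "2016-01-30T01:00:00.000Z,\"" + place + "\",earthquake,0.3,0.5,0.1,8," + status + ",ci,ci"

val inState = sampleLine("10km NE of Exampletown, California", "reviewed")
val baja = sampleLine("60km SW of Exampleville, Baja California, Mexico", "reviewed")
println(isReviewedCalifornia(inState)) // true
println(isReviewedCalifornia(baja))    // false, although baja.contains("California") is true
```

A predicate like this could be passed to `earthquakeFile.filter` in place of the two substring filters.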

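To summarize the depth values, create a Vector from the depth column (index 3) of each line, as shown below. Note that k-means with a single cluster yields the mean depth, not the maximum.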

In [12]:
val caliFilteredDataVetcor = caliDataOn30th.map(line=>Vectors.dense(line.split(',').slice(3,4).map(_.toDouble)))
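
The mapping above calls `toDouble` on the depth field directly, so a single line with an empty or non-numeric depth would fail the job. The following is a minimal sketch of a more defensive parse in plain Scala; `parseDepth` is a hypothetical helper and the sample lines are made up.

```scala
import scala.util.Try

// Depth is the fourth column (index 3) in the header shown earlier.
// Returns None when the field is missing or is not a number.
def parseDepth(line: String): Option[Double] = {
  val fields = line.split(',')
  if (fields.length > 3) Try(fields(3).toDouble).toOption else None
}

println(parseDepth("2016-01-30T00:00:00.000Z,33.5,-116.5,5.2,1.1")) // Some(5.2)
println(parseDepth("2016-01-30T00:00:00.000Z,33.5,-116.5,,1.1"))    // None
```

In the notebook, something like `caliDataOn30th.flatMap(line => parseDepth(line)).map(d => Vectors.dense(d))` would then skip unparseable lines instead of throwing.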

From the Spark MLlib API documentation for KMeans.train:
public static KMeansModel train(RDD<Vector> data,
                int k,
                int maxIterations)
Trains a k-means model using specified parameters and the default values for unspecified.

In [13]:
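// Train k-means with k = 1 cluster and at most 10 iterations.
val model=KMeans.train(caliFilteredDataVetcor,1,10)
val clusterCenters=model.clusterCenters.map(_.toArray)
// With a single cluster, the only center is the mean of the depth values.
clusterCenters.foreach(lines=>println(lines(0)))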

5.257182741116755
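
With k = 1 there is only one cluster center, and k-means moves it to the mean of all points, so the value printed above is the mean depth of the filtered events. The sketch below illustrates this with a tiny one-dimensional version of Lloyd's algorithm in plain Scala. It is for illustration only and is not how Spark MLlib implements KMeans; `kMeans1D` is a hypothetical name and the depth values are made up.

```scala
// Minimal 1-D Lloyd's k-means, for illustration only.
def kMeans1D(points: Seq[Double], initialCenters: Seq[Double], maxIterations: Int): Seq[Double] = {
  var centers = initialCenters
  for (_ <- 1 to maxIterations) {
    // Assign each point to its nearest center.
    val groups = points.groupBy(p => centers.minBy(c => math.abs(p - c)))
    // Move each center to the mean of its assigned points (unchanged if it has none).
    centers = centers.map(c => groups.get(c).map(ps => ps.sum / ps.size).getOrElse(c))
  }
  centers
}

val depths = Seq(2.0, 4.0, 6.0, 12.0)  // made-up depths
val mean = depths.sum / depths.size
val center = kMeans1D(depths, Seq(depths.head), 10).head
println(center == mean) // true
```

On the notebook's data, a similar cross-check might be `caliFilteredDataVetcor.map(_(0)).mean()`, assuming the implicit conversions for RDDs of doubles are in scope in your Spark version. For a maximum, `max()` on the same RDD of doubles would be the more direct route.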
