# Scala example using Spark SQL over Cloudant as a source

This sample notebook is written in Scala and expects the Scala 2.10 runtime. Make sure the kernel is started and connected before you execute this notebook.

The data source for this example can be found at: http://examples.cloudant.com/crimes/

Replicate the database into your own Cloudant account before you execute this script.

## 1. Work with the Spark Context

A Spark Context handle `sc` is available in every notebook created in the Spark Service. Use it to check the Spark version and the environment settings, and to create a Spark SQL Context object from it.

In [1]:
sc.version

1.4.1

In [3]:
val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
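As a small sketch of the inspection described above, the following cell prints a few settings through the same `sc` handle. It assumes it runs inside this notebook, where `sc` already exists; the name `settings` is introduced here for illustration only.

```scala
// Assumes the notebook-provided SparkContext `sc`.
// Basic facts about the running application.
println(s"Spark version: ${sc.version}")
println(s"Master:        ${sc.master}")
println(s"Application:   ${sc.appName}")

// All explicitly set configuration entries as (key, value) pairs, sorted by key.
val settings: Array[(String, String)] = sc.getConf.getAll.sortBy(_._1)
settings.foreach { case (key, value) => println(s"$key = $value") }
```

Note that the printed configuration may include sensitive values if any were set on the Spark configuration.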

## 2. Work with a Cloudant database

A DataFrame object can be created directly from a Cloudant database. To configure the database as a source, pass these options and then give the database name to `load`:

1 - the package name that provides the connector classes extending `BaseRelation` (such as `CloudantDataSource`). For the Cloudant Spark connector this is `com.cloudant.spark`

2 - `cloudant.host` parameter to pass the Cloudant account host name (for example `examples.cloudant.com`)

3 - `cloudant.username` parameter to pass the Cloudant user name

4 - `cloudant.password` parameter to pass the Cloudant account password

In [4]:
val df = sqlCtx.read.format("com.cloudant.spark").option("cloudant.host","examples.cloudant.com").option("cloudant.username", "examples").option("cloudant.password","xxxx").load("customer")

Use dbName=customer, indexName=null, jsonstore.rdd.partitions=5, jsonstore.rdd.maxInPartition=-1, jsonstore.rdd.minInPartition=10, jsonstore.rdd.requestTimeout=100000,jsonstore.rdd.concurrentSave=-1,jsonstore.rdd.bulkSize=1
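The same connection can also be written with the settings collected in a `Map` and handed over in one call to `options`, which keeps credentials in one place. This is only a sketch: it assumes `sqlCtx` from the earlier cell and the connector on the classpath, and the names `cloudantOptions` and `customers` are made up for illustration.

```scala
// Assumes `sqlCtx` from the earlier cell and the Cloudant Spark connector on the classpath.
val cloudantOptions = Map(
  "cloudant.host"     -> "examples.cloudant.com", // replace with your own account host
  "cloudant.username" -> "examples",
  "cloudant.password" -> "xxxx"                   // placeholder, as in the cell above
)

// Equivalent to chaining three .option(...) calls.
val customers = sqlCtx.read
  .format("com.cloudant.spark")
  .options(cloudantOptions)
  .load("customer")
```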


## 3. Work with a DataFrame

At this point all transformations and functions should behave as specified for Spark SQL (http://spark.apache.org/sql/).

However, the Cloudant Spark connector does not yet support a number of features, and some things simply do not work. For that reason we call this connector a **BETA** release and are improving it gradually towards GA.

Please direct any change requests to [support@cloudant.com](mailto:support@cloudant.com).

In [5]:
df.printSchema()

root
 |-- ADDRESS1: struct (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- street number: string (nullable = true)
 |-- ADDRESS1_MB: string (nullable = true)
 |-- BRANCH_CODE: long (nullable = true)
 |-- CITY: string (nullable = true)
 |-- CITY_MB: string (nullable = true)
 |-- COUNTRY_CODE: long (nullable = true)
 |-- ORGANIZATION_CODE: string (nullable = true)
 |-- POSTAL_ZONE: string (nullable = true)
 |-- WAREHOUSE_BRANCH_CODE: long (nullable = true)
 |-- _id: string (nullable = true)
 |-- _rev: string (nullable = true)
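The schema shows that `ADDRESS1` is a struct, so its fields are not top-level columns. A minimal sketch of reaching into it with dot notation follows; it assumes `df` from the earlier cell, and `streets` is a name chosen here for illustration.

```scala
// Assumes `df` from the earlier cell.
// Dot notation selects the `street` field inside the ADDRESS1 struct.
val streets = df.select("ADDRESS1.street", "CITY")
streets.printSchema()

// Show only the first five rows.
streets.show(5)
```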



In [6]:
df.count()

3561

In [7]:
df.select("CITY").show()

+----------------+
|            CITY|
+----------------+
|           Paris|
|          Milano|
|       Amsterdam|
|         Hamburg|
|         München|
|           Kista|
|         Calgary|
|         Toronto|
|          Boston|
|         Seattle|
|     Los Angeles|
|           Miami|
|            Lyon|
|Distrito Federal|
|           Tokyo|
|      Osaka City|
|       Melbourne|
|          Bilbao|
|       São Paulo|
|          Kuopio|
+----------------+
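Beyond projecting a single column, the usual DataFrame operations can be combined. The sketch below filters on one of the cities listed above and counts documents per country code. It assumes `df` from the earlier cell; `boston` and `perCountry` are illustrative names, and no particular output is implied.

```scala
import org.apache.spark.sql.functions.desc

// Assumes `df` from the earlier cell.
// Keep only the rows for one city, then project two columns.
val boston = df.filter(df("CITY") === "Boston").select("CITY", "POSTAL_ZONE")
boston.show()

// Count documents per country code, largest groups first.
val perCountry = df.groupBy("COUNTRY_CODE").count().orderBy(desc("count"))
perCountry.show()
```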



In [9]:
df.select("ADDRESS1_MB").show()

+--------------------+
|         ADDRESS1_MB|
+--------------------+
|75, rue du Faubou...|
|     Piazza Duomo, 1|
| Singelgravenplein 4|
|      Schwabentor 35|
|    Leopoldstraße 36|
| Isafjordsgatan 30 C|
|7800, 756 - 6th A...|
|    789 Yonge Street|
|1288 Dorchester A...|
|     299 Yale Avenue|
|1288 South Barrin...|
|      10032 NW 186th|
| 6c, rue de l'Église|
|Prol. Paseo de la...|
|         202-2-3 ???|
|           543-225 ?|
|    2315 Queen's Ave|
|Plaza de la Const...|
|Avenida Paulista,...|
|       Kauppakatu 33|
+--------------------+



In [10]:
val add = df.select("ADDRESS1_MB")
add.registerTempTable("add")
add.printSchema()
add.show()

root
 |-- ADDRESS1_MB: string (nullable = true)

+--------------------+
|         ADDRESS1_MB|
+--------------------+
|75, rue du Faubou...|
|     Piazza Duomo, 1|
| Singelgravenplein 4|
|      Schwabentor 35|
|    Leopoldstraße 36|
| Isafjordsgatan 30 C|
|7800, 756 - 6th A...|
|    789 Yonge Street|
|1288 Dorchester A...|
|     299 Yale Avenue|
|1288 South Barrin...|
|      10032 NW 186th|
| 6c, rue de l'Église|
|Prol. Paseo de la...|
|         202-2-3 ???|
|           543-225 ?|
|    2315 Queen's Ave|
|Plaza de la Const...|
|Avenida Paulista,...|
|       Kauppakatu 33|
+--------------------+
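Once a DataFrame is registered as a temporary table, it can be queried with SQL through the same SQL context. The following is a sketch under a few assumptions: it reuses `df` and `sqlCtx` from the earlier cells, and it registers a second, hypothetical table name `addresses` so that the example does not depend on how the parser treats the name `add`.

```scala
// Assumes `df` and `sqlCtx` from the earlier cells.
// "addresses" is a hypothetical table name chosen for this sketch.
df.select("ADDRESS1_MB", "CITY").registerTempTable("addresses")

// Plain SQL over the temporary table; LIKE does a substring match here.
val streetRows = sqlCtx.sql(
  "SELECT CITY, ADDRESS1_MB FROM addresses WHERE ADDRESS1_MB LIKE '%Street%'")
streetRows.show()
```

The temporary table exists only for the lifetime of this SQL context and does not write anything back to Cloudant.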

