# Connecting Spark to the Cosmos DB Change Feed
In this sample, we connect to the Cosmos DB change feed using Spark and the Cosmos DB Java SDK. We read new changes and keep a running count of them. The collection we read from contains documents with product information. To learn more about Azure Cosmos DB and the change feed, see the [main page](https://docs.microsoft.com/azure/cosmos-db/introduction) or the [change feed page](https://docs.microsoft.com/azure/cosmos-db/change-feed).
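Conceptually, the change feed behaves like an ordered log of changed documents, and a reader remembers its position with a continuation token so that each read returns only what changed since the previous one. The sketch below models that idea with a hypothetical `InMemoryChangeFeed` class; it is not the Cosmos DB SDK, and it deliberately ignores partitions, deletes, and the way the real service may return only the latest version of a document that changed several times between reads.

```python
# Hypothetical in-memory model of a change feed (illustration only, not the SDK):
# an append-only log of changed documents plus an integer continuation token.
from typing import Any, Dict, List, Tuple


class InMemoryChangeFeed:
    """Append-only log of document changes (simplified, hypothetical)."""

    def __init__(self) -> None:
        self._log: List[Dict[str, Any]] = []

    def upsert(self, doc: Dict[str, Any]) -> None:
        # Every insert or update appends a copy of the document to the log.
        self._log.append(dict(doc))

    def read(self, token: int = 0) -> Tuple[List[Dict[str, Any]], int]:
        # Return the changes recorded after `token`, plus the token for the next read.
        return self._log[token:], len(self._log)


feed = InMemoryChangeFeed()
feed.upsert({"id": "700", "ProductNumber": "FR-R92B-58"})
batch, token = feed.read()            # first read sees one change
feed.upsert({"id": "701", "Name": "hypothetical product"})
batch2, token = feed.read(token)      # second read sees only the new change
print(len(batch), len(batch2), token)
```

Passing the saved token back is what keeps a reader from re-counting old changes, which is the behavior the notebook relies on when it reruns its counting cell.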

## Configuration
Configure the environment with the Cosmos DB connector jar files, which are available on [GitHub](https://github.com/Azure/azure-cosmosdb-spark/tree/master/releases/azure-cosmosdb-spark_2.1.0_2.11-0.0.4).

In [1]:
```
%%configure
{ "name": "Spark-to-Cosmos_DB_Connector",
  "executorMemory": "8G",
  "executorCores": 2,
  "numExecutors": 2,
  "driverCores": 2,
  "jars": ["wasb:///example/jars/0.0.3.2/azure-documentdb-1.12.0.jar", "wasb:///example/jars/0.0.3.2/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar", "wasb:///example/jars/0.0.3.2/azure-documentdb-rx-0.9.0-rc1.jar", "wasb:///example/jars/0.0.3.2/rxjava-1.3.0.jar"],
  "conf": {
    "spark.jars.packages": "graphframes:graphframes:0.5.0-spark2.1-s_2.11",
    "spark.jars.excludes": "org.scala-lang:scala-reflect"
  }
}
```

| ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
|---|---|---|---|---|---|---|
| 130 | application_1503579980138_0005 | spark | idle | Link | Link | |
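Hand-editing the `%%configure` JSON is error-prone (a stray trailing comma makes the cell fail). As a sketch, the body can be generated with `json.dumps`; `build_configure_payload` is a hypothetical helper, and its keys simply mirror the session fields used in the cell above.

```python
# Generate the JSON body for a %%configure cell (hypothetical helper).
import json
from typing import Dict, List, Optional


def build_configure_payload(
    name: str,
    jars: List[str],
    executor_memory: str = "8G",
    executor_cores: int = 2,
    num_executors: int = 2,
    driver_cores: int = 2,
    conf: Optional[Dict[str, str]] = None,
) -> str:
    payload = {
        "name": name,
        "executorMemory": executor_memory,
        "executorCores": executor_cores,
        "numExecutors": num_executors,
        "driverCores": driver_cores,
        "jars": list(jars),
        "conf": dict(conf or {}),
    }
    # json.dumps always emits valid JSON: quoted keys, no trailing commas.
    return json.dumps(payload, indent=2)


JAR_DIR = "wasb:///example/jars/0.0.3.2"
jars = [f"{JAR_DIR}/{jar}" for jar in (
    "azure-documentdb-1.12.0.jar",
    "azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar",
    "azure-documentdb-rx-0.9.0-rc1.jar",
    "rxjava-1.3.0.jar",
)]
body = build_configure_payload(
    "Spark-to-Cosmos_DB_Connector",
    jars,
    conf={
        "spark.jars.packages": "graphframes:graphframes:0.5.0-spark2.1-s_2.11",
        "spark.jars.excludes": "org.scala-lang:scala-reflect",
    },
)
print("%%configure\n" + body)  # paste the printed text into a new cell
```

Keeping the jar directory in one variable makes it easier to switch to a different connector release later.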


## Connecting to Cosmos DB change feed 
To read the change feed from Spark, we need to pass several configuration options to the connector:

**Endpoint**: your Cosmos DB URL (e.g. https://youraccount.documents.azure.com:443/)

**Masterkey**: the master key string for your Cosmos DB account

**Database**: name of your existing database

**Collection**: name of your existing collection from which you wish to read the change feed

**ChangeFeedQueryName**: a unique string that identifies your change feed query
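The options above are plain string key/value pairs. A minimal sketch of assembling them with a basic sanity check is shown here; `make_change_feed_config` is a hypothetical helper, and the extra keys it sets (`ReadChangeFeed`, `ChangeFeedStartFromTheBeginning`, `ChangeFeedUseNextToken`, `RollingChangeFeed`, `ChangeFeedCheckpointLocation`, `SamplingRatio`) are copied from this notebook's configuration cell rather than from a full list of connector options.

```python
# Hypothetical helper that assembles the connector options used in this notebook.
from typing import Dict

REQUIRED = ("Endpoint", "Masterkey", "Database", "Collection", "ChangeFeedQueryName")


def make_change_feed_config(
    endpoint: str,
    master_key: str,
    database: str,
    collection: str,
    query_name: str,
    start_from_beginning: bool = False,
    use_next_token: bool = True,
    rolling: bool = False,
    checkpoint_location: str = "./changefeedcheckpointlocation",
) -> Dict[str, str]:
    if not endpoint.startswith("https://"):
        raise ValueError("Endpoint should be an https:// URL")
    config = {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
        "ReadChangeFeed": "true",
        "ChangeFeedQueryName": query_name,
        # Booleans become strings, matching the str(...) calls in the notebook.
        "ChangeFeedStartFromTheBeginning": str(start_from_beginning),
        "ChangeFeedUseNextToken": str(use_next_token),
        "RollingChangeFeed": str(rolling),
        "ChangeFeedCheckpointLocation": checkpoint_location,
        "SamplingRatio": "1.0",
    }
    # Fail early if any of the options documented above is empty.
    missing = [key for key in REQUIRED if not config[key]]
    if missing:
        raise ValueError(f"Missing required options: {missing}")
    return config


productsConfig = make_change_feed_config(
    "https://youraccount.documents.azure.com:443/",
    "<your-master-key>",
    "AW",
    "AWProducts",
    "products-change-feed",  # hypothetical query name
)
```

Validating before calling `spark.read` surfaces a missing option as a clear Python error instead of a failure deep inside the connector.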


In [2]:
```python
# Change feed behavior flags
rollingChangeFeed = False
startFromTheBeginning = False
useNextToken = True

productsConfig = {
    "Endpoint": "https://youraccount.documents.azure.com:443/",
    "Masterkey": "<your-master-key>",
    "Database": "AW",
    "Collection": "AWProducts",
    "ReadChangeFeed": "true",
    # Derive a query name from the flags, e.g. "FalseFalseTrue"
    "ChangeFeedQueryName": str(rollingChangeFeed) + str(startFromTheBeginning) + str(useNextToken),
    "ChangeFeedStartFromTheBeginning": str(startFromTheBeginning),
    "ChangeFeedUseNextToken": str(useNextToken),
    "RollingChangeFeed": str(rollingChangeFeed),
    "ChangeFeedCheckpointLocation": "./changefeedcheckpointlocation",
    "SamplingRatio": "1.0"
}
```

Starting Spark application


| ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
|---|---|---|---|---|---|---|
| 142 | application_1503579980138_0017 | pyspark | idle | Link | Link | ✔ |


SparkSession available as 'spark'.
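Rather than pasting the account key into the configuration dictionary, you may prefer to read it at runtime, where your environment lets you set variables on the Spark driver. This is a minimal sketch; the helper `load_master_key` and the variable name `COSMOSDB_MASTER_KEY` are hypothetical choices, not something the connector looks for.

```python
# Read the Cosmos DB key from an environment variable instead of the notebook.
import os


def load_master_key(var_name: str = "COSMOSDB_MASTER_KEY") -> str:
    key = os.environ.get(var_name, "")
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Cosmos DB key")
    return key
```

With the variable set, `productsConfig["Masterkey"] = load_master_key()` fills in the key before the DataFrame is loaded, and the key never ends up in the saved notebook.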


Load the change feed into a DataFrame:

In [3]:
```python
products = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(**productsConfig).load()
```

Initialize the count of new products seen on the change feed to 0:

In [4]:
```python
new_product_count = 0
```

Rerun the cell below to check for new changes and add them to the running count.

In [8]:
```python
products.show()
new_product_count += products.count()
```

```
+-------------+--------------------+--------+--------------------+-----+--------------------+------+----+-------------+------------+-----------+---+--------------------+----------+-------+-------+----------+------------+-------------+--------------------+-----+
|ProductNumber|               _etag|Category|                _rid|Model|         Description|Rating|Size|SubCategories|_attachments|SellEndDate| id|               _self|CategoryId|DocType| Weight|       _ts|CategoryName|SellStartDate|                Name|Color|
+-------------+--------------------+--------+--------------------+-----+--------------------+------+----+-------------+------------+-----------+---+--------------------+----------+-------+-------+----------+------------+-------------+--------------------+-----+
|   FR-R92B-58|"06000400-0000-00...|    null|01NUAOs2KABDAQAAA...| null|Our lightest and ...|  null|  58|         null|attachments/|       null|700|dbs/01NUAA==/coll...|      null|   null|1016.04|1503685803|
```

In [9]:
```python
print(new_product_count)
```

1
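Instead of rerunning the counting cell by hand, the same accumulation can be done in a loop. The sketch below takes any zero-argument callable that returns the number of new changes; `poll_new_changes` is a hypothetical helper, and it assumes (as rerunning the cell in this notebook does) that each call triggers a fresh read from the last continuation token.

```python
# Accumulate change counts over several polls (hypothetical helper).
import time
from typing import Callable


def poll_new_changes(read_batch_count: Callable[[], int], polls: int, interval_s: float = 0.0) -> int:
    """Call read_batch_count `polls` times, sleeping between calls; return the total."""
    total = 0
    for i in range(polls):
        total += read_batch_count()
        if i < polls - 1:
            time.sleep(interval_s)
    return total


# Offline demo with a fake source standing in for products.count
fake_batches = iter([1, 0, 3])
total = poll_new_changes(lambda: next(fake_batches), polls=3)
print(total)
```

In the notebook this would look like `new_product_count += poll_new_changes(products.count, polls=10, interval_s=30)`, which mirrors rerunning the cell ten times (without the `show()` call).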