# Batch ingestion into Azure Cosmos DB collections

In this notebook, we'll:

+ Load the retail datasets (StoreDemoGraphics, RetailSales and Product) from ADLS Gen2 into DataFrames
+ Write each DataFrame to its Azure Cosmos DB collection

>**Did you know?**  [Azure Synapse Link for Azure Cosmos DB](https://review.docs.microsoft.com/en-us/azure/cosmos-db/synapse-link?branch=release-build-cosmosdb) is a hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB.

>**Did you know?**  [Azure Cosmos DB analytical store](https://review.docs.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction?branch=release-build-cosmosdb) is a fully isolated column store that enables large-scale analytics over the operational data in your Azure Cosmos DB without impacting your transactional workloads.
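The analytical store's real storage format is managed by the service, so the following is only a conceptual sketch of why a column layout suits analytics. It pivots hypothetical row-oriented documents (as a transactional store holds them) into one list per property, so an aggregate such as a sum reads a single column instead of every document. `to_columnar` is an illustrative helper, not part of any Azure API.

```python
from typing import Any, Dict, List


def to_columnar(items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented documents into one list per property.

    Properties missing from a document are filled with None so that
    all columns stay the same length.
    """
    columns: Dict[str, List[Any]] = {}
    for i, item in enumerate(items):
        for key in item:
            if key not in columns:
                # Back-fill earlier rows that lacked this property.
                columns[key] = [None] * i
        for key, values in columns.items():
            values.append(item.get(key))
    return columns


# Hypothetical sample documents, stored row by row.
items = [
    {"id": "1", "product": "Surface Pro", "qty": 3},
    {"id": "2", "product": "Surface Laptop", "qty": 1, "region": "EU"},
]
columns = to_columnar(items)

# An aggregate only needs to scan the "qty" column.
print(sum(columns["qty"]))  # 4
```

Schema variations between documents (here, `region` appearing only in the second item) simply become sparse columns in this toy model.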

>**Did you know?**  The Synapse workspace is attached to a default ADLS Gen2 storage account, and files placed in that account can be accessed with relative paths such as `/RetailData/Product.csv`, as in the code below.
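A relative path is shorthand for a full ADLS Gen2 URI of the form `abfss://<file_system>@<account>.dfs.core.windows.net/<path>`. Assuming the relative path resolves against the default container (file system) of the workspace's primary storage account, the sketch below builds the equivalent fully qualified URI. `to_abfss_uri`, `myfilesystem` and `mydatalake` are hypothetical names; substitute your own workspace's values. A full URI like this is what you would pass to `spark.read.csv` to read from a container other than the default one, provided the workspace has access to it.

```python
def to_abfss_uri(relative_path: str, file_system: str, account: str) -> str:
    """Build the fully qualified ADLS Gen2 URI for a workspace-relative path.

    file_system and account are placeholders for the default container
    and storage account attached to the Synapse workspace (assumption).
    """
    path = relative_path.lstrip("/")
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{path}"


# Hypothetical container and account names.
print(to_abfss_uri("/RetailData/Product.csv", "myfilesystem", "mydatalake"))
# abfss://myfilesystem@mydatalake.dfs.core.windows.net/RetailData/Product.csv
```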

## 1. Using your Synapse workspace, upload the CSV files from the **RetailData** folder of this repo to your Azure Synapse ADLS Gen2 account. Place them in a folder named **RetailData**.

<img src="https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/upload.PNG" alt="Upload" width="75%"/>


## 2. Load the data from ADLS Gen2 into Spark DataFrames


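```python
# Read each CSV file from the workspace's default ADLS Gen2 account.
# header=True takes column names from the first row; without inferSchema,
# every column is read as a string.
dfStoreDemoGraphics = spark.read.csv("/RetailData/StoreDemoGraphics.csv", header=True)

dfRetailSales = spark.read.csv("/RetailData/RetailSales.csv", header=True)

dfProduct = spark.read.csv("/RetailData/Product.csv", header=True)
```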
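Because the cell above passes `header=True` but not `inferSchema=True`, Spark takes column names from the first row and types every column as a string. Python's standard `csv.DictReader` behaves the same way, so the small local analogue below shows what that means for numeric fields. The column names and values are hypothetical, not taken from the RetailData files. In Spark itself you would either add `inferSchema=True` to the reader or cast the columns explicitly before writing.

```python
import csv
import io

# Hypothetical sample with a header row, standing in for one of the RetailData CSVs.
sample = "ProductCode,Price\nA100,199.99\nB200,49.50\n"

# Like header=True, DictReader uses the first row as the field names.
rows = list(csv.DictReader(io.StringIO(sample)))

# Values arrive as text, so numeric columns need an explicit cast before arithmetic.
total = sum(float(row["Price"]) for row in rows)
print(round(total, 2))  # 249.49
```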


## 3. Write the DataFrames to the Azure Cosmos DB collections

>**Did you know?** `cosmos.oltp` is the Spark data source format that connects to the Azure Cosmos DB transactional store.

>**Did you know?** Ingestion into an Azure Cosmos DB collection always goes through the transactional store, regardless of whether the analytical store is enabled. When the analytical store is enabled, the ingested data is then synchronized to it automatically.
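The write cell sets `spark.cosmos.write.upsertEnabled` to `true`. With upsert, writing an item whose `id` already exists in the same logical partition replaces that item, whereas a plain create fails with a conflict. The toy model below, a plain Python dictionary rather than any Cosmos DB SDK or connector code, sketches that uniqueness rule; `ToyContainer`, `ConflictError` and the partition key name `storeId` are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (partition key value, id)


class ConflictError(Exception):
    """Raised when a create targets an existing (partition key, id) pair."""


class ToyContainer:
    """Toy model: (partition key value, id) must be unique within a container."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.items: Dict[Key, dict] = {}

    def _key(self, item: dict) -> Key:
        return (str(item[self.partition_key]), item["id"])

    def create(self, item: dict) -> None:
        key = self._key(item)
        if key in self.items:
            raise ConflictError(f"item {key} already exists")
        self.items[key] = dict(item)

    def upsert(self, item: dict) -> None:
        # Insert when the key is new; replace the whole item when it exists.
        self.items[self._key(item)] = dict(item)


container = ToyContainer(partition_key="storeId")
container.upsert({"id": "1", "storeId": "S1", "qty": 3})
container.upsert({"id": "1", "storeId": "S1", "qty": 5})  # replaces, no error
print(len(container.items), container.items[("S1", "1")]["qty"])  # 1 5
```

In this model, re-running a write with the same ids overwrites existing items instead of failing, which is the behavior the upsert option is there to provide.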

```python
# Write each DataFrame to its Cosmos DB collection through the transactional
# store, connecting via the "SurfaceSalesDB" Synapse linked service.
# upsertEnabled=true replaces items that already exist with the same id
# in the same logical partition instead of failing on conflicts.
for df, container in [
    (dfStoreDemoGraphics, "StoreDemographics"),
    (dfRetailSales, "RetailSales"),
    (dfProduct, "Product"),
]:
    (df.write
        .format("cosmos.oltp")
        .option("spark.synapse.linkedService", "SurfaceSalesDB")
        .option("spark.cosmos.container", container)
        .option("spark.cosmos.write.upsertEnabled", "true")
        .mode("append")
        .save())
```
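Every Azure Cosmos DB item needs a string `id` property. The notebook does not show whether the RetailData CSV files include one, so the following is a hedged sketch, assuming a row is available as a Python dict: it keeps an existing column as the `id` (converted to a string) or, failing that, generates a random UUID. `to_cosmos_item` and its `id_column` parameter are hypothetical names, not part of the Spark connector.

```python
import uuid
from typing import Any, Dict, Optional


def to_cosmos_item(row: Dict[str, Any], id_column: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of row with a string 'id', as Cosmos DB items require.

    If id_column names a populated column, its value becomes the id;
    otherwise an existing 'id' is kept, or a random UUID is generated
    (random ids are not stable across runs).
    """
    item = dict(row)
    if id_column is not None and row.get(id_column) is not None:
        item["id"] = str(row[id_column])
    elif item.get("id") is not None:
        item["id"] = str(item["id"])
    else:
        item["id"] = str(uuid.uuid4())
    return item


# Hypothetical row; ProductCode is assumed to be a unique business key.
print(to_cosmos_item({"ProductCode": "A100", "Price": "199.99"}, id_column="ProductCode"))
# {'ProductCode': 'A100', 'Price': '199.99', 'id': 'A100'}
```

A stable business key is preferable when one exists: random UUIDs give every run new keys, so an upsert would then behave like a plain insert and duplicate the data.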
