## Synapse Sales Aggregation Notebook

Chris Joakim, Cosmos DB GBB, Microsoft

### This Spark/PySpark Notebook demonstrates how to:

- **Read the Synapse Link Analytic Datastore with Spark in Azure Synapse**
  - This Analytic Datastore contains the data from the Cosmos DB sales container
  - The implementation language is Python (i.e., PySpark)
- **Filter the sales data (by doctype, timestamp) while reading it**
- **Aggregate the sales data by customer_id**
- Display the "shape" of the DataFrames and their observed schema
- Write the aggregated **Materialized Views** of **sales-by-customer** to the Cosmos DB views container

---

#### Spark Notebook Code Location

- GitHub repo:  https://github.com/cjoakim/azure-cosmosdb-synapse-link
- File in repo: Synapse/notebooks/cosmos_nosql_sales_processing.ipynb

#### Loading the Cosmos DB Sales Container

- See GitHub repo: https://github.com/cjoakim/azure-cosmos-db
- From directory apis\nosql\python, run the following command:
```
> python main.py load_sales retail sales sales1.json 99999
```


#### Truncating the Materialized Views container

- See GitHub repo: https://github.com/cjoakim/azure-cosmos-db
- From directory apis\nosql\dotnet, run the following command:
```
> dotnet run truncate_container retail views
```

In [None]:
# Get the current epoch time, with simple Python code, to generate a query like this:
# SELECT count(1) FROM c where c._ts > 1694698000
# This SQL query can be run against the Cosmos DB 'views' container, which is
# updated later in this notebook.
                                     
import time

epoch = int(time.time())
print('SELECT count(1) FROM c where c._ts > {}'.format(epoch))


In [None]:
# Load the Synapse Link Sales Data into a Spark DataFrame.
# Select only the "sale" documents from the sales container
# whose _ts (timestamp) value lies between begin_timestamp and end_timestamp.

from pyspark.sql.functions import col

# initialize variables
begin_timestamp = 1675718000  # 1675718528 
end_timestamp   = 1699999999

# read only the "sale" doctype, not the "line_item" documents
# "cosmos.oltp" = Cosmos DB live database
# "cosmos.olap" = Synapse Link Analytic Datastore

df_sales = spark.read \
    .format("cosmos.olap") \
    .option("spark.synapse.linkedService", "gbbcjcdbnosql_retail_db") \
    .option("spark.cosmos.container", "sales") \
    .load() \
    .filter(col("doctype") == "sale") \
    .filter(col("_ts") > begin_timestamp) \
    .filter(col("_ts") < end_timestamp)

display(df_sales.limit(3))
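
The `_ts` bounds in the cell above are Unix epoch seconds, which are hard to read at a glance. The following is a minimal sketch, using only the Python standard library, that converts them to UTC and mirrors the strict `>` / `<` comparison of the Spark filter. The helper names `to_utc` and `in_window` are illustrative and are not part of the notebook.

```python
from datetime import datetime, timezone

begin_timestamp = 1675718000
end_timestamp = 1699999999

def to_utc(epoch_seconds):
    """Convert Unix epoch seconds (the unit of Cosmos DB _ts) to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

def in_window(ts, begin=begin_timestamp, end=end_timestamp):
    """Apply the same strict bounds as the Spark filter: begin < _ts < end."""
    return begin < ts < end

print(to_utc(begin_timestamp))  # 2023-02-06T21:13:20+00:00
print(to_utc(end_timestamp))    # 2023-11-14T22:13:19+00:00
print(in_window(1681936060))    # True (a _ts value from the sample sale documents)
```

Because both bounds are strict, a document whose `_ts` equals `begin_timestamp` or `end_timestamp` exactly is excluded.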


In [None]:
# Display the shape and observed schema of the DataFrame

print('df_sales, shape: {} x {}'.format(
        df_sales.count(), len(df_sales.columns)))
        
df_sales.printSchema()


In [None]:
# Aggregate Sales by Customer

import pyspark.sql.functions as F

# One output row per customer_id. F.first("id") takes the id of whichever
# sale document Spark encounters first in each group; pk is the customer ID.
df_customer_aggregated = df_sales.groupBy("customer_id") \
    .agg(
        F.first("id").alias("id"),
        F.first("customer_id").alias("pk"),
        F.count("customer_id").alias("order_count"),
        F.sum("total_cost").alias("total_dollar_amount"),
        F.sum("item_count").alias("total_item_count")) \
    .sort("customer_id", ascending=True)

display(df_customer_aggregated.limit(10))
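
To make the aggregation logic easier to follow without a Spark pool, here is a rough plain-Python equivalent of the `groupBy` / `agg` step. It is only a sketch: the function `aggregate_by_customer` is a hypothetical helper, the input is the four "sale" documents for customer 12 shown in the validation queries at the end of this notebook, and the `round` call is added here to keep floating-point sums tidy (the Spark code does not round).

```python
from collections import defaultdict

# The four "sale" documents for customer 12 (only the fields the aggregation uses).
sales = [
    {"id": "e3aeb1b0-1259-4b85-a01b-42a99ac71be5", "customer_id": 12, "item_count": 2, "total_cost": 1275.83},
    {"id": "9685a700-5e23-4a8d-b599-f2d32b2d994b", "customer_id": 12, "item_count": 2, "total_cost": 3601.68},
    {"id": "2badd187-8c77-47ef-b56c-0b89d4cd17a7", "customer_id": 12, "item_count": 3, "total_cost": 5026.44},
    {"id": "fbd65be4-c327-41f9-b1ef-11f83c5fd17d", "customer_id": 12, "item_count": 3, "total_cost": 5305.66},
]

def aggregate_by_customer(docs):
    """Group sale documents by customer_id and compute the per-customer totals."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["customer_id"]].append(doc)

    rows = []
    for customer_id in sorted(groups):  # like .sort("customer_id", ascending=True)
        group = groups[customer_id]
        rows.append({
            "customer_id": customer_id,
            "id": group[0]["id"],        # like F.first("id"): first document seen in the group
            "pk": customer_id,           # like F.first("customer_id").alias("pk")
            "order_count": len(group),   # like F.count("customer_id")
            "total_dollar_amount": round(sum(d["total_cost"] for d in group), 2),
            "total_item_count": sum(d["item_count"] for d in group),
        })
    return rows

rows = aggregate_by_customer(sales)
print(rows[0]["order_count"], rows[0]["total_dollar_amount"], rows[0]["total_item_count"])
# 4 15209.61 10
```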


In [None]:
# Display the shape and observed schema of the DataFrame

print('df_customer_aggregated, shape: {} x {}'.format(
        df_customer_aggregated.count(), len(df_customer_aggregated.columns)))
        
df_customer_aggregated.printSchema()


In [None]:
# Write the customer-aggregated DataFrame to the Cosmos DB
# views container.  The id is the id of the first sale document
# in each group, the pk is the customer ID, and upserts are enabled.

df_customer_aggregated.write.format("cosmos.oltp")\
    .option("spark.synapse.linkedService", "gbbcjcdbnosql_retail_db")\
    .option("spark.cosmos.container", "views")\
    .mode('append')\
    .save()
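
The aggregation takes `id` from `F.first('id')`, so the id of each view document depends on which sale Spark sees first for that customer. One alternative, sketched here in plain Python as an assumption rather than as the notebook's approach, is to derive a stable string id from the customer ID so that re-running the notebook always targets the same document. The helper `to_view_doc` is hypothetical.

```python
def to_view_doc(row):
    """Return a copy of an aggregated row with a stable id derived from customer_id."""
    doc = dict(row)
    doc["id"] = str(row["customer_id"])  # Cosmos DB requires id to be a string
    doc["pk"] = row["customer_id"]       # partition key stays the customer ID
    return doc

row = {
    "customer_id": 12,
    "order_count": 4,
    "total_dollar_amount": 15209.61,
    "total_item_count": 10,
}
print(to_view_doc(row)["id"])  # 12
```

In PySpark the same idea would likely be expressed inside `agg` as something like `F.first('customer_id').cast('string').alias('id')`; treat that as an untested suggestion.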


## Query Cosmos DB to Validate This Logic

#### Query the views container

The view document for customer 12 should show 4 orders containing 10 line items.

```
SELECT * FROM c where c.pk = 12

[
    {
        "customer_id": 12,
        "id": "e3aeb1b0-1259-4b85-a01b-42a99ac71be5",
        "pk": 12,
        "order_count": 4,
        "total_dollar_amount": 15209.61,
        "total_item_count": 10,
        "_rid": "FlhFAKC2+qBJJQAAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKC2+qA=/docs/FlhFAKC2+qBJJQAAAAAAAA==/",
        "_etag": "\"0a00b7b3-0000-0100-0000-6503070c0000\"",
        "_attachments": "attachments/",
        "_ts": 1694697228
    }
]
```

#### Query the sales container

```
SELECT count(1) from c where c.customer_id = 12 and c.doctype = 'sale'

[
    {
        "$1": 4
    }
]
```

```
SELECT count(1) from c where c.customer_id = 12 and c.doctype = 'line_item'

[
    {
        "$1": 10
    }
]
```

```
SELECT * FROM c where c.customer_id = 12 and c.doctype in ('sale', 'line_item')

[
    {
        "pk": 23356,
        "id": "986ddff0-45de-44ca-8926-c3fec2d54de4",
        "sale_id": 23356,
        "doctype": "line_item",
        "date": "2021-11-11",
        "line_num": 1,
        "customer_id": 12,
        "store_id": 70,
        "upc": "1315622651752",
        "price": 888.54,
        "qty": 1,
        "cost": 888.54,
        "doc_epoch": 1681936059668,
        "doc_time": "2023/04/19-20:27:39",
        "_rid": "FlhFAKBvTp068gEAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp068gEAAAAAAA==/",
        "_etag": "\"2400a498-0000-0100-0000-64404ebc0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936060
    },
    {
        "pk": 23356,
        "id": "c904d9a2-6228-4b93-b795-6852fa83b96d",
        "sale_id": 23356,
        "doctype": "line_item",
        "date": "2021-11-11",
        "line_num": 2,
        "customer_id": 12,
        "store_id": 70,
        "upc": "0842812037290",
        "price": 387.29,
        "qty": 1,
        "cost": 387.29,
        "doc_epoch": 1681936059668,
        "doc_time": "2023/04/19-20:27:39",
        "_rid": "FlhFAKBvTp078gEAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp078gEAAAAAAA==/",
        "_etag": "\"2400a598-0000-0100-0000-64404ebc0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936060
    },
    {
        "pk": 23356,
        "id": "e3aeb1b0-1259-4b85-a01b-42a99ac71be5",
        "sale_id": 23356,
        "doctype": "sale",
        "date": "2021-11-11",
        "dow": "thu",
        "customer_id": 12,
        "store_id": 70,
        "item_count": 2,
        "total_cost": 1275.83,
        "doc_epoch": 1681936059668,
        "doc_time": "2023/04/19-20:27:39",
        "_rid": "FlhFAKBvTp088gEAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp088gEAAAAAAA==/",
        "_etag": "\"2400a698-0000-0100-0000-64404ebc0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936060
    },
    {
        "pk": 24134,
        "id": "e4bf34e5-7637-4621-9e69-77cd6f0906a6",
        "sale_id": 24134,
        "doctype": "line_item",
        "date": "2021-11-21",
        "line_num": 1,
        "customer_id": 12,
        "store_id": 48,
        "upc": "0947998477592",
        "price": 1432.94,
        "qty": 2,
        "cost": 2865.88,
        "doc_epoch": 1681936065596,
        "doc_time": "2023/04/19-20:27:45",
        "_rid": "FlhFAKBvTp3c-AEAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3c-AEAAAAAAA==/",
        "_etag": "\"240046a3-0000-0100-0000-64404ec20000\"",
        "_attachments": "attachments/",
        "_ts": 1681936066
    },
    {
        "pk": 24134,
        "id": "b8c286f6-771d-4351-9e8b-da6df1fc41a0",
        "sale_id": 24134,
        "doctype": "line_item",
        "date": "2021-11-21",
        "line_num": 2,
        "customer_id": 12,
        "store_id": 48,
        "upc": "0475014653274",
        "price": 367.9,
        "qty": 2,
        "cost": 735.8,
        "doc_epoch": 1681936065596,
        "doc_time": "2023/04/19-20:27:45",
        "_rid": "FlhFAKBvTp3d-AEAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3d-AEAAAAAAA==/",
        "_etag": "\"240047a3-0000-0100-0000-64404ec20000\"",
        "_attachments": "attachments/",
        "_ts": 1681936066
    },
    {
        "pk": 24134,
        "id": "9685a700-5e23-4a8d-b599-f2d32b2d994b",
        "sale_id": 24134,
        "doctype": "sale",
        "date": "2021-11-21",
        "dow": "sun",
        "customer_id": 12,
        "store_id": 48,
        "item_count": 2,
        "total_cost": 3601.68,
        "doc_epoch": 1681936065596,
        "doc_time": "2023/04/19-20:27:45",
        "_rid": "FlhFAKBvTp3e-AEAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3e-AEAAAAAAA==/",
        "_etag": "\"240048a3-0000-0100-0000-64404ec20000\"",
        "_attachments": "attachments/",
        "_ts": 1681936066
    },
    {
        "pk": 25669,
        "id": "58685c8b-ca05-47dd-8ecd-33450f7bf3c3",
        "sale_id": 25669,
        "doctype": "line_item",
        "date": "2021-12-12",
        "line_num": 1,
        "customer_id": 12,
        "store_id": 98,
        "upc": "0086295211635",
        "price": 1306.7,
        "qty": 2,
        "cost": 2613.4,
        "doc_epoch": 1681936077573,
        "doc_time": "2023/04/19-20:27:57",
        "_rid": "FlhFAKBvTp3XEQIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3XEQIAAAAAAA==/",
        "_etag": "\"240041b8-0000-0100-0000-64404ece0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936078
    },
    {
        "pk": 25669,
        "id": "069d1d12-0cfd-4288-813b-bbee409aee84",
        "sale_id": 25669,
        "doctype": "line_item",
        "date": "2021-12-12",
        "line_num": 2,
        "customer_id": 12,
        "store_id": 98,
        "upc": "1244173829375",
        "price": 982,
        "qty": 2,
        "cost": 1964,
        "doc_epoch": 1681936077573,
        "doc_time": "2023/04/19-20:27:57",
        "_rid": "FlhFAKBvTp3YEQIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3YEQIAAAAAAA==/",
        "_etag": "\"240042b8-0000-0100-0000-64404ece0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936078
    },
    {
        "pk": 25669,
        "id": "a5067aca-1c69-4944-b818-d690db91f861",
        "sale_id": 25669,
        "doctype": "line_item",
        "date": "2021-12-12",
        "line_num": 3,
        "customer_id": 12,
        "store_id": 98,
        "upc": "0322228897430",
        "price": 149.68,
        "qty": 3,
        "cost": 449.04,
        "doc_epoch": 1681936077573,
        "doc_time": "2023/04/19-20:27:57",
        "_rid": "FlhFAKBvTp3ZEQIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3ZEQIAAAAAAA==/",
        "_etag": "\"240043b8-0000-0100-0000-64404ece0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936078
    },
    {
        "pk": 25669,
        "id": "2badd187-8c77-47ef-b56c-0b89d4cd17a7",
        "sale_id": 25669,
        "doctype": "sale",
        "date": "2021-12-12",
        "dow": "sun",
        "customer_id": 12,
        "store_id": 98,
        "item_count": 3,
        "total_cost": 5026.44,
        "doc_epoch": 1681936077573,
        "doc_time": "2023/04/19-20:27:57",
        "_rid": "FlhFAKBvTp3aEQIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3aEQIAAAAAAA==/",
        "_etag": "\"240044b8-0000-0100-0000-64404ece0000\"",
        "_attachments": "attachments/",
        "_ts": 1681936078
    },
    {
        "pk": 28154,
        "id": "90191386-aaea-4100-8932-a999eb5693b5",
        "sale_id": 28154,
        "doctype": "line_item",
        "date": "2022-01-13",
        "line_num": 1,
        "customer_id": 12,
        "store_id": 1,
        "upc": "0349025882315",
        "price": 716.26,
        "qty": 1,
        "cost": 716.26,
        "doc_epoch": 1681936096880,
        "doc_time": "2023/04/19-20:28:16",
        "_rid": "FlhFAKBvTp3MMwIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3MMwIAAAAAAA==/",
        "_etag": "\"240036da-0000-0100-0000-64404ee10000\"",
        "_attachments": "attachments/",
        "_ts": 1681936097
    },
    {
        "pk": 28154,
        "id": "f2d41664-1803-414d-9ba8-224ed6c66430",
        "sale_id": 28154,
        "doctype": "line_item",
        "date": "2022-01-13",
        "line_num": 2,
        "customer_id": 12,
        "store_id": 1,
        "upc": "1244377188803",
        "price": 1079.94,
        "qty": 3,
        "cost": 3239.82,
        "doc_epoch": 1681936096880,
        "doc_time": "2023/04/19-20:28:16",
        "_rid": "FlhFAKBvTp3NMwIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3NMwIAAAAAAA==/",
        "_etag": "\"240037da-0000-0100-0000-64404ee10000\"",
        "_attachments": "attachments/",
        "_ts": 1681936097
    },
    {
        "pk": 28154,
        "id": "2986d010-3577-4eb7-8d25-224139847357",
        "sale_id": 28154,
        "doctype": "line_item",
        "date": "2022-01-13",
        "line_num": 3,
        "customer_id": 12,
        "store_id": 1,
        "upc": "0622915694213",
        "price": 1349.58,
        "qty": 1,
        "cost": 1349.58,
        "doc_epoch": 1681936096880,
        "doc_time": "2023/04/19-20:28:16",
        "_rid": "FlhFAKBvTp3OMwIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3OMwIAAAAAAA==/",
        "_etag": "\"240038da-0000-0100-0000-64404ee10000\"",
        "_attachments": "attachments/",
        "_ts": 1681936097
    },
    {
        "pk": 28154,
        "id": "fbd65be4-c327-41f9-b1ef-11f83c5fd17d",
        "sale_id": 28154,
        "doctype": "sale",
        "date": "2022-01-13",
        "dow": "thu",
        "customer_id": 12,
        "store_id": 1,
        "item_count": 3,
        "total_cost": 5305.66,
        "doc_epoch": 1681936096880,
        "doc_time": "2023/04/19-20:28:16",
        "_rid": "FlhFAKBvTp3PMwIAAAAAAA==",
        "_self": "dbs/FlhFAA==/colls/FlhFAKBvTp0=/docs/FlhFAKBvTp3PMwIAAAAAAA==/",
        "_etag": "\"240039da-0000-0100-0000-64404ee10000\"",
        "_attachments": "attachments/",
        "_ts": 1681936097
    }
]
```
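
The query results above can be reconciled by hand, or with a short script. The following sketch copies the relevant values from the results for customer 12 and checks that the view document agrees with the underlying sale and line_item documents. The function `reconcile` and its check names are illustrative only.

```python
import math

# Values copied from the views container result for pk = 12.
view_doc = {"order_count": 4, "total_dollar_amount": 15209.61, "total_item_count": 10}

# (sale_id, item_count, total_cost) for each "sale" document.
sale_docs = [
    (23356, 2, 1275.83),
    (24134, 2, 3601.68),
    (25669, 3, 5026.44),
    (28154, 3, 5305.66),
]

# sale_id -> "cost" of each "line_item" document for that sale.
line_item_costs = {
    23356: [888.54, 387.29],
    24134: [2865.88, 735.8],
    25669: [2613.4, 1964, 449.04],
    28154: [716.26, 3239.82, 1349.58],
}

def reconcile(view, sales, line_items):
    """Return a dict of named checks, each True when the documents agree."""
    # Compare currency sums with a small tolerance to absorb float rounding.
    def close(a, b):
        return math.isclose(a, b, abs_tol=0.005)

    return {
        "order_count": view["order_count"] == len(sales),
        "item_count_vs_sales": view["total_item_count"] == sum(s[1] for s in sales),
        # item_count is the number of line_item documents, not the sum of qty.
        "item_count_vs_line_items": view["total_item_count"]
            == sum(len(costs) for costs in line_items.values()),
        "dollar_amount": close(view["total_dollar_amount"], sum(s[2] for s in sales)),
        "sale_totals": all(close(total, sum(line_items[sale_id]))
                           for sale_id, _, total in sales),
    }

result = reconcile(view_doc, sale_docs, line_item_costs)
print(all(result.values()))  # True
```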

