# Concurrency Control

This notebook walks through the basics of Delta Lake concurrency control.

Public dataset - [Covid tracking](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-covid-tracking?tabs=pyspark#azure-synapse)

1. Test 1 - Concurrent read vs write activity
2. Test 2 - Concurrent write vs write activity
3. Test 3 - Concurrent write vs write activity in a partitioned table


In [None]:
# Location of the public COVID tracking dataset (the SAS token is left empty)
blob_account_name = "pandemicdatalake"
blob_container_name = "public"
blob_relative_path = "curated/covid-19/covid_tracking/latest/covid_tracking.parquet"
blob_sas_token = r""

# Allow Spark to read from Blob storage remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
df = spark.read.parquet(wasbs_path)

# Start from a clean state
spark.sql("DROP TABLE IF EXISTS covid")
spark.sql("DROP TABLE IF EXISTS covid_partitioned")

# Write one unpartitioned Delta table and one copy partitioned by state
df.write.mode("overwrite").format("delta").saveAsTable("covid")
df.write.mode("overwrite").partitionBy("state").format("delta").save("Tables/covid_partitioned")


## Test 1 - Concurrent read vs write activity

Start a write activity. After starting the cell below, immediately go to Notebook 03 - Concurrency Control Part 2 and run cell 4 to test a read that runs concurrently with this write.

In [None]:
%%sql
UPDATE demo.covid SET positive = positive * 1.2 WHERE state = 'AK'
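
One way to confirm what Test 1 did is to look at the table's commit history once both cells have finished: the `UPDATE` should show up as a new table version, while a read that started before the commit works from the previous snapshot. The following is a minimal sketch, assuming the delta-spark Python API (`DeltaTable.history`) is available in the session; `recent_commits` is a hypothetical helper name and is not part of this notebook series.

```python
def recent_commits(delta_table, limit=5):
    """Return (version, operation) pairs for the latest commits, newest first.

    `delta_table` is expected to be a delta.tables.DeltaTable. The helper
    name and signature are illustrative assumptions, not an existing API.
    """
    # history(limit) returns a DataFrame with one row per commit
    rows = delta_table.history(limit).select("version", "operation").collect()
    return [(row["version"], row["operation"]) for row in rows]
```

In the notebook this could be called as `recent_commits(DeltaTable.forName(spark, "covid"))` after importing `DeltaTable` from `delta.tables`.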

## Test 2 - Concurrent write vs write activity

Start another write activity to test concurrent writes. After starting the cell below, go to Notebook 03 - Concurrency Control Part 2 and run cell #7.

In [None]:
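import time
from delta.tables import DeltaTable
from pyspark.sql.functions import col, expr

# Handle to the unpartitioned Delta table created in the setup cell
covid_df = DeltaTable.forName(spark, 'covid')


def slowConcurrency():
    # Wait 2 seconds before issuing the update
    time.sleep(2)
    # Multiply the positive count by 1.2 for rows where state is 'AK'
    covid_df.update(
        condition = col("state") == 'AK',
        set = { "positive": expr("positive * 1.2") })

slowConcurrency()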
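
When two writers touch the same data, Delta Lake's optimistic concurrency control lets one commit succeed and fails the other with a conflict exception. A common response is to retry the failed write. The sketch below shows one possible retry wrapper in plain Python; `retry_on_conflict` and its parameters are hypothetical names introduced for illustration, and the exception types to retry on are passed in by the caller rather than assumed.

```python
import time


def retry_on_conflict(action, conflict_types, attempts=3, backoff_seconds=1.0):
    """Call `action()` and retry it when it raises one of `conflict_types`.

    `conflict_types` is an exception class or a tuple of exception classes.
    The last failure is re-raised once `attempts` tries have been used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except conflict_types:
            if attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failed attempt
            time.sleep(backoff_seconds * attempt)
```

Assuming the runtime ships delta-spark's `delta.exceptions` module, the update above could be wrapped as `retry_on_conflict(slowConcurrency, DeltaConcurrentModificationException)`. Whether a blind retry is appropriate depends on the write: here a retried update would be applied on top of whatever the other writer committed.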

## Test 3 - Concurrent write vs write activity in a partitioned table

Start another write activity to test concurrent writes against the partitioned table. After starting the cell below, go to Notebook 03 - Concurrency Control Part 2 and run cell #

In [None]:
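import time
from delta.tables import DeltaTable
from pyspark.sql.functions import col, expr

# Handle to the Delta table partitioned by state
covid_part_df = DeltaTable.forName(spark, 'covid_partitioned')


def slowConcurrency():
    # Wait 2 seconds before issuing the update
    time.sleep(2)
    # Multiply the positive count by 1.2 for rows in the 'AK' partition
    covid_part_df.update(
        condition = col("state") == 'AK',
        set = { "positive": expr("positive * 1.2") })

slowConcurrency()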
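
As an alternative to switching between notebooks, two writers can be started at the same moment from a single session using threads, which makes it easier to compare an update on the same partition against updates on different partitions. This is only a sketch: `run_concurrently` is a hypothetical helper, and it assumes each writer is a zero-argument callable such as the update function above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(writers):
    """Run every callable in `writers` at once and report each outcome.

    `writers` is a non-empty dict mapping a label to a zero-argument callable.
    Returns a dict mapping each label to "ok" or to the name of the
    exception class that the callable raised.
    """
    def attempt(fn):
        try:
            fn()
            return "ok"
        except Exception as exc:
            # Record the failure type instead of aborting the other writers
            return type(exc).__name__

    with ThreadPoolExecutor(max_workers=len(writers)) as pool:
        futures = {name: pool.submit(attempt, fn) for name, fn in writers.items()}
        return {name: future.result() for name, future in futures.items()}
```

Passing two callables that update different `state` values, and then two that update the same one, would show which combinations report a conflict exception name instead of `"ok"`.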

## Clean up

In [None]:
spark.sql("DROP TABLE IF EXISTS covid")
spark.sql("DROP TABLE IF EXISTS covid_partitioned")