# Change Data Capture (CDC)


## 1. What Is Change Data Capture

Change Data Capture (CDC) is a data integration pattern that captures the changes made to data in a source system.\
These changes can be inserts, updates, and deletes.\
The changes are recorded as an ordered list of change records. Here, that list is called the Change Data Feed (CDF).\
Transactional databases such as SQL Server, MySQL, and Oracle can produce CDC feeds.\
Delta tables can generate their own change feed, also called a Change Data Feed (CDF), once the feature is enabled on the table.
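To make "a list of inserts, updates, and deletes" concrete, the sketch below derives such a list by diffing two snapshots of a table keyed by `id`. This is only an illustration under stated assumptions: the snapshot contents and the `diff_snapshots` helper are hypothetical, and snapshot diffing is a swapped-in technique. Database CDC features typically read the transaction log instead of comparing snapshots.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot type: id -> non-key column values (name, role).
Snapshot = Dict[int, Tuple[str, ...]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[dict]:
    """Derive change records (INSERT/UPDATE/DELETE) by comparing two snapshots."""
    changes = []
    # Visit every key that appears in either snapshot, in a stable order.
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append({"id": key, "operation": "INSERT", "values": after[key]})
        elif key not in after:
            # A delete carries the last known values of the removed row.
            changes.append({"id": key, "operation": "DELETE", "values": before[key]})
        elif before[key] != after[key]:
            changes.append({"id": key, "operation": "UPDATE", "values": after[key]})
    return changes


# Hypothetical example data.
before = {1: ("Alex", "chef"), 2: ("Jessica", "owner"), 3: ("Mikhail", "security")}
after = {1: ("Alex", "chef"), 2: ("Jessica", "manager"), 4: ("Gary", "cleaner")}

for change in diff_snapshots(before, after):
    print(change)
```

This prints one UPDATE (id 2), one DELETE (id 3), and one INSERT (id 4); the unchanged row for id 1 produces no change record.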


## 2. Processing a Change Data Feed (CDF)

A CDF is usually applied to a target table using one of the _slowly changing dimension_ (SCD) strategies.\
The two most common SCD types are:
1. **SCD Type 1:** Keep only the latest version of each record, overwriting the existing data.
2. **SCD Type 2:** Keep the full history of changes. Each version of a record is timestamped so that users can trace when a change occurred.
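SCD Type 1 amounts to an upsert keyed on the record id. Type 2 is less obvious, so here is a minimal pure-Python sketch of it. It assumes the whole batch of changes is available at once and uses the sequence number as a stand-in for the timestamp. The `Change` and `Version` classes and the `apply_scd_type2` function are hypothetical names for illustration, not a library API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    id: int
    name: str
    role: str
    operation: str      # "INSERT", "UPDATE" or "DELETE"
    sequence_num: int   # ordering column; a higher value means a later change


@dataclass
class Version:
    id: int
    name: str
    role: str
    start_seq: int            # sequence number at which this version became current
    end_seq: Optional[int]    # None while this version is still the current one


def apply_scd_type2(history: List[Version], feed: List[Change]) -> List[Version]:
    """Append new versions to `history` (in place) instead of overwriting rows."""
    # Apply changes in sequence order, so out-of-order arrivals within the batch are handled.
    for change in sorted(feed, key=lambda c: c.sequence_num):
        # Close the open version of this id, if there is one.
        for version in history:
            if version.id == change.id and version.end_seq is None:
                version.end_seq = change.sequence_num
        # Inserts and updates open a new version; a delete only closes the old one.
        if change.operation in ("INSERT", "UPDATE"):
            history.append(
                Version(change.id, change.name, change.role, change.sequence_num, None)
            )
    return history


# Hypothetical example feed.
feed = [
    Change(1, "Alex", "chef", "INSERT", 1),
    Change(1, "Alex", "head chef", "UPDATE", 3),
    Change(2, "Pat", "mechanic", "DELETE", 4),  # arrives before its own insert
    Change(2, "Pat", "mechanic", "INSERT", 2),
]
history = apply_scd_type2([], feed)
current = [v for v in history if v.end_seq is None]
```

`history` ends up with three versions: Alex as chef (sequence 1 to 3), Pat as mechanic (2 to 4), and Alex as head chef (open from 3). `current` holds only the last one. Pat's history survives the delete, which is exactly what Type 1 would discard.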

In [0]:
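# `spark` is predefined in Databricks notebooks; getOrCreate() returns that
# session there, or starts a new one elsewhere.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Update these to the catalog and schema where you have permissions
# to create tables and views.
catalog = "mycatalog"
schema = "myschema"
employees_cdf_table = "employees_cdf"

def write_employees_cdf_to_delta():
    """Write a small, simulated change feed of employee records to a Delta table."""
    data = [
        (1, "Alex", "chef", "FR", "INSERT", 1),
        (2, "Jessica", "owner", "US", "INSERT", 2),
        (3, "Mikhail", "security", "UK", "INSERT", 3),
        (4, "Gary", "cleaner", "UK", "INSERT", 4),
        (5, "Chris", "owner", "NL", "INSERT", 6),
        # Uncomment to simulate later changes.
        # Out-of-order update (sequenceNum 5 < 6); this should be dropped from SCD Type 1.
        # (5, "Chris", "manager", "NL", "UPDATE", 5),
        # Insert (sequenceNum 7) followed by a delete (sequenceNum 8), listed out of order.
        # (6, "Pat", "mechanic", "NL", "DELETE", 8),
        # (6, "Pat", "mechanic", "NL", "INSERT", 7),
    ]
    columns = ["id", "name", "role", "country", "operation", "sequenceNum"]
    df = spark.createDataFrame(data, columns)
    # Overwrite so that re-running the cell always produces the same table.
    df.write.format("delta").mode("overwrite").saveAsTable(
        f"{catalog}.{schema}.{employees_cdf_table}"
    )

write_employees_cdf_to_delta()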
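To see what the commented-out rows are meant to demonstrate, here is a minimal pure-Python sketch of SCD Type 1 applied to the same rows, with the commented-out rows included. It assumes the whole feed arrives in one batch: for each `id` it keeps the row with the highest `sequenceNum`, then drops keys whose latest operation is a DELETE. `apply_scd_type1` is a hypothetical helper, not a Databricks API; a real pipeline would do the same work in Spark against the Delta table.

```python
# Rows from the cell above, including the commented-out changes.
# Columns: id, name, role, country, operation, sequenceNum
rows = [
    (1, "Alex", "chef", "FR", "INSERT", 1),
    (2, "Jessica", "owner", "US", "INSERT", 2),
    (3, "Mikhail", "security", "UK", "INSERT", 3),
    (4, "Gary", "cleaner", "UK", "INSERT", 4),
    (5, "Chris", "owner", "NL", "INSERT", 6),
    (5, "Chris", "manager", "NL", "UPDATE", 5),
    (6, "Pat", "mechanic", "NL", "DELETE", 8),
    (6, "Pat", "mechanic", "NL", "INSERT", 7),
]


def apply_scd_type1(rows):
    """Return the latest state per id as {id: (name, role, country)}."""
    latest = {}
    for row in rows:
        row_id, seq = row[0], row[5]
        # Keep a row only if it is newer than what we already hold for this id,
        # so an out-of-order, older update is ignored.
        if row_id not in latest or seq > latest[row_id][5]:
            latest[row_id] = row
    # Drop ids whose most recent change is a delete.
    return {
        row_id: row[1:4]  # name, role, country
        for row_id, row in sorted(latest.items())
        if row[4] != "DELETE"
    }


print(apply_scd_type1(rows))
```

Chris stays an "owner", because the UPDATE with `sequenceNum` 5 is older than the INSERT with 6. Pat does not appear, because the DELETE (8) is the latest change for `id` 6.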