Skip to content

Latest commit

 

History

History
236 lines (147 loc) · 23.6 KB

SqlServer.md

File metadata and controls

236 lines (147 loc) · 23.6 KB

Microsoft SQL Server

This documents the nTangle architecture and corresponding code-generation when targeting a Microsoft SQL Server database.


Overview

At its core NTangle is a database-driven code-generation solution, in that it leverages an existing database to infer its underlying schema (tables and columns). A YAML (or JSON) configuration file is required to define the CDC requirements, being the root and child tables, and their relationships, representing the entity (aggregate root). The nTangle code-gen tool connects to the database, and using the YAML configuration generates the required database and .NET publishing runtime artefacts. This publishing runtime then uses the database's CDC capabilities to detect changes to the underlying tables and trigger the creation of the related entity events that are then published.

Overview

The core components are:

  • Database - the existing tables plus generated nTangle runtime artefacts (including outbox);
  • Config - the nTangle code-gen configuration;
  • Tooling - the nTangle code generator and resulting generated publisher .NET runtime;
  • Events - the resulting published entity event (depicted as a CloudEvent).

Change-data-capture (CDC)

This official documentation describes the Microsoft SQL Server CDC-capabilities.

In addition, this article provides an excellent overview of the Microsoft SQL Server CDC-capabilities and walks through the process of setting up and using to aid in the fundamental understanding of this key dependent capability.


Performance considerations

The introduction of CDC into a database can have an impact on the database performance, and as such it is important to consider this impact; see performance considerations.

Within the nTangle cofiguration the CdcEnable property can be used to enable/disable the automatic generation of the CDC-related sys.sp_cdc_enable_table TSQL code. This allows the developer and/or database administrator to manually enable CDC on the required tables external to nTangle code generation. By default, the CdcEnable property is set to false.

Given the above, it is generally recommended that CDC enablement is explicity managed by the developer and/or database administrator. Additionally, as schema changes are made to the underlying tables the CDC configuration may need to be changed accordingly; this does not happen automatically within SQL Server; see this article for more information.


CDC vs Change Tracking

Microsoft SQL Server provides two change tracking capabilities, namely CDC and Change Tracking. Change Tracking differs in that it only captures the fact that rows in a table were changed, but doesn't capture the data that was changed. Given the multiple related table hierarchy triggering that nTangle enables the joining columns as well as the primary key columns are required to be captured; this is not possible with Change Tracking as it only captures the primary key columns; therefore, making it unsuitable for walking back up the join hierarchy to determine the primary entity (aggregate root) that was changed. And largely impossible where a child table row has been physically deleted.


Architecture

The following represents the high-level conceptual run-time architecture for a solution leveraging nTangle.

Architecture


The key components are as follows.

  • Orchestrator - one per entity (aggregate root) change tracking and event publisher orchestrator (execution phases depicted).
  • Hosted service - one per entity (aggregate root) timer-based hosted service responsible for executing the orchestrator (optional).
  • Outbox dequeue publisher - event outbox dequeue and publishing (execution phases depicted).
  • Outbox dequeue hosted service - timer-based hosted service responsible for executing the outbox dequeue publisher.

Orchestrator

The XxxOrchestrator is responsible for the primary orchestration of the CDC-related change tracking and event publishing - this is essentially the "secret sauce" behind nTangle. This is achieved by executing (orchestrating) the following phases in the order specifed.

  • Change detection - detect changes to the underlying tables and package within a batch;
  • Consolidation - consolidate changes to minimize redundant event publishing;
  • Identifier mapping - assign global identifier mappings;
  • Versioning - version events to minimize publishing with same content;
  • Publishing - publish / send events to selected destination;
  • Completion - complete batch and record latest version hashes.

The XxxOrchestrator inherits from EntityOrchestrator, which in turn inherits from EntityOrchestratorBase, to enable the standardized orchestration processing.

An orchestrator is generated per entity (the aggregate root which has a naming convention of XxxCdc). The entity is essentially the .NET representation of the configured root and child table hierarchy, and selected columns as properties.


Change detection

The orchestrator will select (detect) the changes as enabled by the SQL Server CDC capabilities. This phase is achieved by invoking the spXxxBatchExecute stored procedure and updating the result sets into the .NET entity equivalents. The following stored procedure steps are all performed within a database transactional context.

Step Description
Batch check Check if there is already an incomplete Batch and attempt to reprocess; otherwise, create a new Batch (only where data is found to be processed). The batch tracking is persisted in the XxxBatchTracking table.
LSN check Get minimum and maximum LSNs for each table. If the minimum is less than previous Batch minimum then there is a CDC data loss scenario and some changes will be lost as a result; this will error unless option to continue with data loss is selected.
Root CDC Gets all CDC changes (create, update and delete) for the table (see fn_cdc_get_all_changes_) up to the maximum query size. Store the result into a temporary #changes table.
Child CDC Gets all CDC changes (create, update and delete) for each child table up to the maximum query size joining against the parent table(s) to ensure existence. Append the result into the temporary #changes table where distinct (as per root table primary key).
Batch query Upsert batch data and then select result set for the latest Batch details (will be in an incomplete state).
Root query Select result set using the temporary #changes table to left outer join to the root table (latest), then left outer joins to VersionTracking (last version hash) and IdentifierMapping (for selected colums where configured).
Child query Select result set using the temporary #changes for each child table using inner joins to ensure data is selected for only what currently exists within the database (latest), then left outer joins to IdentifierMapping (for selected colums where configured).

Consolidation

The orchestrator will consolidate the selected data returned from the change detection phase. The following consolidations are performed for each distinct primary key from the root table (being the aggregated entity), in order specified. The result of the consolidation phase is that there should be zero or more records (one per primary key from the root table).

Step Description
Delete wins Where there is a created and/or, one or more updates, followed by a delete, then only the delete will be considered for publishing. Can not include intermediary create/update events as the data is no longer available (physically deleted).
Create wins Where there is a create, followed by one or more updates, then only the create will be considered for publishing.

Where a delete is referenced above, this relates to both physical and logical deletes equally (as per configuration).


Identifier mapping

The orhestrator (where identifier mapping is configured) will assign new global identifiers generated using IIdentifierGenerator<T> to each of the selected columns where no value was previously selected during the change detection phase.

The spIdentifierMappingCreate stored procedure is then invoked to persist the mappings into the IdentifierMapping table; where mapping is already assigned, then the previously assigned value will be returned for use and the newly allocated value discarded. This may occur where concurrent access is being performed against the same related identifier.


Versioning

The orchestrator will version each record by JSON serializing the data (removing any properties that should be excluded) and then SHA256 hashed. This is compared to the existing version (from the VersionTracking table) selected during the change detection phase. Where the version has not changed for the record then no publish will occur; avoids sending same event content more than once.


Publishing

The orchestrator will instantiate an EventData per record, then invoke the IEventPublisher.Publish() passing all events to be published.

The default is to use the generated EventOutboxEnqueue which enqueues all events using stored procedure spEventOutboxEnqueue into the EventOutbox and EventOutboxData tables. This can be configured to use an alternate IEventPublisher where required.

There are multiple advantages of using the event outbox, a) a log of published events will be maintained, b) performance of orchestrator may be improved, c) dependency of external destination removed from orchestrator, and d) multiple sends will have the same event identifier which is useful in duplicate detection scenarios guaranteeing at least once delivery.


Completion

The orchestrator will complete the batch and update the latest version tracking hashes. This phase is managed by the spXxxBatchComplete stored procedure. The following stored procedure steps are all performed within a database transactional context.

Step Description
Batch complete Update the batch as complete within the XxxBatchTracking table, including updating the completed date/time.
Version tracking Create or update (merge) the latest versions for each of the records published into the VersionTracking table.

Hosted service

The XxxHostedService is responsible for hosting the Orchestrator at runtime. This is a synchronized timer-based hosted service which will continue to execute on a configured interval until the process host is stopped.

An IServiceSynchronizer is required by the XxxHostedService to ensure that only a single serivce per root table is running at any given time. This is primarily needed to ensure that the explicit order of changes in honored; otherwise, events for the root table would be emitted out of sequence.

The XxxHostedService is not explicitly required and can be replaced with any other host implementation, i.e. timer-based trigger Azure Function that directly invokes the XxxOrchestrator. As above, this host will need to manage the sychronized singleton behavior; see BlobLeaseSynchronizer.


Outbox dequeue publisher

The EventOutboxHostedService is responsible for managing the dequeue of events from the database and then sending. This is achieved by executing the following phases in the order specifed.

  • Dequeue - dequeue one or more events from the outbox;
  • Sending - send events to selected destination.

The phases are encapsulated within a database transaction to ensure that the publishing completes successfully before the dequeue is committed. As a result this will ensure guaranteed delivery, but may result in messages being sent more than once. The event receiver may be required to perform duplicate detection on the events where duplicates can not be tolerated.

The EventOutboxHostedService property EventOutboxDequeueFactory must be set to instantiate an instance of the EventOutboxDequeueBase. This instance is the generated class EventOutboxDequeue, which in turn manages the final IEventSender to send the events to the final destination.


Dequeue

The events will be dequeued (up to the maximum dequeue size) from the database using stored procedure spEventOutboxDequeue; this will dequeue the events from the underlying EventOutbox and EventOutboxData tables.


Sending

The EventOutboxDequeue manages the final IEventSender to send the events to the final destination.

To leverage the likes of Azure Service Bus consider the CoreEx ServiceBusSender.


Outbox dequeue hosted service

The EventOutboxHostedService is responsible for hosting the Outbox dequeue publisher at runtime. This is a synchronized timer-based service which will continue to execute on a configured interval until the process host is stopped.


Code-generation

The code-generator will leverage the ntangle.yaml configuration to generate both Database and .NET artefacts.

Where Xxx is referenced in the artefact name this is replaced with the name of the entity (root aggregate). Also, the artefact name represents the default, there are opportunities within the ntangle.yaml to change a number of these where applicable.

Depending on the whether using DbEx or DACPAC for managing the Database schema and deployment then the naming and placement of some artefacts may differ. Finally, features such as event outbox and identity mapping are configurable. Where not leveraged their respective artefacts will not be generated.


Database

The AppName.Database project generated artefacts are as follows.

Type Artefact Description
Script CdcEnable Turns CDC on for the selected tables.
Schema NTangle Creates the database schema.
Stored procedure spXxxBatchComplete Performs the batch completion for an entity.
Stored procedure spXxxBatchExecute Performs the batch execution for an entity.
Stored procedure spXxxBatchReset Performs the batch LSN reset for an entity.
Stored procedure spEventOutboxDequeue Performs the event outbox dequeue.
Stored procedure spEventOutboxEnqueue Performs the event outbox enqueue.
Stored procedure spIdentifierMappingCreate Performs the indentifier mapping creation.
Table XxxBatchTracking Batch tracking.
Schema EventOutbox Creates the Event outbox database schema.
Table EventOutbox Event outbox.
Table EventOutboxData Event outbox data.
Table IdentifierMapping Identifier mapping.
Table VersionTracking Version (hash) tracking.
UDT udtEventOutboxList Event outbox list user-defined type.
UDT udtIdentifierMappingList Identifier mapping list user-defined type.
UDT udtVersionTrackingList Version tracking list user-defined type.

.NET

The AppName.Publisher project .NET generated artefacts are as follows.

Namespace Artefact Description
ServiceCollectionExtensions IServiceCollection extension methods for Dependency Injection (DI) set up.
Data XxxOrchestrator Change tracking and event publisher orchestrator.
Data EventOutboxDequeue Event outbox dequeue.
Data EventOutboxEnqueue Event outbox enqueue.
Data IdentifierMappingMapper Identifier mapping database to .NET mapper.
Data VersionTrackingMapper Version tracking database to .NET mapper.
Entities XxxCdc Entity (aggregate root) representation of database table(s) and relationships.
Services XxxHostedService Timer-based host for the XxxCdcOrchestrator.