
Introduction

DbEx provides database extensions for DbUp-inspired database migrations.


Status

The build status is CI with the NuGet package status as follows, including links to the underlying source code and documentation:

| Package | Status | Source & documentation |
|---|---|---|
| DbEx | NuGet version | Link |
| DbEx.MySql | NuGet version | Link |
| DbEx.Postgres | NuGet version | Link |
| DbEx.SqlServer | NuGet version | Link |

The included change log details all key changes per published version.


DbUp-inspired

DbUp is a .NET library that is used to deploy changes to relational databases (supports multiple database technologies). It tracks which SQL scripts have been run already, and runs the change scripts in the order specified that are needed to get a database up to date.

Traditionally, a Data-tier Application (DAC) is used to provide a logical means to define all of the SQL Server objects - like tables, views, and instance objects, including logins - associated with a database. A DAC is a self-contained unit of SQL Server database deployment that enables data-tier developers and database administrators to package SQL Server objects into a portable artifact called a DAC package, also known as a DACPAC. This is largely specific to Microsoft SQL Server. Alternatively, there are other tools such as Redgate that may be used. DbUp provides a more explicit approach, one that Microsoft similarly adopts with the likes of EF Migrations.

DbEx provides additional functionality to improve the end-to-end experience of managing database migrations/updates leveraging the concepts of DbUp. Prior to version 1.0.14, DbEx exclusively leveraged DbUp; however, the slow uptake of some key pull requests by the maintainers of DbUp was starting to impose limitations on DbEx. The decision was made to emulate some functionality internally to achieve the functionality goals of DbEx. The changes are compatible with the underlying journaling that DbUp leverages (i.e. they simulate the same).


Getting started

The easiest way to get started is to clone the repository and execute the DbEx.Test.Console project; this will create a database with data.

DbEx.Test.Console git:(main)> export cs="Data Source=localhost, 1433;Initial Catalog=DbEx.Console;User id=sa;Password=Xxxxxx@123;TrustServerCertificate=true"
DbEx.Test.Console git:(main)> dotnet run -- -cv cs all

Next, create your own console app following the structure of the DbEx.Test.Console project, add a reference to https://www.nuget.org/packages/DbEx, then add in your SQL scripts.

Currently, the easiest way of generating scripts from an existing database is to use the Generate Scripts feature of SQL Server Management Studio and copy its output.


Commands (functions)

The DbEx DatabaseMigrationBase provides the base, database provider-agnostic capability, with the likes of the SqlServerMigrator providing the specific Microsoft SQL Server implementation. This automates the functionality as specified by the MigrationCommand. One or more commands can be specified, and they will be executed in the order listed.

| Command | Description |
|---|---|
| Drop | Drops the existing database (where it already exists). |
| Create | Creates the database (where it does not already exist). |
| Migrate | Upgrades the database over time using order-based migration scripts, consistent with the philosophy of DbUp. |
| CodeGen | Provides an opportunity to integrate a code-generation step where applicable (none by default). |
| Schema | There are a number of database schema objects that can be managed outside of the above migrations; these are dropped and (re-)applied to the database using their native Create statement. |
| Reset | Resets the database by deleting all existing data (exclusions can be configured). |
| Data | There is data, for example Reference Data, that needs to be applied to a database. This provides a simpler configuration than specifying the required SQL statements directly (which is also supported). This is also useful for setting up Master and Transaction data for the likes of testing scenarios. |

Additional commands available are:

| Command | Description |
|---|---|
| All | Performs all the primary commands as follows: Create, Migrate, CodeGen, Schema and Data. |
| Database | Performs Create, Migrate, CodeGen and Data. |
| Deploy | Performs Migrate and Schema. |
| DeployWithData | Performs Deploy and Data. |
| DropAndAll | Performs Drop and All. |
| DropAndDatabase | Performs Drop and Database. |
| ResetAndAll | Performs Reset and All (designed primarily for testing). |
| ResetAndData | Performs Reset and Data (designed primarily for testing). |
| ResetAndDatabase | Performs Reset and Database (designed primarily for testing). |
| Execute | Executes the SQL statement(s) passed as additional arguments. |
| Script | Creates a new migration script file using the defined naming convention. |

Migrate

As stated, the DbUp approach is used enabling a database to be dropped, created and migrated. The migration is managed by tracking order-based migration scripts. It tracks which SQL scripts have been run already, and runs the change scripts that are needed to get the database up to date.

Over time there will be more than one script updating a single object, for example a Table. In this case the first script operation will be a Create, followed by subsequent Alter operations. The scripts should be considered immutable, in that they cannot be changed once they have been applied; ongoing changes will need additional scripts.

The migration scripts must be marked as embedded resources, and reside under the Migrations folder within the C# project. A naming convention should be used to ensure they are executed in the correct order; it is recommended that the name be prefixed by the date and time, followed by a brief description of the purpose. For example: 20181218-081540-create-demo-person-table.sql

A migration script can contain basic moustache value replacement syntax such as {{Company}}; this will then be replaced at runtime by the corresponding Company parameter value (see MigrationArgs.Parameters). These parameters (Name=Value pairs) can also be command-line specified.
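
The replacement described above can be pictured with the following minimal sketch. It is not the DbEx implementation: the ReplaceParameters helper, the Contoso value, and the choice to leave unknown tokens untouched are all assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of DbEx): replaces each {{Name}} token with the
// matching parameter value; tokens without a matching parameter are left untouched.
string ReplaceParameters(string sql, IReadOnlyDictionary<string, string> values)
    => Regex.Replace(sql, @"\{\{\s*(\w+)\s*\}\}", m =>
        values.TryGetValue(m.Groups[1].Value, out var value) ? value : m.Value);

// Illustrative parameter set, as might be supplied via Name=Value pairs.
var parameters = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Company"] = "Contoso"
};

var script = "INSERT INTO [Demo].[Setting] ([Name], [Value]) VALUES ('Company', '{{Company}}');";
Console.WriteLine(ReplaceParameters(script, parameters));
```

Leaving unknown tokens in place is only one possible design choice; a stricter tool might instead fail the migration so that a missing parameter is noticed early.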

It is recommended that each script be enclosed by a transaction that can be rolled back in the case of error; otherwise, a script could be partially applied and will then need manual intervention to resolve.

Note: There are special-case scripts that are executed before and after the migrations. Any scripts ending with .pre.deploy.sql will always be executed before the migrations are attempted, and any scripts ending with .post.deploy.sql will always be executed after all the migrations have successfully executed. Finally, any scripts ending with .post.database.create.sql will only be executed when (after) the database is created.
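
As a rough illustration of the ordering described in the note, the sketch below classifies script names by suffix and then sorts by name within each group. It is not the DbEx implementation: the helper names, the sample file names, and the assumption that .post.database.create.sql scripts run ahead of the pre-deploy scripts are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical phases: 0 = post database create, 1 = pre-deploy,
// 2 = ordinary migration, 3 = post-deploy.
int Phase(string name) =>
    name.EndsWith(".post.database.create.sql", StringComparison.OrdinalIgnoreCase) ? 0 :
    name.EndsWith(".pre.deploy.sql", StringComparison.OrdinalIgnoreCase) ? 1 :
    name.EndsWith(".post.deploy.sql", StringComparison.OrdinalIgnoreCase) ? 3 : 2;

// Orders by phase, then by name; phase 0 scripts are skipped unless the database was just created.
List<string> OrderScripts(IEnumerable<string> names, bool databaseCreated) =>
    names.Where(n => databaseCreated || Phase(n) != 0)
         .OrderBy(n => Phase(n))
         .ThenBy(n => n, StringComparer.OrdinalIgnoreCase)
         .ToList();

var ordered = OrderScripts(new[]
{
    "20181218-081540-create-demo-person-table.sql",
    "grant-permissions.post.deploy.sql",
    "disable-jobs.pre.deploy.sql",
    "20190101-120000-alter-demo-person-table.sql",
    "seed-logins.post.database.create.sql"
}, databaseCreated: false);

Console.WriteLine(string.Join(Environment.NewLine, ordered));
```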


Schema

There are some key schema objects that can be dropped and created over time without causing side-effects. Equally, these objects can be code-generated, reducing the effort to create and maintain them over time. As such, these objects fall outside of the Migrations above.

The currently supported objects are (order specified implies order in which they are applied, and reverse when dropped to allow for dependencies):

  1. Type
  2. Function
  3. View
  4. Procedure

The schema scripts must be marked as embedded resources, and reside under the Schema folder within the C# project. Each script should only contain a single Create statement. Each script will be parsed to determine its type so that the appropriate order can be applied.
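
The type-based ordering can be sketched as follows. This is an illustration only, not the DbEx parser: the ApplyRank helper, its regular expression, and the sample object names are assumptions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical parser: reads the object type from the first CREATE statement and returns
// its position in the apply order (Type, Function, View, Procedure), or -1 if unsupported.
int ApplyRank(string sql)
{
    string[] order = { "TYPE", "FUNCTION", "VIEW", "PROCEDURE" };
    var m = Regex.Match(sql, @"\bCREATE\s+(?:OR\s+ALTER\s+)?(\w+)", RegexOptions.IgnoreCase);
    var kind = m.Success ? m.Groups[1].Value.ToUpperInvariant() : "";
    return Array.IndexOf(order, kind == "PROC" ? "PROCEDURE" : kind);
}

var scripts = new[]
{
    "CREATE PROCEDURE [Demo].[spGetPerson] AS SELECT 1;",
    "CREATE VIEW [Demo].[vwPerson] AS SELECT 1 AS [One];",
    "CREATE TYPE [dbo].[udtIntList] AS TABLE ([Value] INT);",
    "CREATE FUNCTION [dbo].[fnOne]() RETURNS INT AS BEGIN RETURN 1 END;"
};

// Objects are applied in rank order and dropped in the reverse order to respect dependencies.
var createOrder = scripts.OrderBy(s => ApplyRank(s)).ToList();
var dropOrder = scripts.OrderByDescending(s => ApplyRank(s)).ToList();

Console.WriteLine(string.Join(Environment.NewLine, createOrder));
```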

A schema script can contain basic moustache value replacement syntax such as {{Company}}; this will then be replaced at runtime by the corresponding Company parameter value (see MigrationArgs.Parameters). These parameters (Name=Value pairs) can also be command-line specified.

The Schema folder is used to encourage the usage of database schemas. Therefore, the folders directly under it should be named for the schema, for example dbo or Ref, with sub-folders for the object types as per Azure Data Studio, for example Functions, Stored Procedures or Types\User-Defined Table Types.


Data

Data can be defined using YAML or JSON to enable simplified configuration that will be used to generate the required SQL statements to apply to the database.

The data specified follows a basic indenting/levelling rule to enable:

  1. Schema - specifies Schema name.
  2. Table - specifies the Table name within the Schema; this will be validated to ensure it exists within the database as the underlying table schema (columns) will be inferred. The underlying rows will be inserted by default; alternatively, by prefixing the table name with a $ character a merge operation will be performed instead.
  3. Rows - each row specifies the column name and the corresponding values (except for reference data described below). The tooling will parse each column value according to the underlying SQL type.

SQL script files can also be provided alongside YAML and JSON where explicit SQL is to be executed.


Reference data

Reference Data is treated as a special case. The first column name and value pair are treated as the Code and Text columns; as defined via the DataParserArgs (see RefDataCodeColumnName and RefDataTextColumnName properties).

Where a column is a Reference Data reference the reference data code can be specified, with the identifier being determined at runtime (using a sub-query) as it is unlikely to be known at configuration time. The tooling determines this by the column name being suffixed by Id and a foreign-key constraint being defined.

Alternatively, a Reference Data reference could be the code itself, typically named XxxCode (e.g. GenderCode). This has the advantage of decoupling the reference data references from the underlying identifier. Where data is persisted as JSON then the code is used; this would ensure consistency. The primary disadvantage is that the code absolutely becomes immutable and therefore not easily changed; for the most part this typically is not an issue.


YAML/JSON configuration

Example YAML configuration for merging reference data is as follows.

Ref:
  - $Gender:
    - M: Male
    - F: Female

Example YAML configuration for inserting data (also inferring the GenderId from the specified reference data code) is as follows.

Demo:
  - Person:
    - { FirstName: Wendy, LastName: Jones, Gender: F, Birthday: 1985-03-18 }
    - { FirstName: Brian, LastName: Smith, Gender: M, Birthday: 1994-11-07 }
    - { FirstName: Rachael, LastName: Browne, Gender: F, Birthday: 1972-06-28, Street: 25 Upoko Road, City: Wellington }
    - { FirstName: Waylon, LastName: Smithers, Gender: M, Birthday: 1952-02-21 }
  - WorkHistory:
    - { PersonId: 2, Name: Telstra, StartDate: 2015-05-23, EndDate: 2016-04-06 }
    - { PersonId: 2, Name: Optus, StartDate: 2016-04-16 }

Additionally, to use an IIdentifierGenerator to generate the identifiers the DataParserArgs IdentifierGenerator property must be specified (this defaults to GuidIdentifierGenerator). For this to be used the ^ prefix must be specified for each corresponding table (must opt-in); must occur after $ merge character where specified. Example as follows.

Ref:
  - $^Gender:
    - { Code: M, Text: Male, TripCode: Male }
Demo:
  - ^Person:
    - { FirstName: Wendy, LastName: Jones, Gender: F, Birthday: 1985-03-18 }
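
The table-name prefixes used above ($ for merge, ^ for identifier generation, with $ first when both are present) could be interpreted along the following lines. The ParseTableKey helper is hypothetical and is not taken from DbEx.

```csharp
using System;

// Hypothetical parser for the table-name prefixes: '$' requests a merge instead of an insert,
// and '^' opts in to identifier generation. When both are used, '$' must come first.
(string Table, bool IsMerge, bool UseIdentifierGenerator) ParseTableKey(string key)
{
    var isMerge = key.StartsWith('$');
    if (isMerge) key = key[1..];

    var useGenerator = key.StartsWith('^');
    if (useGenerator) key = key[1..];

    return (key, isMerge, useGenerator);
}

Console.WriteLine(ParseTableKey("$^Gender"));
Console.WriteLine(ParseTableKey("^Person"));
Console.WriteLine(ParseTableKey("WorkHistory"));
```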

Runtime values can be used within the YAML using the value lookup notation ^(Key). The key is first looked up in the DataParserArgs Parameters property. There are two special parameters, UserName and DateTimeNow, that reference the same-named DataParserArgs properties, and a further special parameter, GuidNew, that results in a Guid.NewGuid. Where the key is not found, the extended notation ^(Namespace.Type.Property.Method().etc, AssemblyName) is used. Where the AssemblyName is not specified, the default mscorlib is assumed. The System root namespace is optional, i.e. it will be attempted by default. The initial property or method for a Type must be static, in that the Type will not be instantiated. These parameters (Name=Value pairs) can also be command-line specified.

Additionally, a column can be set with a guid representation of an integer using shorthand notation; i.e. ^n values, where n is an integer, are replaced with a guid equivalent; e.g. ^1 will be converted to 00000001-0000-0000-0000-000000000000. The DataParserArgs.ReplaceShorthandGuids property controls this behavior (defaults to true).

Example as follows.

Demo:
  - Person:
    - { PersonId: ^1, FirstName: Wendy, Username: ^(System.Security.Principal.WindowsIdentity.GetCurrent().Name,System.Security.Principal.Windows), Birthday: ^(DateTimeNow) }
    - { PersonId: ^2, FirstName: Wendy, Username: ^(Beef.ExecutionContext.EnvironmentUsername,Beef.Core), Birthday: ^(DateTime.UtcNow) }
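
The ^n shorthand guid conversion can be sketched as below. This is not the DbEx implementation: the ExpandShorthandGuid helper is hypothetical, and it assumes the integer is placed in the first guid group using .NET's Guid(int, short, short, byte[]) constructor, which renders that group in hexadecimal (so values of 10 and above would not appear as their decimal digits). The text above only confirms the result for ^1.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: expands a "^n" shorthand value into a Guid whose first group
// holds the integer n; any other value is returned unchanged.
string ExpandShorthandGuid(string value)
{
    var m = Regex.Match(value, @"^\^(\d+)$");
    return m.Success && int.TryParse(m.Groups[1].Value, out var n)
        ? new Guid(n, 0, 0, new byte[8]).ToString()
        : value;
}

Console.WriteLine(ExpandShorthandGuid("^1"));             // 00000001-0000-0000-0000-000000000000
Console.WriteLine(ExpandShorthandGuid("^(DateTimeNow)")); // not a shorthand guid, so left as-is
```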

Advanced capabilities, such as nested YAML/JSON can be specified to represent hierarchical relationships (see contact->addresses within test data.yaml and related TableNameMappings to map to the correct underlying database table). DataConfig can also be specified using the * schema to control the behaviour within the context of a YAML/JSON file as demonstrated by the test ContactType.json.


Console application

DbEx has been optimized so that a new console application can reference and inherit the underlying capabilities.

Where executing directly, the default command-line options are as follows.

Xxx Database Tool.

Usage: Xxx [options] <command> <args>

Arguments:
  command                    Database migration command (see https://github.com/Avanade/dbex#commands-functions).
                             Allowed values are: None, Drop, Create, Migrate, CodeGen, Schema, Deploy, Reset, Data, DeployWithData, Database, DropAndDatabase, All, DropAndAll,
                             ResetAndData, ResetAndDatabase, ResetAndAll, Execute, Script.
  args                       Additional arguments; 'Script' arguments (first being the script name) -or- 'Execute' (each a SQL statement to invoke).

Options:
  -?|-h|--help               Show help information.
  -cs|--connection-string    Database connection string.
  -cv|--connection-varname   Database connection string environment variable name.
  -so|--schema-order         Database schema name (multiple can be specified in priority order).
  -o|--output                Output directory path.
  -a|--assembly              Assembly containing embedded resources (multiple can be specified in probing order).
  -p|--param                 Parameter expressed as a 'Name=Value' pair (multiple can be specified).
  -eo|--entry-assembly-only  Use the entry assembly only (ignore all other assemblies).
  --accept-prompts           Accept prompts; command should _not_ stop and wait for user confirmation (DROP or RESET commands).

The DbEx.Test.Console demonstrates how this can be leveraged. The command-line arguments need to be passed through to support the standard options. Additional methods exist to specify defaults or change behaviour as required. An example Program.cs is as follows.

using DbEx.Console;
using System.Threading.Tasks;

namespace DbEx.Test.Console
{
    public class Program 
    {
        internal static Task<int> Main(string[] args) => SqlServerMigratorConsole
            .Create<Program>("Data Source=.;Initial Catalog=DbEx.Console;Integrated Security=True")
            .RunAsync(args);
    }
}

Tip: To ensure all files are included as embedded resources add the following to the .NET project:

<ItemGroup>
  <EmbeddedResource Include="Schema\**\*" />
  <EmbeddedResource Include="Migrations\**\*" />
  <EmbeddedResource Include="Data\**\*" />
</ItemGroup>

Script command

To simplify the process for the developer DbEx enables the creation of new migration script files into the Migrations folder. This will name the script file correctly and output the basic SQL statements to perform the selected function. The date and time stamp will use DateTime.UtcNow as this should avoid conflicts where being co-developed across time zones.
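
A minimal sketch of how such a file name could be composed follows, assuming the generated name uses the same date-time prefix pattern as the recommended naming convention (e.g. 20181218-081540-create-demo-person-table.sql). The NewScriptFileName helper and its slug rules are hypothetical and are not taken from DbEx.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: builds "<yyyyMMdd-HHmmss>-<description>.sql" from a UTC timestamp,
// lower-casing the description and collapsing anything non-alphanumeric into single hyphens.
string NewScriptFileName(DateTime utcNow, string description)
{
    var stamp = utcNow.ToString("yyyyMMdd-HHmmss", CultureInfo.InvariantCulture);
    var slug = Regex.Replace(description.Trim().ToLowerInvariant(), "[^a-z0-9]+", "-").Trim('-');
    return $"{stamp}-{slug}.sql";
}

// Using UTC means two developers in different time zones still get consistently ordered names.
Console.WriteLine(NewScriptFileName(DateTime.UtcNow, "Create Demo Person table"));
```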

This requires the usage of the Script command, plus zero or more optional arguments where the first is the sub-command (these will depend on the script being created). The optional arguments must appear in the order listed; where not specified, a default value will be used within the script file. Depending on the database provider, not all of the following will be supported.

The following shows the Script sub-commands for SQL Server. Use --help to see the commands available at runtime.

| Sub-command | Argument(s) | Description |
|---|---|---|
| N/A | N/A | Creates a new empty skeleton script file. |
| Schema | Schema and Table | Creates a new table create script file for the named schema and table. |
| Create | Schema and Table | Creates a new table create script file for the named schema and table. |
| RefData | Schema and Table | Creates a new reference data table create script file for the named schema and table. |
| Alter | Schema and Table | Creates a new table alter script file for the named schema and table. |
| CdcDb | N/A | Creates a new sys.sp_cdc_enable_db script file for the database. |
| Cdc | Schema and Table | Creates a new sys.sp_cdc_enable_table script file for the named schema and table. |

Examples as follows.

dotnet run script
dotnet run script schema Foo
dotnet run script create Foo Bar
dotnet run script refdata Foo Gender
dotnet run script alter Foo Bar
dotnet run script cdcdb
dotnet run script cdc Foo Bar

Execute command

The execute command allows one or more SQL Statements, and/or Script files, to be executed directly against the database. This is intended for enabling commands to be executed only. No response other than success or failure will be acknowledged; as such this is not intended for performing queries.

Examples as follows.

dotnet run execute "create schema [Xyz] authorization [dbo]"
dotnet run execute ./schema/createschema.sql

Infer database schema

Within a code-generation, or other context, the database schema may need to be inferred to understand the basic schema for all tables and their corresponding columns.

The Database class provides a SelectSchemaAsync method to return a DbTableSchema list, including the respective columns for each table (see DbColumnSchema).


Other considerations

To simplify the database management, here are some further considerations that may make life easier over time, especially where you adopt the philosophy that the underlying business logic (within the application APIs) is primarily responsible for the consistency of the data, and the data source (the database) is being largely used for storage and advanced query:

  • Minimise constraints - do not use database constraints unless absolutely necessary; only leverage where the database is the best and/or most efficient means to perform; i.e. uniqueness. The business logic should validate the request to ensure that any related data is provided, is valid and consistent.
  • No cross-schema referencing - avoid referencing across Schemas where possible as this may impact the Migrations as part of this tooling; and we should not be using constraints as per prior point. Each schema is considered independent of others (where using a schema per domain) except in special cases, such as dbo or sec (security where used) for example.
  • JSON for schema-less - where there is data that needs to be persisted, but rarely searched on, a schema-less approach should be considered such that a JSON object is persisted into a single column versus having to define additional tables and/or columns. This can further simplify the database requirements where the data is hierarchical in nature. To enable this, the ObjectToJsonConverter and AutoMapperObjectToJsonConverter can be used within the corresponding mapper.
  • Nullable everything - all columns (except the primary key) should be defined as nullable. The business logic should validate the request to ensure data is provided where mandatory. Without this constraint, changes to the database schema become easier over time.

Other repos

These other Avanade repositories leverage DbEx:

  • NTangle - Change Data Capture (CDC) code generation tool and runtime.
  • Beef - Business Entity Execution Framework to enable industrialisation of API development.

License

DbEx is open source under the MIT license and is free for commercial use.


Contributing

One of the easiest ways to contribute is to participate in discussions on GitHub issues. You can also contribute by submitting pull requests (PR) with code changes. Contributions are welcome. See information on contributing, as well as our code of conduct.


Security

See our security disclosure policy.


Who is Avanade?

Avanade is the leading provider of innovative digital and cloud services, business solutions and design-led experiences on the Microsoft ecosystem, and the power behind the Accenture Microsoft Business Group.