SSIS Packages Data Load Type
In ODE version prior to 5 you could only load the data into Data Vault from the source database on the same server where the configuration database is implemented. ODE version 5 could get the data into the Data Vault from another server via SSIS (Microsoft SQL Server Integration Services). ODE version 5 includes a feature that could execute an SSIS package to load the data into a stage table as a part of the table load. At the moment ODE would be able to execute only SSIS packages generated in a specific way based on the Configuration for that table with BimlExpress, a free tool from Varigence, or the advanced tool BimlStudio. Generated package could work with MS SQL Server Change Data Capture (CDC). At the moment only supported data sources are MS SQL Server and Oracle databases.
SSIS is used on the first step of the data load. ODE SSIS land data into stage area on the server where ODE is installed, then the usual ODE process loads this data into the Data Vault hubs and satellites.
Configuring SSIS data load
Create Stage and Vault databases
Follow the instructions here to create Stage and Vault databases. This scripts create a special table that is required to store SSIS execution information.
Configure databases and connections
Follow the steps from this article to make record stage database and source connection in Config.
Provision stage table
Provision the stage table structure based on the source table. The easiest way to do it, right-click on the source table in SQL Management Studio, script up a table creation statement in the new window. Switch the query window connection to the server where ODE is installed, and execute against the stage database. Keys and other constraints are not required. This will create an empty stage table with the exact source table structure; this is enough for ODE to get the metadata from. You can edit the table creation script before execution on stage database to exclude unwanted columns, they will be ignored by SSIS load process. Note that column names and types should stay as they are in source.
Configure ensemble and source table
Configure an ensemble using this helper script. Set StageLoadType to "MSSQLcdc" if your source database is MS SQL Server and you have MS CDC enabled on the instance; otherwise set to "Full" to load data in full snapshot mode. The satellite structure is derived from the stage table provisioned on the previous step.
One of the output queries replaces the provisioned stage table with the actual one with some extra fields required for SSIS execution.
Create new BIML project
The following steps describe how to generate packages in BimlStudio. Steps for BimlExpress are slightly different due to the tool difference. BimlExpress is an add-on to Visual Studio while BimlStudio is a standalone IDE.
Create a new project in BimlStudio. In the project settings check your SQL Server and SSIS versions. ODE target platform is SQL Server 2016, but we have found that packages generated for 2014 version work correctly.
Tick "Use Project Deployment" box.
Add ODE BIML scripts into the project
Add four BIML files to this project, you can find them in the folder "BIMLScripts" in the ODE project folder you have downloaded from GitHub. Copy the whole folder into the project's folder "addedBiml" on disk. Then right-click on the project name in the "Project View" panel and select “Include Descendants in Project”.
Edit the first BIML file called "Include_10.biml".
- Put in the source system name you have configured for this source. You have created a source system with this name on the preparatory stage and have already used in the ensemble creation script.
- Put in a table name for the next parameter or leave it set to "ALL". BimlStudio project will generate an SSIS package per all the source tables configured for this source, or re-generate it if it already exists. In case if SSIS packages are required to be generated for a specific list of tables, e.g. for a freshly configured only, provide a comma separated list of tables in this parameter.
- Put in the connection to ODE Config database. This is where BimlStudio will get the packages metadata from.
- You can edit the project file name prefix if you want, the default is "DV_".
Generate project from Config
If you set "10_Environments.biml" and "20_Build.biml" files to be "Live", then just wait until changes come through.
Alternatively, you'll need to execute two files manually. Right-click on the next BIML file "10_Environments.biml". Wait until the BimlStudio generates the code. Once it's finished, execute the next file "20_Build.biml". This will generate BIML packages for all the configured and included source tables.
Generate SSIS package
Check if the project name matches the "package_project" variable value you have provided when you configured the connection. In our example it was "DV_AdventureWorks". Right-click on project name and select "Build". Generated SSIS packages could be found in the output folder.
Deploy SSIS packages to server
If it doesn't exist yet, in the Integration Services catalog on your server, create the folder with the same name you have provided for the "package_folder" variable in the connection. In our example it was "Vault_External_Source". From the Visual Studio, deploy the project into this folder. To do that, right-click on the project in the Solution explorer, choose "Deploy". Provide a server name and select the folder.
Create execution jobs
A few SQL Agent jobs should be created to enable the execution. Create a job called "ODE - Scheduled Task Execution". Add a new step to it to execute the following code against the Config database:
Schedule it to run every 10 seconds. This is kind of execution thread. You can create a few of these jobs to execute multiple table loads in parallel if your server have enough resources for that. Separate the execution start time by 1 second between jobs, so that they start every next second instead of starting all at once. These threads pick up tasks from the queue, so multiple table loads could run in parallel.
Execute Full load to populate the data
Execute the stored procedure for the schedule you've configured. It was 'Full_load' in the example:
EXECUTE [dv_scheduler].[dv_process_schedule] 'Full_Load'
This will execute all the tables in this list, bring data from the source server via SSIS and load the data into the ensembles. The first load must be a full load even if you want the subsequent loads to run in CDC mode.
Each generated SSIS package have two load types implemented, full and CDC. Full mode loads snapshots of the source data i.e. load a current state of the data. If same record has been changed multiple times since the last ODE load executions, only the current state of the record is captured. CDC mode uses the MS SQL CDC functionality and captures all the changes for each record. If record has been changed multiple times since the last ODE load execution, all the changes will be sequentially added to the Data Vault satellite. This makes your raw vault a persistent storage of all the record changes.