ETL Process for WideWorldImporters
This SSIS project performs ETL from the transactional database WideWorldImporters into the OLAP database WideWorldImportersDW for long-term storage and analytics.
About this sample
- Applies to: SQL Server 2016 (or higher)
- Key features: Core database features
- Workload: ETL
- Programming Language:
- Authors: Greg Low, Denzil Ribeiro, Jos de Bruijn
- Update history: 25 May 2016 - initial revision
Before you begin
To run this sample, you need the following prerequisites.
- The databases WideWorldImporters and WideWorldImportersDW running on SQL Server 2016 (or higher) or Azure SQL Database. These can be on different servers.
- Visual Studio 2015 Update 3 (or higher). At the time of writing, Visual Studio 2017 does not yet support Integration Services projects. You will need to install Visual Studio 2015 to open the SSIS project.
- SQL Server 2016 (or higher) Integration Services.
- This needs to be installed on the same machine as Visual Studio to be able to build the project.
- Make sure you have already created an SSIS Catalog. If not, to do that, right click Integration Services in Object Explorer, and choose to add catalog. Follow the defaults. It will ask you to enable sqlclr and provide a password.
Running the sample
Open the solution file wwi-ssis.sln in Visual Studio.
Build the solution. This will create an SSIS package Daily ETL.ispac under Daily ETL\bin\Development.
Deploy the SSIS package.
- Open the "Daily ETL.ispac" package from Windows Explorer. This will launch the Integration Services Deployment Wizard.
- Under "Select Source" follow the default Project Deployment, with the path pointing to the "Daily ETL.ispac" package.
- Under "Select Destination" enter the name of the server that hosts the SSIS catalog.
- Select a path under the SSIS catalog, for example under a new folder "WideWorldImporters".
- Finalize the wizard by clicking Deploy.
Create a SQL Server Agent job for the ETL process.
- In SSMS, right-click "SQL Server Agent" and select New->Job.
- Pick a name, for example "WideWorldImporters ETL".
- Add a Job Step of type "SQL Server Integration Services Package".
- Select the server with the SSIS catalog, and select the "Daily ETL" package.
- Under Configuration->Connection Managers ensure the connections to the source and target are configured correctly. The default is to connect to the local instance.
- Click OK to create the Job.
Execute or schedule the Job.
The ETL package Daily ETL.ispac is used to migrate data from the WideWorldImporters database to the WideWorldImportersDW database as the data changes. The package is run periodically (most commonly daily).
The design of the package uses SSIS to orchestrate bulk T-SQL operations (rather than as separate transformations within SSIS) to ensure high performance.
Dimensions are loaded first, followed by Fact tables. The package can be re-run at any time after a failure.
The workflow is as follows:
It starts with an expression task that works out the appropriate cutoff time. This time is the current time less a few seconds. (This is more robust than requesting data right to the current time). It then truncates any milliseconds from the time.
The main processing starts by populating the Date dimension table. It ensures that all dates for the current year have been populated in the table.
After this, a series of data flow tasks loads each dimension, then each fact.
The code included in this sample is not intended to be used for production purposes.
For more information, see these articles: