'System.OutOfMemoryException' was thrown on a Large Table #19

Closed

deyshin opened this issue Aug 12, 2016 · 3 comments

deyshin commented Aug 12, 2016

The script has been very helpful to my project. Thank you very much.

I have run into one problem - When I run the script on a very large table, I get the following error:

Exception of type 'System.OutOfMemoryException' was thrown.

The particular table has 4 columns - 2 Int32 and 2 String - with about 1.7 million rows of data. Is this something we can solve, or is it an inherent system limitation?

dnlnln (Owner) commented Mar 28, 2019

The tool was originally designed for synchronising smaller data sets between environments via source control, so typically this meant that a few thousand rows of static data would be the max. Others have used it for slightly different purposes, such as for performing simple ETL operations between databases.

To allow the tool to be used for datasets of this size, with the data fully synchronised (i.e. all rows in the target inserted/updated/deleted to reflect the source data set), a fundamentally different approach is needed to merge the data in a reliable and efficient way.

The @include_values bit parameter, added recently (see #37, #47 & #34), might be a better option for working with larger datasets. When set to 0, it omits the VALUES clause containing all the data, in favour of simply joining onto @table_name at merge time. This would allow you to populate the source table before running the merge, in whichever way is appropriate for your dataset (e.g. a single BULK INSERT statement containing all the data, or batches of INSERT statements containing thousands of rows at a time).

For example:

EXEC [AdventureWorks2017].dbo.sp_generate_merge
  @table_name = 'CurrencyRate', 
  @schema = 'Sales',
  @include_values = 0

This generates the following statement:

MERGE INTO [Sales].[CurrencyRate] AS [Target]
USING [Sales].[CurrencyRate] AS [Source]
ON ([Target].[CurrencyRateID] = [Source].[CurrencyRateID])
WHEN MATCHED AND (
	NULLIF([Source].[CurrencyRateDate], [Target].[CurrencyRateDate]) IS NOT NULL OR NULLIF([Target].[CurrencyRateDate], [Source].[CurrencyRateDate]) IS NOT NULL OR 
	NULLIF([Source].[FromCurrencyCode], [Target].[FromCurrencyCode]) IS NOT NULL OR NULLIF([Target].[FromCurrencyCode], [Source].[FromCurrencyCode]) IS NOT NULL OR 
	NULLIF([Source].[ToCurrencyCode], [Target].[ToCurrencyCode]) IS NOT NULL OR NULLIF([Target].[ToCurrencyCode], [Source].[ToCurrencyCode]) IS NOT NULL OR 
	NULLIF([Source].[AverageRate], [Target].[AverageRate]) IS NOT NULL OR NULLIF([Target].[AverageRate], [Source].[AverageRate]) IS NOT NULL OR 
	NULLIF([Source].[EndOfDayRate], [Target].[EndOfDayRate]) IS NOT NULL OR NULLIF([Target].[EndOfDayRate], [Source].[EndOfDayRate]) IS NOT NULL OR 
	NULLIF([Source].[ModifiedDate], [Target].[ModifiedDate]) IS NOT NULL OR NULLIF([Target].[ModifiedDate], [Source].[ModifiedDate]) IS NOT NULL) THEN
 UPDATE SET
  [CurrencyRateDate] = [Source].[CurrencyRateDate], 
  [FromCurrencyCode] = [Source].[FromCurrencyCode], 
  [ToCurrencyCode] = [Source].[ToCurrencyCode], 
  [AverageRate] = [Source].[AverageRate], 
  [EndOfDayRate] = [Source].[EndOfDayRate], 
  [ModifiedDate] = [Source].[ModifiedDate]
WHEN NOT MATCHED BY TARGET THEN
 INSERT([CurrencyRateID],[CurrencyRateDate],[FromCurrencyCode],[ToCurrencyCode],[AverageRate],[EndOfDayRate],[ModifiedDate])
 VALUES([Source].[CurrencyRateID],[Source].[CurrencyRateDate],[Source].[FromCurrencyCode],[Source].[ToCurrencyCode],[Source].[AverageRate],[Source].[EndOfDayRate],[Source].[ModifiedDate])
WHEN NOT MATCHED BY SOURCE THEN 
 DELETE
;

The downside is that you'd need to manually edit the generated statement to use the pre-populated table object (rather than the original table object), so that the USING clause reads something like this:

USING [#currencyRateBulkInserted] AS [Source]
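
For illustration, a minimal sketch of that pre-population step, assuming a BULK INSERT from a flat file (the temp table name, file path and options below are placeholders, not output of the tool):

-- Stage the source data before running the edited MERGE
CREATE TABLE #currencyRateBulkInserted (
  [CurrencyRateID] INT NOT NULL PRIMARY KEY,
  [CurrencyRateDate] DATETIME NOT NULL,
  [FromCurrencyCode] NCHAR(3) NOT NULL,
  [ToCurrencyCode] NCHAR(3) NOT NULL,
  [AverageRate] MONEY NOT NULL,
  [EndOfDayRate] MONEY NOT NULL,
  [ModifiedDate] DATETIME NOT NULL
);

-- Load the full dataset in one pass instead of embedding it in a VALUES clause
BULK INSERT #currencyRateBulkInserted
FROM 'C:\data\CurrencyRate.csv' -- placeholder path
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- ...then execute the generated MERGE with its USING clause edited as above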

I'm curious to know whether this kind of general approach would suit your needs. If so, there's always the possibility of adding a new parameter to override the source table name.

JV-Gh (Contributor) commented Jul 19, 2021

"The downside is that you'd need to manually edit the generated statement to use the pre-populated table object (rather than the original table object), "

A way to make it more dynamic would be to add an additional parameter that takes a custom table name as a string (e.g. @custom_source_name = '#currencyRateBulkInserted'); the procedure would then use this parameter's value instead of the original table name whenever it is NOT NULL (a validation check could be added at the top of the merge to throw an error if an object with that name doesn't exist). Just a thought off the top of my head - it's possible that I might be missing something.
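
A purely illustrative sketch of that idea (none of these names exist in sp_generate_merge today; the fragment is written as it might appear inside the procedure body):

-- Hypothetical @custom_source_name parameter: use it in the USING clause when supplied
IF @custom_source_name IS NOT NULL
   AND OBJECT_ID(@custom_source_name) IS NULL
   AND OBJECT_ID('tempdb..' + @custom_source_name) IS NULL
BEGIN
  RAISERROR('The object named in @custom_source_name does not exist', 16, 1)
  RETURN -1 -- abort before generating the MERGE
END

-- Fall back to the original table when no override is given (@source_name is also hypothetical)
SET @source_name = COALESCE(@custom_source_name, QUOTENAME(@schema) + '.' + QUOTENAME(@table_name))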

dnlnln (Owner) commented Jul 31, 2023

An update on this: @EitanBlumin has very helpfully implemented a new parameter that allows you to split the source rows into multiple MERGE statements. To use it, specify @max_rows_per_batch=1000 (or whatever batch size you need) and be sure to also include the @delete_if_not_matched=0 param.
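
For example, something along these lines should generate one MERGE per batch of up to 1,000 rows (the batch size and table are just placeholders):

EXEC [AdventureWorks2017].dbo.sp_generate_merge
  @table_name = 'CurrencyRate',
  @schema = 'Sales',
  @max_rows_per_batch = 1000,
  @delete_if_not_matched = 0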

This should avoid the out-of-memory exception. If it recurs in spite of this, please comment here and I will re-open the issue.

dnlnln closed this as completed Jul 31, 2023