Very poor write performance starting from CU6 #355

Open
olljanat opened this issue Sep 7, 2018 · 12 comments

@olljanat

commented Sep 7, 2018

We found that write performance is very poor on CU6 (and all later versions) compared with CU5.

Steps to reproduce

Correctly working version (CU5)

Start SQL Server with the following command:

docker run  -e "ACCEPT_EULA=Y" \
-e "MSSQL_PID=Enterprise" \
-e "MSSQL_SA_PASSWORD=P@ssword!" \
-p 1433:1433 \
--name sqltest \
-d microsoft/mssql-server-linux:2017-CU5

Create test table:

USE [master]
GO
CREATE DATABASE [LoadRuns]
GO
USE [LoadRuns]
GO
CREATE TABLE [dbo].[InsertTest](
	[OrderText] [varchar](255) NULL
) ON [PRIMARY]
GO

Run insert test:

USE [LoadRuns];
GO 
 
SET NOCOUNT ON
GO

DECLARE @Counter int = 0
WHILE @Counter < 100000
BEGIN
  INSERT INTO dbo.InsertTest(OrderText)
  SELECT REPLICATE ('1',100)
  SET @Counter = @Counter + 1
END
GO 

It should finish in a couple of minutes.
You can also use iotop to see that the sqlservr process is writing data to disk very fast.

Slow version (CU6)

Stop and remove test container:

docker stop sqltest
docker rm sqltest

Re-create the test container, but this time use the Docker image microsoft/mssql-server-linux:2017-CU6.

Repeat the test.

You will notice that the test runs much longer, and iotop shows that sqlservr writes data much more slowly.
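
To put a number on the difference between the two images, the insert test can be timed from the host. The following is only a sketch: it assumes the insert script has been saved as insert_test.sql and copied into the container (both the file name and the /tmp path are hypothetical), and that sqlcmd is available at /opt/mssql-tools/bin/sqlcmd inside the image.

```shell
#!/bin/sh
# Sketch: time the insert test from the host.
# Container name and password come from the steps above.
SA_PASSWORD='P@ssword!'
CONTAINER=sqltest
SQL_FILE=/tmp/insert_test.sql   # hypothetical path inside the container

# Print the wall-clock time of a command in whole seconds.
# The command's own output and exit status are discarded.
elapsed_seconds() (
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
)

# Run the insert script inside the container with sqlcmd.
# Assumption: sqlcmd lives at this path in the image.
run_insert_test() {
  docker exec "$CONTAINER" /opt/mssql-tools/bin/sqlcmd \
    -S localhost -U SA -P "$SA_PASSWORD" -i "$SQL_FILE"
}

# Usage, once the container is running:
#   docker cp insert_test.sql sqltest:/tmp/insert_test.sql
#   echo "insert test took $(elapsed_seconds run_insert_test) s"
```

Running the same two usage lines against the CU5 and CU6 containers gives directly comparable figures. Whole-second resolution is enough here because the test runs for minutes.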

We also tried disabling the settings described in https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux but that did not help (although I am not sure whether the container version reads settings from the /var/opt/mssql/mssql.conf file).

@twright-msft

Collaborator

commented Sep 9, 2018

@olljanat - Thanks for reporting this issue. I was able to confirm the performance degradation between CU5 and CU6 on my MacBook. It does seem to be resolved in 2017-latest (CU10) though. Can you please confirm by running your test against :2017-latest in your environment?

Here are my results:
:2017-CU5
Started executing query at Line 6
Commands completed successfully.
Total execution time: 53241

:2017-CU6
Started executing query at Line 5
Commands completed successfully.
Total execution time: 138253

:2017-latest (CU10) - Try 1
Started executing query at Line 18
Commands completed successfully.
Total execution time: 69505

:2017-latest (CU10) - Try 2
Started executing query at Line 19
Commands completed successfully.
Total execution time: 55421
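
To make these figures easier to compare, each one can be divided by the CU5 baseline. This is a small sketch; the `slowdown` helper is a hypothetical name, and the unit of the reported times is not stated above (the ratio does not depend on it).

```shell
# Ratio of a measured time to the baseline, rounded to two decimals.
slowdown() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", t / base }'
}

slowdown 53241 138253   # CU6           -> 2.60
slowdown 53241 69505    # CU10, try 1   -> 1.31
slowdown 53241 55421    # CU10, try 2   -> 1.04
```

On these numbers CU6 took roughly 2.6 times as long as CU5, while the two :2017-latest runs were within about 4% to 31% of the baseline.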

@olljanat

Author

commented Sep 9, 2018

OK. CU10 is not listed in the description text on https://hub.docker.com/r/microsoft/mssql-server-linux/, so I missed it entirely and only tested up to CU9.

I will try CU10 tomorrow.

@twright-msft

Collaborator

commented Sep 9, 2018

Thanks for the heads up on CU10 not being listed. I've added it.

@olljanat

Author

commented Sep 10, 2018

@twright-msft I tested :2017-CU10 in two different environments and it looks like the issue still exists there.

Are you sure that you don't have an older CU version stored under the :2017-latest tag on your machine?

The result of the SELECT @@VERSION query in my environment is:

Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64)   Jul 27 2018 09:40:27   Copyright (C) 2017 Microsoft Corporation  Enterprise Edition (64-bit) on Linux (Ubuntu 16.04.5 LTS)
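
One way to rule out a stale local :2017-latest tag is to refresh the image and read the CU level straight from the running container. This is a sketch under assumptions: the `cu_level` and `container_version` helpers are hypothetical names, and the sqlcmd path inside the image is assumed.

```shell
# Extract the CU level (for example "CU10") from SELECT @@VERSION output.
cu_level() {
  printf '%s\n' "$1" | sed -n 's/.*(RTM-\(CU[0-9][0-9]*\)).*/\1/p'
}

# Ask the running test container for its version string.
# Assumption: sqlcmd lives at this path in the image.
container_version() {
  docker exec sqltest /opt/mssql-tools/bin/sqlcmd \
    -S localhost -U SA -P 'P@ssword!' \
    -Q 'SET NOCOUNT ON; SELECT @@VERSION'
}

# Usage against a live container:
#   docker pull microsoft/mssql-server-linux:2017-latest
#   cu_level "$(container_version)"

# Parsing the version string quoted above prints CU10:
cu_level 'Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64)'
```
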
@olljanat

Author

commented Sep 13, 2018

@olljanat

Author

commented Sep 17, 2018

@twright-msft is there any possibility of moving this one forward?

It is ruining our plans to migrate to containerized SQL Servers. Performance there used to be much better than on our regular Windows-based SQL Server installations, but now it is the opposite.

@twright-msft

Collaborator

commented Sep 18, 2018

I'll ask our perf team to take a look at this.

@jamiere-msft

commented Sep 18, 2018

Hello @olljanat. The poor write performance is due to a change we made in CU6 to force a flush to disk, which addresses a data loss exposure with the FUA (Forced Unit Access) issues on Linux. We have a support document about this at https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux.

You can disable this behavior if you believe that your storage subsystem can guarantee durable writes across a power loss. Trace flag 3979 disables the forced flush mechanism and reverts to the pre-CU6 behavior.
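
For a container, one possible way to apply the trace flag is to mount an mssql.conf file that enables it. This is a sketch under assumptions, not a confirmed procedure: the `[traceflag]` / `traceflag0` layout is how mssql.conf is assumed to list startup trace flags, the host file location is hypothetical, and whether the container picks up a bind-mounted /var/opt/mssql/mssql.conf should be verified in your environment.

```shell
# Write a minimal mssql.conf that enables trace flag 3979.
# Assumption: mssql.conf lists startup trace flags in this form.
write_mssql_conf() {
  cat > "$1" <<'EOF'
[traceflag]
traceflag0 = 3979
EOF
}

CONF_FILE="$PWD/mssql.conf"   # hypothetical host location
write_mssql_conf "$CONF_FILE"

# Start the test container with the file mounted at the path SQL Server reads.
# Only do this if the storage guarantees durable writes across a power loss.
start_container() {
  docker run -e "ACCEPT_EULA=Y" \
    -e "MSSQL_PID=Enterprise" \
    -e 'MSSQL_SA_PASSWORD=P@ssword!' \
    -p 1433:1433 \
    -v "$CONF_FILE:/var/opt/mssql/mssql.conf" \
    --name sqltest \
    -d microsoft/mssql-server-linux:2017-CU10
}

# Usage: start_container
```

After the container starts, rerunning the insert test shows whether the flag took effect: timings should return to roughly the CU5 level.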

@olljanat

Author

commented Sep 18, 2018

@jamiere-msft OK, that makes sense. I think it would be best if these settings could be set using environment variables and were listed here: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-environment-variables?view=sql-server-2017

Or is there some reason that these need to be handled in a different way?

@jamiere-msft

commented Sep 18, 2018

@olljanat We are still evaluating how to handle this in the long-term. We are working with several of the Linux distro owners to come up with a way to ensure data durability and avoid the cache flush overhead. If this can be accomplished then we can remove the need for a trace flag completely and avoid adding any other environment variables. I will definitely take your comments back as we continue to work on this.

@olljanat

Author

commented Sep 18, 2018

@jamiere-msft IMO, Always On replication in synchronous mode is the simplest way to avoid the need for it, and it provides other benefits too (of course, the nodes need to be located on different physical hosts that use different UPSs, but still).

So maybe it would be better to focus on #313?

@jamiere-msft

commented Sep 26, 2018

@olljanat I agree that using Always-On is the preferred option. We still see many installations of standalone systems without any HA. :-( We need to address this FUA issue for both environments.
