Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very poor write performance starting from CU6 #355

Closed
olljanat opened this issue Sep 7, 2018 · 21 comments
Closed

Very poor write performance starting from CU6 #355

olljanat opened this issue Sep 7, 2018 · 21 comments

Comments

@olljanat
Copy link

@olljanat olljanat commented Sep 7, 2018

We find out that write performance on is very poor on CU6 (and all versions after that) if you compare it to CU5.

Steps to reproduce

Correctly working version (CU5)

Start SQL server using command:

docker run  -e "ACCEPT_EULA=Y" \
-e "MSSQL_PID=Enterprise" \
-e "MSSQL_SA_PASSWORD=P@ssword!" \
-p 1433:1433 \
--name sqltest \
-d microsoft/mssql-server-linux:2017-CU5

Create test table:

USE [master]
GO
CREATE DATABASE [LoadRuns]
GO
USE [LoadRuns]
GO
CREATE TABLE [dbo].[InsertTest](
	[OrderText] [varchar](255) NULL
) ON [PRIMARY]
GO

Run insert test:

USE [LoadRuns];
GO 
 
SET NOCOUNT ON
GO

DECLARE @Counter int = 0
WHILE @Counter < 100000
BEGIN
  INSERT INTO dbo.InsertTest(OrderText)
  SELECT REPLICATE ('1',100)
  SET @Counter = @Counter + 1
END
GO 

it should finish on couple of minutes.
You can also use iotop to see that sqlservr process is writing data to disk very fast.

Slow version (CU6)

Stop and remove test container:

docker stop sqltest
docker rm sqltest

Re-create test container but now use docker image microsoft/mssql-server-linux:2017-CU6

Repeat test.

You will notice that test will run much longer and you can also see with iotop that sqlservr is writing data much slower.

We also tried to disable these settings https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux but they didn't help (but I'm also not sure if container version reads settings from /var/opt/mssql/mssql.conf file).

@twright-msft
Copy link
Collaborator

@twright-msft twright-msft commented Sep 9, 2018

@olljanat - Thanks for reporting this issue. I was able to confirm the performance degradation between CU5 and CU6 on my macbook. It does seem to be resolved in 2017-latest (CU 10) though. Can you please confirm by running your test against :2017-latest in your env?

Here are my results:
:2017-CU5
MStarted executing query at Line 6
Commands completed successfully.
Total execution time: 53241

:2017-CU6
Started executing query at Line 5
Commands completed successfully.
Total execution time: 138253

:2017-latest (CU10) - Try 1
Started executing query at Line 18
Commands completed successfully.
Total execution time: 69505

:2017-latest (CU10) - Try 2
Started executing query at Line 19
Commands completed successfully.
Total execution time: 55421

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 9, 2018

Ok. CU10 is not listed on info text on https://hub.docker.com/r/microsoft/mssql-server-linux/ so I missed it totally and only tested until CU9.

I will try CU10 on tomorrow.

@twright-msft
Copy link
Collaborator

@twright-msft twright-msft commented Sep 9, 2018

Thanks for the heads up on CU10 not being listed. I've added it.

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 10, 2018

@twright-msft I tested :2017-CU10 on two different environments and looks that issue still exists on that.

Are you sure that you don't have some older CU version stored with :2017-latest tag to your machine?

Result of SELECT @@Version query on my env is:

Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64)   Jul 27 2018 09:40:27   Copyright (C) 2017 Microsoft Corporation  Enterprise Edition (64-bit) on Linux (Ubuntu 16.04.5 LTS)
@olljanat
Copy link
Author

@olljanat olljanat commented Sep 13, 2018

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 17, 2018

@twright-msft any possibility to get this one forward?

It is ruin our plans to migrate to containerized SQL servers as performance there was earlier match better than on regular SQL server installation which are based on Windows but now it is totally opposite.

@twright-msft
Copy link
Collaborator

@twright-msft twright-msft commented Sep 18, 2018

I'll ask our perf team to take a look at this.

@jamiere-msft
Copy link

@jamiere-msft jamiere-msft commented Sep 18, 2018

Hello @olijanat. The poor write performance due to a change we made in CU6 to force a flush to disk to address a data loss exposure with the FUA issues on Linux. We have a support document about this at https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux.

You can disable this behavior if you believe that your storage subsystem can guarantee durable writes across a power loss. You can use trace flag 3979 will disable the forced flush mechanism and revert to pre-CU6 behavior.

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 18, 2018

@jamiere-msft ok, that makes sense. I think that it would be best if these settings would be settable using environment variables and listed on here: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-environment-variables?view=sql-server-2017

or is there some reason that these need to be handled different way?

@jamiere-msft
Copy link

@jamiere-msft jamiere-msft commented Sep 18, 2018

@olljanat We are still evaluating how to handle this in the long-term. We are working with several of the Linux distro owners to come up with a way to ensure data durability and avoid the cache flush overhead. If this can be accomplished then we can remove the need for a trace flag completely and avoid adding any other environment variables. I will definitely take your comments back as we continue to work on this.

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 18, 2018

@jamiere-msft IMO, always-on replication on synchronous mode it is simplest way to avoid need for it and it provides other benefits too (of course these nodes need to be located on different physical hosts which are using different UPS but still).

So maybe it would be better to put focus to #313 ?

@jamiere-msft
Copy link

@jamiere-msft jamiere-msft commented Sep 26, 2018

@olljanat I agree that using Always-On is the preferred option. We still see many installations of standalone systems without any HA. :-( We need to address this FUA issue for both environments.

@olljanat
Copy link
Author

@olljanat olljanat commented May 28, 2019

@amitmsft @jamiere-msft I have not been playing with mssql-docker for a while but we are still planning to use it on future and it would be nice if you can leave some comment here that how this was solved after all?

As I understood from this comment that solution was not ready when we discussed about this last time?

@olljanat We are still evaluating how to handle this in the long-term. We are working with several of the Linux distro owners to come up with a way to ensure data durability and avoid the cache flush overhead. If this can be accomplished then we can remove the need for a trace flag completely and avoid adding any other environment variables. I will definitely take your comments back as we continue to work on this.

@banerjeeamit
Copy link
Contributor

@banerjeeamit banerjeeamit commented May 28, 2019

Depending on the storage that you have, you can use the write through settings suggested in https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux

If you need a kernel that already has the improvement, then you can move to RHEL 8 in the future when we announce SQL Server support on RHEL 8.0. Please see this article for reference on the change: https://bobsql.com/sql-server-on-linux-forced-unit-access-fua-internals/

@AlBundy33
Copy link

@AlBundy33 AlBundy33 commented Sep 10, 2019

As far as I understand disabling forced flush should result in a better performance.
Can someone please describe how to modify the image to disable forced flush?

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 10, 2019

@AlBundy33 that was just workaround. Actual issue should be fixed on Linux kernel versions >= 4.18. Which one you are using?

@AlBundy33
Copy link

@AlBundy33 AlBundy33 commented Sep 10, 2019

Kernel: Linux qc-docker01 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3 (2019-09-02) x86_64 GNU/Linux
Image: mcr.microsoft.com/mssql/server:2017-latest

So it seems that this setting could help right?

Our nightly tests on mssql takes up to 1h longer than postgres or oracle - therefore I thought I could try these settings.
Because we create each day a new database container there is no need for forced flushes.

I've found these commands:

/opt/mssql/bin/mssql-conf traceflag 3979 on
/opt/mssql/bin/mssql-conf set control.alternatewritethrough 0
/opt/mssql/bin/mssql-conf set control.writethrough 0
systemctl restart mssql-server

systemctl does not work with the image mentioned above - maybe stopping and starting the container will work.

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 10, 2019

I think that we was able to enable it just using SQL query

DBCC TRACEON(3979)

but I'm not sure as it over year ago when I tested this last time.

@AlBundy33
Copy link

@AlBundy33 AlBundy33 commented Sep 10, 2019

dbcc did not work

Ignoring trace flag 3979. It is either an invalid trace flag or a trace flag that can only be specified during server startup.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

@olljanat
Copy link
Author

@olljanat olljanat commented Sep 10, 2019

Ok. So upgrade kernel to >= 4.18 and if that does not help then open new issue

@AlBundy33
Copy link

@AlBundy33 AlBundy33 commented Sep 11, 2019

I'm sure that my admins don't want to update the kernel of an LTS distro. ;-)

But I've found an easy way to set the values - just create a new image with the changes.

Dockerfile

FROM mcr.microsoft.com/mssql/server:2017-latest

RUN /opt/mssql/bin/mssql-conf traceflag 3979 on && \
        /opt/mssql/bin/mssql-conf set control.alternatewritethrough 0 && \
        /opt/mssql/bin/mssql-conf set control.writethrough 0

I'll tests this image as soon as I find the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants