Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"The connection is closed" errors in System.Data.SqlClient on Linux without MARS. #54

Closed
JulianRooze opened this issue Jul 4, 2018 · 17 comments · Fixed by #796
Closed
Assignees
Labels
Area\Managed SNI Issues that are targeted to the Managed SNI codebase.

Comments

@JulianRooze
Copy link

We've recently run into an issue where executing SQL commands results in "Invalid operation. The connection is closed." exceptions being thrown.

  • This seems to happen exclusively on Async operations.
  • This seems to have started happening since we upgrade to .NET Core 2.1. We've seen occurrences in our logs before the upgrade, but that was a big burst of them (probably real, but transient, connectivity errors), but they've started showing up consistently literally the hour after we migrated to 2.1.
  • It looks like this happens only on Linux. We can't reproduce it on our local Windows machines. To rule out this is purely Azure related we reproduced it on Linux by running our application in a Linux Docker container on our local machines.
  • Turning on MultipleActiveResultSets (MARS) in the connection string seems to eliminate this error entirely.
  • The fact that it doesn't happen on Windows and that MARS solves it, rules out transient connectivity errors.
  • Our usage of SqlConnection is pretty standard: they come from a connection pool, and we don't use them concurrently.
  • I've attempted to reproduce the issue in an isolated project, but I've been unable to, so I'm not sure where the difference between that project and our full application lies.

The exception that gets thrown is this one:

https://github.com/dotnet/corefx/blob/a74bf4200926c47c11ba0383e2bddeaa59227266/src/System.Data.SqlClient/src/System/Data/Common/AdapterUtil.SqlClient.cs#L395

The exception gets thrown (or rethrown) from these three points in the code:

  1. https://github.com/dotnet/corefx/blob/e0ba7aa8026280ee3571179cc06431baf1dfaaac/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserStateObject.cs#L3098

Example stacktrace:

at Task System.Data.SqlClient.TdsParserStateObject.WritePacket(byte flushMode, bool canAccumulate)
   at Task System.Data.SqlClient.TdsParserStateObject.WriteByteArray(byte[] b, int len, int offsetBuffer, bool canAccumulate, TaskCompletionSource<object> completion)
   at void System.Data.SqlClient.TdsValueSetter.SetString(string value, int offset, int length)
   at void Microsoft.SqlServer.Server.ValueUtilsSmi.SetString_LengthChecked(SmiEventSink_Default sink, ITypedSettersV3 setters, int ordinal, SmiMetaData metaData, string value, int offset)
   at void Microsoft.SqlServer.Server.ValueUtilsSmi.SetCompatibleValue(SmiEventSink_Default sink, ITypedSettersV3 setters, int ordinal, SmiMetaData metaData, object value, ExtendedClrTypeCode typeCode, int offset)
   at void Microsoft.SqlServer.Server.ValueUtilsSmi.SetDataTable_Unchecked(SmiEventSink_Default sink, SmiTypedGetterSetter setters, int ordinal, SmiMetaData metaData, DataTable value)
   at void System.Data.SqlClient.TdsParser.WriteSmiParameter(SqlParameter param, int paramIndex, bool sendDefault, TdsParserStateObject stateObj)
   at Task System.Data.SqlClient.TdsParser.TdsExecuteRPC(_SqlRPC[] rpcArray, int timeout, bool inSchema, SqlNotificationRequest notificationRequest, TdsParserStateObject stateObj, bool isCommandProc, bool sync, TaskCompletionSource<object> completion, int startRpc, int startParam)
   at SqlDataReader System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, bool returnStream, bool async, int timeout, out Task task, bool asyncWrite, SqlDataReader ds)
   at Task System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource<object> completion, bool sendToPipe, int timeout, bool asyncWrite, string methodName)
   at IAsyncResult System.Data.SqlClient.SqlCommand.BeginExecuteNonQuery(AsyncCallback callback, object stateObject)
   at Task<TResult> System.Threading.Tasks.TaskFactory<TResult>.FromAsyncImpl(Func<AsyncCallback, object, IAsyncResult> beginMethod, Func<IAsyncResult, TResult> endFunction, Action<IAsyncResult> endAction, object state, TaskCreationOptions creationOptions)
   at Task<int> System.Data.SqlClient.SqlCommand.ExecuteNonQueryAsync(CancellationToken cancellationToken)
   at async Task<int> Dapper.SqlMapper.ExecuteImplAsync(IDbConnection cnn, CommandDefinition command, object param) in C:/projects/dapper/Dapper/SqlMapper.Async.cs:line 678
  1. https://github.com/dotnet/corefx/blob/e0ba7aa8026280ee3571179cc06431baf1dfaaac/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlCommand.cs#L1638

Example stacktrace:

at Task<DbDataReader> System.Data.SqlClient.SqlCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)+(Task<SqlDataReader> result) => { }
   at void System.Threading.Tasks.ContinuationResultTaskFromResultTask<TAntecedentResult, TResult>.InnerInvoke()
   at async Task<IEnumerable<T>> Dapper.SqlMapper.QueryAsync<T>(IDbConnection cnn, Type effectiveType, CommandDefinition command) in C:/projects/dapper/Dapper/SqlMapper.Async.cs:line 419
  1. https://github.com/dotnet/corefx/blob/a74bf4200926c47c11ba0383e2bddeaa59227266/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlInternalConnectionTds.cs#L612

Example stacktrace:

 at void System.Data.SqlClient.SqlInternalConnectionTds.ValidateConnectionForExecute(SqlCommand command)
   at void System.Data.SqlClient.SqlInternalTransaction.Rollback()
   at void System.Data.SqlClient.SqlTransaction.Dispose(bool disposing)
   at void System.Data.Common.DbTransaction.Dispose()

So it either happens when writing data to SQL Server or reading a result set and it seems to happen on longer running operations, but probably only because a longer query makes it more likely for the error to show up.

Our connection string looks like this:

Server=tcp:<DB>.database.windows.net,1433;Initial Catalog=<DB>;Persist Security Info=False;User ID=<USER>;Password=<PW>;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;

@keeratsingh keeratsingh self-assigned this Jul 25, 2018
@divega divega transferred this issue from dotnet/corefx May 16, 2019
@divega
Copy link

divega commented May 16, 2019

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

@David-Engel David-Engel added this to the 1.1.0 milestone May 21, 2019
@David-Engel David-Engel added the Area\Managed SNI Issues that are targeted to the Managed SNI codebase. label May 21, 2019
@DenSmoke
Copy link

DenSmoke commented May 22, 2019

I have same problem only on Linux. Error occurs on Ubuntu 18.04 with .NET Core 3.0 preview 5. StackTrace is like in first example of @JulianRooze
Code (important places are marked with comments):

public async Task AddReportAsync(Report report, Func<Task> onSuccess = null, CancellationToken ct = default)
{
    if (report == null)
        throw new ArgumentNullException("отчет не может быть неопределенным");
    Logger?.Invoke($"SQL Добавление отчета Id={report.Id}");
    using var command = new SqlCommand("dbo.AddReport");
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@Id", SqlDbType.VarChar, 28).Value = report.Id;
    command.Parameters.Add("@UserId", SqlDbType.VarChar, 28).Value = report.UserId;
    command.Parameters.Add("@Status", SqlDbType.VarChar, 20).Value = report.Status;
    command.Parameters.Add("@CancelOrDeclineReason", SqlDbType.NVarChar, 254).Value = (object)report.CancelOrDeclineReason?.Trim() ?? DBNull.Value;
    command.Parameters.Add("@Defects", SqlDbType.NVarChar, 254).Value = (object)report.Defects ?? DBNull.Value;
    command.Parameters.Add("@AgreementDate", SqlDbType.DateTime).Value = (object)report.AgreementDate ?? DBNull.Value;
    command.Parameters.Add("@AgreementNumber", SqlDbType.NVarChar, 254).Value = (object)report.AgreementNumber ?? DBNull.Value;
    command.Parameters.Add("@ProductType", SqlDbType.NVarChar, 254).Value = (object)report.ProductType ?? DBNull.Value;
    command.Parameters.Add("@BsoNumber", SqlDbType.NVarChar, 254).Value = (object)report.BsoNumber ?? DBNull.Value;
    command.Parameters.Add("@Comments", SqlDbType.NVarChar, 254).Value = (object)report.Comments?.Trim() ?? DBNull.Value;
    command.Parameters.Add("@PaymentType", SqlDbType.NVarChar, 20).Value = (object)report.PaymentType ?? DBNull.Value;
    command.Parameters.Add("@OrderId", SqlDbType.Int).Value = report.OrderId;
    command.Parameters.Add("@TimeSpend", SqlDbType.Int).Value = (object)report.TimeSpend ?? DBNull.Value;
    command.Parameters.Add("@Cost", SqlDbType.Int).Value = (object)report.Cost ?? DBNull.Value;
    command.Parameters.Add("@IssueTime", SqlDbType.DateTime).Value = report.IssueTime;
    command.Parameters.Add("@Deposit", SqlDbType.Int).Value = report.Deposit ?? (object)report.Cost ?? DBNull.Value;
    command.Parameters.Add("@ToDate", SqlDbType.DateTime).Value = (object)report.ToDate ?? DBNull.Value;
    command.Parameters.Add("@IsAccepted", SqlDbType.Bit).Value = report.IsInner;

    if (report.UsedDetails.Count > 0)
    {
        using var dt = new DataTable();
        dt.Columns.Add("Number", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Count", typeof(int));
        dt.Columns.Add("Price", typeof(int));
        var number = 0;
        foreach (var detail in report.UsedDetails)
            dt.Rows.Add(number++, detail.Name, detail.Count, detail.Price);
        command.Parameters.Add(new SqlParameter("@Details", SqlDbType.Structured)
        {
            TypeName = "dbo.DetailsType",
            Value = dt
        });
    }

    //When this is true
    if (report.Photos.Count > 0) 
    {
        using var dt = new DataTable();
        dt.Columns.Add("Number", typeof(int));
        dt.Columns.Add("Type", typeof(string));
        dt.Columns.Add("Code", typeof(byte[]));
        foreach (var photo in report.Photos)
            dt.Rows.Add(photo.Number, photo.Type, photo.Code);
        command.Parameters.Add(new SqlParameter("@Photos", SqlDbType.Structured)
        {
            TypeName = "dbo.PhotosType",
            Value = dt
        });
    }
    //Causes error
    await ExecuteForMaxAttemptsAsync(command, ct, onSuccess).ConfigureAwait(false);
}

protected async Task ExecuteForMaxAttemptsAsync(SqlCommand command, CancellationToken ct = default, Func<Task> callback = null)
{
    if (command == null)
        throw new ArgumentNullException(nameof(command), "Комманда не может быть неопределенной");
    byte attempts = 1;
    while (true)
        try
        {
            using var sc = new SqlConnection(Config.ConnectionString);
            await sc.OpenAsync(ct).ConfigureAwait(false);
            using var transaction = sc.BeginTransaction();
            command.CommandTimeout = Config.CommandTimeout;
            command.Connection = sc;
            command.Transaction = transaction;
             //Exception is thrown here
            await command.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
            if (callback != null)
                await callback().ConfigureAwait(false);
            transaction.Commit();
            break;
        }
        catch (TaskCanceledException)
        {
            throw;
        }
        catch (Exception ex)
        {
            if (attempts >= Config.MaxAttempts)
                throw ex;
            Logger?.Invoke($"SQL Ошибка выполнения транзакции: {ex.GetType().Name} {ex.Message}. Попытка {attempts}");
            attempts++;
            await Task.Delay(Config.RetryInterval).ConfigureAwait(false);
        }
}

@Trontin
Copy link

Trontin commented Nov 13, 2019

Same error only on Linux.
Error occurs on Debian 10 with .NET Core 2.2,
StackTrace is like in first example of @JulianRooze and pretty much the same case, no concurrency on the connection.
Workaround by activate MARS is working

@cheenamalhotra cheenamalhotra modified the milestones: 1.1.0, 1.2.0 Nov 20, 2019
@mathobbs
Copy link

Thank you @JulianRooze for the great write up. We also see this intermittently on services running in linux docker containers (dotnetcore 2.2.3 on alpine 3.9). Enabling MARS on the connection as a workaround also worked.

The SQLConnection's StateChange event is called unexpectedly indicating a transition from Open to Closed, and the same "connection is closed" exception is thrown to the code executing the sql call. We have logic verifying that a single SQLConnection is not being used concurrently.

With a retry added in the code, before enabling the workaround we would sometimes see multiple retries fail with the same, on other occasions one retry worked.

Currently we have only seen this on calls that have table-valued parameters (it may be more likely for larger parameter payloads). Sometimes (but not always) on a fail there is a sql-server logged error message which is usually:
"The data for table-valued parameter "@TheUserDefinedTypeName" doesn't conform to the table type of the parameter" (error code 8061)
but is sometimes this:
"The incoming tabular data stream (TDS) protocol stream is incorrect. The stream ended unexpectedly." (error code 4002)

@rnarayana
Copy link

rnarayana commented Nov 30, 2019

We also found this only while running on Azure SQL ( most likely it's Linux underneath), while calling Stored Procs with a larger TVP, and while there are multiple calls happening, all connections from connection pool. This error did not occur if no other query was running. Only with parallel requests (web requests anyway), could this be consistently reproed.

P.S - Running on .net core 3.0 on alpine3.9

@cheenamalhotra cheenamalhotra self-assigned this Nov 30, 2019
@jakejscott
Copy link

We also found this issue while running on Azure SQL with Alpine using mcr.microsoft.com/dotnet/core/sdk:3.1-alpine3.10

I'm doing a bulk insert calling a stored procedure with a TVP that has 1000 rows in it. We are using a datatable + dapper to call the stored proc:

public static async Task MergeCustomerTokensAsync(this IDbConnection connection, List<CustomerTokenEntity> customerTokens, IDbTransaction transaction = null, CancellationToken cancellationToken = default)
{
    var dataTable = new DataTable();

    dataTable.Columns.Add("CustomerId", typeof(Guid));
    dataTable.Columns.Add("Token", typeof(string));
    dataTable.Columns.Add("UserAgent", typeof(string));
    dataTable.Columns.Add("Processed", typeof(DateTimeOffset));

    foreach (var item in customerTokens)
    {
        dataTable.Rows.Add(item.CustomerId, item.Token, item.UserAgent, item.Processed.HasValue ? (object)item.Processed.Value : DBNull.Value);
    }

    var param = new
    {
        CustomerTokens = dataTable.AsTableValuedParameter()
    };

    await connection.ExecuteAsync(new CommandDefinition("dbo.[BulkMergeCustomerTokenImportV1]", param, commandType: CommandType.StoredProcedure, cancellationToken: cancellationToken));
}

@arivaltaoja
Copy link

We're seeing this issue as well with Microsoft.Data.SqlClient versions 1.1.X - 2.X running on Linux containers without MARS.

This repros easily with a query that has a large number (thousands) of TVP values. The longer the round-trip time the easier this seems to occur. I created a forked repository https://github.com/arivaltaoja/SqlClient that contains a modified DockerLinuxTest that reproduces the issue almost 100% when executed against Azure SQL, especially if you're on a slow connection.

Based on quick debugging the issue seems to be that async void SNIPacket.WriteToStreamAsync and therefore the underlying stream's WriteAsync are called while the previous task is still being executed.

public async void WriteToStreamAsync(Stream stream, SNIAsyncCallback callback, SNIProviders provider)
{
    uint status = TdsEnums.SNI_SUCCESS;
    try
    {
        await stream.WriteAsync(_data, 0, _dataLength, CancellationToken.None).ConfigureAwait(false);
    }
    catch (Exception e)
    {
        SNILoadHandle.SingletonInstance.LastError = new SNIError(provider, SNICommon.InternalExceptionError, e);
        status = TdsEnums.SNI_ERROR;
    }
    callback(this, status);
}

This results in System.NotSupported exception being thrown from the underlying stream.

System.NotSupportedException:  The WriteAsync method cannot be called when another write operation is pending.
   at System.Net.Security.SslStream.WriteAsyncInternal[TWriteAdapter](TWriteAdapter writeAdapter, ReadOnlyMemory`1 buffer)
   at Microsoft.Data.S...

System.NotSupported is later on cough and rethrown as System.InvalidOperationException.

@cheenamalhotra
Copy link
Member

cc @DavoudEshtehari

@arivaltaoja
I think this is related to PR #579 in progress. Could you give it a try as well? Thanks!

@arivaltaoja
Copy link

I tested my repro case against PR #579 and it seems to fix this issue as well 👍

@asankaf
Copy link

asankaf commented Sep 7, 2020

We are also seeing the same error in our environment. We have a .NET CORE 3.1 background service hosted in AKS on Linux containers with the SqlClient nuget package Microsoft.Data.SqlClient (1.1.1). AKS Cluster is located in WEU and the Databases are located in WEU and EUS. What the background service does, is inserting around 2.5K rows into a table via a stored procedure. Rows to be inserted are passed into the SP as a TVP. However, something we noted is that we have not yet seen this error while the background service is connecting to the WEU database. But we see this error every time when inserting to the database located in EUS.

We are thinking of turning MARS in connection string as the mentioned workaround in here. But we are bit skeptical to do so since there are different set of issues when MARS is ON in a Linux environment as mentioned in #422.

Any idea when will this be fixed?

@DenSmoke
Copy link

DenSmoke commented Sep 7, 2020

@asankaf Why don't you use latest version (Microsoft.Data.SqlClient 2.0.1)?

@asankaf
Copy link

asankaf commented Sep 7, 2020

@DenSmoke We can of course upgrade the nuget package. But as per one of the above comments from @arivaltaoja, upgrading the nuget package doesn't necessarily fix the issue.

@alexmg
Copy link

alexmg commented Sep 9, 2020

Is there any progress to report on the unblocking this issue? I have enabled MARS as a workaround but share the same concern as @asankaf about the performance issues reported in #422.

@altso
Copy link

altso commented Oct 30, 2020

I had the same (or similar) issue with Microsoft.Data.SqlClient 1.1.3 (came from Microsoft.EntityFrameworkCore.SqlServer 3.1.9) in linux containers. Resolved by explicitly referencing Microsoft.Data.SqlClient 2.0.1

@simader
Copy link

simader commented Nov 12, 2020

Maybe I have a similar Issue #792 , but Upgrade to Microsoft.Data.SqlClient 2.0.1 did not help. (also not any other preview)
Only turn on MARS helped. Can anyone finish the PullRequest?

@alexmg
Copy link

alexmg commented Nov 19, 2020

@cheenamalhotra Will there be a preview package released with this fix in the near term? Do you have a MyGet feed or similar for nightly builds?

@cheenamalhotra
Copy link
Member

Hi @alexmg

You may access nightly build feed here: https://dev.azure.com/sqlclientdrivers-ci/sqlclient/_packaging?_a=feed&feed=Microsoft.Data.SqlClient.dev

We'll be releasing this fix soon! Thanks!

MGibson1 added a commit to bitwarden/server that referenced this issue Jun 25, 2021
MGibson1 added a commit to bitwarden/server that referenced this issue Jun 25, 2021
* Creat TVP prior to opening sql connection

Data Table creation is slow. connection may be timing out while we create it.

* USe MARS-enabled connections to fix connection issue

dotnet/SqlClient#54
MGibson1 added a commit to bitwarden/server that referenced this issue Jun 25, 2021
* Creat TVP prior to opening sql connection

Data Table creation is slow. connection may be timing out while we create it.

* USe MARS-enabled connections to fix connection issue

dotnet/SqlClient#54
(cherry picked from commit 9841502)
dlundgren pushed a commit to dlundgren/server that referenced this issue Dec 14, 2021
* Creat TVP prior to opening sql connection

Data Table creation is slow. connection may be timing out while we create it.

* USe MARS-enabled connections to fix connection issue

dotnet/SqlClient#54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area\Managed SNI Issues that are targeted to the Managed SNI codebase.
Projects
None yet