Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to Microsoft.Data.SqlClient 2.0.0 breaks a self-contained .NET Core 3.1 ServiceFabric service #644

Closed
dmytro-gokun opened this issue Jul 11, 2020 · 32 comments

Comments

@dmytro-gokun
Copy link

dmytro-gokun commented Jul 11, 2020

We have a ServiceFabric service, which is running the latest .NET Core 3.1. We publish & deploy it as self-contained. It references Microsoft.Data.SqlClient. Using version 1.1.3 everything works perfectly. After updating to 2.0.0, we can not event deploy it to the fabric anymore. Unfortunataly, ServiceFabric gives very little information why. This is what we see:

Service package for manifest 'XXXX' and service package activation ID 'XXXX' is in Error.
		'System.Hosting' reported Error for property 'ServiceTypeRegistration:XXX'.
		The ServiceType was disabled on the node.

It tries that several times and then gives up.
By digging into SF events i was able to find this as well:

There was an error during CodePackage activation.The process/container terminated with exit code:2147516556.

By comparing service package i can see that it used to include sni.dll when using 1.1.3 and then its gone after we updated to 2.0.0. My suspicion is that is the reason why the service fails to start.

And i think is my theory is correct because the service works just fine locally, since sni.dll is availaible on my PC. But it does not exist on the ServiceFabric's VMs.

@cheenamalhotra
Copy link
Member

@dmytro-gokun

Please look at 2.0.0 release notes for breaking changes in SNI naming (introduced in #570). For .NET Core, with 2.0.0 we've introduced our own replacement for runtime.native.System.Data.SqlClient.sni NuGet as well that renames sni.dll to Microsoft.Data.SqlClient.SNI.dll.

If you can provide us a repro, we can investigate more in what your observations are.

@dmytro-gokun
Copy link
Author

@cheenamalhotra Yes, I've read that. And the deployment package contains Microsoft.Data.SqlClient.SNI.dll, so no problem here. The problem is that it somehow still needs sni.dll to load.

About a repro. As I said, it does not reproduce with a Service Fabric Local Cluster on a typical development machine. I can make a repro that you will have to deploy to a real cluster and then you will see it works just fine with 1.1.3 and does not even start with 2.0.0. Will this work for you?

@cheenamalhotra
Copy link
Member

The problem is that it somehow still needs sni.dll to load.

The SqlClient driver does not need the old sni.dll anymore. Can you confirm which component needs the old 'sni.dll' by capturing more logs from your cluster? The above errors don't provide any context.

I can make a repro that you will have to deploy to a real cluster and then you will see it works just fine with 1.1.3 and does not even start with 2.0.0. Will this work for you?

I don't have that environment to deploy but if you can gather logs we can investigate the errors.

@dmytro-gokun
Copy link
Author

The SqlClient driver does not need the old sni.dll anymore.

Do you perhaps know, in which situations it was loaded before? That will help me build a repro.

Can you confirm which component needs the old 'sni.dll' by capturing more logs from your cluster? The above errors don't provide any context.

Well, that's the problem with ServiceFabric. It does not give you any more info than that. I've spent several hours googling and trying different tricks, but that's all I've got :( Any chance, you know anyone who can help with getting more info from SF?

@cheenamalhotra
Copy link
Member

Do you perhaps know, in which situations it was loaded before? That will help me build a repro.

It's a NuGet package reference, and was loaded from runtime.native.System.Data.SqlClient.sni same way as currently Microsoft.Data.SqlClient.SNI.runtime NuGet is loaded. It's a package reference that contains native runtime DLLs. The old package referenced another package implicitly based on arch, but now we added arch specific runtime folders in the same NuGet. But again. the driver doesn't need old sni.dll anymore, so I don't think it will stop you from working if it's absent from your environment.

Have you tried to remove sni.dll from your local dev environment and confirm if you see same error?

If it's related, it should fail locally too, but if not - this error is probably not related to driver and it's dependencies. I also see similar issues on service fabric (same error code): microsoft/service-fabric#868

You can find guide to debug issues here: Debug service errors using the log files, or else contact Service Fabric teams.

@cheenamalhotra cheenamalhotra changed the title Update to Micrisoft.Data.SqlClient 2.0.0 breaks a self-contained .NET Core 3.1 ServiceFabric service Update to Microsoft.Data.SqlClient 2.0.0 breaks a self-contained .NET Core 3.1 ServiceFabric service Jul 15, 2020
@cheenamalhotra cheenamalhotra added this to Needs triage in SqlClient Triage Board via automation Jul 15, 2020
@cheenamalhotra cheenamalhotra moved this from Needs triage to Waiting for customer in SqlClient Triage Board Jul 15, 2020
@dmytro-gokun
Copy link
Author

So, after further investigation, I was able to find the root cause. It has nothing to do with sni.dll.
The root cause is this exception:

Error:
  An assembly specified in the application dependencies manifest (XXXX.deps.json) was not found:
    package: 'Microsoft.Data.SqlClient.SNI.runtime', version: '2.0.0'
    path: 'runtimes/win-x64/native/Microsoft.Data.SqlClient.SNI.pdb'

This exception happens during the deployment and before our code even has a chance to run.

Now:

  1. Why a .pdb is even required for the app to load? Are not pdbs only needed for debugging/stack trace formatting etc? To me this looks really weird, maybe I do not understand something about pdbs here;

  2. The .pdb is not here because that's how Azure DevOps deploys services by default: without pdbs. This is described in detail here: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-tutorial-deploy-app-with-cicd-vsts#configure-continuous-delivery-with-azure-pipelines. Particularly, on this screenshot: https://docs.microsoft.com/en-us/azure/service-fabric/media/service-fabric-tutorial-deploy-app-with-cicd-vsts/saveandqueue.png you can (partially) see the step which copies *.pdb to a separate folder (which is later zipped to an artifact) and then the next step which deletes original *.pdb from the final deployment package. This explains why pdbs are not copied to Service Fabric and why we get the exception we get.

  3. After we removed steps which remove pdbs from the deployment package, it deploys and works just fine.

To me it looks like the correct fix to this issue should be not to require the pdb presence after all. If, for some reason, it is absolutely required, this should be thoroughfuly documented somewhere. Noone should kill so much time on this again.

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 15, 2020

@dmytro-gokun This looks very Service Fabric specific - wonder why the Service Fabric team does not want to deploy .pdb files? Maybe the Azure DevOps template should be updated?

@dmytro-gokun
Copy link
Author

@ErikEJ well, I do not know why. I can only guess that they want to preserve disk space. If you are using microservices architecture and run hundreds of services, all those pdbs (which are identical for each service) may eat all your limited disk space very fast.

Nevertheless, deploying a .pdb alongside your app or not deploying it should be the developer's choice and not a requirement for the app to run at all.

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 15, 2020

Agree, wonder if the pdb was included in the previous package? @cheenamalhotra ??

@dmytro-gokun
Copy link
Author

@ErikEJ It obviously was not, as the same app/DevOps pipeline worked just fine with 1.1.3

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Jul 15, 2020

Hi @dmytro-gokun

The pdbs were included starting with v2.0.0 since the DLLs are closed source, as tools like Binskim fail to work with driver otherwise as reported in #598

I would expect these errors to be handled by applications or application frameworks and there should be a solution to this.

But products cannot NOT ship PDBs due to such issues, it looks like a generic issue to me that should be reported to Service Fabrics and documented by their teams how to handle PDB related issues their tools may encounter.

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 15, 2020

This issue is called by this step in the Azure DevOps build pipeline:

steps:
- task: DeleteFiles@1
  displayName: 'Delete files from $(build.artifactstagingdirectory)\applicationpackage'
  inputs:
    SourceFolder: '$(build.artifactstagingdirectory)\applicationpackage'
    Contents: '**\*.pdb'

@dmytro-gokun
Copy link
Author

@cheenamalhotra In my humble opinion, Azure DevOps does the right thing: they treat debug info as debug info. By requiring a pdb to be present for an app to start, you are essentially changing the semantic of what pdb is. IMO.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Jul 15, 2020

By requiring a pdb to be present for an app to start, you are essentially changing the semantic of what pdb is. IMO.

@dmytro-gokun I'm not sure I follow.. We don't require PDBs to be present, there's no dependency! They're copied with DLL when you build/deploy. You should be able to work without them.

What seems to be going on is these PDB references are copied to your deps.json file but when you remove these PDBs the deps.json fails to find them when dotnet reads the file. So you need to probably remove them before deps.json is created or fix your deps.json essentially if you absolutely need to remove them.

@dmytro-gokun
Copy link
Author

dmytro-gokun commented Jul 15, 2020

@cheenamalhotra deps.json is not created manually or by Service Fabric. It is created by dotnet build and i presume it uses dependency information from nuget packages. Here is the problematic piece:

      "Microsoft.Data.SqlClient.SNI.runtime/2.0.0": {
        "native": {
          "runtimes/win-x64/native/Microsoft.Data.SqlClient.SNI.dll": {
            "fileVersion": "2.0.0.0"
          },
          "runtimes/win-x64/native/Microsoft.Data.SqlClient.SNI.pdb": {
            "fileVersion": "0.0.0.0"
          }
        }
      },

And that's why Service Fabric looks for this pdb while deploying the service. To be honest, i'm not that familiar with nuget to tell if it's possible to mark *.pdb dependency as optional. But that's what would look like a correct solution to me.

@cheenamalhotra
Copy link
Member

I would then recommend this gets reported to dotnet/sdk team that they should exclude copying PDBs for "deploy" targets :)

We don't control anything in .NET Core. We only ship PDBs along with DLLs and rest is taken care by NuGet restore and dotnet sdk.

For .NET Framework we control COPY targets and also exclude PDBs for deploy tasks, but we always shipped PDBs there and there have been no such concerns reported.

@dmytro-gokun
Copy link
Author

@cheenamalhotra

Okay, but:

  1. This does not happen to any other nuget package we reference in any other project we have (quite a lot of packages, quite a lot of projects);
  2. This did not happen in version 1.1.3

The only conclusion I can draw from this is that there is some problem with this particular nuget package. Do not you agree?

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 15, 2020

Maybe have a look at any differences between the two SNI packages

@cheenamalhotra
Copy link
Member

Can you give me name of any other native runtime NuGet packages that contain PDBs that you reference?
I can compare and see.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Jul 15, 2020

For 1.1.3 it didn't happen because we referenced "runtime.native.System.Data.SqlClient.SNI" NuGet that did not contain PDBs. Reason why there were complaints regarding missing PDBs.

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 15, 2020

LOL - now there are complaints that they are there 😄

@dmytro-gokun
Copy link
Author

@cheenamalhotra Unfortunately, I cannot give you such an example. But it also obvious to me that the way it is done today is causing problems. I've spent tons of time getting to the root of this issue. I'm 100% sure hundreds of people are doing the same these days and will do in the future if it's not fixed somehow. This is not the way it should be.

If you ask me, I would say it makes sense for you to speak to the core team (or nuget team?) and ask them how to do this properly, so that .pdb is not included to .deps.json. If no such way exists at the moment, a feature request has to be made to the proper team.

Also, if you think about it ... why should the pdb go to the package itself? That's what MS symbol server is for, is not it?

@cheenamalhotra
Copy link
Member

I think since we also shipped the PDBs to Microsoft.Data.SqlClient.SNI Nuget (SNI for .NET Framework), we packaged it here too for consistency, but I agree the experience hasn't been great with .NET Core. I'll take it offline and see if we can take them away and only upload them on symbol server.

@cheenamalhotra
Copy link
Member

Hi @dmytro-gokun

Just a heads up, we're setup and planning to fix this issue with next preview release of Microsoft.Data.SqlClient v2.1.0-preview2, will keep you posted.

@dmytro-gokun
Copy link
Author

@cheenamalhotra Thanks a lot. Looking forward for it.

@zickvon
Copy link

zickvon commented Oct 10, 2020

Run into the same problem here. Our single exe .net core app crashed after upgrading Sqlclient from 1.1.0 to 2.0.1. Took me whole night to realize it's the pdb missing, then I found this discussion.

Looking forward for for the next release/fix!

@WTobor
Copy link

WTobor commented Oct 15, 2020

I had a similar problem. I was using EFCore.BulkExtensions v3.2.4 (which is using SqlClient v2). Downgrade to EFCore.BulkExtensions v3.1.4 (with SqlClient v1) solved the issue.

@cheenamalhotra
Copy link
Member

Hi @dmytro-gokun

Closing this issue as pdbs have been removed from Microsoft.Data.SqlClient.SNI.runtime v2.1.1 to be picked by Microsoft.Data.SqlClient v2.1.0-preview2 onwards.

SqlClient Triage Board automation moved this from Waiting for customer to Closed Oct 26, 2020
@jjxtra
Copy link

jjxtra commented Nov 11, 2020

This issue happened when I switched from .NET 5 rc2 to RTM today and upgraded many of my nugets to 5.0.0 versions. I don't use service fabric at all. Has there been a regression somewhere?

    <PackageReference Include="FlexLabs.EntityFrameworkCore.Upsert" Version="3.1.0" />
    <PackageReference Include="Microsoft.AspNetCore.Mvc.Razor.RuntimeCompilation" Version="5.0.0" />
    <PackageReference Include="Humanizer" Version="2.8.26" />
    <PackageReference Include="MailKit" Version="2.9.0" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.InMemory" Version="5.0.0" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Relational" Version="5.0.0" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Sqlite.Core" Version="5.0.0" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="5.0.0" />
    <PackageReference Include="Microsoft.Extensions.Caching.Memory" Version="5.0.0" />
    <PackageReference Include="Microsoft.Extensions.Hosting" Version="5.0.0" />
    <PackageReference Include="Microsoft.Extensions.Http.Polly" Version="5.0.0" />
    <PackageReference Include="Microsoft.Web.Administration" Version="11.1.0" />
    <PackageReference Include="NLog" Version="4.7.5" />
    <PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="5.0.0-rc2" />
    <PackageReference Include="Pomelo.EntityFrameworkCore.MySql" Version="5.0.0-alpha.1" />
    <PackageReference Include="protobuf-net.Grpc.AspNetCore" Version="1.0.123" />
    <PackageReference Include="SharpCompress" Version="0.26.0" />
    <PackageReference Include="StackExchange.Redis" Version="2.1.58" />
    <PackageReference Include="System.Net.WebSockets" Version="4.3.0" />
    <PackageReference Include="System.Net.WebSockets.WebSocketProtocol" Version="5.0.0" />

@jjxtra
Copy link

jjxtra commented Nov 11, 2020

Rolling the entity framework nuget packages back to 3.1.8 fixed the problem on the .NET 5 rtm release for me.

@0xced
Copy link
Contributor

0xced commented Nov 30, 2020

I had this issue too with a .NET Core app using Entity Framework Core 5.0.0 packaged as a single exe. I solved it by explicitly taking a dependency on Microsoft.Data.SqlClient version 2.1.0 (which depends on Microsoft.Data.SqlClient.SNI.runtime >= 2.1.1):

<PackageReference Include="Microsoft.Data.SqlClient" Version="2.1.0" />
<PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="5.0.0" />

@jjxtra
Copy link

jjxtra commented Nov 30, 2020

I did the same thing and it worked as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

7 participants