The code appears to build fine. The certificates are all loaded. But I cannot connect. There has to be a step missing. Do we need to create client certificate on the cluster? When I try to open the service fabric explorer I get a not authorized message. Not sure why? #23408

Closed
tshinkle opened this issue Jan 27, 2019 · 40 comments

Comments

@tshinkle

tshinkle commented Jan 27, 2019

The code appears to build fine, and the certificates are all loaded, but I cannot connect. There has to be a step missing. Do we need to create a client certificate on the cluster? When I try to open Service Fabric Explorer I get a "not authorized" message, and I'm not sure why.



@Mike-Ubezzi-MSFT
Contributor

@tshinkle Thank you for the detailed feedback. We are actively investigating but we need to understand the documentation that you are attempting to follow in order to provide a comprehensive response and address any documentation gaps.

@tshinkle
Author

tshinkle commented Jan 27, 2019 via email

@mimckitt
Contributor

@tshinkle when you say you have errors in the cluster, what do you mean? Can you provide a screenshot?

@tshinkle
Author

tshinkle commented Jan 28, 2019 via email

@tshinkle
Author

tshinkle commented Jan 28, 2019 via email

@mimckitt
Contributor

@tshinkle Sorry, but if you reply directly to the email, the images don't show up. You need to log in on a PC to upload them.

@tshinkle
Author

image

@tshinkle
Author

image 1

@mimckitt
Contributor

@tshinkle thanks for that. As the screenshots show, your cluster itself appears to be unhealthy, so we should fix that before anything else.

I assume you are first trying to deploy this to a local Service Fabric cluster, correct? If so, can you remove your current cluster and then open the local Service Fabric Cluster manager?

Try building a new cluster without deploying any code.

image

Try setting up just a single node cluster and let SF manager create that cluster. Once it is created, ensure all is healthy. Once that is confirmed, try deploying the app from Visual Studio and see if it works. I ran through the doc and was able to get it all to deploy correctly without any changes.

@tshinkle
Author

tshinkle commented Jan 28, 2019 via email

@tshinkle
Author

tshinkle commented Jan 28, 2019 via email

@mimckitt
Contributor

@tshinkle sorry, I re-read the thread and see it's an Azure cluster, not a local one.

I am investigating and will update shortly.

@tshinkle
Author

tshinkle commented Jan 28, 2019 via email

@tshinkle
Author

tshinkle commented Jan 28, 2019 via email

@mimckitt
Contributor

@tshinkle thanks for the info. I am going to go through all parts of the tutorial again to make sure all the needed info is there. I will update once I have completed it.

@mimckitt
Contributor

@tshinkle I ran through the steps and steps 1 and 2 worked without issues on both my local cluster and my Azure Cluster.

At step 3, enabling HTTPS, I hit issues on both my local cluster and my Azure cluster, just as you did.

image

It seems that after making the changes to enable HTTPS, the application itself is unhealthy.

@rwike77 @aljo-microsoft would either of you be able to provide some insight on this? Or possibly go through the doc to confirm as well? I tried it a few times and the results are consistent.

Parts 1 and 2 work; however, it looks like the Enable HTTPS part doesn't. I am using Visual Studio 2019 Preview. The application builds and deploys to the local cluster, but then I get this error and am unable to browse the application:
"The target process exited without raising a CoreCLR started event. Ensure that the target process is configured to use .Net Core. This may be expected if the target process did not run on .net core."

@rwike77
Contributor

rwike77 commented Feb 1, 2019

@tshinkle @MicahMcKittrick-MSFT @aljo-microsoft I recently refreshed this article. At one point, I remember seeing the "There was an error during CodePackage activation.The service host terminated with exit code:1" error in SFX as well. In the https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-tutorial-dotnet-app-enable-https-endpoint#configure-kestrel-to-use-https step, did you replace the "<your_CN_value>" value with "mytestcert" (or your test cert subject) in the GetCertificateFromStore method?

@mimckitt
Contributor

mimckitt commented Feb 1, 2019

Thanks @rwike77 I did replace the value both times I ran through it. I even kept the naming convention the same using "mytestcert"

Although, looking back at it, I am wondering if I just put "mytestcert" instead of "CN=mytestcert"

Is it expected to keep the CN= part? If so that might be the issue and I can rerun through the doc.
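To make the question concrete, here is a toy model in Python of the two lookup styles. It is only a sketch, and it rests on an assumption not confirmed in this thread: that the tutorial's GetCertificateFromStore searches by the full subject distinguished name (in .NET terms, the FindBySubjectDistinguishedName style) rather than by a loose subject-name match. The function names and the sample store contents below are hypothetical.

```python
def find_by_subject_name(subjects, value):
    """Loose lookup: case-insensitive substring match on the subject."""
    needle = value.lower()
    return [s for s in subjects if needle in s.lower()]


def find_by_distinguished_name(subjects, value):
    """Strict lookup: the whole distinguished name must match."""
    needle = value.lower()
    return [s for s in subjects if s.lower() == needle]


# Hypothetical store contents, for illustration only.
store = ["CN=mytestcert", "CN=localhost"]

loose_bare = find_by_subject_name(store, "mytestcert")           # finds the cert
strict_bare = find_by_distinguished_name(store, "mytestcert")    # finds nothing
strict_full = find_by_distinguished_name(store, "CN=mytestcert") # finds the cert
```

If that assumption holds, passing the bare "mytestcert" to a strict lookup would return no certificate, which would fit a service host that exits during activation.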

I revisited the code again, but this part still doesn't work. I get the same error: "The target process exited without raising a CoreCLR started event. Ensure that the target process is configured to use .Net Core. This may be expected if the target process did not run on .net core." And https://localhost shows the same blank "This Site Cannot be reached" screen.

@aljo-microsoft
Contributor

@tshinkle @MicahMcKittrick-MSFT @its-saurabhjain

The idea of a local secured cluster doesn't make sense; what is your use case for this?

@dragav
To provide additional context.

@tshinkle
Author

tshinkle commented Feb 4, 2019 via email

@dragav
Contributor

dragav commented Feb 4, 2019

@MicahMcKittrick-MSFT if you have a local repro, can you send me the logs please? dragosav @ msft.

On the surface, this does appear to be caused by missing permissions to the cert's private key. Please check in the cert mgmt UX that the private key is available and acl'd accordingly. (For context, self-signed certs generated with PowerShell are by default CNG certs, unless a 'legacy' crypto provider is specified explicitly. Our runtime ACLing code doesn't handle CNG certs.)

The failure to connect to the cluster could be explained by the client (browser) rejecting the server's untrusted cert. This check can be bypassed in Chrome.

@mimckitt
Contributor

mimckitt commented Feb 4, 2019

@tshinkle I just spun this up on my local work PC, so I don't have the app deployed at the moment. I am happy to deploy it again and send you any logs needed. That being said, I followed the steps in the doc exactly as written, so you should be able to run through them and get the same results we are all seeing.

I also checked that the cert has the right permissions, and all appears well. I am also using Chrome to connect to the local cluster.

@tshinkle
Author

tshinkle commented Feb 5, 2019

I'm at a complete loss. I've been through this code several times, but the error occurs in the application on both localhost and Azure when I publish.

On localhost I get the following event error:
Error event: SourceId='System.FM', Property='State'.
Partition is below target replica or instance count.
fabric:/Voting/VotingWeb -1 1 12dbcf59-c2df-4a7c-bacf-1a7e6f7ac629
(Showing 0 out of 0 instances. Total available instances: 0)

In the Azure cluster I get the following event error:
Error event: SourceId='System.FM', Property='State'.
Partition is below target replica or instance count.
fabric:/Voting/VotingWeb -1 1 12dbcf59-c2df-4a7c-bacf-1a7e6f7ac629
(Showing 0 out of 0 instances. Total available instances: 0)

And in a node I get the following:
Error event: SourceId='System.FM', Property='State'.
Partition is below target replica or instance count.
fabric:/Voting/VotingWeb -1 1 fb5a3a2c-abe4-42d2-b581-72d19821e1d2
InBuild _66xvwyvto_1 131938067770606908
InBuild _66xvwyvto_0 131938067953445492
(Showing 2 out of 2 instances. Total available instances: 0)

There are no errors happening in the build of the code.

@dragav
Contributor

dragav commented Feb 5, 2019

@tshinkle may I ask you to zip the traces in the SFLogs\traces directory, and share them with me? I'm dragosav at microsoft dot com. The exact path to the logs directory is listed in the cluster manifest. Thank you.

@mimckitt
Contributor

mimckitt commented Feb 5, 2019

@dragav if you like I can also publish my code to an Azure Cluster and give you the subscription and cluster information offline if that helps.

@mimckitt
Contributor

mimckitt commented Feb 8, 2019

Just FYI, I have given access to @dragav to my Azure Cluster seeing this error. Hopefully we can find some problems and get the doc updated.

I also followed the same steps mentioned above and am getting the same 'Site can't be reached' error.

@mimckitt
Contributor

@dragav any luck on this? It seems we have multiple users who hit the same issue with this doc.

vipwlb commented Feb 26, 2019

The Enable HTTPS part doesn't work, just as @MicahMcKittrick-MSFT found. Any update on this, please?

@dragav
Contributor

dragav commented Mar 1, 2019

We narrowed it down to a failure to either find or ACL the certificate. I highly recommend using the SF CertSetup.ps1 script to generate a certificate (in dev/test environments), as that is proven to work. I suspect whoever is hitting this may be generating CNG self-signed certs, whose private key is not accessible directly in PowerShell (and so the ACLing code fails). We're working on this.

@aljo-microsoft
Contributor

@MicahMcKittrick-MSFT
#please-close

@dapathy

dapathy commented Apr 11, 2019

Was this issue resolved?

@aljo-microsoft
Contributor

@dapathy
https://docs.microsoft.com/azure/service-fabric/service-fabric-best-practices-security

Please adopt documented best practices to mitigate issues like this.

I ran into the same problems as others, and here are a few hints that might help.

Firstly: If you added the .bat file as the tutorial says ("right-click VotingWeb and select Add->New Item and add a new file named 'Setup.bat'"), Visual Studio might encode it incorrectly. In my case there were unwanted symbols at the beginning of the file (I might have done it incorrectly). Check by running the .bat file directly.

Secondly: When I ran SetCertAccess.ps1 from PowerShell, I got an error on lines 34-37 regarding "$accessRule". My Windows is in German, so on line two I needed to change the user group from "NETWORK SERVICE" to "Netzwerkdienst":

$userGroup="Netzwerkdienst"

Thirdly: Also related to the Windows language. The system group in ApplicationManifest.xml needs to reflect your Windows language; in my case I changed it from "Administrator" to "Administratoren".

That did it for me.
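On the first hint: unwanted symbols at the start of a freshly added file are often a UTF-8 byte order mark (BOM), though that is only a guess, since the exact bytes were not posted here. Below is a minimal Python sketch that checks for a leading UTF-8 BOM and removes it. The helper name strip_utf8_bom and the example path are hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 byte order mark from the file, if present.

    Returns True when a BOM was found and removed, False otherwise.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the three BOM bytes; the rest is untouched.
    file_path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Example call (hypothetical path; adjust to your project layout):
# strip_utf8_bom("VotingWeb/Setup.bat")
```

If the function returns False, the file did not start with a UTF-8 BOM and the stray symbols have some other cause.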
