This repository has been archived by the owner on Jan 16, 2021. It is now read-only.

Service Fabric External and Internal Load Balancer Standard #879

Closed
MarekLani opened this issue Feb 25, 2018 · 10 comments

@MarekLani

Hello,
I would like to ask whether it is possible to use the Standard tier Load Balancer with Service Fabric. I need to create an SF cluster that has both internal and external load balancers. However, as the Azure Portal itself notified me, assigning two LBs to a VMSS requires the Standard SKU. When I change my template to use LB Standard, I am not able to connect to my SF cluster (neither to the management port nor to the Explorer port), even though the deployment succeeded with no errors and I can see the LB deployed with its backend pool created. Thank you.

@masnider
Member

Service Fabric doesn't care about this at all, so something else must be wrong.

@mburumaxwell

@MarekLani you might find help in the link below
https://blogs.msdn.microsoft.com/kwill/2016/10/05/azure-service-fabric-common-networking-scenarios/#IELB, at least it helped me.

@MarekLani
Author

MarekLani commented Feb 27, 2018

@mburumaxwell thank you. We managed to add both; however, apparently a new requirement was introduced to use LB Standard when you want to assign both an internal and an external LB to one VMSS.

@masnider I have a working ARM template in which I have configured only an internal LB Basic. After deployment everything works fine. However, when I change only the LB SKU to Standard and redeploy, SF writes the following errors to the server event log:
ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 52.174.163.204:443
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
--- End of inner exception stack trace ---
at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
at System.Net.HttpWebRequest.GetRequestStream()
at Microsoft.Azure.ServiceFabric.Extension.Core.RestClient.Invoke(Uri requestUri, String method, String requestBody, X509Certificate2 clientCertificate)
at Microsoft.Azure.ServiceFabric.Extension.Core.RestClient.Invoke(Uri requestUri, String method, String requestBody, List`1 clientCertificates)

at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.d__91.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.d__84.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceFabric.Extension.Core.NodeBootstrapAgent.d__80.MoveNext()

And then, when Service Fabric gives up retrying, it logs this warning from ServiceFabricNodeBootstrapModule: "Installer is either a not supported installer format or the product did not exist on this machine." It seems SF is not able to configure itself when deployed with LB Standard for some reason (this was tested with an internal LB and, as stated, the deployment works with LB Basic). Can you please confirm whether an internal LB in the Standard tier is a working, tested scenario? Thank you.
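
The SocketException above is a TCP connect timeout to 52.174.163.204:443. A minimal sketch for reproducing that check from a node, assuming Python is available there (the helper name `can_connect` is made up for illustration and is not part of any Service Fabric tooling):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts and refused
        # connections) when the endpoint cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the address from the log above:
# can_connect("52.174.163.204", 443)
```

If this returns False on a node behind LB Standard but True behind LB Basic, that would be consistent with outbound traffic being blocked rather than with a Service Fabric fault, though it does not prove the cause on its own.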

@Xandven

Xandven commented May 29, 2018

We're experiencing the same issue. Once the Load Balancer SKU is changed to Standard, the Service Fabric cluster gets stuck on "Waiting for nodes". I've run ipconfig on nodes in both a working and a non-working cluster and noticed that the nodes running behind the Standard SKU LB have one extra network adapter. It is called vEthernet (nat) on all the nodes. On the first node the IPv4 address associated with this adapter is 172.20.128.1 with mask 255.255.240.0. I'm wondering if this could be part of the problem.
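
For reference, the address and mask reported above can be expanded with Python's standard `ipaddress` module. This is only a sketch that restates the numbers; it does not establish whether the extra adapter is related to the failure.

```python
import ipaddress

# Address and mask as reported by ipconfig in the comment above.
iface = ipaddress.ip_interface("172.20.128.1/255.255.240.0")
network = iface.network

print(network)                  # 172.20.128.0/20
print(network[0], network[-1])  # 172.20.128.0 172.20.143.255
print(network.num_addresses)    # 4096
print(iface.ip.is_private)      # True
```

So the adapter sits on a private /20 range, which means the address itself is not publicly routable.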

@MarekLani
Author

Hi Xandven, I do not see that deep into networking, so I can't confirm this behavior is caused by LB Standard, but it makes sense that it is. However, I managed to make it work. The big difference when using LB Standard is that (from the documentation) the LB is fully onboarded to the virtual network. The virtual network is a private, closed network. Because Standard Load Balancers and Standard public IP addresses are designed to allow this virtual network to be accessed from outside of the virtual network, these resources now default to closed unless you open them. This means Network Security Groups (NSGs) are now used to explicitly permit and whitelist allowed traffic. You can create your entire virtual data center and decide through NSG what and when it should be available. If you do not have an NSG on a subnet or NIC of your virtual machine resource, we will not permit traffic to reach this resource.

That means that you have to explicitly whitelist outbound/inbound communication. You can achieve that in two ways: either create a Network Security Group with the inbound and outbound rules and set up a public Load Balancer with the correct LB rules, or assign a public IP address to each node in the cluster. This way you allow communication for the SF agent, which needs to download some bits from the internet. I am currently in discussion with the SF product group to specify the exact IP addresses that need to be whitelisted. I have a working scenario in which I have whitelisted all IP addresses in the NSG rules and created a load balancing rule on port 443. I will write a blog post about this and share the templates.

@Xandven

Xandven commented May 30, 2018

Hi MarekLani, thank you for sharing your progress. Would you be able to share the list of IP addresses that must be whitelisted in the NSG?

@MarekLani
Author

For now I have whitelisted the entire IP address range, as I do not have the list of required IP addresses yet. Once I have it from the SF product team, I will definitely post it here and write a blog post as well.

@MarekLani
Author

@Xandven I have the final solution. It took some time, but what you need to do is introduce a public Load Balancer Standard and create a load balancing rule for port 443, to allow the SF bits to be downloaded to the cluster. This will, however, open inbound communication on port 443, which is not a very secure thing to do. So in addition it is necessary to introduce a network security group and add a deny inbound rule for port 443, so that you only allow outbound communication from the cluster. A blog post about this setup, together with ARM templates, will be out soon.

@MarekLani
Author

@Xandven I have the final solution. It took some time, but what you need to do is introduce a public Load Balancer Standard and create a load balancing rule for port 443, to allow the SF bits to be downloaded to the cluster. This will, however, open inbound communication on port 443, which is not a very secure thing to do. So in addition it is necessary to introduce a network security group and add a deny inbound rule for port 443, so that you only allow outbound communication from the cluster. The blog post with ARM templates can be found here: https://github.com/MarekLani/AzureServiceFabricWithLoadBalancerStandard
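
The intent of the NSG part of this setup (deny inbound 443, allow outbound 443) can be sketched as a small model. This is an illustrative simplification, not the templates from the linked repository: the rule names, priorities and `*` address prefixes are placeholders, the evaluator ignores addresses and protocols, and the dictionary shape is only assumed to resemble an ARM `securityRules` entry, so check it against the real templates before reusing it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int   # lower number is evaluated first
    direction: str  # "Inbound" or "Outbound"
    access: str     # "Allow" or "Deny"
    port: str       # destination port, or "*" for any


# Hypothetical names and priorities; only the intent comes from the thread.
RULES = [
    Rule("deny-inbound-443", 100, "Inbound", "Deny", "443"),
    Rule("allow-outbound-443", 100, "Outbound", "Allow", "443"),
]


def evaluate(rules, direction, port, default="Deny"):
    """Return the access of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction == direction and rule.port in ("*", str(port)):
            return rule.access
    # Simplification: a real NSG falls through to built-in default rules.
    return default


def to_security_rule(rule):
    """Render a rule in the assumed shape of an ARM securityRules entry."""
    return {
        "name": rule.name,
        "properties": {
            "protocol": "Tcp",
            "sourcePortRange": "*",
            "destinationPortRange": rule.port,
            "sourceAddressPrefix": "*",       # placeholder
            "destinationAddressPrefix": "*",  # placeholder
            "access": rule.access,
            "priority": rule.priority,
            "direction": rule.direction,
        },
    }


print(evaluate(RULES, "Inbound", 443))   # Deny
print(evaluate(RULES, "Outbound", 443))  # Allow
print(json.dumps([to_security_rule(r) for r in RULES], indent=2))
```

The two rules are independent because NSG rules are evaluated per direction: the load balancing rule on 443 provides the outbound path, while the inbound deny keeps that port closed to traffic arriving from outside.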

@masnider
Member

Closed, glad you're unblocked.
