Client connection using AdoNetClustering fails on Docker container #5158
Does the frontend have network access to the silo gateway endpoint?
The frontend has access to the backend; they are on the same Docker network. Interestingly, the backend silo says “Remote socket closed while receiving connection preamble data from endpoint 192.168.32.4:34992” for the frontend connection.
Can you post your client and silo configurations?
@benjaminpetit I prepared a sample for the issue. You can run "docker-compose up" in the sample's root folder, then hit http://[Your docker machine IP]:8081/api/values. The web application can't connect to the silo on the first attempt. My Docker version is 18.06.1-ce, build e68fc7a, and I use the default Docker machine ISO on my Mac.
I said there was no problem with static clustering configuration, but I was wrong.
@benjaminpetit Is there anything I can do to help with the issue?
@fduman @benjaminpetit went on leave until early February. I'm not sure how much we can help you with running on Mac. Asking in https://gitter.im/dotnet/orleans might be a more expedient way to get help with that.
@fduman I had the same issue with an almost identical setup while playing around with the Orleans docker sample, and realized that the silo was registering its address as 127.0.0.1 in the membership table instead of its Docker network IP.
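For anyone hitting the same symptom, here is a minimal sketch of pinning the silo's advertised address in Orleans 2.x. `SiloHostBuilder` and `EndpointOptions` are real Orleans 2.x APIs; the DNS-based lookup of the container's own IP is an assumption about how a compose network typically resolves the container host name, not something confirmed in this thread:

```csharp
using System.Linq;
using System.Net;
using System.Net.Sockets;
using Orleans.Configuration;
using Orleans.Hosting;

// Assumption: inside a Docker container, resolving the container's own host
// name returns its address on the compose network rather than 127.0.0.1.
var containerIp = Dns.GetHostEntry(Dns.GetHostName()).AddressList
    .First(a => a.AddressFamily == AddressFamily.InterNetwork);

var silo = new SiloHostBuilder()
    .Configure<EndpointOptions>(options =>
    {
        // The address written to the membership table; this is what
        // clients and other silos will try to connect to.
        options.AdvertisedIPAddress = containerIp;
        options.SiloPort = 11111;    // Orleans default silo-to-silo port
        options.GatewayPort = 30000; // Orleans default client gateway port
    })
    .Build();
```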
I had the same issue, but I was running directly on Linux and without Docker. Even so, my client cannot connect to the silo when the client runs on Linux; it worked when the client ran on Windows.
@tcz717 firewall issue maybe?
@fduman Has this been resolved?
Closing due to inactivity. Feel free to reopen if needed.
This issue still exists after upgrading to 2.3.0. Here are the context and logs in my situation. I deploy my service using Docker and docker-compose:

```yaml
version: '3'
services:
  gateway:
    image: gateway
    ports:
      - "20170:20170"
    depends_on:
      - "silo"
    build:
      context: .
      dockerfile: ./SimCivil.Gate/Dockerfile
  silo:
    image: silo
    ports:
      - "30000:30000"
    build:
      context: .
      dockerfile: ./SimCivil.Orleans.Server/Dockerfile
```

After startup, the server and the client each record errors reporting a connection timeout. The server log is

The client log is

I tried using
@tcz717 here you are mapping the silo gateway port (30000) on your host. This is not a good idea: Orleans communications should be isolated from external traffic. What clustering provider are you using? How do you configure your client and your silos?
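To make that concrete, here is a sketch of a silo endpoint setup that keeps the Orleans ports off the host: declare them with `ConfigureEndpoints` (a real Orleans 2.x extension, assuming your version exposes the `listenOnAnyHostAddress` parameter) and simply omit them from the `ports:` section of docker-compose, so only containers on the same network can reach them:

```csharp
using Orleans.Hosting;

// Sketch: silo and gateway ports are reachable on the compose network but
// intentionally not published on the Docker host (no "ports:" mapping).
var builder = new SiloHostBuilder()
    .ConfigureEndpoints(
        siloPort: 11111,               // silo-to-silo traffic
        gatewayPort: 30000,            // client-to-silo traffic
        listenOnAnyHostAddress: true); // bind 0.0.0.0 inside the container
```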
@benjaminpetit You are right, but I only opened this port for debugging. As for clustering, I used the local debug cluster and now I am using DynamoDB. Neither of them worked, although both run normally in a non-Docker environment. I suppose this issue is not related to the clustering provider.
One interesting thing is that when I compile and build the client by using
Maybe something related to the IP published in the membership table?
I checked the field in DynamoDB and the record is the same as the container IP. Because I set
Experiencing a similar issue. Silo and client communication was working fine in Windows containers, but when we switched to Linux containers the client was unable to communicate with the silos. Our configurations:

XXX.Web/Dockerfile:

```dockerfile
EXPOSE 80
EXPOSE 443
```

XXX.SiloHost/Dockerfile:

```dockerfile
EXPOSE 8090
```

docker-compose.yml:

```yaml
version: '3.4'
services:
  XXX.web:
    image: ${DOCKER_REGISTRY-}XXXweb
    build:
      context: .
      dockerfile: XXX.Web/Dockerfile
    depends_on:
      - XXX.silo
  XXX.silo:
    image: ${DOCKER_REGISTRY-}XXXsilo
    build:
      context: .
      dockerfile: XXX.SiloHost/Dockerfile
  XXX.dashboard:
    image: ${DOCKER_REGISTRY-}XXXsilo
    ports:
      - "8090:8090"
    depends_on:
      - XXX.silo
```

docker-compose.override.yml:

```yaml
version: '3.4'
services:
  XXX.web:
    environment:
      - ASPNETCORE_ENVIRONMENT=Development
      - ASPNETCORE_URLS=https://+:443;http://+:80
      - ASPNETCORE_HTTPS_PORT=44305
    ports:
      - "50556:80"
      - "44305:443"
    volumes:
      - ${APPDATA}/ASP.NET/Https:/root/.aspnet/https:ro
      - ${APPDATA}/Microsoft/UserSecrets:/root/.microsoft/usersecrets:ro
```

Following are the containers created on
We are using SQL Server for clustering, and the registered silo entries are:
And the network inspect result for the running containers (`docker network inspect XXX_default`):

```json
[
{
"Name": "XXX_default",
"Id": "d2f01f3a6998c95f8157aabfff5f6535378760e51d1572302d079bb33b9e912b",
"Created": "2019-05-05T08:10:06.6194202Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"3adbc04c12a0185a67da8a21b476654785da4f022998db11cfc075abece360a8": {
"Name": "XXX_XXX.silo_1",
"EndpointID": "79ec02ac924acad618833968ff5026d48394e47cc099c55c66fb84fd4bf87d89",
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
},
"54087c9a0a9ab4c3756c1581f92ab5abad152f8aa8beefa0963b49ca382be4e5": {
"Name": "XXX_XXX.web_1",
"EndpointID": "0c3ad891c11e830f17e67ca1ae082c30db2e031ba204f671768a6405d96bfdee",
"MacAddress": "02:42:ac:12:00:03",
"IPv4Address": "172.18.0.3/16",
"IPv6Address": ""
},
"a8390b891c05a9939c500901044667169fa113e247b84f7bf3b5ed400357f4d4": {
"Name": "XXX_XXX.dashboard_1",
"EndpointID": "d9155862139875451cae487f7d26bb7399f282995a50e875108233e1d8147d95",
"MacAddress": "02:42:ac:12:00:04",
"IPv4Address": "172.18.0.4/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"com.docker.compose.network": "default",
"com.docker.compose.project": "XXX",
"com.docker.compose.version": "1.23.2"
}
}
]
```

And the client log is:
And we can see two active silos in the dashboard.
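For context, here is a minimal sketch of the client-side SQL Server clustering setup being described, using the Orleans 2.x `UseAdoNetClustering` extension; the cluster/service IDs and connection string are placeholders, not values from this thread:

```csharp
using Orleans;
using Orleans.Configuration;
using Orleans.Hosting;

// Sketch: the client reads the silo gateway list from the ADO.NET (SQL
// Server) membership table. ClusterId/ServiceId must match the silos.
var client = new ClientBuilder()
    .Configure<ClusterOptions>(options =>
    {
        options.ClusterId = "dev";    // placeholder
        options.ServiceId = "XXXApp"; // placeholder
    })
    .UseAdoNetClustering(options =>
    {
        options.Invariant = "System.Data.SqlClient";
        options.ConnectionString = "<sql-server-connection-string>";
    })
    .Build();
```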
I was able to solve the issue by changing the IClusterClient DI configuration code in the client Startup. Previously it was:

```csharp
public void ConfigureServices(IServiceCollection services)
{
    ...
    services.AddSingleton(CreateClusterClient);
    ...
}

private IClusterClient CreateClusterClient(IServiceProvider serviceProvider)
{
    var client = new ClientBuilder()
        ...
        .Build();
    StartClientWithRetries(client).Wait();
    return client;
}

private static async Task StartClientWithRetries(IClusterClient client)
{
    for (var i = 0; i < 5; i++)
    {
        try
        {
            await client.Connect();
            return;
        }
        catch (Exception)
        {
            // ignored
        }
        await Task.Delay(TimeSpan.FromSeconds(5));
    }
}
```

After referring to orleans/Samples/2.0/docker-aspnet-core/API/Startup.cs, I changed it to:

```csharp
private IClusterClient CreateClusterClient(IServiceProvider serviceProvider)
{
    var log = serviceProvider.GetService<ILogger<Startup>>();
    var client = new ClientBuilder()
        ...
        .Build();
    client.Connect(RetryFilter).GetAwaiter().GetResult();
    return client;

    async Task<bool> RetryFilter(Exception exception)
    {
        log?.LogWarning("Exception while attempting to connect to Orleans cluster: {Exception}", exception);
        await Task.Delay(TimeSpan.FromSeconds(2));
        return true;
    }
}
```

And it WORKED. My client log is now:
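(A note on why this likely helps, based on a reading of the code above rather than anything confirmed in the thread: the first variant swallows every Connect failure and, after five attempts, still returns the client even if it never connected, so the web app can start with a dead IClusterClient; docker-compose's depends_on only orders container startup and does not wait for the silo to appear in the membership table. The Connect(retryFilter) overload instead retries inside the connect sequence itself and only returns once the connection succeeds or the filter gives up, which also avoids re-calling Connect on a client instance that has already failed a connection attempt.)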
I found this issue still existed even in an Azure Ubuntu VM (no container; server and client are on the same machine).
|
After researching the source code, I found a bug in SocketManager.Connect: https://github.com/dotnet/orleans/blob/2.4.3/src/Orleans.Core/Messaging/SocketManager.cs#L202

According to the documentation,

On a Linux host, the corefx source also says:

```csharp
// The asynchronous socket operations here generally do the following:
// (1) If the operation queue is Ready (queue is empty), try to perform the operation immediately, non-blocking.
// If this completes (i.e. does not return EWOULDBLOCK), then we return the results immediately
// for both success (SocketError.Success) or failure.
// No callback will happen; callers are expected to handle these synchronous completions themselves.
```

That means if the operation completes synchronously, the Completed callback will never be invoked, which will make the connect wait for a signal that never arrives. I see you plan to rewrite the network stack in 3.0.0; I don't know if this bug will be solved there. But for the 2.x versions, my possible solution is to change the Connect method (https://github.com/dotnet/orleans/blob/2.4.3/src/Orleans.Core/Messaging/SocketManager.cs#L196-L209) like this:

```csharp
internal static void Connect(Socket s, IPEndPoint endPoint, TimeSpan connectionTimeout)
{
    var signal = new AutoResetEvent(false);
    var e = new SocketAsyncEventArgs();
    e.RemoteEndPoint = endPoint;
    e.Completed += (sender, eventArgs) => signal.Set();
    // Only wait for the callback if ConnectAsync actually went async;
    // on synchronous completion the Completed event never fires.
    bool pending = s.ConnectAsync(e);
    if (pending && !signal.WaitOne(connectionTimeout))
        throw new TimeoutException($"Connection to {endPoint} could not be established in {connectionTimeout}");
    if (e.SocketError != SocketError.Success || !s.Connected)
        throw new OrleansException($"Could not connect to {endPoint}: {e.SocketError}");
}
```
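A minimal repro sketch of the synchronous-completion behavior this fix guards against (the loopback listener and port 9000 are arbitrary choices for the demo): Socket.ConnectAsync(SocketAsyncEventArgs) returns false when the operation completed synchronously, and in that case the Completed event is never raised, so code that only waits on the callback never wakes up:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

class SyncCompletionDemo
{
    static void Main()
    {
        // Arbitrary local listener so the connect can succeed immediately.
        var listener = new TcpListener(IPAddress.Loopback, 9000);
        listener.Start();

        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        var args = new SocketAsyncEventArgs { RemoteEndPoint = new IPEndPoint(IPAddress.Loopback, 9000) };
        args.Completed += (s, e) => Console.WriteLine("Completed callback fired");

        // false => completed synchronously; per the docs the Completed event
        // will NOT be raised, so only the return value tells us we're done.
        bool pending = socket.ConnectAsync(args);
        Console.WriteLine($"pending = {pending}, connected = {socket.Connected}");
    }
}
```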
Fixed
orleans.log
(The frontend is the Orleans client app, the backend is the silo, and the DB is PostgreSQL.)
Using a StaticClustering configuration runs perfectly.
I saw a similar issue at microsoft/service-fabric-issues#1182.