Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seaweedfs 2.49 - weed server started with IPv4 address only listens on IPv6 #2098

Closed
Radtoo opened this issue May 28, 2021 · 9 comments
Closed

Comments

@Radtoo
Copy link
Contributor

Radtoo commented May 28, 2021

version 8000GB 2.49 42fb03a linux amd64

I invoke weed server like this (via a systemd service that runs it under a "seaweedfs" user):
weed server -rack=aaa -ip=192.168.1.130 -master.defaultReplication=010 -master.peers=192.168.1.130:9333 -filer=true -filer.port=8515 -filer.defaultReplicaPlacement=010 -volume.port=8526 -volume.max=56 -dir=/mnt/seaweedfs

192.168.1.130 is also that same machine's IPv4. But this happens:

Log file created at: 2021/05/29 00:29:31
Running on machine: redacted
Binary: Built with gc go1.16.4 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss threadid file:line] msg
I0529 00:29:31 50479 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0529 00:29:31 50479 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0529 00:29:31 50479 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0529 00:29:31 50479 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0529 00:29:31 50479 master_server.go:114] Volume Size Limit is 30000 MB
I0529 00:29:31 50479 master_server.go:206] adminScripts:
  lock
  ec.encode -fullPercent=96 -quietFor=3h
  ec.rebuild -force
  ec.balance -force
  volume.balance -force
  volume.fix.replication
  unlock
I0529 00:29:31 50479 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.1.130:19333: connect: connection refused"
I0529 00:29:31 50479 master.go:120] Start Seaweed Master 8000GB 2.49 42fb03a at :9333
I0529 00:29:31 50479 raft_server.go:70] Starting RaftServer with 192.168.1.130:9333
I0529 00:29:32 50479 raft_server.go:129] current cluster leader:
I0529 00:29:32 50479 master.go:143] Start Seaweed Master 8000GB 2.49 42fb03a grpc server at :19333
I0529 00:29:33 50479 masterclient.go:79] No existing leader found!
I0529 00:29:33 50479 raft_server.go:154] Initializing new cluster
I0529 00:29:33 50479 master_server.go:155] leader change event:  => 192.168.1.130:9333
I0529 00:29:33 50479 master_server.go:157] [ 192.168.1.130:9333 ] 192.168.1.130:9333 becomes leader.
I0529 00:29:33 50479 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0529 00:29:35 50479 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0529 00:29:37 50479 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
[Omitted: Many repeats of that last line with newer timestamp]

I observe on "lsof -i -P -n | grep LISTEN" that weed seems to have started gRPC on port + 10000, but only listening on IPv6. No mention of IPv4:

[Omitted: Many unrelated ports]
weed      49099       seaweedfs    9u  IPv6 356380      0t0  TCP *:9333 (LISTEN)
weed      49099       seaweedfs   11u  IPv6 350869      0t0  TCP *:19333 (LISTEN)

I imagine this is the problem, but I can't figure out what made seaweedfs use IPv6 when an IPv4 address was specified. Nothing was using 9333 or 19333 on IPv4. Perhaps it is a bug?

@chrislusf
Copy link
Collaborator

use -ip.bind=xxx

@Radtoo
Copy link
Contributor Author

Radtoo commented May 29, 2021

The older -ip flag is now just a way to define an instance/hostname string?

weed      282333       seaweedfs    9u  IPv4 1816835      0t0  TCP 192.168.1.130:9333 (LISTEN)
weed      282333       seaweedfs   11u  IPv4 1816836      0t0  TCP 192.168.1.130:19333 (LISTEN)

Yes, -ip.bind works, although I couldn't figure out how to bind to * on ipv4 for further tests. Something still goes wrong:

Running on machine: redacted
Binary: Built with gc go1.16.4 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss threadid file:line] msg
I0529 23:05:34 82333 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0529 23:05:34 82333 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0529 23:05:34 82333 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0529 23:05:34 82333 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0529 23:05:34 82333 master_server.go:114] Volume Size Limit is 30000 MB
I0529 23:05:34 82333 master_server.go:206] adminScripts:
  lock
  ec.encode -fullPercent=96 -quietFor=3h
  ec.rebuild -force
  ec.balance -force
  volume.balance -force
  volume.fix.replication
  unlock
I0529 23:05:34 82333 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.1.130:19333: connect: connection refused"
I0529 23:05:34 82333 master.go:120] Start Seaweed Master 8000GB 2.49 42fb03a at 192.168.1.130:9333
I0529 23:05:34 82333 raft_server.go:70] Starting RaftServer with 192.168.1.130:9333
I0529 23:05:34 82333 raft_server.go:129] current cluster leader:
I0529 23:05:34 82333 master.go:143] Start Seaweed Master 8000GB 2.49 42fb03a grpc server at 192.168.1.130:19333
I0529 23:05:35 82333 masterclient.go:79] No existing leader found!
I0529 23:05:35 82333 raft_server.go:154] Initializing new cluster
I0529 23:05:35 82333 master_server.go:155] leader change event:  => 192.168.1.130:9333
I0529 23:05:35 82333 master_server.go:157] [ 192.168.1.130:9333 ] 192.168.1.130:9333 becomes leader.
I0529 23:05:35 82333 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0529 23:05:37 82333 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
[Last line repeats]

Seems like the volume server tries to connect to 192.168.1.130:19333 before master is running and obviously fails. But even after it is started (apart from seeing the LISTENing ports, I can reach the web interface at 192.168.1.130:9333, with no volumes) there is still the repeating error at the end of the log. Does it actually give up on retrying gRPC on IPv4 after the first failure or such?

chrislusf added a commit that referenced this issue May 30, 2021
@Radtoo
Copy link
Contributor Author

Radtoo commented May 31, 2021

Almost the same error on the 2.50 release when using explicit ip.bind to 192.168.1.130 as in the last comment. It now just alternates between two different error messages with an new "use of closed network connection":

Running on machine: redacted
Binary: Built with gc go1.16.4 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss threadid file:line] msg
I0531 15:28:25 18059 master.go:165] current: 192.168.1.130:9333 peers:
I0531 15:28:25 18059 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0531 15:28:25 18059 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0531 15:28:25 18059 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0531 15:28:25 18059 master_server.go:114] Volume Size Limit is 30000 MB
I0531 15:28:25 18059 master_server.go:206] adminScripts:
  lock
  ec.encode -fullPercent=96 -quietFor=3h
  ec.rebuild -force
  ec.balance -force
  volume.balance -force
  volume.fix.replication
  unlock
I0531 15:28:25 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.1.130:19333: connect: connection refused"
I0531 15:28:25 18059 master.go:120] Start Seaweed Master 8000GB 2.50 4233ad3 at :9333
I0531 15:28:25 18059 raft_server.go:70] Starting RaftServer with 192.168.1.130:9333
I0531 15:28:25 18059 raft_server.go:129] current cluster leader: 
I0531 15:28:25 18059 master.go:143] Start Seaweed Master 8000GB 2.50 4233ad3 grpc server at :19333
I0531 15:28:26 18059 masterclient.go:79] No existing leader found!
I0531 15:28:26 18059 raft_server.go:154] Initializing new cluster
I0531 15:28:26 18059 master_server.go:155] leader change event:  => 192.168.1.130:9333
I0531 15:28:26 18059 master_server.go:157] [ 192.168.1.130:9333 ] 192.168.1.130:9333 becomes leader.
I0531 15:28:26 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0531 15:28:28 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = use of closed network connection
I0531 15:28:30 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0531 15:28:32 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0531 15:28:34 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0531 15:28:35 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = use of closed network connection
I0531 15:28:37 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0531 15:28:39 18059 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = use of closed network connection
[

@chrislusf
Copy link
Collaborator

The logs showed -ip.bind was not configured.

@Radtoo
Copy link
Contributor Author

Radtoo commented Jun 1, 2021

That was before the additional error which now happens in 2.50. What runs is currently:

CGroup: /system.slice/seaweedfs.service
             └─261144 weed server -rack=aaa -ip.bind=192.168.1.130 -master.defaultReplication=010 -master.peers=192.168.1.130:9333 -filer=true -filer.port=8515 -filer.defaultReplicaPlacement=010 -volume.port=8526 -volume.max=56 -dir=/mnt/seaweedfs

I deleted the older logs prior to restarting to be certain this can only be from the current attempt:

Log file created at: 2021/06/01 06:57:24
Running on machine: redacted
Binary: Built with gc go1.16.4 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss threadid file:line] msg
I0601 06:57:24 61144 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0601 06:57:24 61144 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0601 06:57:24 61144 file_util.go:23] Folder /mnt/seaweedfs Permission: -rwxr-xr-x
I0601 06:57:24 61144 master.go:165] current: 192.168.1.130:9333 peers:192.168.1.130:9333
I0601 06:57:24 61144 master_server.go:114] Volume Size Limit is 30000 MB
I0601 06:57:24 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.1.130:19333: connect: connection refused"
I0601 06:57:24 61144 master_server.go:206] adminScripts:
  lock
  ec.encode -fullPercent=96 -quietFor=3h
  ec.rebuild -force
  ec.balance -force
  volume.balance -force
  volume.fix.replication
  unlock
I0601 06:57:24 61144 master.go:120] Start Seaweed Master 8000GB 2.50 4233ad3 at 192.168.1.130:9333
I0601 06:57:24 61144 raft_server.go:70] Starting RaftServer with 192.168.1.130:9333
I0601 06:57:24 61144 raft_server.go:129] current cluster leader: 
I0601 06:57:24 61144 master.go:143] Start Seaweed Master 8000GB 2.50 4233ad3 grpc server at 192.168.1.130:19333
I0601 06:57:25 61144 masterclient.go:79] No existing leader found!
I0601 06:57:25 61144 raft_server.go:154] Initializing new cluster
I0601 06:57:25 61144 master_server.go:155] leader change event:  => 192.168.1.130:9333
I0601 06:57:25 61144 master_server.go:157] [ 192.168.1.130:9333 ] 192.168.1.130:9333 becomes leader.
I0601 06:57:26 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:27 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:29 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = write tcp 192.168.1.130:48766->192.168.1.130:19333: write: broken pipe
I0601 06:57:31 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:33 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:35 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:36 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:38 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:40 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:42 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:44 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:45 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:47 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:49 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:51 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:53 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:54 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = write tcp 192.168.1.130:48902->192.168.1.130:19333: write: connection reset by peer
I0601 06:57:56 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:57:58 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:58:00 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:58:02 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:58:03 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = use of closed network connection
I0601 06:58:05 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
I0601 06:58:07 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = use of closed network connection
I0601 06:58:09 61144 volume_grpc_client_to_master.go:42] checkWithMaster 192.168.1.130:9333: get master 192.168.1.130:9333 configuration: rpc error: code = Unavailable desc = connection closed
[+ repetitions]

@chrislusf
Copy link
Collaborator

Maybe try to make it reproducible via some docker instance first.

@Radtoo
Copy link
Contributor Author

Radtoo commented Jun 1, 2021

I can check if it works in Podman (no Docker on that machine, unfortunately). Seems like the default compose setup is running the parts individually rather than all in one - I'll try that.

@Radtoo
Copy link
Contributor Author

Radtoo commented Jun 4, 2021

Figured it out thanks to trying more with the containers.

These remaining issues were not due to IPv4/IPv6 networking but the security certificates coincidentially had additionally expired just now (they have a short expiration time based on certstrap defaults). Results in the above errors and weed shell hanging rather than more obvious verbose complaints about expired certificates, apparently. Thanks for the help with this issue.

@chrislusf
Copy link
Collaborator

Thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants