filer.copy fails on a "single" error and does not try reconnecting to master #2056

Closed
Suika opened this issue May 7, 2021 · 0 comments

Describe the bug
I set up another instance of SeaweedFS on a fairly busy server. Over ~10 hours I copied in ~1,466,875 files (~60 GB total) via filer.copy, and then the following error happened.

copy file error: Failed to assign from [seaweed-master:9333]: assign volume failure count:1 collection:"kemono-thumbs" replication:"000" path:"/buckets/thumbs/36432417/": assign volume: failed to parse master : server port parse error: server should have hostname:port format:

That error appeared 8 times, each for a different folder.
Sadly I lost all the master logs, but it seems the master was unreachable for a short moment, and that caused filer.copy to abort.
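
The trailing blank after "failed to parse master :" in the error above suggests the filer was handed an empty master address string at that moment. This is not SeaweedFS's actual parsing code, just a minimal standard-library sketch of why an empty hostname:port value fails to parse:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// A populated address parses fine; an empty one fails, which matches
	// the blank host:port at the end of the reported error message.
	for _, addr := range []string{"seaweed-master:9333", ""} {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			fmt.Printf("parse %q: %v\n", addr, err)
			continue
		}
		fmt.Printf("parse %q: host=%s port=%s\n", addr, host, port)
	}
}
```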

System Setup

  • See docker-compose further below
  • Ubuntu 20.04.2 LTS
  • output of weed version: 8000GB 2.43 c48ef78 linux amd64
  • default filer.toml

master.toml:
[master.maintenance]
sleep_minutes = 17          # sleep minutes between each script execution

[master.filer]
default = "localhost:8888"    # used by maintenance scripts if the scripts needs to use fs related commands

[master.sequencer]
sequencer_etcd_urls = "http://127.0.0.1:2379"

[master.volume_growth]
copy_1 = 1                # create 1 x 7 = 7 actual volumes
copy_2 = 1                # create 2 x 6 = 12 actual volumes
copy_3 = 1                # create 3 x 3 = 9 actual volumes
copy_other = 1            # create n x 1 = n actual volumes

[master.replication]
treat_replication_as_minimums = false

docker-compose.yml:
version: "3.8"
services:
  seaweed-master:
    image: chrislusf/seaweedfs:2.43_large_disk
    container_name: seaweed-master
    command: "master -ip=seaweed-master -volumeSizeLimitMB=40000"
    stop_grace_period: 1m
    restart: always
    ports:
      - 127.0.0.1:9333:9333
      - 127.0.0.1:19333:19333
    volumes:
      - ./master.toml:/etc/seaweedfs/master.toml
      - ./seaweedfs/master:/data

  seaweed-volume-1:
    image: chrislusf/seaweedfs:2.43_large_disk
    container_name: seaweed-volume-1
    command: 'volume -mserver="seaweed-master:9333" -port=8080 -dataCenter=test -rack=hdd'
    stop_grace_period: 1m
    restart: always
    ports:
      - 127.0.0.1:8081:8080
      - 127.0.0.1:18080:18080
    volumes:
      - ./seaweedfs/seaweed-volume-1:/data
    depends_on:
      seaweed-master:
          condition: service_started

  seaweed-filer:
    image: chrislusf/seaweedfs:2.43_large_disk
    container_name: seaweed-filer
    command: 'filer -master="seaweed-master:9333" -disableDirListing -port.readonly=7865'
    stop_grace_period: 1m
    tty: true
    stdin_open: true
    environment:
      WEED_LEVELDB2_ENABLED: 'true'
      WEED_LEVELDB2_DIR: '/data'
    restart: always
    volumes:
      - ./seaweedfs/filer:/data
    ports:
      - 127.0.0.1:8888:8888
      - 127.0.0.1:18888:18888
      - 127.0.0.1:7865:7865
    depends_on:
      seaweed-master:
          condition: service_started
      seaweed-volume-1:
          condition: service_started

Expected behavior
Make filer.copy a bit more resilient to network errors by waiting and retrying the connection. It's not that the inserts failed; the master was simply unreachable for a moment.
There were 8 copy file error: lines for different folders/files, and all of them had assign volume failure count:1. A rough sketch of the kind of retry loop I mean follows below.
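
A minimal sketch of that idea, assuming a hypothetical assignVolume function standing in for the real assign RPC (this is not the actual filer.copy code, just an illustration of a bounded retry with backoff):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// assignVolume is a hypothetical stand-in for the real assign RPC to the
// master; here it always fails so the retry path is exercised.
func assignVolume() (string, error) {
	return "", errors.New("assign volume failure count:1")
}

// assignWithRetry retries the assign call a few times with a growing delay
// instead of aborting the whole copy on the first transient error.
func assignWithRetry(attempts int) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		fid, err := assignVolume()
		if err == nil {
			return fid, nil
		}
		lastErr = err
		wait := time.Duration(500*(i+1)) * time.Millisecond
		fmt.Printf("assign failed (%v), retrying in %v\n", err, wait)
		time.Sleep(wait)
	}
	return "", fmt.Errorf("assign still failing after %d attempts: %w", attempts, lastErr)
}

func main() {
	if _, err := assignWithRetry(3); err != nil {
		fmt.Println("copy file error:", err)
	}
}
```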

There is another problem that stems from how filer.copy works and makes this one such a pain: filer.copy does not check whether a file already exists on the filer and compare its time and size, which would let it skip uploads of files that are already there. Right now, if you run filer.copy twice, it simply deletes the existing file in the filer and re-uploads the "new" file again. A sketch of such a check follows below.
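
A minimal sketch of the existence check, with hypothetical names (remoteEntry is a stand-in for whatever metadata the filer returns about an already uploaded file; it is not the real filer API):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// remoteEntry is a hypothetical stand-in for the metadata the filer already
// holds about an uploaded file.
type remoteEntry struct {
	Size    int64
	ModTime time.Time
}

// shouldSkip reports whether the local file can be skipped because the filer
// already has an entry with the same size and a modification time that is
// not older than the local one.
func shouldSkip(localPath string, remote *remoteEntry) (bool, error) {
	info, err := os.Stat(localPath)
	if err != nil {
		return false, err
	}
	if remote == nil {
		return false, nil // nothing on the filer yet, must upload
	}
	return info.Size() == remote.Size && !remote.ModTime.Before(info.ModTime()), nil
}

func main() {
	skip, err := shouldSkip("/tmp/example.jpg", &remoteEntry{Size: 1234, ModTime: time.Now()})
	fmt.Println("skip:", skip, "err:", err)
}
```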

Additional context
The system has IO problems. More precisely, IO is sometimes squeezed dry by the webserver and other tools, which is why we want to see how SeaweedFS fares on such a system, hosting only the thumbnails for now.
