
Hot, warm, cold storage [feature] #1792

Closed
divanikus opened this issue Feb 9, 2021 · 40 comments

Comments

@divanikus
Contributor

divanikus commented Feb 9, 2021

I'm currently running a setup with 3 dedicated masters/filers and 4 volume servers with 10 TB of HDD each.
Our app writes data to daily buckets. When the workload is mostly write-only, everything performs quite well, but when random reads kick in, performance seems to suffer a lot. I'm still investigating which side is at fault, our app or the storage, but I want to clarify one thing for myself: is it possible to organize hot, warm and cold tiers inside one cluster?

I mean, create new buckets on hot storage, for example NVMe SSD based volume servers, and later move them with a single call to less frequently accessed HDD volume servers (warm). I've read about cloud tier uploads, so that would cover the cold phase I guess. But what about the hot-to-warm transition?

Or maybe I'm missing something and have simply misconfigured things, so I could speed up my cluster without any extra abstractions?

@LazyDBA247-Anyvision
Contributor

Currently you can do this, but you'll need to create the movement script/job yourself.
The volume server can have two disks, tagged accordingly.
Example:
volume -max=10000 -disk=hdd,ssd -ip=seaweedfs-volume-localnode.tls.ai -mserver=seaweedfs-master-localnode.tls.ai:9333 -port=8080 -dir=/data,/ssd -compactionMBps=40 -minFreeSpacePercent=7

Please note the -disk=hdd,ssd and -dir=/data,/ssd.

Then, with fs.configure, you can create a path-specific setting so that buckets/collections starting with (for example) "ssd_" are allocated to the SSD, while all other collections are created on the HDD. You can also target a specific bucket name...

example (within the weed shell):
fs.configure -locationPrefix=/buckets/ssd_ -disk=ssd -apply

https://github.com/chrislusf/seaweedfs/wiki/Path-Specific-Configuration

You will need to create the job that moves files from the ssd_ collection/bucket to another collection; a rough sketch follows below.
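
For illustration only, here is what such a job could look like against the filer's HTTP API (read each object from the hot bucket, re-upload it to the warm bucket, then delete the hot copy). The filer address, bucket names and JSON listing fields are assumptions, and a real job would also need pagination, age filtering and error handling:

#!/usr/bin/env bash
# Hypothetical mover job: copy objects from a hot "ssd_" bucket to a warm
# bucket through the filer HTTP API, then delete the hot copy.
# FILER, the bucket paths and the JSON fields below are assumptions.
FILER=http://seaweedfs-filer-localnode.tls.ai:8888
HOT=/buckets/ssd_mybucket
WARM=/buckets/mybucket

# ask the filer for a JSON directory listing of the hot bucket
curl -s -H "Accept: application/json" "$FILER$HOT/?limit=1000" |
  jq -r '.Entries[].FullPath' |
  while read -r path; do
    name=$(basename "$path")
    tmp=$(mktemp)
    # download from hot, re-upload to warm, then remove the hot copy
    curl -s -o "$tmp" "$FILER$path" &&
      curl -s -F "file=@$tmp;filename=$name" "$FILER$WARM/" > /dev/null &&
      curl -s -X DELETE "$FILER$path"
    rm -f "$tmp"
  done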

@LazyDBA247-Anyvision
Contributor

Not tested by me (yet), but you may also try putting just the .idx files on the SSD by using the volume server parameter
-dir.idx=/ssd
to save the index files on a fast SSD mount.
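
For example (a sketch only; the other flags are just copied from the earlier example, the relevant additions are -dir and -dir.idx):

volume -max=10000 -disk=hdd -dir=/data -dir.idx=/ssd -ip=seaweedfs-volume-localnode.tls.ai -mserver=seaweedfs-master-localnode.tls.ai:9333 -port=8080 -compactionMBps=40 -minFreeSpacePercent=7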

@LazyDBA247-Anyvision
Contributor

Do you have Prometheus in the DC? You can enable metrics, collect some metrics on the SeaweedFS operations, and maybe identify the problem...
https://github.com/chrislusf/seaweedfs/wiki/System-Metrics
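
A quick way to sanity-check that metrics are coming out (a sketch only; the metrics port here is an assumption and depends on your version and flags, and some builds push to a Prometheus Pushgateway instead of exposing a /metrics endpoint):

curl -s http://seaweedfs-volume-localnode.tls.ai:9325/metrics | grep SeaweedFS | head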

@divanikus
Contributor Author

@LazyDBA247-Anyvision As far as I understand your approach, I'd have to move files from one bucket to another manually. I believe that's suboptimal, since we could move the whole bucket's volumes to another server in one piece, which should be far faster than copying files one by one. My vision of such an operation would be to somehow pin newly created volumes to a "hot" rack and later change their allocation to a "warm" rack (or racks) with a call, a script, whatever. Kind of a hierarchical storage approach. I'm not insisting on it, just my thoughts.

Yes, I have Prometheus and I'm collecting metrics, but currently they're somewhat unclear to me.
[screenshot: Prometheus response-time graph]

The actual problem happens between 8 am and 2 pm, although according to the graph the slowest response times are somewhere between 1 am and 8 am, when the workload is mostly write-only and nobody is really complaining. So I'm not sure whether the storage is at fault, but I find the idea of hierarchical storage interesting enough to at least discuss.

@chrislusf
Collaborator

Probably you need to increase the volume counts to increase concurrency.

@divanikus
Contributor Author

You mean the volume server count? Or the number of volumes per bucket?

@chrislusf
Collaborator

volume count. not volume server count.

@divanikus
Contributor Author

I guess you mean that setting? I'm a little scared to increase it, because I only have 4 physical HDDs as of now, and concurrency is not the strongest suit of HDDs.

@LazyDBA247-Anyvision
Contributor

Another hack you can do (if you know the data is hot only for X days):
take my earlier suggestion of the path-specific setting, add a TTL on the SSD, but write to 2 collections, the HOT one (with the TTL) and the WARM one (HDD), and keep the access logic (which collection to read, based on the TTL) in the application code.
Apart from those two "simple" application changes, SeaweedFS will delete/clean your HOT collection for you... (see the sketch below)
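
A sketch of the two path-specific rules this hack implies (bucket names are examples, and the -ttl flag and its format here are assumptions -- check fs.configure's help in your version):

fs.configure -locationPrefix=/buckets/hot_ -disk=ssd -ttl=7d -apply
fs.configure -locationPrefix=/buckets/warm_ -disk=hdd -apply

The application then writes each object to both buckets, reads from hot_ while the object is younger than the TTL, and falls back to warm_ afterwards.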

@chrislusf
Collaborator

Added a feature to change disk type ssd <=> hdd when moving a volume. 770393a

Need to think of a command to make this easier.

@LazyDBA247-Anyvision
Contributor

@chrislusf will that happen after you "remove" the ssd configuration for the collection?

Because the collection configuration will still say 'ssd'...
Can we change the configuration of the collection and just use the balance command?

@chrislusf
Collaborator

A collection does not have a disk type, but the volumes do, depending on which folder they stay in.

@LazyDBA247-Anyvision
Contributor

So fs.configure is only used at volume creation?

@divanikus
Contributor Author

divanikus commented Feb 10, 2021

To my understanding, it would be cool to have the ability to label volume servers with user-supplied values, with volume placement based on that label. Moving to another tier would then just mean changing the volume's label (via volume.move, for example).

Another option I can think of is to label volume servers as different DCs and have rack-level replication move volumes from one DC to another (in the logical sense, of course).

@chrislusf
Collaborator

So fs.configure is only used at volume creation?

Correct. But the disk type of a volume would not change unless the admin decided to change it.

@chrislusf
Collaborator

To my understanding, it would be cool to have the ability to label volume servers with user-supplied values, with volume placement based on that label. Moving to another tier would then just mean changing the volume's label (via volume.move, for example).

btw: this is already implemented in the git master branch.

@divanikus
Contributor Author

btw: this is already implemented in the git master branch.

Could you please explain how to use it?

@chrislusf
Collaborator

@divanikus
Contributor Author

divanikus commented Feb 10, 2021

So, all I need is to add the -disk label to my volume servers, apply path-specific settings to the bucket, and later move volumes manually. By manually, I mean I have to find out which volumes are on which server and decide to which other volume server each should be moved, right? Sounds a little complicated, but not a deal breaker for sure.
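
Roughly, that manual flow would look like this in the weed shell (volume ids and addresses are placeholders; lock/unlock may not be required on every version, and the volume.move syntax matches the usage shown later in this thread):

lock
volume.list
volume.move -volumeId 4206 -source 192.168.65.63:8080 -target 192.168.65.57:8080 -disk hdd
unlock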

@chrislusf
Collaborator

Yes, this is exactly what @LazyDBA247-Anyvision mentioned before.

Any suggestions on how to design the tiering commands? How about this?

volume.tier -from=ssd -to=hdd -collection=xxx
volume.tier -from=ssd -to=hdd -collection=xxx -quietFor=1d
volume.tier -from=ssd -to=hdd -collection=xxx -fullPercent=95

Actually, will there be any need to change from hdd to ssd?

@divanikus
Contributor Author

divanikus commented Feb 10, 2021

I think it would be cool to support not just ssd and hdd, but any label the user supplies. So if someone would like to have more than just 2-3 tiers, they would be able to do that.

How about this?

Looks promising 👍

@LazyDBA247-Anyvision
Contributor

@divanikus you can tag with whatever you choose; the example uses hdd,ssd to be more understandable :)

@chrislusf
Collaborator

The current implementation only supports ssd and hdd. Flexible tags are possible; I am just not sure there is a real need for them.

@divanikus
Contributor Author

Well, I can think of several HDD tiers, for example: fast 15,000 rpm SAS drives for warm storage and plain 5,400 rpm SATA drives for "slightly colder" storage :)

@divanikus
Contributor Author

I see that at least parts of this feature are now in the current release. Is volume.tier also there? @chrislusf

@chrislusf
Collaborator

not yet

@divanikus
Contributor Author

So, OK, I have added two dedicated volume servers with SSDs and started them with the -disk=ssd flag. I've issued fs.configure and added these settings:

{
  "locations": [
    {
      "locationPrefix": "/buckets",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/mybucket",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/mybucket-",
      "diskType": "ssd"
    }
  ]
}

My buckets are named like mybucket-2021-02-19. But whatever I do, it keeps creating their volumes with the hdd flag and on the HDD volume servers! Am I missing something?

@divanikus
Contributor Author

So, another thing. I've created a bucket and used volume.move to move it to SSD manually. It worked; I can see its volumes on the SSD servers. But now I'm trying to pull those volumes back to an HDD server, and I get this:

> volume.move -volumeId 4206 -target 192.168.65.57:8080 -source 192.168.65.63:8080 -disk hdd
2021/02/19 00:41:29 copying volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080
error: copy volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080: rpc error: code = Unknown desc = no space left

@chrislusf
Collaborator

My buckets are named like mybucket-2021-02-19. But whatever I do, it keeps creating their volumes with the hdd flag and on the HDD volume servers! Am I missing something?

Added a fix 776f497 for this.

@chrislusf
Collaborator

volume.move -volumeId 4206 -target 192.168.65.57:8080 -source 192.168.65.63:8080 -disk hdd
2021/02/19 00:41:29 copying volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080
error: copy volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080: rpc error: code = Unknown desc = no space left

Need to see the output of volume.list first.

@divanikus
Contributor Author

Here it is.

@chrislusf
Collaborator

Here it is.

Please try the latest version.

@divanikus
Contributor Author

I've tried the latest on my test environment and it looks good. The bucket's volumes are placed on the correct volume server, and I can move a volume to and from SSD no matter on which server it was first created. Will try it in production soon.

@divanikus
Contributor Author

BTW, noticed that volume.tier.upload is mentioned twice in help:

  volume.tier.download                  # download the dat file of a volume from a remote tier
  volume.tier.upload                    # change a volume from one disk type to another
  volume.tier.upload                    # upload the dat file of a volume to a remote tier

@chrislusf
Collaborator

BTW, noticed that volume.tier.upload is mentioned twice in help

Ack. Added a fix b961cd6

chrislusf reopened this Feb 19, 2021
@divanikus
Contributor Author

One thing that I didn't notice on my test env, and which made me freak out a little on production, is that the volume list has disappeared from the master's view 😅
[screenshot: master UI with the volume list missing]

chrislusf added a commit that referenced this issue Feb 19, 2021
@chrislusf
Collaborator

chrislusf commented Feb 22, 2021

Is volume.tier also there?

volume.tier.move was added in 6a4546d

@divanikus
Contributor Author

Currently trying volume.tier.move with 2.28.
@chrislusf is this warning message OK?

moving volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080 with disk type hdd ...
markVolumeReadonly 4266 on 192.168.65.63:8080 ...
markVolumeReadonly 4266 on 192.168.65.62:8080 ...
2021/02/23 14:29:14 copying volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080
2021/02/23 14:29:14 tailing volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080
2021/02/23 14:29:26 deleting volume 4266 from 192.168.65.62:8080
2021/02/23 14:29:26 moved volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080
moving volume 4266 from 192.168.65.62:8080 to 192.168.65.59:8080 with disk type hdd ...
markVolumeReadonly 4266 on 192.168.65.63:8080 ...
tier move volume 4266: mark volume 4266 as readonly on 192.168.65.63:8080: rpc error: code = Unknown desc = volume 4266 not found

Also, I've found that nothing gets selected unless I add the -fullPercent=0.001 parameter.
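
For the record, the kind of invocation I mean looks like this (a sketch: only -fullPercent appears above; the other flag names are assumptions and may differ by version, so check volume.tier.move's help):

volume.tier.move -fromDiskType=ssd -toDiskType=hdd -collection=mybucket -quietFor=1h -fullPercent=0.001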

@chrislusf
Collaborator

Should be OK. Seems like a bug with replicated volumes.

@chrislusf
Collaborator

added a fix. Just need to update the weed shell.
