
Hot, warm, cold storage [feature] #1792

Closed
divanikus opened this issue Feb 9, 2021 · 40 comments

Comments

@divanikus
Contributor

divanikus commented Feb 9, 2021

I'm currently running a setup with 3 dedicated masters/filers and 4 volume servers with 10 TB of HDD each.
Our app writes data to daily buckets. When the workload is mostly write-only, everything performs quite well, but when random reads kick in, performance seems to suffer a lot. I'm still investigating which side is at fault, our app or the storage, but I want to clarify one thing for myself: is it possible to organize hot, warm and cold tiers inside one cluster?

I mean, create new buckets on hot storage, for example NVMe SSD based volume servers, and later move them with a single call to less frequently accessed HDD volume servers (warm). I've read about cloud tier uploads, so that would cover the cold phase I guess. But what about the hot-to-warm transition?

Or maybe I'm missing something and have simply misconfigured things, so I could speed up my cluster without any extra abstractions?

@LazyDBA247-Anyvision
Contributor

Currently you can do this, but you'll need to create the movement script/job yourself.
The volume server can have two disks, tagged accordingly.
Example:
volume -max=10000 -disk=hdd,ssd -ip=seaweedfs-volume-localnode.tls.ai -mserver=seaweedfs-master-localnode.tls.ai:9333 -port=8080 -dir=/data,/ssd -compactionMBps=40 -minFreeSpacePercent=7

Please note the -disk=hdd,ssd and -dir=/data,/ssd.

Then, with fs.configure, you can create a path-specific setting so that buckets/collections starting with (for example) "ssd_" are allocated to the SSD, while all other collections are created on the HDD. You can also target a specific bucket name...

example (within the weed shell):
fs.configure -locationPrefix=/buckets/ssd_ -disk=ssd -apply

https://github.com/chrislusf/seaweedfs/wiki/Path-Specific-Configuration

You will need to create the job that moves files from the ssd_ collection/bucket to another collection; a rough sketch follows below.
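
For illustration only, here is what such a job could look like against the filer's HTTP API (read each object from the hot bucket, re-upload it to the warm bucket, then delete the hot copy). The filer address, bucket names and JSON listing fields are assumptions, and a real job would also need pagination, age filtering and error handling:

#!/usr/bin/env bash
# Hypothetical mover job: copy objects from a hot "ssd_" bucket to a warm
# bucket through the filer HTTP API, then delete the hot copy.
# FILER, the bucket paths and the JSON fields below are assumptions.
FILER=http://seaweedfs-filer-localnode.tls.ai:8888
HOT=/buckets/ssd_mybucket
WARM=/buckets/mybucket

# ask the filer for a JSON directory listing of the hot bucket
curl -s -H "Accept: application/json" "$FILER$HOT/?limit=1000" |
  jq -r '.Entries[].FullPath' |
  while read -r path; do
    name=$(basename "$path")
    tmp=$(mktemp)
    # download from hot, re-upload to warm, then remove the hot copy
    curl -s -o "$tmp" "$FILER$path" &&
      curl -s -F "file=@$tmp;filename=$name" "$FILER$WARM/" > /dev/null &&
      curl -s -X DELETE "$FILER$path"
    rm -f "$tmp"
  done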

@LazyDBA247-Anyvision
Contributor

Not tested by me (yet), but you may also try putting just the .idx files on the SSD by using the volume server parameter
-dir.idx=/ssd
to save the index files on a fast SSD mount.
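
For example (a sketch only; the other flags are just copied from the earlier example, the relevant additions are -dir and -dir.idx):

volume -max=10000 -disk=hdd -dir=/data -dir.idx=/ssd -ip=seaweedfs-volume-localnode.tls.ai -mserver=seaweedfs-master-localnode.tls.ai:9333 -port=8080 -compactionMBps=40 -minFreeSpacePercent=7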

@LazyDBA247-Anyvision
Contributor

Do you have Prometheus in the DC? You can enable metrics, collect some metrics on the SeaweedFS operations, and maybe identify the problem...
https://github.com/chrislusf/seaweedfs/wiki/System-Metrics
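
A quick way to sanity-check that metrics are coming out (a sketch only; the metrics port here is an assumption and depends on your version and flags, and some builds push to a Prometheus Pushgateway instead of exposing a /metrics endpoint):

curl -s http://seaweedfs-volume-localnode.tls.ai:9325/metrics | grep SeaweedFS | head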

@divanikus
Contributor Author

@LazyDBA247-Anyvision As far as I understand your approach, I'd have to move files from one bucket to another manually. I believe that's suboptimal, since we could move the whole bucket's volumes to another server in one piece, which should be far faster than copying files one by one. My vision of such an operation would be to somehow pin newly created volumes to a "hot" rack and later change their allocation to a "warm" rack (or racks) with a call, a script, whatever. Kind of a hierarchical storage approach. I'm not insisting on it, just my thoughts.

Yes, I have Prometheus and I'm collecting metrics, but currently they're somewhat unclear to me.
[screenshot: Prometheus response-time graph]

The actual problem happens between 8 am and 2 pm, although according to the graph the slowest response times are somewhere between 1 am and 8 am, when the workload is mostly write-only and nobody is really complaining. So I'm not sure whether the storage is at fault, but I find the idea of hierarchical storage interesting enough to at least discuss.

@chrislusf
Collaborator

Probably you need to increase the volume counts to increase concurrency.

@divanikus
Contributor Author

You mean the volume server count? Or the number of volumes per bucket?

@chrislusf
Collaborator

volume count. not volume server count.

@divanikus
Contributor Author

I guess you mean that setting? I'm a little scared to increase it, because I only have 4 physical HDDs as of now, and concurrency is not the strongest suit of HDDs.

@LazyDBA247-Anyvision
Contributor

Another hack you can do (if you know the data is hot only for X days):
take my earlier suggestion of the path-specific setting, add a TTL on the SSD, but write to 2 collections, the HOT one (with the TTL) and the WARM one (HDD), and keep the access logic (which collection to read, based on the TTL) in the application code.
Apart from those two "simple" application changes, SeaweedFS will delete/clean your HOT collection for you... (see the sketch below)
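
A sketch of the two path-specific rules this hack implies (bucket names are examples, and the -ttl flag and its format here are assumptions -- check fs.configure's help in your version):

fs.configure -locationPrefix=/buckets/hot_ -disk=ssd -ttl=7d -apply
fs.configure -locationPrefix=/buckets/warm_ -disk=hdd -apply

The application then writes each object to both buckets, reads from hot_ while the object is younger than the TTL, and falls back to warm_ afterwards.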

@chrislusf
Collaborator

Added a feature to change disk type ssd <=> hdd when moving a volume. 770393a

Need to think of a command to make this easier.

@LazyDBA247-Anyvision
Contributor

@chrislusf will that happen after you "remove" the ssd configuration for the collection?

Because the collection configuration will still say 'ssd'...
Can we change the configuration of the collection and just use the balance command?

@chrislusf
Collaborator

A collection does not have a disk type, but the volumes do, depending on which folder they stay in.

@LazyDBA247-Anyvision
Contributor

So fs.configure is only used at volume creation?

@divanikus
Contributor Author

divanikus commented Feb 10, 2021

To my understanding, it would be cool to have the ability to label volume servers with user-supplied values, with volume placement based on that label. Moving to another tier would then just mean changing the volume's label (via volume.move, for example).

Another option I can think of is to label volume servers as different DCs and have rack-level replication move volumes from one DC to another (in the logical sense, of course).

@chrislusf
Collaborator

So fs.configure is only used at volume creation?

Correct. But the disk type of a volume would not change unless the admin decided to change it.

@chrislusf
Collaborator

To my understanding, it would be cool to have the ability to label volume servers with user-supplied values, with volume placement based on that label. Moving to another tier would then just mean changing the volume's label (via volume.move, for example).

btw: this is already implemented in the git master branch.

@divanikus
Contributor Author

btw: this is already implemented in the git master branch.

Could you please explain how to use it?

@chrislusf
Collaborator

@divanikus
Contributor Author

divanikus commented Feb 10, 2021

So, all I need is to add the -disk label to my volume servers, apply path-specific settings to the bucket, and later move volumes manually. By manually, I mean I have to find out which volumes are on which server and decide to which other volume server each should be moved, right? Sounds a little complicated, but not a deal breaker for sure.
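
Roughly, that manual flow would look like this in the weed shell (volume ids and addresses are placeholders; lock/unlock may not be required on every version, and the volume.move syntax matches the usage shown later in this thread):

lock
volume.list
volume.move -volumeId 4206 -source 192.168.65.63:8080 -target 192.168.65.57:8080 -disk hdd
unlock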

@chrislusf
Collaborator

Yes, this is exactly what @LazyDBA247-Anyvision mentioned before.

Any suggestions on how to design the tiering commands? How about this?

volume.tier -from=ssd -to=hdd -collection=xxx
volume.tier -from=ssd -to=hdd -collection=xxx -quietFor=1d
volume.tier -from=ssd -to=hdd -collection=xxx -fullPercent=95

Actually, will there be any need to change from hdd to ssd?

@divanikus
Contributor Author

divanikus commented Feb 10, 2021

I think it would be cool to support not just ssd and hdd, but any label the user supplies. So if someone would like to have more than just 2-3 tiers, they would be able to do that.

How about this?

Looks promising 👍

@LazyDBA247-Anyvision
Contributor

@divanikus you can tag with whatever you choose; the example uses hdd,ssd to be more understandable :)

@chrislusf
Collaborator

The current implementation only supports ssd and hdd. Flexible tags are possible; I am just not sure there is a real need for them.

@divanikus
Contributor Author

Well, I can think of several HDD tiers, for example: fast 15,000 rpm SAS drives for warm storage and plain 5,400 rpm SATA drives for "slightly colder" storage :)

@divanikus
Contributor Author

I see that at least parts of this feature are now in the current release. Is volume.tier also there? @chrislusf

@chrislusf
Collaborator

not yet

@divanikus
Contributor Author

So, OK, I have added two dedicated volume servers with SSDs and started them with the -disk=ssd flag. I've issued fs.configure and added these settings:

{
  "locations": [
    {
      "locationPrefix": "/buckets",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/mybucket",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/mybucket-",
      "diskType": "ssd"
    }
  ]
}

My buckets are named like mybucket-2021-02-19. But whatever I do, it keeps creating their volumes with the hdd flag and on the HDD volume servers! Am I missing something?

@divanikus
Contributor Author

So, another thing. I've created a bucket and used volume.move to move it to SSD manually. It worked; I can see its volumes on the SSD servers. But now I'm trying to pull those volumes back to an HDD server, and I get this:

> volume.move -volumeId 4206 -target 192.168.65.57:8080 -source 192.168.65.63:8080 -disk hdd
2021/02/19 00:41:29 copying volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080
error: copy volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080: rpc error: code = Unknown desc = no space left

@chrislusf
Collaborator

My buckets are named like mybucket-2021-02-19. But whatever I do, it keeps creating their volumes with the hdd flag and on the HDD volume servers! Am I missing something?

Added a fix 776f497 for this.

@chrislusf
Collaborator

volume.move -volumeId 4206 -target 192.168.65.57:8080 -source 192.168.65.63:8080 -disk hdd
2021/02/19 00:41:29 copying volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080
error: copy volume 4206 from 192.168.65.63:8080 to 192.168.65.57:8080: rpc error: code = Unknown desc = no space left

Need to see the output of volume.list first.

@divanikus
Contributor Author

Here it is.

@chrislusf
Collaborator

Here it is.

Please try the latest version.

@divanikus
Contributor Author

I've tried the latest on my test environment and it looks good. The bucket's volumes are placed on the correct volume server, and I can move a volume to and from SSD no matter on which server it was first created. Will try it in production soon.

@divanikus
Contributor Author

BTW, noticed that volume.tier.upload is mentioned twice in help:

  volume.tier.download                  # download the dat file of a volume from a remote tier
  volume.tier.upload                    # change a volume from one disk type to another
  volume.tier.upload                    # upload the dat file of a volume to a remote tier

@chrislusf
Collaborator

BTW, noticed that volume.tier.upload is mentioned twice in help

Ack. Added a fix b961cd6

chrislusf reopened this Feb 19, 2021
@divanikus
Contributor Author

One thing that I didn't notice on my test env, and which made me freak out a little on production, is that the volume list has disappeared from the master's view 😅
[screenshot: master UI with the volume list missing]

chrislusf added a commit that referenced this issue Feb 19, 2021
@chrislusf
Collaborator

chrislusf commented Feb 22, 2021

Is volume.tier also there?

volume.tier.move was added in 6a4546d

@divanikus
Contributor Author

Currently trying volume.tier.move with 2.28.
@chrislusf is this warning message OK?

moving volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080 with disk type hdd ...
markVolumeReadonly 4266 on 192.168.65.63:8080 ...
markVolumeReadonly 4266 on 192.168.65.62:8080 ...
2021/02/23 14:29:14 copying volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080
2021/02/23 14:29:14 tailing volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080
2021/02/23 14:29:26 deleting volume 4266 from 192.168.65.62:8080
2021/02/23 14:29:26 moved volume 4266 from 192.168.65.62:8080 to 192.168.65.58:8080
moving volume 4266 from 192.168.65.62:8080 to 192.168.65.59:8080 with disk type hdd ...
markVolumeReadonly 4266 on 192.168.65.63:8080 ...
tier move volume 4266: mark volume 4266 as readonly on 192.168.65.63:8080: rpc error: code = Unknown desc = volume 4266 not found

Also, I've found that nothing gets selected unless I add the -fullPercent=0.001 parameter.
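
For the record, the kind of invocation I mean looks like this (a sketch: only -fullPercent appears above; the other flag names are assumptions and may differ by version, so check volume.tier.move's help):

volume.tier.move -fromDiskType=ssd -toDiskType=hdd -collection=mybucket -quietFor=1h -fullPercent=0.001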

@chrislusf
Collaborator

Should be OK. Seems like a bug with replicated volumes.

@chrislusf
Collaborator

added a fix. Just need to update the weed shell.
