
HDDS-3755. Storage-class support for Ozone#1208

Closed
elek wants to merge 30 commits into apache:master from elek:storage-class

@elek
Member

@elek elek commented Jul 16, 2020

Created together with @maobaolong (thanks for the help)

What changes were proposed in this pull request?

This is the initial draft for storage-class support. It introduces the storage-class and uses it instead of replication factor / type.

Legacy clients are supported by converting the factor / type they send to a storage class.

Storage classes are hard-coded in the Java code (this can be changed later to be configurable).
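
As a rough illustration of the idea (not the classes from this patch), a storage class can be modeled as a name plus the replication settings for the open and the closed container state, looked up from a hard-coded registry. The class name `StorageClassSketch`, its fields, and the replica counts for `REDUCED_REDUNDANCY` and `LEGACY` are assumptions made for this sketch; only `STANDARD` = Ratis/THREE open, THREE closed is stated in the discussion.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of a storage class; not the code from this PR. */
final class StorageClassSketch {
  enum ReplicationType { RATIS, STAND_ALONE }

  final String name;
  final ReplicationType openType; // how writes are replicated while the container is open
  final int openReplicas;         // replica count in the open state
  final int closedReplicas;       // replica count expected once the container is closed

  StorageClassSketch(String name, ReplicationType openType,
      int openReplicas, int closedReplicas) {
    this.name = name;
    this.openType = openType;
    this.openReplicas = openReplicas;
    this.closedReplicas = closedReplicas;
  }

  // Hard-coded definitions, mirroring the "static registry" approach described above.
  private static final Map<String, StorageClassSketch> REGISTRY = buildRegistry();

  private static Map<String, StorageClassSketch> buildRegistry() {
    Map<String, StorageClassSketch> classes = new LinkedHashMap<>();
    add(classes, new StorageClassSketch("STANDARD", ReplicationType.RATIS, 3, 3));
    // The two entries below are assumed values, used only for illustration.
    add(classes, new StorageClassSketch("REDUCED_REDUNDANCY", ReplicationType.RATIS, 1, 1));
    add(classes, new StorageClassSketch("LEGACY", ReplicationType.STAND_ALONE, 1, 1));
    return Collections.unmodifiableMap(classes);
  }

  private static void add(Map<String, StorageClassSketch> classes, StorageClassSketch sc) {
    classes.put(sc.name, sc);
  }

  /** Resolves a name to a definition, rejecting names that are not registered. */
  static StorageClassSketch get(String name) {
    StorageClassSketch sc = REGISTRY.get(name);
    if (sc == null) {
      throw new IllegalArgumentException("Unknown storage class: " + name);
    }
    return sc;
  }
}
```

With this shape, callers ask for `StorageClassSketch.get("STANDARD")` and read the replication settings from the returned object instead of passing factor and type around separately.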

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3755

How was this patch tested?

With CI tests.

@elek elek marked this pull request as draft July 16, 2020 13:13
@elek
Member Author

elek commented Jul 16, 2020

Would you like to merge #920 to make it possible to accept any replication number specified by StorageClass, or rebase that work onto this storageClass POC? Both are OK for me.

I think the easiest approach is to change only ClosedStateConfiguration.replicationFactor to an integer and keep OpenStateConfiguration as is. OpenStateConfiguration can be trickier, but modifying the closed state seems to be safe.

@elek elek requested review from arp7, jnp and umamaheswararao July 16, 2020 13:19
@maobaolong
Member

@elek Do you think we should use the StorageClass class instead of a String for the storage class type inside a service, and use a string only to transfer storageClass from the client to OM? Would that make sense?

@elek elek marked this pull request as ready for review August 26, 2020 13:20
@elek
Member Author

elek commented Aug 26, 2020

Thanks for the help from @maobaolong, it's ready for review.

The key points of the patch:

  1. o3fs/ofs are not changed, only ozone sh.

Instead of:

ozone sh key put -r THREE -t RATIS ....

you can use:

ozone sh key put --storage-class STANDARD ...

  2. Storage classes are hard-coded in this version (see StaticStorageClassRegistry.java). There is no way to configure them or to introduce a new one.

  3. Should be backward compatible. If the storage class is not set but the old factor/type is sent from the client, it's converted to a storage class:

+    String storageClass = null;
+    if (info.hasStorageClass()) {
+      storageClass = info.getStorageClass();
+    }
+    if (StringUtils.isBlank(storageClass)) {
+      storageClass = StorageClassConverter.convert(
+          null, info.getReplicationFactor(),
+          info.getReplicationType()).getName();
+    }

  4. But under the hood, the storage class abstraction is used.

  5. SCM allocates blocks as before (BlockManagerImpl.java):

+    final ReplicationFactor factor =
+        storageClass.getOpenStateConfiguration().getReplicationFactor();
+
+    final ReplicationType type =
+        storageClass.getOpenStateConfiguration().getReplicationType();

Remaining logic is the same.

  6. ReplicationManager works as before, but the expected number of replicas is taken from the storage class:

+      StorageClass storageClass =
+          storageClassRegistry.getStorageClass(container.getStorageClass());

  7. ContainerStateManager is slightly modified. Until now containers were sorted by "owner"; now they are sorted by "owner" and "storageClass". The logic is the same: if there is a container with the same owner and storageClass, it can be returned.

  8. storageClass is stored and sent as a string (which makes it possible to use ANY storage class in the future), but inside the services type-safe objects are used.
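
The backward-compatibility step in the list above can be sketched in isolation as follows. This is a simplified stand-in for the conversion, not the `StorageClassConverter` from the patch: the class name `LegacyConversionSketch`, the `resolve` method, and in particular the factor/type to name mapping are assumptions chosen for illustration.

```java
/** Hypothetical sketch of the legacy fallback; the mapping below is an assumption. */
final class LegacyConversionSketch {
  enum ReplicationType { RATIS, STAND_ALONE }
  enum ReplicationFactor { ONE, THREE }

  /**
   * Returns the storage class name for a request. New clients send the name
   * directly; legacy clients send only factor/type, which is mapped to a name.
   */
  static String resolve(String storageClass, ReplicationFactor factor,
      ReplicationType type) {
    if (storageClass != null && !storageClass.trim().isEmpty()) {
      return storageClass; // new-style request: the name travels as a string
    }
    // Legacy request: derive the name from factor/type (assumed mapping).
    if (type == ReplicationType.RATIS && factor == ReplicationFactor.THREE) {
      return "STANDARD";
    }
    if (type == ReplicationType.RATIS && factor == ReplicationFactor.ONE) {
      return "REDUCED_REDUNDANCY";
    }
    if (type == ReplicationType.STAND_ALONE && factor == ReplicationFactor.ONE) {
      return "LEGACY";
    }
    throw new IllegalArgumentException(
        "No storage class for " + type + "/" + factor);
  }
}
```

The point of the design is that everything after this step only sees a storage class name, so the rest of the write path does not need to know whether the request came from an old or a new client.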

@linyiqun
Contributor

linyiqun commented Aug 26, 2020

The POC patch looks great; it lets us flexibly adjust the replica number / storage type of the storage. This is very similar to the HDFS storage policy design. Some high-level review comments from me:

  1. I see that most of the change focuses on converting replica factor/type to the new storage class. As far as I know, the Ozone GA version is not released yet, so can we update all places to the storage class way? This will not address backward compatibility.

  2. Will we have the chance to switch a storage class from its OpenStateConfiguration to its ClosedStateConfiguration? Will we support dynamically switching the storage class? I didn't see this switch logic in the current patch.

  3. ClosedStateConfiguration is only applied to closed-state containers, right? If yes, we should have automatic behavior (e.g. a background thread) that converts the container storage from its OpenStateConfiguration state to the ClosedStateConfiguration setting.

  4. It would be an improvement to support storage classes at the volume and bucket level in the future. Then key objects can inherit the storage class from them, and users don't need to pass the expected storage class every time.

@elek
Member Author

elek commented Aug 27, 2020

Thanks for the questions, @linyiqun

I see that most of the change focuses on converting replica factor/type to the new storage class. As far as I know, the Ozone GA version is not released yet, so can we update all places to the storage class way? This will not address backward compatibility.

I am not sure if I understood the question well, but let me try to answer:

I think this change is backward compatible, as the RPC (client!) interface accepts both requests: the old one (which contains factor/type and no storageClass) and the new one (which contains storageClass and no factor/type).

Will we have the chance to switch a storage class from its OpenStateConfiguration to its ClosedStateConfiguration? Will we support dynamically switching the storage class? I didn't see this switch logic in the current patch.

Dynamic switching is hard, because storageClass is a property of the containers. When you modify the storageClass of a bucket/key, the related data has to be moved out of one container into a new one.

Amazon doesn't enable the modification of storageClass and I suggested the same.

However, we can support changing the "definition" of the storage class. Currently, it's hard-coded, but it can easily be changed to a dynamic model. When you modify the closed-container replication number of a storage class (let's say the STANDARD closed state is THREE today but you would like to make it TWO), ReplicationManager can easily pick up the change.
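
To make the "pick up the change" argument concrete, here is a minimal sketch of the comparison a replication check could perform. It is not ReplicationManager code; `ReplicaCheckSketch`, `define`, and `replicaDelta` are hypothetical names, and the definition table is a plain in-memory map assumed for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: expected replicas are looked up per storage class on every check. */
final class ReplicaCheckSketch {
  // Assumed mutable definition table: storage class name -> closed-state replica count.
  private final Map<String, Integer> closedReplicas = new HashMap<>();

  /** Adds or changes the closed-state replica count of a storage class. */
  void define(String storageClass, int replicas) {
    closedReplicas.put(storageClass, replicas);
  }

  /**
   * Positive result: replicas that still have to be created.
   * Negative result: surplus replicas that can be removed.
   * Zero: the container already matches the definition.
   */
  int replicaDelta(String storageClass, int currentReplicas) {
    Integer expected = closedReplicas.get(storageClass);
    if (expected == null) {
      throw new IllegalArgumentException("Unknown storage class: " + storageClass);
    }
    return expected - currentReplicas;
  }
}
```

Because the expected count is read from the definition at check time rather than stored per container, redefining STANDARD from three to two closed replicas turns a healthy three-replica container into one with a delta of -1 on the next pass, without touching the container's own metadata.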

ClosedStateConfiguration is only applied to closed-state containers, right? If yes, we should have automatic behavior (e.g. a background thread) that converts the container storage from its OpenStateConfiguration state to the ClosedStateConfiguration setting.

Today the transition between the OPEN and CLOSED states is hard-coded. Later we can make it more dynamic (for example, disable the transition for some specific storageClass), but we are not there yet.

On the other hand, the configuration can be changed. Today STANDARD = Ratis/THREE -> Closed/THREE, but it can be made configurable (for example, to support Ratis/THREE -> Ratis/TWO).

I am not sure if this was the question; let me know if not.

It would be an improvement to support storage classes at the volume and bucket level in the future. Then key objects can inherit the storage class from them, and users don't need to pass the expected storage class every time.

Definitely, this should be one of the next steps.
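
The inheritance idea discussed here is not part of this patch. Purely as an illustration of what such a lookup could look like, the sketch below picks the most specific level that has a storage class set. The class `InheritedStorageClassSketch`, the `effective` method, and the lookup order (key, then bucket, then volume, then a cluster default) are all assumptions.

```java
/** Hypothetical sketch of storage class inheritance; not implemented in this PR. */
final class InheritedStorageClassSketch {

  /**
   * Returns the first non-blank storage class, checking the most specific
   * level first, and falls back to the cluster default if none is set.
   */
  static String effective(String keyLevel, String bucketLevel,
      String volumeLevel, String clusterDefault) {
    for (String candidate : new String[] {keyLevel, bucketLevel, volumeLevel}) {
      if (candidate != null && !candidate.trim().isEmpty()) {
        return candidate;
      }
    }
    return clusterDefault;
  }
}
```

For example, a key written without an explicit storage class into a bucket configured with REDUCED_REDUNDANCY would resolve to REDUCED_REDUNDANCY, while an explicit value on the key would still take precedence.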

@linyiqun
Contributor

Thanks @elek for the detailed comments.

However, we can support changing the "definition" of the storage class. Currently, it's hard-coded, but it can easily be changed to a dynamic model...

A dynamic model seems like a good way, so this storage class will be a user-defined storage class, right? Then we should persist the storage class info (factor/type) in the DB, and an admin user can update it via a CLI command.

But anyway, the current POC overall looks good to me.

@elek
Member Author

elek commented Aug 27, 2020

A dynamic model seems like a good way, so this storage class will be a user-defined storage class, right? Then we should persist the storage class info (factor/type) in the DB, and an admin user can update it via a CLI command.

As a first step I would be happy to make it configurable (instead of fixing the config at build time), so that an admin can update the config and restart the cluster. But yes, later it can become more and more dynamic (like persisting it and providing a CLI).

This patch is mainly about the framework. Once we have storage class as an abstraction level, we can add more and more configuration options: for example, the storage format on the datanode or erasure coding parameters. Or we can create an experimental, different write path without changing the existing code. The storage-class abstraction helps to separate the configuration from the user interface. Users can choose from understandable storage class names, and admins can (re)configure the details in the background.
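
A "configurable instead of build-time" first step could be as small as reading the definitions from a properties-style config at startup. The sketch below assumes a made-up key format, `storageclass.<NAME>.closed.replicas`; neither this key nor the class `StorageClassConfigSketch` exists in Ozone, they are placeholders for the idea.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch: load storage class definitions from config text at startup. */
final class StorageClassConfigSketch {
  // Assumed key format: storageclass.<NAME>.closed.replicas=<number>
  private static final String PREFIX = "storageclass.";
  private static final String SUFFIX = ".closed.replicas";

  /** Parses properties-style text into a map of storage class name to replica count. */
  static Map<String, Integer> parse(String configText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(configText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, Integer> closedReplicas = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      boolean matches = key.startsWith(PREFIX) && key.endsWith(SUFFIX)
          && key.length() > PREFIX.length() + SUFFIX.length();
      if (matches) {
        String name = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
        closedReplicas.put(name, Integer.parseInt(props.getProperty(key).trim()));
      }
    }
    return closedReplicas;
  }
}
```

With definitions loaded this way, users keep choosing a name such as STANDARD, while an admin changes what that name means by editing the config and restarting the service.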

@maobaolong
Member

I ran the following test; it shows that storageClass works as expected.

➜  $ bin/ozone sh key put myvol/mybucket/a.txt LICENSE.txt
➜  $ bin/ozone sh key put -sc STANDARD myvol/mybucket/b.txt LICENSE.txt
➜  $ bin/ozone sh key put -sc REDUCED_REDUNDANCY myvol/mybucket/c.txt LICENSE.txt
➜  $ bin/ozone sh key put -sc LEGACY myvol/mybucket/d.txt LICENSE.txt    
➜  $ bin/ozone admin pipeline list                                     
Pipeline[ Id: f1f7d0fd-86c2-4ce2-b5d9-ec86e45c0feb, Nodes: 88ee8bfd-242c-4ca0-8b59-586054371b60{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:OPEN, leaderId:88ee8bfd-242c-4ca0-8b59-586054371b60, CreationTimestamp2020-09-03T09:37:41.947Z]
Pipeline[ Id: 3e28c490-ef8a-46e7-80bd-bb55b07acce4, Nodes: 88ee8bfd-242c-4ca0-8b59-586054371b60{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:STAND_ALONE, Factor:ONE, State:OPEN, leaderId:, CreationTimestamp2020-09-03T09:46:28.380Z]
Pipeline[ Id: f4959f88-939d-4ff3-b209-e6751ab0d328, Nodes: bbceeabd-31ea-46be-a319-8d4cdf6dfd6a{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:OPEN, leaderId:bbceeabd-31ea-46be-a319-8d4cdf6dfd6a, CreationTimestamp2020-09-03T09:37:27.890Z]
Pipeline[ Id: 69ab4f14-bd85-4e20-a833-398906c3af2b, Nodes: 03f94c8a-3b26-4675-b5e9-6d7a69991476{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:OPEN, leaderId:03f94c8a-3b26-4675-b5e9-6d7a69991476, CreationTimestamp2020-09-03T09:37:20.101Z]
Pipeline[ Id: c54ed915-c6ea-4783-84d6-95ce450a79bb, Nodes: 88ee8bfd-242c-4ca0-8b59-586054371b60{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}bbceeabd-31ea-46be-a319-8d4cdf6dfd6a{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}03f94c8a-3b26-4675-b5e9-6d7a69991476{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN, leaderId:88ee8bfd-242c-4ca0-8b59-586054371b60, CreationTimestamp2020-09-03T09:37:41.959Z]
➜  $ bin/ozone sh key info  myvol/mybucket/a.txt
{
  "volumeName" : "myvol",
  "bucketName" : "mybucket",
  "name" : "a.txt",
  "dataSize" : 18062,
  "creationTime" : "2020-09-03T09:44:43.594Z",
  "modificationTime" : "2020-09-03T09:44:47.892Z",
  "storageClass" : "STANDARD",
  "ozoneKeyLocations" : [ {
    "containerID" : 1,
    "localID" : 104800340118536192,
    "length" : 18062,
    "offset" : 0
  } ],
  "metadata" : { },
  "fileEncryptionInfo" : null
}
➜  $ bin/ozone sh key info  myvol/mybucket/b.txt
{
  "volumeName" : "myvol",
  "bucketName" : "mybucket",
  "name" : "b.txt",
  "dataSize" : 18062,
  "creationTime" : "2020-09-03T09:45:29.015Z",
  "modificationTime" : "2020-09-03T09:45:30.865Z",
  "storageClass" : "STANDARD",
  "ozoneKeyLocations" : [ {
    "containerID" : 2,
    "localID" : 104800343098327041,
    "length" : 18062,
    "offset" : 0
  } ],
  "metadata" : { },
  "fileEncryptionInfo" : null
}
➜  $ bin/ozone sh key info  myvol/mybucket/c.txt
{
  "volumeName" : "myvol",
  "bucketName" : "mybucket",
  "name" : "c.txt",
  "dataSize" : 18062,
  "creationTime" : "2020-09-03T09:46:04.634Z",
  "modificationTime" : "2020-09-03T09:46:06.875Z",
  "storageClass" : "REDUCED_REDUNDANCY",
  "ozoneKeyLocations" : [ {
    "containerID" : 3,
    "localID" : 104800345432588290,
    "length" : 18062,
    "offset" : 0
  } ],
  "metadata" : { },
  "fileEncryptionInfo" : null
}
➜  $ bin/ozone sh key info  myvol/mybucket/d.txt
{
  "volumeName" : "myvol",
  "bucketName" : "mybucket",
  "name" : "d.txt",
  "dataSize" : 18062,
  "creationTime" : "2020-09-03T09:46:28.389Z",
  "modificationTime" : "2020-09-03T09:46:29.145Z",
  "storageClass" : "LEGACY",
  "ozoneKeyLocations" : [ {
    "containerID" : 4,
    "localID" : 104800346989461507,
    "length" : 18062,
    "offset" : 0
  } ],
  "metadata" : { },
  "fileEncryptionInfo" : null
}
➜  $ ll /tmp/datanode*/storage/hdds/e81edfd6-5895-408e-94f1-84a39d6995e3/current/containerDir*/1/chunks/104800340118536192.block | wc -l
       3
➜  $ ll /tmp/datanode*/storage/hdds/e81edfd6-5895-408e-94f1-84a39d6995e3/current/containerDir*/2/chunks/104800343098327041.block | wc -l
       3
➜  $ ll /tmp/datanode*/storage/hdds/e81edfd6-5895-408e-94f1-84a39d6995e3/current/containerDir*/3/chunks/104800345432588290.block | wc -l
       1
➜  $ ll /tmp/datanode*/storage/hdds/e81edfd6-5895-408e-94f1-84a39d6995e3/current/containerDir*/4/chunks/104800346989461507.block | wc -l
       1

@elek
Member Author

elek commented Sep 7, 2020

@arp7 @umamaheswararao do you have any more comments / questions about the concept and / or about the patch?

@elek
Member Author

elek commented Nov 30, 2020

Closing it temporarily. We need #1419 to be accepted. When it's accepted, this branch can be posted in smaller chunks (proto changes + SCM changes + ...)

@elek elek closed this Nov 30, 2020