
HDFS-15025. Applying NVDIMM storage media to HDFS #2189

Merged · 12 commits · Sep 24, 2020
Original file line number Diff line number Diff line change
@@ -34,28 +34,35 @@
@InterfaceStability.Unstable
public enum StorageType {
// sorted by the speed of the storage types, from fast to slow
RAM_DISK(true),
SSD(false),
DISK(false),
ARCHIVE(false),
PROVIDED(false);
RAM_DISK(true, true),
NVDIMM(false, true),
SSD(false, false),
DISK(false, false),
ARCHIVE(false, false),
PROVIDED(false, false);

private final boolean isTransient;
private final boolean isRAM;

public static final StorageType DEFAULT = DISK;

public static final StorageType[] EMPTY_ARRAY = {};

private static final StorageType[] VALUES = values();

StorageType(boolean isTransient) {
StorageType(boolean isTransient, boolean isRAM) {
this.isTransient = isTransient;
this.isRAM = isRAM;
}

public boolean isTransient() {
return isTransient;
}

public boolean isRAM() {
return isRAM;
}

public boolean supportTypeQuota() {
return !isTransient;
}
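The two-flag constructor above separates volatility (`isTransient`) from medium (`isRAM`): NVDIMM is RAM-backed yet persistent, so unlike RAM_DISK it still supports per-type quotas. A minimal standalone sketch of that interaction (a hypothetical demo class, not the Hadoop enum itself):

```java
// Simplified stand-in for org.apache.hadoop.fs.StorageType showing how the
// two flags combine. NVDIMM: RAM-backed (fast) but non-transient (durable).
public class StorageTypeDemo {
  enum StorageTypeSketch {
    RAM_DISK(true, true),
    NVDIMM(false, true),
    SSD(false, false);

    private final boolean isTransient;
    private final boolean isRAM;

    StorageTypeSketch(boolean isTransient, boolean isRAM) {
      this.isTransient = isTransient;
      this.isRAM = isRAM;
    }

    boolean isRAM() { return isRAM; }

    // Quota only makes sense for storage that survives restarts.
    boolean supportTypeQuota() { return !isTransient; }
  }

  public static void main(String[] args) {
    System.out.println(StorageTypeSketch.NVDIMM.isRAM());              // true
    System.out.println(StorageTypeSketch.NVDIMM.supportTypeQuota());   // true
    System.out.println(StorageTypeSketch.RAM_DISK.supportTypeQuota()); // false
  }
}
```

Collapsing both flags into `isTransient` alone would not work here, which is why the PR introduces the second flag rather than overloading the first.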
@@ -89,7 +89,7 @@ public static void registerCommands(CommandFactory factory) {
"Otherwise, it displays the quota and usage for all the storage \n" +
"types that support quota. The list of possible storage " +
"types(case insensitive):\n" +
"ram_disk, ssd, disk and archive.\n" +
"ram_disk, ssd, disk, archive and nvdimm.\n" +
"It can also pass the value '', 'all' or 'ALL' to specify all " +
"the storage types.\n" +
"The -" + OPTION_QUOTA_AND_USAGE + " option shows the quota and \n" +
@@ -283,6 +283,7 @@ public void processPathWithQuotasByStorageTypesHeader() throws Exception {
count.processOptions(options);
String withStorageTypeHeader =
// <----13---> <-------17------> <----13-----> <------17------->
" NVDIMM_QUOTA REM_NVDIMM_QUOTA " +
" SSD_QUOTA REM_SSD_QUOTA DISK_QUOTA REM_DISK_QUOTA " +
// <----13---> <-------17------>
"ARCHIVE_QUOTA REM_ARCHIVE_QUOTA PROVIDED_QUOTA REM_PROVIDED_QUOTA " +
@@ -337,6 +338,7 @@ public void processPathWithQuotasByQTVH() throws Exception {
count.processOptions(options);
String withStorageTypeHeader =
// <----13---> <-------17------>
" NVDIMM_QUOTA REM_NVDIMM_QUOTA " +
" SSD_QUOTA REM_SSD_QUOTA " +
" DISK_QUOTA REM_DISK_QUOTA " +
"ARCHIVE_QUOTA REM_ARCHIVE_QUOTA " +
@@ -495,7 +497,7 @@ public void getDescription() {
+ "Otherwise, it displays the quota and usage for all the storage \n"
+ "types that support quota. The list of possible storage "
+ "types(case insensitive):\n"
+ "ram_disk, ssd, disk and archive.\n"
+ "ram_disk, ssd, disk, archive and nvdimm.\n"
+ "It can also pass the value '', 'all' or 'ALL' to specify all the "
+ "storage types.\n"
+ "The -u option shows the quota and \n"
@@ -38,6 +38,8 @@ public final class HdfsConstants {

public static final byte MEMORY_STORAGE_POLICY_ID = 15;
public static final String MEMORY_STORAGE_POLICY_NAME = "LAZY_PERSIST";
public static final byte ALLNVDIMM_STORAGE_POLICY_ID = 14;
public static final String ALLNVDIMM_STORAGE_POLICY_NAME = "ALL_NVDIMM";
public static final byte ALLSSD_STORAGE_POLICY_ID = 12;
public static final String ALLSSD_STORAGE_POLICY_NAME = "ALL_SSD";
public static final byte ONESSD_STORAGE_POLICY_ID = 10;
@@ -65,6 +67,7 @@ public enum StoragePolicy{
HOT(HOT_STORAGE_POLICY_ID),
ONE_SSD(ONESSD_STORAGE_POLICY_ID),
ALL_SSD(ALLSSD_STORAGE_POLICY_ID),
ALL_NVDIMM(ALLNVDIMM_STORAGE_POLICY_ID),
LAZY_PERSIST(MEMORY_STORAGE_POLICY_ID);

private byte value;
@@ -86,6 +89,8 @@ public static StoragePolicy valueOf(int value) {
return ONE_SSD;
case 12:
return ALL_SSD;
case 14:
return ALL_NVDIMM;
case 15:
return LAZY_PERSIST;
default:
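The extended `valueOf` switch maps a raw policy id (as stored in the fsimage) back to its enum constant. A table-driven sketch of the same lookup, using simplified stand-ins with the ids assumed from the constants above (7, 10, 12, 14, 15):

```java
// Simplified stand-in for HdfsConstants.StoragePolicy: id -> policy lookup
// done by scanning values() instead of a hand-maintained switch.
public class PolicyLookupDemo {
  enum PolicySketch {
    HOT(7), ONE_SSD(10), ALL_SSD(12), ALL_NVDIMM(14), LAZY_PERSIST(15);

    private final byte id;

    PolicySketch(int id) { this.id = (byte) id; }

    static PolicySketch fromId(int id) {
      for (PolicySketch p : values()) {
        if (p.id == id) {
          return p;
        }
      }
      throw new IllegalArgumentException("invalid policy id: " + id);
    }
  }

  public static void main(String[] args) {
    System.out.println(PolicySketch.fromId(14)); // ALL_NVDIMM
    System.out.println(PolicySketch.fromId(15)); // LAZY_PERSIST
  }
}
```

A scan over `values()` cannot fall out of sync with the enum the way a manual `case 14:` can, at the cost of not being O(1); the real class keeps the switch, presumably for the explicit fall-through to the error case.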
@@ -475,6 +475,8 @@ public static StorageTypeProto convertStorageType(StorageType type) {
return StorageTypeProto.RAM_DISK;
case PROVIDED:
return StorageTypeProto.PROVIDED;
case NVDIMM:
return StorageTypeProto.NVDIMM;
default:
throw new IllegalStateException(
"BUG: StorageType not found, type=" + type);
@@ -493,6 +495,8 @@ public static StorageType convertStorageType(StorageTypeProto type) {
return StorageType.RAM_DISK;
case PROVIDED:
return StorageType.PROVIDED;
case NVDIMM:
return StorageType.NVDIMM;
default:
throw new IllegalStateException(
"BUG: StorageTypeProto not found, type=" + type);
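The two converters are intended to be inverses; adding NVDIMM to only one of them would silently break the round trip between the Java enum and its protobuf counterpart. A self-contained sketch of that property, using simplified stand-in enums rather than the real Hadoop and protobuf classes:

```java
// Stand-ins for StorageType and StorageTypeProto. Because the constant names
// match, conversion can go through Enum.valueOf in both directions, and the
// round trip can be checked exhaustively over values().
public class RoundTripDemo {
  enum Type { DISK, SSD, ARCHIVE, RAM_DISK, PROVIDED, NVDIMM }
  enum TypeProto { DISK, SSD, ARCHIVE, RAM_DISK, PROVIDED, NVDIMM }

  static TypeProto toProto(Type t) { return TypeProto.valueOf(t.name()); }
  static Type fromProto(TypeProto p) { return Type.valueOf(p.name()); }

  public static void main(String[] args) {
    for (Type t : Type.values()) {
      if (fromProto(toProto(t)) != t) {
        throw new AssertionError("round trip failed for " + t);
      }
    }
    System.out.println("round trip ok for all " + Type.values().length + " types");
  }
}
```

The real `PBHelperClient` uses explicit switches instead, because the proto enum carries fixed field numbers (NVDIMM = 6 below) that must stay stable on the wire regardless of declaration order.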
@@ -221,6 +221,7 @@ enum StorageTypeProto {
ARCHIVE = 3;
RAM_DISK = 4;
PROVIDED = 5;
NVDIMM = 6;
}

/**
@@ -63,6 +63,12 @@ public static BlockStoragePolicySuite createDefaultSuite(
new StorageType[]{StorageType.DISK},
new StorageType[]{StorageType.DISK},
true); // Cannot be changed on regular files, but inherited.
final byte allnvdimmId = HdfsConstants.StoragePolicy.ALL_NVDIMM.value();
policies[allnvdimmId] = new BlockStoragePolicy(allnvdimmId,
HdfsConstants.StoragePolicy.ALL_NVDIMM.name(),
new StorageType[]{StorageType.NVDIMM},
new StorageType[]{StorageType.DISK},
new StorageType[]{StorageType.DISK});
final byte allssdId = HdfsConstants.StoragePolicy.ALL_SSD.value();
policies[allssdId] = new BlockStoragePolicy(allssdId,
HdfsConstants.StoragePolicy.ALL_SSD.name(),
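The new ALL_NVDIMM entry places every replica on NVDIMM, falling back to DISK both at creation time and during re-replication. A simplified sketch (not the real `BlockStoragePolicy` logic) of how such a policy expands into per-replica storage types:

```java
// Hypothetical expansion of a storage policy row: all n replicas go to the
// preferred type; if that medium is unavailable, the fallback type is used.
public class PlacementDemo {
  static String[] choose(String preferred, String fallback,
                         int replication, boolean preferredAvailable) {
    String[] out = new String[replication];
    for (int i = 0; i < replication; i++) {
      out[i] = preferredAvailable ? preferred : fallback;
    }
    return out;
  }

  public static void main(String[] args) {
    // ALL_NVDIMM with replication 3:
    System.out.println(String.join(",", choose("NVDIMM", "DISK", 3, true)));
    System.out.println(String.join(",", choose("NVDIMM", "DISK", 3, false)));
  }
}
```

The real method additionally handles per-replica (mixed) placement for policies like One_SSD; this sketch only covers the all-or-fallback shape that ALL_NVDIMM and ALL_SSD share.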
@@ -77,6 +77,9 @@ public interface FsVolumeSpi
/** Returns true if the volume is NOT backed by persistent storage. */
boolean isTransientStorage();

/** Returns true if the volume is backed by RAM storage. */
boolean isRAMStorage();

/**
* Reserve disk space for a block (RBW or Re-replicating)
* so a writer does not run out of space before the block is full.
@@ -2299,9 +2299,9 @@ private void cacheBlock(String bpid, long blockId) {
": volume was not an instance of FsVolumeImpl.");
return;
}
if (volume.isTransientStorage()) {
LOG.warn("Caching not supported on block with id " + blockId +
" since the volume is backed by RAM.");
if (volume.isRAMStorage()) {
LOG.warn("Caching not supported on block with id {} since the " +
"volume is backed by {} which is RAM.", blockId, volume);
return;
}
success = true;
@@ -193,7 +193,7 @@ public class FsVolumeImpl implements FsVolumeSpi {
}

protected ThreadPoolExecutor initializeCacheExecutor(File parent) {
if (storageType.isTransient()) {
if (storageType.isRAM()) {
return null;
}
if (dataset.datanode == null) {
@@ -533,6 +533,11 @@ public boolean isTransientStorage() {
return storageType.isTransient();
}

@Override
public boolean isRAMStorage() {
return storageType.isRAM();
}

@VisibleForTesting
public File getFinalizedDir(String bpid) throws IOException {
return getBlockPoolSlice(bpid).getFinalizedDir();
@@ -239,7 +239,8 @@ private static class ClearSpaceQuotaCommand extends DFSAdminCommand {
"\t\t- DISK\n" +
"\t\t- SSD\n" +
"\t\t- ARCHIVE\n" +
"\t\t- PROVIDED";
"\t\t- PROVIDED\n" +
"\t\t- NVDIMM";


private StorageType type;
@@ -303,7 +304,8 @@ private static class SetSpaceQuotaCommand extends DFSAdminCommand {
"\t\t- DISK\n" +
"\t\t- SSD\n" +
"\t\t- ARCHIVE\n" +
"\t\t- PROVIDED";
"\t\t- PROVIDED\n" +
"\t\t- NVDIMM";

private long quota; // the quota to be set
private StorageType type;
@@ -377,7 +377,7 @@
<value>0</value>
<description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
Specific storage type based reservation is also supported. The property can be followed with
corresponding storage types ([ssd]/[disk]/[archive]/[ram_disk]) for cluster with heterogeneous storage.
corresponding storage types ([ssd]/[disk]/[archive]/[ram_disk]/[nvdimm]) for cluster with heterogeneous storage.
For example, reserved space for RAM_DISK storage can be configured using property
'dfs.datanode.du.reserved.ram_disk'. If specific storage type reservation is not configured
then dfs.datanode.du.reserved will be used. Support multiple size unit suffix(case insensitive),
@@ -395,7 +395,7 @@
when this takes effect. The actual number of bytes reserved will be calculated by using the
total capacity of the data directory in question. Specific storage type based reservation
is also supported. The property can be followed with corresponding storage types
([ssd]/[disk]/[archive]/[ram_disk]) for cluster with heterogeneous storage.
([ssd]/[disk]/[archive]/[ram_disk]/[nvdimm]) for cluster with heterogeneous storage.
For example, reserved percentage space for RAM_DISK storage can be configured using property
'dfs.datanode.du.reserved.pct.ram_disk'. If specific storage type reservation is not configured
then dfs.datanode.du.reserved.pct will be used.
@@ -604,7 +604,7 @@
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices. The directories should be tagged
with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]/[NVDIMM]) for HDFS
storage policies. The default storage type will be DISK if the directory does
not have a storage type tagged explicitly. Directories that do not exist will
be created if local filesystem permission allows.
@@ -27,15 +27,17 @@ The frameworks provided by Heterogeneous Storage and Archival Storage generalize
Storage Types and Storage Policies
----------------------------------

### Storage Types: ARCHIVE, DISK, SSD and RAM\_DISK
### Storage Types: ARCHIVE, DISK, SSD, RAM\_DISK and NVDIMM

The first phase of [Heterogeneous Storage (HDFS-2832)](https://issues.apache.org/jira/browse/HDFS-2832) changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to a physical storage medium. It also added the notion of storage types, DISK and SSD, where DISK is the default storage type.

A new storage type *ARCHIVE*, which has high storage density (petabyte of storage) but little compute power, is added for supporting archival storage.

Another new storage type *RAM\_DISK* is added for supporting writing single replica files in memory.

### Storage Policies: Hot, Warm, Cold, All\_SSD, One\_SSD, Lazy\_Persist and Provided
From Hadoop 3.4, a new storage type *NVDIMM* is added for supporting writing replica files in non-volatile memory, which retains saved data even when the power is turned off.

### Storage Policies: Hot, Warm, Cold, All\_SSD, One\_SSD, Lazy\_Persist, Provided and All\_NVDIMM

A new concept of storage policies is introduced in order to allow files to be stored in different storage types according to the storage policy.

@@ -48,6 +50,7 @@ We have the following storage policies:
* **One\_SSD** - for storing one of the replicas in SSD. The remaining replicas are stored in DISK.
* **Lazy\_Persist** - for writing blocks with single replica in memory. The replica is first written in RAM\_DISK and then it is lazily persisted in DISK.
* **Provided** - for storing data outside HDFS. See also [HDFS Provided Storage](./HdfsProvidedStorage.html).
* **All\_NVDIMM** - for storing all replicas in NVDIMM.

More formally, a storage policy consists of the following fields:

@@ -64,6 +67,7 @@ The following is a typical storage policy table.
| **Policy** **ID** | **Policy** **Name** | **Block Placement** **(n  replicas)** | **Fallback storages** **for creation** | **Fallback storages** **for replication** |
|:---- |:---- |:---- |:---- |:---- |
| 15 | Lazy\_Persist | RAM\_DISK: 1, DISK: *n*-1 | DISK | DISK |
| 14 | All\_NVDIMM | NVDIMM: *n* | DISK | DISK |
| 12 | All\_SSD | SSD: *n* | DISK | DISK |
| 10 | One\_SSD | SSD: 1, DISK: *n*-1 | SSD, DISK | SSD, DISK |
| 7 | Hot (default) | DISK: *n* | \<none\> | ARCHIVE |
@@ -73,7 +77,7 @@ The following is a typical storage policy table.

Note 1: The Lazy\_Persist policy is useful only for single-replica blocks. For blocks with more than one replica, all the replicas will be written to DISK since writing only one of the replicas to RAM\_DISK does not improve the overall performance.

Note 2: For the erasure coded files with striping layout, the suitable storage policies are All\_SSD, Hot, Cold. So, if user sets the policy for striped EC files other than the mentioned policies, it will not follow that policy while creating or moving block.
Note 2: For erasure-coded files with the striping layout, the suitable storage policies are All\_SSD, Hot, Cold and All\_NVDIMM. So, if a user sets any other policy on striped EC files, that policy will not be followed when creating or moving blocks.

### Storage Policy Resolution

@@ -88,13 +92,14 @@ The effective storage policy can be retrieved by the "[`storagepolicies -getStor
### Configuration

* **dfs.storage.policy.enabled** - for enabling/disabling the storage policy feature. The default value is `true`.
* **dfs.storage.default.policy** - Set the default storage policy with the policy name. The default value is `HOT`. All possible policies are defined in enum StoragePolicy, including `LAZY_PERSIST` `ALL_SSD` `ONE_SSD` `HOT` `WARM` `COLD` and `PROVIDED`.
* **dfs.storage.default.policy** - Set the default storage policy with the policy name. The default value is `HOT`. All possible policies are defined in enum StoragePolicy, including `LAZY_PERSIST` `ALL_SSD` `ONE_SSD` `HOT` `WARM` `COLD` `PROVIDED` and `ALL_NVDIMM`.
* **dfs.datanode.data.dir** - on each data node, the comma-separated storage locations should be tagged with their storage types. This allows storage policies to place the blocks on different storage types according to policy. For example:

1. A datanode storage location /grid/dn/disk0 on DISK should be configured with `[DISK]file:///grid/dn/disk0`
2. A datanode storage location /grid/dn/ssd0 on SSD should be configured with `[SSD]file:///grid/dn/ssd0`
3. A datanode storage location /grid/dn/archive0 on ARCHIVE should be configured with `[ARCHIVE]file:///grid/dn/archive0`
4. A datanode storage location /grid/dn/ram0 on RAM_DISK should be configured with `[RAM_DISK]file:///grid/dn/ram0`
5. A datanode storage location /grid/dn/nvdimm0 on NVDIMM should be configured with `[NVDIMM]file:///grid/dn/nvdimm0`

The default storage type of a datanode storage location will be DISK if it does not have a storage type tagged explicitly.
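The tagging convention in the list above can be parsed mechanically. A hypothetical helper (not Hadoop's actual configuration parser) illustrating the `[TYPE]uri` format, with untagged locations defaulting to DISK as the preceding sentence describes:

```java
// Hypothetical parser for datanode storage location strings such as
// "[NVDIMM]file:///grid/dn/nvdimm0". Returns {storageType, uri}.
public class TaggedLocationDemo {
  static String[] parse(String location) {
    if (location.startsWith("[")) {
      int close = location.indexOf(']');
      String type = location.substring(1, close).toUpperCase();
      return new String[]{ type, location.substring(close + 1) };
    }
    // No tag: the default storage type is DISK.
    return new String[]{ "DISK", location };
  }

  public static void main(String[] args) {
    String[] parsed = parse("[NVDIMM]file:///grid/dn/nvdimm0");
    System.out.println(parsed[0] + " -> " + parsed[1]);
    System.out.println(parse("file:///grid/dn/disk0")[0]);
  }
}
```

This sketch omits validation that the tag names a known storage type; the real code rejects unknown tags rather than passing them through.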

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ Quotas are persistent with the fsimage. When starting, if the fsimage is immedia
Storage Type Quotas
------------------

The storage type quota is a hard limit on the usage of specific storage type (SSD, DISK, ARCHIVE) by files in the tree rooted at the directory. It works similar to storage space quota in many aspects but offers fine-grain control over the cluster storage space usage. To set storage type quota on a directory, storage policies must be configured on the directory in order to allow files to be stored in different storage types according to the storage policy. See the [HDFS Storage Policy Documentation](./ArchivalStorage.html) for more information.
The storage type quota is a hard limit on the usage of a specific storage type (SSD, DISK, ARCHIVE, NVDIMM) by files in the tree rooted at the directory. It works similarly to the storage space quota in many aspects but offers fine-grained control over cluster storage usage. To set a storage type quota on a directory, storage policies must be configured on the directory in order to allow files to be stored in different storage types according to the storage policy. See the [HDFS Storage Policy Documentation](./ArchivalStorage.html) for more information.

The storage type quota can be combined with the space quotas and name quotas to efficiently manage the cluster storage usage. For example,

@@ -96,15 +96,15 @@ Quotas are managed by a set of commands available only to the administrator.
integer, the directory does not exist or it is a file, or the
directory would immediately exceed the new quota. The storage type
specific quota is set when -storageType option is specified. Available
storageTypes are DISK,SSD,ARCHIVE,PROVIDED.
storageTypes are DISK,SSD,ARCHIVE,PROVIDED,NVDIMM.

* `hdfs dfsadmin -clrSpaceQuota -storageType <storagetype> <directory>...<directory>`

Remove storage type quota specified for each directory. Best effort
for each directory, with faults reported if the directory does not exist or
it is a file. It is not a fault if the directory has no storage type quota
set for the specified storage type. The storage type specific quota is cleared when the -storageType
option is specified. Available storageTypes are DISK,SSD,ARCHIVE,PROVIDED.
option is specified. Available storageTypes are DISK,SSD,ARCHIVE,PROVIDED,NVDIMM.

Reporting Command
-----------------
8 changes: 8 additions & 0 deletions hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md
@@ -1163,6 +1163,14 @@ Storage Policy Operations
"replicationFallbacks": ["DISK"],
"storageTypes": ["SSD"]
},
{
"copyOnCreateFile": false,
"creationFallbacks": ["DISK"],
"id": 14,
"name": "ALL_NVDIMM",
"replicationFallbacks": ["DISK"],
"storageTypes": ["NVDIMM"]
},
{
"copyOnCreateFile": true,
"creationFallbacks": ["DISK"],