Tiered storage #3138

Closed
HeimingZ wants to merge 25 commits into apache:master from HeimingZ:tiered_storage

HeimingZ commented May 7, 2021

Description

This PR adds tiered storage support to IoTDB. Users can configure local and HDFS directories as data_dirs at the same time and group directories into tiers, so that a tsfile can migrate to the next tier once it meets certain conditions.

hadoop

Completes IoTDB's support for HDFS.

  • Implements Path, FileSystem and FileSystemProvider interfaces in java.nio.file.
  • The implementation of the HDFSPath class is modeled on sun.nio.fs.UnixPath.
  • Adds move and copy methods in HDFSFile.
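
As a rough illustration of why implementing the java.nio.file interfaces matters: code written against Path and Files is routed through whichever FileSystemProvider owns the path. The sketch below uses only the standard library on the local default filesystem; it does not use HDFSFileSystemProvider from this PR, and whether HDFSFile's move and copy delegate to these calls is an assumption. The class name NioMoveCopySketch and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical demo class; not part of the PR. */
final class NioMoveCopySketch {
  /** Copies src to a sibling named copyName, then moves that copy to movedName. */
  static Path copyThenMove(Path src, String copyName, String movedName) {
    try {
      Path copy = src.resolveSibling(copyName);
      // Both calls are dispatched to the FileSystemProvider that owns the paths.
      Files.copy(src, copy, StandardCopyOption.REPLACE_EXISTING);
      return Files.move(copy, src.resolveSibling(movedName), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Runs the sketch in a temporary local directory and returns the moved file's content. */
  static String runDemo() {
    try {
      Path dir = Files.createTempDirectory("nio-sketch");
      Path src = dir.resolve("a.tsfile");
      Files.write(src, "data".getBytes(StandardCharsets.UTF_8));
      Path moved = copyThenMove(src, "a.copy", "b.tsfile");
      return new String(Files.readAllBytes(moved), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```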

tsfile

Supports both the local and HDFS filesystems at the same time.

  • The type of tsfile_storage_fs is changed from FSType to FSType[].
  • The factory method of the FSFactoryProducer class must now accept an FSType parameter, because IoTDB can support several filesystems at the same time.
  • The FSPath class wraps a filesystem type and a path value. The format looks like LOCAL@data/data.
    • Two parse methods: parse a String or a File into an FSPath.
    • toFile and toPath methods: get the File or Path object of this FSPath.
    • preConcate and postConcat methods: concatenate a string at the beginning or end of this path.
  • The FSUtils class helps determine the corresponding FSType of a File, Path or String object.
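
The FSPath format described above can be sketched as follows. This is a minimal illustration, not the PR's code: SketchFSType and SketchFSPath are hypothetical names, treating a string without '@' as LOCAL is an assumption, and only the String parse variant is shown.

```java
import java.io.File;
import java.util.Objects;

/** Hypothetical stand-in for the PR's FSType enum. */
enum SketchFSType { LOCAL, HDFS }

/** Hypothetical FSPath-like value: a filesystem tag plus a raw path string. */
final class SketchFSPath {
  private final SketchFSType fsType;
  private final String path;

  SketchFSPath(SketchFSType fsType, String path) {
    this.fsType = Objects.requireNonNull(fsType);
    this.path = Objects.requireNonNull(path);
  }

  /** Parses "LOCAL@data/data"; a string without '@' is assumed (not confirmed) to be LOCAL. */
  static SketchFSPath parse(String raw) {
    int at = raw.indexOf('@');
    if (at < 0) {
      return new SketchFSPath(SketchFSType.LOCAL, raw);
    }
    SketchFSType type = SketchFSType.valueOf(raw.substring(0, at));
    return new SketchFSPath(type, raw.substring(at + 1));
  }

  SketchFSType getFsType() { return fsType; }

  String getPath() { return path; }

  /** Returns a new value with prefix prepended to the path part only. */
  SketchFSPath preConcat(String prefix) { return new SketchFSPath(fsType, prefix + path); }

  /** Returns a new value with suffix appended to the path part only. */
  SketchFSPath postConcat(String suffix) { return new SketchFSPath(fsType, path + suffix); }

  /** Only meaningful for LOCAL paths in this sketch. */
  File toFile() { return new File(path); }

  @Override
  public String toString() { return fsType + "@" + path; }
}
```

For example, `SketchFSPath.parse("LOCAL@data/data").postConcat("/sequence")` prints as `LOCAL@data/data/sequence`; the filesystem tag is untouched by either concat method.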

tier

Maintains the relationship between storage tiers and data directories.

  • The type of tsfile_storage_fs is changed from FSType to FSType[].
  • The type of data_dirs is changed from String[] to FSPath[][].
  • The type of multi_dir_strategy is changed from String to String[].
  • Adds the default_tier_migration_strategy param, which specifies the default migration trigger strategy of each tier.
  • Adds TIER_MIGRATION_CHECK_INTERVAL in StorageEngine, which specifies the time interval between tier migration checks.
  • The TierManager class maintains data_dirs, multi_dir_strategy and default_tier_migration_strategy, all of which can be hot-modified.
    • Methods to hot-modify data_dirs, multi_dir_strategy and default_tier_migration_strategy.
    • getAllSequenceFileFolders, getNextFolderForSequenceFile...: wrap the same-named methods of DirectoryManager; the methods are called in ascending order of tierLevel.
    • getTierLevel(): gets the tierLevel of a tsfile.
  • Provides two migration strategies: PinnedStrategy and Time2LiveStrategy.
    • PinnedStrategy: the tsfile is pinned in this tier and will not be migrated.
    • Time2LiveStrategy: accepts one parameter, ttl, in ms; a tsfile will be migrated when timeIndex.stillLives(System.currentTimeMillis() - ttl) returns false.
  • StorageEngine, VirtualStorageGroupManager and StorageGroupProcessor are responsible for checking when to migrate tsfiles.
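
The two strategies above can be sketched roughly as below. All type names are hypothetical stand-ins rather than the PR's classes, and the semantics of stillLives (true when the file's end time is at or after the given bound) is an assumption. The current time is passed in as a parameter instead of calling System.currentTimeMillis() so the example is deterministic.

```java
/** Hypothetical stand-in for a tsfile's time index. */
interface SketchTimeIndex {
  /** Assumed semantics: true if the file still holds data at or after timeLowerBound. */
  boolean stillLives(long timeLowerBound);
}

/** Hypothetical shape of a migration strategy. */
interface SketchMigrationStrategy {
  boolean shouldMigrate(SketchTimeIndex timeIndex, long nowMs);
}

/** The tsfile stays in its tier and is never migrated. */
final class SketchPinnedStrategy implements SketchMigrationStrategy {
  @Override
  public boolean shouldMigrate(SketchTimeIndex timeIndex, long nowMs) {
    return false;
  }
}

/** Migrates a tsfile once all of its data is older than ttlMs. */
final class SketchTime2LiveStrategy implements SketchMigrationStrategy {
  private final long ttlMs;

  SketchTime2LiveStrategy(long ttlMs) {
    this.ttlMs = ttlMs;
  }

  @Override
  public boolean shouldMigrate(SketchTimeIndex timeIndex, long nowMs) {
    // Mirrors the condition in the description: migrate when stillLives(now - ttl) is false.
    return !timeIndex.stillLives(nowMs - ttlMs);
  }
}

/** Simplest possible time index for the sketch: tracks only the file's end time. */
final class SketchEndTimeIndex implements SketchTimeIndex {
  private final long endTimeMs;

  SketchEndTimeIndex(long endTimeMs) {
    this.endTimeMs = endTimeMs;
  }

  @Override
  public boolean stillLives(long timeLowerBound) {
    return endTimeMs >= timeLowerBound;
  }
}
```

With an end time of 1000 and a ttl of 500, the file is kept while the clock reads 1400 (bound 900) and becomes eligible for migration at 1600 (bound 1100).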

migration

Responsible for migrating tsfile.

  • Adds the migration_thread_num param, which specifies how many threads are set up to perform file migration.
  • MigrationTask: migrates some tsfiles to a target directory.
  • MigrationRecoverTask: recovers a migration task from the log.
  • MigrationCallBack: is called while each tsfile is migrating, often used to ensure safe concurrent access.
    • BiConsumer<File, File> opsToBothFiles param: the operation to perform on the source file and the target file once the lock is acquired.
  • MigrationLogger, MigrationLogAnalyzer: write and analyze the migration log.
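
One way to picture the interaction between the migration thread pool and the callback is the sketch below. It is an assumption-laden illustration, not the PR's MigrationTask: the names SketchMigrationCallBack and SketchMigrator are hypothetical, a single ReentrantLock stands in for whatever per-tsfile locking the real code uses, and logging and recovery are omitted.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiConsumer;

/** Hypothetical callback shape: applies an operation to (source, target) under a lock. */
interface SketchMigrationCallBack {
  void call(File src, File target, BiConsumer<File, File> opsToBothFiles);
}

/** Hypothetical migrator; not the PR's MigrationTask. */
final class SketchMigrator {
  // One global lock keeps the sketch short; real code presumably locks per tsfile.
  private final ReentrantLock lock = new ReentrantLock();

  private final SketchMigrationCallBack callBack = (src, target, ops) -> {
    lock.lock();
    try {
      ops.accept(src, target);
    } finally {
      lock.unlock();
    }
  };

  /** Moves every source file into targetDir using threadNum worker threads. */
  void migrate(List<File> sources, File targetDir, int threadNum) {
    ExecutorService pool = Executors.newFixedThreadPool(threadNum);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (File src : sources) {
        File target = new File(targetDir, src.getName());
        pending.add(pool.submit(() -> callBack.call(src, target, SketchMigrator::moveFile)));
      }
      for (Future<?> f : pending) {
        f.get(); // surfaces any failure from a worker thread
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  static void moveFile(File src, File target) {
    try {
      Files.move(src.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Creates three files in a temp "tier1", migrates them, and returns the file count in "tier2". */
  static int runDemo() {
    try {
      File tier1 = Files.createTempDirectory("tier1").toFile();
      File tier2 = Files.createTempDirectory("tier2").toFile();
      List<File> sources = new ArrayList<>();
      for (int i = 0; i < 3; i++) {
        File f = new File(tier1, i + ".tsfile");
        Files.write(f.toPath(), new byte[] {1});
        sources.add(f);
      }
      new SketchMigrator().migrate(sources, tier2, 2);
      return tier2.list().length;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```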

sync

Adapts sync to tiered storage, aiming to remove the redundant remove and add operations. (If file A is migrated from /tier1 to /tier2, this produces a remove op in /tier1 and an add op in /tier2, both of which are redundant.)

  • Adds deleted_files_blacklist.txt; files in this blacklist don't need a remove operation.
  • Adds to_be_synced_files_blacklist.txt; files in this blacklist don't need an add operation.
  • The two files above are analyzed in the SyncFileManager.getLastLocalFiles and SyncSenderLogAnalyzer.loadLastLocalFiles methods.
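
The effect of the two blacklists can be sketched as a set difference. This is an illustration under assumptions: SketchSyncDiff is a hypothetical class, files are represented by plain path strings, and the real SyncFileManager may compute the diff differently.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not the PR's SyncFileManager. */
final class SketchSyncDiff {
  /** Files seen in the last sync but missing now, excluding files that were only migrated away. */
  static Set<String> filesToRemove(
      Set<String> lastFiles, Set<String> currentFiles, Set<String> deletedFilesBlacklist) {
    Set<String> result = new TreeSet<>(lastFiles);
    result.removeAll(currentFiles);
    result.removeAll(deletedFilesBlacklist);
    return result;
  }

  /** Files present now but not in the last sync, excluding files that merely arrived by migration. */
  static Set<String> filesToAdd(
      Set<String> lastFiles, Set<String> currentFiles, Set<String> toBeSyncedFilesBlacklist) {
    Set<String> result = new TreeSet<>(currentFiles);
    result.removeAll(lastFiles);
    result.removeAll(toBeSyncedFilesBlacklist);
    return result;
  }
}
```

Suppose the last sync saw /tier1/A and /tier1/B, and the current state holds /tier2/A (migrated) and /tier1/C (new). With /tier1/A in the deleted-files blacklist and /tier2/A in the to-be-synced blacklist, only /tier1/B is removed and only /tier1/C is added.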

integration & unit test

  • Uses FSType.LOCAL as the default filesystem.

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR
  • org.apache.iotdb.hadoop.fileSystem
    • HDFSPath
    • HDFSFileSystem
    • HDFSFileSystemProvider
  • org.apache.iotdb.tsfile.fileSystem
    • FSPath
    • FSFactoryProducer
  • org.apache.iotdb.tsfile.utils.FSUtils
  • org.apache.iotdb.db.engine.tier.TierManager
  • org.apache.iotdb.db.engine.tier.migration.IMigrationStrategy
  • org.apache.iotdb.db.engine.tier.migration
    • mange
    • task
    • utils
  • org.apache.iotdb.db.sync.sender.utils.FilesBlacklistWriter

HeimingZ and others added 25 commits May 4, 2021 19:31
…artially (apache#3128)

* [IOTDB-1355] Support updating aligned timeseries values when insert partially
Co-authored-by: haiyi.zb <haiyi.zb@alibaba-inc.com>
* [IOTDB-1153] Last plan not work in cluster mode
@HeimingZ HeimingZ closed this Aug 23, 2022
@HeimingZ HeimingZ deleted the tiered_storage branch May 8, 2023 08:51
8 participants