How does HDFS detect and handle corrupted blocks?

Role of Block Scanner

The Block Scanner identifies corrupt blocks stored on a DataNode. During a write operation, the DataNode verifies a checksum for the data it receives before storing it in HDFS. This checksum helps detect corruption introduced during data transmission.
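The checksum idea can be illustrated with a minimal Python sketch. It is not HDFS code: the function names are hypothetical, the 512-byte chunk size is an assumption made for the example, and `zlib.crc32` stands in for whichever checksum algorithm a real cluster is configured to use.

```python
import zlib
from typing import List

# Assumed chunk size for this sketch; each chunk gets its own checksum.
BYTES_PER_CHECKSUM = 512


def compute_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Return one CRC32 value per fixed-size chunk of the data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(
    data: bytes, expected: List[int], chunk_size: int = BYTES_PER_CHECKSUM
) -> List[int]:
    """Return the indexes of chunks whose checksum no longer matches."""
    actual = compute_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("data length does not match the stored checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# The sender computes checksums; the receiver recomputes and compares them.
payload = bytes(range(256)) * 8          # 2048 bytes, i.e. 4 chunks
stored_checksums = compute_checksums(payload)

damaged = bytearray(payload)
damaged[600] ^= 0xFF                     # flip one byte inside chunk 1

print(find_corrupt_chunks(payload, stored_checksums))         # []
print(find_corrupt_chunks(bytes(damaged), stored_checksums))  # [1]
```

Keeping one checksum per chunk, rather than one per block, narrows a mismatch down to a small region of the data.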

Handling Corrupted Blocks

The block scanner runs periodically on every DataNode and verifies that the stored data blocks are still correct. When it detects a corrupted data block, the following steps occur:

  1. The DataNode reports the corrupted block to the NameNode.
  2. The NameNode starts creating a new replica from a correct replica of the block held on another DataNode.
  3. The corrupted replica is not deleted until the number of correct replicas matches the replication factor (3 by default).

This whole process allows HDFS to maintain the integrity of the data when a client performs a read operation.
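The three steps above can be sketched as a toy in-memory model. This is an illustration only, not how the NameNode is implemented: `ToyNameNode` and its methods are hypothetical names, and never choosing the reporting DataNode as a copy target is a simplification made for this sketch.

```python
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3  # default replication factor mentioned above


@dataclass
class ToyNameNode:
    live_nodes: set
    # block id -> DataNodes holding a correct replica
    replicas: dict = field(default_factory=dict)
    # block id -> DataNodes holding a corrupt replica that is not yet deleted
    corrupt: dict = field(default_factory=dict)

    def report_bad_block(self, block_id: str, node: str) -> None:
        """Step 1: a DataNode reports a corrupt replica of block_id."""
        self.replicas[block_id].discard(node)
        self.corrupt.setdefault(block_id, set()).add(node)
        self._re_replicate(block_id)
        self._purge_corrupt(block_id)

    def _re_replicate(self, block_id: str) -> None:
        """Step 2: copy from a correct replica to other live DataNodes."""
        healthy = self.replicas[block_id]
        if not healthy:
            return  # no correct replica left to copy from
        bad = self.corrupt.get(block_id, set())
        candidates = sorted(self.live_nodes - healthy - bad)
        while len(healthy) < REPLICATION_FACTOR and candidates:
            healthy.add(candidates.pop(0))

    def _purge_corrupt(self, block_id: str) -> None:
        """Step 3: drop corrupt replicas only once enough correct ones exist."""
        if len(self.replicas[block_id]) >= REPLICATION_FACTOR:
            self.corrupt.pop(block_id, None)


# Four DataNodes: a spare node is available, so the corrupt replica is purged.
nn = ToyNameNode(
    live_nodes={"dn1", "dn2", "dn3", "dn4"},
    replicas={"blk_1": {"dn1", "dn2", "dn3"}},
)
nn.report_bad_block("blk_1", "dn2")
print(sorted(nn.replicas["blk_1"]))  # ['dn1', 'dn3', 'dn4']
print(nn.corrupt)                    # {}

# Three DataNodes: no spare node, so the corrupt replica is kept for now.
small = ToyNameNode(
    live_nodes={"dn1", "dn2", "dn3"},
    replicas={"blk_2": {"dn1", "dn2", "dn3"}},
)
small.report_bad_block("blk_2", "dn2")
print(small.corrupt)                 # {'blk_2': {'dn2'}}
```

The second case shows why deletion is deferred: until the correct replica count reaches the replication factor, the corrupt copy stays on record instead of being discarded.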
