Merge pull request #100 from darrenldl/dev
Updated BLKAR_SPECS, code refactoring
darrenldl committed Dec 16, 2018
2 parents 92812ed + c2f5d77 commit f1e35ea
Showing 2 changed files with 84 additions and 85 deletions.
BLKAR_SPECS.md (44 changes: 34 additions & 10 deletions)

@@ -80,7 +80,7 @@ Metadata block is valid if
- else mark the sequence number as missing

- a ref block is required to provide guidance on version and uid accepted
-2. Go through level 0 to 1000(inclusive), calculate supposed sequence number at each block position, record number of mismatches for each level
+2. Go through level 0 to 1000 (inclusive), calculate supposed sequence number at each block position, record number of mismatches for each level
   - if the sequence number was marked missing, then it is ignored and not checked for mismatch
3. return the level with the least number of mismatches
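
For illustration, a minimal Rust sketch of the level-guessing step above. Here `seq_nums` holds the sequence number read at each block position (`None` if it was marked missing), and `expected_seq_num_at` is a hypothetical helper that computes the supposed seq num for a given level and position; blkar's actual routine differs in detail.

```rust
/// Returns the burst error resistance level (0..=1000) with the fewest
/// mismatches; ties resolve to the lowest level, since `min_by_key`
/// returns the first minimum.
fn guess_burst_err_resistance_level(
    seq_nums: &[Option<u32>],
    expected_seq_num_at: impl Fn(usize, usize) -> u32, // (level, position) -> seq num
) -> usize {
    (0usize..=1000)
        .min_by_key(|&level| {
            seq_nums
                .iter()
                .enumerate()
                .filter(|(pos, seq)| match seq {
                    // a missing seq num is ignored, not counted as a mismatch
                    None => false,
                    Some(s) => *s != expected_seq_num_at(level, *pos),
                })
                .count()
        })
        .expect("0..=1000 is never empty")
}
```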

@@ -100,6 +100,7 @@ Metadata block is valid if
  - The written metadata block is valid but does not contain the actual file hash; a filler pattern of 0x00 is used in place of the hash part of the multihash (the header and length indicator of the multihash are still valid)
3. Load one version-specific, data-sized chunk at a time from the input file to encode and output (if metadata is enabled, the Multihash hash state/ctx is updated as well; the actual hash state/ctx used depends on the hash type, which defaults to SHA256)
   - data size = block size - header size (e.g. version 1 has data size of 512 - 16 = 496)
+   - if the seq num exceeds the maximum, the encoding procedure is terminated
4. If metadata is enabled, the encoder seeks back to starting position of output file and overwrites the metadata block with one that contains the actual hash
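
For illustration, a minimal sketch of the chunking in step 3, assuming version 1 sizes; header construction, hashing, and the metadata rewrite of step 4 are omitted, and `handle_chunk` is a hypothetical stand-in for encode-and-output.

```rust
use std::io::Read;

const BLOCK_SIZE: usize = 512; // SBX version 1
const HEADER_SIZE: usize = 16;
const DATA_SIZE: usize = BLOCK_SIZE - HEADER_SIZE; // 512 - 16 = 496

/// Feeds one data-sized chunk at a time to `handle_chunk` (a real
/// implementation would encode the block, write it out, and update
/// the hash state here).
fn for_each_chunk(
    mut input: impl Read,
    mut handle_chunk: impl FnMut(u32, &[u8]),
) -> std::io::Result<()> {
    let mut seq_num: u32 = 1; // seq num 0 is reserved for the metadata block
    let mut buf = [0u8; DATA_SIZE];
    loop {
        // note: a real implementation would loop to fill the buffer,
        // since `read` may return fewer bytes than requested
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        handle_chunk(seq_num, &buf[..n]);
        seq_num = match seq_num.checked_add(1) {
            Some(s) => s,
            // if the seq num exceeds the maximum, encoding is terminated
            None => break,
        };
    }
    Ok(())
}
```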

## Decode workflow
@@ -114,22 +115,21 @@ Data block is valid if and only if
- Basic block validity criteria are satisfied (see **Block handling in general** above)
- Version and uid match reference block (see below)

-1. A reference block is retrieved first and is used for guidance on alignment, version, and uid (see **Finding reference block** procedure specified above)
+### If output to file
+
+1. A reference block is retrieved first and is used for guidance on alignment, version, and uid (see **Finding reference block** procedure specified above)
2. Scan for valid blocks from start of SBX container to decode and output using reference block's block size as alignment
- if a block is invalid, nothing is done
- if a block is valid, and is a metadata block, nothing is done
- if a block is valid, and is a data parity block, nothing is done
-  - if a block is valid, and is a data block, then
-    - if output is file, then it will be written to the writepos at output file, where writepos = (sequence number - 1) * block size of reference block in bytes
-    - else if output is stdout, it will be written to stdout directly
+  - if a block is valid, and is a data block, then it will be written to the writepos at output file, where writepos = (sequence number - 1) * block size of reference block in bytes
3. If possible, truncate output file to remove data padding done for the last block during encoding
- if reference block is a metadata block, and contains file size field, and output is a file, then the output file will be truncated to that file size
- otherwise nothing is done
-4. If possible, report/record if the hash of decoded file matches the recorded hash during encoding
-   - if reference block is a metadata block, and contains the hash field, and output is a file, then the output file will be hashed to check against the recorded hash
-     - output file will not be deleted even if hash does not match
-   - otherwise nothing is done
+- If possible, report/record if the hash of the decoded file matches the recorded hash during encoding
+  - if the reference block is a metadata block, and contains the hash field, and output is a file, then the output file will be hashed to check against the recorded hash
+    - the output file will not be deleted even if the hash does not match
+  - otherwise nothing is done
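
For illustration, a minimal sketch of the seek-and-write in step 2 of the file-output path, mirroring the spec's writepos formula; `block_size` would come from the reference block, and a data block (seq num >= 1) is assumed.

```rust
use std::io::{Seek, SeekFrom, Write};

/// Writes one decoded data chunk at
/// writepos = (sequence number - 1) * block size of reference block.
fn write_chunk_at_writepos(
    out: &mut (impl Write + Seek),
    seq_num: u32, // data blocks have seq num >= 1
    block_size: u64,
    chunk: &[u8],
) -> std::io::Result<()> {
    let writepos = (u64::from(seq_num) - 1) * block_size;
    out.seek(SeekFrom::Start(writepos))?;
    out.write_all(chunk)
}
```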

#### Handling of duplicate metadata/data blocks

@@ -141,6 +141,30 @@ Data block is valid if and only if
- Corrupted blocks or missing blocks are not repaired in this mode
- User needs to invoke repair mode to repair the archive

+### If output to stdout
+
+1. A reference block is retrieved first and is used for guidance on alignment, version, and uid (see **Finding reference block** procedure specified above)
+2. Scan for valid blocks from the SBX container in the anticipated pattern to decode and output using reference block's block size as alignment
+   - The anticipated pattern is the same as the guessed encoding pattern, which depends on the SBX version, the data parity parameters, and the guessed burst error resistance level
+   - If a block is valid, and contains the anticipated seq num, then
+     - if the block is a metadata block, then nothing is done
+     - if the block is a data parity block, then nothing is done
+     - if the block is a data block, then
+       - if blkar can determine the block is the last block, the data chunk of the block is truncated and then written to stdout
+         - this is only possible when a metadata block is used as the reference block and it also contains the original file size
+       - else the data chunk of the block is written to stdout
+   - else a blank chunk of the same size as a normal data chunk is written to stdout
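
For illustration, a minimal sketch of the chunk-or-blank step above; writing a zero-filled chunk of the normal data size keeps every later chunk at the correct offset in the stdout stream.

```rust
use std::io::Write;

/// Writes the decoded chunk, or a blank chunk of the normal data size
/// when the anticipated block is missing or invalid.
fn write_chunk_or_blank(
    stdout: &mut impl Write,
    decoded: Option<&[u8]>,
    data_size: usize, // e.g. 496 for SBX version 1
) -> std::io::Result<()> {
    match decoded {
        Some(chunk) => stdout.write_all(chunk),
        None => stdout.write_all(&vec![0u8; data_size]), // blank chunk keeps alignment
    }
}
```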

## Rescue workflow

1. Scan for valid blocks from start of the provided file using 128-byte alignment
src/decode_core.rs (125 changes: 50 additions & 75 deletions)

```diff
@@ -682,42 +682,26 @@ pub fn decode(param : &Param,
                }
            } else {
                match block.sync_from_buffer(&buffer, Some(&pred)) {
-                    Ok(_) => {
-                        if block.get_seq_num() != seq_num {
-                            if sbx_block::seq_num_is_meta(seq_num) {
-                                stats.lock().unwrap().incre_meta_blocks_failed();
-                            } else if sbx_block::seq_num_is_parity(seq_num, data, parity) {
-                                stats.lock().unwrap().incre_parity_blocks_failed();
-                            } else {
-                                stats.lock().unwrap().incre_data_blocks_failed();
-
-                                write_blank_chunk(is_last_data_block(&stats, total_data_chunk_count),
-                                                  data_size_of_last_data_block,
-                                                  &ref_block,
-                                                  &mut writer,
-                                                  &mut hash_ctx)?;
-                            }
-                        } else {
-                            if block.is_meta() { // do nothing if block is meta
-                                stats.lock().unwrap().meta_blocks_decoded += 1;
-                            } else if block.is_parity(data, parity) {
-                                stats.lock().unwrap().parity_blocks_decoded += 1;
-                            } else {
-                                stats.lock().unwrap().data_blocks_decoded += 1;
-
-                                // write data chunk
-                                write_data_only_block(data_par_shards,
-                                                      is_last_data_block(&stats, total_data_chunk_count),
-                                                      data_size_of_last_data_block,
-                                                      &ref_block,
-                                                      &block,
-                                                      &mut writer,
-                                                      &mut hash_ctx,
-                                                      &buffer)?;
-                            }
-                        }
-                    },
-                    Err(_) => {
+                    Ok(_) if block.get_seq_num() == seq_num => {
+                        if block.is_meta() { // do nothing if block is meta
+                            stats.lock().unwrap().meta_blocks_decoded += 1;
+                        } else if block.is_parity(data, parity) {
+                            stats.lock().unwrap().parity_blocks_decoded += 1;
+                        } else {
+                            stats.lock().unwrap().data_blocks_decoded += 1;
+
+                            // write data chunk
+                            write_data_only_block(data_par_shards,
+                                                  is_last_data_block(&stats, total_data_chunk_count),
+                                                  data_size_of_last_data_block,
+                                                  &ref_block,
+                                                  &block,
+                                                  &mut writer,
+                                                  &mut hash_ctx,
+                                                  &buffer)?;
+                        }
+                    },
+                    _ => {
                        if sbx_block::seq_num_is_meta(seq_num) {
                            stats.lock().unwrap().incre_meta_blocks_failed();
                        } else if sbx_block::seq_num_is_parity(seq_num, data, parity) {
```
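
The hunk above replaces the nested seq-num check with a match guard, so a failed parse and a wrong seq num share a single `_` arm. A toy illustration of the pattern (not blkar's types):

```rust
fn classify(parsed: Result<u32, ()>, expected: u32) -> &'static str {
    match parsed {
        // guard arm: parsing succeeded AND the seq num is the expected one
        Ok(v) if v == expected => "decoded",
        // Err(_) and Ok with the wrong seq num both land here
        _ => "failed",
    }
}
```
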
```diff
@@ -751,56 +735,47 @@

            break_if_eof_seen!(read_res);

-            match block.sync_from_buffer(&buffer, Some(&pred)) {
-                Ok(_) => {
-                    // fix seq num for the case of no metadata block
-                    if block.get_seq_num() == 1 && seq_num == 0 {
-                        seq_num = 1;
-                    }
+            let block_okay =
+                match block.sync_from_buffer(&buffer, Some(&pred)) {
+                    Ok(_) => {
+                        // fix seq num for the case of no metadata block
+                        if block.get_seq_num() == 1 && seq_num == 0 {
+                            seq_num = 1;
+                        }

-                    if block.get_seq_num() != seq_num {
-                        if sbx_block::seq_num_is_meta(seq_num) {
-                            stats.lock().unwrap().incre_meta_blocks_failed();
-                        } else {
-                            stats.lock().unwrap().incre_data_blocks_failed();
+                        block.get_seq_num() == seq_num
+                    },
+                    Err(_) => false,
+                };

-                            write_blank_chunk(is_last_data_block(&stats, total_data_chunk_count),
-                                              data_size_of_last_data_block,
-                                              &ref_block,
-                                              &mut writer,
-                                              &mut hash_ctx)?;
-                        }
-                    } else {
-                        if block.is_meta() { // do nothing if block is meta
-                            stats.lock().unwrap().meta_blocks_decoded += 1;
-                        } else {
-                            stats.lock().unwrap().data_blocks_decoded += 1;
+            if block_okay {
+                if block.is_meta() { // do nothing if block is meta
+                    stats.lock().unwrap().meta_blocks_decoded += 1;
+                } else {
+                    stats.lock().unwrap().data_blocks_decoded += 1;

-                            // write data block
-                            write_data_only_block(None,
-                                                  is_last_data_block(&stats, total_data_chunk_count),
-                                                  data_size_of_last_data_block,
-                                                  &ref_block,
-                                                  &block,
-                                                  &mut writer,
-                                                  &mut hash_ctx,
-                                                  &buffer)?;
-                        }
-                    }
-                },
-                Err(_) => {
-                    if sbx_block::seq_num_is_meta(seq_num) {
-                        stats.lock().unwrap().incre_meta_blocks_failed();
-                    } else {
-                        stats.lock().unwrap().incre_data_blocks_failed();
-
-                        write_blank_chunk(is_last_data_block(&stats, total_data_chunk_count),
-                                          data_size_of_last_data_block,
-                                          &ref_block,
-                                          &mut writer,
-                                          &mut hash_ctx)?;
-                    }
-                },
-            }
+                    // write data block
+                    write_data_only_block(None,
+                                          is_last_data_block(&stats, total_data_chunk_count),
+                                          data_size_of_last_data_block,
+                                          &ref_block,
+                                          &block,
+                                          &mut writer,
+                                          &mut hash_ctx,
+                                          &buffer)?;
+                }
+            } else {
+                if sbx_block::seq_num_is_meta(seq_num) {
+                    stats.lock().unwrap().incre_meta_blocks_failed();
+                } else {
+                    stats.lock().unwrap().incre_data_blocks_failed();
+
+                    write_blank_chunk(is_last_data_block(&stats, total_data_chunk_count),
+                                      data_size_of_last_data_block,
+                                      &ref_block,
+                                      &mut writer,
+                                      &mut hash_ctx)?;
+                }
+            }

            if is_last_data_block(&stats, total_data_chunk_count) { break; }
```
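
The second hunk folds the match into a boolean so the failure handling appears only once. A toy illustration of the shape (not blkar's types):

```rust
fn block_okay(parsed: Result<u32, ()>, expected: u32) -> bool {
    match parsed {
        Ok(v) => v == expected,
        Err(_) => false,
    }
}

fn handle_block(parsed: Result<u32, ()>, expected: u32) {
    if block_okay(parsed, expected) {
        // count the decoded block and write its data chunk
    } else {
        // single failure path: count the failure and write a blank chunk
    }
}
```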