What is the meaning of delete_rows_count and delete_data_count_file at manifest #2445

@wg1026688210

I am confused about delete_rows_count and delete_data_count_file. They do not seem to be associated with table format v2, judging by a unit test I wrote to check my guess:

  @Test
  public void test() throws IOException {
    PartitionSpec spec = PartitionSpec.builderFor(SCHEMA)
            .identity("c1")
            .truncate("c2", 2)
            .build();
    Table table = TABLES.create(SCHEMA, spec, ImmutableMap.of(), tableLocation);
    upgradeToFormatV2(table);

    // Write one equality delete file and commit it as a row delta.
    Schema deleteRowSchema = table.schema().select("c1", "c2", "c3");
    Record dataDelete = GenericRecord.create(deleteRowSchema);
    List<Record> deletions = Lists.newArrayList(
            dataDelete.copy("c1", 1, "c2", "AAAAAAAAAA", "c3", "CCCC")
    );
    DeleteFile eqDeletes1 = FileHelpers.writeDeleteFile(table, newOutputFile(),
            TestHelpers.Row.of(1, "AA"), deletions.subList(0, 1), deleteRowSchema);
    table.newRowDelta()
            .addDeletes(eqDeletes1)
            .commit();

    // The commit should produce exactly one delete manifest.
    final List<ManifestFile> manifestFiles = Lists.newArrayList(table.currentSnapshot().deleteManifests());
    Assert.assertEquals("delete manifest should be 1", 1, manifestFiles.size());

    // My guess: the deleted counts reflect the delete file that was just committed.
    final ManifestFile deleteManifest = manifestFiles.get(0);
    final int deletedFilesCount = deleteManifest.deletedFilesCount();
    Assert.assertEquals("deletedFilesCount should be 1", 1, deletedFilesCount);
    final long deletedRowsCount = deleteManifest.deletedRowsCount();
    Assert.assertEquals("deletedRowsCount should be 1", 1L, deletedRowsCount);
  }

I found that these counts come into play when manifests are merged and snapshots are removed, and that they are summed in ManifestWriter when a ManifestEntry's status is DELETED.
What are the intended semantics of delete_rows_count and delete_data_count_file in the design?
And are the names easily confused with the equality deletes and position deletes of table format v2?
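
To make the ManifestWriter observation above concrete, here is a minimal sketch of how I understand the counting. It is only a model under my own assumptions: `ManifestCountsSketch`, `Entry`, `Counts`, and `summarize` are hypothetical names and are not Iceberg classes. It assumes the counts are kept per entry status and do not depend on whether the tracked file is a data file or a delete file.

```java
import java.util.List;

// Hypothetical model for illustration only; these are not Iceberg's classes.
final class ManifestCountsSketch {
  enum Status { EXISTING, ADDED, DELETED }

  // One manifest entry: its status plus the record count of the file it tracks.
  static final class Entry {
    final Status status;
    final long recordCount;

    Entry(Status status, long recordCount) {
      this.status = status;
      this.recordCount = recordCount;
    }
  }

  // Per-status totals, analogous to a manifest's file and row count summary.
  static final class Counts {
    int addedFiles;
    int existingFiles;
    int deletedFiles;
    long addedRows;
    long existingRows;
    long deletedRows;
  }

  // Sum file and row counts by entry status (assumed behaviour, see lead-in).
  static Counts summarize(List<Entry> entries) {
    Counts counts = new Counts();
    for (Entry entry : entries) {
      switch (entry.status) {
        case ADDED:
          counts.addedFiles += 1;
          counts.addedRows += entry.recordCount;
          break;
        case EXISTING:
          counts.existingFiles += 1;
          counts.existingRows += entry.recordCount;
          break;
        case DELETED:
          counts.deletedFiles += 1;
          counts.deletedRows += entry.recordCount;
          break;
      }
    }
    return counts;
  }
}
```

If this model is right, a delete file newly committed through newRowDelta() would be an ADDED entry, so it would raise the added counts and leave the deleted counts at zero. The deleted counts would only grow once an entry is marked DELETED, which would explain why my test does not see the values I guessed.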
