
Support incremental backup #1248

Closed
v0y4g3r opened this issue Mar 27, 2023 · 5 comments

v0y4g3r commented Mar 27, 2023

What problem does the new feature solve?

Given that all tables in GreptimeDB contain a timestamp column, we can allow users to back up the data of a database within a specified time range to a directory, with one file per table.

What does the feature do?

Implement some SQL syntax like:

COPY DATABASE <DATABASE_NAME> [FROM <START_TIME>] [UNTIL <END_TIME>] TO <TARGET_DIR> [WITH (<COPY OPTIONS>)]

which exports the rows within the given time range of all tables in that database to the target directory. All exported rows of one table will reside in the same Parquet file.

Or maybe we can skip SQL and use the HTTP admin API first for a prototype.
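
For illustration (the database name, timestamps, and target path below are made-up values, not a finalized syntax), a one-month backup could look like:

COPY DATABASE my_db FROM '2023-01-01 00:00:00' UNTIL '2023-02-01 00:00:00' TO '/backup/my_db/'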

Implementation challenges

  1. Upgrade opendal so that its Writer support can simplify the streaming Parquet writer implementation (feat: upgrade opendal #1245).
  2. Change ParquetWriter into a streaming writer that does not accumulate the whole Parquet file in memory and write it to the underlying storage in one shot, which could cause huge memory consumption in this case (feat: buffered parquet writer #1263); see the streaming sketch after this list. The current approach buffers the entire file in memory:
    let mut buf = vec![];
    let mut arrow_writer = ArrowWriter::try_new(&mut buf, schema.clone(), Some(writer_props))
        .context(WriteParquetSnafu)?;
  3. Implement the syntax parser for COPY DATABASE, or the HTTP API handler.
  4. Implement COPY DATABASE by iterating over all tables in the database and copying the content of each table into a Parquet file in the target directory. Maybe we don't need to compress the target directory.
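
For item 2, here is a minimal sketch of the streaming approach, assuming the parquet crate's ArrowWriter: the writer targets a small shared in-memory buffer that is drained to the destination whenever it grows past a threshold, so only roughly one row group is held in memory at a time. SharedBuffer, write_streaming, and threshold are illustrative names for this sketch, not GreptimeDB's actual implementation.

    use std::io::Write;
    use std::sync::{Arc, Mutex};

    use arrow::record_batch::RecordBatch;
    use parquet::arrow::ArrowWriter;

    /// In-memory buffer shared between the ArrowWriter (which needs std::io::Write)
    /// and the code that drains finished bytes to the destination.
    #[derive(Clone, Default)]
    struct SharedBuffer(Arc<Mutex<Vec<u8>>>);

    impl Write for SharedBuffer {
        fn write(&mut self, data: &[u8]) -> std::io::Result<usize> {
            self.0.lock().unwrap().extend_from_slice(data);
            Ok(data.len())
        }

        fn flush(&mut self) -> std::io::Result<()> {
            Ok(())
        }
    }

    /// Writes `batches` as one Parquet file to `sink` (standing in for the
    /// object-store writer), draining the shared buffer whenever it exceeds
    /// `threshold` bytes instead of holding the whole file in memory.
    fn write_streaming(
        batches: &[RecordBatch],
        sink: &mut dyn Write,
        threshold: usize,
    ) -> Result<(), Box<dyn std::error::Error>> {
        let buffer = SharedBuffer::default();
        let schema = batches[0].schema();
        let mut writer = ArrowWriter::try_new(buffer.clone(), schema, None)?;

        for batch in batches {
            writer.write(batch)?;
            // Close the current row group so its encoded bytes land in the buffer.
            // (A real implementation would flush less eagerly to keep row groups large.)
            writer.flush()?;
            let mut buf = buffer.0.lock().unwrap();
            if buf.len() >= threshold {
                sink.write_all(&buf)?;
                // ArrowWriter tracks absolute byte offsets itself, so draining is safe.
                buf.clear();
            }
        }

        // close() writes the footer into the buffer; drain the remainder.
        writer.close()?;
        let buf = buffer.0.lock().unwrap();
        sink.write_all(&buf)?;
        Ok(())
    }
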
v0y4g3r added the C-feature Category Features label Mar 27, 2023
v0y4g3r mentioned this issue Mar 27, 2023

evenyag commented Mar 27, 2023

RocksDB's backup support: How to backup RocksDB

But COPY DATABASE is different from BACKUP DATABASE as COPY is much simpler. We might also need to write some metadata to the target directory (to store the start/end time).


v0y4g3r commented Mar 27, 2023

> RocksDB's backup support: How to backup RocksDB
>
> But COPY DATABASE is different from BACKUP DATABASE as COPY is much simpler.

Backup also involves backing up manifest files etc.

> We might also need to write some metadata to the target directory (to store the start/end time).

The necessary metadata I can come up with:

  • catalog/schema/table name
  • data time range
  • backup time

v0y4g3r self-assigned this Mar 27, 2023

sunng87 commented Mar 28, 2023

Can we use parquet metadata https://parquet.apache.org/docs/file-format/metadata/ for our metadata? Using fewer files reduces the chance of corrupted data.


v0y4g3r commented Mar 29, 2023

> Can we use parquet metadata https://parquet.apache.org/docs/file-format/metadata/ for our metadata? Using fewer files reduces the chance of corrupted data.

If "our metdata" refers to catalog/schema/table name, data time range and backup time, yes, we are going to write these to parquet footer's metadata section, juts like arrow does. We don't have a separate metadata file now.

fengjiachun added this to the v0.3 milestone Apr 12, 2023
killme2008 commented

Closed via #1240
