pass around metadata #765

Closed
ThomasWaldmann opened this issue Mar 17, 2016 · 5 comments
@ThomasWaldmann
Member

In borg, data flows through several stages / layers, but metadata about files and their content is mostly missing. We could use a metadata dict and pass it around with the data, e.g. as a tuple (meta, data).

E.g. currently the compression component is set up statically; you choose it via a command line parameter.
Instead, an entry in the meta dict could determine the compression that will be used.
The entry could default to whatever the command line says, but it could also be changed dynamically (e.g. if the file reader knows the file is an .mp3 and can't be compressed any further, it sets meta["compression"] = 'none' for that data).

It could also be used for sparse files, so that hole=True/False gets passed around.
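
A minimal sketch of the idea above, not borg's actual code: the function names (`read_file_chunk`, `compress`), the suffix list and the use of zlib are assumptions made for illustration only. The reader fills in the command line default and overrides it per file, and the compressor decides based on the meta dict alone.

```python
import zlib
from pathlib import PurePath

# Hypothetical list of suffixes the reader treats as already compressed.
INCOMPRESSIBLE_SUFFIXES = {".mp3", ".jpg", ".zip"}


def read_file_chunk(path, data, default_compression):
    """Return a (meta, data) tuple; the reader may override the default."""
    meta = {"compression": default_compression}
    if PurePath(path).suffix.lower() in INCOMPRESSIBLE_SUFFIXES:
        meta["compression"] = "none"
    return meta, data


def compress(chunk):
    """Pick the compression from the chunk's metadata, not from a global."""
    meta, data = chunk
    if meta["compression"] == "none":
        return data
    if meta["compression"] == "zlib":
        return zlib.compress(data)
    raise ValueError(f"unknown compression: {meta['compression']!r}")


payload = b"x" * 100
mp3_out = compress(read_file_chunk("song.mp3", payload, "zlib"))
txt_out = compress(read_file_chunk("notes.txt", payload, "zlib"))
```

Here `mp3_out` is the payload passed through unchanged, while `txt_out` is the zlib-compressed form, although both calls were given the same default.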

@ThomasWaldmann ThomasWaldmann self-assigned this Mar 17, 2016
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Mar 28, 2016
Chunk is a namedtuple of (meta, data), create chunks using mkchunk(data, **meta).

This does not yet have any visible functionality, meta is always empty dict right now.
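
For illustration, a sketch of what the commit message describes, assuming nothing beyond its wording (a namedtuple with fields `meta` and `data`, built by `mkchunk(data, **meta)`); the code in the referenced commit may differ in detail.

```python
from collections import namedtuple

# Field order follows the commit message: (meta, data).
Chunk = namedtuple("Chunk", ["meta", "data"])


def mkchunk(data, **meta):
    """Build a Chunk; keyword arguments become the metadata dict."""
    return Chunk(meta=meta, data=data)


plain = mkchunk(b"some bytes")             # meta is an empty dict
hole = mkchunk(b"\0" * 4096, hole=True)    # hypothetical sparse-hole marker

# A Chunk still unpacks like the plain (meta, data) tuple proposed above.
meta, data = hole
```

Since the result is a namedtuple, code that expects a bare `(meta, data)` tuple keeps working, while newer code can use `chunk.meta` and `chunk.data`.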
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Apr 17, 2016
Chunk is a namedtuple of (meta, data), create chunks using mkchunk(data, **meta).

This does not yet have any visible functionality, meta is always empty dict right now.
ThomasWaldmann added a commit that referenced this issue Apr 18, 2016
@ThomasWaldmann
Member Author

related to #14

@ThomasWaldmann
Member Author

The infrastructure added in #934 was removed again in #2364. Trying to find out why...

@ThomasWaldmann
Member Author

Can't find out why.

I guess we need this for sparse handling (and maybe also for other metadata later).

@enkore do you remember?

@ThomasWaldmann
Member Author

Anyway, in #5620 I re-added a little bit of it to support communicating between chunker and hasher.

After the hasher, everything is still as it was - the compressor will not get any metadata yet.

@ThomasWaldmann ThomasWaldmann added this to the 2.0.0b6 milestone Apr 2, 2023
@ThomasWaldmann
Member Author

ThomasWaldmann commented Apr 2, 2023

Considering the metadata (about sparseness) produced by the chunker (mostly by the fixed chunker, a bit less by the buzhash chunker):

  • DATA: the metadata is not useful to the compressor, since it needs to compress the data anyway.
  • ZEROS / SPARSE HOLE: the compressor compresses only the first all-zeros block of a given <size>, and that chunk is stored in the repo. Any other all-zeros chunk of the same size is deduplicated, so we do not need to pass this metadata to the compressor either.
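
The deduplication argument above can be sketched as follows. This is a toy model under stated assumptions, not borg's storage code: `store_chunks` is a hypothetical helper, the repo is a plain dict, and an unkeyed SHA-256 stands in for whatever chunk ID hash is actually used.

```python
import hashlib
import zlib


def store_chunks(chunks):
    """Store chunks by content hash; return the repo and the compress count."""
    repo = {}
    compress_calls = 0
    for data in chunks:
        chunk_id = hashlib.sha256(data).digest()
        if chunk_id in repo:
            # Already stored: deduplicated, the compressor is never called.
            continue
        repo[chunk_id] = zlib.compress(data)
        compress_calls += 1
    return repo, compress_calls


zeros = b"\0" * 4096
repo, compress_calls = store_chunks([zeros, b"real data", zeros, zeros])
```

Of the three all-zeros chunks only the first one reaches the compressor, so the compressor has no use for a `hole` flag in this model.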

@ThomasWaldmann ThomasWaldmann removed this from the 2.0.0b6 milestone Apr 2, 2023