This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

Base 64 experiment #48

Open
JoelGardes opened this issue Mar 13, 2017 · 0 comments

@JoelGardes
Collaborator

Here are some explanations of the tools needed to evaluate similarity measurement on information transcoded to Base64.

The purpose

Base64 is a generic encoding whose alphabet has only 64 symbols. Restricting the data to this alphabet leaves 64 other byte values free, which could be used to improve the similarity distance by replacing run-length encoding with a more efficient algorithm. Another way forward would be to apply a "combinatorial pattern matching" algorithm to evaluate the segmentation of raw byte sequences directly (delimiters would be encoded with the 64 byte values released by the alphabet compression).
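A minimal Python sketch of the delimiter idea, using the standard `base64` module (RFC 4648 alphabet). Once every segment is Base64-encoded, any byte value outside the alphabet can separate segments without ever colliding with encoded content. The function names and the choice of delimiter are illustrative assumptions, not Simdoc code.

```python
import base64
import string

# Standard Base64 alphabet (RFC 4648): A-Z, a-z, 0-9, '+', '/', plus '=' for padding.
BASE64_ALPHABET = set((string.ascii_uppercase + string.ascii_lowercase
                       + string.digits + "+/=").encode("ascii"))

def free_byte_values():
    """Byte values that never appear in standard Base64 output."""
    return sorted(set(range(256)) - BASE64_ALPHABET)

def encode_segments(segments, delimiter):
    """Base64-encode each segment and join them with a freed byte value,
    so the delimiter cannot be confused with encoded data."""
    if delimiter in BASE64_ALPHABET:
        raise ValueError("delimiter must be a byte value unused by Base64")
    return bytes([delimiter]).join(base64.b64encode(s) for s in segments)

def decode_segments(data, delimiter):
    """Split on the delimiter and decode each segment back to its original bytes."""
    return [base64.b64decode(part) for part in data.split(bytes([delimiter]))]
```

With 8-bit bytes, 191 values (256 minus the 64 symbols and the `=` padding character) never occur in the output, so reserving 64 of them for delimiters or other codes leaves room to spare.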

This goal requires a reversible alphabet compression when encoding the data.
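Reversibility can be checked with a simple round trip. In this sketch, `to_pivot` and `from_pivot` are hypothetical names; the only facts relied on are that Base64 decoding exactly inverts encoding and that every 3 input bytes become 4 output symbols, so the alphabet shrinks while the size grows by about a third.

```python
import base64

def to_pivot(data: bytes) -> bytes:
    """Reduce arbitrary bytes to the 64-symbol Base64 alphabet (plus '=' padding)."""
    return base64.b64encode(data)

def from_pivot(encoded: bytes) -> bytes:
    """Invert to_pivot exactly; validate=True rejects bytes outside the alphabet."""
    return base64.b64decode(encoded, validate=True)

def encoded_length(n: int) -> int:
    """Base64 output size for n input bytes: each group of up to 3 bytes gives 4 symbols."""
    return 4 * ((n + 2) // 3)
```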

As a first step, the experiment consists of using the current algorithm to measure the similarity of information encoded in Base64. If the results are good, Base64 would become a pivot format.
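The issue does not describe the current similarity algorithm, so the sketch below swaps in normalized compression distance (NCD) with `zlib` purely as a stand-in. It shows the shape of the experiment: compute the same distance on the raw bytes and on their Base64 encoding, then compare the two results. The function names are assumptions.

```python
import base64
import zlib

def compressed_size(data: bytes) -> int:
    """Size of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def compare(x: bytes, y: bytes) -> tuple:
    """Return (distance on raw bytes, distance on Base64-encoded bytes)."""
    return ncd(x, y), ncd(base64.b64encode(x), base64.b64encode(y))
```

If the Base64 distances rank document pairs the same way as the raw distances, that would support using Base64 as the pivot format.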

Overall, we just need to add a Base64 transcoding option to the data preparation step.
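A possible shape for that option, as a sketch only: the `--base64` flag, the `--out` directory, and the helper names are hypothetical and do not come from the Simdoc command line. Each input is copied into the prepared-data directory, Base64-encoded when the flag is set.

```python
import argparse
import base64
from pathlib import Path

def transcode(data: bytes, use_base64: bool) -> bytes:
    """Apply the optional Base64 transcoding to one document's bytes."""
    return base64.b64encode(data) if use_base64 else data

def prepare_file(src: Path, out_dir: Path, use_base64: bool) -> Path:
    """Write the (possibly transcoded) content of src into out_dir under the same name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / src.name
    dst.write_bytes(transcode(src.read_bytes(), use_base64))
    return dst

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical data-preparation step")
    parser.add_argument("inputs", nargs="+", type=Path, help="files to prepare")
    parser.add_argument("--out", type=Path, default=Path("prepared"))
    parser.add_argument("--base64", action="store_true",
                        help="transcode each input to Base64 before analysis")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    for src in args.inputs:
        prepare_file(src, args.out, args.base64)
```

Keeping the transcoding in a pure function means the same step can be reused for pictures, raw PDFs, and extracted metadata.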

For pictures

  • We need to preserve the linearity of the bitmap, so transcoding to a raw format may be necessary before Base64 encoding. If possible, keep the -raw option.
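Why a raw format helps: Base64 maps each group of 3 input bytes to 4 output characters in order, so in a raw row-major bitmap a pixel's position still determines where it lands in the encoded text. A compressed format such as PNG or JPEG does not keep this byte-to-pixel correspondence. The sketch assumes an 8-bit grayscale raw buffer with no header; the function names are hypothetical.

```python
import base64

def raw_grayscale_to_base64(pixels, width, height):
    """Base64-encode a raw, row-major 8-bit grayscale bitmap (no image header)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    return base64.b64encode(bytes(pixels))

def base64_span(byte_offset):
    """Characters [start, end) of the Base64 output that carry a given input byte.
    Every 3 input bytes map to 4 output characters, so pixel order is preserved."""
    group = byte_offset // 3
    return 4 * group, 4 * group + 4
```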

For other content

  • Generic (raw PDF, for example): add a Base64 encoding facility.

  • Work on metadata (a vector of extracted words, for example): transcode the extracted data to Base64 and preserve alignment with the source files through the canonical filename (i.e. the filename without its extension), so that the graph program can compute thumbnails with the -src option.
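A sketch of the alignment through canonical filenames, assuming metadata is a list of extracted words per source file. `Path.stem` drops only the last extension (so `report.final.pdf` becomes `report.final`); the helper names are hypothetical.

```python
import base64
from pathlib import Path

def canonical_name(path):
    """Filename without its directory and without its last extension."""
    return Path(path).stem

def encode_metadata(source_path, words):
    """Return (canonical name, Base64 text) for the words extracted from source_path."""
    text = " ".join(words).encode("utf-8")
    return canonical_name(source_path), base64.b64encode(text).decode("ascii")

def align(source_paths, metadata_paths):
    """Pair each metadata file with the source file sharing its canonical name
    (None when no source matches), e.g. to locate the file used for a thumbnail."""
    sources = {canonical_name(p): p for p in source_paths}
    return {m: sources.get(canonical_name(m)) for m in metadata_paths}
```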

Is that OK?

ChristopheMaldivi added this to In Progress in Simdoc on Mar 13, 2017
ChristopheMaldivi moved this from In Progress to Testing in Simdoc on Apr 5, 2017