This repository has been archived by the owner on Oct 20, 2022. It is now read-only.
Here are some explanations of the tools needed to evaluate similarity measurement using base 64 information transcoding.
The purpose
Base 64 is a generic information encoding based on an alphabet of 64 bytes, which frees 64 other byte values; those can be used to improve the similarity distance by replacing run-length encoding with a more efficient algorithm. Another way forward would consist of applying a "combinatorial pattern matching" algorithm to evaluate direct byte-sequence segmentation (the delimiters would be coded with the 64 released bytes after alphabet compression).
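As a quick illustration of the released byte values, the following sketch (standard library only) checks that a base64-encoded stream draws on at most the 64 alphabet bytes plus the `=` padding character, so every remaining byte value is free to serve as a delimiter:

```python
import base64
import os

# Encode an arbitrary binary payload to base64.
data = os.urandom(4096)
encoded = base64.b64encode(data)

# The encoded stream uses at most the 64 alphabet bytes plus '=' padding;
# every other byte value is released and could serve as a delimiter.
used = set(encoded)
free = set(range(256)) - used - {ord('=')}
print(len(used), len(free))
```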
In practice, this means operating a reversible alphabet compression for data coding.
A first step for this experiment consists of measuring, with the current algorithm, the similarity of information coded in base64; base64 would become a pivot format if good results are obtained.
Globally, we just have to add a base 64 transcoding option to the data-preparation step.
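A minimal sketch of such an option in data preparation; the `prepare_file` helper and its `to_base64` flag are assumptions for illustration, not the project's actual interface:

```python
import base64
import tempfile
from pathlib import Path

def prepare_file(path, to_base64=False):
    """Read a file as bytes, optionally transcoding it to base64.

    Hypothetical helper: the real data-preparation interface may differ.
    """
    data = Path(path).read_bytes()
    if to_base64:
        data = base64.b64encode(data)
    return data

# Quick round-trip check on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01binary payload\xff")
    name = tmp.name
prepared = prepare_file(name, to_base64=True)
```

Since base64 coding is reversible, `base64.b64decode(prepared)` restores the original bytes exactly, which is what makes it usable as a pivot format.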
For pictures
We need to preserve the linearity of the bitmap, so transcoding to a raw format may be necessary before base64 coding. If possible, maintain the -raw option.
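Assuming the picture has already been decoded to a raw row-major pixel buffer, base64 is a pure byte-stream transform, so the bitmap's linearity survives a round trip; a small sketch with a hypothetical 4×2 grayscale bitmap:

```python
import base64

# Hypothetical 4x2 grayscale bitmap, one byte per pixel, rows stored consecutively.
width, height = 4, 2
raw = bytes(range(width * height))  # row-major raw buffer

encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)

# Base64 is a pure byte-stream transcoding: decoding restores the exact
# row-major layout, so the bitmap's linearity is preserved.
assert decoded == raw
```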
For other content
Generic content (a raw PDF, for example): add a base64 coding facility.
Work on metadata (a vector of extracted words, for example): transcode the extracted data to base 64 and preserve its alignment with the source files through the canonical filename (i.e. the filename without its extension), to permit thumbnail computation in the graph program with the -src option.
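A sketch of that metadata path; the `transcode_word_vector` helper is hypothetical and only illustrates the alignment scheme (the canonical filename is the source name with its extension stripped), not the graph program's -src handling:

```python
import base64
from pathlib import Path

def transcode_word_vector(words, source_name):
    """Base64-encode each extracted word and pair the result with the
    canonical filename (i.e. the source filename without its extension).

    Hypothetical helper illustrating the alignment scheme only.
    """
    canonical = Path(source_name).stem  # e.g. "report.pdf" -> "report"
    encoded = [base64.b64encode(w.encode("utf-8")).decode("ascii") for w in words]
    return canonical, encoded

canonical, encoded = transcode_word_vector(["alpha", "beta"], "report.pdf")
```

Keying the transcoded vector on the canonical name means the base64 metadata file and its source (and thus its thumbnail) can be matched back up without any extra index.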
Is that OK?