Allow inferring source format from alternate string #95

Open
yajo opened this issue Nov 10, 2023 · 6 comments

Comments

@yajo

yajo commented Nov 10, 2023

Is your feature request related to a problem? Please describe.

I'm integrating sqlitebiter into another system whose storage is content-addressable.

The file name on disk doesn't contain the original file name; only its content hash.

The original file name is stored elsewhere in a separate DB.

Describe the solution you'd like

A --format-from-filename option in sqlitebiter file and/or sqlitebiter stdin that allows me to pass a filename to be considered for automatic format detection, instead of the filename of the file being processed (which BTW isn't available in sqlitebiter stdin mode).

Describe alternatives you've considered

Creating a copy of the file under another name, converting it, and then removing it. This doubles the disk space required, so it's not a good solution.

I also tried creating a symlink with the appropriate name that points to the hashed file. It doesn't work because:

  • If I don't pass --follow-symlinks, it's skipped.
  • If I pass it, it tries to obtain the file name from the target of the symlink, which, again, is just a hash.
@thombashi
Owner

@yajo
Please let me confirm this.
Are the existing options not applicable to your use case?
You can specify a format by using the --format option for sqlitebiter file.
And you can set a format as a positional argument for sqlitebiter stdin.

@yajo
Author

yajo commented Nov 27, 2023

Yeah, the point is that I want to pass the file through stdin but let sqlitebiter guess the format from a filename that I give in a separate flag. That way, if the file is .CSV or .csv, for example, sqlitebiter does its magic to detect the format automatically, without me having to replicate the same magic higher up in the stack.

The use case is that I'm integrating sqlitebiter in a system (Odoo) that stores attachments in a content-addressable manner. The file is named asd9f87a9sd8f97987as987d9a8s7d. In the database I know it's called something.xlsx.

I tried symlinking something.xlsx to the source file, but sqlitebiter resolves the symlink and ends up without the suffix it needs to determine the format.

Thus, I wanted to just let it guess the format based on a filename that I can pass separately. Something like:

sqlitebiter file asd9f87a9sd8f97987as987d9a8s7d --format-from-filename something.xlsx

something.xlsx would not exist in this context. It's just a parameter.
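
As a rough illustration of the behaviour requested above, here is a minimal Python sketch of a caller-side wrapper that derives a format from a separately supplied file name and builds a `sqlitebiter file --format ...` command. It is not sqlitebiter's own detection logic: `EXTENSION_TO_FORMAT`, `format_from_filename`, and `build_command` are hypothetical names, and apart from `csv` (which appears in this thread) the format names in the table are assumptions that should be checked against the sqlitebiter documentation.

```python
from pathlib import PurePath

# Hypothetical extension-to-format table; NOT sqlitebiter's own detection logic.
# Only "csv" is confirmed by this thread; the other format names are assumptions.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".json": "json",
}


def format_from_filename(original_name: str) -> str:
    """Return a format name for a file name, matching the suffix case-insensitively."""
    suffix = PurePath(original_name).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot infer a format from {original_name!r}") from None


def build_command(hashed_path: str, original_name: str) -> list[str]:
    """Build an argv list for `sqlitebiter file --format <fmt> <path>`.

    `original_name` is only inspected for its suffix; it does not need to exist.
    """
    fmt = format_from_filename(original_name)
    return ["sqlitebiter", "file", "--format", fmt, hashed_path]
```

The resulting list could be passed to `subprocess.run(..., check=True)`. Note that this still duplicates the extension mapping outside sqlitebiter, which is exactly what the request is trying to avoid; it only shows the shape of the lookup.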

@thombashi
Owner

@yajo
I intended my previous comment to suggest that you can specify the conversion format using an option or argument.
For example:

$ cat abcdefg   # a CSV format file without extension
A,B,C
ab,1,2
bc,3,4
cd,5,6
$ sqlitebiter file --format csv abcdefg 
[INFO] convert 'abcdefg' to 'abcdefg' table
[INFO] converted results: source=1, success=1, created-table=1
[INFO] database path: out.sqlite

Or

$ cat abcdefg | sqlitebiter stdin csv
[INFO] convert 'stdin' to 'csv1' table
[INFO] converted results: source=1, success=1, created-table=1
[INFO] database path: out.sqlite

Are these methods of executing commands not solving your problem?

@yajo
Author

yajo commented Dec 5, 2023

Those require my app to know the format of the file. The whole point of this feature request is that I'd love to use sqlitebiter's format autodetection, which is already implemented here. So those commands do convert the file, but they don't solve the issue.

@thombashi
Owner

I feel that the use cases for the --format-from-filename option are too limited.
How about adding file extension names (xlsx, etc.) support to the --format option?

@yajo
Author

yajo commented Dec 9, 2023 via email
