CSV auto parsing regression in insertCSVFromPath #1166

Closed
domoritz opened this issue Mar 2, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@domoritz
Collaborator

domoritz commented Mar 2, 2023

There seems to be a regression between 1.17 and 1.24.

This code used to work but doesn't anymore.

  await database.open({ query: { castTimestampToDate: true } });
  const url = await file.url();
  await database.registerFileURL(file.name, url);
  const connection = await database.connect();
  await connection.insertCSVFromPath(file.name, {
    name: file.name,
    schema: "main"
  });

We get this error: Error: Invalid Input Error: Error in file "us-state-capitals.tsv": CSV options could not be auto-detected. Consider setting parser options manually.

I can read the file fine in the CLI.

D from read_csv_auto('us-state-capitals.tsv');
┌────────────────┬────────────────┬────────────┬────────┬────────────┬─────────┬─────────┬───────┐
│     State      │    Capital     │   Since    │  Area  │ Population │ MSA/µSA │   CSA   │ Rank  │
│    varchar     │    varchar     │    date    │ double │   int64    │  int64  │  int64  │ int64 │
├────────────────┼────────────────┼────────────┼────────┼────────────┼─────────┼─────────┼───────┤
│ Alabama        │ Montgomery     │ 1846-01-01 │  159.8 │     198525 │  373290 │  461516 │     3 │
│ Alaska         │ Juneau         │ 1906-01-01 │ 2716.7 │      32113 │   32113 │         │     3 │
│ Arizona        │ Phoenix        │ 1912-01-01 │  517.6 │    1680992 │ 4948203 │ 5002221 │     1 │
│ Arkansas       │ Little Rock    │ 1821-01-01 │  116.2 │     197312 │  742384 │  908941 │     1 │

from read_csv('us-state-capitals.tsv', AUTO_DETECT=true); also works.

See https://observablehq.com/d/9df8918a02ad22c6

This is blocking observablehq/stdlib#350

domoritz added the bug label Mar 2, 2023
@carlopi
Collaborator

carlopi commented Mar 2, 2023

Thanks for the detailed bug report, I am looking into this.

@kimmolinna
Contributor

kimmolinna commented Mar 2, 2023

I don't think this is a bug. You now have to specify a DuckDBDataProtocol. An excerpt from the README:

await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', DuckDBDataProtocol.HTTP, false);

You can skip the last boolean.
This one works (4 equals DuckDBDataProtocol.HTTP):

await database.registerFileURL(file.name, url, 4);
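
For reference, here is a minimal sketch of the original snippet with the protocol passed explicitly. The helper name loadCsvOverHttp and the HTTP_PROTOCOL constant are hypothetical, introduced only for illustration; the value 4 is taken from the comment above, and in real code it is probably safer to import DuckDBDataProtocol from the package rather than hard-code the number. It assumes database has already been opened, as in the original report.

```javascript
// Assumption: 4 corresponds to DuckDBDataProtocol.HTTP, as stated in this thread.
// In an application, prefer the enum itself, e.g.:
//   import { DuckDBDataProtocol } from "@duckdb/duckdb-wasm";
const HTTP_PROTOCOL = 4;

// Hypothetical helper: registers a file by URL with an explicit protocol,
// then loads it into a table named after the file.
async function loadCsvOverHttp(database, file, protocol = HTTP_PROTOCOL) {
  const url = await file.url();
  // Pass the protocol explicitly, as suggested above.
  await database.registerFileURL(file.name, url, protocol);
  const connection = await database.connect();
  await connection.insertCSVFromPath(file.name, {
    name: file.name,
    schema: "main"
  });
  return connection;
}
```

Taking database and file as parameters keeps the sketch independent of how they are created, so the same function can be exercised with stand-in objects.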

@domoritz
Collaborator Author

domoritz commented Mar 2, 2023

Closing since it's not a bug.

domoritz closed this as completed Mar 2, 2023