[SQLite] Discussion: how to handle duplicate column names in result set? #696

geelen · 2023-05-24T05:58:43Z

If you have the following SQL:

CREATE TABLE abc (a INT, b INT, c INT);
CREATE TABLE cde (c INT, d INT, e INT);
INSERT INTO abc VALUES (1,2,3),(4,5,6);
INSERT INTO cde VALUES (7,8,9),(1,2,3);

SELECT * FROM abc, cde;

You get (in real sqlite in 'table' output mode):

+---+---+---+---+---+---+
| a | b | c | c | d | e |
+---+---+---+---+---+---+
| 1 | 2 | 3 | 7 | 8 | 9 |
| 1 | 2 | 3 | 1 | 2 | 3 |
| 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 1 | 2 | 3 |
+---+---+---+---+---+---+

(Note the repeated column name c)

Doing that in workerd's SQLite:

Array.from(sql.prepare(`SELECT * FROM abc, cde`)())
// =>
[
  {"a":1,"b":2,"c":7,"d":8,"e":9},
  {"a":1,"b":2,"c":1,"d":2,"e":3},
  {"a":4,"b":5,"c":7,"d":8,"e":9},
  {"a":4,"b":5,"c":1,"d":2,"e":3}
]

Note that the second value of c, 7, has overwritten the first value 3.

Using .raw() gets the right data:

Array.from(sql.prepare(`SELECT * FROM abc, cde`)().raw())
// =>
[
  [1,2,3,7,8,9],
  [1,2,3,1,2,3],
  [4,5,6,7,8,9],
  [4,5,6,1,2,3]
]
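
As a rough illustration of why the two outputs differ (plain JavaScript, not workerd's actual implementation; `rowToObject` is a made-up helper for this sketch), building an object keyed by column name keeps only the last value for a repeated name, so `Object.values` of the object no longer lines up with the raw row:

```javascript
// Hypothetical helper, for illustration only: one key per column name.
function rowToObject(columnNames, values) {
  const row = {};
  columnNames.forEach((name, i) => {
    row[name] = values[i]; // a repeated name overwrites the earlier value
  });
  return row;
}

const columnNames = ["a", "b", "c", "c", "d", "e"];
const rawRow = [1, 2, 3, 7, 8, 9];
const objectRow = rowToObject(columnNames, rawRow);

console.log(objectRow); // { a: 1, b: 2, c: 7, d: 8, e: 9 }
console.log(Object.values(objectRow).length, rawRow.length); // 5 6
```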

But currently, the D1 shim assumes .raw() can always be retrieved from the Object.values of the "object" response type. We'll be changing the shim but, for backwards compatibility, should we endeavour to make the default response format never drop data, by preferring to instead mangle the column names?

[
  {"a":1,"b":2,"c":3,"c_1":7,"d":8,"e":9},
]

Note the c_1 key meaning "first duplicate of c" column. We could also use . as a separator, or even ~ if we wanted to channel DOS filenames... :)
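
A minimal sketch of what that renaming could look like, assuming the column names and a raw row are both available as arrays. `dedupeColumnNames` and its `separator` parameter are hypothetical names for this example, not part of any existing API. It keeps incrementing the suffix until the name is unused, so a real column that happens to be called `c_1` is not clobbered either:

```javascript
// Hypothetical helper: suffix repeated names with _1, _2, ... so no value is dropped.
function dedupeColumnNames(columnNames, separator = "_") {
  const taken = new Set(); // every name handed out so far
  const counts = new Map(); // base name -> highest suffix tried so far
  return columnNames.map((name) => {
    let candidate = name;
    let n = counts.get(name) ?? 0;
    while (taken.has(candidate)) {
      n += 1;
      candidate = `${name}${separator}${n}`;
    }
    counts.set(name, n);
    taken.add(candidate);
    return candidate;
  });
}

const uniqueNames = dedupeColumnNames(["a", "b", "c", "c", "d", "e"]);
const values = [1, 2, 3, 7, 8, 9];
const mangledRow = Object.fromEntries(uniqueNames.map((name, i) => [name, values[i]]));

console.log(JSON.stringify(mangledRow));
// {"a":1,"b":2,"c":3,"c_1":7,"d":8,"e":9}
```

Passing `"."` or `"~"` as the separator would give the other spellings mentioned above.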

Alternatively, we could throw an exception if this case occurs, though that might be painful for people who do something like: SELECT * FROM users, projects WHERE projects.user_id = users.user_id. There's a duplicate column name but the data is always the same, so is that really bad? The fact that raw() and the normal response return rows of different lengths is still not great though...

As an aside, it's a shame the full_column_names pragma is deprecated (and has no effect). It would have solved this nicely for anyone who relied on select * a bunch in their app...

rozenmd · 2023-05-24T06:43:17Z

I also tried this test case, but didn't get far: #691

geelen · 2023-05-24T09:16:24Z

Ah, apologies @rozenmd I missed that you'd already pushed something to demonstrate the issue.

Another option that just occurred to me is making .raw() the default for all D1 queries, which we can then stitch together into JS objects either in the DO or even in newer versions of the shim, and implement renaming there. But we'd require a way to also get the column names out during the RawIterator's loop. I had a quick go at adding a getCachedColumnNames() to SqlStorage::Cursor but couldn't make it work. I'm sure I'm missing something simple though...

But then it's probably no good if D1 introduces its own column-collision-renaming behaviour. It'd be better to do it in workerd so it's standard across all users of the API.

kentonv · 2023-05-24T20:25:52Z

IMO, it's fine to say to the app: "If you're going to request rows as objects, it's your responsibility to make sure the column names don't collide by using AS where needed."

But this doesn't work for D1 specifically because D1 doesn't know at the time of the query whether the application is requesting objects vs. raw arrays, right?

So I think a columnNames property on Cursor would make sense, to let D1 solve this. This would also save bytes on the wire between the eyeball and the database DO, since D1's protocol could be designed to send only one copy of the column names instead of repeating them for every row.
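
To make the wire-format idea concrete, here is a hedged sketch. The payload shape and the `encodeResult` / `decodeAsRaw` / `decodeAsObjects` names are assumptions for illustration, not D1's actual protocol. The column names travel once, rows travel as plain arrays, and the receiving side decides which form to hand to the application:

```javascript
// Hypothetical wire shape: column names sent once, rows sent as plain arrays.
function encodeResult(columnNames, rawRows) {
  return JSON.stringify({ columnNames, rows: rawRows });
}

// Raw form: nothing is lost, even with repeated column names.
function decodeAsRaw(payload) {
  return JSON.parse(payload).rows;
}

// Object form: built on the receiving side from the single copy of the names.
// Repeated names still collide here; this only moves the decision to the receiver.
function decodeAsObjects(payload) {
  const { columnNames, rows } = JSON.parse(payload);
  return rows.map((values) =>
    Object.fromEntries(columnNames.map((name, i) => [name, values[i]]))
  );
}

const payload = encodeResult(
  ["a", "b", "c", "c", "d", "e"],
  [[1, 2, 3, 7, 8, 9], [1, 2, 3, 1, 2, 3]]
);

console.log(decodeAsRaw(payload)[0]); // [ 1, 2, 3, 7, 8, 9 ]
console.log(decodeAsObjects(payload)[0]); // { a: 1, b: 2, c: 7, d: 8, e: 9 }
```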

kentonv · 2023-05-24T20:28:20Z

I actually think maybe the DO API should throw an exception if rows are requested in object form and it turns out there are duplicate column names. If the app doesn't realize it has duplicate column names, renaming one of them isn't going to make the code work correctly. If the app does know, then it can use AS. So adding a _1 suffix or whatever doesn't seem helpful.

elithrar · 2023-05-24T20:34:43Z

Agree with the idea of a columnNames property to minimize redundant bytes. Adds up fast. re: duplicates - a contextual error would be great. The DO API can return something like `DUPLICATE_COLUMN_NAMES` as a short form error. The D1 API can expand that into (for example): “Error: DUPLICATE_COLUMN_NAMES: Duplicate column names in result set. This typically occurs when joining multiple tables with overlapping column names. Use `AS` to provide a unique alias for column names.”
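
A small sketch of how such a check might look, assuming column names and raw rows are available as arrays. `DuplicateColumnNamesError`, `findDuplicates` and `rowsAsObjectsStrict` are hypothetical names for this example; only the `DUPLICATE_COLUMN_NAMES` code comes from the suggestion above:

```javascript
// Hypothetical error type carrying the short-form code suggested above.
class DuplicateColumnNamesError extends Error {
  constructor(duplicates) {
    super(
      "DUPLICATE_COLUMN_NAMES: Duplicate column names in result set (" +
        duplicates.join(", ") +
        "). Use AS to provide a unique alias for column names."
    );
    this.name = "DuplicateColumnNamesError";
    this.code = "DUPLICATE_COLUMN_NAMES";
    this.duplicates = duplicates;
  }
}

// Returns each repeated name once, in order of first repetition.
function findDuplicates(columnNames) {
  const seen = new Set();
  const duplicates = new Set();
  for (const name of columnNames) {
    if (seen.has(name)) duplicates.add(name);
    seen.add(name);
  }
  return [...duplicates];
}

// Builds row objects, but refuses to do so if any value would be overwritten.
function rowsAsObjectsStrict(columnNames, rawRows) {
  const duplicates = findDuplicates(columnNames);
  if (duplicates.length > 0) throw new DuplicateColumnNamesError(duplicates);
  return rawRows.map((values) =>
    Object.fromEntries(columnNames.map((name, i) => [name, values[i]]))
  );
}

try {
  rowsAsObjectsStrict(["a", "b", "c", "c", "d", "e"], [[1, 2, 3, 7, 8, 9]]);
} catch (err) {
  console.log(err.code, err.duplicates); // DUPLICATE_COLUMN_NAMES [ 'c' ]
}
```

Raw access would stay unaffected by a check like this, since it never builds keyed objects.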

geelen · 2024-01-30T00:09:28Z

FYI, this will be fixed in D1 once #1586 is merged and rolls out.

geelen · 2024-02-13T22:39:58Z

This is now live. .raw() returns the full results, regardless of column name collisions.

rozenmd mentioned this pull request May 24, 2023

🐛 BUG: when trying to left join 2 tables that have same name columns - second column is not returned cloudflare/workers-sdk#3160

Open

geelen force-pushed the glen/wip branch from 4df6e32 to e9e1d6e Compare July 24, 2023 22:34

Test showing duplicate names in rowIterator are being overwritten

324b993

geelen force-pushed the glen/wip branch 2 times, most recently from 9fade21 to 93664af Compare July 24, 2023 23:36

geelen mentioned this pull request Jul 24, 2023

Implement test case for joining two tables with identical column names #691

Closed

Using .columnNames with .raw(), the full data can be accessed

91e616f

geelen force-pushed the glen/wip branch from 93664af to 91e616f Compare July 25, 2023 03:18

geelen closed this Jul 25, 2023

geelen deleted the glen/wip branch July 25, 2023 03:19

geelen mentioned this pull request Jul 25, 2023

[SQLite] Add .columnNames property on Cursor #911

Merged

Angelelz mentioned this pull request Dec 17, 2023

[BUG]: Broken shifted columns with leftJoin and same column name (on D1) drizzle-team/drizzle-orm#555

Open

beeequeue added a commit to beeequeue/dota-matches-api that referenced this pull request Jan 7, 2024

work around D1 bug

c471b59

drizzle-team/drizzle-orm#555 cloudflare/workerd#696

beeequeue mentioned this pull request Jan 7, 2024

Replace itty-* with Hono, Workaround D1 Bug beeequeue/dota-matches-api#261

Merged
