Description
Hi! I found an issue with Zed selection detection when OpenCode uses the Zed SQLite DB.
When the selected file contains non-ASCII text before or inside the selection, OpenCode reports an incorrect line range. It looks like Zed stores selection offsets as UTF-8 byte offsets, but OpenCode treats them as JavaScript string indexes.
Environment
- Editor: Zed
- File contains Cyrillic text
Example
Given this Next.js layout file:
import type { Metadata } from 'next'
import { Inter, Manrope } from 'next/font/google'
import { Toaster } from '@/shared/ui'
import { Providers } from './_providers'
import './globals.css'
const manrope = Manrope({
  subsets: ['latin', 'cyrillic'],
  weight: ['400', '500', '600', '700'],
  variable: '--font-manrope',
})
const inter = Inter({
  subsets: ['latin', 'cyrillic'],
  weight: ['400', '500', '600', '700'],
  variable: '--font-inter',
})
export const metadata: Metadata = {
  title: 'KanbanFlow',
  description: 'Эффективное управление задачами и проектами с помощью Kanban-досок',
}
export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode
}>) {
  return (
    <html
      lang="ru"
      className={`${inter.variable} ${manrope.variable}`}
      suppressHydrationWarning
    >
      <body>
        <Providers>
          <Toaster />
          {children}
        </Providers>
      </body>
    </html>
  )
}
Actual behavior
Selections before the Cyrillic line work correctly:
- Selecting lines 13–17 is detected as 13–17
But selections that include or come after the Cyrillic line become incorrect:
- Selecting lines 19–22 is detected as 19–26
- Selecting only line 21 is detected as 21–26
- Selecting lines 35–40 is detected as 38–44
Expected behavior
OpenCode should report the same line range that is selected in Zed.
For example:
- Selecting lines 35–40 should be detected as 35–40
- Selecting only line 21 should be detected as line 21
Possible cause
In packages/opencode/src/cli/cmd/tui/context/editor-zed.ts, OpenCode reads selection_start and selection_end from Zed’s SQLite database and then uses them directly as JavaScript string offsets:
const startOffset = Math.min(row.selection_start, row.selection_end)
const endOffset = Math.max(row.selection_start, row.selection_end)
text.slice(startOffset, endOffset)
offsetsToSelection(text, startOffset, endOffset)
However, Zed appears to store these offsets as UTF-8 byte offsets, while JavaScript string indexing uses UTF-16 code units. Because of that, any Cyrillic text, emoji, or other non-ASCII characters before the selection shift the calculated line range.
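In my example, the Cyrillic description line takes more UTF-8 bytes than UTF-16 code units (each Cyrillic letter is 2 bytes in UTF-8 but 1 code unit in a JavaScript string), so every offset after it lands further forward in the string than it should.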
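To make the mismatch concrete, here is a small sketch that measures the description line from the example file in both units. It uses the standard TextEncoder API instead of Buffer so it runs without Node-specific imports; the variable names are only illustrative.

```typescript
// The description string from the example layout file.
const description =
  'Эффективное управление задачами и проектами с помощью Kanban-досок'

// UTF-16 code units: what text.slice() and string indexes count.
const utf16Units = description.length

// UTF-8 bytes: what Zed appears to store (an assumption from this report).
const utf8Bytes = new TextEncoder().encode(description).length

// Each Cyrillic letter is 2 bytes in UTF-8 but 1 UTF-16 code unit,
// so the drift equals the number of Cyrillic letters on the line.
const drift = utf8Bytes - utf16Units

console.log({ utf16Units, utf8Bytes, drift })
```

Any byte offset that points past this line is larger than the matching string index by `drift`, which would explain why selections after it are reported several lines too far down.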
Possible fix
Before calling text.slice() and offsetsToSelection(), the byte offsets from Zed should be converted into JavaScript string indexes.
Something like:
function byteOffsetToStringIndex(text: string, byteOffset: number): number {
  let bytes = 0
  for (let index = 0; index < text.length; index++) {
    // Stop once the accumulated UTF-8 byte count reaches the target offset.
    if (bytes >= byteOffset) return index
    const codePoint = text.codePointAt(index)!
    const char = String.fromCodePoint(codePoint)
    bytes += Buffer.byteLength(char, 'utf8')
    // Code points above U+FFFF take two UTF-16 code units; skip the low surrogate.
    if (codePoint > 0xffff) index++
  }
  return text.length
}
Then:
const startByteOffset = Math.min(row.selection_start, row.selection_end)
const endByteOffset = Math.max(row.selection_start, row.selection_end)
const startOffset = byteOffsetToStringIndex(text, startByteOffset)
const endOffset = byteOffsetToStringIndex(text, endByteOffset)
return {
  type: 'selection',
  selection: {
    text: text.slice(startOffset, endOffset),
    filePath: row.buffer_path,
    source: 'zed',
    selection: offsetsToSelection(text, startOffset, endOffset),
  },
}
I think this should fix incorrect ranges for Cyrillic, emoji, and other non-ASCII text when using the Zed DB.
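For anyone who wants to check the idea in isolation, here is a rough standalone sketch that could also serve as the basis for a regression test. It is not OpenCode code: `lineOfIndex` is a hypothetical stand-in for whatever `offsetsToSelection` does internally, and this variant of `byteOffsetToStringIndex` counts bytes from `charCodeAt` alone instead of calling `Buffer.byteLength`, which should be equivalent for this purpose.

```typescript
// Sketch: UTF-8 byte offset -> UTF-16 string index, using only charCodeAt.
function byteOffsetToStringIndex(text: string, byteOffset: number): number {
  let bytes = 0
  let index = 0
  while (index < text.length && bytes < byteOffset) {
    const unit = text.charCodeAt(index)
    const next = index + 1 < text.length ? text.charCodeAt(index + 1) : 0
    if (unit < 0x80) {
      bytes += 1 // ASCII
      index += 1
    } else if (unit < 0x800) {
      bytes += 2 // e.g. Cyrillic
      index += 1
    } else if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4 // surrogate pair, e.g. most emoji
      index += 2
    } else {
      bytes += 3 // rest of the BMP; lone surrogates encode as U+FFFD (3 bytes)
      index += 1
    }
  }
  return index
}

// Hypothetical helper: 1-based line number that contains a string index.
function lineOfIndex(text: string, index: number): number {
  let line = 1
  const end = Math.min(index, text.length)
  for (let i = 0; i < end; i++) {
    if (text.charCodeAt(i) === 10) line++
  }
  return line
}

// A tiny file with one Cyrillic line (line 2) followed by short lines.
const text = ['a', 'Привет, мир', 'b', 'c', 'd', 'e', 'f', 'g'].join('\n')

// Simulate what Zed appears to store for a selection starting at line 3.
const startIndex = text.indexOf('b')
const startByteOffset = new TextEncoder().encode(text.slice(0, startIndex)).length

// Using the byte offset directly as a string index lands on a later line.
const naiveLine = lineOfIndex(text, startByteOffset)
// Converting first gives the line that was actually selected.
const fixedLine = lineOfIndex(text, byteOffsetToStringIndex(text, startByteOffset))

console.log({ naiveLine, fixedLine })
```

In this sketch the naive lookup drifts past line 3 while the converted offset stays on it, which matches the behavior described in this report.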
Thanks!
Plugins
none
OpenCode version
1.14.28
Steps to reproduce
- Run Zed
- Open a file that contains non-ASCII text (for example, Cyrillic)
- Run OpenCode
- Select lines at or after the non-ASCII text
Screenshot and/or share link
[Screenshot: selection before the Cyrillic line]
[Screenshot: selection after the Cyrillic line]
Operating System
Windows 11
Terminal
Windows Terminal