Skip to content

Plaintext mode; JSONC support; and mime detection with file#30

Open
taciturnaxolotl wants to merge 3 commits intoAtaraxy-Labs:mainfrom
taciturnaxolotl:main
Open

Plaintext mode; JSONC support; and mime detection with file#30
taciturnaxolotl wants to merge 3 commits intoAtaraxy-Labs:mainfrom
taciturnaxolotl:main

Conversation

@taciturnaxolotl
Copy link

Closes #29

I also added support for detecting unknown files via the file command, so long as it exists. I also noticed in the process of doing that that JSONC breaks the current parsing so I added a fallback on JSON parse that tries JSONC parsing.

image

@rs545837
Copy link
Member

Thanks for the work, I will review it soon.

Copy link
Member

@rs545837 rs545837 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice additions! A few things to address before merging.

let beforeEntities: SemanticEntity[] = [];
let afterEntities: SemanticEntity[] = [];

const fallback = registry.getPluginById('fallback')!;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ! non-null assertion here will crash if no fallback plugin is registered. Should guard against it:

const fallback = registry.getPluginById('fallback');

Then check if (plugin !== fallback && fallback) before calling fallback.extractEntities.


// Map MIME types returned by `file --mime-type` to parser plugin IDs
const MIME_TO_PLUGIN: Record<string, string> = {
'application/json': 'json',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only covers 9 of the 20+ supported languages. Missing at minimum:

'text/x-c': 'code',
'text/x-c++src': 'code',
'text/x-java-source': 'code',
'text/x-ruby': 'code',
'text/x-php': 'code',
'text/x-csharp': 'code',
'text/x-shellscript': 'code',

If MIME detection only works for a few languages, it gives a false sense of coverage.

@@ -8,7 +9,9 @@ export class JsonParserPlugin implements SemanticParserPlugin {
extensions = ['.json'];

extractEntities(content: string, filePath: string): SemanticEntity[] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you added JSONC parsing, should .jsonc be added to the extensions list? Right now .jsonc files still fall to the fallback parser.

export function detectPluginByMime(absoluteFilePath: string): string | null {
if (!isFileCommandAvailable()) return null;
try {
const mime = execFileSync('file', ['--mime-type', '-b', absoluteFilePath], {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPlugin() is called for every file in a diff. This shells out to file synchronously for every unrecognized file that exists on disk. On repos with many extensionless files (Makefile, Dockerfile, LICENSE, etc.) this could add up. Consider caching the result per path, or at least per MIME type lookup.


extractEntities(content: string, filePath: string): SemanticEntity[] {
const parsed = JSON.parse(content);
const errors: unknown[] = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check errors.length > 0 && parsed === undefined means partially parsed JSONC with errors will silently succeed. Regular JSON.parse would have thrown on malformed .json files. This changes strictness for normal JSON. Consider keeping strict mode for .json and only using lenient JSONC parsing for .jsonc files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Plaintext based mode

2 participants