Add Mojo language support #502
|
Hey @Tokarzewski, could you please split this such that the grammar is not vendored through the PR and basically list which grammar should be checked and added through us? We would like to first audit vendored sources ourselves. So basically: Remove the vendored items, and let us know in the PR description that the dependency for this to work is the grammar XY which you fetched from the repo mentioned. |
|
Of course, apologies for the bloat. Will improve! @DeusData |
|
Thanks for adding Mojo support. Before this can move forward, we need the vendored grammar provenance tightened up: source repo, exact commit, generation command/version, license confirmation, and ideally a reproducible regeneration note or checksum. Please also link the tracking issue/language request and remove generated/session attribution from the commit message/PR body. |
Mojo (Modular) is a Python-superset systems language. Wire the standard language path (enum, extraction spec, language entry, test cases, and registration) so the hook is ready for the grammar to be re-vendored after provenance audit.

Tracking: DeusData#737
Grammar: lsh/tree-sitter-mojo @ 33193a99afe6, MIT, ABI 15, C scanner (community grammar, not in nvim-treesitter/Helix registries)

The grammar's node types mirror Python's, so the spec reuses the py_* arrays and overrides only class types:
- "struct"/"class" → class_definition
- "trait"/"__extension" → trait_definition / extension_definition (Interface)
- "fn"/"def" → function_definition
- "alias NAME = value" → assignment (no dedicated node in upstream grammar)

Signed-off-by: Tokarzewski <bartlomiej.tokarzewski@gmail.com>
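For readers skimming the commit message, the node-type mapping it lists could be sketched roughly as below. This is only an illustration: `label_t`, `node_label_t`, `mojo_node_labels`, and `mojo_label_for` are hypothetical names, not the project's actual spec structures, and treating `extension_definition` as Interface is an assumption read off the "(Interface)" note above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical label set; the project's real labels may differ. */
typedef enum { LABEL_NONE, LABEL_FUNCTION, LABEL_CLASS, LABEL_INTERFACE } label_t;

typedef struct {
    const char *node_type; /* tree-sitter node type name */
    label_t label;         /* label assigned to that node type */
} node_label_t;

/* Node type names come from the commit message; the table layout is illustrative. */
static const node_label_t mojo_node_labels[] = {
    {"function_definition", LABEL_FUNCTION},   /* fn / def */
    {"class_definition", LABEL_CLASS},         /* struct / class */
    {"trait_definition", LABEL_INTERFACE},     /* trait */
    {"extension_definition", LABEL_INTERFACE}, /* __extension (assumed Interface) */
};

/* Linear lookup; returns LABEL_NONE for node types without a dedicated label,
 * such as the "assignment" node that `alias NAME = value` is recovered as. */
static label_t mojo_label_for(const char *node_type) {
    size_t n = sizeof mojo_node_labels / sizeof mojo_node_labels[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(mojo_node_labels[i].node_type, node_type) == 0) {
            return mojo_node_labels[i].label;
        }
    }
    return LABEL_NONE;
}
```

Under these assumptions, `mojo_label_for("trait_definition")` yields `LABEL_INTERFACE`, while `mojo_label_for("assignment")` falls through to `LABEL_NONE`.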
8dcb876 to
89071d2
…support Update branch to latest upstream/main (includes ObjectScript grammars, git worktree support, repro framework, and other changes).
|
@DeusData can you help? |
|
What exactly do u need? @Tokarzewski |
…support Resolved THIRD_PARTY.md grammar count (159→160) and README.md badge to match upstream/main's ObjectScript additions.
60af06e to
0aa1aed
|
@DeusData I am unable to pass the CIs |
|
Well, you need to sign your commits. That's something I cannot do for you. Otherwise the test and lint logs should be giving u hints on what you (or ur agent) should change |
|
Thanks for asking, and sorry for the earlier short answer. I checked the logs more carefully and you were right to ask for help here. This is not only a DCO problem. DCO still needs fixing on the branch update commit, but the larger CI blocker is that the PR wires Mojo into the compiled language/test path while the vendored Mojo grammar is absent, so the linker fails with […]. That part is on the maintainer side. We have not yet had the time to audit and integrate the Mojo grammar, but we will do that now: pick the right upstream source, verify the license/provenance, run a security review on the generated/parser sources, and vendor it cleanly. Once that is in place, this PR should have a real path to green CI after rebasing/signing. So please keep this PR open. For now, the action item on your side is only the DCO cleanup; the missing grammar integration is something we need to handle. Thanks for sticking with it. |
|
Follow-up from maintainer side: I opened #744 to vendor the Mojo tree-sitter grammar after provenance/license/security review. Once that lands, this PR can rebase on top of it and should no longer carry the missing-grammar/linker blocker; the remaining contributor-side item here is the DCO cleanup. |
|
Follow-up now that the maintainer-side grammar work is complete: #744 has been merged. The Mojo grammar is now vendored in […]. So the missing […]. Thanks again for pushing this forward, and sorry again that we made you wait on a maintainer-side blocker here. Once the branch is rebased and signed, we can review the actual integration changes properly. |
What does this PR do?
Adds Mojo (Modular's Python-superset systems language) wiring — the enum, extraction spec, language entry, and tests — so the integration point is ready for the grammar to be re-vendored after provenance audit.
Spec design
The grammar's node types mirror Python's, so the spec reuses the py_* arrays and overrides only the class types (same reuse pattern already used for CFScript→js_*). Mojo-specific divergences:
- fn/def → function_definition
- struct/class → class_definition; trait and __extension get their own nodes (trait_definition/extension_definition), so traits map to the Interface label
- alias NAME = value has no dedicated grammar node; the upstream grammar recovers it as an assignment (name still captured)
What's in this PR (wiring only)
- CBM_LANG_MOJO enum (appended, no renumbering of persisted DBs)
- lang_specs.c
- .mojo and .🔥 extensions + LANG_NAMES in language.c
- scripts/new-languages.json entry
- test_grammar_regression.c case (fn→Function, struct→Class, trait→Interface) and the matching test_grammar_labels.c golden
- MANIFEST.md entry (marked PENDING; vendored files removed pending audit)
- THIRD_PARTY.md and README language count
What's NOT in this PR (deferred)
- Vendored grammar (internal/cbm/vendored/grammars/mojo/): removed from this PR per maintainer request. To build, copy the C parser + scanner from the pinned commit above into that directory and re-add grammar_mojo.c to the build.
Verification (original branch, before removal)
Indexed real Mojo corpora end-to-end:
NDArray struct correctly surfaces as the most-referenced type (in-degree 306).
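As a rough illustration of the wiring listed under "What's in this PR", the sketch below shows an appended enum value plus extension lookup for .mojo and .🔥. Only CBM_LANG_MOJO and LANG_NAMES are names taken from this PR; `cbm_lang_t`, `CBM_LANG_UNKNOWN`, `CBM_LANG_PYTHON`, `CBM_LANG_COUNT`, `EXT_TABLE`, and `lang_for_path` are hypothetical stand-ins, and the actual layout in language.c may look quite different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum. New languages are appended just before the count
 * sentinel, so values already persisted in databases keep their numbers. */
typedef enum {
    CBM_LANG_UNKNOWN = 0,
    CBM_LANG_PYTHON, /* stand-in for the existing entries */
    CBM_LANG_MOJO,   /* appended last among real languages */
    CBM_LANG_COUNT
} cbm_lang_t;

/* Display names indexed by enum value (layout assumed). */
static const char *const LANG_NAMES[CBM_LANG_COUNT] = {
    [CBM_LANG_UNKNOWN] = "unknown",
    [CBM_LANG_PYTHON] = "python",
    [CBM_LANG_MOJO] = "mojo",
};

typedef struct {
    const char *ext; /* extension including the leading dot */
    cbm_lang_t lang;
} ext_entry_t;

static const ext_entry_t EXT_TABLE[] = {
    {".py", CBM_LANG_PYTHON},
    {".mojo", CBM_LANG_MOJO},
    {".\xF0\x9F\x94\xA5", CBM_LANG_MOJO}, /* ".🔥" spelled as UTF-8 bytes */
};

/* Map a file path to a language by comparing everything from the last dot. */
static cbm_lang_t lang_for_path(const char *path) {
    const char *dot = strrchr(path, '.');
    if (dot == NULL) {
        return CBM_LANG_UNKNOWN;
    }
    size_t n = sizeof EXT_TABLE / sizeof EXT_TABLE[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dot, EXT_TABLE[i].ext) == 0) {
            return EXT_TABLE[i].lang;
        }
    }
    return CBM_LANG_UNKNOWN;
}
```

Spelling the emoji extension as explicit UTF-8 bytes keeps the sketch independent of the source file's encoding; a real implementation might simply use the literal character.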