ruby_base is the open-source Ruby analysis package that powers Graphops Ruby
scanning. It parses Ruby files with tree-sitter, builds a project dictionary in
Phase 1, loads and applies rules in Phase 2, and writes protobuf batches used by
the Graphops backend and UI.
This package is open source and released under the MIT License.
ruby_base is responsible for:
- scanning Ruby projects and finding .rb files
- parsing Ruby ASTs with tree-sitter
- building a Phase 1 dictionary of discovered classes, modules, and namespaces
- resolving semantic rules from inheritance chains
- loading rule JSON from bundled data, local rules, cache, or backend
- extracting metadata such as methods, calls, includes, callbacks, associations, validations, and namespace structure
- writing protobuf+gzip batches for upload
Graphops currently publishes two open-source packages: ruby_base and
graphops_interface. ruby_base performs the Ruby analysis. graphops_interface
orchestrates scan execution, project init, backend configuration, and upload.
- ruby_base.scanner - file discovery
- ruby_base.parser - tree-sitter parsing and AST extraction
- ruby_base.rules - rule loading and caching
- ruby_base.builders - Phase 1 and Phase 2 builders
- ruby_base.proto - protobuf schema, writer, and reader utilities
- ruby_base.type_resolver - inheritance-based semantic rule resolution
From the monorepo root:
pip install -e ./graphops_interface
pip install -e ./ruby_base
From the package directory only:
pip install -e .
graphops_interface is not published on the public PyPI index, so ruby_base
must be installed with the same private package index that hosts both packages.
pip install ruby_base \
  --extra-index-url "https://YOUR_TOKEN:@api.graphops.tech/pypi/simple/" \
  --trusted-host api.graphops.tech
You can also configure this once:
export PIP_EXTRA_INDEX_URL="https://YOUR_TOKEN:@api.graphops.tech/pypi/simple/"
export PIP_TRUSTED_HOST="api.graphops.tech"
Or in ~/.config/pip/pip.conf:
[global]
extra-index-url = https://YOUR_TOKEN:@api.graphops.tech/pypi/simple/
trusted-host = api.graphops.tech
Requirements:
- Python 3.10+
- tree-sitter
- tree-sitter-ruby
- protobuf
- graphops_interface
# Phase 1: build dictionary
ruby-base phase1 backend --exclude tmp --exclude vendor --output output/dictionary.json
# Phase 2: extract metadata using local rules
ruby-base phase2 backend --rules-dir backend/rules --output output/nodes.pb
With a backend URL for missing-rule downloads:
ruby-base phase2 backend \
  --rules-dir backend/rules \
  --backend-url http://localhost:3000 \
  --output output/nodes.pb
Enable strict ID validation:
RUBY_BASE_VALIDATE_IDS=1 ruby-base phase2 backend -o output/nodes.pb
The same two phases can be run from Python:
from pathlib import Path

from ruby_base import TypeDictionaryBuilder, MetadataExtractionBuilder

# Phase 1: build the dictionary of classes, modules, and namespaces
builder = TypeDictionaryBuilder()
dictionary = builder.build(
    root_path=Path("backend"),
    excluded_paths=["tmp", "log", "vendor", "node_modules"],
    output_path=Path("output/dictionary.json"),
)

# Phase 2: extract metadata using local rules and write protobuf output
extractor = MetadataExtractionBuilder(rules_dir=Path("backend/rules"))
extractor.build(
    root_path=Path("backend"),
    excluded_paths=["tmp", "log", "vendor", "node_modules"],
    output_path=Path("output/nodes.pb"),
    return_nodes=False,
)
Phase 1:
- scans Ruby files
- parses class and module definitions
- records inheritance references
- writes a dictionary keyed by fully qualified class/module name
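The scanning step can be sketched roughly as follows. This is an illustration only, not the actual ruby_base.scanner code: the helper name `find_ruby_files` is hypothetical, and the rule that an excluded name matches any directory component of the path is an assumption.

```python
from pathlib import Path


def find_ruby_files(root_path: Path, excluded_paths: list[str]) -> list[Path]:
    """Return .rb files under root_path as sorted paths relative to it.

    Hypothetical helper: a file is skipped when any directory component of
    its relative path matches an excluded name (an assumed matching rule).
    """
    excluded = set(excluded_paths)
    found = []
    for path in root_path.rglob("*.rb"):
        relative = path.relative_to(root_path)
        # relative.parts[:-1] holds the directory components, without the file name.
        if excluded.intersection(relative.parts[:-1]):
            continue
        if path.is_file():
            found.append(relative)
    return sorted(found)
```

Returning relative paths mirrors the relative file_path values stored in the dictionary, although the exact canonicalization ruby_base applies is not shown here.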
The current dictionary payload includes:
- id - stable node identifier
- type - structural type: class, module, or namespace
- rule - semantic rule name used by Phase 2
- file_path - canonical relative file path
Example:
{
  "ApplicationJob": {
    "id": "...",
    "type": "class",
    "rule": "active_job",
    "file_path": "app/jobs/application_job.rb"
  }
}
Phase 1 uses inheritance to derive known semantic rules. For example:
MyJob < ApplicationJob < ActiveJob::Base
This resolves to:
- structural type: class
- semantic rule: active_job
If no known inheritance-based rule is found, the fallback is:
- class for classes
- module for modules
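The inheritance lookup and its fallback can be sketched as a walk up the superclass chain. The real logic lives in ruby_base.type_resolver and may differ; `KNOWN_BASE_RULES`, `resolve_rule`, and the `parents` mapping below are hypothetical names used only for illustration.

```python
# Hypothetical mapping from framework base classes to semantic rule names.
KNOWN_BASE_RULES = {"ActiveJob::Base": "active_job"}


def resolve_rule(name: str, parents: dict[str, str], structural_type: str) -> str:
    """Walk the superclass chain of `name` until a known base class is found.

    `parents` maps a class name to its direct superclass name. When no known
    base is reached, the structural type ("class" or "module") is the fallback.
    """
    seen = set()
    current = name
    while current is not None and current not in seen:
        seen.add(current)  # guard against cyclic inheritance references
        if current in KNOWN_BASE_RULES:
            return KNOWN_BASE_RULES[current]
        current = parents.get(current)
    return structural_type


parents = {"MyJob": "ApplicationJob", "ApplicationJob": "ActiveJob::Base"}
print(resolve_rule("MyJob", parents, "class"))       # active_job
print(resolve_rule("PlainThing", parents, "class"))  # class
```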
Phase 2:
- loads the Phase 1 dictionary
- determines which rules are needed
- downloads missing rules only when required
- parses each Ruby file
- applies the rule for each node during extraction
- resolves method calls using target-node rules where needed
- writes protobuf batches for nodes and namespaces
Phase 2 uses the dictionary in two ways:
- type for structural output concerns such as class/module/namespace handling
- rule for semantic rule loading and application
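As a rough illustration of these two uses, the sketch below reads a dictionary in the shape shown in the Phase 1 example and derives the set of rules to load plus a grouping by structural type. The helper names and the sample entries (including the placeholder ids and the Billing module) are invented for the example, not taken from ruby_base.

```python
import json

# Sample payload in the Phase 1 dictionary shape; ids are placeholders.
dictionary_json = """
{
  "ApplicationJob": {"id": "a1", "type": "class", "rule": "active_job",
                     "file_path": "app/jobs/application_job.rb"},
  "Billing": {"id": "b2", "type": "module", "rule": "module",
              "file_path": "app/models/billing.rb"}
}
"""


def required_rules(dictionary: dict[str, dict]) -> set[str]:
    """Collect the distinct semantic rule names referenced by the dictionary."""
    return {entry["rule"] for entry in dictionary.values()}


def names_by_type(dictionary: dict[str, dict]) -> dict[str, list[str]]:
    """Group fully qualified names by their structural type."""
    grouped: dict[str, list[str]] = {}
    for name, entry in dictionary.items():
        grouped.setdefault(entry["type"], []).append(name)
    return grouped


dictionary = json.loads(dictionary_json)
print(sorted(required_rules(dictionary)))  # ['active_job', 'module']
print(names_by_type(dictionary))
```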
Rules live under rules/ruby/{rule_name}.json, for example:
- rules/ruby/base.json
- rules/ruby/class.json
- rules/ruby/module.json
- rules/ruby/active_job.json
- rules/ruby/sidekiq_worker.json
Each rule may include:
- extract - which metadata to extract
- extra - extra extraction flags layered on top of base
- exclude - extraction fields to remove from base
- call_mappings - call aliases such as perform_later -> perform
- method_operations - semantic operation labels such as Active Record find -> read
- callback_trigger_options - callback option parsing rules
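One way to picture how extra and exclude layer on top of base is shown below. The actual rule JSON schema is not documented here, so treating extract, extra, and exclude as lists of field names is an assumption, and `effective_extract` is a hypothetical helper rather than part of ruby_base.

```python
def effective_extract(base_rule: dict, rule: dict) -> list[str]:
    """Combine base extraction fields with a rule's `extra` and `exclude`.

    Assumed semantics: start from the base `extract` list, append `extra`
    fields that are not already present, then drop anything in `exclude`.
    """
    fields = list(base_rule.get("extract", []))
    for field in rule.get("extra", []):
        if field not in fields:
            fields.append(field)
    excluded = set(rule.get("exclude", []))
    return [field for field in fields if field not in excluded]


# Illustrative rule contents only; real rule files may look different.
base = {"extract": ["methods", "calls", "includes"]}
active_job = {"extra": ["callbacks"], "exclude": ["includes"]}
print(effective_extract(base, active_job))  # ['methods', 'calls', 'callbacks']
```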
When Phase 2 needs a rule, RuleLoader checks these sources in order:
- local --rules-dir
- bundled package rules
- local cache
- backend API
This means missing rules are only downloaded when needed.
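The lookup order amounts to a first-match chain in which the backend is consulted last. The sketch below models each source as a plain function and is not the real RuleLoader; `load_rule`, `fake_backend`, and the sample rule contents are hypothetical.

```python
from typing import Callable, Optional

RuleSource = Callable[[str], Optional[dict]]


def load_rule(rule_name: str, sources: list[tuple[str, RuleSource]]) -> tuple[str, dict]:
    """Return (source_label, rule) from the first source that has the rule."""
    for label, lookup in sources:
        rule = lookup(rule_name)
        if rule is not None:
            return label, rule
    raise LookupError(f"rule {rule_name!r} not found in any source")


backend_calls: list[str] = []


def fake_backend(name: str) -> Optional[dict]:
    """Stand-in for the backend API; records each simulated download."""
    backend_calls.append(name)
    return {"extract": []}


# Sources in the documented order; dict.get returns None on a miss.
sources = [
    ("rules-dir", {"active_job": {"extract": ["callbacks"]}}.get),
    ("bundled", {"class": {"extract": []}, "module": {"extract": []}}.get),
    ("cache", {}.get),
    ("backend", fake_backend),
]

print(load_rule("active_job", sources)[0])      # rules-dir
print(load_rule("class", sources)[0])           # bundled
print(load_rule("sidekiq_worker", sources)[0])  # backend
print(backend_calls)                            # ['sidekiq_worker']
```

Only the rule that no local source could supply reaches the simulated backend, which is the behavior described above.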
The Rails backend serves Ruby rules at:
- GET /api/v1/rules/ruby
- GET /api/v1/rules/ruby/:type
- POST /api/v1/rules/ruby/batch
The batch endpoint is used to fetch missing rules efficiently after Phase 1.
Phase 2 writes protobuf+gzip batches split by kind:
- node batches
- namespace batches
- a manifest file describing the output
The protobuf writer lives under ruby_base.proto.
Build a wheel locally:
pip install build
python -m build
ls -la dist/
A valid wheel should be a normal .whl zip archive and should include code plus
packaged rules data.
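A quick way to check both properties is to open the wheel with the standard zipfile module. The helper below is a hypothetical convenience, and it assumes the packaged rules ship as .json files somewhere inside the archive; the exact location within the wheel is not specified here.

```python
import zipfile
from pathlib import Path


def inspect_wheel(wheel_path: Path) -> dict[str, list[str]]:
    """List Python modules and JSON data files inside a wheel.

    Raises ValueError when the file is not a readable zip archive.
    """
    if not zipfile.is_zipfile(wheel_path):
        raise ValueError(f"{wheel_path} is not a valid zip archive")
    with zipfile.ZipFile(wheel_path) as archive:
        names = archive.namelist()
    return {
        "code": [n for n in names if n.endswith(".py")],
        "data": [n for n in names if n.endswith(".json")],
    }
```

Calling `inspect_wheel` on each file matched by `Path("dist").glob("*.whl")` after a build should show non-empty "code" and "data" lists if the rules were packaged.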
Helpful areas to inspect when changing behavior:
- ruby_base/builders/type_dictionary_builder.py
- ruby_base/builders/metadata_extraction_builder.py
- ruby_base/type_resolver.py
- ruby_base/rules/rule_loader.py
- ruby_base/parser/ast_extractor.py
This package is licensed under the MIT License.
See LICENSE for the full license text.