GraphopsTech/ruby_base

ruby_base

ruby_base is the open-source Ruby analysis package that powers Graphops Ruby scanning. It parses Ruby files with tree-sitter, builds a project dictionary in Phase 1, loads and applies rules in Phase 2, and writes protobuf batches used by the Graphops backend and UI.

This package is open source and released under the MIT License.

What This Package Does

ruby_base is responsible for:

  • scanning Ruby projects and finding .rb files
  • parsing Ruby ASTs with tree-sitter
  • building a Phase 1 dictionary of discovered classes, modules, and namespaces
  • resolving semantic rules from inheritance chains
  • loading rule JSON from bundled data, local rules, cache, or backend
  • extracting metadata such as methods, calls, includes, callbacks, associations, validations, and namespace structure
  • writing protobuf+gzip batches for upload
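The file-discovery step can be pictured with a minimal sketch. This is not the code in ruby_base.scanner; the function name find_ruby_files and the rule "skip a file when any of its parent directories has an excluded name" are assumptions made for illustration.

```python
from pathlib import Path


def find_ruby_files(root: Path, excluded: list[str]) -> list[Path]:
    """Return .rb files under root as relative paths, skipping excluded directories."""
    skip = set(excluded)
    found = []
    for path in sorted(root.rglob("*.rb")):
        rel = path.relative_to(root)
        # rel.parts[:-1] are the directory components; drop the file if any is excluded.
        if skip.intersection(rel.parts[:-1]):
            continue
        if path.is_file():
            found.append(rel)
    return found
```

Calling find_ruby_files(Path("backend"), ["tmp", "vendor"]) would then list every Ruby file outside those two directories.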

Open Source Packages

Graphops currently publishes two open-source packages:

  • ruby_base - performs the Ruby analysis
  • graphops_interface - orchestrates scan execution, project init, backend configuration, and upload

Package Structure

  • ruby_base.scanner - file discovery
  • ruby_base.parser - tree-sitter parsing and AST extraction
  • ruby_base.rules - rule loading and caching
  • ruby_base.builders - Phase 1 and Phase 2 builders
  • ruby_base.proto - protobuf schema, writer, and reader utilities
  • ruby_base.type_resolver - inheritance-based semantic rule resolution

Installation

Editable install for local development

From the monorepo root:

pip install -e ./graphops_interface
pip install -e ./ruby_base

From the package directory only:

pip install -e .

Install from your private Graphops package index

graphops_interface is not published on the public PyPI index, so install ruby_base from the private package index that hosts both packages.

pip install ruby_base \
  --extra-index-url "https://YOUR_TOKEN:@api.graphops.tech/pypi/simple/" \
  --trusted-host api.graphops.tech

You can also configure this once:

export PIP_EXTRA_INDEX_URL="https://YOUR_TOKEN:@api.graphops.tech/pypi/simple/"
export PIP_TRUSTED_HOST="api.graphops.tech"

Or in ~/.config/pip/pip.conf:

[global]
extra-index-url = https://YOUR_TOKEN:@api.graphops.tech/pypi/simple/
trusted-host = api.graphops.tech

Requirements

  • Python 3.10+
  • tree-sitter
  • tree-sitter-ruby
  • protobuf
  • graphops_interface

Quick Start

CLI

# Phase 1: build dictionary
ruby-base phase1 backend --exclude tmp --exclude vendor --output output/dictionary.json

# Phase 2: extract metadata using local rules
ruby-base phase2 backend --rules-dir backend/rules --output output/nodes.pb

With a backend URL for missing-rule downloads:

ruby-base phase2 backend \
  --rules-dir backend/rules \
  --backend-url http://localhost:3000 \
  --output output/nodes.pb

Enable strict ID validation:

RUBY_BASE_VALIDATE_IDS=1 ruby-base phase2 backend -o output/nodes.pb

Python API

from pathlib import Path
from ruby_base import TypeDictionaryBuilder, MetadataExtractionBuilder

builder = TypeDictionaryBuilder()
dictionary = builder.build(
    root_path=Path("backend"),
    excluded_paths=["tmp", "log", "vendor", "node_modules"],
    output_path=Path("output/dictionary.json"),
)

extractor = MetadataExtractionBuilder(rules_dir=Path("backend/rules"))
extractor.build(
    root_path=Path("backend"),
    excluded_paths=["tmp", "log", "vendor", "node_modules"],
    output_path=Path("output/nodes.pb"),
    return_nodes=False,
)

How The Two Phases Work

Phase 1: dictionary building

Phase 1:

  1. scans Ruby files
  2. parses class and module definitions
  3. records inheritance references
  4. writes a dictionary keyed by fully qualified class/module name

The current dictionary payload includes:

  • id - stable node identifier
  • type - structural type: class, module, or namespace
  • rule - semantic rule name used by Phase 2
  • file_path - canonical relative file path

Example:

{
  "ApplicationJob": {
    "id": "...",
    "type": "class",
    "rule": "active_job",
    "file_path": "app/jobs/application_job.rb"
  }
}
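For a quick sanity check of a generated dictionary, a small script along these lines may help. It only relies on the four fields listed above; the helper name check_dictionary is hypothetical, and real payloads may carry additional keys, which this sketch ignores.

```python
import json

REQUIRED_KEYS = {"id", "type", "rule", "file_path"}
STRUCTURAL_TYPES = {"class", "module", "namespace"}


def check_dictionary(text: str) -> list[str]:
    """Return human-readable problems found in a Phase 1 dictionary JSON string."""
    problems = []
    for name, entry in json.loads(text).items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        elif entry["type"] not in STRUCTURAL_TYPES:
            problems.append(f"{name}: unexpected type {entry['type']!r}")
    return problems
```

An empty list means every entry has the expected fields and a recognized structural type.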

Phase 1 uses inheritance to derive known semantic rules. For example:

MyJob < ApplicationJob < ActiveJob::Base

This resolves to:

  • structural type: class
  • semantic rule: active_job

If no known inheritance-based rule is found, the fallback is:

  • class for classes
  • module for modules
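The resolution and fallback described above can be sketched as a walk up the superclass chain. This is an illustration, not the code in ruby_base.type_resolver: the KNOWN_BASES table, the parents mapping, and the function name resolve_rule are all assumptions.

```python
# Hypothetical table mapping a framework base class to a semantic rule name.
KNOWN_BASES = {"ActiveJob::Base": "active_job"}


def resolve_rule(name: str, parents: dict[str, str], structural_type: str) -> str:
    """Follow superclass links until a known base is found, else use the structural type."""
    seen = set()
    current = name
    # The seen set guards against accidental cycles in the recorded references.
    while current is not None and current not in seen:
        seen.add(current)
        if current in KNOWN_BASES:
            return KNOWN_BASES[current]
        current = parents.get(current)
    return structural_type


# MyJob < ApplicationJob < ActiveJob::Base
parents = {"MyJob": "ApplicationJob", "ApplicationJob": "ActiveJob::Base"}
print(resolve_rule("MyJob", parents, "class"))        # active_job
print(resolve_rule("PlainService", parents, "class")) # class
```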

Phase 2: rule-driven extraction

Phase 2:

  1. loads the Phase 1 dictionary
  2. determines which rules are needed
  3. downloads missing rules only when required
  4. parses each Ruby file
  5. applies the rule for each node during extraction
  6. resolves method calls using target-node rules where needed
  7. writes protobuf batches for nodes and namespaces

Phase 2 uses the dictionary in two ways:

  • type for structural output concerns such as class/module/namespace handling
  • rule for semantic rule loading and application
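To see how the two fields differ in practice, the following sketch summarizes a loaded dictionary: the set of rule values is what Phase 2 would need to load, while the type values only describe structure. The function name summarize is hypothetical.

```python
from collections import Counter


def summarize(dictionary: dict) -> tuple[list[str], dict[str, int]]:
    """Return the distinct rule names and a count of entries per structural type."""
    rules = sorted({entry["rule"] for entry in dictionary.values()})
    types = Counter(entry["type"] for entry in dictionary.values())
    return rules, dict(types)
```

For a project with many jobs and plain classes, rules stays short (for example active_job and class) even though the type counts grow with the codebase.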

Rules

Rules live under rules/ruby/{rule_name}.json, for example:

  • rules/ruby/base.json
  • rules/ruby/class.json
  • rules/ruby/module.json
  • rules/ruby/active_job.json
  • rules/ruby/sidekiq_worker.json

Each rule may include:

  • extract - which metadata to extract
  • extra - extra extraction flags layered on top of base
  • exclude - extraction fields to remove from base
  • call_mappings - call aliases such as perform_later -> perform
  • method_operations - semantic operation labels such as Active Record find -> read
  • callback_trigger_options - callback option parsing rules
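One plausible reading of how extra and exclude layer on top of base, and how call_mappings is applied, is sketched below. The value shapes are assumptions (lists of field names for extract, extra, and exclude; a flat dict for call_mappings), and the helper names are hypothetical; consult the bundled rule files for the actual schema.

```python
def effective_fields(base: dict, rule: dict) -> list[str]:
    """Start from base's extract list, append the rule's extra, then drop its exclude."""
    fields = list(base.get("extract", []))
    for field in rule.get("extra", []):
        if field not in fields:
            fields.append(field)
    excluded = set(rule.get("exclude", []))
    return [field for field in fields if field not in excluded]


def canonical_call(rule: dict, method_name: str) -> str:
    """Translate a call alias to its target; unmapped names pass through unchanged."""
    return rule.get("call_mappings", {}).get(method_name, method_name)


base = {"extract": ["methods", "calls", "includes"]}
active_job = {
    "extra": ["callbacks"],
    "exclude": ["includes"],
    "call_mappings": {"perform_later": "perform"},
}
print(effective_fields(base, active_job))           # ['methods', 'calls', 'callbacks']
print(canonical_call(active_job, "perform_later"))  # perform
```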

Rule Loading Order

When Phase 2 needs a rule, RuleLoader checks these sources in order:

  1. local --rules-dir
  2. bundled package rules
  3. local cache
  4. backend API

This means missing rules are only downloaded when needed.
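The lookup order amounts to a first-match chain. The sketch below models each source as a callable that returns a rule dict or None; it is not the RuleLoader implementation, and load_rule and dict_source are hypothetical names.

```python
from typing import Callable, Optional

Source = Callable[[str], Optional[dict]]


def load_rule(name: str, sources: list[tuple[str, Source]]) -> tuple[str, dict]:
    """Try each source in order and return (source label, rule) for the first hit."""
    for label, fetch in sources:
        rule = fetch(name)
        if rule is not None:
            return label, rule
    raise LookupError(f"rule {name!r} not found in any source")


def dict_source(rules: dict[str, dict]) -> Source:
    """Build an in-memory stand-in for a real source such as a directory or API."""
    return lambda name: rules.get(name)


sources = [
    ("local", dict_source({"class": {"extract": ["methods"]}})),
    ("bundled", dict_source({"module": {"extract": ["methods"]}})),
    ("cache", dict_source({})),
    ("backend", dict_source({"active_job": {"extra": ["callbacks"]}})),
]
print(load_rule("module", sources)[0])      # bundled
print(load_rule("active_job", sources)[0])  # backend
```

Because the loop returns on the first hit, later sources, including the backend, are never consulted for rules that are already available locally.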

Backend Rule API

The Rails backend serves Ruby rules at:

  • GET /api/v1/rules/ruby
  • GET /api/v1/rules/ruby/:type
  • POST /api/v1/rules/ruby/batch

The batch endpoint is used to fetch missing rules efficiently after Phase 1.
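A batch request could be prepared with the standard library as shown below. The endpoint path comes from the list above, but the request body is an assumption: the "types" key is a guess based on the :type path parameter, so check the backend for the real payload shape. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_batch_request(backend_url: str, rule_names: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for several missing rules at once."""
    # Assumed body shape: {"types": [...]}; not confirmed by the backend documentation.
    body = json.dumps({"types": sorted(rule_names)}).encode("utf-8")
    return urllib.request.Request(
        url=backend_url.rstrip("/") + "/api/v1/rules/ruby/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would perform the call against a running backend such as http://localhost:3000.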

Output

Phase 2 writes protobuf+gzip batches split by kind:

  • node batches
  • namespace batches
  • a manifest file describing the output

The protobuf writer lives under ruby_base.proto.

Building and Publishing

Build a wheel locally:

pip install build
python -m build
ls -la dist/

A valid wheel should be a normal .whl zip archive and should include code plus packaged rules data.
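One way to verify that is to open the wheel with the standard zipfile module and list its contents. The helper name summarize_wheel is hypothetical, and the sketch assumes packaged rules are the .json files outside the .dist-info directory.

```python
import zipfile


def summarize_wheel(path: str) -> tuple[list[str], list[str]]:
    """Return (Python source entries, JSON data entries) found inside a wheel."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as wheel:
        names = wheel.namelist()
    # Ignore packaging metadata so only package contents are reported.
    names = [n for n in names if ".dist-info/" not in n]
    code = [n for n in names if n.endswith(".py")]
    data = [n for n in names if n.endswith(".json")]
    return code, data
```

If the second list comes back empty, the rules data was probably not included in the build.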

Development Notes

Helpful areas to inspect when changing behavior:

  • ruby_base/builders/type_dictionary_builder.py
  • ruby_base/builders/metadata_extraction_builder.py
  • ruby_base/type_resolver.py
  • ruby_base/rules/rule_loader.py
  • ruby_base/parser/ast_extractor.py

License

This package is licensed under the MIT License.

See LICENSE for the full license text.
