barturba/ruby-duplicates

ruby-duplicates

A small duplicate-code metric for Ruby.

ruby-duplicates parses Ruby with the standard library Ripper, normalizes syntax trees so names and literal values do not dominate the comparison, fingerprints method subtrees, and reports methods with high Jaccard similarity.

It is inspired by Uncle Bob's dry4clj, which applies the same broad idea to Clojure code: compare normalized structure instead of doing plain text clone detection.

This is a metric tool, not a refactoring engine. It points at suspiciously similar methods so a human or coding agent can decide whether the duplication is accidental, intentional symmetry, or data-shaped boilerplate.

Install

From RubyGems:

gem install ruby-duplicates
ruby-duplicates app lib test

From this repo:

bundle install
exe/ruby-duplicates app lib test

Usage

ruby-duplicates [options] [file-or-directory ...]

Examples:

ruby-duplicates app lib test
ruby-duplicates --threshold 0.9 --min-lines 5 --min-nodes 30 app
ruby-duplicates --json app/models app/controllers

Options:

--threshold N    Minimum similarity score, default 0.82
--min-lines N    Minimum method source lines, default 4
--min-nodes N    Minimum normalized syntax nodes, default 20
--max-results N  Maximum matches to print, default 50
--format F       text or json, default text
--json           Same as --format json
--ignore-dir N   Directory basename or path to skip; may be repeated

Example output:

ruby_duplicates candidates=3 matches=1 threshold=0.82

DUPLICATE score=1.00 shared=21
  examples/duplicate_sample.rb:1-4 alpha nodes=64
  examples/duplicate_sample.rb:7-10 beta nodes=64

How It Works

For each Ruby method, the scanner:

  1. Parses the file with Ripper.sexp.
  2. Extracts def and defs method nodes.
  3. Normalizes identifiers, constants, instance variables, globals, labels, strings, and numbers into token classes.
  4. Normalizes most non-head symbols (symbols that are not a node's leading type tag) so that small operator or name differences do not hide repeated shape.
  5. Fingerprints every normalized subtree with SHA1.
  6. Compares method fingerprint sets with Jaccard similarity.

The defaults intentionally favor high-signal matches. Lower --threshold, --min-lines, or --min-nodes when exploring.
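The steps above can be sketched in a few dozen lines of plain Ruby. This is a minimal illustration, not the gem's actual implementation: the helper names (`method_nodes`, `normalize`, `fingerprints`, `jaccard`), the `TOKEN_CLASSES` list, and the sample methods are all hypothetical. It leaves out step 4 and the --min-lines and --min-nodes filters, and reuses only the documented default threshold of 0.82.

```ruby
require "ripper"
require "digest"
require "set"

# Scanner token types collapsed to a bare class (an assumed list, not the gem's).
TOKEN_CLASSES = %i[@ident @const @ivar @gvar @label @tstring_content @int @float].freeze

SAMPLE = <<~RUBY
  def alpha(items)
    total = 0
    items.each { |item| total += item.price }
    total
  end

  def beta(rows)
    sum = 1
    rows.each { |row| sum += row.cost }
    sum
  end

  def gamma(name)
    return "anonymous" if name.nil?
    name.strip.capitalize
  end
RUBY

# Step 2: collect every :def and :defs node in the Ripper tree.
def method_nodes(node, found = [])
  return found unless node.is_a?(Array)
  found << node if node.first == :def || node.first == :defs
  node.each { |child| method_nodes(child, found) }
  found
end

# Step 3: drop source positions; replace names and literals with their token class.
def normalize(node)
  return node unless node.is_a?(Array)
  head = node.first
  if head.is_a?(Symbol) && head.to_s.start_with?("@")
    return TOKEN_CLASSES.include?(head) ? [head] : [head, node[1]]
  end
  node.map { |child| normalize(child) }
end

# Step 5: one SHA1 digest per normalized subtree, collected into a set.
def fingerprints(node, acc = Set.new)
  return acc unless node.is_a?(Array)
  acc << Digest::SHA1.hexdigest(node.inspect)
  node.each { |child| fingerprints(child, acc) }
  acc
end

# Step 6: Jaccard similarity = |A intersect B| / |A union B|.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

tree = Ripper.sexp(SAMPLE) or raise "syntax error in sample"
methods = method_nodes(tree).map do |node|
  name = node.first == :def ? node[1][1] : node[3][1]
  [name, fingerprints(normalize(node))]
end

methods.combination(2) do |(name_a, set_a), (name_b, set_b)|
  score = jaccard(set_a, set_b)
  flag = score >= 0.82 ? "DUPLICATE" : "distinct"
  puts format("%-9s %s / %s score=%.2f", flag, name_a, name_b, score)
end
```

In this sketch alpha and beta differ only in names and literal values, so their normalized trees are identical and the pair should be flagged. gamma has a different shape and should fall well below the threshold.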

Limits

  • It only scans Ruby methods, not arbitrary repeated blocks.
  • It is structural, not semantic.
  • Metaprogrammed code can look sparse because the useful behavior is hidden in data.
  • Rails controllers and tests can produce intentional symmetry. Treat those as review candidates, not automatic refactors.
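To make the "structural, not semantic" limit concrete, here is a small sketch of two methods that do opposite things yet have the same shape once names and literals are collapsed. It uses Ripper.lex token streams as a simplified stand-in for the tree comparison described above. The `shape` helper, the `CLASS_ONLY` and `SKIPPED` lists, and both sample methods are assumptions made for illustration, not part of the tool.

```ruby
require "ripper"

# Token types reduced to their class; the text is ignored (assumed list).
CLASS_ONLY = %i[on_ident on_const on_int on_float on_tstring_content].freeze
# Layout-only tokens that are dropped entirely.
SKIPPED = %i[on_sp on_nl on_ignored_nl on_comment].freeze

# Returns the token stream with names and literal values collapsed.
def shape(source)
  Ripper.lex(source).filter_map do |(_pos, type, text, _state)|
    next if SKIPPED.include?(type)
    CLASS_ONLY.include?(type) ? type : [type, text]
  end
end

GRANT = <<~RUBY
  def grant_access(user)
    user.roles.add("admin")
  end
RUBY

REVOKE = <<~RUBY
  def revoke_access(user)
    user.roles.delete("admin")
  end
RUBY

# Same shape, opposite behavior: a structural metric cannot tell them apart.
puts shape(GRANT) == shape(REVOKE)
```

A pair like this is a review candidate, not necessarily a refactoring target: the shared shape is real, but whether merging the two methods would help is a judgment the metric cannot make.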

Development

ruby -Ilib test/ruby_duplicates_test.rb
gem build ruby-duplicates.gemspec

Inspiration
