This repository has been archived by the owner on Jun 4, 2019. It is now read-only.

CodeQuery

Dan Miller edited this page Nov 9, 2015 · 12 revisions

Code query

Introduction

codequery is an interactive tool, à la SQL, for querying information about the structure of code (the inheritance tree, the call graph, the data graph, etc.). The data is the code. The query language is Prolog (http://en.wikipedia.org/wiki/Prolog), a logic-based programming language used mainly in AI but also popular in databases (http://en.wikipedia.org/wiki/Datalog). The particular Prolog implementation we use is SWI-Prolog (http://www.swi-prolog.org/pldoc/refman/).

By default, when you give codequery just a directory, it builds the Prolog database and then enters Prolog's read-eval-print loop. After the ?- prompt, you can enter a query followed by a dot. For instance:

$ cd /tmp/test/
$ cat foo.php
  <?php
  class A { 
  }
  class B extends A { 
  }
  class C extends B { 
  }
$ codequery .
  generating prolog facts in /tmp/test/facts.pl
  compiling prolog facts with swipl in /tmp/test/prolog_compiled_db
  % /tmp/test/facts.pl compiled 0.00 sec, 13,984 bytes
  % /home/pad/pfff/h_program-lang/database_code.pl compiled 0.00 sec, 19,072 bytes
  ...
  Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 5.11.29-39-g35fdbf2)
  Copyright (c) 1990-2011 University of Amsterdam, VU Amsterdam
  SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
  and you are welcome to redistribute it under certain conditions.
  Please visit http://www.swi-prolog.org for details.

  For help, use ?- help(Topic). or ?- apropos(Word).

  ?- children(X, 'A').

Prolog will then try to find a solution to this query by unifying X with a value that satisfies the query, given all the facts in the database (see facts.pl in the same directory). It shows only one solution at a time. To get the next solution, type a semicolon:

X = 'B' ;
X = 'C' ;
false.

?- 
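To see why both 'B' and 'C' come back for the query above, it can help to model the inheritance facts by hand. The following is a minimal standalone sketch, not the actual contents of facts.pl or database_code.pl. The names toy_extends/2 and toy_children/2 are hypothetical and exist only for this illustration.

```prolog
% Standalone illustration; save as a file and load it with swipl.
% toy_extends/2 and toy_children/2 are hypothetical names, not codequery's.

% Direct inheritance edges for the foo.php example above.
toy_extends('B', 'A').
toy_extends('C', 'B').

% A class is a child of its direct parent ...
toy_children(Child, Parent) :-
    toy_extends(Child, Parent).
% ... and of every ancestor that one of its parents descends from.
toy_children(Descendant, Ancestor) :-
    toy_extends(Child, Ancestor),
    toy_children(Descendant, Child).
```

With this file loaded, ?- toy_children(X, 'A'). answers X = 'B', then X = 'C' after a semicolon, then false, which mirrors the session shown above.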

See https://github.com/facebook/pfff/blob/master/main_codequery.ml

Motivations

Synopsis

The synopsis is:

$ codequery [-lang <string>] <dir>

Files

The facts.pl file generated in the directory will contain the set of facts about your codebase.

The pfff/h_program-lang/database_code.pl file contains some helper predicates. See https://github.com/facebook/pfff/blob/master/h_program-lang/database_code.pl to find out which predicates are available and what they mean.

See also https://github.com/facebook/pfff/blob/master/lang_php/analyze/foundation/unit_prolog_php.ml for example queries.

Examples

Listing all children of a class or interface

children(X, 'InterestingClass'), writeln(X), fail.
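The queries in this section end with writeln(...), fail. This is the usual Prolog failure-driven loop: fail forces backtracking into the earlier goals, so every solution is printed without typing semicolons. The sketch below shows the pattern on a plain list, using only standard predicates, together with forall/2 as an alternative that some may find easier to read.

```prolog
% Failure-driven loop: fail forces backtracking into member/2,
% so writeln/1 runs once per solution.
?- member(X, [a, b, c]), writeln(X), fail.
% prints a, b and c on separate lines, then answers false.

% forall/2 runs the action for every solution of the generator.
?- forall(member(X, [a, b, c]), writeln(X)).
% prints the same three lines, then answers true.
```

The final false of the first form is expected. It only means that no solutions are left.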

Listing all parents of a class or interface

children('InterestingClass', X), writeln(X), fail.

Finding all classes in a folder

kind(X, class), at(X, A, _), sub_string(A, 0, _, _, 'some/interesting/code'), writeln(X), fail.

Finding all classes which extend some parent class, and which contain a function which calls another function

In this example, we look for all subclasses of WebController that have a genResponse() method which calls a getResponse() method.

children(X, 'WebController'), docall((X, 'genResponse'), getResponse, method), writeln(X), fail.

Detecting useless delegateToYield wrapper methods for which no parent defines a similar method

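This example comes from FB's wiki. Reading the query, it lists every (Class, Method) pair where Method calls delegateToYield and no ancestor of Class defines anything with the same name.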

docall((Class, Method), 'delegateToYield', method), not((children(Class, Parent), kind((Parent, Method), Kind))), writeln((Class, Method)), fail.

Finding the most used functions

aggregate(count, A^docall(A, B, function), Count), writeln((Count, B)), fail.

Then pipe the output to sort -rn | head -50.
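The ranking can also be done inside Prolog instead of in the shell. The following is a sketch under assumptions: it needs SWI-Prolog 7 or later (for sort/4), toy_docall/3 is a hypothetical stand-in for the real docall/3 facts, and top_called/2 is a name made up for this illustration.

```prolog
:- use_module(library(aggregate)).
:- use_module(library(lists)).

% Hypothetical call facts: toy_docall(Caller, Callee, Kind).
toy_docall(f, strlen, function).
toy_docall(g, strlen, function).
toy_docall(g, count,  function).

% top_called(+N, -Top): Top holds at most N Count-Callee pairs,
% most frequently called first.
top_called(N, Top) :-
    findall(Count-Callee,
            aggregate(count, Caller^toy_docall(Caller, Callee, function), Count),
            Pairs),
    % Key 0 compares whole Count-Callee terms; @>= sorts in
    % descending order and keeps duplicates.
    sort(0, @>=, Pairs, Sorted),
    length(Sorted, Len),
    Take is min(N, Len),
    length(Top, Take),
    append(Top, _, Sorted).
```

With these toy facts, ?- top_called(1, Top). gives Top = [2-strlen]. To try it against a real database, replace toy_docall with docall.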

Finding all classes implementing an interface

children(X, 'IInterestingInterface'), kind(X, class), writeln(X), fail.

You may want to pipe the output to sort | uniq.

Finding calls to new inside constructors

kind((A, '__construct'), _), docall((A, '__construct'), B, class), at((A, '__construct'), File, Line), writeln((File, A, B)), fail.

Listing all the calls to builtins and their count

docall(X, B, function), at(B, File, Col), file(File, Dir), member('PHP_STDLIB', Dir), writeln(B), fail.

You may want to pipe the output to sort | uniq -c.

Listing all read-only private fields

kind((C,F), field), is_private((C,F)), use((C,'__construct'), F, field, write), \+ (use((C, M), F, field, write), M \= '__construct'), type((C,F),T), writeln((C,F,T)), fail.

Listing all private static methods in traits

kind(Trait, trait), kind((Trait, Method), method), is_private((Trait, Method)), static((Trait, Method)), writeln((Trait, Method)), fail.

Find how many children all classes have, write results to a file

kind(X, class), aggregate_all(count, children(_, X), Count), open('results.txt', append, Stream), write(Stream, (X, Count)), nl(Stream), close(Stream), fail.
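The query above reopens results.txt in append mode for every class. A possible alternative, sketched below, opens the file once and lets setup_call_cleanup/3 close it even if something goes wrong. The toy_kind/2 and toy_children/2 facts and the write_child_counts/1 predicate are hypothetical names for this illustration. Note that the sketch opens the file in write mode, so it overwrites earlier results instead of appending to them.

```prolog
:- use_module(library(aggregate)).

% Hypothetical facts standing in for kind/2 and children/2.
toy_kind('A', class).
toy_kind('B', class).
toy_kind('C', class).
toy_children('B', 'A').
toy_children('C', 'A').
toy_children('C', 'B').

% write_child_counts(+File): write one (Class, Count) line per class.
write_child_counts(File) :-
    setup_call_cleanup(
        open(File, write, Stream),          % open once, truncating File
        forall(( toy_kind(Class, class),
                 aggregate_all(count, toy_children(_, Class), Count) ),
               ( write(Stream, (Class, Count)),
                 nl(Stream) )),
        close(Stream)).                     % always runs, even on error
```

It would be called as ?- write_child_counts('results.txt').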

About Prolog

Why use Prolog? OCaml, now Prolog... Why use those esoteric French programming languages? Because I don't know how to use SQL or PHP, and Prolog is arguably a very good language for querying a database. See http://en.wikipedia.org/wiki/Datalog

FAQ

How do I exit from this fucking interpreter?

Press Ctrl-D multiple times, or enter halt. at the ?- prompt.