codemap, codegraph: OUD 2013 notes

-*- org -*-

slides

pfff: PHP program analysis at FB

deadcode reaper (integrated with the code review process)
use/def checker (undefined functions/classes/constants)
variable checker (unused variables)
sgrep lint (makes it easy for people to write lint rules)
taint analysis via an abstract interpreter (XSS, code injection)
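At its core, a use/def checker compares every used name against the set of global definitions. Here is a minimal sketch. It assumes each file has already been parsed into a hypothetical pair of sets (names defined, names used). The real checker works on PHP ASTs and also distinguishes functions, classes and constants, and none of that is reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pre-parsed input: file path -> (names defined, names used).
FileInfo = Tuple[Set[str], Set[str]]

def undefined_uses(files: Dict[str, FileInfo], builtins: Set[str]) -> List[Tuple[str, str]]:
    """Return (file, name) for every use that has no definition anywhere."""
    defined: Set[str] = set(builtins)
    for defs, _uses in files.values():
        defined |= defs  # definitions are global: any file can define a name
    errors = []
    for path in sorted(files):
        _defs, uses = files[path]
        for name in sorted(uses - defined):
            errors.append((path, name))
    return errors

if __name__ == "__main__":
    files = {
        "a.php": ({"foo"}, {"bar", "strlen"}),
        "b.php": ({"bar"}, {"foo", "baz"}),
    }
    print(undefined_uses(files, builtins={"strlen"}))  # [('b.php', 'baz')]
```

The first pass collects definitions globally because PHP functions can be used before the file that defines them is loaded. A per-file check would report false positives.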

#- type inference

type checker daemon (see Julien's CUFP talk)
separation logic? (Monoidics, the OCaml startup acquired by FB)

#- datalog interprocedural analysis?

tools to help understand a large codebase

subtitle: codemap, codegraph, codequery

terrified when I joined: 5M lines of PHP, IMHO badly organized. visualization + program analysis, 30-inch screen

huge challenge: understanding a large codebase, code written by other people. In fact the common solution people pick is to start from scratch; there have to be tools to help

demo codemap

first thing I did: try to visualize the whole codebase. If you cannot visualize the mess, you cannot understand it

Google Maps for code: a treemap where color = aspect, e.g. test code (=> visual clue for test coverage), core code. Helps spot huge subdirectories => I actually deleted 1M LOC at FB with that :)
##treemap is code-oriented: filters out .o files etc., also skips code; as you zoom in, renders the content of the files
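These notes do not describe codemap's own layout algorithm. To illustrate the basic treemap idea (area proportional to LOC), here is the classic slice-and-dice layout. It alternates vertical and horizontal cuts at each directory level. The `Node` type is hypothetical, and the tree is assumed to be pre-filtered, with no `.o` files. A squarified layout gives better aspect ratios but needs more code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height

@dataclass
class Node:
    name: str
    loc: int = 0  # lines of code for a file; directories sum their children
    children: List["Node"] = field(default_factory=list)

    def total(self) -> int:
        if not self.children:
            return self.loc
        return sum(c.total() for c in self.children)

def slice_and_dice(node: Node, rect: Rect, depth: int = 0,
                   out: Optional[List[Tuple[str, Rect]]] = None) -> List[Tuple[str, Rect]]:
    """Give each child a share of rect proportional to its LOC, cutting
    vertically at even depths and horizontally at odd depths."""
    if out is None:
        out = []
    out.append((node.name, rect))
    total = node.total()
    if not node.children or total == 0:
        return out
    x, y, w, h = rect
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.children:
        frac = child.total() / total
        if depth % 2 == 0:
            sub = (x + offset * w, y, frac * w, h)  # children side by side
        else:
            sub = (x, y + offset * h, w, frac * h)  # children stacked
        offset += frac
        slice_and_dice(child, sub, depth + 1, out)
    return out

if __name__ == "__main__":
    www = Node("www", children=[
        Node("lib", children=[Node("lib/a.php", 300), Node("lib/b.php", 100)]),
        Node("index.php", 400),
    ])
    for name, r in slice_and_dice(www, (0, 0, 100, 100)):
        print(name, r)
```

A huge subdirectory then shows up as a huge rectangle, which is what makes it visible at a glance.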

precise identifier highlighting, bad smells, multi-column tiling (like the xmonad WM), use your eyes to scroll, emacs integration, semantic feedback (bigger road = bigger function, more important)

pattern of purple => type defs: you don't even have to open the file to get an overview of its content. Use your eyes to scroll, not keys

layers (age, number of authors, coverage, cyclomatic complexity, etc.); a layer for the number of users?

demo codegraph

The more I used it, the more I realized I wanted to understand the “software architecture”: focus not on source code but on code relationships! What are the entry points? What is the core code? What code depends on it? etc.

package mode, external mode, module mode, gephi, flibotomy: lots of attempts. A graph? I tried, but it does not scale, and I need flexibility: visualize different granularities, different focuses => DSM (dependency structure matrix).

the same entities on the left and at the top; each cell is the number of times x uses y (call, import, etc.), aggregated. hypertree

good structure = layered = empty upper-right triangle (actually enforced by the OCaml linker). Core code at the top, entry points at the bottom. Unfold and reslice to see patterns more easily; visualize the mess (you can’t fix what you can’t easily see). Backward deps usually mean ugly hacks, things that need to be documented anyway
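To make "empty upper-right triangle" concrete, here is a minimal sketch. It builds a DSM from a dependency count and lists the cells above the diagonal. The convention is an assumption and may not match codegraph's actual orientation: rows are users, columns are used entities, and core code comes first. Under that convention, a nonzero cell above the diagonal means something uses an entity placed below it.

```python
from typing import Dict, List, Tuple

Uses = Dict[Tuple[str, str], int]  # (user, used) -> number of uses

def build_dsm(order: List[str], uses: Uses) -> List[List[int]]:
    """dsm[i][j] = how many times order[i] uses order[j] (calls, imports, ...)."""
    index = {name: i for i, name in enumerate(order)}
    dsm = [[0] * len(order) for _ in order]
    for (user, used), count in uses.items():
        dsm[index[user]][index[used]] += count
    return dsm

def backward_deps(dsm: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Nonzero cells above the diagonal: row i uses column j although j is
    placed below i. With core code first, these are the layering violations."""
    n = len(dsm)
    return [(i, j, dsm[i][j]) for i in range(n) for j in range(i + 1, n) if dsm[i][j]]

if __name__ == "__main__":
    order = ["commons", "parsing", "main"]  # core code at the top
    uses = {("parsing", "commons"): 3, ("main", "parsing"): 2, ("main", "commons"): 1}
    print(backward_deps(build_dsm(order, uses)))  # [] => layered
    uses[("commons", "main")] = 1                  # a backward dep
    print(backward_deps(build_dsm(order, uses)))  # [(0, 2, 1)]
```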

PLUG: NP problem: find a minimal reorganization that yields more layers

hard to see the value, but when I plan to change something, I quickly look at the deps; it helps evaluate the difficulty.

demo codemap + codegraph

more semantic feedback in codemap

bottom-up layer (good at the macro level, and also at the micro level); a layer for the number of users?
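The notes don't define the bottom-up layer precisely. One plausible reading is this: layer 0 holds the entities that use nothing, and each other entity sits one layer above the highest layer it uses. A minimal sketch under that assumption follows. It peels off ready entities round by round. Self-uses such as recursion are ignored. Entities caught in a cycle, or depending on one, are returned separately. A real tool would probably collapse each cycle into one node first.

```python
from typing import Dict, Set, Tuple

def bottom_up_layers(uses: Dict[str, Set[str]]) -> Tuple[Dict[str, int], Set[str]]:
    """Layer 0 = entities that use nothing else; an entity lands in layer k+1
    once everything it uses sits in layers <= k (self-uses are ignored).
    Entities in a cycle, or depending on one, never get a layer."""
    nodes: Set[str] = set(uses)
    for deps in uses.values():
        nodes |= deps
    layer: Dict[str, int] = {}
    remaining = set(nodes)
    current = 0
    while remaining:
        ready = {n for n in remaining
                 if all(d in layer for d in uses.get(n, set()) if d != n)}
        if not ready:
            break  # only cycles (and their users) are left
        for n in ready:
            layer[n] = current
        remaining -= ready
        current += 1
    return layer, remaining

if __name__ == "__main__":
    uses = {"main": {"parser", "utils"}, "parser": {"utils"}, "utils": set(),
            "a": {"b"}, "b": {"a"}, "fact": {"fact"}}
    print(bottom_up_layers(uses))
```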

uses and users at the file level (dead code); uses and users at a fine-grained level, e.g. fields. When the answer is immediate (no need to run git grep), it's a game changer: you can scroll through a set of fields and immediately see whether each one is used, and where.
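"Users" is just the "uses" relation inverted, and an entity with no users that is not an entry point is a dead-code candidate. A minimal sketch over a hypothetical `uses` map follows. The same code works for files or for fine-grained entities such as fields, as long as they get unique names.

```python
from collections import defaultdict
from typing import Dict, Set

def users_of(uses: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the 'uses' relation: users[y] = every x that uses y."""
    users: Dict[str, Set[str]] = defaultdict(set)
    for user, deps in uses.items():
        for dep in deps:
            users[dep].add(user)
    return users

def dead_candidates(uses: Dict[str, Set[str]], entry_points: Set[str]) -> Set[str]:
    """Entities nobody else uses (self-uses don't count), minus entry points."""
    users = users_of(uses)
    all_nodes = set(uses) | {d for deps in uses.values() for d in deps}
    return {n for n in all_nodes
            if not (users.get(n, set()) - {n}) and n not in entry_points}

if __name__ == "__main__":
    uses = {"index.php": {"lib.php"}, "lib.php": {"util.php"},
            "old.php": {"util.php"}, "util.php": set()}
    print(dead_candidates(uses, entry_points={"index.php"}))  # {'old.php'}
```

This only finds directly dead code. Deleting `old.php` can make its own dependencies dead in turn, so a reaper would iterate.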

reslice => focus on the current task

other tools to help understand and navigate

codequery: Prolog on the codebase (e.g. call(X, 'foo'), not(children(X, 'bar'))); some nice queries written by engineers, e.g. yesterday by abe
stags (many people need that)
sgrep/spatch (fuzzy level), also used for sgrep lint
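The heart of sgrep is structural matching with metavariables. An uppercase name like `X` in the pattern matches any code, and a repeated metavariable must match the same code each time. Here is a toy sketch of that matching core. It runs over a hypothetical tuple-encoded AST. The real sgrep works on full language ASTs and supports more than this.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical AST encoding: ("call", name, [args]) for calls, ("id", name) for
# identifiers. In a pattern, ("id", "X") with an all-uppercase name is a metavariable.
Env = Dict[str, Any]

def is_metavar(node: Any) -> bool:
    return (isinstance(node, tuple) and len(node) == 2 and node[0] == "id"
            and isinstance(node[1], str) and node[1].isupper())

def match(pat: Any, code: Any, env: Env) -> Optional[Env]:
    """Return env extended with new bindings if pat matches code, else None."""
    if is_metavar(pat):
        name = pat[1]
        if name in env:  # a repeated metavariable must bind the same code
            return env if env[name] == code else None
        return {**env, name: code}
    same_shape = (isinstance(pat, (tuple, list)) and type(pat) is type(code)
                  and len(pat) == len(code))
    if same_shape:
        for p, c in zip(pat, code):
            env = match(p, c, env)
            if env is None:
                return None
        return env
    return env if pat == code else None

def subterms(code: Any):
    """Yield code and every nested tuple, list and leaf, depth-first."""
    yield code
    if isinstance(code, (tuple, list)):
        for child in code:
            yield from subterms(child)

def sgrep(pat: Any, code: Any) -> List[Tuple[Any, Env]]:
    results = []
    for term in subterms(code):
        env = match(pat, term, {})
        if env is not None:
            results.append((term, env))
    return results

if __name__ == "__main__":
    program = ("block", [("call", "foo", [("id", "a"), ("id", "a")]),
                         ("call", "foo", [("id", "a"), ("id", "b")])])
    # Pattern foo(X, X): only the call whose two arguments are identical matches.
    print(sgrep(("call", "foo", [("id", "X"), ("id", "X")]), program))
```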

for many languages (parsers, use/def global analysis (graph_code), matchers, visitors, etc.): OCaml (thanks to .cmt files), PHP, Java (thanks to javalib, joust), C/C++ (thanks to clang), Javascript, …

conclusion?

pfff-web

syncweb? In some sense it also helps with understanding a large codebase. ocamltarzan?

PLUG: uncaught exceptions are a recurring problem, especially in crons: change something and boom, and later you have to catch it

ocaml side notes

side notes during the talk about OCaml complaints or positive stuff:

ocamltarzan: visitor, mapper, dumper (thanks to 'a. polymorphic annotations)
ocaml.ml generic dumper (Gazagnaire)

need ocamltarzan; -dump_xxx of the AST is useful for beginners. need syncweb? :)

codemap/codegraph

common thread: huge codebase, terrified => a set of tools to help. idea: an intuition that visualization could maybe help; huge screen, make use of it

Google Maps on code => treemap + code thumbnails

archi_code
when fully zoomed, column layout to maximize space

better than emacs: identifier coloring. syntactic: uses of refs (:= or <-) shown big and in purple. light db => semantic info, important stuff. layers

codegraph: focus on deps, understand the global organization, the software architecture; understand “layers”

QUESTION for the audience: a tool to help find a better organization? Probably NP-complete, but heuristics? Minimize the elements in the upper triangle; the organization is hierarchical, so the operations are move parent, move children. Monte-Carlo?
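Ignoring the hierarchy, finding an order for a flat DSM that minimizes the weight above the diagonal is the weighted minimum feedback arc set problem, which is NP-hard. That makes heuristics the realistic route. Here is a minimal Monte-Carlo-style sketch: hill climbing over random swaps, where a swap is kept if it does not increase the cost. The function names are hypothetical. It reorders a flat list only. Respecting the hierarchy would mean swapping only siblings under the same parent, which is not shown.

```python
import random
from typing import Dict, List, Tuple

Uses = Dict[Tuple[str, str], int]  # (user, used) -> number of uses

def upper_cost(order: List[str], uses: Uses) -> int:
    """Total weight above the diagonal: x uses y but y is placed after x."""
    pos = {name: i for i, name in enumerate(order)}
    return sum(w for (x, y), w in uses.items() if pos[y] > pos[x])

def improve_order(order: List[str], uses: Uses, iterations: int = 1000,
                  seed: int = 0) -> Tuple[List[str], int]:
    """Random-swap hill climbing; returns (order, cost), cost never above the start."""
    rng = random.Random(seed)
    best = list(order)
    best_cost = upper_cost(best, uses)
    if len(best) < 2:
        return best, best_cost
    for _ in range(iterations):
        i, j = rng.sample(range(len(best)), 2)  # two distinct positions
        cand = list(best)
        cand[i], cand[j] = cand[j], cand[i]
        cost = upper_cost(cand, uses)
        if cost <= best_cost:  # accept sideways moves to drift across plateaus
            best, best_cost = cand, cost
    return best, best_cost

if __name__ == "__main__":
    order = ["main", "parsing", "commons"]  # entry point first: everything is backward
    uses = {("main", "parsing"): 2, ("parsing", "commons"): 3, ("main", "commons"): 1}
    print(upper_cost(order, uses), improve_order(order, uses))
```

Pure hill climbing can get stuck in a local minimum. Simulated annealing, which sometimes accepts worse swaps, is the usual next step.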

last iteration: codemap + codegraph integration

bottom-up layer (todo: special key to see what the structure is)
file-level deps, refocus the treemap!
fine-grained level

the bottom-up layer is super nice.

other tools:

prolog queries (bill); at this scale of code => need tools to automate. e.g. all children of a class implement interface I => I can be moved up to the parent.
sgrep, spatch
precise tags
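In codequery, the "move I up to the parent" check would be a Prolog query over facts about class inheritance and interface implementation. Here is a sketch of the same logic in plain Python. It uses hypothetical `extends` and `implements` fact sets as (child, parent) and (class, interface) pairs, and is not the actual codequery predicates.

```python
from collections import defaultdict
from typing import Set, Tuple

def hoistable(extends: Set[Tuple[str, str]],
              implements: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """(parent, interface) pairs where every direct child of parent implements
    interface but parent itself does not: candidates to move the interface up."""
    children = defaultdict(set)
    for child, parent in extends:
        children[parent].add(child)
    interfaces = defaultdict(set)
    for cls, itf in implements:
        interfaces[cls].add(itf)
    result = set()
    for parent, kids in children.items():  # kids is never empty here
        common = set.intersection(*(interfaces[k] for k in kids))
        for itf in common - interfaces[parent]:
            result.add((parent, itf))
    return result

if __name__ == "__main__":
    extends = {("A", "Base"), ("B", "Base")}
    implements = {("A", "I"), ("B", "I"), ("B", "J")}
    print(hoistable(extends, implements))  # {('Base', 'I')}
```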

demo on the OCaml source code, since the audience is familiar with it.

4 years at FB

reuse what I presented at IRISA? Lessons learned :) Engler's work … but wait, first we need to detect the easy bugs.

common thread: huge codebase, terrified => a set of tools to help (reaper, t, coverage, lint, … codemap … codegraph)

stats: #lines removed, #bugfix diffs, #lint rules, sgrepLint; pfff_logger stats?

tags (stats? hsh on all servers and check whether www has a symlink to TAGS?); prolog; sgrep, spatch (git log and search for codemod/spatch?); codemap, codegraph: software architecture!!

failures: codemap, because of X11? => pfff-web. Weird, but a small barrier and boom, people don't use it. Also no actions. FBIDE focused on a few core things, which was useful, especially for beginners (entity search + string search, completion, go-to-definition, good colors with no config). failures? codegraph needs X11? not enough marketing? => pfff-web better? We have no software architects at FB.

success: prolog, spatch, tags

lessons: you need to push your idea a lot. Cf. Hack.

see Google’s paper; similarity: cmf -n

skip_code => big improvement in my process: gradual fixing made tractable and reviewable.
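The notes don't spell out how skip_code works. A plausible reading is a list of known-bad paths that a checker ignores. The rest of the codebase then stays clean, and the list can shrink one reviewable diff at a time. Here is a minimal sketch under that assumption. The file format (one path prefix per line, `#` comments) and the function names are hypothetical.

```python
from typing import List, Tuple

def parse_skip_list(text: str) -> List[str]:
    """One path prefix per line; blank lines and '#' comments are ignored."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries

def filter_errors(errors: List[Tuple[str, str]], skip: List[str]) -> List[Tuple[str, str]]:
    """Drop (path, message) errors whose path starts with a skipped prefix."""
    return [(path, msg) for path, msg in errors
            if not any(path.startswith(prefix) for prefix in skip)]

if __name__ == "__main__":
    skip = parse_skip_list("# legacy code, fix gradually\nlib/legacy/\n")
    errors = [("lib/legacy/a.php", "unused variable $x"),
              ("lib/new/b.php", "undefined function foo")]
    print(filter_errors(errors, skip))  # [('lib/new/b.php', 'undefined function foo')]
```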
