checkero

A Clojure code similarity search tool.

Checkero finds similar Clojure source code across a set of directories. It is primarily intended for studying how Clojure learners write functions. As a side effect, it can help you check whether students completed their homework independently. It can also be used to find frequently repeated code patterns that are candidates for refactoring. The algorithm analyzes the syntactic structure of Clojure programs and uses a state-of-the-art tree distance function to quickly find similar expressions.

Usage

java -jar checkero-0.1.0-SNAPSHOT-standalone.jar 8 100 24 30 source-folder

Parameters are positional and must be given in this order (in the example above: mnode = 8, h = 100, k = 24, range = 30):

mnode: Minimum number of syntax nodes an expression must have to be considered (shorter expressions are too common to be informative).
h: Number of top hot-spot expressions to report across the directories.
k: Number of closest matches to report per query.
range: Only report matches that differ by at most this many complete sub-trees.
source-folder: A folder that contains one sub-folder per student.
              Each sub-folder inside "source-folder" is treated
              as one student's homework.
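
Because the arguments are positional, it is easy to swap two numbers by accident. The following is a minimal sketch in Python of a wrapper that names each value before building the command line. The function name `checkero_command`, the `match_range` argument name, and the integer check are illustrative assumptions and not part of checkero itself.

```python
from pathlib import Path


def checkero_command(mnode, h, k, match_range, source_folder,
                     jar="checkero-0.1.0-SNAPSHOT-standalone.jar"):
    """Build the argument list for one checkero run (hypothetical helper).

    The order mirrors the positional parameters documented above:
    mnode, h, k, range, source-folder.
    """
    numbers = {"mnode": mnode, "h": h, "k": k, "range": match_range}
    for name, value in numbers.items():
        # Assumption: all four numeric parameters are plain integers.
        if not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
    return ["java", "-jar", str(jar),
            str(mnode), str(h), str(k), str(match_range),
            str(Path(source_folder))]


# Same values as the usage line above.
cmd = checkero_command(mnode=8, h=100, k=24, match_range=30,
                       source_folder="source-folder")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start the search.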

Output

The script finds common sub-expressions and prints some global statistics.

Hot-spots

Prints out the most commonly used sub-expressions.

For example:

@@@ Commonly found expressions in the homework folder:
>>>       Student-Name
          [/path/Core.clj]
[Original]   (defn distance [user-seq matrix] (create-Matrix user-seq matrix))
[Normalized] (defn "s0" ["s1" "s2"] ("s3" "s1" "s2"))
[Multiplicity]    237

Here you can see:

The student responsible for the common expression.
The path of the file that contains the expression.
The original expression.
The normalized expression (this is the form actually used for matching).
Multiplicity: the number of times this hot-spot appears in the entire set of files.
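
The normalized form in the sample suggests that symbols are renamed to "sN" and keywords to "kN", with a single counter shared by both and numbers assigned in order of first appearance. The sketch below, in Python, reproduces that renaming for the sample expression. It is inferred only from the output shown in this README and is not checkero's actual implementation. The `KEPT` set is an assumption: the samples only show that `defn` and `ns` are left unchanged. String literals, numbers, comments, and reader macros are not handled.

```python
import re

# A token is either a single bracket or a run of non-space, non-bracket characters.
TOKEN = re.compile(r"[()\[\]{}]|[^\s()\[\]{}]+")
BRACKETS = set("()[]{}")

# Assumption: some symbols are left untouched. The sample output only shows
# that `defn` and `ns` survive; the real list is not documented here.
KEPT = {"defn", "ns"}


def normalize(expr, kept=KEPT):
    """Rename symbols to "sN" and keywords to "kN", numbered by first appearance."""
    names = {}  # original token -> quoted placeholder
    out = []
    for tok in TOKEN.findall(expr):
        if tok in BRACKETS or tok in kept:
            out.append(tok)
            continue
        if tok not in names:
            prefix = "k" if tok.startswith(":") else "s"
            names[tok] = f'"{prefix}{len(names)}"'
        out.append(names[tok])
    # Re-join with single spaces, then tighten the spacing around brackets.
    text = " ".join(out)
    text = re.sub(r"([(\[{]) ", r"\1", text)
    text = re.sub(r" ([)\]}])", r"\1", text)
    return text


print(normalize("(defn distance [user-seq matrix] (create-Matrix user-seq matrix))"))
# (defn "s0" ["s1" "s2"] ("s3" "s1" "s2"))
```

Two expressions that differ only in the names they use normalize to the same text, which is why a helper function copied and renamed still shows up as a match.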

Friendship Graph

This section of the output estimates how close students are to each other, based on how many matching expressions they share. The output looks like:

###  <Student0>
<| 0 [Student1 4] [Student2 2] [Student3 2] [Student4 2] [Student5 2]                                       
<| 3 [Student4 1] |>

This reads as follows:

For expressions at distance 0, Student0 has:

4 matches with Student1
2 matches with Student2
2 matches with Student3
2 matches with Student4
2 matches with Student5

For expressions at distance 3, Student0 has:

1 match with Student4
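
To work with these counts programmatically, the block can be read into a nested dictionary. The following Python sketch assumes the layout is exactly as in the sample above (a `###  <Name>` header followed by `<| distance [name count] ...` rows) and that student names contain no spaces. The function name `parse_friendship` is illustrative.

```python
import re

HEADER = re.compile(r"^###\s*<(.+?)>\s*$")
ROW = re.compile(r"^<\|\s*(\d+)\s+(.*)$")
PAIR = re.compile(r"\[(\S+)\s+(\d+)\]")


def parse_friendship(text):
    """Return {student: {distance: {other_student: match_count}}}."""
    graph = {}
    current = None  # rows seen so far for the student of the last header
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = graph.setdefault(header.group(1), {})
            continue
        row = ROW.match(line)
        if row and current is not None:
            distance = int(row.group(1))
            pairs = {name: int(count)
                     for name, count in PAIR.findall(row.group(2))}
            current.setdefault(distance, {}).update(pairs)
    return graph


sample = """\
###  <Student0>
<| 0 [Student1 4] [Student2 2] [Student3 2] [Student4 2] [Student5 2]
<| 3 [Student4 1] |>
"""
graph = parse_friendship(sample)
print(graph["Student0"][0]["Student1"])  # 4
print(graph["Student0"][3])              # {'Student4': 1}
```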

Output per Directory

Besides the output on stdout, checkero creates a "checkero.txt" file in each student folder that contains the details of the search.

-------------------------------Student code: Student1

[Query]
Original:   (ns tarea-binding-sites.core (:gen-class) (:require [clojure.java.io :as io] [clojure.string :as string]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"] ["s6" "k4" "s7"]))

>>> student3 [/path/binding.clj]
Distance:   0
Original:   (ns genome-project.binding (:gen-class) (:require [clojure.java.io :as io] [clojure.string :as string]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"] ["s6" "k4" "s7"]))

>>> student10 [/path/core.clj]
Distance:   6
Original:   (ns cromosoma.core (:gen-class) (:require [clojure.java.io :as io]) (:require [clojure.string :as string]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"]) ("k2" ["s6" "k4" "s7"]))

>>> student20 [/path/core.clj]
Distance:   8
Original:   (ns homework.core (:gen-class) (:require [clojure.java.io :as io]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"]))

The student name is stated at the beginning of the file. Each [Query] entry shows code that was found in Student1's folder. Each entry prefixed with ">>>" describes a close match in another student's folder. The case shown is trivial: namespace definitions tend to be written in a very similar form.
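
Since trivial matches like this one are common, it can help to load a report and filter it before reading it by hand. The Python sketch below assumes the "checkero.txt" layout is exactly as in the sample above and that student names contain no spaces. The function name `parse_report` and the threshold of 6 are illustrative.

```python
import re

MATCH_HEAD = re.compile(r"^>>>\s+(\S+)\s+\[(.*)\]\s*$")


def parse_report(text):
    """Return one dict per match with keys: query, student, path, distance."""
    query = None      # Original text of the current [Query] entry
    in_query = False  # True between "[Query]" and the first ">>>" line
    match = None      # match entry still waiting for its Distance line
    results = []
    for line in text.splitlines():
        line = line.strip()
        if line == "[Query]":
            in_query, query, match = True, None, None
        elif line.startswith(">>>"):
            head = MATCH_HEAD.match(line)
            in_query = False
            match = ({"query": query, "student": head.group(1),
                      "path": head.group(2)} if head else None)
        elif line.startswith("Original:") and in_query:
            query = line[len("Original:"):].strip()
        elif line.startswith("Distance:") and match is not None:
            match["distance"] = int(line[len("Distance:"):].strip())
            results.append(match)
            match = None
    return results


sample = """\
-------------------------------Student code: Student1

[Query]
Original:   (ns tarea-binding-sites.core (:gen-class) (:require [clojure.java.io :as io] [clojure.string :as string]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"] ["s6" "k4" "s7"]))

>>> student3 [/path/binding.clj]
Distance:   0
Original:   (ns genome-project.binding (:gen-class) (:require [clojure.java.io :as io] [clojure.string :as string]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"] ["s6" "k4" "s7"]))

>>> student10 [/path/core.clj]
Distance:   6
Original:   (ns cromosoma.core (:gen-class) (:require [clojure.java.io :as io]) (:require [clojure.string :as string]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"]) ("k2" ["s6" "k4" "s7"]))

>>> student20 [/path/core.clj]
Distance:   8
Original:   (ns homework.core (:gen-class) (:require [clojure.java.io :as io]))
Normalized: (ns "s0" ("k1") ("k2" ["s3" "k4" "s5"]))
"""

matches = parse_report(sample)
close = [(m["student"], m["distance"]) for m in matches if m["distance"] <= 6]
print(close)  # [('student3', 0), ('student10', 6)]
```

Skipping queries whose original text starts with `(ns` would be one simple way to drop the namespace noise, though whether that is appropriate depends on the assignment.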
