Code behind the project.

Generating the corpus of bytecodes / comments

This can only be done on a Debian-compatible GNU/Linux system with a Debian mirror in ./debian-mirror.

Packages that need to be installed

  • openjdk
  • libqdox-java
  • jclassinfo
  • apt-file
  • (eclipse)
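Before starting, it can help to confirm that the command-line tools from these packages are on PATH. The sketch below is a hypothetical helper (`require_commands` is not part of the project), and the command names it is meant to be called with (`java`, `javac`, `jclassinfo`, `apt-file`, `dpkg-query`, `perl`) are assumptions based on the package list and the commands used later.

```shell
#!/bin/sh
# require_commands (hypothetical helper): report every command that is not
# on PATH and return non-zero if at least one is missing.
require_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_commands java javac jclassinfo apt-file dpkg-query perl`. libqdox-java is a library rather than a command, so it has to be checked separately, for example by inspecting `dpkg -s libqdox-java`.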

Compile the Eclipse project under workspace (this requires the libqdox-java package).

List the binary packages that ship .jar files:

apt-file search --package-only .jar > packages-with-jar
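By default apt-file matches the pattern anywhere in a path, so the list may also contain packages whose only match is a file such as `foo.jar.html`. A minimal sketch of a stricter filter, assuming apt-file's usual `package: /path` output when `--package-only` is dropped; the function name `jar_packages` is hypothetical:

```shell
#!/bin/sh
# jar_packages (hypothetical helper): read "package: /path" lines, the
# usual `apt-file search` output, on stdin; keep only paths that end in
# ".jar" and print each matching package name once.
jar_packages() {
  awk -F': ' '$2 ~ /\.jar$/ { print $1 }' | sort -u
}
```

Usage: `apt-file search .jar | jar_packages > packages-with-jar`.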

Look up each package's source package and .deb filename:

for i in $(cat packages-with-jar); do dpkg-query -p "$i" | perl -ne 'chomp; ($k,$v)=m/^([^:]+): (.*)$/; if($k eq "Source"){print "\t$v"};if($k eq "Filename"){print "\t$v\n"}' >> packages.tsv; done
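The Perl filter prints nothing for Source when a binary package has the same name as its source package, because the Source field is omitted in that case, and it keeps any version suffix such as `foo (1.2-3)`. A sketch of an awk replacement that falls back to the package name and strips the version, keeping the same `\tsource\tfilename` layout on the assumption that the processing script expects it; the function name `source_and_file` is hypothetical:

```shell
#!/bin/sh
# source_and_file (hypothetical helper): read dpkg-query -p stanzas on stdin
# and print "<TAB>source<TAB>filename" per stanza, using the binary package
# name when no Source field is present and dropping any "(version)" suffix.
source_and_file() {
  awk '
    /^Package: /  { pkg = substr($0, 10); src = ""; file = "" }
    /^Source: /   { src = substr($0, 9); sub(/ .*/, "", src) }
    /^Filename: / { file = substr($0, 11) }
    /^$/          { if (pkg != "") emit(); pkg = "" }
    END           { if (pkg != "") emit() }
    function emit() { print "\t" (src != "" ? src : pkg) "\t" file }
  '
}
```

Usage: `for i in $(cat packages-with-jar); do dpkg-query -p "$i" | source_and_file >> packages.tsv; done`.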

Process all the relevant source packages listed in packages.tsv using:

cat packages.tsv | ./ > corpus.tsv

The final output, corpus.tsv, has one method per line, with the following tab-delimited columns:

  • full class name
  • method signature (from bytecode)
  • long method signature (from source code)
  • comment
  • [each bytecode separated by a ...]
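Because corpus.tsv is tab-separated, a quick sanity check can flag lines that lack the four fixed columns and count the extracted methods per class. These are hypothetical helpers that assume tabs as the column separator; the separator between individual bytecodes is not stated here, so they do not try to split that part.

```shell
#!/bin/sh
# check_columns (hypothetical helper): print every corpus.tsv line with
# fewer than the four fixed tab-separated columns and return non-zero if
# any such line exists.
check_columns() {
  awk -F'\t' '
    BEGIN  { bad = 0 }
    NF < 4 { printf "line %d: only %d columns\n", NR, NF; bad = 1 }
    END    { exit bad }
  '
}

# methods_per_class (hypothetical helper): count methods (lines) per fully
# qualified class name taken from the first column, sorted by class name.
methods_per_class() {
  awk -F'\t' '{ n[$1]++ } END { for (c in n) print c "\t" n[c] }' | sort
}
```

Usage: `check_columns < corpus.tsv && methods_per_class < corpus.tsv`.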