
Repository accompanying the paper "SeSaMe: A Data Set of Semantically Similar Java Methods"


FAU-Inf2/sesame

SeSaMe

DOI

This repository contains the data set described in the paper

Kamp, M., Kreutzer, P., Philippsen, M.: SeSaMe: A Data Set of Semantically Similar Java Methods. 16th International Conference on Mining Software Repositories (MSR 2019), Montreal, QC, Canada. 2019

The data set is licenced under a Creative Commons Attribution 4.0 International Licence.

The repository consists of the following files:

  • dataset.json : The final data set in the format described in the paper.
  • dataset-unfiltered.json : The data set including the pairs we removed due to disagreement.
  • sampled-pairs.csv : The 900 sampled method pairs.
  • src : The source code of the tools we used to create the data set.

The relevant data are stored in a single JSON file (dataset.json) that contains an object describing the Java projects used and a list holding the classified method pairs. Each list element consists of four components: the pairid that identifies the method pair; information about the first method of the pair; information about the second method of the pair; and the goals, operations, and effects ratings, each with the confidence assigned by the participants of the manual classification. A method is identified by its project, the file in which it is defined, and its method signature.
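To get a first look at this layout, the file can be loaded and its top-level entries listed. The following is a minimal sketch, not part of the repository's tooling. It assumes that dataset.json is in the current directory and that its top level is a JSON object; the function names are made up for illustration, and no key names from the file are assumed.

```python
import json
from pathlib import Path


def load_dataset(path="dataset.json"):
    """Read the data set file and return the parsed JSON value."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize_top_level(data):
    """Map each top-level key to a short description of its value.

    Assumes `data` is a dict (a JSON object); no specific keys are assumed.
    """
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary


if __name__ == "__main__":
    dataset = load_dataset()
    for key, description in summarize_top_level(dataset).items():
        print(f"{key}: {description}")
```

Listing the keys first avoids guessing at field names; the exact format is described in the paper.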

The similarity rating and the confidence are each stored as a numeric value. In both cases, -1 indicates an unknown value (the respective rater either did not rate this method pair or was unsure about it). For the similarity rating, 0, 1, and 2 correspond to disagree, conditionally agree, and agree, respectively. For the confidence rating, 0, 1, and 2 correspond to low, medium, and high confidence, respectively.
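The numeric encoding above can be translated back into labels with two small lookup tables. This is a sketch under the assumption that the values have already been read from the JSON file as integers; the names SIMILARITY_LABELS, CONFIDENCE_LABELS, decode_rating, and known_values are illustrative and do not come from the repository.

```python
# Label tables taken from the encoding described above; -1 means unknown.
SIMILARITY_LABELS = {
    -1: "unknown",
    0: "disagree",
    1: "conditionally agree",
    2: "agree",
}
CONFIDENCE_LABELS = {
    -1: "unknown",
    0: "low",
    1: "medium",
    2: "high",
}


def decode_rating(value, labels):
    """Return the label for a numeric value, rejecting anything unexpected."""
    try:
        return labels[value]
    except KeyError:
        raise ValueError(f"unexpected rating value: {value!r}") from None


def known_values(values):
    """Drop the -1 entries, i.e. ratings that are unknown."""
    return [value for value in values if value != -1]


if __name__ == "__main__":
    print(decode_rating(1, SIMILARITY_LABELS))
    print(decode_rating(2, CONFIDENCE_LABELS))
    print(known_values([2, -1, 1]))
```

Filtering out -1 before aggregating keeps unknown ratings from being treated as a value below disagree or low.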

The database containing the mined methods, their JavaDoc comments and the computed similarity values is not included in this repository. Use the Zenodo link above to retrieve it. To obtain the source code of the analyzed repositories, invoke make pull in the src directory.
