SeSaMe

This repository contains the data set described in the paper:

Kamp, M., Kreutzer, P., Philippsen, M.: SeSaMe: A Data Set of Semantically Similar Java Methods. 16th International Conference on Mining Software Repositories (MSR 2019), Montreal, QC, Canada. 2019

The data set is licenced under a Creative Commons Attribution 4.0 International Licence.

The repository consists of the following files:

dataset.json : The final data set in the format described in the paper.
dataset-unfiltered.json : The data set including the pairs we removed due to disagreement.
sampled-pairs.csv : The 900 sampled method pairs.
src : The source code of the tools we used to create the data set.

The relevant data are stored in a single JSON file (dataset.json) that contains an object describing the Java projects used and a list of the classified method pairs. Each list element consists of four components: the pairid that identifies the method pair, information on the first method of the pair, information on the second method of the pair, and the goals, operations, and effects ratings together with the confidence values assigned by the participants of the manual classification. A method is identified by its project, the file it is defined in, and its method signature.
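As a rough illustration of the structure described above, the following Python sketch parses one hand-written list element and builds an identifier for each method. Only pairid is named in this README; the key names first, second, project, file, method, rating, and confidence, the assumption that each of goals, operations, and effects holds one entry per rater, and all values are hypothetical. Check them against dataset.json and the paper before relying on them.

```python
import json

# Illustrative element only: "pairid" is named in the README; every other
# key name and all values are assumptions, not taken from dataset.json.
SAMPLE_PAIR = json.loads("""
{
  "pairid": 1,
  "first":  {"project": "project-a", "file": "src/Foo.java", "method": "int max(int, int)"},
  "second": {"project": "project-b", "file": "src/Bar.java", "method": "int largest(int, int)"},
  "goals":      [{"rating": 2, "confidence": 2}, {"rating": 1, "confidence": 0}],
  "operations": [{"rating": 1, "confidence": 1}, {"rating": -1, "confidence": -1}],
  "effects":    [{"rating": 2, "confidence": 2}, {"rating": 2, "confidence": 1}]
}
""")


def method_id(method):
    """Join project, file, and signature into one identifying string."""
    return "{}:{}:{}".format(method["project"], method["file"], method["method"])


def summarize_pair(pair):
    """Collect the method identifiers and per-dimension ratings of one pair."""
    return {
        "pairid": pair["pairid"],
        "first": method_id(pair["first"]),
        "second": method_id(pair["second"]),
        "ratings": {
            dim: [entry["rating"] for entry in pair[dim]]
            for dim in ("goals", "operations", "effects")
        },
    }


summary = summarize_pair(SAMPLE_PAIR)
print(summary["first"])
print(summary["ratings"])
```

To apply the same idea to the real file, load dataset.json with json.load and adjust the key names to whatever the file actually uses.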

The similarity rating and the confidence are stored using a numeric value. In both cases, -1 indicates an unknown value (the respective rater either did not rate this method pair or was unsure about it). For the similarity rating, 0, 1, and 2 correspond to disagree, conditionally agree, and agree, respectively. For the confidence rating, 0, 1, and 2 correspond to low, medium, and high confidence, respectively.
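The numeric encoding above can be turned into readable labels with a small lookup. The sketch below is a minimal example; the helper names decode and mean_known are hypothetical, and averaging the known ratings is just one possible aggregation chosen for illustration, not necessarily the one used in the paper.

```python
# Encoding as described in the README; -1 marks an unknown value.
UNKNOWN = -1
SIMILARITY_LABELS = {0: "disagree", 1: "conditionally agree", 2: "agree"}
CONFIDENCE_LABELS = {0: "low", 1: "medium", 2: "high"}


def decode(value, labels):
    """Map a numeric value to its label, or None if the value is unknown."""
    if value == UNKNOWN:
        return None
    return labels[value]


def mean_known(values):
    """Average the values that are not -1; return None if none are known."""
    known = [v for v in values if v != UNKNOWN]
    if not known:
        return None
    return sum(known) / len(known)


print(decode(2, SIMILARITY_LABELS))
print(decode(0, CONFIDENCE_LABELS))
print(mean_known([2, -1, 1]))
```

Filtering out -1 before aggregating matters: treating it as an ordinary number would pull averages below the lowest valid rating.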

The database containing the mined methods, their JavaDoc comments and the computed similarity values is not included in this repository. Use the Zenodo link above to retrieve it. To obtain the source code of the analyzed repositories, invoke make pull in the src directory.
