Code4Bench

Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques

Code4Bench is now available for download at http://doi.org/10.5281/zenodo.2582968

Installation (import)

  1. Download and unzip the archive from the URL above.
  2. Install MySQL 5.7.
  3. Create a database named “code4bench”.
  4. In MySQL Workbench:
    a. Open Server -> Data Import.
    b. Select the extracted folder.
    c. Click Start Import (this may take some time).
  5. Done.
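
If MySQL Workbench is not available, the import can also be scripted. The sketch below is a minimal, unofficial alternative. It assumes the extracted folder holds one .sql dump file per table, which is presumably the layout Workbench's "Import from Dump Project Folder" option expects. It also assumes the mysql command-line client is on PATH. The names build_import_plan and run_import_plan are hypothetical helpers, not part of Code4Bench.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Hypothetical helpers: a scripted alternative to MySQL Workbench's Data Import.
ImportStep = Tuple[List[str], Path]


def build_import_plan(dump_dir: str, database: str = "code4bench",
                      user: str = "root") -> List[ImportStep]:
    """Pair a `mysql` client command with every .sql file in dump_dir (sorted by name)."""
    command = [
        "mysql",
        f"--user={user}",
        "--password",  # no value given: the client prompts for it interactively
        # Tables may reference each other, so skip FK checks while loading in file order.
        "--init-command=SET FOREIGN_KEY_CHECKS=0",
        database,
    ]
    return [(command, sql_file) for sql_file in sorted(Path(dump_dir).glob("*.sql"))]


def run_import_plan(plan: List[ImportStep]) -> None:
    """Feed each dump file to the mysql client on stdin; stop at the first failure."""
    for command, sql_file in plan:
        with sql_file.open("rb") as dump:
            subprocess.run(command, stdin=dump, check=True)
```

A call such as run_import_plan(build_import_plan("path/to/extracted-folder")) would load every dump into the code4bench database. Because --password is passed without a value, the client asks for the password once per file.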

Code4Bench Schema

The schema of Code4Bench is drawn below.

[Figure: Code4Bench database schema (DBSchema.png)]

Field definitions

source

| Field name | Description |
|---|---|
| id | A unique number |
| submission | ID number assigned by Codeforces to this submission |
| sourceCode | The submitted source code |
| author | ID number of the submitter |
| memory | The memory used by this submission |
| time | The execution time of this submission |
| sent | The time at which the user sent the submission |
| countLine | The number of lines of code |
| problems_id | Problem ID number |
| verdicts_id | The Codeforces verdict on this submission |
| languages_id | The language in which this submission is written |
| isduplicated | Whether the submission is unique or a duplicate |

verdicts

| Field name | Description |
|---|---|
| id | A unique number |
| name | The name of a verdict |

languages

| Field name | Description |
|---|---|
| id | A unique number |
| name | The name of a programming language |

problems

| Field name | Description |
|---|---|
| id | A unique number |
| fullname | The contest ID number and the problem name |
| contest | ID number of the contest |
| name | ID of the problem section within the contest |
| context | The description of the problem |

testcases

| Field name | Description |
|---|---|
| id | A unique number |
| inputData | Input data for the problem |
| expectedResult | Expected output for the problem |
| problems_id | ID number of the corresponding problem |
| isValid | Whether the test case is complete or deficient |
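
Each test case stores its input and its expected output, so a program's behavior on a problem can be re-checked outside Codeforces. The following minimal sketch is not part of Code4Bench, and it rests on several assumptions:

- Rows are fetched as (inputData, expectedResult, isValid) tuples, with isValid stored as 0/1.
- The program under test is launched as an external command.
- Outputs are compared token by token. This only approximates the checkers Codeforces actually uses.

run_test_cases and outputs_match are hypothetical names.

```python
import subprocess
from typing import Iterable, List, Sequence, Tuple

# One row of the testcases table, assumed shape: (inputData, expectedResult, isValid).
TestCase = Tuple[str, str, int]


def outputs_match(actual: str, expected: str) -> bool:
    """Compare whitespace-separated tokens, ignoring spacing and trailing newlines."""
    return actual.split() == expected.split()


def run_test_cases(command: Sequence[str], cases: Iterable[TestCase],
                   timeout: float = 5.0) -> List[bool]:
    """Run the program once per valid test case and report pass/fail for each."""
    results = []
    for input_data, expected, is_valid in cases:
        if not is_valid:
            continue  # skip test cases marked as incomplete/deficient
        try:
            proc = subprocess.run(command, input=input_data, capture_output=True,
                                  text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results.append(False)  # treat a timeout as a failed test
            continue
        results.append(proc.returncode == 0 and outputs_match(proc.stdout, expected))
    return results
```

For example, run_test_cases(["./a.out"], rows) returns one boolean per valid row. A non-zero exit code or a timeout counts as a failure.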

user

| Field name | Description |
|---|---|
| id | A unique number |
| author_id | The ID of the user |
| gender | The user's gender |
| age | The user's age |
| country | The country in which the user lives |
| state | The state in which the user lives |
| city | The city in which the user lives |
| mainJob | Whether programming is the user's main job |
| t0_4 | Do you work in the time interval 00:00 to 04:00? |
| t4_8 | Do you work in the time interval 04:00 to 08:00? |
| t8_12 | Do you work in the time interval 08:00 to 12:00? |
| t12_16 | Do you work in the time interval 12:00 to 16:00? |
| t16_20 | Do you work in the time interval 16:00 to 20:00? |
| t20_24 | Do you work in the time interval 20:00 to 24:00? |
| single | Are you single? |
| married | Are you married? |
| divorced | Are you divorced? |
| oneChild | Do you have one child? |
| twoChild | Do you have two children? |
| moreChild | Do you have more than two children? |
| educationLevel | Education level, from diploma to PhD |
| isFieldCS | Have you graduated in computer science? |
| yearsWork | How many years have you been programming? |
| hours_per_month | How many hours do you work per month? |
| teamOrAlone | Do you work alone or as a team member? |
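
The t0_4 ... t20_24 columns split the day into six four-hour slots. To compare these survey answers with real submission times, such as the sent field of source, a timestamp first has to be mapped to its slot. This small sketch rests on three assumptions:

- The slots are half-open, so 04:00 belongs to t4_8.
- The answers are stored as 0/1 flags.
- Timestamps and answers refer to the same time zone. The dataset description does not state this.

Both function names are hypothetical.

```python
from datetime import datetime
from typing import Mapping


def time_interval_column(moment: datetime) -> str:
    """Map a timestamp to the survey column (t0_4 ... t20_24) covering its hour.

    Assumes half-open four-hour intervals, so 04:00 falls in t4_8, not t0_4.
    """
    start = (moment.hour // 4) * 4
    return f"t{start}_{start + 4}"


def reported_working_at(user_row: Mapping[str, int], moment: datetime) -> bool:
    """True if the user's (assumed 0/1) survey answer covers moment's interval."""
    return bool(user_row.get(time_interval_column(moment), 0))
```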

countries

| Field name | Description |
|---|---|
| id | A unique number |
| sortName | Abbreviation of each country, by which the names are sorted |
| name | The full name of a country |
| phoneCode | The telephone calling code of a country |

states

| Field name | Description |
|---|---|
| id | A unique number |
| name | The name of a state |
| country_id | The ID of the country to which the state belongs |

cities

| Field name | Description |
|---|---|
| id | A unique number |
| name | The name of a city |
| state_id | The ID of the state to which the city belongs |
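
The countries, states and cities tables form a hierarchy. Each city points to its state through state_id, and each state points to its country through country_id, so a city ID resolves to a full location with two joins.

The sketch below uses an in-memory SQLite database with made-up placeholder rows so that it runs on its own. Only the column names come from the tables above. The same SELECT should also work against the MySQL import, using the driver's own parameter placeholder instead of ?. resolve_city is a hypothetical helper. Whether the user table's city column holds a cities.id is an assumption.

```python
import sqlite3
from typing import Optional, Tuple

# Minimal stand-ins for the three location tables (column names as listed above).
LOCATION_SCHEMA = """
CREATE TABLE countries (id INTEGER PRIMARY KEY, sortName TEXT, name TEXT, phoneCode INTEGER);
CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT,
                     country_id INTEGER REFERENCES countries(id));
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT,
                     state_id INTEGER REFERENCES states(id));
"""

# city -> state -> country, following the two foreign-key columns.
LOCATION_QUERY = """
SELECT ci.name, st.name, co.name
FROM cities AS ci
JOIN states AS st ON st.id = ci.state_id
JOIN countries AS co ON co.id = st.country_id
WHERE ci.id = ?
"""


def resolve_city(conn: sqlite3.Connection,
                 city_id: int) -> Optional[Tuple[str, str, str]]:
    """Return (city, state, country) names, or None for an unknown city ID."""
    return conn.execute(LOCATION_QUERY, (city_id,)).fetchone()
```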

realfaultslocations and realfaultslocations_c_cpp

| Field name | Description |
|---|---|
| id | A unique number |
| subAccepted | ID number assigned by Codeforces to this accepted submission |
| subWrong | ID number assigned by Codeforces to this faulty submission |
| change | The number of lines that have been changed |
| changeRate | The percentage of lines that have been changed |
| insert | The number of lines that have been added |
| insertRate | The percentage of lines that have been added |
| delete | The number of lines that have been deleted |
| deleteRate | The percentage of lines that have been deleted |
| faultLocations | The locations of faults in the faulty version relative to the correct version |
| countFaults | The number of faults in the faulty version relative to the correct version |
| countInsertFaults | The number of addition-type faults in the faulty version relative to the correct version |
| countDeleteFaults | The number of deletion-type faults in the faulty version relative to the correct version |
| countChangeFaults | The number of change-type faults in the faulty version relative to the correct version |
| insertFaultsLocations | The locations of addition-type faults in the faulty version relative to the correct version |
| changeFaultsLocations | The locations of change-type faults in the faulty version relative to the correct version |
| deleteFaultsLocations | The locations of deletion-type faults in the faulty version relative to the correct version |
| wSimA | The percentage to which the faulty version is similar to the correct version |
| aSimW | The percentage to which the correct version is similar to the faulty version |
| matchLines | The number of identical lines in the faulty and correct versions |
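
The realfaultslocations columns describe a line-level diff between an accepted submission and a faulty one. The exact procedure used to compute them is not described here, so the following is only an approximation built on Python's difflib.SequenceMatcher. It makes these assumptions:

- Each non-equal hunk counts as one fault, located at its first line in the faulty version.
- The change and insert rates are relative to the faulty version's line count.
- The delete rate is relative to the correct version's line count.
- wSimA and aSimW are the share of matching lines in each version.

All of these choices are assumptions, as are the names compare_submissions and LineDiffMetrics.

```python
import difflib
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineDiffMetrics:
    """Hypothetical container mirroring some realfaultslocations columns."""
    change: int = 0        # faulty-version lines inside replaced hunks
    insert: int = 0        # lines that exist only in the faulty version
    delete: int = 0        # correct-version lines missing from the faulty version
    match_lines: int = 0   # identical lines shared by both versions
    change_rate: float = 0.0
    insert_rate: float = 0.0
    delete_rate: float = 0.0
    w_sim_a: float = 0.0   # share of faulty lines that also occur in the correct version
    a_sim_w: float = 0.0   # share of correct lines that also occur in the faulty version
    change_locations: List[int] = field(default_factory=list)
    insert_locations: List[int] = field(default_factory=list)
    delete_locations: List[int] = field(default_factory=list)

    @property
    def count_faults(self) -> int:
        # Assumption: every non-equal diff hunk counts as one fault.
        return (len(self.change_locations) + len(self.insert_locations)
                + len(self.delete_locations))


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def compare_submissions(accepted: str, wrong: str) -> LineDiffMetrics:
    """Diff an accepted (correct) submission against a faulty one, line by line."""
    a_lines = accepted.splitlines()
    w_lines = wrong.splitlines()
    m = LineDiffMetrics()
    matcher = difflib.SequenceMatcher(None, a_lines, w_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        location = j1 + 1  # 1-based line in the faulty version where the hunk starts
        if tag == "equal":
            m.match_lines += i2 - i1
        elif tag == "replace":
            m.change += j2 - j1
            m.change_locations.append(location)
        elif tag == "insert":
            m.insert += j2 - j1
            m.insert_locations.append(location)
        else:  # "delete": correct-version lines absent from the faulty version
            m.delete += i2 - i1
            m.delete_locations.append(location)
    # Assumed denominators: change/insert vs. the faulty version, delete vs. the correct one.
    m.change_rate = percent(m.change, len(w_lines))
    m.insert_rate = percent(m.insert, len(w_lines))
    m.delete_rate = percent(m.delete, len(a_lines))
    m.w_sim_a = percent(m.match_lines, len(w_lines))
    m.a_sim_w = percent(m.match_lines, len(a_lines))
    return m
```

Consider an accepted version with lines a, b, c, d and a faulty version with lines a, x, c, d, e. For this pair the sketch reports:

- one change-type fault at line 2
- one addition-type fault at line 5
- three matching lines
- wSimA = 60.0 and aSimW = 75.0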

Data in Code4Bench

The number of submissions for each programming language is listed below.

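| ID | Language | Submission count |
|---|---|---|
| 1 | GNU C++ 14 | 604,155 |
| 2 | GNU C | 93,492 |
| 3 | MS C++ | 164,912 |
| 4 | GNU C++ 11 | 906,811 |
| 5 | FPC | 47,522 |
| 6 | GNU C++ | 1,167,214 |
| 7 | Java 8 | 154,087 |
| 8 | Python 3 | 52,433 |
| 9 | Go | 3,011 |
| 10 | D | 742 |
| 11 | MS C# | 14,896 |
| 12 | GNU C 11 | 18,574 |
| 13 | Python 2 | 36,469 |
| 14 | PyPy 2 | 4,507 |
| 15 | Ruby | 3,806 |
| 16 | PHP | 2,570 |
| 17 | PyPy 3 | 3,222 |
| 18 | Delphi | 9,698 |
| 19 | Kotlin | 4,739 |
| 20 | JavaScript | 3,020 |
| 21 | Haskell | 3,585 |
| 22 | OCaml | 543 |
| 23 | Scala | 2,131 |
| 24 | Mono C# | 5,199 |
| 25 | Java 7 | 27,931 |
| 26 | Rust | 599 |
| 27 | Perl | 784 |
| 28 | GNU C++ 11 | 1,083 |
| 29 | Java 8 ZIP | 107 |
| 30 | J | 2,673 |
| 31 | GNU C++ 0X | 34,746 |
| 32 | Java 6 | 22,988 |
| 33 | Pike | 4,076 |
| 34 | Befunge | 4,343 |
| 35 | Cobol | 2,114 |
| 36 | Factor | 2,606 |
| 37 | Secret-171 | 158 |
| 38 | Roco | 3,136 |
| 39 | Tcl | 3,752 |
| 40 | F# | 15 |
| 41 | Io | 2,908 |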
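
Per-language counts like these can be recomputed from the source and languages tables with a single grouped join. To stay self-contained, the sketch below runs against a tiny in-memory SQLite stand-in. The query itself is plain SQL and should also run on the MySQL import.

It is not stated whether the published counts include duplicated submissions. The helper can therefore optionally drop rows flagged in isduplicated, assuming 0 marks a unique submission. submissions_per_language is a hypothetical name.

```python
import sqlite3
from typing import List, Tuple

# Minimal stand-ins for `languages` and `source` (only the columns used here).
SUBMISSION_SCHEMA = """
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source (
    id INTEGER PRIMARY KEY,
    languages_id INTEGER REFERENCES languages(id),
    isduplicated INTEGER
);
"""


def submissions_per_language(conn: sqlite3.Connection,
                             unique_only: bool = False) -> List[Tuple[int, str, int]]:
    """Count submissions per language; LEFT JOIN keeps languages with zero rows."""
    # Assumption: isduplicated = 0 marks a unique submission.
    dup_filter = "AND s.isduplicated = 0" if unique_only else ""
    query = f"""
        SELECT l.id, l.name, COUNT(s.id) AS submission_count
        FROM languages AS l
        LEFT JOIN source AS s ON s.languages_id = l.id {dup_filter}
        GROUP BY l.id, l.name
        ORDER BY l.id
    """
    return conn.execute(query).fetchall()
```

The duplicate filter sits in the ON clause rather than in a WHERE clause. This way, a language whose submissions are all duplicates still appears with a count of 0 instead of disappearing from the result.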