GitHub - akjayant/Analysis-of-Codeforces-Data: To analyze correlation between coding style and coding proficiency, and whether coding styles show regional variations.

Problem Statement:

To analyze the correlation between coding style and coding proficiency, and whether coding styles show regional variations.

Data sets: Data for the top 1000 performers in 5 Division-2 contests was scraped from Codeforces, a competitive coding platform. The total number of data points is ∼18000 codes.
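As a rough illustration of how the list of top performers for a contest could be gathered, the sketch below uses the public Codeforces API method `contest.standings`. This is an assumption for illustration only: the data here is described as scraped, and the repository's own collection scripts may work differently. The helper names `standings_url`, `top_handles` and `fetch_top_handles` are hypothetical, and the standings only yield handles; the submitted source code would still have to be collected separately.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Codeforces API endpoint for contest standings.
API_URL = "https://codeforces.com/api/contest.standings"


def standings_url(contest_id: int, count: int = 1000) -> str:
    """Build the request URL for the first `count` rows of a contest's standings."""
    query = urlencode({"contestId": contest_id, "from": 1, "count": count})
    return f"{API_URL}?{query}"


def top_handles(payload: dict) -> list:
    """Extract user handles, in standings order, from a decoded API response."""
    if payload.get("status") != "OK":
        raise ValueError("Codeforces API call did not succeed")
    handles = []
    for row in payload["result"]["rows"]:
        # A party may have several members (teams), so collect all of them.
        handles.extend(member["handle"] for member in row["party"]["members"])
    return handles


def fetch_top_handles(contest_id: int, count: int = 1000) -> list:
    """Hypothetical helper: perform the network call and return the handles."""
    with urlopen(standings_url(contest_id, count)) as response:
        return top_handles(json.load(response))
```

Calling `fetch_top_handles(contest_id)` once per contest would give the handles of the top 1000 participants, which could then drive the collection of their submissions.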

Abstract:

An important employability indicator in the software field is coding proficiency. Codeforces is a popular platform that lets users practice coding skills through regular contests. It also assigns a proficiency rating to every user based on contest performances. Analyzing how coding style relates to this rating can help give novices better feedback on which code structure to use and which APIs to use more often.

Approach:

Approaches taken, in increasing complexity:

a) Extracting simple features such as function calls, variables declared, number of macros, etc.
b) Using tree edit distance between the abstract syntax trees of two different codebases to identify coding style similarity.
c) Extracting features: low level (tokens used) and high level (code structure, via a doc2vec embedding of the abstract syntax tree).
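The simple-feature step (a) can be sketched as follows. This is a minimal illustration that assumes the submissions are C++ sources and uses rough regular-expression heuristics; it is not the code in PrimitiveApproach, and the function name `primitive_features` and the feature keys are hypothetical. A parser-based extractor would be more accurate (for example, initializer lists containing commas are overcounted here).

```python
import re

# Built-in type names recognised by this sketch (an assumption, not exhaustive).
TYPE_NAMES = "int|long|double|float|char|bool|string"

MACRO_RE = re.compile(r"^[ \t]*#[ \t]*define\b", re.MULTILINE)
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
# "type name(" is treated as a function definition, not a call.
DEF_RE = re.compile(rf"\b(?:void|{TYPE_NAMES})\s+([A-Za-z_]\w*)\s*\(")
# "type a, b = 0;" on a single line is treated as a declaration statement.
DECL_RE = re.compile(rf"\b(?:{TYPE_NAMES})[ \t]+([^;()\n]*);")
KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "catch"}


def primitive_features(source: str) -> dict:
    """Count macros, call-like expressions and declared variables in C++ text."""
    definition_starts = {m.start(1) for m in DEF_RE.finditer(source)}
    calls = [
        m.group(1)
        for m in CALL_RE.finditer(source)
        if m.group(1) not in KEYWORDS and m.start(1) not in definition_starts
    ]
    # Each comma-separated declarator counts as one declared variable.
    variables = sum(len(m.group(1).split(",")) for m in DECL_RE.finditer(source))
    return {
        "macros": len(MACRO_RE.findall(source)),
        "function_calls": len(calls),
        "variables_declared": variables,
    }


SAMPLE = r"""
#include <bits/stdc++.h>
#define ll long long
#define pb push_back
using namespace std;
int main() {
    int n, m = 0;
    scanf("%d", &n);
    vector<int> v;
    v.pb(n);
    printf("%d\n", n + m);
    return 0;
}
"""

features = primitive_features(SAMPLE)
```

Each submission then becomes a small numeric vector, which can be compared against the author's rating or country.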

Conclusion:

We analyze the correlation between coding style and coding proficiency, and whether coding styles vary across regions. We are able to find coding style differences across regions, but are unable to find any significant correlation between coding proficiency and coding style. We also provide some possible explanations of how the features used help in determining the correlations under study.

References:

Zhang, Kaizhong, and Dennis Shasha. "Simple fast algorithms for the editing distance between trees and related problems." SIAM Journal on Computing 18.6 (1989): 1245-1262.
Lau, Jey Han, and Timothy Baldwin. "An empirical evaluation of doc2vec with practical insights into document embedding generation." arXiv preprint arXiv:1607.05368 (2016).

Report - Final Presentation Report

Files

Collected data can be found in ./Data
Files related to primitive features are in PrimitiveApproach
Files related to tree similarity are in TreeSimilarityApproach
Files related to Doc2Vec and tokenization, and the notebook with the Country vs Coding Style model, are in doc2vec
