
SWE-Bench Coding Tasks Dataset extends the original SWE-Bench benchmark with broader programming-language coverage, golden and test patches, and real-world coding tasks such as bug fixing, code completion, and automated code review.


UniData-NLP/swe-bench-coding-tasks


SWE-Bench Dataset - 8,712 files

The dataset comprises 8,712 files across 6 programming languages and provides verified tasks for evaluating coding agents and language models. It is intended for coding agents, language models, and developer tools, and includes verified benchmark scores and multi-language test sets.

Dataset characteristics:

| Characteristic | Data |
| --- | --- |
| Description | An extended benchmark of real-world software engineering tasks with enhanced artifacts and broader language coverage |
| Data types | Text |
| Tasks | Bug fixing, code completion, pull request generation, automated code review |
| Total number of files | 8,712 |
| Total number of people | 30 |
| Labeling | Annotated with golden patches, test patches, post-patch reference states, and metadata stored in parquet files (e.g., repository name, issue/PR identifier, diffs, test results) |
| Programming languages | C#, Go, PHP, Rust, Kotlin, Ruby |
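
To make the labeling description more concrete, here is a minimal Python sketch of how a single task record might be modeled once its metadata has been read from the parquet files. The class name `TaskRecord`, every field name, and the "PASS"/"FAIL" status strings are assumptions made for illustration; the actual column names and value formats in the dataset may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskRecord:
    """One benchmark task. All field names are hypothetical, not the dataset's real schema."""
    repo: str            # repository name
    issue_id: str        # issue/PR identifier
    language: str        # e.g. "Go"
    golden_patch: str    # reference fix, as a diff
    test_patch: str      # diff that adds or updates the tests
    # Assumed shape: test name -> "PASS" or "FAIL"
    test_results: Dict[str, str] = field(default_factory=dict)


def is_resolved(record: TaskRecord) -> bool:
    """Treat a task as resolved when it has at least one test and every test passed."""
    results = record.test_results
    return bool(results) and all(status == "PASS" for status in results.values())


def resolved_rate_by_language(records: List[TaskRecord]) -> Dict[str, float]:
    """Return the fraction of resolved tasks for each language."""
    totals: Dict[str, int] = {}
    resolved: Dict[str, int] = {}
    for record in records:
        totals[record.language] = totals.get(record.language, 0) + 1
        resolved[record.language] = resolved.get(record.language, 0) + int(is_resolved(record))
    return {lang: resolved[lang] / totals[lang] for lang in totals}
```

A per-language resolved rate like this is one common way to summarize how a coding agent performs on a multi-language benchmark; the "all tests pass" criterion used here is an assumption, not a rule stated by the dataset.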

📊 Sample dataset available! For full access, contact us to discuss purchase terms.

Dataset structure

  • Go - Files in Go
  • Scala - Files in Scala
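
Assuming the sample is laid out as one top-level folder per language, as the list above suggests, a short Python sketch like the following could tally how many files each folder holds. The function name `count_files_per_language` and the one-folder-per-language layout are assumptions for illustration only.

```python
from pathlib import Path
from typing import Dict


def count_files_per_language(root: Path) -> Dict[str, int]:
    """Count files (recursively) inside each top-level folder under root.

    Assumes each top-level folder is named after a language, e.g. "Go".
    Files placed directly in root are ignored.
    """
    counts: Dict[str, int] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(1 for f in folder.rglob("*") if f.is_file())
    return counts
```

Calling `count_files_per_language(Path("path/to/sample"))` on a local copy of the sample would return a dictionary keyed by folder name, which can be compared against the file totals the dataset reports.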

🧩 Like the dataset but need different data? We can collect a custom dataset just for you. Learn more about our data collection services here.

Similar Datasets:

  1. LLM Text Generation Dataset
  2. Synthetic Printed USA Passports Dataset
  3. DeepFake Videos Dataset

🌐 UniData - your trusted data partner. Unique, accurate, thoroughly collected and annotated data designed to fuel your AI/ML success.
