Curating the best open reasoning datasets. A Bespoke Labs and DataComp community effort.

Our first goal is to curate a reasoning dataset for training state-of-the-art small reasoning models that surpass DeepSeek-R1-Distill-32B and DeepSeek-R1-Distill-7B on math and code reasoning benchmarks.

About us

We are a team of researchers and engineers from Bespoke Labs, Stanford, University of California Berkeley, University of Washington, Juelich Supercomputing Center (JSC), LAION, UCLA, UNC Chapel Hill, and Toyota Research Institute, united around building the best datasets (and thus the best models). See our previous work at datacomp.ai and mlfoundations.

Open Thoughts is supported by Bespoke Labs, Lambda Labs, NSF IFML, Juelich Supercomputing Center, and Toyota Research Institute.

Popular repositories

  1. open-thoughts Public

    Fully open data curation for reasoning models

    Python 2k 167

  2. open-thoughts-website Public

    MDX 2

  3. .github Public

Repositories

  • open-thoughts-website Public
    MDX 2 0 0 0 Updated Jul 14, 2025
  • open-thoughts Public

    Fully open data curation for reasoning models

    Python 1,974 Apache-2.0 167 14 0 Updated Jun 5, 2025
  • .github Public
    0 0 0 0 Updated Jan 28, 2025

Top languages

Python MDX
