Baseball Databank

Baseball Databank is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.

This work is licensed by Chadwick Baseball Bureau under the Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see

About this data

  • This is a legacy resource. Data in this format has been circulated by various people for many years, and there are many applications and users who have tools which take data in this format. It is maintained by Chadwick Baseball Bureau to support compatibility with those tools and programs. As such, the schema is not open to amendments, either in terms of the scope of coverage or in terms of the data categories available.
  • This is a free resource. Statistical data will be updated once at some point during the MLB offseason. To borrow the slogan used by ProMods, "It's ready when it's ready." New releases will be announced via our Twitter account at @chadwickbureau. With apologies, we are not able to respond to enquiries about when new versions of the data will be released.
  • These data are maintained wholly by Chadwick Baseball Bureau, for the benefit of the community. Users who require data of a different scope, in a different format, and/or with more specific schedules for updates are encouraged to enquire about our various licensing options.

Using or citing this data

We repeat, this is a legacy resource intended for backwards compatibility only. It is suitable for casual or exploratory use, as a convenient dataset for students to practice their data skills, and so forth.

It is not suitable for use as the basis for any kind of publication. The legacy parts of this data are not maintained, most likely contain errors, and definitely do not reflect many of the latest revisions to the historical record.

Researchers wanting a dataset that is suitable for research or publication purposes should contact Chadwick Baseball Bureau for enquiries.

Organisation of the files

There are three directories in the repository.

  • core/ contains the databank itself. These files are automatically produced from our larger dataset.
  • contrib/ contains files which are manually maintained by others using the same identifier system as the core. We bundle these for the convenience of the community.
  • upstream/ contains files used to construct the databank.
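As a rough illustration of working with this layout, the sketch below lists the CSV tables in one of the directories and reads a table into memory using only the Python standard library. The helper names `list_tables` and `read_table` are hypothetical and not part of the Databank, and the UTF-8 encoding is an assumption rather than something this README states.

```python
import csv
from pathlib import Path


def list_tables(repo_root, directory="core"):
    """Return the sorted names of the CSV files in one databank directory."""
    # Hypothetical helper: "core" is the default because it holds the databank itself.
    return sorted(p.name for p in (Path(repo_root) / directory).glob("*.csv"))


def read_table(repo_root, name, directory="core"):
    """Read one CSV table into a list of dicts keyed by its header row."""
    path = Path(repo_root) / directory / name
    # Assumption: the files are UTF-8 encoded; adjust if a file fails to decode.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

From a local checkout, `list_tables(".")` would show what is available in core/, and `read_table(".", "Parks.csv")` would load that table, assuming it sits in core/. All values come back as strings, so any numeric columns need converting before analysis.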

Maintenance and sources

Most of the data in the Databank are provided by Chadwick Baseball Bureau. The data differ from the data the Bureau provides to its clients in that they contain less detail, are updated less frequently, and are provided on an as-is basis.

The Databank is historically based in part on the Lahman Baseball Database, version 2015-01-24, which is Copyright (C) 1996-2015 by Sean Lahman.

The tables Parks.csv and HomeGames.csv are based on the game logs and park code table published by Retrosheet. This information is available free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at

Enquiries and suggested revisions

Enquiries and suggested revisions to the data can be posted in the issue tracker at

Files in core/ are all generated by scripts. As such they are not edited manually (and therefore pull requests should not be submitted against these files).

Files in upstream/ are manually maintained files which contain information specific to constructing the Databank. As they are maintained manually, it is valid to submit pull requests containing corrections or additions to these files.

