Release Supreme Court dataset (for Python 2) · bdewilde/textacy-data

A collection of ~8.4k (almost all) decisions issued by the U.S. Supreme Court from November 1946 through June 2016 — the "modern" era.

Records include the following fields:

text: full text of the Court's decision
case_name: name of the court case, in all caps
argument_date: date on which the case was argued before the Court, as a string with format 'YYYY-MM-DD'
decision_date: date on which the Court's decision was announced, as a string with format 'YYYY-MM-DD'
decision_direction: ideological direction of the majority decision; either 'conservative', 'liberal', or 'unspecifiable'
maj_opinion_author: name of the majority opinion's author, if available and identifiable, as an integer code whose mapping is given in SupremeCourt.opinion_author_codes
n_maj_votes: number of justices voting in the majority
n_min_votes: number of justices voting in the minority
issue: subject matter of the case's core disagreement (e.g. affirmative action) rather than its legal basis (e.g. the equal protection clause), as a string code whose mapping is given in SupremeCourt.issue_codes
issue_area: higher-level categorization of the issue (e.g. Civil Rights), as an integer code whose mapping is given in SupremeCourt.issue_area_codes
us_cite_id: citation identifier for each case according to the official United States Reports. Note: about 300 cases have duplicate ids, and it's not clear whether that's "correct" or a data quality problem
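
To make the field descriptions above concrete, here is a minimal sketch of working with records shaped like the ones described. It assumes each record is a plain dict keyed by the field names listed; the sample records, their values, and the helper names (parse_date, duplicated_cite_ids) are made up for illustration and are not taken from the dataset or from any library.

```python
from collections import Counter
from datetime import datetime

# Made-up records for illustration only; none of these values come from
# the dataset, and the us_cite_id strings are placeholders.
records = [
    {"case_name": "EXAMPLE V. SAMPLE", "decision_date": "1950-03-27",
     "decision_direction": "liberal", "n_maj_votes": 6, "n_min_votes": 3,
     "us_cite_id": "000 U.S. 001"},
    {"case_name": "PLACEHOLDER V. DEMO", "decision_date": "1975-06-30",
     "decision_direction": "conservative", "n_maj_votes": 5, "n_min_votes": 4,
     "us_cite_id": "000 U.S. 002"},
    {"case_name": "DEMO V. PLACEHOLDER", "decision_date": "1975-06-30",
     "decision_direction": "unspecifiable", "n_maj_votes": 9, "n_min_votes": 0,
     "us_cite_id": "000 U.S. 002"},
]


def parse_date(value):
    """Convert a 'YYYY-MM-DD' string into a datetime.date."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def duplicated_cite_ids(recs):
    """Return the sorted us_cite_id values that occur more than once."""
    counts = Counter(r["us_cite_id"] for r in recs)
    return sorted(cite_id for cite_id, n in counts.items() if n > 1)


# Cases decided by a one-vote margin, using the two vote-count fields.
close_calls = [r["case_name"] for r in records
               if r["n_maj_votes"] - r["n_min_votes"] == 1]

print(parse_date(records[0]["decision_date"]).year)
print(duplicated_cite_ids(records))
print(close_calls)
```

Because roughly 300 cases share a us_cite_id, it is probably safer not to treat that field as a unique key when indexing records.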

The text in this dataset was derived from FindLaw's searchable database of court cases: http://caselaw.findlaw.com/court/us-supreme-court

The metadata was extracted without modification from the Supreme Court Database:
Harold J. Spaeth, Lee Epstein, et al. 2016 Supreme Court Database, Version 2016 Release 1. http://supremecourtdatabase.org.
Its license is CC BY-NC 3.0 US: https://creativecommons.org/licenses/by-nc/3.0/us/

This corpus' creation was inspired by a blog post by Emily Barry: http://www.emilyinamillion.me/blog/2016/7/13/visualizing-supreme-court-topics-over-time

NOTE: The two datasets were merged through much munging and a carefully trained model using the dedupe package. The model's duplicate threshold was set to maximize an F-score in which precision was weighted twice as heavily as recall. Still, given occasionally baffling inconsistencies in case naming, citation ids, and decision dates, a very small percentage of texts may be incorrectly matched to metadata. (Sorry.)
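
As a rough illustration of the threshold choice described in the note, the sketch below computes a weighted F-score and picks the candidate threshold that maximizes it. It assumes the standard F-beta definition with beta = 0.5, which is the usual way to express "precision counts twice as much as recall". It does not call the dedupe package, and the candidate thresholds and their precision/recall values are invented for the example, not results from the actual merge.

```python
def f_beta(precision, recall, beta=0.5):
    """Weighted harmonic mean of precision and recall.

    beta < 1 favors precision; beta = 0.5 is assumed here to match
    "precision had twice as much weight as recall".
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical (threshold, precision, recall) triples, for illustration only.
candidates = [
    (0.3, 0.80, 0.95),
    (0.5, 0.90, 0.85),
    (0.7, 0.97, 0.60),
]

# Keep the threshold whose precision/recall pair scores highest.
best = max(candidates, key=lambda c: f_beta(c[1], c[2]))
print(best[0])
```

With beta below 1, a threshold that gives up some recall for higher precision tends to win, which is consistent with the note's aim of keeping incorrect text-to-metadata matches rare.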
