Number of Target Fields in the SCOTUS dataset on HuggingFace #37

AmanPriyanshu · 2022-12-16T06:37:55Z

The SCOTUS dataset, available as part of the LexGLUE corpus, is documented as having 14 classes. However, when we check the SCOTUS dataset on HuggingFace with the following code, we only get 13 classes:

from datasets import load_dataset  # !pip install datasets
import numpy as np

# Load the SCOTUS subset of LexGLUE once and reuse it for every split
scotus = load_dataset('lex_glue', 'scotus')

for split in ('train', 'test'):
    # Collect the integer issue-area labels of this split and list the distinct values
    labels = list(scotus[split]['label'])
    classes = np.unique(labels)
    print(split, classes, len(classes))

The output shows only 13 unique classes instead of 14.

Is there an issue with the way we're extracting the data? If so, we'd greatly appreciate any help.

iliaschalkidis · 2022-12-16T06:45:21Z

Hi @AmanPriyanshu, no, you're right. There are 14 issue areas based on the SCDB documentation (http://scdb.wustl.edu/documentation.php?var=issueArea), but only 13 of those are present at least once in our SCOTUS dataset.
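To see which declared issue area is absent, and not just that one is, a minimal sketch like the following could help. It counts every declared label id across all splits and reports zeros explicitly, whereas `np.unique` only lists values that actually occur. The helper names `label_counts` and `missing_label_ids` and the constant `NUM_ISSUE_AREAS` are hypothetical, not part of LexGLUE or the `datasets` library.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical constant: 14 declared classes, one per SCDB issue area.
NUM_ISSUE_AREAS = 14


def label_counts(splits: Iterable[Iterable[int]], num_classes: int = NUM_ISSUE_AREAS) -> List[int]:
    """Count every label id in range(num_classes) across all splits, including zeros."""
    counter = Counter()
    for labels in splits:
        counter.update(int(label) for label in labels)
    # Counter returns 0 for ids that never occurred, so absent classes show up explicitly
    return [counter[label_id] for label_id in range(num_classes)]


def missing_label_ids(splits: Iterable[Iterable[int]], num_classes: int = NUM_ISSUE_AREAS) -> List[int]:
    """Return the declared label ids that never occur in any split."""
    counts = label_counts(splits, num_classes)
    return [label_id for label_id, count in enumerate(counts) if count == 0]


# Toy example with 5 declared classes, where class 4 never appears
toy_splits = [[0, 1, 2, 1], [3, 0]]
print(label_counts(toy_splits, num_classes=5))       # [2, 2, 1, 1, 0]
print(missing_label_ids(toy_splits, num_classes=5))  # [4]
```

Applied to the `scotus` DatasetDict loaded in the original snippet, a call such as `missing_label_ids(scotus[split]['label'] for split in scotus)` would list the unused ids. Assuming the `label` feature is a `ClassLabel`, `scotus['train'].features['label'].num_classes` gives the declared count to pass as `num_classes`.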

AmanPriyanshu · 2022-12-16T06:58:40Z

I see, we were confused by the mention of 14 classes in the HuggingFace documentation. Thank you so much for replying and clarifying my doubt!

AmanPriyanshu closed this as completed Dec 16, 2022