Number of Target Fields in the SCOTUS dataset on HuggingFace #37

Closed
AmanPriyanshu opened this issue Dec 16, 2022 · 2 comments


@AmanPriyanshu

The SCOTUS dataset, available as part of the LexGLUE corpus, is documented as having 14 classes. When we check the HuggingFace SCOTUS dataset, however, we only get 13 classes using the following method.

from datasets import load_dataset  # !pip install datasets
import numpy as np

# Load the SCOTUS configuration of LexGLUE once and reuse it for both splits.
scotus = load_dataset('lex_glue', 'scotus')

# Unique label ids in the training split.
labels = list(scotus['train']['label'])
classes = np.unique(labels)
print(classes, len(classes))

# Unique label ids in the test split.
labels = list(scotus['test']['label'])
classes = np.unique(labels)
print(classes, len(classes))

The results display only 13 unique classes instead of 14, as shown below.

image

Is there an issue with the way we're extracting the data? If so, we'd greatly appreciate any help.

@iliaschalkidis
Collaborator

Hi @AmanPriyanshu, no, you're right. There are 14 issue areas based on the SCDB documentation (http://scdb.wustl.edu/documentation.php?var=issueArea), but only 13 of those appear at least once in our SCOTUS dataset.
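For anyone who wants to check this kind of gap themselves, here is a minimal sketch that compares a declared label set against the label ids actually observed in a split. The helper name `summarize_label_coverage` and the toy data are hypothetical, chosen only for illustration. With the real dataset, the declared names could probably be read from `scotus['train'].features['label'].names`, assuming the label column is a `ClassLabel` feature; that is an assumption, not something confirmed in this thread.

```python
from collections import Counter


def summarize_label_coverage(declared_names, observed_labels):
    """Compare the declared label set with the label ids seen in the data.

    declared_names: list of class names, indexed by label id.
    observed_labels: iterable of integer label ids from one or more splits.
    """
    counts = Counter(observed_labels)
    # A declared class is "missing" if no example carries its id.
    missing = [i for i in range(len(declared_names)) if counts[i] == 0]
    return {
        "declared": len(declared_names),
        "observed": len(counts),
        "missing_ids": missing,
        "missing_names": [declared_names[i] for i in missing],
    }


# Toy data (hypothetical): five declared classes, one of which never occurs.
declared = ["a", "b", "c", "d", "e"]
observed = [0, 1, 1, 3, 4, 4, 0]

report = summarize_label_coverage(declared, observed)
print(report)
```

Running the same helper on the concatenated labels of all splits would show whether a class is absent everywhere or only from a particular split.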

@AmanPriyanshu
Author

I see. We were confused by the mention of 14 classes in the HuggingFace documentation. Thank you so much for replying and clarifying my doubt!
