Hi, I looked into the OOD results, and many examples in the test sets also appear in the corresponding train sets. For example, DengueFilipino has the same train and test set, and KirundiNews has about 90% overlap...
To reproduce:
from data import *

dataloaders = dict(DengueFilipino=load_filipino,
                   KirundiNews=load_kirnews,
                   KinyarwandaNews=load_kinnews,
                   SwahiliNews=load_swahili)

for data_name, loader in dataloaders.items():
    train, test = loader()
    # Share of unique test examples that also occur in the train set.
    overlap = 1 - len(set(test) - set(train)) / len(set(test))
    print(data_name, f"train<->test overlap: {overlap * 100:.1f}%")

DengueFilipino train<->test overlap: 100.0%
KirundiNews train<->test overlap: 90.4%
KinyarwandaNews train<->test overlap: 23.8%
SwahiliNews train<->test overlap: 0.5%
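
For anyone who wants to sanity-check the metric without the repo's loaders, here is a minimal self-contained sketch of the same computation. The function name `overlap_fraction` and the toy lists are made up for illustration, and it assumes each split is an iterable of hashable examples (e.g. plain strings or tuples); I have not checked that this matches what every loader in `data` returns.

```python
from typing import Hashable, Iterable


def overlap_fraction(train: Iterable[Hashable], test: Iterable[Hashable]) -> float:
    """Return the fraction of unique test examples that also occur in train."""
    train_set, test_set = set(train), set(test)
    if not test_set:
        raise ValueError("test set is empty")
    # Equivalent to 1 - len(test_set - train_set) / len(test_set).
    return len(test_set & train_set) / len(test_set)


# Toy data (hypothetical, not taken from any of the datasets above).
toy_train = ["a", "b", "c", "d"]
toy_test = ["c", "d", "e", "c"]

fraction = overlap_fraction(toy_train, toy_test)
print(f"toy train<->test overlap: {fraction * 100:.1f}%")
```

Note that both splits are deduplicated first, so the number is the share of distinct test examples seen in train, not the share of test rows.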