add_samplet: feature_names allows dimension mismatch, order isn't paired -- will overwrite #45

WillForan · 2020-12-10T21:00:31Z

I had a few bugs (using the wrong variable name) and realized I never got yelled at for providing bad feature names.

A few observations:

The length of feature_names doesn't have to match the length of features.

There can be too many names (x, y, z, and an additional "DNE" name):

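# assuming RegrDataset is an alias for pyradigm's RegressionDataset
from pyradigm import RegressionDataset as RegrDataset
ds = RegrDataset()
ds.description = "extra feature names"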
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['x','y','z','DNE'])
(x, _, _) = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x' 'y' 'z' 'DNE']
[[1. 2. 3.]
 [4. 5. 6.]]

Or too few (only x is named, but there are values for x, y, and z):

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x']
[[1. 2. 3.]
[6. 5. 4.]]
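
For the two length-mismatch cases above, the check I expected looks roughly like the sketch below. `check_feature_names` is a hypothetical helper written only for illustration; it is not part of pyradigm and says nothing about how add_samplet is actually implemented.

```python
def check_feature_names(features, feature_names):
    """Hypothetical helper (not pyradigm API): require one name per feature."""
    n_features, n_names = len(features), len(feature_names)
    if n_features != n_names:
        raise ValueError(
            f"got {n_features} features but {n_names} feature names"
        )
    return list(feature_names)


# Matching lengths pass through unchanged.
check_feature_names([1, 2, 3], ['x', 'y', 'z'])

# Both cases from this issue would be rejected.
for bad_names in (['x', 'y', 'z', 'DNE'], ['x']):
    try:
        check_feature_names([1, 2, 3], bad_names)
    except ValueError as err:
        print(err)
```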

Specifying feature names for one samplet changes the names everywhere? The names from the last add_samplet call overwrite the dataset's feature_names:

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['y','y','z'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['y' 'y' 'z']
[[1. 2. 3.]
 [4. 5. 6.]]

This is potentially surprising when features are given to add_samplet in a different order -- even if features and feature_names are paired correctly, the names are overwritten but the values are not reordered (@raamana -- a thing you warned me to check. Good eye!)

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['z','y','x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['z' 'y' 'x']
[[1. 2. 3.]
 [6. 5. 4.]]


raamana · 2020-12-10T21:25:15Z

Thanks a lot Will for putting pyradigm to the test and reporting these bugs!

Let me look into them and see why they happened. But these bugs hopefully haven't prevented you from running comparisons? I am on Zoom and we can discuss this more if you want -- and to prepare for the "progress report", so to say.

Currently throws out anything that doesn't exactly match the previous feature names. A better solution might be to reorder features if feature_names are out of order. It could also fill features with np.nan where feature_names are missing.
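
A minimal sketch of the reorder-and-fill idea, assuming the dataset keeps a reference list of names from the first samplet. `align_features` and `reference_names` are hypothetical names used only for illustration; neither exists in pyradigm.

```python
import numpy as np


def align_features(features, feature_names, reference_names):
    """Hypothetical helper (not pyradigm API): reorder one samplet's features
    to follow reference_names, filling names it lacks with NaN."""
    if len(features) != len(feature_names):
        raise ValueError("features and feature_names differ in length")
    if len(set(feature_names)) != len(feature_names):
        raise ValueError("feature_names contains duplicates")
    unknown = set(feature_names) - set(reference_names)
    if unknown:
        raise ValueError(f"unknown feature names: {sorted(unknown)}")
    # Pair each name with its value, then read values out in reference order.
    lookup = dict(zip(feature_names, features))
    return np.array(
        [lookup.get(name, np.nan) for name in reference_names], dtype=float
    )


reference = ['x', 'y', 'z']
print(align_features([6, 5, 4], ['z', 'y', 'x'], reference))  # [4. 5. 6.]
print(align_features([1, 3], ['x', 'z'], reference))          # [ 1. nan  3.]
```

With this, the 'id2' samplet from the last example above would be stored as x=4, y=5, z=6 instead of silently renaming the columns.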

WillForan mentioned this issue Dec 11, 2020

more checks for inconsistent feature names #47

Merged
