add_samplet: feature_names allows dimension mismatch, order isn't paired -- will overwrite #45

Open
WillForan opened this issue Dec 10, 2020 · 1 comment

@WillForan
Contributor

I had a few bugs (using the wrong variable name) and realized I never got yelled at for providing bad feature names.

A few observations:

  1. The length of feature_names doesn't have to match the number of features.

There can be too many names (x, y, z, plus an additional "DNE" name):

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['x','y','z','DNE'])
(x, _, _) = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x' 'y' 'z' 'DNE']
[[1. 2. 3.]
 [4. 5. 6.]]

or too few (only x is named, but the features are x, y, and z):

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x']
[[1. 2. 3.]
 [6. 5. 4.]]
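
For illustration, here is a minimal sketch of the kind of length check that would have caught both cases above. `validate_feature_names` is a hypothetical standalone helper, not part of pyradigm's API; it only shows the check, not where it would live inside `add_samplet`.

```python
def validate_feature_names(features, feature_names):
    """Hypothetical helper (not pyradigm API): require one name per feature value."""
    features = list(features)
    feature_names = list(feature_names)
    if len(features) != len(feature_names):
        raise ValueError(
            f"got {len(features)} features but {len(feature_names)} feature names"
        )
    return features, feature_names


# Matching lengths pass through unchanged.
validate_feature_names([1, 2, 3], ["x", "y", "z"])

# Both mismatches from the examples above are rejected.
for bad_names in (["x", "y", "z", "DNE"], ["x"]):
    try:
        validate_feature_names([4, 5, 6], bad_names)
    except ValueError as err:
        print("rejected:", err)
```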

  2. Specifying feature names for one samplet changes the names everywhere?

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['y','y','z'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['y' 'y' 'z']
[[1. 2. 3.]
 [4. 5. 6.]]

This is potentially surprising when features are given to add_samplet in a different order, even if features and feature_names are paired correctly (@raamana, a thing you warned me to check. Good eye!):

ds = RegrDataset()
ds.description = "extra feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['z','y','x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['z' 'y' 'x']
[[1. 2. 3.]
 [6. 5. 4.]]
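
One behavior a user might expect here is for values to be realigned by name to the order the dataset already has. The sketch below shows that idea in plain Python; `reorder_to_reference` and its `reference_names` argument are hypothetical names for illustration, not anything pyradigm provides.

```python
def reorder_to_reference(features, feature_names, reference_names):
    """Hypothetical helper: return features in the order given by reference_names."""
    features = list(features)
    feature_names = list(feature_names)
    if len(features) != len(feature_names):
        raise ValueError("features and feature_names differ in length")
    if len(set(feature_names)) != len(feature_names):
        raise ValueError("feature_names contains duplicates")
    if set(feature_names) != set(reference_names):
        raise ValueError("feature_names do not match the existing names")
    # Pair each value with its name, then read values back in reference order.
    by_name = dict(zip(feature_names, features))
    return [by_name[name] for name in reference_names]


# The second samplet from the example above, realigned to x, y, z.
print(reorder_to_reference([6, 5, 4], ["z", "y", "x"], ["x", "y", "z"]))  # [4, 5, 6]
```

With this kind of alignment, the duplicate names in the `['y','y','z']` example would raise an error instead of silently replacing the stored names.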

@raamana
Owner

raamana commented Dec 10, 2020

Thanks a lot, Will, for putting pyradigm to the test and reporting these bugs!

Let me look into them and see why they happened. Hopefully these bugs haven't prevented you from running comparisons? I am on Zoom and we can discuss this more if you want -- and to prepare for the "progress report", so to speak.

WillForan added a commit to WillForan/pyradigm that referenced this issue Dec 11, 2020
Currently throws out anything that doesn't exactly match previous
feature names. A better solution might be to reorder features if
feature_names are out of order. Could also fill features with np.nan
where feature_names are missing
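
The "reorder, and fill missing names with NaN" idea from the commit message could look roughly like the sketch below. This is only an assumption about what such a fix might do, written in plain Python with `math.nan` standing in for `np.nan`; `align_or_fill` is a hypothetical name and this is not the code in the referenced commit.

```python
import math


def align_or_fill(features, feature_names, reference_names, fill=math.nan):
    """Hypothetical helper: order values by reference_names, filling absent names."""
    features = list(features)
    feature_names = list(feature_names)
    if len(features) != len(feature_names):
        raise ValueError("features and feature_names differ in length")
    if len(set(feature_names)) != len(feature_names):
        raise ValueError("feature_names contains duplicates")
    unknown = [name for name in feature_names if name not in reference_names]
    if unknown:
        raise ValueError(f"unknown feature names: {unknown}")
    by_name = dict(zip(feature_names, features))
    # Names the samplet did not supply get the fill value.
    return [by_name.get(name, fill) for name in reference_names]


# A samplet that only supplies x and z; y is filled in.
print(align_or_fill([1, 3], ["x", "z"], ["x", "y", "z"]))  # [1, nan, 3]
```

Whether silently filling is preferable to raising is a design choice; filling keeps the samplet but pushes NaN handling onto downstream code.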