
About failed to read the hdf5 file #57

Closed
2 tasks done
CHENGHUAN555 opened this issue Aug 18, 2023 · 4 comments
CHENGHUAN555 commented Aug 18, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Bug description

When running the script below, an error was reported at line 30:

continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])

The error occurs when cebra.load_data() tries to read the .h5 file. I do not know how to solve it, and I hope the author can help.
---------------------------------------------------------------------------------------------------------------------------------------------
test.py
---------------------------------------------------------------------------------------------------------------------------------------------
# Create a .h5 file, containing a pd.DataFrame
import pandas as pd
import numpy as np
X_continuous = np.random.normal(0,1,(100,3))
X_discrete = np.random.randint(0,10,(100, ))
df = pd.DataFrame(np.array(X_continuous), columns=["continuous1", "continuous2", "continuous3"])
df["discrete"] = X_discrete
df.to_hdf("auxiliary_behavior_data.h5", key="auxiliary_variables")


import cebra
from numpy.random import uniform, randint
from sklearn.model_selection import train_test_split

# 1. Define a CEBRA model
cebra_model = cebra.CEBRA(
    model_architecture = "offset10-model",
    batch_size = 512,
    learning_rate = 1e-4,
    max_iterations = 10, # TODO(user): to change to at least 10'000
    max_adapt_iterations = 10, # TODO(user): to change to ~100-500
    time_offsets = 10,
    output_dimension = 8,
    verbose = False
)

# 2. Load example data
neural_data = cebra.load_data(file="neural_data.npz", key="neural")
new_neural_data = cebra.load_data(file="neural_data.npz", key="new_neural")
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
discrete_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["discrete"]).flatten()

assert neural_data.shape == (100, 3)
assert new_neural_data.shape == (100, 4)
assert discrete_label.shape == (100, )
assert continuous_label.shape == (100, 3)

# 3. Split data and labels
(
    train_data,
    valid_data,
    train_discrete_label,
    valid_discrete_label,
    train_continuous_label,
    valid_continuous_label,
) = train_test_split(neural_data,
                    discrete_label,
                    continuous_label,
                    test_size=0.3)

# 4. Fit the model
# time contrastive learning
cebra_model.fit(train_data)
# discrete behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label,)
# continuous behavior contrastive learning
cebra_model.fit(train_data, train_continuous_label)
# mixed behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label, train_continuous_label)

# 5. Save the model
cebra_model.save('/tmp/foo.pt')

# 6. Load the model and compute an embedding
cebra_model = cebra.CEBRA.load('/tmp/foo.pt')
train_embedding = cebra_model.transform(train_data)
valid_embedding = cebra_model.transform(valid_data)
assert train_embedding.shape == (70, 8)
assert valid_embedding.shape == (30, 8)

# 7. Evaluate the model performances
goodness_of_fit = cebra.sklearn.metrics.infonce_loss(cebra_model,
                                                     valid_data,
                                                     valid_discrete_label,
                                                     valid_continuous_label,
                                                     num_batches=5)

# 8. Adapt the model to a new session
cebra_model.fit(new_neural_data, adapt = True)

# 9. Decode discrete labels behavior from the embedding
decoder = cebra.KNNDecoder()
decoder.fit(train_embedding, train_discrete_label)
prediction = decoder.predict(valid_embedding)
assert prediction.shape == (30,)

Operating System

windows 10

CEBRA version

cebra version 0.2.0

Device type

gpu

Steps To Reproduce

No response

Relevant log output

Traceback (most recent call last):
  File "E:\crop\injuryrun4\test.py", line 30, in <module>
    continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
  File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 661, in load
    data = loader.load(file, key=key, columns=columns)
  File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 211, in load
    raise ModuleNotFoundError()
ModuleNotFoundError

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

@MMathisLab (Member)

@gonlairo can you take a look?

@MMathisLab (Member)

@CHENGHUAN555 did you install the data dependencies with pip install cebra[datasets]? Otherwise the module is indeed not loaded. I suggest checking out the demos here: https://cebra.ai/docs/demos.html; they use a particular data loader, but you get the idea. See the install instructions here: https://cebra.ai/docs/installation.html#id1
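
This can be sanity-checked before reinstalling anything. A stdlib-only sketch (the specific backend names, h5py and tables, are my assumption about what the datasets extra provides):

```python
# Check which optional HDF5 backends are importable in the current environment.
# If they all come back MISSING, `pip install 'cebra[datasets]'` should pull them in.
import importlib.util

def backend_status(names=("h5py", "tables", "pandas")):
    # Map each candidate package name to whether it can be imported.
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, available in backend_status().items():
    print(f"{name}: {'OK' if available else 'MISSING'}")
```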

@EricThomson

This solved it for me when I hit this error while working through the code on the Usage page.

A couple of notes (feel free to ignore 😄): I found the ModuleNotFoundError message a bit hard to interpret, since it didn't say which module was missing, so I wasn't sure how to proceed. Also, the installation page says the datasets optional dependency is for working with the datasets on Figshare, so when I got the error on the Usage page while working with synthetic data, I didn't consider the correct solution.

Anyway, minor wrinkles. Congrats on the cool package; I'm having fun with it so far!
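
Regarding the hard-to-interpret message: a guard along these lines (a hypothetical sketch, not cebra's actual loader code) would name both the missing module and the extra that provides it:

```python
# Hypothetical guard for an HDF5 loader: instead of a bare ModuleNotFoundError,
# re-raise with a message naming the missing package and the pip extra that supplies it.
def require_hdf5_backend():
    try:
        import h5py  # assumed backend; shipped with the "datasets" extra
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            f"Missing optional dependency '{exc.name}' needed to read .h5 files. "
            "Install it with: pip install 'cebra[datasets]'"
        ) from exc
    return h5py
```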

@stes (Member) commented Sep 28, 2023

@EricThomson , thanks for flagging. I created a new issue to track these potential improvements here: #77
