Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPVAREstimator - AssertionError: #3066

Open
ArianKhorasani opened this issue Nov 28, 2023 · 3 comments
Open

GPVAREstimator - AssertionError: #3066

ArianKhorasani opened this issue Nov 28, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@ArianKhorasani
Copy link

ArianKhorasani commented Nov 28, 2023

Dear @lostella or maybe @jaheba et al - I'd require your help!

I'm training my multivariate time series dataset which is converted to ListDataset on GPVAREstimator, but getting the following AssertionError in Training process:

```
AssertionError                            Traceback (most recent call last)
Cell In[38], line 1
----> 1 predictor = estimator.train(
      2     training_data = train_ds_residuals,
      3     shuffle_buffer_length = 100,
      4     cache_data = True,
      5 )

File ~/Project/pytorch-transformer-ts/myenv/lib/python3.8/site-packages/gluonts/mx/model/estimator.py:237, in GluonEstimator.train(self, training_data, validation_data, shuffle_buffer_length, cache_data, **kwargs)
    229 def train(
    230     self,
    231     training_data: Dataset,
   (...)
    235     **kwargs,
    236 ) -> Predictor:
--> 237     return self.train_model(
    238         training_data=training_data,
    239         validation_data=validation_data,
    240         shuffle_buffer_length=shuffle_buffer_length,
    241         cache_data=cache_data,
    242     ).predictor

File ~/Project/pytorch-transformer-ts/myenv/lib/python3.8/site-packages/gluonts/mx/model/estimator.py:205, in GluonEstimator.train_model(self, training_data, validation_data, from_predictor, shuffle_buffer_length, cache_data)
    197             transformed_validation_data = Cached(
...
     35         input_dim=self.target_dim,
     36         output_dim=4 * self.distr_output.rank,
     37     )

AssertionError:
```

Please note that the target_dim = 7, prediction_length = 1, and context_length = 5. Here is the whole code of GPVAREstimator that I'm using too:

```
estimator = GPVAREstimator(
    prediction_length = 1,
    target_dim = 7,
    freq = '1H',
    context_length = 5,
    num_layers = 4,
    num_cells = 32,
    distr_output = MultivariateGaussianOutput(dim=7),
    trainer = Trainer(ctx = "cpu", epochs = 50, weight_decay = 1e-8, num_batches_per_epoch = 100)
)
predictor = estimator.train(
    training_data = train_ds_residuals,
    shuffle_buffer_length = 100,
    cache_data = True,
)
```

I'd appreciated if you could help me with this! Thank you!

@ArianKhorasani ArianKhorasani added the bug Something isn't working label Nov 28, 2023
@lostella
Copy link
Contributor

@ArianKhorasani could you provide the entire error trace? It’s not clear which assertion is failing

@ArianKhorasani
Copy link
Author

@lostella - the error trace that I provided is the entire error that I get. Please check the screenshot below too:
Screen Shot 2023-11-28 at 4 02 20 PM

@ArianKhorasani
Copy link
Author

Dear @lostella - I have already checked the dimension of my multivariate time series too. Putting my whole dataset code below:

```
variables = ['DBP', 'SBP', 'Resp', 'Temp', 'HR', 'O2Sat', 'MAP']
df_actual = pd.read_csv('merged_test.csv')
static_features = df_actual[['patient_id', 'Age', 'Gender', 'HospAdmTime']].drop_duplicates().reset_index(drop=True)
df_residuals = pd.DataFrame()

for variable in variables:
    # First, let's load forecasted values 
    df_forecast = pd.read_csv(f'forecasts_{variable}.csv')

    # Ensure that the data are ordered in the same way
    df_actual = df_actual.sort_values(by=['patient_id', 'ICULOS'])
    df_forecast = df_forecast.sort_values(by=['patient_id', 'ICULOS'])

    # Calculate residuals 
    residuals = df_actual[variable] - df_forecast[f'{variable}_forecast']

    # Add residual to df_residuals
    df_residuals[variable] = residuals
    df_residuals['patient_id'] = df_actual['patient_id']
    df_residuals['ICULOS'] = df_actual['ICULOS']

# Convert df_residuals to ListDataset
data_residuals = []
for patient_id, group in df_residuals.groupby('patient_id'):
    target = group[variables].values  # Use the residuals as target
    start = pd.Timestamp("1970-01-01 00:00") + pd.Timedelta(hours=group['ICULOS'].iloc[0])
    entry = {
        FieldName.TARGET: target,
        FieldName.START: start,  # Use the index as the start date
        FieldName.FEAT_STATIC_CAT: static_features[static_features['patient_id'] == patient_id][['Age', 'Gender']].values[0]
    }
    data_residuals.append(entry)

dataset_residuals = ListDataset(data_residuals, freq='1H', one_dim_target=False)
```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants