Dataset.take(1) returning Buffer too small #860

@jadhosn

I'm trying to fetch the column header of a dataset registered in my workspace. Here are the steps I'm following:

  1. Define a datastore
  2. Access the dataset by name and specify the version
  3. Take a single row as a sample
  4. Convert the sample into a pandas DataFrame
  5. Call df.columns

from azureml.core import Dataset

# ws is an existing azureml.core.Workspace
test_data = Dataset.get_by_name(ws, "testing_data_prep", version=1)
dataset_header = test_data.take(1).to_pandas_dataframe()
dataset_cols = dataset_header.columns

I have a wide dataset (roughly 5000 columns), so I'm calling .take(1) before converting it to a pandas DataFrame; I assumed that loading a single record would be far more efficient than loading the entire dataset.

With this code, .take(1) raises an error (full traceback at the bottom):

azureml.data.dataset_error_handling.DatasetExecutionError: (Column 1: In chunk 0: Invalid: Buffer #1 too small in array of type bool and length 1: expected at least 1 byte(s), got 0)|session_id=
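
The numbers in the message are consistent with a bit-packed boolean column. The wording resembles Apache Arrow's array validation, although the traceback does not show where the message originates, so treat that as an assumption. Under it, a boolean array stores 8 values per byte, which means a length-1 array needs 1 byte in its data buffer, while the engine apparently produced an empty one. The helper name below is hypothetical and only illustrates the arithmetic:

```python
def min_bool_buffer_bytes(length: int) -> int:
    """Bytes needed to bit-pack `length` boolean values, 8 values per byte."""
    return (length + 7) // 8


expected = min_bool_buffer_bytes(1)  # one row, as produced by .take(1)
received = 0                         # value reported in the error message
print(f"expected at least {expected} byte(s), got {received}")
```

If that reading is right, the failure is in how the single-row result is materialized, not in the dataset contents.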

Oddly, if I remove the .take(1) and call to_pandas_dataframe() on a sample dataset (same number of columns, 100 records), it takes a second or two, but it works and returns the column headers.

Any idea what's going on there?
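
For reference, here is a minimal sketch of what steps 3 to 5 are meant to achieve: read only the header row and nothing else. It assumes the underlying data is available as a delimited text file, which this issue does not state, and read_header is a hypothetical helper, not part of the azureml SDK. An in-memory stream stands in for the real file:

```python
import csv
import io


def read_header(text_stream, delimiter=","):
    """Return the column names from the first row of a delimited text stream."""
    reader = csv.reader(text_stream, delimiter=delimiter)
    # next() consumes only the first record; the remaining rows are never parsed.
    return next(reader, [])


# Stand-in for a wide file: 5000 columns, a header row plus one data row.
columns = [f"col_{i}" for i in range(5000)]
sample = io.StringIO(",".join(columns) + "\n" + ",".join(["0"] * 5000) + "\n")

header = read_header(sample)
print(len(header), header[:3])
```

This only helps when the source file can be opened directly; it does not explain or fix the .take(1) failure.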

Traceback

Traceback (most recent call last):
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\dataset_error_handling.py", line 83, in _try_execute
    return action()
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\tabular_dataset.py", line 146, in
    out_of_range_datetime=out_of_range_datetime))
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api_loggerfactory.py", line 149, in wrapper
    return func(*args, **kwargs)
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\dataflow.py", line 708, in to_pandas_dataframe
    ExecuteAnonymousActivityMessageArguments(anonymous_activity=Dataflow._dataflow_to_anonymous_activity_data(dataflow_to_execute)))
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api_aml_helper.py", line 38, in wrapper
    return send_message_func(op_code, message, cancellation_token)
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\engineapi\api.py", line 94, in execute_anonymous_activity
    response = self._message_channel.send_message('Engine.ExecuteActivity', message_args, cancellation_token)
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\engineapi\engine.py", line 120, in send_message
    raise_engine_error(response['error'])
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\errorhandlers.py", line 24, in raise_engine_error
    raise ExecutionError(error_response)
azureml.dataprep.api.errorhandlers.ExecutionError: (Column 1: In chunk 0: Invalid: Buffer #1 too small in array of type bool and length 1: expected at least 1 byte(s), got 0)|session_id=3901bb11-05dc-4fd2-8d79-656071bfb8bf

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\miniconda3\envs\creditscore\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    a.to_pandas_dataframe()
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data_loggerfactory.py", line 78, in wrapper
    return func(*args, **kwargs)
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\tabular_dataset.py", line 145, in to_pandas_dataframe
    df = _try_execute(lambda: dataflow.to_pandas_dataframe(on_error=on_error,
  File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\dataset_error_handling.py", line 85, in _try_execute
    raise DatasetExecutionError(str(e))
azureml.data.dataset_error_handling.DatasetExecutionError: (Column 1: In chunk 0: Invalid: Buffer #1 too small in array of type bool and length 1: expected at least 1 byte(s), got 0)|session_id=
