Description
I'm trying to fetch the column header of a dataset registered in my workspace. Here are the steps I'm following:
- Define a datastore
- Access the dataset by name and specify the version
- Take a single row as a sample
- Convert sample into pandas data frame
- Call df.columns
from azureml.core import Dataset

# ws is an existing Workspace object
test_data = Dataset.get_by_name(ws, "testing_data_prep", 1)
dataset_header = test_data.take(1).to_pandas_dataframe()
dataset_cols = dataset_header.columns
I have a wide dataset (roughly 5000 columns), so I call .take(1) before converting it to a pandas dataframe (I assumed that loading a single record would be far more efficient than loading the entire dataset).
With .take(1) in place, the call raises the following error (full traceback at the bottom):
azureml.data.dataset_error_handling.DatasetExecutionError: (Column 1: In chunk 0: Invalid: Buffer #1 too small in array of type bool and length 1: expected at least 1 byte(s), got 0)|session_id=
Oddly, if I remove .take(1) and call to_pandas_dataframe() on a sample dataset (same number of columns, 100 records), it takes a second or two but works and returns the column header.
Any idea what's going on there?
Traceback
Traceback (most recent call last):
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\dataset_error_handling.py", line 83, in _try_execute
return action()
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\tabular_dataset.py", line 146, in
out_of_range_datetime=out_of_range_datetime))
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api_loggerfactory.py", line 149, in wrapper
return func(*args, **kwargs)
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\dataflow.py", line 708, in to_pandas_dataframe
ExecuteAnonymousActivityMessageArguments(anonymous_activity=Dataflow._dataflow_to_anonymous_activity_data(dataflow_to_execute)))
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api_aml_helper.py", line 38, in wrapper
return send_message_func(op_code, message, cancellation_token)
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\engineapi\api.py", line 94, in execute_anonymous_activity
response = self._message_channel.send_message('Engine.ExecuteActivity', message_args, cancellation_token)
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\engineapi\engine.py", line 120, in send_message
raise_engine_error(response['error'])
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\dataprep\api\errorhandlers.py", line 24, in raise_engine_error
raise ExecutionError(error_response)
azureml.dataprep.api.errorhandlers.ExecutionError: (Column 1: In chunk 0: Invalid: Buffer #1 too small in array of type bool and length 1: expected at least 1 byte(s), got 0)|session_id=3901bb11-05dc-4fd2-8d79-656071bfb8bf
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\miniconda3\envs\creditscore\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
a.to_pandas_dataframe()
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data_loggerfactory.py", line 78, in wrapper
return func(*args, **kwargs)
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\tabular_dataset.py", line 145, in to_pandas_dataframe
df = _try_execute(lambda: dataflow.to_pandas_dataframe(on_error=on_error,
File "C:\miniconda3\envs\creditscore\lib\site-packages\azureml\data\dataset_error_handling.py", line 85, in _try_execute
raise DatasetExecutionError(str(e))
azureml.data.dataset_error_handling.DatasetExecutionError: (Column 1: In chunk 0: Invalid: Buffer #1 too small in array of type bool and length 1: expected at least 1 byte(s), got 0)|session_id=