In `infer_variable_types()` in `entity_utils.py` there is a `len()` call to get the dataframe length. This call makes entity creation slow for Dask dataframes, because calling `len()` on a Dask dataframe triggers a computation over every partition. The function also contains a `.compute()` call on the sample dataframe, but this computed sample is never used for Dask, because the user must specify the data types for Dask entities.
This code could be refactored with these changes:

- Only perform the `len()` call if the input dataframe is a Pandas dataframe.
- Remove the code block that computes `sample_df` if the input dataframe is a Dask dataframe, as this sample is never used for a Dask entity.
- Revert the code for selecting the sample to match the code on `master`. This code was updated to work with Dask dataframes, but since `sample_df` is no longer needed for Dask, it can be reverted to its original form.
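A minimal sketch of the proposed guard, assuming a hypothetical helper `sample_for_inference()` (not part of Featuretools) and a hypothetical `n_rows` parameter. It only calls `len()` and builds a sample for in-memory Pandas dataframes; any other input (such as a Dask dataframe) short-circuits before any computation. Taking the first `n_rows` rows with `head()` is an illustrative assumption, not the sampling code on `master`.

```python
import pandas as pd


def sample_for_inference(df, n_rows=10000):
    """Hypothetical helper illustrating the proposed refactor.

    Returns (num_rows, sample_df) for a Pandas DataFrame. For any other
    input (e.g. a Dask DataFrame, whose variable types the user must
    specify), returns (None, None) without calling len() or .compute().
    """
    if not isinstance(df, pd.DataFrame):
        # Dask path: skip len(), which would compute every partition, and
        # skip building a sample that would never be used.
        return None, None

    # len() is cheap for an in-memory Pandas DataFrame.
    num_rows = len(df)
    # Assumed sampling strategy for illustration: take the first n_rows rows.
    sample_df = df.head(n_rows) if num_rows > n_rows else df
    return num_rows, sample_df
```

A stand-in object whose `__len__` raises can confirm that the non-Pandas path never touches `len()`.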