I have a problem loading a large MDF (MF4) file into a DataFrame using the iter_to_dataframe() method.
To tackle this, we already switched from to_dataframe() to iter_to_dataframe(), which works fine for smaller files as before, but the process gets killed for larger files (>~20 GB).
We also tried varying the raster, chunk_ram_size, and reduce_memory_usage parameters to avoid memory issues, but the problem persists.
Do you know any workaround to debug this, or a solution to the problem?
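For reference, a minimal sketch of how peak memory per chunk can be watched while iterating (not from our codebase; it uses the stdlib resource module, so it is Unix-only, and iterate_with_memory_log is just an illustrative name):

import resource

from asammdf import MDF

def iterate_with_memory_log(mdf: MDF, chunk_ram_size: int = 209715200) -> None:
    """Print peak RSS after each chunk to see where memory grows."""
    for i, df in enumerate(mdf.iter_to_dataframe(chunk_ram_size=chunk_ram_size)):
        # ru_maxrss is the process peak resident set size; reported in KiB on Linux
        peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"chunk {i}: {len(df)} rows, peak RSS ~{peak_kib / 1024:.0f} MiB")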
Quick explanation of the workflow we are using:
we load an MF4 file into a DataFrame, do some processing and filtering, and at the end write the result to Parquet for further use.
snippet:
import pandas as pd
from asammdf import MDF

def _apply_dataframe_processing(self, mdf: MDF, signals_renaming_mapping: dict[str, str]) -> pd.DataFrame:
    """Converts mdf to dataframe, adjusts time column, renames signals, and drops duplicates after renaming"""
    df_list = []
    for df in mdf.iter_to_dataframe(time_from_zero=False, raster=1 / 10**self.precision, raw=True,
                                    reduce_memory_usage=True, chunk_ram_size=209715200):
        if df.empty:
            continue
        # move the timestamp index into a regular column and round it
        df.reset_index(inplace=True, names="time")
        df["time"] = df["time"].round(self.precision)
        df = df.rename(columns=signals_renaming_mapping)
        # keep only the first occurrence of columns that collide after renaming
        columns_to_keep = list(~df.columns.duplicated(keep="first"))
        df = df.loc[:, columns_to_keep]
        df_list.append(df)
    # also tried pickle and dask to keep the chunks in storage instead of memory,
    # but the process gets killed inside the iter_to_dataframe() method
    return pd.concat(df_list)
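For completeness, below is a minimal sketch of the streaming variant we have in mind: each processed chunk is written straight to a single Parquet file with pyarrow instead of being accumulated in df_list. stream_mdf_to_parquet and output_path are illustrative names, and it assumes all chunks share one schema (reduce_memory_usage is left off here because it can downcast dtypes differently per chunk). Note this would not help if the kill happens inside iter_to_dataframe() itself, which is what we observe.

import pyarrow as pa
import pyarrow.parquet as pq

from asammdf import MDF

def stream_mdf_to_parquet(mdf: MDF, output_path: str, precision: int) -> None:
    """Write each chunk straight to one Parquet file instead of collecting it in RAM."""
    writer = None
    try:
        for df in mdf.iter_to_dataframe(time_from_zero=False, raster=1 / 10**precision,
                                        raw=True, chunk_ram_size=209715200):
            if df.empty:
                continue
            df.reset_index(inplace=True, names="time")
            table = pa.Table.from_pandas(df, preserve_index=False)
            if writer is None:
                # open the writer lazily, using the schema of the first chunk
                writer = pq.ParquetWriter(output_path, table.schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()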