Problem with conversion of large MDF Files #1021

Open
xoxStudios opened this issue May 16, 2024 · 1 comment
Comments

@xoxStudios

I have a problem loading a large MDF (mf4) file into a dataframe using the iter_to_dataframe() method.
To tackle this, we already switched from the to_dataframe() method to iter_to_dataframe(), which works fine for smaller files as before, but gets killed for larger files (roughly >20 GB).
We also tried altering the raster, chunk_ram_size, and reduce_memory_usage parameters to avoid memory issues, but the problem persists.
Do you know a workaround or a way to debug this, or do you have a solution?

Quick explanation of the workflow we are using:
we load an mf4 file into a dataframe, do some processing and filtering, and finally write it to parquet for further use.

snippet:

import pandas as pd
from asammdf import MDF

def _apply_dataframe_processing(self, mdf: MDF, signals_renaming_mapping: dict[str, str]) -> pd.DataFrame:
    """Converts mdf to dataframe, adjusts time column, renames signals, and drops duplicates after renaming"""
    df_list = []
    for df in mdf.iter_to_dataframe(
        time_from_zero=False,
        raster=1 / 10**self.precision,
        raw=True,
        reduce_memory_usage=True,
        chunk_ram_size=209715200,
    ):
        if df.empty:
            continue
        df.reset_index(inplace=True, names="time")
        df["time"] = df["time"].round(self.precision)
        df = df.rename(columns=signals_renaming_mapping)
        # keep only the first occurrence of each column name after renaming
        df = df.loc[:, ~df.columns.duplicated(keep="first")]
        df_list.append(df)
        # also tried using pickle and dask to store the iterable on disk instead of
        # in memory, but the process gets killed inside the iter_to_dataframe() method
    # combine all processed chunks into a single dataframe
    return pd.concat(df_list)
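
Since every processed chunk is appended to df_list and concatenated at the end, the peak memory footprint is still roughly the whole file, even with the chunked iterator. A minimal sketch of a streaming alternative, writing each chunk straight to parquet with pyarrow.parquet.ParquetWriter so that only one chunk is in RAM at a time (the function name and output path are illustrative assumptions, and the class-specific raster/renaming steps are omitted):

import pyarrow as pa
import pyarrow.parquet as pq
from asammdf import MDF

def stream_mdf_to_parquet(mdf: MDF, out_path: str) -> None:
    """Illustrative sketch: write each chunk to parquet as it is produced."""
    writer = None
    try:
        for df in mdf.iter_to_dataframe(
            time_from_zero=False,
            raw=True,
            chunk_ram_size=209715200,
        ):
            if df.empty:
                continue
            df = df.reset_index(names="time")
            table = pa.Table.from_pandas(df, preserve_index=False)
            if writer is None:
                # open the writer lazily so the schema comes from the first chunk
                writer = pq.ParquetWriter(out_path, table.schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()

reduce_memory_usage=True is left out of this sketch because it may downcast dtypes per chunk, producing mismatched schemas across chunks. Note that if the process is killed inside iter_to_dataframe() itself, as described above, streaming the output will not help, since the crash happens before the chunk is yielded.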
@danielhrisca
Owner

Any chance you could send the file?
