Problem with conversion of large MDF Files #1021

Open
xoxStudios opened this issue May 16, 2024 · 1 comment
Comments

@xoxStudios

I have a problem loading a large MDF (mf4) file into a dataframe using the iter_to_dataframe() method.
To tackle this, we already switched from the to_dataframe() method to iter_to_dataframe(), which works fine for smaller files as before, but gets killed for larger files (roughly >20 GB).
We also tried altering the raster, chunk_ram_size, and reduce_memory_usage parameters to avoid memory issues, but the problem persists.
Do you know a workaround or a way to debug this, or do you have a solution?

Quick explanation of the workflow we are using:
we load an mf4 file into a dataframe, do some processing and filtering, and finally write it to parquet for further use.

snippet:

import pandas as pd
from asammdf import MDF

def _apply_dataframe_processing(self, mdf: MDF, signals_renaming_mapping: dict[str, str]) -> pd.DataFrame:
    """Converts mdf to dataframe, adjusts time column, renames signals, and drops duplicates after renaming"""
    df_list = []
    for df in mdf.iter_to_dataframe(
        time_from_zero=False,
        raster=1 / 10**self.precision,
        raw=True,
        reduce_memory_usage=True,
        chunk_ram_size=209715200,
    ):
        if df.empty:
            continue
        df.reset_index(inplace=True, names="time")
        df["time"] = df["time"].round(self.precision)
        df = df.rename(columns=signals_renaming_mapping)
        # keep only the first occurrence of each column name after renaming
        df = df.loc[:, ~df.columns.duplicated(keep="first")]
        df_list.append(df)
        # also tried using pickle and dask to store the iterable on disk instead of
        # in memory, but the process gets killed inside the iter_to_dataframe() method
    # combine all processed chunks into a single dataframe
    return pd.concat(df_list)
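
Since every processed chunk is appended to df_list and concatenated at the end, the peak memory footprint is still roughly the whole file, even with the chunked iterator. A minimal sketch of a streaming alternative, writing each chunk straight to parquet with pyarrow.parquet.ParquetWriter so that only one chunk is in RAM at a time (the function name and output path are illustrative assumptions, and the class-specific raster/renaming steps are omitted):

import pyarrow as pa
import pyarrow.parquet as pq
from asammdf import MDF

def stream_mdf_to_parquet(mdf: MDF, out_path: str) -> None:
    """Illustrative sketch: write each chunk to parquet as it is produced."""
    writer = None
    try:
        for df in mdf.iter_to_dataframe(
            time_from_zero=False,
            raw=True,
            chunk_ram_size=209715200,
        ):
            if df.empty:
                continue
            df = df.reset_index(names="time")
            table = pa.Table.from_pandas(df, preserve_index=False)
            if writer is None:
                # open the writer lazily so the schema comes from the first chunk
                writer = pq.ParquetWriter(out_path, table.schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()

reduce_memory_usage=True is left out of this sketch because it may downcast dtypes per chunk, producing mismatched schemas across chunks. Note that if the process is killed inside iter_to_dataframe() itself, as described above, streaming the output will not help, since the crash happens before the chunk is yielded.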
@danielhrisca
Owner

Any chance you could send the file?
