Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modular/Refactor FlashLoader #311

Closed
wants to merge 41 commits into from
Closed

Modular/Refactor FlashLoader #311

wants to merge 41 commits into from

Conversation

zain-sohail
Copy link
Member

@zain-sohail zain-sohail commented Dec 12, 2023

Summary

In this PR, the loader is modularized. Key changes include the introduction of separate classes for distinct functionalities:

  1. DataFrameCreator:

    • Restructured the process of generating dataframes, organizing them by channel format (electron, pulse, train).
    • Implemented class properties for easy access to the dataframes, each tailored to the unique logic of creating the respective dataframes.
    • Electron dataframe loading is 3x faster due to directly loading all dld channels at once from the dataset.
    • Using concatenation instead of the individual join operation before (Not tested but should be faster)
    • Notably, the electron dataframe now checks and sorts pulses for monotonicity, resulting in unique index tuples of trainId, pulseId, and electronId (This was apparently not the case).
  2. BufferFileHandler:

    • Manages the creation of buffer files, offering the flexibility of serial or parallel generation as needed.
    • Conducts schema checks against the configuration file for existing buffer files.
  3. ParquetHandler:

    • Handles the creation of relevant filenames for storing files. This functionality is separated to accommodate both the loader when saving dataframes (df and df_timed) and the buffer handler, which also deals with storing parquets.

Additionally:

  • Breaking change: dldAux Modification:

    • Moved the treatment of dldAux to the df_train class.
    • Maybe we include a check and suggestion for users to switch to the correct format if necessary.
  • Config Handling:

    • Both DataFrameCreator and BufferFileHandler now utilize the config["dataframe"] section. Hence I needed to change split_dld_time_from_sector_id that.
    • I suggest modification to split_dld_time_from_sector_id to directly accept relevant parameters, omitting the need for the entire config file.
  • Tests

    • I have tried multiple times to test the individual dataframe features but testing based on data is rather difficult (there are multiple commented out tests)
    • Likely, tests for the buffer and parquet should be done and can be done much more easily.

This refactoring also addresses the referenced issues.

@coveralls
Copy link
Collaborator

coveralls commented Dec 13, 2023

Pull Request Test Coverage Report for Build 7535787913

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.6%) to 91.189%

Totals Coverage Status
Change from base Build 7437492641: 0.6%
Covered Lines: 5972
Relevant Lines: 6549

💛 - Coveralls

@zain-sohail zain-sohail changed the title Felloader Modular/Refactor FlashLoader Dec 13, 2023
@zain-sohail zain-sohail marked this pull request as ready for review December 13, 2023 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants