parquet pipeline #649

jpn-- · 2023-02-07T16:57:44Z

Initial implementation for #645

dhensle

Generally looks good and seems to work as designed for me. I wonder whether the default pipeline_file_name setting in config.py should change to drop the ".h5" suffix. (I had to do some digging to find the actual setting that turns this feature on!) Note that changing the default would, I think, also make the pipeline CI tests use this parquet functionality, which seems desirable. Based on the getting_started.ipynb notebooks, I think the .h5 version should then be specified explicitly in the settings for the prototype_mtc example.

dhensle · 2023-02-25T01:08:06Z

activitysim/core/pipeline.py

+
+    If the pipeline_file_name setting ends in ".h5", then the pandas
+    HDFStore file format is used, otherwise pipeline files are stored
+    as parquet files organized in regular file system directories.


I assume documentation on this setting will be addressed in the other Pydantic task?
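
For readers following the thread, the rule stated in the docstring above can be sketched in a few lines. This is only an illustration of the suffix check, not the code from the PR: the helper name `uses_hdf5_store` is hypothetical, and the only behavior taken from the source is that a name ending in ".h5" selects HDFStore while anything else selects parquet directories.

```python
def uses_hdf5_store(pipeline_file_name: str) -> bool:
    """Hypothetical helper: decide the pipeline storage format from the setting.

    Assumption taken from the docstring in the diff: a name ending in ".h5"
    means the pandas HDFStore format; any other name means parquet files
    organized in regular file system directories.
    """
    return pipeline_file_name.endswith(".h5")


# Illustrative values only; the actual default lives in config.py.
for name in ("pipeline.h5", "pipeline"):
    fmt = "HDFStore" if uses_hdf5_store(name) else "parquet directory"
    print(f"{name!r} -> {fmt}")
```

Under this reading, dropping ".h5" from the default setting, as suggested in the review above, would be enough to switch the default storage to parquet.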

dhensle · 2023-02-25T01:09:50Z

activitysim/core/pipeline.py

+    although when using the parquet storage format this file is stored as "None.parquet"
+    to maintain a simple consistent file directory structure.
+
+    If the


Looks like unfinished thought here...

dhensle · 2023-02-25T01:11:16Z

activitysim/core/pipeline.py

+        store.joinpath(table_name).mkdir(parents=True, exist_ok=True)
+        df.to_parquet(store.joinpath(table_name, f"{checkpoint_name}.parquet"))
+    else:
+        complib = config.setting("pipeline_complib", None)


Another setting to not get lost in the Pydantic task.
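
The directory layout implied by the parquet branch of this hunk can be sketched with the standard library alone. The helper name `parquet_checkpoint_path` and the example table and checkpoint names are hypothetical; only the `joinpath`/`mkdir` pattern and the `{checkpoint_name}.parquet` file name come from the diff. The last call shows one plausible way a file could end up named "None.parquet" (formatting a checkpoint name of `None`), which is an assumption about the docstring quoted earlier, not something the PR text confirms.

```python
import tempfile
from pathlib import Path
from typing import Optional


def parquet_checkpoint_path(
    store: Path, table_name: str, checkpoint_name: Optional[str]
) -> Path:
    """Hypothetical helper mirroring the path logic shown in the diff.

    Creates one directory per table under the store and returns the
    parquet file path for the given checkpoint. No data is written here.
    """
    table_dir = store.joinpath(table_name)
    table_dir.mkdir(parents=True, exist_ok=True)
    return table_dir.joinpath(f"{checkpoint_name}.parquet")


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp, "pipeline")  # illustrative store location
    # One file per checkpoint, grouped in a directory per table.
    print(parquet_checkpoint_path(store, "households", "initialize_households"))
    # Assumption: a checkpoint name of None formats to "None.parquet".
    print(parquet_checkpoint_path(store, "households", None).name)
```

In the PR itself the returned path is passed to `df.to_parquet(...)`; that call is left out here so the sketch runs without pandas or a parquet engine installed.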

jpn-- · 2023-11-22T23:55:57Z

This PR was superseded by #654

basic implementation of parquet pipeline

b4ac903

jpn-- requested a review from dhensle February 7, 2023 18:01

dhensle reviewed Feb 25, 2023

jpn-- closed this Nov 22, 2023
