Small improvements to naming / structure in input_pipeline_interface.py #1845

NuojCheng · 2025-06-18T21:42:42Z

Description

TL;DR: Reorganize input_pipeline_interface.py for better readability.

The following changes are made:

The synthetic data iterator and the placeholder synthetic data iterator are moved to a separate file
BadSyntheticDataIterator is renamed to PlaceHolderDataIterator
The if-else statements at the end are reorganized
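
For readers unfamiliar with the pattern, here is a minimal sketch of what a placeholder data iterator might look like. It assumes such an iterator only needs to yield fixed-shape dummy batches; the class name `PlaceholderIterator`, the `inputs`/`targets` keys, and the constructor arguments are hypothetical and are not taken from the MaxText source.

```python
from typing import Dict, List

Batch = Dict[str, List[List[int]]]


class PlaceholderIterator:
  """Hypothetical stand-in that yields the same zero-filled batch forever."""

  def __init__(self, batch_size: int, seq_len: int):
    # Assumed batch layout: two integer fields of shape [batch_size, seq_len].
    self._template: Batch = {
        "inputs": [[0] * seq_len for _ in range(batch_size)],
        "targets": [[0] * seq_len for _ in range(batch_size)],
    }

  def __iter__(self) -> "PlaceholderIterator":
    return self

  def __next__(self) -> Batch:
    # Hand out a copy so callers cannot mutate the shared template.
    return {key: [row[:] for row in rows] for key, rows in self._template.items()}
```

Keeping a class like this in its own module means the interface file only has to import it, which is the kind of separation the first list item describes.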

FIXES: b/421596013

Tests

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

MaxText/input_pipeline/input_pipeline_interface.py

SurbhiJainUSC · 2025-06-23T22:04:22Z

MaxText/input_pipeline/_syn_data_processing.py

Maybe name this module synthetic_data_processing.py

aireenmei · 2025-06-25T00:02:24Z

MaxText/input_pipeline/input_pipeline_interface.py

+      assert config.packing, "c4_mlperf dataloader only works with packing. For padded version, use tfds dataloader"
+    train_iterator, eval_iterator = dataset_type_to_train_eval_iterator[config.dataset_type]
+  else:
+    max_logging.log(f"WARNING: '{config.dataset_type}' is not a supported dataset type." \


When a user specifies dataset_type=synthetic, this WARNING message will be confusing. We can exclude that case.

When dataset_type=synthetic, this warning should not be triggered, because the function exits at line 64.

I see. Sorry I missed that. Thanks!
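
The control flow discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the builder functions, the `BUILDERS` table, the `log` parameter, and the fallback to synthetic data after the warning are all hypothetical, since the diff excerpt above is truncated and does not show what follows the warning.

```python
from types import SimpleNamespace


def make_synthetic(config):
  # Hypothetical builder: returns (train_iterator, eval_iterator).
  return ("synthetic_train", None)


def make_tfds(config):
  # Hypothetical builder for a supported dataset type.
  return ("tfds_train", "tfds_eval")


# Table-driven dispatch in place of a long if/elif chain.
BUILDERS = {"tfds": make_tfds}


def create_data_iterators(config, log=print):
  # Early exit: the synthetic case returns before the lookup,
  # so it can never reach the "not supported" warning.
  if config.dataset_type == "synthetic":
    return make_synthetic(config)

  builder = BUILDERS.get(config.dataset_type)
  if builder is None:
    log(f"WARNING: '{config.dataset_type}' is not a supported dataset type.")
    # Assumed fallback; the real behavior is not visible in the excerpt.
    return make_synthetic(config)
  return builder(config)
```

With this shape, only an unrecognized `dataset_type` produces the warning, which matches the explanation given in the reply above.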

aireenmei

Thanks Nuojin!

aireenmei · 2025-06-25T01:01:31Z

I see. Sorry I missed that. Thanks!

NuojCheng changed the title ~~Chengnuojin/input pipeline~~ Refactor input pipeline with synthetic data iterator Jun 19, 2025

NuojCheng force-pushed the chengnuojin/input_pipeline branch 4 times, most recently from 99e1569 to 14c16f1 Compare June 23, 2025 15:51

NuojCheng changed the title ~~Refactor input pipeline with synthetic data iterator~~ Small improvements to naming / structure in input_pipeline_interface.py Jun 23, 2025

NuojCheng marked this pull request as ready for review June 23, 2025 17:40

NuojCheng requested review from aireenmei, SurbhiJainUSC and richjames0 as code owners June 23, 2025 17:40

SurbhiJainUSC reviewed Jun 23, 2025

MaxText/input_pipeline/_syn_data_processing.py (outdated)

NuojCheng force-pushed the chengnuojin/input_pipeline branch from 8728f1a to de5ef9c Compare June 24, 2025 16:36

NuojCheng requested a review from SurbhiJainUSC June 24, 2025 18:17

SurbhiJainUSC reviewed Jun 24, 2025

MaxText/input_pipeline/input_pipeline_interface.py

MaxText/input_pipeline/input_pipeline_interface.py (outdated)

NuojCheng force-pushed the chengnuojin/input_pipeline branch from a4bc416 to dd92595 Compare June 24, 2025 20:50

NuojCheng requested a review from SurbhiJainUSC June 24, 2025 22:04

aireenmei reviewed Jun 25, 2025

aireenmei approved these changes Jun 25, 2025

SurbhiJainUSC approved these changes Jun 25, 2025

move synthetic data iterator

58af99b

NuojCheng force-pushed the chengnuojin/input_pipeline branch from dd92595 to 58af99b Compare June 25, 2025 05:37

github-actions bot added the pull ready label Jun 25, 2025
