Skip to content

Add Prediction Exporter#366

Merged
mkolodner-sc merged 9 commits intomainfrom
mkolodner-sc/update_exporter
Oct 24, 2025
Merged

Add Prediction Exporter#366
mkolodner-sc merged 9 commits intomainfrom
mkolodner-sc/update_exporter

Conversation

@mkolodner-sc
Copy link
Copy Markdown
Collaborator

@mkolodner-sc mkolodner-sc commented Oct 23, 2025

Scope of work done

  • Adds an exporter for predictions and corresponding tests

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Comment thread python/tests/unit/common/data/export_test.py
@mkolodner-sc mkolodner-sc changed the title Add Prediction Exporter [Draft] Add Prediction Exporter Oct 23, 2025
@mkolodner-sc mkolodner-sc changed the title [Draft] Add Prediction Exporter Add Prediction Exporter Oct 23, 2025
Comment thread python/gigl/common/data/export.py Outdated
Comment thread python/gigl/common/data/export.py Outdated
Comment thread python/gigl/common/data/export.py Outdated
Comment thread python/gigl/common/data/export.py Outdated
Comment thread python/gigl/common/data/export.py Outdated
Comment thread python/gigl/common/data/export.py
Comment thread python/gigl/common/data/export.py Outdated
Copy link
Copy Markdown
Collaborator

@svij-sc svij-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more comments, thanks for the work.

Comment thread python/tests/unit/common/data/export_test.py Outdated
Comment thread python/tests/unit/common/data/export_test.py
Comment thread python/tests/unit/common/data/export_test.py
Comment thread python/gigl/common/data/export.py Outdated
@mkolodner-sc mkolodner-sc marked this pull request as ready for review October 24, 2025 19:42
@mkolodner-sc mkolodner-sc disabled auto-merge October 24, 2025 19:59
@mkolodner-sc mkolodner-sc added this pull request to the merge queue Oct 24, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Oct 24, 2025
@mkolodner-sc mkolodner-sc added this pull request to the merge queue Oct 24, 2025
Merged via the queue into main with commit 4f7b05b Oct 24, 2025
4 checks passed
@mkolodner-sc mkolodner-sc deleted the mkolodner-sc/update_exporter branch October 24, 2025 21:48
Comment on lines +317 to +349
def add_prediction(
self,
id_batch: torch.Tensor,
prediction_batch: torch.Tensor,
prediction_type: str,
):
"""
Adds to the in-memory buffer the integer IDs and their corresponding predictions.

Args:
id_batch (torch.Tensor): A torch.Tensor containing integer IDs.
prediction_batch (torch.Tensor): A torch.Tensor containing predictions corresponding to the integer IDs in `id_batch`.
prediction_type (str): A tag for the type of the predictions, e.g., 'user', 'content', etc.
"""
# Convert torch tensors to NumPy arrays, and then to Python int(s)
# and Python list(s). This is faster than converting torch tensors
# directly to Python int(s) and Python list(s), as Numpy's implementation
# is more efficient.
ids = id_batch.numpy()
predictions = prediction_batch.numpy()

self._num_records_written += len(ids)

batched_records = (
{
_NODE_ID_KEY: int(node_id),
_NODE_TYPE_KEY: prediction_type,
_PREDICTION_KEY: float(prediction),
}
for node_id, prediction in zip(ids, predictions)
)

self.add_record(batched_records)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit odd, why do we have them both call add_record, (which should probably be a _private method?

The primary differences between PredictionExporter and EmbeddingExporter are that they use a different schema (and therefor a different dict transformation) right?

Why not just parameterize that on the class, and have everyone call add_record?

I think that having the base classes each have their own add_x methods isa bad use of OOO, as how should a user know to use add_prediction vs add_record. This is partially fixed by making add_record _private but this pattern is still odd to me.

Ditto on load_x_to_bigquery

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants