Merged
mkolodner-sc
commented
Oct 23, 2025
svij-sc
reviewed
Oct 23, 2025
svij-sc
approved these changes
Oct 23, 2025
Collaborator
svij-sc
left a comment
A couple more comments, thanks for the work.
yliu2-sc
approved these changes
Oct 24, 2025
Comment on lines +317 to +349
    def add_prediction(
        self,
        id_batch: torch.Tensor,
        prediction_batch: torch.Tensor,
        prediction_type: str,
    ):
        """
        Adds to the in-memory buffer the integer IDs and their corresponding predictions.

        Args:
            id_batch (torch.Tensor): A torch.Tensor containing integer IDs.
            prediction_batch (torch.Tensor): A torch.Tensor containing predictions corresponding to the integer IDs in `id_batch`.
            prediction_type (str): A tag for the type of the predictions, e.g., 'user', 'content', etc.
        """
        # Convert torch tensors to NumPy arrays, and then to Python int(s)
        # and Python list(s). This is faster than converting torch tensors
        # directly to Python int(s) and Python list(s), as NumPy's implementation
        # is more efficient.
        ids = id_batch.numpy()
        predictions = prediction_batch.numpy()

        self._num_records_written += len(ids)

        batched_records = (
            {
                _NODE_ID_KEY: int(node_id),
                _NODE_TYPE_KEY: prediction_type,
                _PREDICTION_KEY: float(prediction),
            }
            for node_id, prediction in zip(ids, predictions)
        )

        self.add_record(batched_records)
Collaborator
This is a bit odd: why do we have them both call add_record (which should probably be a _private method)?
The primary difference between PredictionExporter and EmbeddingExporter is that they use a different schema (and therefore a different dict transformation), right?
Why not just parameterize that on the class, and have everyone call add_record?
I think having the subclasses each define their own add_x method is a bad use of OOP: how should a user know whether to use add_prediction or add_record? This is partially fixed by making add_record _private, but the pattern is still odd to me.
Ditto on load_x_to_bigquery.
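To make the suggestion concrete, here is a rough sketch of what "parameterize that on the class" could look like. It is not the code in this PR: the `Exporter` class, the `to_record` callable and its signature, the key strings, and the list-based buffer are all hypothetical stand-ins, and plain Python sequences replace the torch tensors so the snippet runs on its own.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical key names standing in for the module's private constants.
_NODE_ID_KEY = "node_id"
_NODE_TYPE_KEY = "node_type"
_PREDICTION_KEY = "pred"
_EMBEDDING_KEY = "emb"


def prediction_record(node_id: Any, value: Any, node_type: str) -> Record:
    """Schema-specific transformation for a scalar prediction."""
    return {
        _NODE_ID_KEY: int(node_id),
        _NODE_TYPE_KEY: node_type,
        _PREDICTION_KEY: float(value),
    }


def embedding_record(node_id: Any, value: Any, node_type: str) -> Record:
    """Schema-specific transformation for an embedding vector."""
    return {
        _NODE_ID_KEY: int(node_id),
        _NODE_TYPE_KEY: node_type,
        _EMBEDDING_KEY: [float(x) for x in value],
    }


class Exporter:
    """One exporter class; the schema is injected instead of subclassed (assumed design)."""

    def __init__(self, to_record: Callable[[Any, Any, str], Record]) -> None:
        self._to_record = to_record
        self._buffer: List[Record] = []  # stand-in for the real in-memory buffer
        self._num_records_written = 0

    def add_record(self, ids: Iterable[Any], values: Iterable[Any], node_type: str) -> None:
        # Single public entry point: callers never pick between add_x variants.
        for node_id, value in zip(ids, values):
            self._buffer.append(self._to_record(node_id, value, node_type))
            self._num_records_written += 1


prediction_exporter = Exporter(prediction_record)
prediction_exporter.add_record([1, 2], [0.25, 0.75], "user")

embedding_exporter = Exporter(embedding_record)
embedding_exporter.add_record([7], [[0.5, 1.5]], "content")
```

With this shape, callers only ever see add_record, and the schema difference lives in one small function per export type. The same idea would presumably carry over to load_x_to_bigquery by passing the schema in rather than adding a method per type.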
Scope of work done
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO