Conversation

@goeffthomas
Contributor

@goeffthomas goeffthomas requested review from neshdev and rosbo April 15, 2025 23:55
@goeffthomas goeffthomas requested a review from neshdev April 21, 2025 18:11
}

# Mapping of adapters to the valid kwargs to use for that adapter
_DATASET_LOAD_VALID_KWARGS_MAP = {
Member

nit: This can become tedious as the methods get more complex. Instead, we can use the inspect module to get the function signature. See https://docs.python.org/3/library/inspect.html#introspecting-callables-with-the-signature-object.
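A minimal sketch of the `inspect`-based approach suggested above, not the project's actual implementation. The helper names `valid_kwargs_for` and `unsupported_kwargs` and the `load_frame` loader are hypothetical, introduced only for illustration; the only real API used is `inspect.signature`.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Set


def valid_kwargs_for(func: Callable[..., Any]) -> Set[str]:
    """Return the parameter names that `func` accepts as keyword arguments."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(func).parameters.values()
    return {p.name for p in params if p.kind in keyword_kinds}


def unsupported_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> List[str]:
    """Return the sorted kwargs names that `func` does not declare."""
    return sorted(set(kwargs) - valid_kwargs_for(func))


# Hypothetical loader standing in for a real adapter function.
def load_frame(handle: str, path: str, *, sql_query: Optional[str] = None) -> None:
    ...


print(unsupported_kwargs(load_frame, {"path": "a.csv", "columns": ["x"]}))
```

One caveat with this design: a function that accepts `**kwargs` does not enumerate its valid names in its signature, so a hand-written mapping may still be needed for such loaders.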

Contributor Author

Oh, very cool. When we do the next data loader, I'll make a point to refactor this then.

Member

@neshdev neshdev left a comment

LGTM. Feel free to refactor to use the inspect module now if you can, or save it for next time.


def _load_polars_dataset_with_other_loader_kwargs_and_assert_warning(self) -> None:
    output_stream = io.StringIO()
    handler = logging.StreamHandler(output_stream)
Member

I guess the unit test works, but in our code base we use redirect_stdout when working with standard output in unit tests.
https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout
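A minimal sketch of capturing log output with `redirect_stdout`, assuming a logger whose handler writes to standard output. The logger name `example.capture` and the helper `capture_logged_stdout` are hypothetical. Note that `logging.StreamHandler` binds to a stream object when it is constructed, so the handler here is created inside the `with` block, after `sys.stdout` has been swapped.

```python
import io
import logging
import sys
from contextlib import redirect_stdout


def capture_logged_stdout(message: str) -> str:
    """Log `message` through a stdout handler and return what was written."""
    logger = logging.getLogger("example.capture")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep root handlers from also emitting the record

    stream = io.StringIO()
    with redirect_stdout(stream):
        # Inside the context, sys.stdout is `stream`, so the handler binds to it.
        handler = logging.StreamHandler(sys.stdout)
        logger.addHandler(handler)
        try:
            logger.warning(message)
        finally:
            # Remove the handler so repeated calls do not accumulate handlers.
            logger.removeHandler(handler)
    return stream.getvalue()


print(repr(capture_logged_stdout("unsupported kwargs ignored")))
```

A handler attached before the redirect keeps writing to the original stream, which is why a logger configured at import time usually has to be reconfigured inside the context.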

Contributor Author

Ah, I see. I found this example in the code base and based my implementation off of that:

def test_login_returns_403_for_bad_credentials(self) -> None:
    output_stream = io.StringIO()
    handler = logging.StreamHandler(output_stream)
    logger.addHandler(handler)
    login("invalid", "invalid")
    captured_output = output_stream.getvalue()
    self.assertEqual(
        captured_output,
        "Invalid Kaggle credentials. You can check your credentials on the [Kaggle settings page](https://www.kaggle.com/settings/account).\n",
    )

So would something like this be a better pattern to follow?

def test_kagglehub_console_filter_discards_logrecord(self) -> None:
    with TemporaryDirectory() as f:
        log_path = Path(f) / "test-log"
        logger = logging.getLogger("kagglehub")
        stream = StringIO()
        with redirect_stdout(stream):
            # reconfigure logger, otherwise streamhandler doesnt use the modified stderr
            _configure_logger(log_path)
            logger.info("HIDE", extra={**EXTRA_CONSOLE_BLOCK})
            logger.info("SHOW")
            self.assertEqual(stream.getvalue(), "SHOW\n")

@goeffthomas
Contributor Author

@neshdev I'll merge this for now to keep things moving. LMK if I should address the logging unit test. I'm going to backburner the inspect module refactor until the next data loader.

@goeffthomas goeffthomas merged commit 110b817 into main Apr 23, 2025
6 checks passed
@goeffthomas goeffthomas deleted the add-kwargs-validation-for-adapters branch April 23, 2025 04:33
3 participants