Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message for unsupported datetime dtypes in Pandas extra #3518

Closed
Zac-HD opened this issue Nov 29, 2022 · 1 comment · Fixed by #3628
Closed

Improve error message for unsupported datetime dtypes in Pandas extra #3518

Zac-HD opened this issue Nov 29, 2022 · 1 comment · Fixed by #3628
Labels
interop how to play nicely with other packages legibility make errors helpful and Hypothesis grokable

Comments

@Zac-HD
Copy link
Member

Zac-HD commented Nov 29, 2022

In this StackOverflow question, the asker notes that column('timestamp', dtype=pd.datetime) doesn't actually create a datetime column in the generated dataframe. This turns out to be a fairly general issue: passing dtype=datetime.datetime will actually create an object column too, and even passing dtype="datetime64" is problematic as Hypothesis will choose various time-resolution units. I think we should therefore:

  1. Ensure that if you get an error when we can tell that you probably wanted datetimes, we explicitly suggest dtype="datetime64[ns]" as part of the error message
  2. note_deprecation(...) if we can detect this and it wasn't previously an error
  3. note_deprecation(...) if the time-resolution unit is not ns - others are valid in Numpy but not Pandas
  4. Investigate doing the same for pd.Timedelta and related dtype arguments (see docs here)
@Zac-HD Zac-HD added legibility make errors helpful and Hypothesis grokable interop how to play nicely with other packages labels Nov 29, 2022
@big-g-squared
Copy link

I'm currently working on fixing this issue and wanted to ask for some guidance. I ran the example in the stack overflow question and got the following error:

practice.py:8: FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime module instead.
  @given(data_frames([column('A', dtype=str), column('B', dtype=int), column('Date', dtype=pd.datetime)]))
/home/spoderman94/basis_project/venv/lib/python3.8/site-packages/hypothesis/extra/pandas/impl.py:84: HypothesisDeprecationWarning: Passed dtype=<class 'datetime.datetime'> is not a valid Pandas dtype.  We'll treat it as dtype=object for now, but this will be an error in a future version.
  note_deprecation(
Traceback (most recent call last):
  File "practice.py", line 13, in <module>
    s.test_example_test_method()
  File "practice.py", line 9, in test_example_test_method
    def test_example_test_method(self, dataframe):
  File "/home/spoderman94/basis_project/venv/lib/python3.8/site-packages/hypothesis/core.py", line 1397, in wrapped_test
    raise the_error_hypothesis_found
  File "/home/spoderman94/basis_project/venv/lib/python3.8/site-packages/hypothesis/extra/numpy.py", line 183, in from_dtype
    raise InvalidArgument(f"No strategy inference for {dtype}")
hypothesis.errors.InvalidArgument: No strategy inference for object

I noticed that there was a deprecation message being sent out from the "extra/pandas/impl.py" file, specifically in a function called "elements_and_dtype." I was wondering: would it be appropriate to place a "dtype=pd.datetime"-specific deprecation notice in this area? This is what the function looks like:

@check_function
def elements_and_dtype(elements, dtype, source=None):
if source is None:
prefix = ""
else:
prefix = f"{source}."
if elements is not None:
check_strategy(elements, f"{prefix}elements")
else:
with check("dtype is not None"):
if dtype is None:
raise InvalidArgument(
f"At least one of {prefix}elements or {prefix}dtype must be provided."
)
with check("is_categorical_dtype"):
if is_categorical_dtype(dtype):
raise InvalidArgument(
f"{prefix}dtype is categorical, which is currently unsupported"
)
if isinstance(dtype, type) and np.dtype(dtype).kind == "O" and dtype is not object:
note_deprecation(
f"Passed dtype={dtype!r} is not a valid Pandas dtype. We'll treat it as "
"dtype=object for now, but this will be an error in a future version.",
since="2021-12-31",
has_codemod=False,
)
if isinstance(dtype, st.SearchStrategy):
raise InvalidArgument(
f"Passed dtype={dtype!r} is a strategy, but we require a concrete dtype "
"here. See https://stackoverflow.com/q/74355937 for workaround patterns."
)
dtype = try_convert(np.dtype, dtype, "dtype")
if elements is None:
elements = npst.from_dtype(dtype)
elif dtype is not None:

Additionally, the place where the error is actually raised is from "extras/numpy.py" in an area that looks like it's checking the different numpy "dtype.kind" properties of the given dtype. Would it be appropriate to place a "dtype=pd.datetime"-specific error message here? My thinking is that the deprecation notice that I mentioned above should be enough for the user to deduce that they may have used a deprecated datetime object as the dtype, but in your raised issue message you said that:

Ensure that if you get an error when we can tell that you probably wanted datetimes, we explicitly suggest dtype="datetime64[ns]" as part of the error message

Does this mean to do it in the raised error message? There is another piece that I want to explore which is the "try_convert" method from the "hypothesis.internal.validation" part of the program but I wanted to reach out for guidance just in case I'm overcomplicating things (as I tend to do). Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop how to play nicely with other packages legibility make errors helpful and Hypothesis grokable
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants