
Datetime logical type transform does not throw TypeConversionError #991

Closed
tamargrey opened this issue Jun 16, 2021 · 3 comments · Fixed by #1001
Labels: bug (Something isn't working), evalml (EvalML request)

tamargrey (Contributor) commented Jun 16, 2021

When transforming data for a specific logical type with LogicalType.transform, if the transformation to the correct dtype is not possible, a TypeConversionError should be thrown. This is not currently the case with the Datetime logical type.

The following code, which should produce a woodwork TypeConversionError

import pandas as pd
from woodwork.logical_types import Datetime

s = pd.Series(['a', 'b', 'c'])
Datetime().transform(s)

produces the following error: ValueError: Given date string not likely a datetime.

The base LogicalType class's transform catches any errors during the astype call and raises a TypeConversionError. The other logical types that define their own transform methods may raise different errors before calling the super transform (Ordinal with a data validation error, or LatLong if the data does not match the required latlong format), but those errors are unrelated to the dtype change, so it makes sense that they are not TypeConversionErrors.

With Datetime, however, we perform a separate dtype conversion via pd.to_datetime. This conversion was previously treated the same as the other dtype changes: if an error occurred, it was raised to the user as a TypeConversionError. Now there is no try/except around the datetime-specific logic in Datetime.transform, so the raw pandas error is what the user sees.

We should make sure that a TypeConversionError is thrown in this case.
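A minimal sketch of one possible fix, wrapping the pd.to_datetime call the same way the base class wraps astype. The names here (TypeConversionError as a bare Exception subclass, the standalone datetime_transform function) are simplified stand-ins for woodwork internals, for illustration only:

```python
import pandas as pd


class TypeConversionError(Exception):
    """Simplified stand-in for woodwork's TypeConversionError."""


def datetime_transform(series, datetime_format=None, new_dtype="datetime64[ns]"):
    """Convert a series to datetime, re-raising pandas errors as TypeConversionError."""
    try:
        return pd.to_datetime(series, format=datetime_format)
    except (TypeError, ValueError) as e:
        # Mirror what the base LogicalType.transform does for astype failures
        raise TypeConversionError(
            f"Error converting datatype for {series.name} from type {str(series.dtype)} "
            f"to type {new_dtype}. Please confirm the underlying data is consistent with "
            f"logical type Datetime."
        ) from e
```

With this, the repro above would surface a TypeConversionError (with the pandas error chained as its cause) instead of the raw ValueError.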

tamargrey added the bug label Jun 16, 2021
jeremyliweishih commented Jun 18, 2021

In another case a TypeError is thrown instead of a ValueError:

    def test_ohe_woodwork_custom_overrides_returned_by_components(X_df):
        y = pd.Series([1, 2, 1])
        override_types = [Integer, Double, Categorical, NaturalLanguage, Datetime, Boolean]
        for logical_type in override_types:
            try:
                X = X_df.copy()
>               X.ww.init(logical_types={0: logical_type})

evalml/tests/component_tests/test_one_hot_encoder.py:737:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/woodwork/table_accessor.py:138: in init
    updated_series = logical_type.transform(series)
../../.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/woodwork/logical_types.py:193: in transform
    series = pd.to_datetime(series, format=self.datetime_format)
../../.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/pandas/core/tools/datetimes.py:805: in to_datetime
    values = convert_listlike(arg._values, format)
../../.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/pandas/core/tools/datetimes.py:377: in _convert_listlike_datetimes
    arg, _ = maybe_convert_dtype(arg, copy=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = <BooleanArray>
[True, False, True]
Length: 3, dtype: boolean, copy = False

    def maybe_convert_dtype(data, copy):
        """
        Convert data based on dtype conventions, issuing deprecation warnings
        or errors where appropriate.

        Parameters
        ----------
        data : np.ndarray or pd.Index
        copy : bool

        Returns
        -------
        data : np.ndarray or pd.Index
        copy : bool

        Raises
        ------
        TypeError : PeriodDType data is passed
        """
        if not hasattr(data, "dtype"):
            # e.g. collections.deque
            return data, copy

        if is_float_dtype(data.dtype):
            # Note: we must cast to datetime64[ns] here in order to treat these
            #  as wall-times instead of UTC timestamps.
            data = data.astype(DT64NS_DTYPE)
            copy = False
            # TODO: deprecate this behavior to instead treat symmetrically
            #  with integer dtypes.  See discussion in GH#23675

        elif is_timedelta64_dtype(data.dtype) or is_bool_dtype(data.dtype):
            # GH#29794 enforcing deprecation introduced in GH#23539
>           raise TypeError(f"dtype {data.dtype} cannot be converted to datetime64[ns]")
E           TypeError: dtype boolean cannot be converted to datetime64[ns]
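The two failure modes can be seen directly with pd.to_datetime, independent of woodwork: non-date strings fail with a ValueError (or a subclass of it), while the nullable boolean dtype fails with a TypeError, as in the traceback above. A fix would therefore need to catch both. A small illustration:

```python
import pandas as pd


def conversion_error_type(series):
    """Classify the exception pd.to_datetime raises for a series, if any."""
    try:
        pd.to_datetime(series)
    except TypeError:
        return "TypeError"
    except ValueError:  # also catches subclasses like DateParseError
        return "ValueError"
    return None


# Non-date strings fail during parsing
print(conversion_error_type(pd.Series(['a', 'b', 'c'])))

# A nullable boolean dtype fails earlier, in maybe_convert_dtype
# (raised as TypeError in the pandas version from the traceback above)
print(conversion_error_type(pd.Series([True, False, True], dtype="boolean")))
```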

jeff-hernandez (Contributor) commented:
What error message do we want for the TypeConversionError?

tamargrey (Contributor, Author) commented Jun 21, 2021

@jeff-hernandez Thinking it should be the same as any other TypeConversionError that gets thrown for another logical type:

            error_msg = f'Error converting datatype for {series.name} from type {str(series.dtype)} ' \
                f'to type {new_dtype}. Please confirm the underlying data is consistent with ' \
                f'logical type {type(self)}.'
