Failures with upcoming pandas 1.4 release #8580

Closed
jorisvandenbossche opened this issue Jan 18, 2022 · 14 comments · Fixed by #8626
Labels: dataframe, good second issue

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Jan 18, 2022

There are already some issues / PRs related to this (eg some upstream failures are also listed in ), but I thought it might be useful to have a dedicated issue about this.

Pandas released a 1.4.0rc0 last week, and probably will release a final 1.4.0 next week.

The upstream test build (eg https://github.com/dask/dask/runs/4823351165?check_suite_focus=true) shows several failures, falling into a few categories:

@jsignell
Member

Thanks for raising this @jorisvandenbossche! It is indeed good to see these all in one place. If you have time to take a look at any of the PRs it would definitely be appreciated.

@jsignell
Member

I updated the list with the PRs that I think address each issue.

@graingert
Member

graingert commented Jan 18, 2022

Re upgrading to SQLAlchemy 1.4: see #8158 or #8553.

ian-r-rose added the "dataframe" and "good second issue" labels Jan 19, 2022
@jsignell
Member

jsignell commented Jan 24, 2022

Alright, pandas 1.4 has been released and, as expected, there are some failures in the Python 3.9 environments and in the doctests, where we pick up the latest pandas.

@jsignell
Member

I'm opening a PR now to skip append tests.
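
A version-gated skip for the append tests could look roughly like the sketch below. It assumes the failures come from `DataFrame.append` / `Series.append` being deprecated in pandas 1.4. The helper names `parse_release` and `pandas_ge_14` are hypothetical and not taken from dask or from the PR itself.

```python
import re


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string.

    Pre-release or dev suffixes are dropped, so "1.4.0rc0" gives (1, 4, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first component with no leading digits (e.g. "dev0").
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pandas_ge_14(version: str) -> bool:
    """True if the given pandas version string is 1.4 or newer (hypothetical helper)."""
    return parse_release(version) >= (1, 4)


# Possible use in a test module (assumes pandas and pytest are installed):
#
#   import pandas as pd
#   import pytest
#
#   @pytest.mark.skipif(
#       pandas_ge_14(pd.__version__),
#       reason="append is deprecated in pandas 1.4",
#   )
#   def test_append():
#       ...
```

Treating release candidates such as 1.4.0rc0 as "1.4 or newer" is a deliberate choice here, so the skip also applies on the upstream build.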

@jsignell
Member

@martindurant do you have time to take a look at the timezone related issues with fastparquet?

@jsignell
Member

Here is the output from the failing fastparquet tests:

=============================================== test session starts ===============================================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/julia/conda/envs/dask-3.9/bin/python
cachedir: .pytest_cache
rootdir: /home/julia/dask, configfile: setup.cfg
plugins: rerunfailures-10.2, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 876 items / 873 deselected / 3 selected                                                                 
run-last-failure: rerun previous 3 failures

dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12] FAILED [ 33%]
dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13] FAILED [ 66%]
dask/dataframe/io/tests/test_parquet.py::test_timestamp96 FAILED                                            [100%]

==================================================== FAILURES =====================================================
__________________________ test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12] __________________________

data = array(['now'], dtype=object), dayfirst = False, yearfirst = False, utc = False, errors = 'raise'
require_iso8601 = False, allow_object = True, allow_mixed = False

    def objects_to_datetime64ns(
        data: np.ndarray,
        dayfirst,
        yearfirst,
        utc=False,
        errors="raise",
        require_iso8601: bool = False,
        allow_object: bool = False,
        allow_mixed: bool = False,
    ):
        """
        Convert data to array of timestamps.
    
        Parameters
        ----------
        data : np.ndarray[object]
        dayfirst : bool
        yearfirst : bool
        utc : bool, default False
            Whether to convert timezone-aware timestamps to UTC.
        errors : {'raise', 'ignore', 'coerce'}
        require_iso8601 : bool, default False
        allow_object : bool
            Whether to return an object-dtype ndarray instead of raising if the
            data contains more than one timezone.
        allow_mixed : bool, default False
            Interpret integers as timestamps when datetime objects are also present.
    
        Returns
        -------
        result : ndarray
            np.int64 dtype if returned values represent UTC timestamps
            np.datetime64[ns] if returned values represent wall times
            object if mixed timezones
        inferred_tz : tzinfo or None
    
        Raises
        ------
        ValueError : if data cannot be converted to datetimes
        """
        assert errors in ["raise", "ignore", "coerce"]
    
        # if str-dtype, convert
        data = np.array(data, copy=False, dtype=np.object_)
    
        flags = data.flags
        order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
        try:
            result, tz_parsed = tslib.array_to_datetime(
                data.ravel("K"),
                errors=errors,
                utc=utc,
                dayfirst=dayfirst,
                yearfirst=yearfirst,
                require_iso8601=require_iso8601,
                allow_mixed=allow_mixed,
            )
            result = result.reshape(data.shape, order=order)
        except ValueError as err:
            try:
>               values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2211: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: Unrecognized value type: <class 'str'>

pandas/_libs/tslibs/conversion.pyx:360: TypeError

During handling of the above exception, another exception occurred:

column = Series([], Name: x, dtype: datetime64[ns, UTC]), name = 'x'

    def get_column_metadata(column, name):
        """Produce pandas column metadata block"""
        # from pyarrow.pandas_compat
        # https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py
        inferred_dtype = infer_dtype(column)
        dtype = column.dtype
        if str(dtype) == "bool":
            # pandas accidentally calls this "boolean"
            inferred_dtype = "bool"
    
        if is_categorical_dtype(dtype):
            extra_metadata = {
                'num_categories': len(column.cat.categories),
                'ordered': column.cat.ordered,
            }
            dtype = column.cat.codes.dtype
        elif hasattr(dtype, 'tz'):
            try:
                stz = str(dtype.tz)
                if "UTC" in stz and ":" in stz:
                    extra_metadata = {'timezone': stz.strip("UTC")}
                elif "pytz" not in stz:
>                   pd.Series([pd.to_datetime('now')]).dt.tz_localize(stz)

../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/util.py:314: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

arg = 'now', errors = 'raise', dayfirst = False, yearfirst = False, utc = None, format = None, exact = True
unit = None, infer_datetime_format = False, origin = 'unix', cache = True

    def to_datetime(
        arg: DatetimeScalarOrArrayConvertible,
        errors: str = "raise",
        dayfirst: bool = False,
        yearfirst: bool = False,
        utc: bool | None = None,
        format: str | None = None,
        exact: bool = True,
        unit: str | None = None,
        infer_datetime_format: bool = False,
        origin="unix",
        cache: bool = True,
    ) -> DatetimeIndex | Series | DatetimeScalar | NaTType | None:
        """
        Convert argument to datetime.
    
        This function converts a scalar, array-like, :class:`Series` or
        :class:`DataFrame`/dict-like to a pandas datetime object.
    
        Parameters
        ----------
        arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
            The object to convert to a datetime. If a :class:`DataFrame` is provided, the
            method expects minimally the following columns: :const:`"year"`,
            :const:`"month"`, :const:`"day"`.
        errors : {'ignore', 'raise', 'coerce'}, default 'raise'
            - If :const:`'raise'`, then invalid parsing will raise an exception.
            - If :const:`'coerce'`, then invalid parsing will be set as :const:`NaT`.
            - If :const:`'ignore'`, then invalid parsing will return the input.
        dayfirst : bool, default False
            Specify a date parse order if `arg` is str or is list-like.
            If :const:`True`, parses dates with the day first, e.g. :const:`"10/11/12"`
            is parsed as :const:`2012-11-10`.
    
            .. warning::
    
                ``dayfirst=True`` is not strict, but will prefer to parse
                with day first. If a delimited date string cannot be parsed in
                accordance with the given `dayfirst` option, e.g.
                ``to_datetime(['31-12-2021'])``, then a warning will be shown.
    
        yearfirst : bool, default False
            Specify a date parse order if `arg` is str or is list-like.
    
            - If :const:`True` parses dates with the year first, e.g.
              :const:`"10/11/12"` is parsed as :const:`2010-11-12`.
            - If both `dayfirst` and `yearfirst` are :const:`True`, `yearfirst` is
              preceded (same as :mod:`dateutil`).
    
            .. warning::
    
                ``yearfirst=True`` is not strict, but will prefer to parse
                with year first.
    
        utc : bool, default None
            Control timezone-related parsing, localization and conversion.
    
            - If :const:`True`, the function *always* returns a timezone-aware
              UTC-localized :class:`Timestamp`, :class:`Series` or
              :class:`DatetimeIndex`. To do this, timezone-naive inputs are
              *localized* as UTC, while timezone-aware inputs are *converted* to UTC.
    
            - If :const:`False` (default), inputs will not be coerced to UTC.
              Timezone-naive inputs will remain naive, while timezone-aware ones
              will keep their time offsets. Limitations exist for mixed
              offsets (typically, daylight savings), see :ref:`Examples
              <to_datetime_tz_examples>` section for details.
    
            See also: pandas general documentation about `timezone conversion and
            localization
            <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
            #time-zone-handling>`_.
    
        format : str, default None
            The strftime to parse time, e.g. :const:`"%d/%m/%Y"`. Note that
            :const:`"%f"` will parse all the way up to nanoseconds. See
            `strftime documentation
            <https://docs.python.org/3/library/datetime.html
            #strftime-and-strptime-behavior>`_ for more information on choices.
        exact : bool, default True
            Control how `format` is used:
    
            - If :const:`True`, require an exact `format` match.
            - If :const:`False`, allow the `format` to match anywhere in the target
              string.
    
        unit : str, default 'ns'
            The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
            integer or float number. This will be based off the origin.
            Example, with ``unit='ms'`` and ``origin='unix'`` (the default), this
            would calculate the number of milliseconds to the unix epoch start.
        infer_datetime_format : bool, default False
            If :const:`True` and no `format` is given, attempt to infer the format
            of the datetime strings based on the first non-NaN element,
            and if it can be inferred, switch to a faster method of parsing them.
            In some cases this can increase the parsing speed by ~5-10x.
        origin : scalar, default 'unix'
            Define the reference date. The numeric values would be parsed as number
            of units (defined by `unit`) since this reference date.
    
            - If :const:`'unix'` (or POSIX) time; origin is set to 1970-01-01.
            - If :const:`'julian'`, unit must be :const:`'D'`, and origin is set to
              beginning of Julian Calendar. Julian day number :const:`0` is assigned
              to the day starting at noon on January 1, 4713 BC.
            - If Timestamp convertible, origin is set to Timestamp identified by
              origin.
        cache : bool, default True
            If :const:`True`, use a cache of unique, converted dates to apply the
            datetime conversion. May produce significant speed-up when parsing
            duplicate date strings, especially ones with timezone offsets. The cache
            is only used when there are at least 50 values. The presence of
            out-of-bounds values will render the cache unusable and may slow down
            parsing.
    
            .. versionchanged:: 0.25.0
                changed default value from :const:`False` to :const:`True`.
    
        Returns
        -------
        datetime
            If parsing succeeded.
            Return type depends on input (types in parenthesis correspond to
            fallback in case of unsuccessful timezone or out-of-range timestamp
            parsing):
    
            - scalar: :class:`Timestamp` (or :class:`datetime.datetime`)
            - array-like: :class:`DatetimeIndex` (or :class:`Series` with
              :class:`object` dtype containing :class:`datetime.datetime`)
            - Series: :class:`Series` of :class:`datetime64` dtype (or
              :class:`Series` of :class:`object` dtype containing
              :class:`datetime.datetime`)
            - DataFrame: :class:`Series` of :class:`datetime64` dtype (or
              :class:`Series` of :class:`object` dtype containing
              :class:`datetime.datetime`)
    
        Raises
        ------
        ParserError
            When parsing a date from string fails.
        ValueError
            When another datetime conversion error happens. For example when one
            of 'year', 'month', day' columns is missing in a :class:`DataFrame`, or
            when a Timezone-aware :class:`datetime.datetime` is found in an array-like
            of mixed time offsets, and ``utc=False``.
    
        See Also
        --------
        DataFrame.astype : Cast argument to a specified dtype.
        to_timedelta : Convert argument to timedelta.
        convert_dtypes : Convert dtypes.
    
        Notes
        -----
    
        Many input types are supported, and lead to different output types:
    
        - **scalars** can be int, float, str, datetime object (from stdlib :mod:`datetime`
          module or :mod:`numpy`). They are converted to :class:`Timestamp` when
          possible, otherwise they are converted to :class:`datetime.datetime`.
          None/NaN/null scalars are converted to :const:`NaT`.
    
        - **array-like** can contain int, float, str, datetime objects. They are
          converted to :class:`DatetimeIndex` when possible, otherwise they are
          converted to :class:`Index` with :class:`object` dtype, containing
          :class:`datetime.datetime`. None/NaN/null entries are converted to
          :const:`NaT` in both cases.
    
        - **Series** are converted to :class:`Series` with :class:`datetime64`
          dtype when possible, otherwise they are converted to :class:`Series` with
          :class:`object` dtype, containing :class:`datetime.datetime`. None/NaN/null
          entries are converted to :const:`NaT` in both cases.
    
        - **DataFrame/dict-like** are converted to :class:`Series` with
          :class:`datetime64` dtype. For each row a datetime is created from assembling
          the various dataframe columns. Column keys can be common abbreviations
          like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’]) or
          plurals of the same.
    
        The following causes are responsible for :class:`datetime.datetime` objects
        being returned (possibly inside an :class:`Index` or a :class:`Series` with
        :class:`object` dtype) instead of a proper pandas designated type
        (:class:`Timestamp`, :class:`DatetimeIndex` or :class:`Series`
        with :class:`datetime64` dtype):
    
        - when any input element is before :const:`Timestamp.min` or after
          :const:`Timestamp.max`, see `timestamp limitations
          <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
          #timeseries-timestamp-limits>`_.
    
        - when ``utc=False`` (default) and the input is an array-like or
          :class:`Series` containing mixed naive/aware datetime, or aware with mixed
          time offsets. Note that this happens in the (quite frequent) situation when
          the timezone has a daylight savings policy. In that case you may wish to
          use ``utc=True``.
    
        Examples
        --------
    
        **Handling various input formats**
    
        Assembling a datetime from multiple columns of a :class:`DataFrame`. The keys
        can be common abbreviations like ['year', 'month', 'day', 'minute', 'second',
        'ms', 'us', 'ns']) or plurals of the same
    
        >>> df = pd.DataFrame({'year': [2015, 2016],
        ...                    'month': [2, 3],
        ...                    'day': [4, 5]})
        >>> pd.to_datetime(df)
        0   2015-02-04
        1   2016-03-05
        dtype: datetime64[ns]
    
        Passing ``infer_datetime_format=True`` can often-times speedup a parsing
        if its not an ISO8601 format exactly, but in a regular format.
    
        >>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
        >>> s.head()
        0    3/11/2000
        1    3/12/2000
        2    3/13/2000
        3    3/11/2000
        4    3/12/2000
        dtype: object
    
        >>> %timeit pd.to_datetime(s, infer_datetime_format=True)  # doctest: +SKIP
        100 loops, best of 3: 10.4 ms per loop
    
        >>> %timeit pd.to_datetime(s, infer_datetime_format=False)  # doctest: +SKIP
        1 loop, best of 3: 471 ms per loop
    
        Using a unix epoch time
    
        >>> pd.to_datetime(1490195805, unit='s')
        Timestamp('2017-03-22 15:16:45')
        >>> pd.to_datetime(1490195805433502912, unit='ns')
        Timestamp('2017-03-22 15:16:45.433502912')
    
        .. warning:: For float arg, precision rounding might happen. To prevent
            unexpected behavior use a fixed-width exact type.
    
        Using a non-unix epoch origin
    
        >>> pd.to_datetime([1, 2, 3], unit='D',
        ...                origin=pd.Timestamp('1960-01-01'))
        DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
                      dtype='datetime64[ns]', freq=None)
    
        **Non-convertible date/times**
    
        If a date does not meet the `timestamp limitations
        <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
        #timeseries-timestamp-limits>`_, passing ``errors='ignore'``
        will return the original input instead of raising any exception.
    
        Passing ``errors='coerce'`` will force an out-of-bounds date to :const:`NaT`,
        in addition to forcing non-dates (or non-parseable dates) to :const:`NaT`.
    
        >>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
        datetime.datetime(1300, 1, 1, 0, 0)
        >>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
        NaT
    
        .. _to_datetime_tz_examples:
    
        **Timezones and time offsets**
    
        The default behaviour (``utc=False``) is as follows:
    
        - Timezone-naive inputs are converted to timezone-naive :class:`DatetimeIndex`:
    
        >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15'])
        DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
                      dtype='datetime64[ns]', freq=None)
    
        - Timezone-aware inputs *with constant time offset* are converted to
          timezone-aware :class:`DatetimeIndex`:
    
        >>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
        DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
                      dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)
    
        - However, timezone-aware inputs *with mixed time offsets* (for example
          issued from a timezone with daylight savings, such as Europe/Paris)
          are **not successfully converted** to a :class:`DatetimeIndex`. Instead a
          simple :class:`Index` containing :class:`datetime.datetime` objects is
          returned:
    
        >>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
        Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
              dtype='object')
    
        - A mix of timezone-aware and timezone-naive inputs is converted to
          a timezone-aware :class:`DatetimeIndex` if the offsets of the timezone-aware
          are constant:
    
        >>> from datetime import datetime
        >>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
        DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
                      dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)
    
        - Finally, mixing timezone-aware strings and :class:`datetime.datetime` always
          raises an error, even if the elements all have the same time offset.
    
        >>> from datetime import datetime, timezone, timedelta
        >>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
        >>> pd.to_datetime(["2020-01-01 17:00 -0100", d])
        Traceback (most recent call last):
            ...
        ValueError: Tz-aware datetime.datetime cannot be converted to datetime64
                    unless utc=True
    
        |
    
        Setting ``utc=True`` solves most of the above issues:
    
        - Timezone-naive inputs are *localized* as UTC
    
        >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
        DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
                      dtype='datetime64[ns, UTC]', freq=None)
    
        - Timezone-aware inputs are *converted* to UTC (the output represents the
          exact same datetime, but viewed from the UTC time offset `+00:00`).
    
        >>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
        ...                utc=True)
        DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
                      dtype='datetime64[ns, UTC]', freq=None)
    
        - Inputs can contain both naive and aware, string or datetime, the above
          rules still apply
    
        >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
        ...                datetime(2020, 1, 1, 18),
        ...                datetime(2020, 1, 1, 18,
        ...                tzinfo=timezone(-timedelta(hours=1)))],
        ...                utc=True)
        DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00',
                       '2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'],
                      dtype='datetime64[ns, UTC]', freq=None)
        """
        if arg is None:
            return None
    
        if origin != "unix":
            arg = _adjust_to_origin(arg, origin, unit)
    
        tz = "utc" if utc else None
        convert_listlike = partial(
            _convert_listlike_datetimes,
            tz=tz,
            unit=unit,
            dayfirst=dayfirst,
            yearfirst=yearfirst,
            errors=errors,
            exact=exact,
            infer_datetime_format=infer_datetime_format,
        )
    
        result: Timestamp | NaTType | Series | Index
    
        if isinstance(arg, Timestamp):
            result = arg
            if tz is not None:
                if arg.tz is not None:
                    result = arg.tz_convert(tz)
                else:
                    result = arg.tz_localize(tz)
        elif isinstance(arg, ABCSeries):
            cache_array = _maybe_cache(arg, format, cache, convert_listlike)
            if not cache_array.empty:
                result = arg.map(cache_array)
            else:
                values = convert_listlike(arg._values, format)
                result = arg._constructor(values, index=arg.index, name=arg.name)
        elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
            result = _assemble_from_unit_mappings(arg, errors, tz)
        elif isinstance(arg, Index):
            cache_array = _maybe_cache(arg, format, cache, convert_listlike)
            if not cache_array.empty:
                result = _convert_and_box_cache(arg, cache_array, name=arg.name)
            else:
                result = convert_listlike(arg, format, name=arg.name)
        elif is_list_like(arg):
            try:
                cache_array = _maybe_cache(arg, format, cache, convert_listlike)
            except OutOfBoundsDatetime:
                # caching attempts to create a DatetimeIndex, which may raise
                # an OOB. If that's the desired behavior, then just reraise...
                if errors == "raise":
                    raise
                # ... otherwise, continue without the cache.
                from pandas import Series
    
                cache_array = Series([], dtype=object)  # just an empty array
            if not cache_array.empty:
                result = _convert_and_box_cache(arg, cache_array)
            else:
                result = convert_listlike(arg, format)
        else:
>           result = convert_listlike(np.array([arg]), format)[0]

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/tools/datetimes.py:1078: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

arg = array(['now'], dtype=object), format = None, name = None, tz = None, unit = None, errors = 'raise'
infer_datetime_format = False, dayfirst = False, yearfirst = False, exact = True

    def _convert_listlike_datetimes(
        arg,
        format: str | None,
        name: Hashable = None,
        tz: Timezone | None = None,
        unit: str | None = None,
        errors: str = "raise",
        infer_datetime_format: bool = False,
        dayfirst: bool | None = None,
        yearfirst: bool | None = None,
        exact: bool = True,
    ):
        """
        Helper function for to_datetime. Performs the conversions of 1D listlike
        of dates
    
        Parameters
        ----------
        arg : list, tuple, ndarray, Series, Index
            date to be parsed
        name : object
            None or string for the Index name
        tz : object
            None or 'utc'
        unit : str
            None or string of the frequency of the passed data
        errors : str
            error handing behaviors from to_datetime, 'raise', 'coerce', 'ignore'
        infer_datetime_format : bool, default False
            inferring format behavior from to_datetime
        dayfirst : bool
            dayfirst parsing behavior from to_datetime
        yearfirst : bool
            yearfirst parsing behavior from to_datetime
        exact : bool, default True
            exact format matching behavior from to_datetime
    
        Returns
        -------
        Index-like of parsed dates
        """
        if isinstance(arg, (list, tuple)):
            arg = np.array(arg, dtype="O")
    
        arg_dtype = getattr(arg, "dtype", None)
        # these are shortcutable
        if is_datetime64tz_dtype(arg_dtype):
            if not isinstance(arg, (DatetimeArray, DatetimeIndex)):
                return DatetimeIndex(arg, tz=tz, name=name)
            if tz == "utc":
                arg = arg.tz_convert(None).tz_localize(tz)
            return arg
    
        elif is_datetime64_ns_dtype(arg_dtype):
            if not isinstance(arg, (DatetimeArray, DatetimeIndex)):
                try:
                    return DatetimeIndex(arg, tz=tz, name=name)
                except ValueError:
                    pass
            elif tz:
                # DatetimeArray, DatetimeIndex
                return arg.tz_localize(tz)
    
            return arg
    
        elif unit is not None:
            if format is not None:
                raise ValueError("cannot specify both format and unit")
            return _to_datetime_with_unit(arg, unit, name, tz, errors)
        elif getattr(arg, "ndim", 1) > 1:
            raise TypeError(
                "arg must be a string, datetime, list, tuple, 1-d array, or Series"
            )
    
        # warn if passing timedelta64, raise for PeriodDtype
        # NB: this must come after unit transformation
        orig_arg = arg
        try:
            arg, _ = maybe_convert_dtype(arg, copy=False)
        except TypeError:
            if errors == "coerce":
                npvalues = np.array(["NaT"], dtype="datetime64[ns]").repeat(len(arg))
                return DatetimeIndex(npvalues, name=name)
            elif errors == "ignore":
                idx = Index(arg, name=name)
                return idx
            raise
    
        arg = ensure_object(arg)
        require_iso8601 = False
    
        if infer_datetime_format and format is None:
            format = _guess_datetime_format_for_array(arg, dayfirst=dayfirst)
    
        if format is not None:
            # There is a special fast-path for iso8601 formatted
            # datetime strings, so in those cases don't use the inferred
            # format because this path makes process slower in this
            # special case
            format_is_iso8601 = format_is_iso(format)
            if format_is_iso8601:
                require_iso8601 = not infer_datetime_format
                format = None
    
        if format is not None:
            res = _to_datetime_with_format(
                arg, orig_arg, name, tz, format, exact, errors, infer_datetime_format
            )
            if res is not None:
                return res
    
        assert format is None or infer_datetime_format
        utc = tz == "utc"
>       result, tz_parsed = objects_to_datetime64ns(
            arg,
            dayfirst=dayfirst,
            yearfirst=yearfirst,
            utc=utc,
            errors=errors,
            require_iso8601=require_iso8601,
            allow_object=True,
        )

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/tools/datetimes.py:402: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = array(['now'], dtype=object), dayfirst = False, yearfirst = False, utc = False, errors = 'raise'
require_iso8601 = False, allow_object = True, allow_mixed = False

    def objects_to_datetime64ns(
        data: np.ndarray,
        dayfirst,
        yearfirst,
        utc=False,
        errors="raise",
        require_iso8601: bool = False,
        allow_object: bool = False,
        allow_mixed: bool = False,
    ):
        """
        Convert data to array of timestamps.
    
        Parameters
        ----------
        data : np.ndarray[object]
        dayfirst : bool
        yearfirst : bool
        utc : bool, default False
            Whether to convert timezone-aware timestamps to UTC.
        errors : {'raise', 'ignore', 'coerce'}
        require_iso8601 : bool, default False
        allow_object : bool
            Whether to return an object-dtype ndarray instead of raising if the
            data contains more than one timezone.
        allow_mixed : bool, default False
            Interpret integers as timestamps when datetime objects are also present.
    
        Returns
        -------
        result : ndarray
            np.int64 dtype if returned values represent UTC timestamps
            np.datetime64[ns] if returned values represent wall times
            object if mixed timezones
        inferred_tz : tzinfo or None
    
        Raises
        ------
        ValueError : if data cannot be converted to datetimes
        """
        assert errors in ["raise", "ignore", "coerce"]
    
        # if str-dtype, convert
        data = np.array(data, copy=False, dtype=np.object_)
    
        flags = data.flags
        order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
        try:
            result, tz_parsed = tslib.array_to_datetime(
                data.ravel("K"),
                errors=errors,
                utc=utc,
                dayfirst=dayfirst,
                yearfirst=yearfirst,
                require_iso8601=require_iso8601,
                allow_mixed=allow_mixed,
            )
            result = result.reshape(data.shape, order=order)
        except ValueError as err:
            try:
                values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
                # If tzaware, these values represent unix timestamps, so we
                #  return them as i8 to distinguish from wall times
                values = values.reshape(data.shape, order=order)
                return values.view("i8"), tz_parsed
            except (ValueError, TypeError):
>               raise err

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = array(['now'], dtype=object), dayfirst = False, yearfirst = False, utc = False, errors = 'raise'
require_iso8601 = False, allow_object = True, allow_mixed = False

    def objects_to_datetime64ns(
        data: np.ndarray,
        dayfirst,
        yearfirst,
        utc=False,
        errors="raise",
        require_iso8601: bool = False,
        allow_object: bool = False,
        allow_mixed: bool = False,
    ):
        """
        Convert data to array of timestamps.
    
        Parameters
        ----------
        data : np.ndarray[object]
        dayfirst : bool
        yearfirst : bool
        utc : bool, default False
            Whether to convert timezone-aware timestamps to UTC.
        errors : {'raise', 'ignore', 'coerce'}
        require_iso8601 : bool, default False
        allow_object : bool
            Whether to return an object-dtype ndarray instead of raising if the
            data contains more than one timezone.
        allow_mixed : bool, default False
            Interpret integers as timestamps when datetime objects are also present.
    
        Returns
        -------
        result : ndarray
            np.int64 dtype if returned values represent UTC timestamps
            np.datetime64[ns] if returned values represent wall times
            object if mixed timezones
        inferred_tz : tzinfo or None
    
        Raises
        ------
        ValueError : if data cannot be converted to datetimes
        """
        assert errors in ["raise", "ignore", "coerce"]
    
        # if str-dtype, convert
        data = np.array(data, copy=False, dtype=np.object_)
    
        flags = data.flags
        order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
        try:
>           result, tz_parsed = tslib.array_to_datetime(
                data.ravel("K"),
                errors=errors,
                utc=utc,
                dayfirst=dayfirst,
                yearfirst=yearfirst,
                require_iso8601=require_iso8601,
                allow_mixed=allow_mixed,
            )

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2199: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:381: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:613: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:751: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:742: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslibs/parsing.pyx:281: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

timestr = 'now', parserinfo = None
kwargs = {'dayfirst': False, 'default': datetime.datetime(1, 1, 1, 0, 0), 'yearfirst': False}

    def parse(timestr, parserinfo=None, **kwargs):
        """
    
        Parse a string in one of the supported formats, using the
        ``parserinfo`` parameters.
    
        :param timestr:
            A string containing a date/time stamp.
    
        :param parserinfo:
            A :class:`parserinfo` object containing parameters for the parser.
            If ``None``, the default arguments to the :class:`parserinfo`
            constructor are used.
    
        The ``**kwargs`` parameter takes the following keyword arguments:
    
        :param default:
            The default datetime object, if this is a datetime object and not
            ``None``, elements specified in ``timestr`` replace elements in the
            default object.
    
        :param ignoretz:
            If set ``True``, time zones in parsed strings are ignored and a naive
            :class:`datetime` object is returned.
    
        :param tzinfos:
            Additional time zone names / aliases which may be present in the
            string. This argument maps time zone names (and optionally offsets
            from those time zones) to time zones. This parameter can be a
            dictionary with timezone aliases mapping time zone names to time
            zones or a function taking two parameters (``tzname`` and
            ``tzoffset``) and returning a time zone.
    
            The timezones to which the names are mapped can be an integer
            offset from UTC in seconds or a :class:`tzinfo` object.
    
            .. doctest::
               :options: +NORMALIZE_WHITESPACE
    
                >>> from dateutil.parser import parse
                >>> from dateutil.tz import gettz
                >>> tzinfos = {"BRST": -7200, "CST": gettz("America/Chicago")}
                >>> parse("2012-01-19 17:21:00 BRST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21, tzinfo=tzoffset(u'BRST', -7200))
                >>> parse("2012-01-19 17:21:00 CST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21,
                                  tzinfo=tzfile('/usr/share/zoneinfo/America/Chicago'))
    
            This parameter is ignored if ``ignoretz`` is set.
    
        :param dayfirst:
            Whether to interpret the first value in an ambiguous 3-integer date
            (e.g. 01/05/09) as the day (``True``) or month (``False``). If
            ``yearfirst`` is set to ``True``, this distinguishes between YDM and
            YMD. If set to ``None``, this value is retrieved from the current
            :class:`parserinfo` object (which itself defaults to ``False``).
    
        :param yearfirst:
            Whether to interpret the first value in an ambiguous 3-integer date
            (e.g. 01/05/09) as the year. If ``True``, the first number is taken to
            be the year, otherwise the last number is taken to be the year. If
            this is set to ``None``, the value is retrieved from the current
            :class:`parserinfo` object (which itself defaults to ``False``).
    
        :param fuzzy:
            Whether to allow fuzzy parsing, allowing for string like "Today is
            January 1, 2047 at 8:21:00AM".
    
        :param fuzzy_with_tokens:
            If ``True``, ``fuzzy`` is automatically set to True, and the parser
            will return a tuple where the first element is the parsed
            :class:`datetime.datetime` datetimestamp and the second element is
            a tuple containing the portions of the string which were ignored:
    
            .. doctest::
    
                >>> from dateutil.parser import parse
                >>> parse("Today is January 1, 2047 at 8:21:00AM", fuzzy_with_tokens=True)
                (datetime.datetime(2047, 1, 1, 8, 21), (u'Today is ', u' ', u'at '))
    
        :return:
            Returns a :class:`datetime.datetime` object or, if the
            ``fuzzy_with_tokens`` option is ``True``, returns a tuple, the
            first element being a :class:`datetime.datetime` object, the second
            a tuple containing the fuzzy tokens.
    
        :raises ParserError:
            Raised for invalid or unknown string formats, if the provided
            :class:`tzinfo` is not in a valid format, or if an invalid date would
            be created.
    
        :raises OverflowError:
            Raised if the parsed date exceeds the largest valid C integer on
            your system.
        """
        if parserinfo:
            return parser(parserinfo).parse(timestr, **kwargs)
        else:
>           return DEFAULTPARSER.parse(timestr, **kwargs)

../conda/envs/dask-3.9/lib/python3.9/site-packages/dateutil/parser/_parser.py:1368: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <dateutil.parser._parser.parser object at 0x7f59c25b9550>, timestr = 'now'
default = datetime.datetime(1, 1, 1, 0, 0), ignoretz = False, tzinfos = None
kwargs = {'dayfirst': False, 'yearfirst': False}, res = None, skipped_tokens = None

    def parse(self, timestr, default=None,
              ignoretz=False, tzinfos=None, **kwargs):
        """
        Parse the date/time string into a :class:`datetime.datetime` object.
    
        :param timestr:
            Any date/time string using the supported formats.
    
        :param default:
            The default datetime object, if this is a datetime object and not
            ``None``, elements specified in ``timestr`` replace elements in the
            default object.
    
        :param ignoretz:
            If set ``True``, time zones in parsed strings are ignored and a
            naive :class:`datetime.datetime` object is returned.
    
        :param tzinfos:
            Additional time zone names / aliases which may be present in the
            string. This argument maps time zone names (and optionally offsets
            from those time zones) to time zones. This parameter can be a
            dictionary with timezone aliases mapping time zone names to time
            zones or a function taking two parameters (``tzname`` and
            ``tzoffset``) and returning a time zone.
    
            The timezones to which the names are mapped can be an integer
            offset from UTC in seconds or a :class:`tzinfo` object.
    
            .. doctest::
               :options: +NORMALIZE_WHITESPACE
    
                >>> from dateutil.parser import parse
                >>> from dateutil.tz import gettz
                >>> tzinfos = {"BRST": -7200, "CST": gettz("America/Chicago")}
                >>> parse("2012-01-19 17:21:00 BRST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21, tzinfo=tzoffset(u'BRST', -7200))
                >>> parse("2012-01-19 17:21:00 CST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21,
                                  tzinfo=tzfile('/usr/share/zoneinfo/America/Chicago'))
    
            This parameter is ignored if ``ignoretz`` is set.
    
        :param \\*\\*kwargs:
            Keyword arguments as passed to ``_parse()``.
    
        :return:
            Returns a :class:`datetime.datetime` object or, if the
            ``fuzzy_with_tokens`` option is ``True``, returns a tuple, the
            first element being a :class:`datetime.datetime` object, the second
            a tuple containing the fuzzy tokens.
    
        :raises ParserError:
            Raised for invalid or unknown string format, if the provided
            :class:`tzinfo` is not in a valid format, or if an invalid date
            would be created.
    
        :raises TypeError:
            Raised for non-string or character stream input.
    
        :raises OverflowError:
            Raised if the parsed date exceeds the largest valid C integer on
            your system.
        """
    
        if default is None:
            default = datetime.datetime.now().replace(hour=0, minute=0,
                                                      second=0, microsecond=0)
    
        res, skipped_tokens = self._parse(timestr, **kwargs)
    
        if res is None:
>           raise ParserError("Unknown string format: %s", timestr)
E           dateutil.parser._parser.ParserError: Unknown string format: now

../conda/envs/dask-3.9/lib/python3.9/site-packages/dateutil/parser/_parser.py:643: ParserError

The above exception was the direct cause of the following exception:

tmpdir = local('/tmp/pytest-of-julia/pytest-132/test_roundtrip_fastparquet_df10')
df =                                      x
index                                 
0     1970-01-01 00:00:00.000003+00:00
1     1970-01-01 00:00:00.000002+00:00
2     1970-01-01 00:00:00.000001+00:00
write_kwargs = {}, read_kwargs = {}, engine = 'fastparquet'

    @pytest.mark.parametrize(
        "df,write_kwargs,read_kwargs",
        [
            (pd.DataFrame({"x": [3, 2, 1]}), {}, {}),
            (pd.DataFrame({"x": ["c", "a", "b"]}), {}, {}),
            (pd.DataFrame({"x": ["cc", "a", "bbb"]}), {}, {}),
            (pd.DataFrame({"x": [b"a", b"b", b"c"]}), {"object_encoding": "bytes"}, {}),
            (
                pd.DataFrame({"x": pd.Categorical(["a", "b", "a"])}),
                {},
                {"categories": ["x"]},
            ),
            (pd.DataFrame({"x": pd.Categorical([1, 2, 1])}), {}, {"categories": ["x"]}),
            (pd.DataFrame({"x": list(map(pd.Timestamp, [3000, 2000, 1000]))}), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("M8[ns]"), {}, {}),
            pytest.param(
                pd.DataFrame({"x": [3, 2, 1]}).astype("M8[ns]"),
                {},
                {},
            ),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("M8[us]"), {}, {}),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("M8[ms]"), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("datetime64[ns]"), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("datetime64[ns, UTC]"), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("datetime64[ns, CET]"), {}, {}),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("uint16"), {}, {}),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("float32"), {}, {}),
            (pd.DataFrame({"x": [3, 1, 2]}, index=[3, 2, 1]), {}, {}),
            (pd.DataFrame({"x": [3, 1, 5]}, index=pd.Index([1, 2, 3], name="foo")), {}, {}),
            (pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}), {}, {}),
            (pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}, columns=["y", "x"]), {}, {}),
            (pd.DataFrame({"0": [3, 2, 1]}), {}, {}),
            (pd.DataFrame({"x": [3, 2, None]}), {}, {}),
            (pd.DataFrame({"-": [3.0, 2.0, None]}), {}, {}),
            (pd.DataFrame({".": [3.0, 2.0, None]}), {}, {}),
            (pd.DataFrame({" ": [3.0, 2.0, None]}), {}, {}),
        ],
    )
    def test_roundtrip(tmpdir, df, write_kwargs, read_kwargs, engine):
        if "x" in df and df.x.dtype == "M8[ns]" and "arrow" in engine:
            pytest.xfail(reason="Parquet pyarrow v1 doesn't support nanosecond precision")
        if (
            "x" in df
            and df.x.dtype == "M8[ns]"
            and engine == "fastparquet"
            and fastparquet_version <= parse_version("0.6.3")
        ):
            pytest.xfail(reason="fastparquet doesn't support nanosecond precision yet")
        if (
            PANDAS_GT_130
            and read_kwargs.get("categories", None)
            and engine == "fastparquet"
            and fastparquet_version <= parse_version("0.6.3")
        ):
            pytest.xfail("https://github.com/dask/fastparquet/issues/577")
    
        tmp = str(tmpdir)
        if df.index.name is None:
            df.index.name = "index"
        ddf = dd.from_pandas(df, npartitions=2)
    
        oe = write_kwargs.pop("object_encoding", None)
        if oe and engine == "fastparquet":
            dd.to_parquet(ddf, tmp, engine=engine, object_encoding=oe, **write_kwargs)
        else:
>           dd.to_parquet(ddf, tmp, engine=engine, **write_kwargs)

dask/dataframe/io/tests/test_parquet.py:988: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/dataframe/io/parquet/core.py:701: in to_parquet
    meta, schema, i_offset = engine.initialize_write(
dask/dataframe/io/parquet/fastparquet.py:1206: in initialize_write
    fmd = fastparquet.writer.make_metadata(
../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:740: in make_metadata
    get_column_metadata(data[column], column))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

column = Series([], Name: x, dtype: datetime64[ns, UTC]), name = 'x'

    def get_column_metadata(column, name):
        """Produce pandas column metadata block"""
        # from pyarrow.pandas_compat
        # https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py
        inferred_dtype = infer_dtype(column)
        dtype = column.dtype
        if str(dtype) == "bool":
            # pandas accidentally calls this "boolean"
            inferred_dtype = "bool"
    
        if is_categorical_dtype(dtype):
            extra_metadata = {
                'num_categories': len(column.cat.categories),
                'ordered': column.cat.ordered,
            }
            dtype = column.cat.codes.dtype
        elif hasattr(dtype, 'tz'):
            try:
                stz = str(dtype.tz)
                if "UTC" in stz and ":" in stz:
                    extra_metadata = {'timezone': stz.strip("UTC")}
                elif "pytz" not in stz:
                    pd.Series([pd.to_datetime('now')]).dt.tz_localize(stz)
                    extra_metadata = {'timezone': str(dtype.tz)}
                elif "Offset" in stz:
                    import pytz
                    extra_metadata = {'timezone': f"{dtype.tz._minutes // 60:+03}:00"}
                else:
                    raise KeyError
            except Exception as e:
>               raise ValueError("Time-zone information could not be serialised: "
                                 "%s, please use another" % str(dtype.tz)) from e
E               ValueError: Time-zone information could not be serialised: UTC, please use another

../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/util.py:322: ValueError
---------------------------------------------- Captured stderr call -----------------------------------------------
FutureWarning: The parsing of 'now' in pd.to_datetime without `utc=True` is deprecated. In a future version, this will match Timestamp('now') and Timestamp.now()
__________________________ test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13] __________________________

data = array(['now'], dtype=object), dayfirst = False, yearfirst = False, utc = False, errors = 'raise'
require_iso8601 = False, allow_object = True, allow_mixed = False

    def objects_to_datetime64ns(
        data: np.ndarray,
        dayfirst,
        yearfirst,
        utc=False,
        errors="raise",
        require_iso8601: bool = False,
        allow_object: bool = False,
        allow_mixed: bool = False,
    ):
        """
        Convert data to array of timestamps.
    
        Parameters
        ----------
        data : np.ndarray[object]
        dayfirst : bool
        yearfirst : bool
        utc : bool, default False
            Whether to convert timezone-aware timestamps to UTC.
        errors : {'raise', 'ignore', 'coerce'}
        require_iso8601 : bool, default False
        allow_object : bool
            Whether to return an object-dtype ndarray instead of raising if the
            data contains more than one timezone.
        allow_mixed : bool, default False
            Interpret integers as timestamps when datetime objects are also present.
    
        Returns
        -------
        result : ndarray
            np.int64 dtype if returned values represent UTC timestamps
            np.datetime64[ns] if returned values represent wall times
            object if mixed timezones
        inferred_tz : tzinfo or None
    
        Raises
        ------
        ValueError : if data cannot be converted to datetimes
        """
        assert errors in ["raise", "ignore", "coerce"]
    
        # if str-dtype, convert
        data = np.array(data, copy=False, dtype=np.object_)
    
        flags = data.flags
        order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
        try:
            result, tz_parsed = tslib.array_to_datetime(
                data.ravel("K"),
                errors=errors,
                utc=utc,
                dayfirst=dayfirst,
                yearfirst=yearfirst,
                require_iso8601=require_iso8601,
                allow_mixed=allow_mixed,
            )
            result = result.reshape(data.shape, order=order)
        except ValueError as err:
            try:
>               values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2211: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: Unrecognized value type: <class 'str'>

pandas/_libs/tslibs/conversion.pyx:360: TypeError

During handling of the above exception, another exception occurred:

column = Series([], Name: x, dtype: datetime64[ns, CET]), name = 'x'

    def get_column_metadata(column, name):
        """Produce pandas column metadata block"""
        # from pyarrow.pandas_compat
        # https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py
        inferred_dtype = infer_dtype(column)
        dtype = column.dtype
        if str(dtype) == "bool":
            # pandas accidentally calls this "boolean"
            inferred_dtype = "bool"
    
        if is_categorical_dtype(dtype):
            extra_metadata = {
                'num_categories': len(column.cat.categories),
                'ordered': column.cat.ordered,
            }
            dtype = column.cat.codes.dtype
        elif hasattr(dtype, 'tz'):
            try:
                stz = str(dtype.tz)
                if "UTC" in stz and ":" in stz:
                    extra_metadata = {'timezone': stz.strip("UTC")}
                elif "pytz" not in stz:
>                   pd.Series([pd.to_datetime('now')]).dt.tz_localize(stz)

../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/util.py:314: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

arg = 'now', errors = 'raise', dayfirst = False, yearfirst = False, utc = None, format = None, exact = True
unit = None, infer_datetime_format = False, origin = 'unix', cache = True

    def to_datetime(
        arg: DatetimeScalarOrArrayConvertible,
        errors: str = "raise",
        dayfirst: bool = False,
        yearfirst: bool = False,
        utc: bool | None = None,
        format: str | None = None,
        exact: bool = True,
        unit: str | None = None,
        infer_datetime_format: bool = False,
        origin="unix",
        cache: bool = True,
    ) -> DatetimeIndex | Series | DatetimeScalar | NaTType | None:
        """
        Convert argument to datetime.
    
        This function converts a scalar, array-like, :class:`Series` or
        :class:`DataFrame`/dict-like to a pandas datetime object.
    
        Parameters
        ----------
        arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
            The object to convert to a datetime. If a :class:`DataFrame` is provided, the
            method expects minimally the following columns: :const:`"year"`,
            :const:`"month"`, :const:`"day"`.
        errors : {'ignore', 'raise', 'coerce'}, default 'raise'
            - If :const:`'raise'`, then invalid parsing will raise an exception.
            - If :const:`'coerce'`, then invalid parsing will be set as :const:`NaT`.
            - If :const:`'ignore'`, then invalid parsing will return the input.
        dayfirst : bool, default False
            Specify a date parse order if `arg` is str or is list-like.
            If :const:`True`, parses dates with the day first, e.g. :const:`"10/11/12"`
            is parsed as :const:`2012-11-10`.
    
            .. warning::
    
                ``dayfirst=True`` is not strict, but will prefer to parse
                with day first. If a delimited date string cannot be parsed in
                accordance with the given `dayfirst` option, e.g.
                ``to_datetime(['31-12-2021'])``, then a warning will be shown.
    
        yearfirst : bool, default False
            Specify a date parse order if `arg` is str or is list-like.
    
            - If :const:`True` parses dates with the year first, e.g.
              :const:`"10/11/12"` is parsed as :const:`2010-11-12`.
            - If both `dayfirst` and `yearfirst` are :const:`True`, `yearfirst` is
              preceded (same as :mod:`dateutil`).
    
            .. warning::
    
                ``yearfirst=True`` is not strict, but will prefer to parse
                with year first.
    
        utc : bool, default None
            Control timezone-related parsing, localization and conversion.
    
            - If :const:`True`, the function *always* returns a timezone-aware
              UTC-localized :class:`Timestamp`, :class:`Series` or
              :class:`DatetimeIndex`. To do this, timezone-naive inputs are
              *localized* as UTC, while timezone-aware inputs are *converted* to UTC.
    
            - If :const:`False` (default), inputs will not be coerced to UTC.
              Timezone-naive inputs will remain naive, while timezone-aware ones
              will keep their time offsets. Limitations exist for mixed
              offsets (typically, daylight savings), see :ref:`Examples
              <to_datetime_tz_examples>` section for details.
    
            See also: pandas general documentation about `timezone conversion and
            localization
            <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
            #time-zone-handling>`_.
    
        format : str, default None
            The strftime to parse time, e.g. :const:`"%d/%m/%Y"`. Note that
            :const:`"%f"` will parse all the way up to nanoseconds. See
            `strftime documentation
            <https://docs.python.org/3/library/datetime.html
            #strftime-and-strptime-behavior>`_ for more information on choices.
        exact : bool, default True
            Control how `format` is used:
    
            - If :const:`True`, require an exact `format` match.
            - If :const:`False`, allow the `format` to match anywhere in the target
              string.
    
        unit : str, default 'ns'
            The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
            integer or float number. This will be based off the origin.
            Example, with ``unit='ms'`` and ``origin='unix'`` (the default), this
            would calculate the number of milliseconds to the unix epoch start.
        infer_datetime_format : bool, default False
            If :const:`True` and no `format` is given, attempt to infer the format
            of the datetime strings based on the first non-NaN element,
            and if it can be inferred, switch to a faster method of parsing them.
            In some cases this can increase the parsing speed by ~5-10x.
        origin : scalar, default 'unix'
            Define the reference date. The numeric values would be parsed as number
            of units (defined by `unit`) since this reference date.
    
            - If :const:`'unix'` (or POSIX) time; origin is set to 1970-01-01.
            - If :const:`'julian'`, unit must be :const:`'D'`, and origin is set to
              beginning of Julian Calendar. Julian day number :const:`0` is assigned
              to the day starting at noon on January 1, 4713 BC.
            - If Timestamp convertible, origin is set to Timestamp identified by
              origin.
        cache : bool, default True
            If :const:`True`, use a cache of unique, converted dates to apply the
            datetime conversion. May produce significant speed-up when parsing
            duplicate date strings, especially ones with timezone offsets. The cache
            is only used when there are at least 50 values. The presence of
            out-of-bounds values will render the cache unusable and may slow down
            parsing.
    
            .. versionchanged:: 0.25.0
                changed default value from :const:`False` to :const:`True`.
    
        Returns
        -------
        datetime
            If parsing succeeded.
            Return type depends on input (types in parenthesis correspond to
            fallback in case of unsuccessful timezone or out-of-range timestamp
            parsing):
    
            - scalar: :class:`Timestamp` (or :class:`datetime.datetime`)
            - array-like: :class:`DatetimeIndex` (or :class:`Series` with
              :class:`object` dtype containing :class:`datetime.datetime`)
            - Series: :class:`Series` of :class:`datetime64` dtype (or
              :class:`Series` of :class:`object` dtype containing
              :class:`datetime.datetime`)
            - DataFrame: :class:`Series` of :class:`datetime64` dtype (or
              :class:`Series` of :class:`object` dtype containing
              :class:`datetime.datetime`)
    
        Raises
        ------
        ParserError
            When parsing a date from string fails.
        ValueError
            When another datetime conversion error happens. For example when one
            of 'year', 'month', day' columns is missing in a :class:`DataFrame`, or
            when a Timezone-aware :class:`datetime.datetime` is found in an array-like
            of mixed time offsets, and ``utc=False``.
    
        See Also
        --------
        DataFrame.astype : Cast argument to a specified dtype.
        to_timedelta : Convert argument to timedelta.
        convert_dtypes : Convert dtypes.
    
        Notes
        -----
    
        Many input types are supported, and lead to different output types:
    
        - **scalars** can be int, float, str, datetime object (from stdlib :mod:`datetime`
          module or :mod:`numpy`). They are converted to :class:`Timestamp` when
          possible, otherwise they are converted to :class:`datetime.datetime`.
          None/NaN/null scalars are converted to :const:`NaT`.
    
        - **array-like** can contain int, float, str, datetime objects. They are
          converted to :class:`DatetimeIndex` when possible, otherwise they are
          converted to :class:`Index` with :class:`object` dtype, containing
          :class:`datetime.datetime`. None/NaN/null entries are converted to
          :const:`NaT` in both cases.
    
        - **Series** are converted to :class:`Series` with :class:`datetime64`
          dtype when possible, otherwise they are converted to :class:`Series` with
          :class:`object` dtype, containing :class:`datetime.datetime`. None/NaN/null
          entries are converted to :const:`NaT` in both cases.
    
        - **DataFrame/dict-like** are converted to :class:`Series` with
          :class:`datetime64` dtype. For each row a datetime is created from assembling
          the various dataframe columns. Column keys can be common abbreviations
          like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or
          plurals of the same.
    
        The following causes are responsible for :class:`datetime.datetime` objects
        being returned (possibly inside an :class:`Index` or a :class:`Series` with
        :class:`object` dtype) instead of a proper pandas designated type
        (:class:`Timestamp`, :class:`DatetimeIndex` or :class:`Series`
        with :class:`datetime64` dtype):
    
        - when any input element is before :const:`Timestamp.min` or after
          :const:`Timestamp.max`, see `timestamp limitations
          <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
          #timeseries-timestamp-limits>`_.
    
        - when ``utc=False`` (default) and the input is an array-like or
          :class:`Series` containing mixed naive/aware datetime, or aware with mixed
          time offsets. Note that this happens in the (quite frequent) situation when
          the timezone has a daylight savings policy. In that case you may wish to
          use ``utc=True``.
    
        Examples
        --------
    
        **Handling various input formats**
    
        Assembling a datetime from multiple columns of a :class:`DataFrame`. The keys
        can be common abbreviations like ['year', 'month', 'day', 'minute', 'second',
        'ms', 'us', 'ns'] or plurals of the same.
    
        >>> df = pd.DataFrame({'year': [2015, 2016],
        ...                    'month': [2, 3],
        ...                    'day': [4, 5]})
        >>> pd.to_datetime(df)
        0   2015-02-04
        1   2016-03-05
        dtype: datetime64[ns]
    
        Passing ``infer_datetime_format=True`` can often speed up parsing
        if the input is not exactly ISO8601 but is in a regular format.
    
        >>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
        >>> s.head()
        0    3/11/2000
        1    3/12/2000
        2    3/13/2000
        3    3/11/2000
        4    3/12/2000
        dtype: object
    
        >>> %timeit pd.to_datetime(s, infer_datetime_format=True)  # doctest: +SKIP
        100 loops, best of 3: 10.4 ms per loop
    
        >>> %timeit pd.to_datetime(s, infer_datetime_format=False)  # doctest: +SKIP
        1 loop, best of 3: 471 ms per loop
    
        Using a unix epoch time
    
        >>> pd.to_datetime(1490195805, unit='s')
        Timestamp('2017-03-22 15:16:45')
        >>> pd.to_datetime(1490195805433502912, unit='ns')
        Timestamp('2017-03-22 15:16:45.433502912')
    
        .. warning:: For float arg, precision rounding might happen. To prevent
            unexpected behavior use a fixed-width exact type.
    
        Using a non-unix epoch origin
    
        >>> pd.to_datetime([1, 2, 3], unit='D',
        ...                origin=pd.Timestamp('1960-01-01'))
        DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
                      dtype='datetime64[ns]', freq=None)
    
        **Non-convertible date/times**
    
        If a date does not meet the `timestamp limitations
        <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
        #timeseries-timestamp-limits>`_, passing ``errors='ignore'``
        will return the original input instead of raising any exception.
    
        Passing ``errors='coerce'`` will force an out-of-bounds date to :const:`NaT`,
        in addition to forcing non-dates (or non-parseable dates) to :const:`NaT`.
    
        >>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
        datetime.datetime(1300, 1, 1, 0, 0)
        >>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
        NaT
    
        .. _to_datetime_tz_examples:
    
        **Timezones and time offsets**
    
        The default behaviour (``utc=False``) is as follows:
    
        - Timezone-naive inputs are converted to timezone-naive :class:`DatetimeIndex`:
    
        >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15'])
        DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
                      dtype='datetime64[ns]', freq=None)
    
        - Timezone-aware inputs *with constant time offset* are converted to
          timezone-aware :class:`DatetimeIndex`:
    
        >>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
        DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
                      dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)
    
        - However, timezone-aware inputs *with mixed time offsets* (for example
          issued from a timezone with daylight savings, such as Europe/Paris)
          are **not successfully converted** to a :class:`DatetimeIndex`. Instead a
          simple :class:`Index` containing :class:`datetime.datetime` objects is
          returned:
    
        >>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
        Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
              dtype='object')
    
        - A mix of timezone-aware and timezone-naive inputs is converted to
          a timezone-aware :class:`DatetimeIndex` if the offsets of the timezone-aware
          inputs are constant:
    
        >>> from datetime import datetime
        >>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
        DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
                      dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)
    
        - Finally, mixing timezone-aware strings and :class:`datetime.datetime` always
          raises an error, even if the elements all have the same time offset.
    
        >>> from datetime import datetime, timezone, timedelta
        >>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
        >>> pd.to_datetime(["2020-01-01 17:00 -0100", d])
        Traceback (most recent call last):
            ...
        ValueError: Tz-aware datetime.datetime cannot be converted to datetime64
                    unless utc=True
    
        |
    
        Setting ``utc=True`` solves most of the above issues:
    
        - Timezone-naive inputs are *localized* as UTC
    
        >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
        DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
                      dtype='datetime64[ns, UTC]', freq=None)
    
        - Timezone-aware inputs are *converted* to UTC (the output represents the
          exact same datetime, but viewed from the UTC time offset `+00:00`).
    
        >>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
        ...                utc=True)
        DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
                      dtype='datetime64[ns, UTC]', freq=None)
    
        - Inputs can contain both naive and aware values, as strings or datetimes;
          the above rules still apply
    
        >>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
        ...                datetime(2020, 1, 1, 18),
        ...                datetime(2020, 1, 1, 18,
        ...                tzinfo=timezone(-timedelta(hours=1)))],
        ...                utc=True)
        DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00',
                       '2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'],
                      dtype='datetime64[ns, UTC]', freq=None)
        """
        if arg is None:
            return None
    
        if origin != "unix":
            arg = _adjust_to_origin(arg, origin, unit)
    
        tz = "utc" if utc else None
        convert_listlike = partial(
            _convert_listlike_datetimes,
            tz=tz,
            unit=unit,
            dayfirst=dayfirst,
            yearfirst=yearfirst,
            errors=errors,
            exact=exact,
            infer_datetime_format=infer_datetime_format,
        )
    
        result: Timestamp | NaTType | Series | Index
    
        if isinstance(arg, Timestamp):
            result = arg
            if tz is not None:
                if arg.tz is not None:
                    result = arg.tz_convert(tz)
                else:
                    result = arg.tz_localize(tz)
        elif isinstance(arg, ABCSeries):
            cache_array = _maybe_cache(arg, format, cache, convert_listlike)
            if not cache_array.empty:
                result = arg.map(cache_array)
            else:
                values = convert_listlike(arg._values, format)
                result = arg._constructor(values, index=arg.index, name=arg.name)
        elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
            result = _assemble_from_unit_mappings(arg, errors, tz)
        elif isinstance(arg, Index):
            cache_array = _maybe_cache(arg, format, cache, convert_listlike)
            if not cache_array.empty:
                result = _convert_and_box_cache(arg, cache_array, name=arg.name)
            else:
                result = convert_listlike(arg, format, name=arg.name)
        elif is_list_like(arg):
            try:
                cache_array = _maybe_cache(arg, format, cache, convert_listlike)
            except OutOfBoundsDatetime:
                # caching attempts to create a DatetimeIndex, which may raise
                # an OOB. If that's the desired behavior, then just reraise...
                if errors == "raise":
                    raise
                # ... otherwise, continue without the cache.
                from pandas import Series
    
                cache_array = Series([], dtype=object)  # just an empty array
            if not cache_array.empty:
                result = _convert_and_box_cache(arg, cache_array)
            else:
                result = convert_listlike(arg, format)
        else:
>           result = convert_listlike(np.array([arg]), format)[0]

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/tools/datetimes.py:1078: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

arg = array(['now'], dtype=object), format = None, name = None, tz = None, unit = None, errors = 'raise'
infer_datetime_format = False, dayfirst = False, yearfirst = False, exact = True

    def _convert_listlike_datetimes(
        arg,
        format: str | None,
        name: Hashable = None,
        tz: Timezone | None = None,
        unit: str | None = None,
        errors: str = "raise",
        infer_datetime_format: bool = False,
        dayfirst: bool | None = None,
        yearfirst: bool | None = None,
        exact: bool = True,
    ):
        """
        Helper function for to_datetime. Performs the conversions of 1D listlike
        of dates
    
        Parameters
        ----------
        arg : list, tuple, ndarray, Series, Index
            date to be parsed
        name : object
            None or string for the Index name
        tz : object
            None or 'utc'
        unit : str
            None or string of the frequency of the passed data
        errors : str
            error handling behaviors from to_datetime, 'raise', 'coerce', 'ignore'
        infer_datetime_format : bool, default False
            inferring format behavior from to_datetime
        dayfirst : bool
            dayfirst parsing behavior from to_datetime
        yearfirst : bool
            yearfirst parsing behavior from to_datetime
        exact : bool, default True
            exact format matching behavior from to_datetime
    
        Returns
        -------
        Index-like of parsed dates
        """
        if isinstance(arg, (list, tuple)):
            arg = np.array(arg, dtype="O")
    
        arg_dtype = getattr(arg, "dtype", None)
        # these are shortcutable
        if is_datetime64tz_dtype(arg_dtype):
            if not isinstance(arg, (DatetimeArray, DatetimeIndex)):
                return DatetimeIndex(arg, tz=tz, name=name)
            if tz == "utc":
                arg = arg.tz_convert(None).tz_localize(tz)
            return arg
    
        elif is_datetime64_ns_dtype(arg_dtype):
            if not isinstance(arg, (DatetimeArray, DatetimeIndex)):
                try:
                    return DatetimeIndex(arg, tz=tz, name=name)
                except ValueError:
                    pass
            elif tz:
                # DatetimeArray, DatetimeIndex
                return arg.tz_localize(tz)
    
            return arg
    
        elif unit is not None:
            if format is not None:
                raise ValueError("cannot specify both format and unit")
            return _to_datetime_with_unit(arg, unit, name, tz, errors)
        elif getattr(arg, "ndim", 1) > 1:
            raise TypeError(
                "arg must be a string, datetime, list, tuple, 1-d array, or Series"
            )
    
        # warn if passing timedelta64, raise for PeriodDtype
        # NB: this must come after unit transformation
        orig_arg = arg
        try:
            arg, _ = maybe_convert_dtype(arg, copy=False)
        except TypeError:
            if errors == "coerce":
                npvalues = np.array(["NaT"], dtype="datetime64[ns]").repeat(len(arg))
                return DatetimeIndex(npvalues, name=name)
            elif errors == "ignore":
                idx = Index(arg, name=name)
                return idx
            raise
    
        arg = ensure_object(arg)
        require_iso8601 = False
    
        if infer_datetime_format and format is None:
            format = _guess_datetime_format_for_array(arg, dayfirst=dayfirst)
    
        if format is not None:
            # There is a special fast-path for iso8601 formatted
            # datetime strings, so in those cases don't use the inferred
            # format because this path makes process slower in this
            # special case
            format_is_iso8601 = format_is_iso(format)
            if format_is_iso8601:
                require_iso8601 = not infer_datetime_format
                format = None
    
        if format is not None:
            res = _to_datetime_with_format(
                arg, orig_arg, name, tz, format, exact, errors, infer_datetime_format
            )
            if res is not None:
                return res
    
        assert format is None or infer_datetime_format
        utc = tz == "utc"
>       result, tz_parsed = objects_to_datetime64ns(
            arg,
            dayfirst=dayfirst,
            yearfirst=yearfirst,
            utc=utc,
            errors=errors,
            require_iso8601=require_iso8601,
            allow_object=True,
        )

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/tools/datetimes.py:402: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = array(['now'], dtype=object), dayfirst = False, yearfirst = False, utc = False, errors = 'raise'
require_iso8601 = False, allow_object = True, allow_mixed = False

    def objects_to_datetime64ns(
        data: np.ndarray,
        dayfirst,
        yearfirst,
        utc=False,
        errors="raise",
        require_iso8601: bool = False,
        allow_object: bool = False,
        allow_mixed: bool = False,
    ):
        """
        Convert data to array of timestamps.
    
        Parameters
        ----------
        data : np.ndarray[object]
        dayfirst : bool
        yearfirst : bool
        utc : bool, default False
            Whether to convert timezone-aware timestamps to UTC.
        errors : {'raise', 'ignore', 'coerce'}
        require_iso8601 : bool, default False
        allow_object : bool
            Whether to return an object-dtype ndarray instead of raising if the
            data contains more than one timezone.
        allow_mixed : bool, default False
            Interpret integers as timestamps when datetime objects are also present.
    
        Returns
        -------
        result : ndarray
            np.int64 dtype if returned values represent UTC timestamps
            np.datetime64[ns] if returned values represent wall times
            object if mixed timezones
        inferred_tz : tzinfo or None
    
        Raises
        ------
        ValueError : if data cannot be converted to datetimes
        """
        assert errors in ["raise", "ignore", "coerce"]
    
        # if str-dtype, convert
        data = np.array(data, copy=False, dtype=np.object_)
    
        flags = data.flags
        order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
        try:
            result, tz_parsed = tslib.array_to_datetime(
                data.ravel("K"),
                errors=errors,
                utc=utc,
                dayfirst=dayfirst,
                yearfirst=yearfirst,
                require_iso8601=require_iso8601,
                allow_mixed=allow_mixed,
            )
            result = result.reshape(data.shape, order=order)
        except ValueError as err:
            try:
                values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
                # If tzaware, these values represent unix timestamps, so we
                #  return them as i8 to distinguish from wall times
                values = values.reshape(data.shape, order=order)
                return values.view("i8"), tz_parsed
            except (ValueError, TypeError):
>               raise err

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = array(['now'], dtype=object), dayfirst = False, yearfirst = False, utc = False, errors = 'raise'
require_iso8601 = False, allow_object = True, allow_mixed = False

    def objects_to_datetime64ns(
        data: np.ndarray,
        dayfirst,
        yearfirst,
        utc=False,
        errors="raise",
        require_iso8601: bool = False,
        allow_object: bool = False,
        allow_mixed: bool = False,
    ):
        """
        Convert data to array of timestamps.
    
        Parameters
        ----------
        data : np.ndarray[object]
        dayfirst : bool
        yearfirst : bool
        utc : bool, default False
            Whether to convert timezone-aware timestamps to UTC.
        errors : {'raise', 'ignore', 'coerce'}
        require_iso8601 : bool, default False
        allow_object : bool
            Whether to return an object-dtype ndarray instead of raising if the
            data contains more than one timezone.
        allow_mixed : bool, default False
            Interpret integers as timestamps when datetime objects are also present.
    
        Returns
        -------
        result : ndarray
            np.int64 dtype if returned values represent UTC timestamps
            np.datetime64[ns] if returned values represent wall times
            object if mixed timezones
        inferred_tz : tzinfo or None
    
        Raises
        ------
        ValueError : if data cannot be converted to datetimes
        """
        assert errors in ["raise", "ignore", "coerce"]
    
        # if str-dtype, convert
        data = np.array(data, copy=False, dtype=np.object_)
    
        flags = data.flags
        order: Literal["F", "C"] = "F" if flags.f_contiguous else "C"
        try:
>           result, tz_parsed = tslib.array_to_datetime(
                data.ravel("K"),
                errors=errors,
                utc=utc,
                dayfirst=dayfirst,
                yearfirst=yearfirst,
                require_iso8601=require_iso8601,
                allow_mixed=allow_mixed,
            )

../conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2199: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:381: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:613: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:751: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslib.pyx:742: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

pandas/_libs/tslibs/parsing.pyx:281: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

timestr = 'now', parserinfo = None
kwargs = {'dayfirst': False, 'default': datetime.datetime(1, 1, 1, 0, 0), 'yearfirst': False}

    def parse(timestr, parserinfo=None, **kwargs):
        """
    
        Parse a string in one of the supported formats, using the
        ``parserinfo`` parameters.
    
        :param timestr:
            A string containing a date/time stamp.
    
        :param parserinfo:
            A :class:`parserinfo` object containing parameters for the parser.
            If ``None``, the default arguments to the :class:`parserinfo`
            constructor are used.
    
        The ``**kwargs`` parameter takes the following keyword arguments:
    
        :param default:
            The default datetime object, if this is a datetime object and not
            ``None``, elements specified in ``timestr`` replace elements in the
            default object.
    
        :param ignoretz:
            If set ``True``, time zones in parsed strings are ignored and a naive
            :class:`datetime` object is returned.
    
        :param tzinfos:
            Additional time zone names / aliases which may be present in the
            string. This argument maps time zone names (and optionally offsets
            from those time zones) to time zones. This parameter can be a
            dictionary with timezone aliases mapping time zone names to time
            zones or a function taking two parameters (``tzname`` and
            ``tzoffset``) and returning a time zone.
    
            The timezones to which the names are mapped can be an integer
            offset from UTC in seconds or a :class:`tzinfo` object.
    
            .. doctest::
               :options: +NORMALIZE_WHITESPACE
    
                >>> from dateutil.parser import parse
                >>> from dateutil.tz import gettz
                >>> tzinfos = {"BRST": -7200, "CST": gettz("America/Chicago")}
                >>> parse("2012-01-19 17:21:00 BRST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21, tzinfo=tzoffset(u'BRST', -7200))
                >>> parse("2012-01-19 17:21:00 CST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21,
                                  tzinfo=tzfile('/usr/share/zoneinfo/America/Chicago'))
    
            This parameter is ignored if ``ignoretz`` is set.
    
        :param dayfirst:
            Whether to interpret the first value in an ambiguous 3-integer date
            (e.g. 01/05/09) as the day (``True``) or month (``False``). If
            ``yearfirst`` is set to ``True``, this distinguishes between YDM and
            YMD. If set to ``None``, this value is retrieved from the current
            :class:`parserinfo` object (which itself defaults to ``False``).
    
        :param yearfirst:
            Whether to interpret the first value in an ambiguous 3-integer date
            (e.g. 01/05/09) as the year. If ``True``, the first number is taken to
            be the year, otherwise the last number is taken to be the year. If
            this is set to ``None``, the value is retrieved from the current
            :class:`parserinfo` object (which itself defaults to ``False``).
    
        :param fuzzy:
            Whether to allow fuzzy parsing, allowing for string like "Today is
            January 1, 2047 at 8:21:00AM".
    
        :param fuzzy_with_tokens:
            If ``True``, ``fuzzy`` is automatically set to True, and the parser
            will return a tuple where the first element is the parsed
            :class:`datetime.datetime` datetimestamp and the second element is
            a tuple containing the portions of the string which were ignored:
    
            .. doctest::
    
                >>> from dateutil.parser import parse
                >>> parse("Today is January 1, 2047 at 8:21:00AM", fuzzy_with_tokens=True)
                (datetime.datetime(2047, 1, 1, 8, 21), (u'Today is ', u' ', u'at '))
    
        :return:
            Returns a :class:`datetime.datetime` object or, if the
            ``fuzzy_with_tokens`` option is ``True``, returns a tuple, the
            first element being a :class:`datetime.datetime` object, the second
            a tuple containing the fuzzy tokens.
    
        :raises ParserError:
            Raised for invalid or unknown string formats, if the provided
            :class:`tzinfo` is not in a valid format, or if an invalid date would
            be created.
    
        :raises OverflowError:
            Raised if the parsed date exceeds the largest valid C integer on
            your system.
        """
        if parserinfo:
            return parser(parserinfo).parse(timestr, **kwargs)
        else:
>           return DEFAULTPARSER.parse(timestr, **kwargs)

../conda/envs/dask-3.9/lib/python3.9/site-packages/dateutil/parser/_parser.py:1368: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <dateutil.parser._parser.parser object at 0x7f59c25b9550>, timestr = 'now'
default = datetime.datetime(1, 1, 1, 0, 0), ignoretz = False, tzinfos = None
kwargs = {'dayfirst': False, 'yearfirst': False}, res = None, skipped_tokens = None

    def parse(self, timestr, default=None,
              ignoretz=False, tzinfos=None, **kwargs):
        """
        Parse the date/time string into a :class:`datetime.datetime` object.
    
        :param timestr:
            Any date/time string using the supported formats.
    
        :param default:
            The default datetime object, if this is a datetime object and not
            ``None``, elements specified in ``timestr`` replace elements in the
            default object.
    
        :param ignoretz:
            If set ``True``, time zones in parsed strings are ignored and a
            naive :class:`datetime.datetime` object is returned.
    
        :param tzinfos:
            Additional time zone names / aliases which may be present in the
            string. This argument maps time zone names (and optionally offsets
            from those time zones) to time zones. This parameter can be a
            dictionary with timezone aliases mapping time zone names to time
            zones or a function taking two parameters (``tzname`` and
            ``tzoffset``) and returning a time zone.
    
            The timezones to which the names are mapped can be an integer
            offset from UTC in seconds or a :class:`tzinfo` object.
    
            .. doctest::
               :options: +NORMALIZE_WHITESPACE
    
                >>> from dateutil.parser import parse
                >>> from dateutil.tz import gettz
                >>> tzinfos = {"BRST": -7200, "CST": gettz("America/Chicago")}
                >>> parse("2012-01-19 17:21:00 BRST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21, tzinfo=tzoffset(u'BRST', -7200))
                >>> parse("2012-01-19 17:21:00 CST", tzinfos=tzinfos)
                datetime.datetime(2012, 1, 19, 17, 21,
                                  tzinfo=tzfile('/usr/share/zoneinfo/America/Chicago'))
    
            This parameter is ignored if ``ignoretz`` is set.
    
        :param \\*\\*kwargs:
            Keyword arguments as passed to ``_parse()``.
    
        :return:
            Returns a :class:`datetime.datetime` object or, if the
            ``fuzzy_with_tokens`` option is ``True``, returns a tuple, the
            first element being a :class:`datetime.datetime` object, the second
            a tuple containing the fuzzy tokens.
    
        :raises ParserError:
            Raised for invalid or unknown string format, if the provided
            :class:`tzinfo` is not in a valid format, or if an invalid date
            would be created.
    
        :raises TypeError:
            Raised for non-string or character stream input.
    
        :raises OverflowError:
            Raised if the parsed date exceeds the largest valid C integer on
            your system.
        """
    
        if default is None:
            default = datetime.datetime.now().replace(hour=0, minute=0,
                                                      second=0, microsecond=0)
    
        res, skipped_tokens = self._parse(timestr, **kwargs)
    
        if res is None:
>           raise ParserError("Unknown string format: %s", timestr)
E           dateutil.parser._parser.ParserError: Unknown string format: now

../conda/envs/dask-3.9/lib/python3.9/site-packages/dateutil/parser/_parser.py:643: ParserError

The above exception was the direct cause of the following exception:

tmpdir = local('/tmp/pytest-of-julia/pytest-132/test_roundtrip_fastparquet_df11')
df =                                      x
index                                 
0     1970-01-01 01:00:00.000003+01:00
1     1970-01-01 01:00:00.000002+01:00
2     1970-01-01 01:00:00.000001+01:00
write_kwargs = {}, read_kwargs = {}, engine = 'fastparquet'

    @pytest.mark.parametrize(
        "df,write_kwargs,read_kwargs",
        [
            (pd.DataFrame({"x": [3, 2, 1]}), {}, {}),
            (pd.DataFrame({"x": ["c", "a", "b"]}), {}, {}),
            (pd.DataFrame({"x": ["cc", "a", "bbb"]}), {}, {}),
            (pd.DataFrame({"x": [b"a", b"b", b"c"]}), {"object_encoding": "bytes"}, {}),
            (
                pd.DataFrame({"x": pd.Categorical(["a", "b", "a"])}),
                {},
                {"categories": ["x"]},
            ),
            (pd.DataFrame({"x": pd.Categorical([1, 2, 1])}), {}, {"categories": ["x"]}),
            (pd.DataFrame({"x": list(map(pd.Timestamp, [3000, 2000, 1000]))}), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("M8[ns]"), {}, {}),
            pytest.param(
                pd.DataFrame({"x": [3, 2, 1]}).astype("M8[ns]"),
                {},
                {},
            ),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("M8[us]"), {}, {}),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("M8[ms]"), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("datetime64[ns]"), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("datetime64[ns, UTC]"), {}, {}),
            (pd.DataFrame({"x": [3000, 2000, 1000]}).astype("datetime64[ns, CET]"), {}, {}),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("uint16"), {}, {}),
            (pd.DataFrame({"x": [3, 2, 1]}).astype("float32"), {}, {}),
            (pd.DataFrame({"x": [3, 1, 2]}, index=[3, 2, 1]), {}, {}),
            (pd.DataFrame({"x": [3, 1, 5]}, index=pd.Index([1, 2, 3], name="foo")), {}, {}),
            (pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}), {}, {}),
            (pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}, columns=["y", "x"]), {}, {}),
            (pd.DataFrame({"0": [3, 2, 1]}), {}, {}),
            (pd.DataFrame({"x": [3, 2, None]}), {}, {}),
            (pd.DataFrame({"-": [3.0, 2.0, None]}), {}, {}),
            (pd.DataFrame({".": [3.0, 2.0, None]}), {}, {}),
            (pd.DataFrame({" ": [3.0, 2.0, None]}), {}, {}),
        ],
    )
    def test_roundtrip(tmpdir, df, write_kwargs, read_kwargs, engine):
        if "x" in df and df.x.dtype == "M8[ns]" and "arrow" in engine:
            pytest.xfail(reason="Parquet pyarrow v1 doesn't support nanosecond precision")
        if (
            "x" in df
            and df.x.dtype == "M8[ns]"
            and engine == "fastparquet"
            and fastparquet_version <= parse_version("0.6.3")
        ):
            pytest.xfail(reason="fastparquet doesn't support nanosecond precision yet")
        if (
            PANDAS_GT_130
            and read_kwargs.get("categories", None)
            and engine == "fastparquet"
            and fastparquet_version <= parse_version("0.6.3")
        ):
            pytest.xfail("https://github.com/dask/fastparquet/issues/577")
    
        tmp = str(tmpdir)
        if df.index.name is None:
            df.index.name = "index"
        ddf = dd.from_pandas(df, npartitions=2)
    
        oe = write_kwargs.pop("object_encoding", None)
        if oe and engine == "fastparquet":
            dd.to_parquet(ddf, tmp, engine=engine, object_encoding=oe, **write_kwargs)
        else:
>           dd.to_parquet(ddf, tmp, engine=engine, **write_kwargs)

dask/dataframe/io/tests/test_parquet.py:988: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/dataframe/io/parquet/core.py:701: in to_parquet
    meta, schema, i_offset = engine.initialize_write(
dask/dataframe/io/parquet/fastparquet.py:1206: in initialize_write
    fmd = fastparquet.writer.make_metadata(
../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:740: in make_metadata
    get_column_metadata(data[column], column))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

column = Series([], Name: x, dtype: datetime64[ns, CET]), name = 'x'

    def get_column_metadata(column, name):
        """Produce pandas column metadata block"""
        # from pyarrow.pandas_compat
        # https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py
        inferred_dtype = infer_dtype(column)
        dtype = column.dtype
        if str(dtype) == "bool":
            # pandas accidentally calls this "boolean"
            inferred_dtype = "bool"
    
        if is_categorical_dtype(dtype):
            extra_metadata = {
                'num_categories': len(column.cat.categories),
                'ordered': column.cat.ordered,
            }
            dtype = column.cat.codes.dtype
        elif hasattr(dtype, 'tz'):
            try:
                stz = str(dtype.tz)
                if "UTC" in stz and ":" in stz:
                    extra_metadata = {'timezone': stz.strip("UTC")}
                elif "pytz" not in stz:
                    pd.Series([pd.to_datetime('now')]).dt.tz_localize(stz)
                    extra_metadata = {'timezone': str(dtype.tz)}
                elif "Offset" in stz:
                    import pytz
                    extra_metadata = {'timezone': f"{dtype.tz._minutes // 60:+03}:00"}
                else:
                    raise KeyError
            except Exception as e:
>               raise ValueError("Time-zone information could not be serialised: "
                                 "%s, please use another" % str(dtype.tz)) from e
E               ValueError: Time-zone information could not be serialised: CET, please use another

../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/util.py:322: ValueError
---------------------------------------------- Captured stderr call -----------------------------------------------
FutureWarning: The parsing of 'now' in pd.to_datetime without `utc=True` is deprecated. In a future version, this will match Timestamp('now') and Timestamp.now()
________________________________________________ test_timestamp96 _________________________________________________

data = 0    1643143105000000000
Name: a, dtype: object
se = <class 'fastparquet.parquet_thrift.parquet.ttypes.SchemaElement'>
converted_type: 0
field_id: None
logicalType: None
name: a
num_children: None
precision: None
repetition_type: 1
scale: None
type: 6
type_length: None


    def convert(data, se):
        """Convert data according to the schema encoding"""
        dtype = data.dtype
        type = se.type
        converted_type = se.converted_type
        if dtype.name in typemap:
            if type in revmap:
                out = data.values.astype(revmap[type], copy=False)
            elif type == parquet_thrift.Type.BOOLEAN:
                # TODO: with our own bitpack writer, no need to copy for
                #  the padding
                padded = np.lib.pad(data.values, (0, 8 - (len(data) % 8)),
                                    'constant', constant_values=(0, 0))
                out = np.packbits(padded.reshape(-1, 8)[:, ::-1].ravel())
            elif dtype.name in typemap:
                out = data.values
        elif "S" in str(dtype)[:2] or "U" in str(dtype)[:2]:
            out = data.values
        elif dtype == "O":
            # TODO: nullable types
            try:
                if converted_type == parquet_thrift.ConvertedType.UTF8:
                    # getattr for new pandas StringArray
                    # TODO: to bytes in one step
>                   out = array_encode_utf8(data)

../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:245: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: bad argument type for built-in operation

fastparquet/speedups.pyx:50: TypeError

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-julia/pytest-132/test_timestamp960')

    @FASTPARQUET_MARK
    def test_timestamp96(tmpdir):
        fn = str(tmpdir)
        df = pd.DataFrame({"a": ["now"]}, dtype="M8[ns]")
        ddf = dd.from_pandas(df, 1)
>       ddf.to_parquet(fn, write_index=False, times="int96")

dask/dataframe/io/tests/test_parquet.py:1644: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/dataframe/core.py:4736: in to_parquet
    return to_parquet(self, path, *args, **kwargs)
dask/dataframe/io/parquet/core.py:784: in to_parquet
    return compute_as_if_collection(
dask/base.py:315: in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
dask/threaded.py:79: in get
    results = get_async(
dask/local.py:507: in get_async
    raise_exception(exc, tb)
dask/local.py:315: in reraise
    raise exc
dask/local.py:220: in execute_task
    result = _execute_task(task, data)
dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
dask/utils.py:40: in apply
    return func(*args, **kwargs)
dask/dataframe/io/parquet/fastparquet.py:1279: in write_partition
    rg = make_part_file(
../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:669: in make_part_file
    rg = make_row_group(f, data, schema, compression=compression,
../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:655: in make_row_group
    chunk = write_column(f, data[column.name], column,
../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:533: in write_column
    repetition_data, definition_data, encode[encoding](data, selement), 8 * b'\x00'
../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:340: in encode_plain
    out = convert(data, se)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = 0    1643143105000000000
Name: a, dtype: object
se = <class 'fastparquet.parquet_thrift.parquet.ttypes.SchemaElement'>
converted_type: 0
field_id: None
logicalType: None
name: a
num_children: None
precision: None
repetition_type: 1
scale: None
type: 6
type_length: None


    def convert(data, se):
        """Convert data according to the schema encoding"""
        dtype = data.dtype
        type = se.type
        converted_type = se.converted_type
        if dtype.name in typemap:
            if type in revmap:
                out = data.values.astype(revmap[type], copy=False)
            elif type == parquet_thrift.Type.BOOLEAN:
                # TODO: with our own bitpack writer, no need to copy for
                #  the padding
                padded = np.lib.pad(data.values, (0, 8 - (len(data) % 8)),
                                    'constant', constant_values=(0, 0))
                out = np.packbits(padded.reshape(-1, 8)[:, ::-1].ravel())
            elif dtype.name in typemap:
                out = data.values
        elif "S" in str(dtype)[:2] or "U" in str(dtype)[:2]:
            out = data.values
        elif dtype == "O":
            # TODO: nullable types
            try:
                if converted_type == parquet_thrift.ConvertedType.UTF8:
                    # getattr for new pandas StringArray
                    # TODO: to bytes in one step
                    out = array_encode_utf8(data)
                elif converted_type == parquet_thrift.ConvertedType.DECIMAL:
                    out = data.values.astype(np.float64, copy=False)
                elif converted_type is None:
                    if type in revmap:
                        out = data.values.astype(revmap[type], copy=False)
                    elif type == parquet_thrift.Type.BOOLEAN:
                        # TODO: with our own bitpack writer, no need to copy for
                        #  the padding
                        padded = np.lib.pad(data.values, (0, 8 - (len(data) % 8)),
                                            'constant', constant_values=(0, 0))
                        out = np.packbits(padded.reshape(-1, 8)[:, ::-1].ravel())
                    else:
                        out = data.values
                elif converted_type == parquet_thrift.ConvertedType.JSON:
                    # TODO: avoid list, use better JSON
                    out = np.array([json.dumps(x).encode('utf8') for x in data],
                                   dtype="O")
                elif converted_type == parquet_thrift.ConvertedType.BSON:
                    out = data.map(tobson).values
                if type == parquet_thrift.Type.FIXED_LEN_BYTE_ARRAY:
                    out = out.astype('S%i' % se.type_length)
            except Exception as e:
                ct = parquet_thrift.ConvertedType._VALUES_TO_NAMES[
                    converted_type] if converted_type is not None else None
>               raise ValueError('Error converting column "%s" to bytes using '
                                 'encoding %s. Original error: '
E                                ValueError: Error converting column "a" to bytes using encoding UTF8. Original error: bad argument type for built-in operation

../conda/envs/dask-3.9/lib/python3.9/site-packages/fastparquet/writer.py:270: ValueError
---------------------------------------------- Captured stderr call -----------------------------------------------
FutureWarning: The parsing of 'now' in pd.to_datetime without `utc=True` is deprecated. In a future version, this will match Timestamp('now') and Timestamp.now()
================================================ warnings summary =================================================
dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12]
dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13]
dask/dataframe/io/tests/test_parquet.py::test_timestamp96
  /home/julia/conda/envs/dask-3.9/lib/python3.9/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: 'pandas._libs.tslib._parse_today_now'
  
  Traceback (most recent call last):
    File "/home/julia/conda/envs/dask-3.9/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2199, in objects_to_datetime64ns
      result, tz_parsed = tslib.array_to_datetime(
  FutureWarning: The parsing of 'now' in pd.to_datetime without `utc=True` is deprecated. In a future version, this will match Timestamp('now') and Timestamp.now()
  
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================== slowest 10 durations ===============================================
0.01s call     dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12]
0.00s call     dask/dataframe/io/tests/test_parquet.py::test_timestamp96
0.00s call     dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13]
0.00s setup    dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12]
0.00s setup    dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13]
0.00s setup    dask/dataframe/io/tests/test_parquet.py::test_timestamp96
0.00s teardown dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13]
0.00s teardown dask/dataframe/io/tests/test_parquet.py::test_timestamp96
0.00s teardown dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12]
============================================= short test summary info =============================================
FAILED dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12] - ...
FAILED dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13] - ...
FAILED dask/dataframe/io/tests/test_parquet.py::test_timestamp96 - ValueError: Error converting column "a" to by...
================================== 3 failed, 873 deselected, 3 warnings in 1.31s ==================================

@jsignell
Member

PRs left to merge:

@martindurant
Member

I don't know about these yet, but fastparquet's tests are failing too because of Int64Index -> Index(Int64) (supposed to be marked as depr, but seems the default behaviour changed too: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.4.0.html#deprecated-int64index-uint64index-float64index).
The issue here looks like something else.

@jorisvandenbossche
Member Author

supposed to be marked as depr, but seems the default behaviour changed too

There shouldn't be a change in behaviour (yet). The code snippet in the section you linked shows the "future behaviour", not what is released. That's a bit confusing, as the whatsnew code snippets typically show the "new behaviour", i.e. that of the new release.

@martindurant
Member

@jorisvandenbossche specifically, it was the index type of a read_csv call that changed (for the fastparquet breakage), and pandas refuses to compare two series with different index types, although the values are the same.
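
A minimal sketch of that comparison behaviour, assuming `pandas.testing.assert_series_equal` is the comparison in question (the thread does not say which helper fastparquet's tests use). The int64/float64 index pair is only an illustrative stand-in for "same values, different index type", not the dtypes from the failing CI run, and `series_match` is a hypothetical helper:

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Same values and same index labels, but the index dtypes differ.
# (int64 vs float64 is an illustrative stand-in, not the dtypes from the CI run.)
left = pd.Series([10, 20, 30], index=pd.Index([0, 1, 2], dtype="int64"))
right = pd.Series([10, 20, 30], index=pd.Index([0.0, 1.0, 2.0], dtype="float64"))


def series_match(a, b, **kwargs):
    """Hypothetical helper: True if assert_series_equal accepts the pair."""
    try:
        assert_series_equal(a, b, **kwargs)
    except AssertionError:
        return False
    return True


# The index type is checked by default, so the pair is rejected.
strict = series_match(left, right)
# Skipping the index type check compares the index values only.
relaxed = series_match(left, right, check_index_type=False)
print(strict, relaxed)
```

Relaxing `check_index_type` is one way to make such a test pass, but it can also hide a real change in the index type, so whether it is appropriate depends on what the test is meant to guard.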

@martindurant
Member

dask/fastparquet#738

@martindurant
Member

^ the good news is that now, only the same three dask test suite failures are seen in fastparquet's CI.

@jorisvandenbossche
Member Author

specifically, it was the index type of a read_csv call that changed (for the fastparquet breakage), and pandas refuses to compare two series with different index types, although the values are the same.

Ah, that's because you are using extension types; that indeed changed: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.4.0.html#index-can-hold-arbitrary-extensionarrays (but for the default Int64Index, nothing has changed yet)
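
A small sketch of the change described in that whatsnew section, assuming a nullable `Int64` array as the extension type (the thread does not say which extension dtype fastparquet hit). `index_dtype_for` is a hypothetical helper name:

```python
import pandas as pd


def index_dtype_for(values, dtype):
    """Hypothetical helper: dtype name of an Index built from an extension array."""
    arr = pd.array(values, dtype=dtype)
    return str(pd.Index(arr).dtype)


# With pandas >= 1.4 the Index keeps the extension dtype ("Int64");
# earlier versions are expected to fall back to an object-dtype Index.
name = index_dtype_for([1, 2, 3], "Int64")
print(pd.__version__, name)
```

Running this under both the old and the new pandas version is a quick way to check whether an index dtype difference comes from this change rather than from the Int64Index deprecation.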
