ARROW-3910: [Python] Set date_as_object=True as default in to_pandas methods

This does not add a deprecation warning, primarily because doing so is somewhat difficult: we would need to check whether the data type is a date (or, in the case of a table, whether any field is a date) and warn only in that case. `True` is nevertheless the correct option for accurately round-tripping data to and from pandas. Some users may have workarounds in place, but this is already fairly advanced usage.

With this patch, date data round-trips without any special options:

```
In [2]: import pyarrow as pa

In [3]: import datetime

In [4]: arr = pa.array([datetime.date(2000, 1, 1), None])

In [5]: arr
Out[5]:
<pyarrow.lib.Date32Array object at 0x0000022CCDB1BBD8>
[
  10957,
  null
]

In [6]: arr.to_pandas()
Out[6]: array([datetime.date(2000, 1, 1), None], dtype=object)

In [7]: pa.array(arr.to_pandas())
Out[7]:
<pyarrow.lib.Date32Array object at 0x0000022CCDC7FE58>
[
  10957,
  null
]
```

If others strongly feel it's worth going to the effort of raising a deprecation warning, please chime in.

Author: Wes McKinney <wesm+git@apache.org>

Closes #3272 from wesm/ARROW-3910 and squashes the following commits:

308afe5 <Wes McKinney> Add Windows makefile for Sphinx, add section about date conversions to pandas.rst
f77c296 <Wes McKinney> Set date_as_object=True as default in to_pandas methods
wesm committed Jan 1, 2019
1 parent 71ccba9 commit 9376d85
Showing 7 changed files with 231 additions and 116 deletions.
52 changes: 52 additions & 0 deletions docs/make.bat
@@ -0,0 +1,52 @@
@rem Licensed to the Apache Software Foundation (ASF) under one
@rem or more contributor license agreements. See the NOTICE file
@rem distributed with this work for additional information
@rem regarding copyright ownership. The ASF licenses this file
@rem to you under the Apache License, Version 2.0 (the
@rem "License"); you may not use this file except in compliance
@rem with the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing,
@rem software distributed under the License is distributed on an
@rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@rem KIND, either express or implied. See the License for the
@rem specific language governing permissions and limitations
@rem under the License.

@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=_build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%

:end
popd
71 changes: 71 additions & 0 deletions docs/source/building.rst
@@ -0,0 +1,71 @@
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

Building the Documentation
==========================

Prerequisites
-------------

The documentation build process uses `Doxygen <http://www.doxygen.nl/>`_ and
`Sphinx <http://www.sphinx-doc.org/>`_ along with a few extensions.

If you're using Conda, the required software can be installed in a single line:

.. code-block:: shell

   conda install -c conda-forge --file ci/conda_env_sphinx.yml

Otherwise, you'll first need to install `Doxygen <http://www.doxygen.nl/>`_
yourself (for example from your distribution's official repositories, if
using Linux). Then you can install the Python-based requirements with the
following command:

.. code-block:: shell

   pip install -r docs/requirements.txt

Building
--------

.. note::

   If you are building the documentation on Windows, not all sections
   may build properly.

These two steps are mandatory and must be executed in order.

#. Process the C++ API using Doxygen

   .. code-block:: shell

      pushd cpp/apidoc
      doxygen
      popd

#. Build the complete documentation using Sphinx

   .. code-block:: shell

      pushd docs
      make html
      popd

After these steps are completed, the documentation is rendered in HTML
format in ``docs/_build/html``. In particular, you can point your browser
at ``docs/_build/html/index.html`` to read the docs and review any changes
you made.
6 changes: 6 additions & 0 deletions docs/source/index.rst
@@ -40,3 +40,9 @@ messaging and interprocess communication.

   cpp/index
   python/index

.. toctree::
   :maxdepth: 2
   :caption: Other Topics

   building
50 changes: 0 additions & 50 deletions docs/source/python/development.rst
@@ -364,53 +364,3 @@ Getting ``python-test.exe`` to run is a bit tricky because your
   set PYTHONHOME=%CONDA_PREFIX%

Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.

Building the Documentation
==========================

Prerequisites
-------------

The documentation build process uses `Doxygen <http://www.doxygen.nl/>`_ and
`Sphinx <http://www.sphinx-doc.org/>`_ along with a few extensions.

If you're using Conda, the required software can be installed in a single line:

.. code-block:: shell

   conda install -c conda-forge --file ci/conda_env_sphinx.yml

Otherwise, you'll first need to install `Doxygen <http://www.doxygen.nl/>`_
yourself (for example from your distribution's official repositories, if
using Linux). Then you can install the Python-based requirements with the
following command:

.. code-block:: shell

   pip install -r docs/requirements.txt

Building
--------

These two steps are mandatory and must be executed in order.

#. Process the C++ API using Doxygen

   .. code-block:: shell

      pushd cpp/apidoc
      doxygen
      popd

#. Build the complete documentation using Sphinx

   .. code-block:: shell

      pushd docs
      make html
      popd

After these steps are completed, the documentation is rendered in HTML
format in ``docs/_build/html``. In particular, you can point your browser
at ``docs/_build/html/index.html`` to read the docs and review any changes
you made.
68 changes: 67 additions & 1 deletion docs/source/python/pandas.rst
@@ -29,6 +29,13 @@ to them.
(such as a different type system, and support for null values) that this
is a separate topic from :ref:`numpy_interop`.

To follow examples in this document, make sure to run:

.. ipython:: python

   import pandas as pd
   import pyarrow as pa

DataFrames
----------

@@ -120,5 +127,64 @@ Arrow -> pandas Conversion
+-------------------------------------+--------------------------------------------------------+
| ``TIMESTAMP(unit=*)``               | ``pd.Timestamp`` (``np.datetime64[ns]``)               |
+-------------------------------------+--------------------------------------------------------+
| ``DATE``                            | ``pd.Timestamp`` (``np.datetime64[ns]``)               |
| ``DATE``                            | ``object`` (with ``datetime.date`` objects)            |
+-------------------------------------+--------------------------------------------------------+

Categorical types
~~~~~~~~~~~~~~~~~

TODO

Datetime (Timestamp) types
~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO

Date types
~~~~~~~~~~

While dates can be handled using the ``datetime64[ns]`` type in
pandas, some systems work with object arrays of Python's built-in
``datetime.date`` objects:

.. ipython:: python

   from datetime import date
   s = pd.Series([date(2018, 12, 31), None, date(2000, 1, 1)])
   s

When converting to an Arrow array, the ``date32`` type will be used by
default:

.. ipython:: python

   arr = pa.array(s)
   arr.type
   arr[0]

To use the 64-bit ``date64``, specify this explicitly:

.. ipython:: python

   arr = pa.array(s, type='date64')
   arr.type

When converting back with ``to_pandas``, object arrays of
``datetime.date`` objects are returned:

.. ipython:: python

   arr.to_pandas()

If you want to use NumPy's ``datetime64`` dtype instead, pass
``date_as_object=False``:

.. ipython:: python

   s2 = pd.Series(arr.to_pandas(date_as_object=False))
   s2.dtype

Time types
~~~~~~~~~~

TODO
6 changes: 2 additions & 4 deletions python/pyarrow/array.pxi
@@ -343,10 +343,8 @@ cdef class _PandasConvertible:

    def to_pandas(self, categories=None, bint strings_to_categorical=False,
                  bint zero_copy_only=False, bint integer_object_nulls=False,
                  bint date_as_object=False,
                  bint use_threads=True,
                  bint deduplicate_objects=True,
                  bint ignore_metadata=False):
                  bint date_as_object=True, bint use_threads=True,
                  bint deduplicate_objects=True, bint ignore_metadata=False):
        """
        Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

0 comments on commit 9376d85

Please sign in to comment.