
[Python] Possible performance regression in Feather read/write path #18037

Closed
asfimport opened this issue Jan 30, 2018 · 16 comments
asfimport commented Jan 30, 2018

See discussion in wesm/feather#329. Needs to be investigated

Reporter: Wes McKinney / @wesm
Assignee: Antoine Pitrou / @pitrou

Note: This issue was originally created as ARROW-2059. Please see the migration documentation for further details.


Jingyuan Wang / @alphalfalfa:
Here is what I've done. I simply repeated the 1M rows to create 30M- and 100M-row test CSV files, then repeated the process of reading from CSV, writing to Feather, and reading from Feather, timing each part. I repeated each measurement 10 times for the four combinations of (Python 2.7, Python 3.6) x (feather-format 0.3.1, feather-format 0.4.0).

Processing the 100M-row files failed on my laptop (16 GB memory) in every configuration except Python 2.7 with feather-format 0.3.1.

The measurements for 1M rows are as follows:

|python version|feather version|# rows|write feather (s)|read feather (s)|
|-|-|-|-|-|
|2.7|0.3.1|1M|0.06216781139|0.05903599262|
|2.7|0.4.0|1M|0.1335380793|0.04576666355|
|3.6|0.3.1|1M|0.07768514156|0.09041910172|
|3.6|0.4.0|1M|0.08690385818|0.05801310539|


The measurements for 30M rows are as follows:

|python version|feather version|# rows|write feather (s)|read feather (s)|
|-|-|-|-|-|
|2.7|0.3.1|30M|1.747310066|2.35606482|
|2.7|0.4.0|30M|3.5653723|1.934461188|
|3.6|0.3.1|30M|2.407458949|2.811572456|
|3.6|0.4.0|30M|2.925034189|1.852504301|



From both tables, performance of writing to Feather did degrade from 0.3.1 to 0.4.0, with Python 2 affected more dramatically. Reading Feather files was actually faster with the newer Feather version.



One other thing: I noticed that feather-format 0.3.1 does not even depend on Arrow, so the performance difference involves more than the Arrow version upgrade. I also think we need some thorough benchmarks for Arrow, or do we already have them?


Jingyuan Wang / @alphalfalfa:
Stack trace of the failure when processing 100M-row data (feather-0.4.0):


Traceback (most recent call last):
  File "test.py", line 44, in <module>
    elapsed_ts.append(test(df, i))
  File "test.py", line 18, in test
    feather.write_dataframe(df, fthfilename)
  File "/home/jingyuan/anaconda3/envs/arrow-2059-feather-0.4.0/lib/python3.6/site-packages/pyarrow/feather.py", line 100, in write_feather
    writer.write(df)
  File "/home/jingyuan/anaconda3/envs/arrow-2059-feather-0.4.0/lib/python3.6/site-packages/pyarrow/feather.py", line 80, in write
    batch = RecordBatch.from_pandas(df, preserve_index=False)
  File "table.pxi", line 705, in pyarrow.lib.RecordBatch.from_pandas
  File "table.pxi", line 739, in pyarrow.lib.RecordBatch.from_arrays
TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array
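The TypeError above happens because a very large string column is converted into a multi-chunk ChunkedArray, while RecordBatch.from_pandas at the time required single-chunk arrays (see ARROW-1907 referenced below). As a toy illustration of why a binary builder must split large data into chunks (this is not Arrow's actual code; the size cap and names are made up):

```python
def chunk_by_size(values, cap):
    """Split a list of byte strings into chunks whose total payload
    stays under `cap` bytes (a toy model of a chunked binary array)."""
    chunks, current, size = [], [], 0
    for v in values:
        # Start a new chunk once the current one would exceed the cap.
        if current and size + len(v) > cap:
            chunks.append(current)
            current, size = [], 0
        current.append(v)
        size += len(v)
    if current:
        chunks.append(current)
    return chunks

# With a tiny cap, three 4-byte strings split into two chunks.
parts = chunk_by_size([b"aaaa", b"bbbb", b"cccc"], cap=8)
```

A real Arrow binary column hits the analogous limit when its offsets overflow a 32-bit signed integer (around 2 GiB of string data), which is why a 100M-row string column ends up chunked.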

 


Jingyuan Wang / @alphalfalfa:
Stack trace of the failure when processing 100M-row data (feather-0.3.1):


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 9: invalid continuation byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 44, in <module>
    elapsed_ts.append(test(df, i))
  File "test.py", line 20, in test
    df = feather.read_dataframe(fthfilename)
  File "/home/jingyuan/anaconda3/envs/arrow-2059-feather-0.3.1/lib/python3.6/site-packages/feather/api.py", line 93, in read_dataframe
    arr = col.read()
SystemError: <built-in method read of feather.ext.Column object at 0x7f5fdc958a40> returned a result with an error set


Wes McKinney / @wesm:
Thanks @alphalfalfa – this suggests that we need to optimize the string handling path (since strings are different in py2 vs py3, this explains why the performance differs so much). The write failure with the 100M-row table can't be fixed at the moment (see ARROW-1907); as soon as we have R bindings for Arrow (and thus the current iteration of the Feather format code), we can fix it.


Antoine Pitrou / @pitrou:
Here is a profile tree obtained by exercising the write path on Python 3.6:

   - arrow::py::NumPyConverter::ConvertObjectStrings()
      - 80,27% arrow::py::AppendObjectStrings(tagPyArrayObject*, tagPyArrayObject*, long, bool, arrow::StringBuilder*, long*, bool*)
         - 50,74% arrow::py::internal::BuilderAppend(arrow::StringBuilder*, _object*, bool, bool*)
            - 24,95% arrow::BinaryBuilder::Append(unsigned char const*, int)
                 7,43% arrow::BinaryBuilder::AppendNextOffset()
               + 6,28% arrow::BufferBuilder::Resize(long, bool)
                 2,30% __memcpy_avx_unaligned
                 0,71% arrow::ArrayBuilder::Reserve(long)
              6,16% PyUnicode_AsUTF8AndSize
            + 4,37% PyErr_Occurred
         - 16,70% arrow::py::internal::PandasObjectIsNull(_object*)
            - 8,29% arrow::py::internal::PyDecimal_Check(_object*)
                 PyType_IsSubtype
            - 4,59% arrow::py::internal::PyFloat_IsNaN(_object*)
                 PyType_IsSubtype
           2,51% PyArray_MultiplyList
           2,41% PyType_IsSubtype
      + 1,57% arrow::ArrayBuilder::Finish(std::shared_ptr<arrow::Array>*)


Wes McKinney / @wesm:
Thanks @pitrou, do you have some tricks for creating profiles like that (I have made FlameGraphs but not much more)?


Antoine Pitrou / @pitrou:
I used Linux "perf" on a benchmark script, then ran "perf report" to get call trees, and then manually fed some of that output into "c++filt".

The most accessible resource I've found about the "perf" utility is http://www.brendangregg.com/perf.html
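As a sketch of that workflow (the script name and the exact flags here are placeholders, not taken from the thread):

```shell
# Record call stacks while the benchmark runs; DWARF unwinding often
# gives the most complete stacks for C++ extension modules.
perf record --call-graph=dwarf -- python bench.py

# Print an expanded call tree from the recorded perf.data.
perf report --stdio -g

# Pipe through c++filt to demangle any raw C++ symbol names.
perf report --stdio | c++filt | less
```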


Antoine Pitrou / @pitrou:
There doesn't seem to be anything obvious left to do here. Most of the remaining time is spent converting strings around (in Python 3, that means bytes to unicode, or the reverse). There's probably some overhead elsewhere, but nothing stands out.
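To get a rough feel for that string-conversion cost in isolation, a stdlib-only micro-benchmark of UTF-8 encoding could look like the sketch below (the string contents and counts are made up for illustration; this is not the Arrow code path itself):

```python
import timeit

def encode_all(strings):
    """Encode every str to UTF-8 bytes, roughly what a writer must do
    once per object-dtype string cell."""
    return [s.encode("utf-8") for s in strings]

# A timestamp-like string similar to the test CSV's last column.
strings = ["2016-10-03T13:14:22.326Z"] * 100_000
seconds = timeit.timeit(lambda: encode_all(strings), number=10)
```

Comparing this against the per-row write times in the tables above gives a sense of how much of the total is pure encode/decode work versus builder overhead.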


Wes McKinney / @wesm:
Thanks @pitrou for investigating. I agree there doesn't seem to be much more to do. I'll review the patch you opened and resolve this once it's merged (I'll quickly check the write perf to see what things look like on my end after that).


Antoine Pitrou / @pitrou:
The patch was already merged actually (see #2258 ).


Wes McKinney / @wesm:
Oh great, thanks =) Well, I'll run some perf tests locally, post the results here if anything interesting turns up, and then resolve this.


Wes McKinney / @wesm:
I ran some benchmarks locally (quad-core Xeon E3-1535M, Ubuntu 14.04). There is still a slight ~5-10% write performance regression, but reading is faster.

Feather 0.3.1:

# WRITE
$ python bench.py
Elapsed: 15.497231721878052 seconds
Average: 1.549723172187805

# READ
$ python bench.py 
Elapsed: 9.88158106803894 seconds
Average: 0.988158106803894

Feather 0.4.0

# WRITE
$ python bench.py
Elapsed: 16.36524486541748 seconds
Average: 1.636524486541748

# READ
$ python bench.py 
Elapsed: 7.4859395027160645 seconds
Average: 0.7485939502716065
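Working out the ratios from those averages (just arithmetic on the numbers above):

```python
# Average per-iteration times from the benchmark runs above.
write_old, write_new = 1.549723172187805, 1.636524486541748
read_old, read_new = 0.988158106803894, 0.7485939502716065

# 0.4.0 write is ~5.6% slower; 0.4.0 read is ~24% faster.
write_regression_pct = (write_new / write_old - 1) * 100
read_speedup_pct = (1 - read_new / read_old) * 100
```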

Here's the benchmarking script so people can run their own experiments. It would be useful to look at the perf output more closely and see what else we can do to make things faster:

import io
import pickle
import time

import feather
import pandas as pd


def generate_example():
    buf = io.StringIO("""07300003030539,42198997,-1,2016-10-03T13:14:22.326Z
    41130003053286,42224636,-1,2016-09-20T19:31:51.196Z
    """)

    table = pd.read_csv(buf, header=None)
    table = pd.concat([table] * 5000, axis=0, ignore_index=True)
    table = pd.concat([table] * 1000, axis=0, ignore_index=True)

    with open('example.pkl', 'wb') as f:
        pickle.dump(table, f)


def _get_time():
    return time.clock_gettime(time.CLOCK_REALTIME)


class Timer:

    def __init__(self, iterations):
        self.start_time = _get_time()
        self.iterations = iterations

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        elapsed = _get_time() - self.start_time
        print("Elapsed: {0} seconds\nAverage: {1}"
              .format(elapsed, elapsed / self.iterations))


def feather_write_bench(iterations=10):
    with open('example.pkl', 'rb') as f:
        data = pickle.load(f)

    with Timer(iterations):
        for i in range(iterations):
            feather.write_dataframe(data, 'example.fth')


def feather_read_bench(iterations=10):
    import gc
    gc.disable()
    with Timer(iterations):
        for i in range(iterations):
            feather.read_dataframe('example.fth')
    gc.enable()


# generate_example()
# feather_write_bench()
# feather_read_bench()

To use:

  • Run generate_example() to create the data file
  • Run write benchmarks with feather_write_bench()
  • Run read benchmarks with feather_read_bench()


Wes McKinney / @wesm:
I just attached a FlameGraph (perf.svg) created after the work in ARROW-2814.

Some notes:

  • Buffer trimming in BinaryBuilder::FinishInternal is taking up a good amount of time
  • perf isn't recording what symbols are under the libpthread calls; that would be interesting to know.


Antoine Pitrou / @pitrou:
If you get a lot of unknown symbols in the call stacks, in my experience it can help to change the way call stacks are recorded by "perf record". On my machine, for example, only "--call-graph=dwarf" gives good results.


Wes McKinney / @wesm:
Ah great, I will try that out


Uwe Korn / @xhochy:
Also, adding -fno-omit-frame-pointer makes the stacks much clearer. Sadly this comes with a bit of overhead, so one needs to be careful to take the final measurements without this flag.
