Skip to content

ARROW-6618: [Python] Fix read_message() segfault on end of stream#5437

Closed
pitrou wants to merge 2 commits intoapache:masterfrom
pitrou:ARROW-6618-read-message-crash
Closed

ARROW-6618: [Python] Fix read_message() segfault on end of stream#5437
pitrou wants to merge 2 commits intoapache:masterfrom
pitrou:ARROW-6618-read-message-crash

Conversation

@pitrou
Copy link
Copy Markdown
Member

@pitrou pitrou commented Sep 19, 2019

No description provided.

check_status(ReadMessage(c_stream, &result.message))

if result.message == nullptr:
raise EOFError("End of Arrow stream")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also have a null message if it was an invalid message (eg pa.read_message(b"invalidbuffer") also gives this error now). Maybe the error message should indicated that? Although this is probably a less typical case to do (I only tried that while trying to find a minimal reproducible example for the segfault).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. We should perhaps make error checking smarter on the C++ side. Can you open a separate JIRA?

@wesm
Copy link
Copy Markdown
Member

wesm commented Sep 19, 2019

I'm taking a quick look at this

@wesm
Copy link
Copy Markdown
Member

wesm commented Sep 19, 2019

There's some other weird stuff here, for example

In [12]: def f():  
    ...:     try: 
    ...:         pa.ipc.read_message(b'fooo') 
    ...:     except: 
    ...:         pass 
    ...:                                                                                                                                                                                       

In [13]: f()                                                                                                                                                                                   

In [14]: f()                                                                                                                                                                                   

In [15]: timeit f()                                                                                                                                                                            
108 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

No idea why it takes 100ms to fail

@wesm
Copy link
Copy Markdown
Member

wesm commented Sep 19, 2019

Ah, it's allocating memory unconditionally. I'll push some fixes to this branch

@wesm
Copy link
Copy Markdown
Member

wesm commented Sep 19, 2019

OK now

In [1]: pa.ipc.read_message(b'b0')                                                                                                                                                             
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-1-b403132740f1> in <module>
----> 1 pa.ipc.read_message(b'b0')

~/code/arrow/python/pyarrow/ipc.pxi in pyarrow.lib.read_message()
    547 
    548     with nogil:
--> 549         check_status(ReadMessage(c_stream, &result.message))
    550 
    551     if result.message == nullptr:

~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     76         message = frombytes(status.message())
     77         if status.IsInvalid():
---> 78             raise ArrowInvalid(message)
     79         elif status.IsIOError():
     80             raise ArrowIOError(message)

ArrowInvalid: Corrupted message, only 2 bytes available

In [2]: pa.ipc.read_message(b'')                                                                                                                                                               
---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-2-a2725e1f311d> in <module>
----> 1 pa.ipc.read_message(b'')

~/code/arrow/python/pyarrow/ipc.pxi in pyarrow.lib.read_message()
    550 
    551     if result.message == nullptr:
--> 552         raise EOFError("End of Arrow stream")
    553 
    554     return result

EOFError: End of Arrow stream

I also addressed the unneeded memory allocation issue (not sure how to easily test that)

@wesm
Copy link
Copy Markdown
Member

wesm commented Sep 19, 2019

@wesm wesm closed this in dbacce6 Sep 19, 2019
@pitrou pitrou deleted the ARROW-6618-read-message-crash branch September 20, 2019 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants