New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-1964: [Python] Expose StringBuilder to Python #1930

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@dsimmie
Contributor

dsimmie commented Apr 22, 2018

  • Partial implementation of ARROW-1964
  • Only implements StringBuilder not the other builders, notably the DictionaryBuilder mentioned in issue 1964

@dsimmie dsimmie changed the title from ARROW-1964: [Python ] Exposed StringBuilder to Cython to ARROW-1964: [Python] Exposed StringBuilder to Cython Apr 22, 2018

@dsimmie dsimmie force-pushed the dsimmie:ARROW-1964-Partial branch from f7d483f to 866d4bf Apr 23, 2018

@pitrou

Thanks for doing this. I've posted some review comments below.
Also, on a higher level, do you plan to tackle other builder types in this PR or not?

def finish(self):
return pyarrow_wrap_array(self._finish())
cdef shared_ptr[CArray] _finish(self) nogil:

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

Declaring a function nogil tells Cython that it is safe to call the function without the GIL, but Cython will not release the GIL automatically. You need a with nogil block for that (and then the private function is useless).

See Cython docs at https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html?highlight=nogil#releasing-the-gil

@pytest.fixture
def sbuilder():
return StringBuilder()

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

I don't think there's a point in writing a fixture for that. Just create StringBuilder directly where you need it.

assert (arr.null_count == 2)
def test_string_builder_append_both(sbuilder):

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

All those value-appending tests are a bit verbose. You can define a single one that appends both Nones, nans and strings.

sbuilder.append("Some text")
sbuilder.append(np.nan)
sbuilder.append(None)
assert (sbuilder.null_count() == 4)

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

Style nit: in Python, you don't need parentheses around assertions.

def length(self):
return self.builder.get().length()
def null_count(self):

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

Since Array.null_count is a property, Builder.null_count probably should as well.

for value in values:
self.append(value)
def length(self):

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

Please implement __len__ instead.

assert (arr.to_pylist() == expected)
def test_string_builder_append_numpy(sbuilder):

This comment has been minimized.

@pitrou

pitrou Apr 23, 2018

Contributor

I'm not sure it makes sense to have separate test functions for each input iterable type. And there's a lot of code duplication here too. Since the implementation uses iteration, I think testing for lists is enough.

@pitrou

This comment has been minimized.

Contributor

pitrou commented Apr 23, 2018

As a side note, append_values here won't be very optimized since it does a Python method call at each iteration. But we can optimize later.

@pitrou pitrou changed the title from ARROW-1964: [Python] Exposed StringBuilder to Cython to ARROW-1964: [Python] Expose StringBuilder to Python Apr 23, 2018

@pitrou

This comment has been minimized.

Contributor

pitrou commented May 3, 2018

@dsimmie Do you think you'll have time to address the review comments? Otherwise we could take over.

@AlexHagerman

This comment has been minimized.

Contributor

AlexHagerman commented May 4, 2018

@pitrou I'm going to have more time this month to contribute than I did the last few weeks. Pending dsimmie response would this be a good non blocking ticket to pick up and work on?

@xhochy

This comment has been minimized.

Member

xhochy commented May 4, 2018

@AlexHagerman yes this would be a good ticket.

@dsimmie

This comment has been minimized.

Contributor

dsimmie commented May 5, 2018

@pitrou

This comment has been minimized.

Contributor

pitrou commented May 5, 2018

No worries, I just wanted to make sure this PR doesn't become stale.

@AlexHagerman

This comment has been minimized.

Contributor

AlexHagerman commented May 5, 2018

@dsimmie do you plan to work on the other builders? If not I'll begin working on the DictionaryBuilder.

@dsimmie

This comment has been minimized.

Contributor

dsimmie commented May 5, 2018

@AlexHagerman

This comment has been minimized.

Contributor

AlexHagerman commented May 6, 2018

@dsimmie no rush and good luck I'll work on a different ticket.

@dsimmie dsimmie force-pushed the dsimmie:ARROW-1964-Partial branch from 866d4bf to b3f7cd7 May 6, 2018

dsimmie added a commit to dsimmie/arrow that referenced this pull request May 6, 2018

@dsimmie dsimmie force-pushed the dsimmie:ARROW-1964-Partial branch from b3f7cd7 to 9d32574 May 6, 2018

dsimmie added a commit to dsimmie/arrow that referenced this pull request May 6, 2018

@dsimmie

This comment has been minimized.

Contributor

dsimmie commented May 6, 2018

I have made the changes you requested @pitrou. I think I will just leave this PR for just the StringBuilder as it has been around for long enough as is.

I would like to work to work on the other builders but as I'm just starting off on these it might take me a little longer than someone else. If I don't hear that someone else is going to take them over I will have a try with them this week and submit through a different PR.

@pitrou

Thanks for the update @dsimmie. I posted a few more comments below.

sbuilder.append('jsaafakjh')
sbuilder.append(np.nan)
sbuilder.append(None)
assert (len(sbuilder) == 4)

This comment has been minimized.

@pitrou

pitrou May 7, 2018

Contributor

Nit: you don't need parentheses around asserts.

assert (isinstance(arr, pa.Array))
assert (arr.null_count == 2)
assert (arr.type == 'str')
assert(len(arr.to_pylist()) == 4)

This comment has been minimized.

@pitrou

pitrou May 7, 2018

Contributor

Why not check the value instead of just its length?

import six
from pyarrow.compat import tobytes
cdef class StringBuilder:

This comment has been minimized.

@pitrou

pitrou May 7, 2018

Contributor

It would be nice to add a docstring here and on non-trivial public methods. You can take a look at e.g. the Array class in array.pxi to get a sense of our conventions.

@dsimmie dsimmie force-pushed the dsimmie:ARROW-1964-Partial branch from 9d32574 to e4e54ec May 8, 2018

@dsimmie

This comment has been minimized.

Contributor

dsimmie commented May 8, 2018

Does anyone know why only one of the four C++ compiles would have failed?

The issue is in the plasma test teardown. Error from the gcc C++ build is:

test_method = <bound method TestPlasmaClient.test_subscribe of <pyarrow.tests.test_plasma.TestPlasmaClient object at 0x7f3c14116890>>
    def teardown_method(self, test_method):
        try:
            # Check that the Plasma store is still alive.
            assert self.p.poll() is None
            # Ensure Valgrind detected no issues
            if USE_VALGRIND:
                self.p.send_signal(signal.SIGTERM)
                self.p.wait()
>               assert self.p.returncode == 0
E               assert 1 == 0
E                +  where 1 = <subprocess.Popen object at 0x7f3c14116a50>.returncode
E                +    where <subprocess.Popen object at 0x7f3c14116a50> = <pyarrow.tests.test_plasma.TestPlasmaClient object at 0x7f3c14116890>.p
pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py:203: AssertionError
@xhochy

This comment has been minimized.

Member

xhochy commented May 8, 2018

@dsimmie This is unrelated to your changes. Seems there was recently some flakiness introduced to Plasma, this is tracked in https://issues.apache.org/jira/browse/ARROW-2552.

@pitrou

pitrou approved these changes May 8, 2018

@pitrou pitrou closed this in 414268b May 10, 2018

@pitrou

This comment has been minimized.

Contributor

pitrou commented May 10, 2018

Thanks your contribution @dsimmie !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment