Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3199: [Plasma] File descriptor send and receive retries #2551

Closed
wants to merge 6 commits into from

Conversation

pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented Sep 12, 2018

An additional piece of eyes would be appreciated for this. It seems to solve our issue reported in the JIRA, but I'm not sure what the semantics of partial reads/writes is here (in particular, how are partial read/writes handled for ancillary data like file descriptors?).

found by cc @stephanie-wang

@pcmoritz pcmoritz changed the title ARROW-3199: [Plasma] File descriptor send and receive retries [WIP] ARROW-3199: [Plasma] File descriptor send and receive retries Sep 12, 2018
@wesm
Copy link
Member

wesm commented Sep 13, 2018

@pcmoritz is this the cause of

pyarrow/tests/test_plasma.py::test_plasma_list FAILED                                                                                                                                   [ 96%]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> captured stderr >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Allowing the Plasma store to use up to 0.1GB of memory.
Starting object store with directory /dev/shm and huge page support disabled
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    @pytest.mark.plasma
    def test_plasma_list():
        import pyarrow.plasma as plasma
    
        with plasma.start_plasma_store(
                plasma_store_memory=DEFAULT_PLASMA_STORE_MEMORY) \
                as (plasma_store_name, p):
            plasma_client = plasma.connect(plasma_store_name, "", 0)
    
            # Test sizes
            u, _, _ = create_object(plasma_client, 11, metadata_size=7, seal=False)
            l1 = plasma_client.list()
            assert l1[u]["data_size"] == 11
            assert l1[u]["metadata_size"] == 7
    
            # Test ref_count
            v = plasma_client.put(np.zeros(3))
            l2 = plasma_client.list()
            # Ref count has already been released
            assert l2[v]["ref_count"] == 0
            a = plasma_client.get(v)
            l3 = plasma_client.list()
>           assert l3[v]["ref_count"] == 1
E           assert 0 == 1

pyarrow/tests/test_plasma.py:820: AssertionError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/wesm/code/arrow/python/pyarrow/tests/test_plasma.py(820)test_plasma_list()
-> assert l3[v]["ref_count"] == 1

Just seeing this locally

@pcmoritz
Copy link
Contributor Author

Hmm, that sounds unlikely to me. What's the configuration you are running the test on, do you think I could reproduce it?

@wesm
Copy link
Member

wesm commented Sep 13, 2018

Running on Ubuntu 14.04, clang-6.0

@pcmoritz
Copy link
Contributor Author

And which python?

@wesm
Copy link
Member

wesm commented Sep 13, 2018

Python 3.6.5

@pcmoritz pcmoritz force-pushed the plasma-retries branch 2 times, most recently from 126ccf2 to 1e3f91f Compare November 29, 2018 23:14
@codecov-io
Copy link

codecov-io commented Nov 30, 2018

Codecov Report

Merging #2551 into master will increase coverage by 1.1%.
The diff coverage is 39.13%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #2551     +/-   ##
=========================================
+ Coverage   87.07%   88.18%   +1.1%     
=========================================
  Files         489      431     -58     
  Lines       69010    65454   -3556     
=========================================
- Hits        60093    57721   -2372     
+ Misses       8816     7733   -1083     
+ Partials      101        0    -101
Impacted Files Coverage Δ
cpp/src/plasma/store.cc 92.42% <100%> (+1.04%) ⬆️
cpp/src/plasma/fling.cc 67.18% <36.36%> (-17.26%) ⬇️
cpp/src/arrow/io/interfaces.cc 50% <0%> (-5%) ⬇️
cpp/src/plasma/thirdparty/ae/ae.c 72.03% <0%> (-0.95%) ⬇️
cpp/src/arrow/util/thread-pool-test.cc 98.91% <0%> (-0.55%) ⬇️
cpp/src/arrow/compute/kernels/cast.cc 91.6% <0%> (-0.26%) ⬇️
cpp/src/arrow/python/numpy_to_arrow.cc 93.53% <0%> (-0.2%) ⬇️
cpp/src/arrow/io/memory.cc 89.22% <0%> (-0.2%) ⬇️
cpp/src/parquet/util/memory.cc 86.33% <0%> (-0.12%) ⬇️
python/pyarrow/tests/test_plasma_tf_op.py 97.87% <0%> (-0.05%) ⬇️
... and 73 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 681efd8...c7ca3b7. Read the comment docs.

return static_cast<int>(r);
}
} else if (r == 0) {
ARROW_LOG(INFO) << "Encountered unexpected EOF";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we may need to return right? Will we infinite loop here?

In the past it looks like we were returning 0 here and indicating success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I fixed it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little unclear on this case. Does r == 0 mean the recipient hung up? In that case we return 0 and it's treated as success?

Copy link
Contributor Author

@pcmoritz pcmoritz Dec 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems to be the right behavior. Note there will also be a "Disconnected client" message in the main loop of the store to detect this.

@pcmoritz pcmoritz changed the title [WIP] ARROW-3199: [Plasma] File descriptor send and receive retries ARROW-3199: [Plasma] File descriptor send and receive retries Dec 3, 2018
@pcmoritz
Copy link
Contributor Author

pcmoritz commented Dec 3, 2018

+1 This is now ready to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants