Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Page 660: The warning of what to watch out for under the "Parallel Algorithms Are Not Magic" sounds probably wrong/misleading in part. #141

Closed
WraithGlade opened this issue Jan 21, 2020 · 9 comments

Comments

@WraithGlade
Copy link

@WraithGlade WraithGlade commented Jan 21, 2020

From the text:

A red flag is any algorithm that passes a function object to the algorithm. If the function object has mutable state, the executing threads will have shared access and you might have a race condition.

I'm no expert in concurrency or parallelism, but two parts of this section of the text sound very likely to be wrong/misleading.

From what I understand, the existence of mutable state in the function object should only result in the potential for race conditions if that mutable state is shared. In contrast though, internal mutable state that isn't visible outside of the function object seems like it would be fine. The way you've worded it though sounds like no mutable state whatsoever should ever be used inside a function object though, which would be very limiting. It sounds wrong. Your example of a race for std::transform also uses shared mutable state, not just normal internal mutable state, so this lines up with my suspicions here.

On the other hand though, I'm no expert in concurrency and parallelism, so I might be wrong potentially. I don't think so though. The above section of text from the book sounds fishy and doesn't seem to line up with what concurrency/parallelism is supposed to be like, from what I understand of it.

Thus the two problems in the above text:

(1) "any algorithm that passes a function object to the algorithm" is not in fact a red flag, in and of itself, if my understanding is correct.

(2) The other sentence should probably be changed to read as follows (or similar):

If the function object has shared mutable state, the executing threads will have shared access and you might have a race condition.

Correct me if I'm wrong. I'm not 100% sure, but I'm fairly confident.

@JLospinoso

This comment has been minimized.

Copy link
Owner

@JLospinoso JLospinoso commented Jan 21, 2020

Hi @WraithGlade, thanks for posting!

You are absolutely correct that mutable state isn't an issue if it isn't shared. But by passing this mutable state via a function object into a parallel algorithm, you are sharing that mutable state. The parallel algorithm will dispatch work to executing threads that will share the function object you've passed. It's possible to do this correctly e.g. by using locks or atomics, but you need to pay very careful attention--hence the "red flag" idiom.

Does that make sense?

@WraithGlade

This comment has been minimized.

Copy link
Author

@WraithGlade WraithGlade commented Jan 22, 2020

Hey Josh, thanks for the feedback and help.

I tested what you said though and my compiler returned results that do not match what you said. Instead, the results I got back imply that each function object is independently instantiated in some sense and hence probably that only captured references, shared static variables, and any other transitively shared objects are actually shared. The algorithm seems to create separate copies of the function object for every thread as needed. No state changes from within std::transform ever reach the function object that was passed in.

Here's the code I used to test it:

template <typename TVecElem>
ostream& operator<<(ostream& os, vector<TVecElem> vec)
{
	os << "{ ";
	for (const TVecElem& num : vec)
	{
		os << num << " ";
	}
	os << "}";
	
	return os;
}

template <typename TNum>
struct SqFuncObj
{
	static_assert(std::is_arithmetic_v<TNum>);
	
	size_t num_of_calls{ 0 };
	TNum operator()(const TNum num_to_sq)
	{
		++num_of_calls;
		return num_to_sq * num_to_sq;
	}
};

TEST_CASE("std::transform - test a func obj with mutable state")  //a test of my own
{
	using num_type = int;
	constexpr size_t number_count{ 1'000'000 };

	vector<num_type> numbers(number_count);
	iota(numbers.begin(), numbers.end(), 0);
	
	vector<num_type> numbers_squared(number_count);
	
	SqFuncObj<num_type> sq_func_obj;
	
	std::cout << "Performing " << number_count << " parallel calls via std::transform with sq_func_obj.\n";
	transform(
		execution::par, 
		numbers.begin(), numbers.end(), 
		numbers_squared.begin(), 
		sq_func_obj);
	std::cout << "Finished the parallel task.\n";
	
	std::cout << "sq_func_obj.num_of_calls: " << sq_func_obj.num_of_calls << "\n";
	
	vector<num_type> first_ten_sq(numbers_squared.begin(), numbers_squared.begin() + 10);
	std::cout << "first_ten_sq: " << first_ten_sq << "\n";
	
	//Test that the counter is indeed working:
	sq_func_obj(2);
	REQUIRE(sq_func_obj.num_of_calls == 1);
	sq_func_obj(3);
	REQUIRE(sq_func_obj.num_of_calls == 2);
}

This code produces the following output on my computer:

Performing 1000000 parallel calls via std::transform with sq_func_obj.
Finished the parallel task.
sq_func_obj.num_of_calls: 0
first_ten_sq: { 0 1 4 9 16 25 36 49 64 81 }

I'm running Windows 7 64 bit, Visual Studio 2019, C++17, in debug mode, in x64 (i.e. 64 bit) mode.

Am I still missing something? Are these results specific to my computer?

If not though, and these results indicate what they seem to, then it would appear that your statement of how it works is wrong, in which case indeed only shared mutable state can cause problems, not just any mutable state in the function object, in which case the book text should be updated to reflect that to avoid confusion.

Concurrency and parallelism can be very tricky and subtle though. I could be wrong. I had trouble finding a clear answer to this question on the internet.

One strange thing though that I observed was that when I defined a constructor for SqFuncObj (which I later deleted in the above code) that used std::cout to print a message every time a SqFuncObj was constructed then only a single such message ever appeared. This makes it seem like only one instance was constructed... and yet std::transform seems to be keeping the operator() totally independent regardless. Perhaps it is doing a memcopy that bypasses the constructor, or perhaps something else. What are your thoughts on this?

@JLospinoso

This comment has been minimized.

Copy link
Owner

@JLospinoso JLospinoso commented Jan 22, 2020

Hi @WraithGlade, thanks for the code example! You've definitely uncovered some nuance here with function object state.

The parallel algorithms require function objects to be copyable. (You can verify this by deleting the copy constructor for your SqFuncObj!) As you've illustrated, this means and local state that the function object keeps won't be visible to other threads.

I think it's relatively niche application to have purely local state on a function object that you've passed into a parallel algorithm. Maybe you'd want this if you were caching calculations within a thread, making network requests, etc.

It's seems more likely that a function object would want to modify some non-local state. Looking to your example, you've illustrated just this point -- the num_of_calls member gets totally discarded by the program. If the programmer wanted to count the number of calls, they would either refer to a variable with static storage duration or, if using a lambda, capture some variable by reference.

All this is to say that I would still call this a red flag, in the sense that the programmer should be on high alert that they don't introduce race conditions. But I like your proposed modification, since the text is currently misleading on the point of function object local state:

If the function object has shared mutable state, the executing threads will have shared access and you might have a race condition.

In a future revision, I think it would be nice to have a paragraph or summarizing our discussion here though!

What do you think?

@WraithGlade

This comment has been minimized.

Copy link
Author

@WraithGlade WraithGlade commented Jan 22, 2020

Another thing I tried was changing the operator() function to this:

	TNum operator()(const TNum num_to_sq)
	{
		++num_of_calls;
		cout << "(" << num_of_calls << ") ";
		return num_to_sq * num_to_sq;
	}

... Which does indeed result in some pretty weird stuff being printed to std::cout. I thought for a moment that perhaps this proved that it actually isn't safe after all. However, just a moment later, it occurred to me that this would of course result in weird behavior regardless of the state of num_of_calls, due to the fact that std::cout (and the associated console) counts as a shared mutable state.

Hmm. Still not sure if the num_of_calls variable is safe or not really. Concurrency is subtle stuff sometimes.

The upside of being able to have mutable state in the passed-in function object would be that you could use it to help with computations per each instance maybe. Perhaps it would be better to just keep any such mutable state inside the actual function-local scope of the function call, but it would be nice to know what the exact nuances of what you can get away with in these parallel algorithm calls is.

@JLospinoso

This comment has been minimized.

Copy link
Owner

@JLospinoso JLospinoso commented Jan 22, 2020

@WraithGlade, this totally fits within our discussion -- std::cout isn't thread safe, and it's got static storage duration. If you want to access it from within your function object, you'll need synchronization!

@WraithGlade

This comment has been minimized.

Copy link
Author

@WraithGlade WraithGlade commented Jan 22, 2020

What of the num_of_calls variable though?

If it was unsafe to have that mutable state attached to the function object then I would think the most likely outcome would be that the num_of_calls would end up with some random high value. Even if the assembly instructions end up interleaved in weird ways, I'd think it would be extremely likely to at least increase past 0.

Instead though, it always ends up as 0, which is more consistent with the theory that copies of the function object are being created than that it is shared mutable state.

Maybe I'll eventually be able to find an official authoritative source on exactly what the behavior here should be somewhere online, but I haven't been able to find one yet.

@JLospinoso

This comment has been minimized.

Copy link
Owner

@JLospinoso JLospinoso commented Jan 22, 2020

So sq_func_obj -- the object -- gets copied by std::transform and doesn't get touched while std::transform executes. So you've got N copies of sq_func_obj, each with their own num_of_calls value representing how many times that particular copy got invoked.

@WraithGlade

This comment has been minimized.

Copy link
Author

@WraithGlade WraithGlade commented Jan 22, 2020

Oh, and it also occurs to me that I should mention that I'm not actually trying to compute num_of_calls in my code (not really), but rather I am just using that to test the claim about shared mutable state being unstable or not in this context.

It is clear to me that regardless of whether the num_of_calls variable is thread-safe or not that it can't possibly return a reliable value of the number of calls that were made, unless we used locks or other concurrency mechanisms.

My code is just intended to try to figure out whether or not mutable states internal to the function object are thread safe or not or if instead we cannot safely use such constructs.

@WraithGlade

This comment has been minimized.

Copy link
Author

@WraithGlade WraithGlade commented Jan 22, 2020

Hey again Josh, I just saw your new edit you made to your second response.

Thank you very much for the clarifications. The comment about the copy constructor deletion etc made it finally click for me. For some reason I hadn't considered using the copy constructor to test whether the function object was actually being copied or not. I was trying to work from the default constructor instead, to test copying, which in hindsight doesn't make much sense.

Now I understand the nuances enough to feel more comfortable using these parallel algorithms from . And, upon more consideration, I do think you're right that this is a fairly niche use case in many contexts, but its nice to know that it's there. It also makes sense from a function object parameter value copy semantics standpoint that it would be designed this way.

I think adding a new paragraph as you've suggested in your comment edit sounds good. It will probably help some readers get a clearer understanding of it. Where there's one confused person, there's bound to be others.

Anyway: Thanks a bunch for the help. I think I understand it significantly better now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.