
[SYSTEMDS-2972] Initial Multi-Threaded transformencode #1261

Closed
ilovemesomeramen wants to merge 11 commits into apache:master from ilovemesomeramen:master

Conversation

@ilovemesomeramen
Contributor

This PR adds basic multithreading capability to the transformencode implementation.
Each ColumnEncoder can be executed on a separate thread, or can be split up into even smaller subjobs that each apply only to a certain row range.
Initial benchmarks with 16 CPUs show up to a 50x speed improvement compared to the old SystemML implementation.
Currently this code is dormant, meaning a call to transformencode in a DML script still uses the single-threaded implementation. This will change once further improvements and testing are complete.
Large matrices (e.g. 1000000x1000) are still not viable due to suspected thread starvation. This will be addressed in a future PR with some form of access partitioning (radix/range).

This PR also brings back sparse support for large dummycoded matrices, which was accidentally removed in a prior PR.

Added sparse support
# Conflicts:
#	src/main/java/org/apache/sysds/runtime/transform/encode/ColumnEncoderDummycode.java
#	src/test/resources/datasets/homes3/homes.tfspec_dummy_all.json
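The row-range splitting mentioned in the description can be sketched as follows. This is a minimal, hypothetical illustration (the names are not the SystemDS API): an encoder's work over `numRows` rows is broken into sub-jobs of at most `blockSize` rows each, which can then be submitted to a thread pool as independent tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SystemDS API: split numRows rows into
// [start, end) ranges of at most blockSize rows each.
public class RowRangePartition {
	public static List<int[]> splitRows(int numRows, int blockSize) {
		List<int[]> ranges = new ArrayList<>();
		for (int start = 0; start < numRows; start += blockSize)
			ranges.add(new int[] {start, Math.min(start + blockSize, numRows)});
		return ranges;
	}
}
```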
Contributor

@phaniarnab left a comment


Thanks for the patch @ilovemesomeramen. LGTM.
I have no comments that must be addressed before merging. However, I have a few comments and suggestions for future discussion and commits.
I fixed a few formatting issues before merging.

Comment on lines +656 to +661
synchronized (sparseBlock.get(r)){
sparseBlock.set(r,c,v);
}
}else{
denseBlock.set(r,c,v);
}
Contributor


Is the denseBlock/sparseBlock guaranteed to be allocated here?
Why synchronize only the sparse case?

Contributor Author

@ilovemesomeramen May 12, 2021


Yes, it must be allocated before this point.
Since dense blocks are just a collection of arrays, and at the moment of writing nothing should be reading the block, we can write concurrently without any synchronization. The sparse blocks, on the other hand, are only row-independent, so we need to synchronize over rows.
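A toy illustration of this locking scheme (hypothetical classes, not the SystemDS SparseBlock): dense cells live in disjoint array slots and need no lock, while a sparse row is one shared growable structure, so writers synchronize on the row object they touch, leaving other rows writable in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of row-level locking in a sparse block: each row is a
// growable list of (col, val) pairs shared by all writers to that row.
public class RowLockedSparse {
	private final List<List<double[]>> rows = new ArrayList<>();

	public RowLockedSparse(int numRows) {
		for (int i = 0; i < numRows; i++)
			rows.add(new ArrayList<>());
	}

	public void set(int r, int c, double v) {
		List<double[]> row = rows.get(r);
		// Row-level lock: appends within one row must not interleave,
		// but different rows can still be written concurrently.
		synchronized (row) {
			row.add(new double[] {c, v});
		}
	}

	public int rowSize(int r) {
		synchronized (rows.get(r)) {
			return rows.get(r).size();
		}
	}
}
```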

Comment on lines +62 to +64
public void setApplyBlockSize(int blk) {
APPLY_BLOCKSIZE = blk;
}
Contributor


Can you please add a test with non-zero APPLY_BLOCKSIZE?

Contributor Author


Yes, sure. I have it in my local test cases and forgot to add it.
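The requested test could take roughly this shape (a self-contained sketch, not the actual SystemDS test harness): applying with a non-zero block size must yield exactly the same result as applying all rows at once, where a block size of 0 or less means "one block", mirroring the `BUILD_BLOCKSIZE <= 0` convention quoted later in this review. The increment here is a stand-in for a real column encoder's apply().

```java
// Hypothetical sketch: blocked apply must match the unblocked result.
public class BlockSizeCheck {
	public static int[] applyBlocked(int[] in, int blockSize) {
		// blockSize <= 0 means "process all rows as one block"
		int blk = blockSize <= 0 ? in.length : blockSize;
		int[] out = new int[in.length];
		for (int start = 0; start < in.length; start += blk)
			for (int i = start; i < Math.min(start + blk, in.length); i++)
				out[i] = in[i] + 1; // stand-in for an encoder's apply()
		return out;
	}
}
```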

Comment on lines +121 to +134
for(ColumnEncoderComposite encoder : _columnEncoders) {
List<Callable<Object>> partialBuildTasks = encoder.getPartialBuildTasks(in, blockSize);
if(partialBuildTasks == null) {
partials.add(null);
continue;
}
partials.add(pool.invokeAll(partialBuildTasks));
}
for(int e = 0; e < _columnEncoders.size(); e++) {
List<Future<Object>> partial = partials.get(e);
if(partial == null)
continue;
tasks.add(new ColumnMergeBuildPartialTask(_columnEncoders.get(e), partial));
}
Contributor


Discussion: This logic of creating tasks (column-wise row partition) restricts us from more sophisticated task creation with an arbitrary number of columns. This may not be a problem though.

Contributor Author


Since this PR I did a lot more testing, and this partial building is rather complicated, especially since a ton of intermediates are created, increasing GC pressure. At the moment, partial building is not really viable in most scenarios. This will be good to discuss on Friday.
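The partial-build pattern under discussion can be illustrated with a simplified, hypothetical sketch: each row block builds a partial result (here, the distinct values it saw), and a merge step folds the partials into the final recode map. The per-block sets are exactly the kind of short-lived intermediate that drives up GC pressure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical sketch of partial build + merge for a recode map.
public class PartialRecodeBuild {
	public static Map<String, Integer> buildMerged(List<String[]> rowBlocks) {
		// Build phase: one partial (set of distinct values) per row block;
		// in the real code each of these would be a separate pool task.
		List<Set<String>> partials = new ArrayList<>();
		for (String[] blk : rowBlocks)
			partials.add(new HashSet<>(Arrays.asList(blk)));
		// Merge phase: fold the partials into one code assignment.
		Map<String, Integer> codes = new HashMap<>();
		for (Set<String> p : partials)
			for (String s : p)
				codes.putIfAbsent(s, codes.size() + 1);
		return codes;
	}
}
```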

Comment on lines +114 to +115
int blockSize = BUILD_BLOCKSIZE <= 0 ? in.getNumRows() : BUILD_BLOCKSIZE;
List<Callable<Integer>> tasks = new ArrayList<>();
Contributor


Discussion: What if we need column-specific block sizes in the future?

Contributor Author


That's a good point.
This should not be a problem: since the encoders are independent, we can just call each encoder with the block size we need.
So in the future we might get an array of block sizes, which we then just need to match to the encoders.
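A sketch of those per-column block sizes (hypothetical names, assuming the same "`<= 0` means all rows in one block" convention as above): instead of one global block size, encoder `i` is driven by `blockSizes[i]` when its row-range tasks are planned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plan {encoder, startRow, endRow} tasks with a
// separate block size per encoder instead of one global value.
public class PerColumnBlockSizes {
	public static List<int[]> planTasks(int numRows, int[] blockSizes) {
		List<int[]> tasks = new ArrayList<>();
		for (int e = 0; e < blockSizes.length; e++) {
			int blk = blockSizes[e] <= 0 ? numRows : blockSizes[e];
			for (int start = 0; start < numRows; start += blk)
				tasks.add(new int[] {e, start, Math.min(start + blk, numRows)});
		}
		return tasks;
	}
}
```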

Comment on lines +655 to +656
_encoder.mergeBuildPartial(_partials, 0, _partials.size());
return 1;
Contributor


What is the significance of this hard-coded 1?

Contributor Author


I missed that. It should be null, and the callable should be a plain Callable.
