Added (local) kendall tau #18
CGMossa wants to merge 6 commits into MoseleyBioinformaticsLab:master
|
A comment on the Rcpp version: I think the sum variables should default to integer types. Summing integers should be faster than summing floats, but that might just be me. |
|
So I tried to make this change to the Rcpp code:
    unsigned int sum_concordant = 0;
    unsigned int sum_discordant = 0;
    unsigned int sum_x_ties = 0;
    unsigned int sum_y_ties = 0;
    unsigned int sum_tied_x = 0;
    unsigned int sum_tied_y = 0;
    unsigned int sum_tied_x_na = 0;
    unsigned int sum_tied_y_na = 0;
    unsigned int sum_all_na = 0;
And I think that is what broke the Rcpp implementation. Reverting it yields results comparable to what my Rust version outputs. |
|
You cannot simply change these variables to unsigned integers without
converting them back to floating point in the final calculations.
--
Hunter Moseley, Ph.D. -- Univ. of Kentucky
Associate Professor, Dept. of Molec. & Cell. Biochemistry / Markey Cancer
Center
/ Institute for Biomedical Informatics / UK Superfund Research Center
Not just a scientist, but a fencer as well.
My foil is sharp, but my mind sharper still.
---------------------------------------------------------------
Email: hunter.moseley@uky.edu (work) hunter.moseley@gmail.com
(personal)
Phone: 859-218-2964 (office) 859-218-2965 (lab) 859-257-7715 (fax)
Web: http://bioinformatics.cesb.uky.edu/
Address: CC434 Roach Building, 800 Rose Street, Lexington, KY 40536-0093
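To illustrate the point about converting back to floating point, here is a minimal sketch. It is not the package's actual code: the function names `tau_unsigned_naive` and `tau_converted` are hypothetical, and the ratio is a simplified stand-in for the real formula. With `unsigned int` counters, the subtraction wraps around whenever discordant pairs outnumber concordant ones, and the division is integer division.

```cpp
#include <cassert>

// Hypothetical, simplified ratio: (concordant - discordant) / (concordant + discordant).
// This is NOT the formula from the package; it only demonstrates the type issue.

// Naive version: all arithmetic stays in unsigned int.
// The subtraction wraps around when discordant > concordant, and the
// division truncates, so the result is meaningless as a correlation.
double tau_unsigned_naive(unsigned int concordant, unsigned int discordant) {
    assert(concordant + discordant > 0);
    return (concordant - discordant) / (concordant + discordant);
}

// Converted version: the counters may stay unsigned while accumulating,
// but are converted to double before the final calculation.
double tau_converted(unsigned int concordant, unsigned int discordant) {
    assert(concordant + discordant > 0);
    const double c = static_cast<double>(concordant);
    const double d = static_cast<double>(discordant);
    return (c - d) / (c + d);
}
```

With 3 concordant and 7 discordant pairs, `tau_converted` gives -0.4 while `tau_unsigned_naive` returns a huge positive number. With 7 and 3, the naive version returns 0 because `4 / 10` truncates.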
|
|
Thanks. I was just trying it out. I'm a bit spoiled by Rust, which
warns wherever there are ambiguities like this.
The next step would be to implement the rest of the cases and try to
parallelize some of it.
|
|
Yes, my initial implementation used unsigned integers as well, but there is a division by 2 in there that doesn't always result in an integer (whether it should is another discussion; currently it does not). That result gets added to everything else, and then there is a division at the end, so it was easier to just use floats. |
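A small sketch of why the division by 2 pushes toward floats. The names (`half_int`, `half_double`, `total_via_doubled_counts`) and the counts are made up for illustration and are not the package's variables. Integer division drops the remainder, so an odd count loses 0.5. One possible alternative, shown here only as an assumption about how it could be done, is to keep every term doubled in integers and divide once at the end.

```cpp
#include <cassert>

// Integer division truncates: 5 / 2 == 2, so half a count is lost.
unsigned int half_int(unsigned int ties) {
    return ties / 2;
}

// Converting first keeps the fractional part: 5 / 2.0 == 2.5.
double half_double(unsigned int ties) {
    return static_cast<double>(ties) / 2.0;
}

// Possible alternative (an assumption, not the package's approach):
// accumulate 2 * total in integers, i.e. 2 * pairs + ties, and halve once at the end.
double total_via_doubled_counts(unsigned int pairs, unsigned int ties) {
    const unsigned long long doubled = 2ULL * pairs + ties;
    return static_cast<double>(doubled) / 2.0;
}

// Small self-check of the three helpers above.
void demo_halving() {
    assert(half_int(5) == 2);
    assert(half_double(5) == 2.5);
    assert(total_via_doubled_counts(10, 5) == 12.5);
}
```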
|
Regardless: I don't know if my intuition is correct anyway.
Parallel version
Outputs from parallel version:
The benchmarks from the parallel version:
ici-kendall-tau 1000    time:   [2.2719 ms 2.2907 ms 2.3115 ms]
                        change: [-75.978% -75.667% -75.401%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
Interesting, huh? |
Just remove the .par_into_iter() call and the code is no longer parallelized.
Just to get to the most important part:
A benchmark of the current Rust implementation yields these results:
There's something not right, and I cannot seem to figure it out:
This is the output from repeated simulation, with both x and y drawn from a standard normal distribution.
If I do the same with the R/Rcpp version, I get:
i.e.
replicate(10, ici_kendallt(rnorm(1000), rnorm(1000)))
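As a rough cross-check for this kind of comparison, here is a minimal C++ sketch of the same experiment using a plain O(n^2) Kendall tau-a. It is not the ICI variant implemented in this PR and ignores ties and missing values; `kendall_tau_a` and `simulate_tau` are hypothetical names. For independent standard-normal x and y, the values should scatter closely around 0.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Plain Kendall tau-a: (concordant - discordant) / (n * (n - 1) / 2).
// Ties count as neither concordant nor discordant; no NA handling.
double kendall_tau_a(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const std::size_t n = x.size();
    long long concordant = 0;  // signed, so the difference below cannot wrap
    long long discordant = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double s = (x[i] - x[j]) * (y[i] - y[j]);
            if (s > 0) ++concordant;
            else if (s < 0) ++discordant;
        }
    }
    const double n_pairs = static_cast<double>(n) * static_cast<double>(n - 1) / 2.0;
    return static_cast<double>(concordant - discordant) / n_pairs;
}

// Rough analogue of replicate(reps, tau(rnorm(n), rnorm(n))) with a fixed seed.
std::vector<double> simulate_tau(int reps, std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> norm(0.0, 1.0);
    std::vector<double> taus;
    for (int r = 0; r < reps; ++r) {
        std::vector<double> x(n), y(n);
        for (std::size_t i = 0; i < n; ++i) x[i] = norm(gen);
        for (std::size_t i = 0; i < n; ++i) y[i] = norm(gen);
        taus.push_back(kendall_tau_a(x, y));
    }
    return taus;
}
```

Calling `simulate_tau(10, 1000, 42)` returns ten values. Under independence the standard deviation of tau for n = 1000 is roughly 0.02, so values far from 0 in either implementation would point to a bug rather than sampling noise.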