Added trimmed mean and symmetric trimmed mean implementations and tests #22

Merged: filipecosta90 merged 7 commits into master from trimmed.mean on May 18, 2022

@filipecosta90 (Collaborator) commented May 17, 2022

Fixes #16.
This PR adds the following 2 new APIs:

  • td_trimmed_mean: Returns the trimmed mean, ignoring values outside the given lower and upper cutoff limits
  • td_trimmed_mean_symmetric: Returns the trimmed mean, ignoring values outside a symmetric pair of cutoff limits
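
For readers unfamiliar with trimmed means over a digest, here is a minimal sketch of the idea on plain arrays of centroid means and weights. It is not the code in src/tdigest.c: the function names, the array-based signature, and the reading of the cutoff limits as quantile fractions in [0, 1] are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the library API): weighted trimmed mean over
 * centroids sorted by mean. Assumes lo and hi are quantile fractions with
 * 0 <= lo < hi <= 1, and that every weight is positive. */
double sketch_trimmed_mean(const double *means, const double *weights,
                           size_t n, double lo, double hi) {
    assert(n > 0 && lo >= 0.0 && hi <= 1.0 && lo < hi);

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += weights[i];
    }
    assert(total > 0.0);

    /* Ranks (in units of weight) that bound the kept region. */
    const double lo_rank = lo * total;
    const double hi_rank = hi * total;

    double seen = 0.0; /* weight consumed before the current centroid */
    double sum = 0.0;  /* weighted sum of the kept mass */
    double kept = 0.0; /* total kept weight */
    for (size_t i = 0; i < n && seen < hi_rank; i++) {
        const double start = seen;
        const double end = seen + weights[i];
        /* Overlap of this centroid's rank interval with [lo_rank, hi_rank]. */
        const double from = start > lo_rank ? start : lo_rank;
        const double to = end < hi_rank ? end : hi_rank;
        if (to > from) {
            sum += means[i] * (to - from);
            kept += to - from;
        }
        seen = end;
    }
    return sum / kept;
}

/* Hypothetical symmetric variant: cut the same fraction from both tails. */
double sketch_trimmed_mean_symmetric(const double *means, const double *weights,
                                     size_t n, double cut) {
    return sketch_trimmed_mean(means, weights, n, cut, 1.0 - cut);
}
```

With five unit-weight centroids at 1, 2, 3, 4 and 100, a symmetric cut of 0.2 drops the 1 and the 100 and averages the remaining three values, giving 3. A centroid that straddles a cutoff contributes only the part of its weight that lies inside the kept range.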

To test the results, I compared them against scipy.stats.trim_mean.
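
For anyone who wants an exact baseline in C rather than Python, the following is a rough sketch of a reference symmetric trimmed mean on raw values. It assumes the scipy-style convention of dropping the truncated count int(proportion * n) from each end of the sorted data; the name exact_trim_mean is hypothetical and this helper is not part of the PR's test suite.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison of doubles for qsort. */
static int cmp_double(const void *a, const void *b) {
    const double x = *(const double *)a;
    const double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical reference helper: exact symmetric trimmed mean of raw values.
 * Assumes 0 <= proportion_to_cut < 0.5 and drops the truncated count
 * (size_t)(proportion_to_cut * n) from each end of the sorted data.
 * Returns NAN if the scratch copy cannot be allocated. */
double exact_trim_mean(const double *values, size_t n, double proportion_to_cut) {
    assert(n > 0 && proportion_to_cut >= 0.0 && proportion_to_cut < 0.5);

    /* Sort a copy so the caller's array is left untouched. */
    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) {
        return NAN;
    }
    memcpy(sorted, values, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    const size_t cut = (size_t)(proportion_to_cut * (double)n);
    double sum = 0.0;
    for (size_t i = cut; i < n - cut; i++) {
        sum += sorted[i];
    }
    free(sorted);
    return sum / (double)(n - 2 * cut);
}
```

Because a digest is an approximation, comparisons against a baseline like this are better done with a tolerance than with exact equality.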

To gauge the performance of the trimmed mean functions, I've added a set of benchmarks. Here are the full benchmark results:

-------------------------------------------------------------------------------------------------------------------
Benchmark                                                         Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------
BM_td_add_uniform_dist/100/10000000                       546590279 ns    546571841 ns           26 Centroid_Count=70 Total_Compressions=481.984k items_per_second=703.687k/s
BM_td_add_uniform_dist/200/10000000                       582119111 ns    582098920 ns           24 Centroid_Count=116 Total_Compressions=219.631k items_per_second=715.8k/s
BM_td_add_uniform_dist/300/10000000                       605849089 ns    605828072 ns           23 Centroid_Count=160 Total_Compressions=139.58k items_per_second=717.667k/s
BM_td_add_uniform_dist/400/10000000                       621972610 ns    621953201 ns           22 Centroid_Count=199 Total_Compressions=99.732k items_per_second=730.835k/s
BM_td_add_uniform_dist/500/10000000                       634853925 ns    634832442 ns           22 Centroid_Count=241 Total_Compressions=79.604k items_per_second=716.009k/s
BM_td_add_lognormal_dist/100/10000000                     546540266 ns    546521255 ns           26 Centroid_Count=68 Total_Compressions=481.506k items_per_second=703.752k/s
BM_td_add_lognormal_dist/200/10000000                     582261780 ns    582242339 ns           24 Centroid_Count=114 Total_Compressions=219.596k items_per_second=715.624k/s
BM_td_add_lognormal_dist/300/10000000                     604777683 ns    604757111 ns           23 Centroid_Count=157 Total_Compressions=139.422k items_per_second=718.938k/s
BM_td_add_lognormal_dist/400/10000000                     622590766 ns    622569543 ns           22 Centroid_Count=200 Total_Compressions=99.753k items_per_second=730.112k/s
BM_td_add_lognormal_dist/500/10000000                     634788521 ns    634767624 ns           22 Centroid_Count=245 Total_Compressions=79.66k items_per_second=716.082k/s
BM_td_quantile_lognormal_dist/100/10000000                586780631 ns    586761401 ns           25 items_per_second=681.708k/s
BM_td_quantile_lognormal_dist/200/10000000                811997490 ns    811971767 ns           17 items_per_second=724.453k/s
BM_td_quantile_lognormal_dist/300/10000000               1016801894 ns   1016770247 ns           14 items_per_second=702.505k/s
BM_td_quantile_lognormal_dist/400/10000000               1286479714 ns   1286436697 ns           11 items_per_second=706.674k/s
BM_td_quantile_lognormal_dist/500/10000000               1483928439 ns   1483880300 ns            9 items_per_second=748.788k/s
BM_td_merge_lognormal_dist/100/10000000                   141674160 ns    141670125 ns           88 items_per_second=8.02119k/s
BM_td_merge_lognormal_dist/200/10000000                   286590413 ns    286582373 ns           50 items_per_second=6.9788k/s
BM_td_merge_lognormal_dist/300/10000000                   392189307 ns    392178201 ns           32 items_per_second=7.96832k/s
BM_td_merge_lognormal_dist/400/10000000                   532181754 ns    532167090 ns           25 items_per_second=7.51644k/s
BM_td_merge_lognormal_dist/500/10000000                   675686837 ns    675667441 ns           19 items_per_second=7.78957k/s
BM_td_trimmed_mean_symmetric_lognormal_dist/100/10000000  868254617 ns    868223418 ns           17 items_per_second=677.516k/s
BM_td_trimmed_mean_symmetric_lognormal_dist/200/10000000 1305046601 ns   1305005590 ns           10 items_per_second=766.28k/s
BM_td_trimmed_mean_symmetric_lognormal_dist/300/10000000 1763557703 ns   1763499276 ns            8 items_per_second=708.818k/s
BM_td_trimmed_mean_symmetric_lognormal_dist/400/10000000 2188760113 ns   2188686563 ns            6 items_per_second=761.492k/s
BM_td_trimmed_mean_symmetric_lognormal_dist/500/10000000 2692495267 ns   2692408313 ns            5 items_per_second=742.829k/s

@filipecosta90 filipecosta90 added the enhancement New feature or request label May 17, 2022

codecov bot commented May 17, 2022

Codecov Report

Merging #22 (4165665) into master (f6ee51b) will increase coverage by 1.73%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
+ Coverage   84.98%   86.71%   +1.73%     
==========================================
  Files           1        1              
  Lines         253      286      +33     
==========================================
+ Hits          215      248      +33     
  Misses         38       38              
Impacted Files Coverage Δ
src/tdigest.c 86.71% <100.00%> (+1.73%) ⬆️

@ashtul ashtul left a comment

Looking great.
Quick minor comments.

src/tdigest.c (resolved)
src/tdigest.c (resolved)
src/tdigest.c (resolved)
tests/td_test.c (resolved)
@filipecosta90 filipecosta90 requested a review from ashtul May 18, 2022 11:17

@ashtul ashtul left a comment

LGTM

@filipecosta90 filipecosta90 merged commit 5dccd1a into master May 18, 2022
@filipecosta90 filipecosta90 deleted the trimmed.mean branch May 18, 2022 11:38

Successfully merging this pull request may close these issues.

Add API for trimmed mean calculations
2 participants