
detect and redirect ifelse usage in j to fifelse #3800

Draft
wants to merge 7 commits into master

Conversation

MichaelChirico
Member

Closes #2677

Simply copying the approach used for strptime to accomplish this.

I think the ifelse --> fifelse redirection is more generally useful, and I considered filing a patch to base, but it won't be a simple drop-in: ifelse does the "wrong" thing with attributes (one of the reasons we like fifelse in the first place), so it would need to be done more carefully to maintain backwards compatibility.

I don't think we have to worry about that for usage in j.
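
For reference, a minimal illustration of the attribute issue (not part of this PR's diff):

library(data.table)
test = c(TRUE, FALSE, TRUE)
yes  = as.Date('2019-08-29') + 0:2
no   = as.Date('1970-01-01') + 0:2

# base::ifelse() builds its result from `test`, so the Date class is dropped
base::ifelse(test, yes, no)
#> [1] 18137     1 18139

# fifelse() keeps the attributes of yes/no
fifelse(test, yes, no)
#> [1] "2019-08-29" "1970-01-02" "2019-08-31"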

@jangorecki
Copy link
Member

Why not compute both ifelse and fifelse and warn when the results don't match (asking users to report it to our issue tracker)? Then in a future version we can change it to compute only fifelse.
Making the change as it is now is pretty risky; maybe use an option, disabled by default for now.

@MichaelChirico
Member Author

Well, we do know of a lot of cases where they won't be identical -- e.g. for Date yes/no arguments -- and I think we want to assert that fifelse is superior in such cases.

Is there a way to compare the objects and only surface the ifelse result if the objects are unexpectedly different?
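
For illustration, a rough sketch of that kind of runtime check (the wrapper name and the all.equal comparison are placeholders, not how this PR is implemented):

library(data.table)

# Hypothetical wrapper: return the fifelse() result, but warn (and ask for a
# bug report) whenever it disagrees with base::ifelse() beyond attributes.
checked_ifelse = function(test, yes, no) {
  fast = fifelse(test, yes, no)
  slow = base::ifelse(test, yes, no)
  if (!isTRUE(all.equal(fast, slow, check.attributes = FALSE))) {
    warning("fifelse() and ifelse() results differ; please report to the data.table issue tracker")
  }
  fast
}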

@jangorecki
Member

OK, so if we want to have it in 1.12.4, then it's best to guard this with an option that's disabled by default.

@codecov

codecov bot commented Aug 29, 2019

Codecov Report

Merging #3800 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #3800      +/-   ##
==========================================
+ Coverage   99.41%   99.41%   +<.01%     
==========================================
  Files          71       71              
  Lines       13247    13252       +5     
==========================================
+ Hits        13170    13175       +5     
  Misses         77       77
Impacted Files Coverage Δ
R/data.table.R 100% <100%> (ø) ⬆️


@MichaelChirico
Member Author

I'm not so sure... the escape valve of base::ifelse is still there, and we never had such an option for strptime... OTOH you're right that this will almost certainly break a bunch of revdeps. It might be good to leave this in and see just how widespread ifelse-in-j usage is "in the wild"; that would be a nice thing to note and proselytize about...

@mattdowle mattdowle added this to the 1.12.4 milestone Aug 30, 2019
Member

@mattdowle mattdowle left a comment


Yes for 1.12.4. Let's make progress. After all, ifelse is quite a simple function really.
Warnings could be messages under verbose mode, because warnings can halt production runs under options(warn=2).
This could be a level 2 optimization; then users could turn it off by reducing the optimization level, and if so, that should be mentioned in the news item.
The words "strictly superior" could be softened/removed in the news item.
Comparing to base::ifelse at runtime would be safest, but that could slow things down. On balance, because ifelse() is so simple and low risk in the vast majority of cases, I don't think we need to do that. But let's postpone that decision until we have revdep results.
If ifelse() is optimized to fifelse() and that happens by group, does it introduce a big slowdown due to OpenMP startup overhead, like we've seen with uniqueN via forder by group? Needs testing, and if so, should multithreading be turned off below a threshold of length(test)?
If you don't have time, I can make the changes. Looking forward to getting this in!
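
For illustration, a minimal sketch of that escape hatch from the user's side, assuming the redirect ends up behind the optimization-level option (the exact level is still being discussed below):

library(data.table)
DT = data.table(x = 1:6, g = rep(1:2, each = 3L))

# With this PR, ifelse() in j would be optimized to fifelse()
DT[, ifelse(x > 3L, x, 0L), by = g]

# Lowering the optimization level would keep base::ifelse() untouched
options(datatable.optimize = 1L)
DT[, ifelse(x > 3L, x, 0L), by = g]

# Or bypass the redirect for a single call by namespace-qualifying it
DT[, base::ifelse(x > 3L, x, 0L), by = g]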

@MichaelChirico
Member Author

I'll essentially be MIA until mid-to-late next week. Good point re: potential issues with large-cardinality grouping × OpenMP start-up; we should benchmark.

@MichaelChirico
Member Author

OK found some time this morning.

The reason to use a warning is that even though we do have a bunch of tests and ifelse is conceptually simple, I worry about edge cases with factor/non-atomic/S4 outputs where the switch to fifelse might break in unanticipated ways. I think a warning popping up in those cases is best... the optimization level sounds good, but I would use level 3 -- shutting off GForce because of an ifelse conflict seems harsh.

Will make that change & then investigate the grouping bit.

@MichaelChirico
Member Author

MichaelChirico commented Aug 31, 2019

Have run this benchmark:

library(microbenchmark)
library(data.table)
set.seed(2384)
NN = 1e7
DT = data.table(u = runif(NN), yes = rnorm(NN), no = rnorm(NN))

runs = CJ(
  n_grps = 10^(0:6),
  bool_pct = seq(0, 1, length.out = 5L)
)
runs[ , run_id := .I]

performance = runs[ , {
  print(.BY$run_id)
  DT[ , c('grp', 'bool') := .(sample(n_grps, .N, TRUE), u > bool_pct)]
  summary(microbenchmark(
    times = 10L,
    base = DT[ , base::ifelse(bool, yes, no), by = grp],
    DT = DT[ , fifelse(bool, yes, no), by = grp]
  ), unit='s')
}, by = .(run_id, n_grps, bool_pct)]

performance[ , expr := as.character(expr)]

performance[ , {
  par(mar = c(10.1, 5.1, 4.1, 2.1))
  print(.SD[ , .(median, paste(n_grps, bool_pct, expr, sep = ' / '))])
  barplot(median, names.arg = paste(n_grps, bool_pct, expr, sep = ' / '),
          col = c('red', 'blue'),
         xlab = '', las = 2L, ylab = 'Median evaluation time (s)', log = 'y')
  mtext(side = 1L, line = 8, '# Groups / % False / Package')
  NULL
}]

I really don't know what's going on with base plotting (the labels appear to be all wrong), but red is base and blue is data.table:

[Screenshot: grouped barplot of median evaluation time (s, log scale) by # groups / % FALSE / package]

Note the log scale, so we can compare bars directly: the absolute difference in bar heights captures the speed ratio of data.table vs. base.

While there may or may not be a per-group cost for fifelse, it appears to be even worse for base::ifelse.

Output of fwrite(performance, 'performance.txt') attached
performance.txt

@MichaelChirico
Member Author

@mattdowle OK I think this is good to go

@jangorecki
Member

jangorecki commented Aug 31, 2019

A log scale is very unintuitive for comparing those timings.

Anyway, it doesn't look safe to merge.
The following case has only 10% duplicates in grp.

library(data.table) # default 20 threads
set.seed(108)
sample_dups = function(N, ratio) {
  unq = sample(N, max(N*(1-ratio), 1L))
  sample(c(unq, sample(unq, N*ratio)))
}
NN = 1e7
DT = data.table(test=sample(c(TRUE,FALSE), NN, TRUE), yes=rnorm(NN), no=rnorm(NN), grp=sample_dups(NN, 0.1))

system.time(DT[, base::ifelse(test, yes, no), by=grp])
#   user  system elapsed 
# 66.619   0.687  63.239
system.time(DT[, data.table::fifelse(test, yes, no), by=grp])
#    user   system  elapsed 
#3054.675    8.496  154.353 

@MichaelChirico are you able to reproduce those timings?

@MichaelChirico
Member Author

I don't see those timings:

system.time(DT[, base::ifelse(test, yes, no), by=grp])
#   user  system elapsed 
# 55.233   0.358  53.143 
system.time(DT[, data.table::fifelse(test, yes, no), by=grp])
#    user  system elapsed 
# 270.953   0.743  68.057 

and repeated:

microbenchmark::microbenchmark(times=10L, base=DT[, base::ifelse(test, yes, no), by=grp], DT=DT[, data.table::fifelse(test, yes, no), by=grp])
Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 base 52.63954 55.03329 56.32826 56.08346 56.43899 60.67210    10
   DT 68.92007 72.52211 75.28474 76.14368 77.77457 80.74988    10

It is slower, though I'm not quite sure what's wrong with my benchmark 🤔

@mattdowle
Member

mattdowle commented Sep 16, 2019

Have postponed to the next release. Looks good, but a bit too risky for this stage of the cycle. At least fifelse is exported and available, so folks can start playing with it. Then this PR will turn the optimization on by default in the next release, assuming no difficulties arise in the meantime.
Added WIP as I'd like to run the benchmarks and have a look in combination with #3743.

@mattdowle mattdowle added the WIP label Sep 16, 2019
@MichaelChirico
Member Author

Sounds good, as expected I think. The reason it kept the 1.12.4 milestone was the aim of running revdep checks with this change to get an idea of the potential impact.

But that can also easily be done after release.

@MichaelChirico
Member Author

MichaelChirico commented Sep 24, 2019

Would it be possible to automatically re-route certain parallel loops to serial ones under a certain length condition? A macro is the first thing that comes to mind, something like (?)

data.table/src/fifelse.c

Lines 75 to 78 in d99b8ae

#pragma omp parallel for num_threads(getDTthreads())
for (int64_t i=0; i<len0; ++i) {
  pans[i] = pl[i]==0 ? pb[i & bmask] : (pl[i]==1 ? pa[i & amask] : pna);
}

becomes

    AUTOLOOP(len0) {
      pans[i] = pl[i]==0 ? pb[i & bmask] : (pl[i]==1 ? pa[i & amask] : pna);
    }

where

#define PARALLEL_THRESHOLD 10000
#define AUTOLOOP(len0) \
  if(len0 < PARALLEL_THRESHOLD) \
    for (int i=0; i<len0; ++i) \
  else \
    #pragma omp parallel for num_threads(getDTthreads()) \
    for (int64_t i=0; i<len0; ++i) \

I don't think that's proper syntax, but is something like that possible? Maybe something from here: https://stackoverflow.com/q/12989298/3576984
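
For reference, a sketch of how a pragma can live inside a macro via the C99 _Pragma() operator (AUTOLOOP, PARALLEL_THRESHOLD, and the variadic body are hypothetical; getDTthreads() is assumed to be data.table's existing helper):

#include <stdint.h>

#define PARALLEL_THRESHOLD 10000

// Hypothetical macro: run the body serially for short inputs, in parallel
// otherwise. _Pragma() allows the omp directive to appear inside a macro.
#define AUTOLOOP(i, n, ...)                                        \
  do {                                                             \
    if ((n) < PARALLEL_THRESHOLD) {                                \
      for (int64_t i = 0; i < (n); ++i) { __VA_ARGS__ }            \
    } else {                                                       \
      _Pragma("omp parallel for num_threads(getDTthreads())")      \
      for (int64_t i = 0; i < (n); ++i) { __VA_ARGS__ }            \
    }                                                              \
  } while (0)

// so the fifelse loop above could become:
//   AUTOLOOP(i, len0,
//     pans[i] = pl[i]==0 ? pb[i & bmask] : (pl[i]==1 ? pa[i & amask] : pna);
//   );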

@jangorecki
Member

jangorecki commented Sep 24, 2019

AFAIU, looping in a macro will only be portable using the do ... while(0) construct.
It should be easier to just tweak getDTthreads to take n and return min(real_threads, n). If num_threads is supplied with 1, then there should be (?) no OpenMP overhead.
There is another way to escape OpenMP: an if clause in the pragma, as done in nafill

data.table/src/nafill.c

Lines 147 to 148 in d99b8ae

#pragma omp parallel for if (nx>1) num_threads(getDTthreads())
for (R_len_t i=0; i<nx; i++) {

This would be a much cleaner way, too.

Actually, if the above (num_threads(1)) disables OpenMP, then such an if (nx > 1) clause should be redundant.
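
For illustration, the same idea applied to the fifelse loop quoted earlier, with a hypothetical threshold constant (the right value would need benchmarking):

#define PARALLEL_THRESHOLD 10000  // hypothetical cut-off, to be tuned

// Short inputs skip the OpenMP thread team entirely via the if() clause
#pragma omp parallel for if (len0 >= PARALLEL_THRESHOLD) num_threads(getDTthreads())
for (int64_t i=0; i<len0; ++i) {
  pans[i] = pl[i]==0 ? pb[i & bmask] : (pl[i]==1 ? pa[i & amask] : pna);
}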

@MichaelChirico
Member Author

MichaelChirico commented Sep 24, 2019

Yes, that looks great. Then we can focus on how to set the threshold -- do we want to be more sophisticated than picking a fixed threshold? It wouldn't be too hard to write e.g. a Bayesian optimization script to pick the threshold for a given loop, targeting performance improvements at various input sizes (i.e. the insanely sophisticated approach).

@jangorecki
Member

jangorecki commented Sep 24, 2019

It sounds like the best option would be a new function useDTthreads with the signature

int useDTthreads(int DTthreads, int64_t Niter, double Pratio)

that could be used like this:

double p = 0.55;   // some fixed ratio for an algorithm
#pragma omp parallel for num_threads(useDTthreads(getDTthreads(), nx, p))
for (R_len_t i=0; i<nx; i++) { 

where Niter is the number of iterations to perform, and Pratio is a ratio used to make the adjustment more dynamic according to the algorithm.

Yet to confirm whether num_threads(1) disables OpenMP for most compilers, ideally by looking at the assembly of the compiled code.


Or it's probably cleaner to keep that logic in the pragma's if clause: if (helperFun(nx)).
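
For illustration, a sketch of what such a helper might look like (the name, signature, and heuristic come from the comment above and are hypothetical, not data.table's actual API):

#include <stdint.h>

#define ITERS_PER_THREAD 10000  // assumed break-even point, to be benchmarked

// Hypothetical: cap the requested thread count by the amount of work, so that
// small per-group loops stay serial and avoid OpenMP startup overhead.
static int useDTthreads(int DTthreads, int64_t Niter, double Pratio) {
  // Scale the iteration count by the algorithm-specific ratio, then grant one
  // thread per ITERS_PER_THREAD scaled iterations, clamped to [1, DTthreads].
  int64_t want = (int64_t)(Niter * Pratio) / ITERS_PER_THREAD + 1;
  if (want < 1) want = 1;
  if (want > DTthreads) want = DTthreads;
  return (int)want;
}

// usage, as sketched above:
//   #pragma omp parallel for num_threads(useDTthreads(getDTthreads(), nx, 0.55))
//   for (R_len_t i=0; i<nx; i++) { ... }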

@mattdowle mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019
@MichaelChirico
Member Author

As mentioned by @ColeMiller1, we probably won't want to do this unless we fix #4549.

Successfully merging this pull request may close these issues.

warn or optimize usage of ifelse()