Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table size is not a good predictor of algo performance in vegTypeMapGenerator #39

Open
CeresBarros opened this issue May 24, 2020 · 4 comments

Comments

@CeresBarros
Copy link
Member

CeresBarros commented May 24, 2020

I noticed that for a table with 1340184 rows, turning assertions off in vegTypeMapGenerator, makes the function use algo == 2 :

Browse[2]>   nrowCohortData
[1] 1340184

Browse[2]>   ## use new vs old algorithm based on size of x. new one (2) is faster in most cases.
Browse[2]>   ## enable assertions to view timings for each algorithm before deciding which to use.
Browse[2]>   algo <- ifelse(nrowCohortData > 3.5e6, 1, 2)
Browse[2]> algo
[1] 2

However, I ran the function a few times with assertions on to compare timings of the two algos and no. 2 (the 'new algo') was at least 5xs slower than no. 1 (the 'old algo'). So table size doesn't seem to be a good predictor...

LandR::vegTypeMapGenerator: new algo 10.1584620475769
LandR::vegTypeMapGenerator: old algo 2.12155604362488

LandR::vegTypeMapGenerator: new algo 5.77787399291992
LandR::vegTypeMapGenerator: old algo 0.236522197723389

This is a minor issue, but I wanted to raise it here because I feel we're not ready to decide on which algorithm to use and if we keep the mechanism of "let the function decide" we'll need a better predictor.

@eliotmcintire
Copy link
Contributor

Maybe just always use algo 1, which is the simpler data table one. It is possible that the data.table one got faster.

@CeresBarros
Copy link
Member Author

CeresBarros commented May 26, 2020

At this point, I believe the user can't control which algo to use, but that could be added easily. Is this what you meant? or actually remove algo == 2 altogether?

@achubaty
Copy link
Contributor

achubaty commented Oct 24, 2022

re-upping. new algo is faster for smaller tables but mid-run for LandWeb I see the old algo is faster

@eliotmcintire
Copy link
Contributor

eliotmcintire commented Oct 25, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants