Errors when trying to fit models with less than three topic numbers #44

Closed
wkopp opened this issue Feb 23, 2020 · 5 comments

@wkopp
Contributor

wkopp commented Feb 23, 2020

Hi,

selectModel fails when the models were fitted with runCGSModels using a single number as the topic argument (e.g. topic=c(30)).

The error message that I get is

Error in `$<-.data.frame`(`*tmp*`, "second_derivative", value = c(-Inf,  : 
  replacement has 2 rows, data has 1
Calls: selectModel -> $<- -> $<-.data.frame

I also get an error when running runCGSModels with only two topic numbers (e.g. topic=c(29,30)), but then the error is different:

Error in plot.window(...) : need finite 'xlim' values
Calls: selectModel ... plot -> plot -> plot.default -> localWindow -> plot.window
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Execution halted

When I run the model with more than two topic numbers (e.g. topic=c(29,30,31)), it seems to work.

Best,
Wolfgang

@cbravo93
Member

Hi @wkopp !

The second derivative method is not meant to work with fewer than three points. The approach we use is based on the central difference (inspired by https://stackoverflow.com/questions/4471993/compute-the-elbow-for-a-curve-automatically-and-mathematically). The first derivative measures the slope of the line between two points on the likelihood curve (the change between two points). The second derivative measures the difference between two consecutive slopes (the change of the change, which identifies the point of maximum curvature). You therefore need at least two slopes, that is, three points. I have added an error message for this.
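A minimal sketch of the idea described above, assuming evenly spaced topic numbers and made-up log-likelihood values. The helper `second_derivative` is hypothetical and is not cisTopic's implementation. It shows why three points are the minimum, and why two points lead to an empty result that later breaks plotting.

```r
# Hypothetical helper for illustration only; not part of cisTopic.
second_derivative <- function(loglik) {
  if (length(loglik) < 3) {
    stop("need at least three models to compute a second derivative")
  }
  # First differences: slope between each pair of consecutive points.
  slopes <- diff(loglik)
  # Second differences: change between consecutive slopes,
  # one value per interior point of the curve.
  diff(slopes)
}

topics <- c(10, 20, 30, 40, 50)
loglik <- c(-1000, -700, -600, -580, -575)  # made-up values

d2 <- second_derivative(loglik)

# For a rising curve that flattens out, the sharpest bend is where the
# second difference is most negative. Only interior points are candidates.
interior <- topics[2:(length(topics) - 1)]
best_topic <- interior[which.min(d2)]

# With two points there is one slope and no second difference at all:
empty <- diff(diff(c(-700, -600)))  # numeric(0)
```

Taking `min()` or `max()` of an empty vector such as `empty` returns `Inf` or `-Inf` with a warning, which is consistent with the "need finite 'xlim' values" error reported for two topic numbers.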

Thanks for reporting!

C

@wkopp
Contributor Author

wkopp commented Feb 24, 2020

Hi,

Would it then be possible to skip the first and second derivative computation when there are too few topic numbers? If the user already knows which number to choose, they currently still have to fit models for additional dummy topic numbers, which costs time and resources.

Thank you.

@cbravo93
Member

Sure! Just use method='maximum', select='Your number of topics'. Nevertheless, for a proper topic selection I would recommend running models over a larger range of topic numbers.
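The selection logic suggested here can be sketched roughly as follows. This is a hypothetical helper (`choose_topic` and its arguments are assumptions for illustration, not the selectModel API): an explicit choice is returned as is, and otherwise the model with the highest log-likelihood is picked. Neither path needs a derivative, so one or two models are enough.

```r
# Hypothetical helper for illustration only; not part of cisTopic.
choose_topic <- function(topics, loglik, select = NULL) {
  stopifnot(length(topics) == length(loglik))
  if (!is.null(select)) {
    # Manual choice: only check that such a model was actually fitted.
    if (!select %in% topics) {
      stop("no model was fitted with ", select, " topics")
    }
    return(select)
  }
  # Automatic choice by maximum log-likelihood.
  topics[which.max(loglik)]
}

single <- choose_topic(30, -580, select = 30)         # one model, manual choice
pair   <- choose_topic(c(29, 30), c(-590, -580))      # two models, maximum
```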

Cheers,

C

@wkopp
Contributor Author

wkopp commented Feb 24, 2020

I see. However, what is the purpose of having to specify method="maximum" if a specific topic number is selected anyway? In my opinion, it would be better not to require this argument: it is not intuitive for the user, and it is not backward compatible, which would be nice and is probably possible in this case.

Another issue with the second derivative computation seems to be that it is used by default with runCGSModels, right? However, from the documentation of the type parameter in the selectModel method, you recommend against using the derivative method with collapsed Gibbs sampling.
So perhaps runCGSModels and runWarpLDAModels should each call selectModel with their respective recommended model selection criterion.

Best,
Wolfgang

@cbravo93
Member

Hi!

Sometimes the differences between models are small (in likelihood or in the second derivative), so we recommend selecting the less complex model. If we only allowed automatic selection, it would not be possible to switch manually to another suitable model.

selectModel is generally run after fitting the models (regardless of whether they are CGS or WarpLDA); we use the derivative as the default to match the latest WarpLDA estimation. I will add a warning if the models are CGS based.

Cheers!

C

@cbravo93 cbravo93 closed this as completed May 4, 2022