The Methods for Determining the Type of Data Need to Be Revisited #888
@jdramsey Joe, what are your thoughts?
Just want to mention that in algo chooser we are also using
@kvb2univpitt I think if you set the max discrete value too low and it reads it in as continuous, it's your fault. :) The dataset you read in is in fact continuous.

Or are you suggesting that if a variable has only two values, it should be interpreted as discrete even if the values are all real values? That's a possibility.

Sorry guys for not checking all of these issues earlier! *I* think this issue can be closed. :)
Two-valued variables with real values should NOT be treated as discrete. If they were, then binary variables could never be used in a linear regression.
(In reply to Joseph Ramsey, Tue, Apr 9, 2019, 1:19 PM)
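A minimal sketch of the distinction drawn in this exchange, assuming a hypothetical reader rule rather than Tetrad's actual implementation: a column is treated as discrete only when every value is an integer literal and the number of distinct values does not exceed the max-discrete-category limit. Under that assumed rule a column holding only two real values stays continuous, as suggested above, while lowering the limit makes an integer-coded column read in as continuous, which is the case Joe describes. The class and method names (`ColumnTypeSketch`, `inferIsDiscrete`) are illustrative only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ColumnTypeSketch {

    // Hypothetical rule (an assumption, not Tetrad's reader): a column is discrete
    // only if every token is an integer literal AND the number of distinct
    // values is at most maxDiscreteCategories. Otherwise it is continuous.
    static boolean inferIsDiscrete(List<String> tokens, int maxDiscreteCategories) {
        Set<Integer> distinct = new HashSet<>();
        for (String token : tokens) {
            try {
                distinct.add(Integer.parseInt(token.trim()));
            } catch (NumberFormatException e) {
                // "0.5" or "1.0" is not an integer literal, so the column is continuous.
                return false;
            }
        }
        return distinct.size() <= maxDiscreteCategories;
    }

    public static void main(String[] args) {
        // Binary column written as integers: discrete when the limit allows 2 categories.
        System.out.println(inferIsDiscrete(List.of("0", "1", "1", "0"), 2)); // true
        // Only two distinct values, but they are real numbers: stays continuous,
        // so it can still be used as a regressor.
        System.out.println(inferIsDiscrete(List.of("0.5", "1.5", "0.5"), 2)); // false
        // Three integer categories with the limit set to 2: read in as continuous.
        System.out.println(inferIsDiscrete(List.of("0", "1", "2"), 2)); // false
    }
}
```

Checking the token text rather than a parsed `double` is a deliberate choice in this sketch: it lets `0`/`1` and `0.0`/`1.0` be told apart, which is the line cg09 draws between discrete codes and real-valued binary regressors.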
Good point.
(In reply to cg09, Tue, Apr 9, 2019, 2:03 PM)
--
Joseph D. Ramsey
Special Faculty and Director of Research Computing
Department of Philosophy
135 Baker Hall
Carnegie Mellon University
Pittsburgh, PA 15213
jsph.ramsey@gmail.com
Office: (412) 268-8063
http://www.andrew.cmu.edu/user/jdramsey
This all works now and has for a long time. Closing.
The method isMixed() of BoxDataSet determines whether the dataset is mixed by counting its continuous and discrete variables. If both counts are non-zero, the dataset is considered mixed. This is not quite right, because the max-discrete-category value can be set so low that no variable is considered discrete. In that case the dataset is still mixed, but isMixed() will return false and isContinuous() will return true. For the same reason, isContinuous() and isDiscrete() are not correct either.
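The count-based check described above can be sketched as follows. This is a simplified stand-in, not the actual BoxDataSet source: the `VarType` enum, the list-of-types representation of a dataset, and the bodies of `isContinuous()` and `isDiscrete()` are assumptions made for illustration. It shows how the same data yields `isMixed() == true` under one category limit and `isMixed() == false`, `isContinuous() == true` once the limit is lowered so that no variable is read in as discrete.

```java
import java.util.List;

class MixedCheckSketch {

    enum VarType { DISCRETE, CONTINUOUS }

    // Count-based check modeled on the description in this issue;
    // a simplified stand-in, not Tetrad's actual source.
    static boolean isMixed(List<VarType> types) {
        long discrete = types.stream().filter(t -> t == VarType.DISCRETE).count();
        long continuous = types.size() - discrete;
        return discrete > 0 && continuous > 0;
    }

    // Assumed implementations: every variable has the given type.
    static boolean isContinuous(List<VarType> types) {
        return types.stream().allMatch(t -> t == VarType.CONTINUOUS);
    }

    static boolean isDiscrete(List<VarType> types) {
        return types.stream().allMatch(t -> t == VarType.DISCRETE);
    }

    public static void main(String[] args) {
        // Hypothetical dataset: X has 3 integer categories, Y is real-valued.
        // With a category limit of 3, a reader would mark X discrete.
        List<VarType> limit3 = List.of(VarType.DISCRETE, VarType.CONTINUOUS);
        // With the limit lowered to 2, X is read in as continuous as well.
        List<VarType> limit2 = List.of(VarType.CONTINUOUS, VarType.CONTINUOUS);

        System.out.println(isMixed(limit3));      // true
        System.out.println(isMixed(limit2));      // false
        System.out.println(isContinuous(limit2)); // true
    }
}
```

Because these methods only see the types assigned at read time, they cannot distinguish "continuous data" from "discrete data read in as continuous"; the category limit chosen when loading decides the answer.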