
Cross Validation Conflict With Pseudocode #566

Closed
antmarakis opened this issue Jun 27, 2017 · 2 comments

Comments

@antmarakis (Collaborator)

In the cross validation pseudocode, when running a learner we pass an argument, size, which reflects a parameter of the learner (like the k in k-Nearest Neighbors, if I understand correctly). Unfortunately, in the implementations of the algorithms there are cases where the learner needs more than one parameter, or none at all (the neural network, for instance, needs multiple parameters).

How should we go about this?

I was thinking that when we partition the data for the given fold, we call another function which takes as input the learner and size. It then checks whether size matches the learner's parameters (e.g. the neural net needs hidden layers, epochs, etc.), and if it does, it runs the learner on the partitioned data and returns the hypothesis (h in the pseudocode). After that, the algorithm goes on as normal.

I think this is a good enough approach, since it does not diverge much from the pseudocode, although we do have to write some "ugly" if-then statements that are not found in the pseudocode (roughly as sketched below).
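
A minimal sketch of the dispatch I have in mind (the learner names and keyword arguments are placeholders, not the real signatures in the code):

def run_learner(learner, size, training_set):
    # Hypothetical dispatcher: map the pseudocode's single `size` argument
    # onto whatever hyperparameters the concrete learner expects.
    name = getattr(learner, '__name__', '')
    if name == 'NearestNeighborLearner':
        return learner(training_set, k=size)      # size plays the role of k
    elif name == 'NeuralNetLearner':
        # a single number is not enough here; one convention would be to
        # treat size as the width of a single hidden layer
        return learner(training_set, hidden_layer_sizes=[size])
    else:
        return learner(training_set)              # no tunable size parameter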

Any feedback/suggestions?

@norvig (Collaborator) commented Jun 28, 2017

Good point ... size only makes sense for things with discrete increasing complexity, like k-nearest neighbors, an n-degree polynomial, or an n-deep random forest.

How about if we rename size as complexity and change "for size = 1 to ∞ do" to "for complexity in degrees_of_complexity(Learner)"?
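
For instance, degrees_of_complexity might look something like this (a sketch only; the learner names and candidate values are illustrative):

def degrees_of_complexity(learner):
    # Return an iterable of "complexity" settings to try for this learner,
    # replacing the unbounded "for size = 1 to ∞" loop.
    # Names and values below are illustrative placeholders.
    name = getattr(learner, '__name__', '')
    if name == 'NearestNeighborLearner':
        return [1, 3, 5, 7, 9]          # candidate values of k
    if name == 'PolynomialLearner':
        return range(1, 6)              # candidate polynomial degrees
    return [None]                       # learners with nothing of this kind to tune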

@antmarakis (Collaborator, Author)

I like this idea.

My only issue is how we will fit this with modern languages. If we simply write Learner(dataset, complexity) for a given complexity, we are veering away from how most modern languages work, since the learner may expect multiple arguments but we are only passing two (dataset and complexity).

One fix would be to adjust the algorithm definitions to receive as input a dataset and a list of arguments, and then unpack the list:

def SomeLearner(dataset, complexity):
    # unpack the list of arguments into the learner's actual hyperparameters
    a, b, c = complexity

I don't think this is elegant enough though.


We could do something like this:

a) As you suggested, replace "for size = 1 to ∞ do" with "for complexity in degrees_of_complexity(Learner)". The added function would return parameters in the format accepted by the Learner (for example, for the Perceptron it would be something like [0.01, 100], since it needs a learning rate and a number of epochs).

b) Inside the Cross-Validation function, replace h = Learner(size, training_set) with h = Test(Learner, complexity, training_set). Test in turn takes care of the arguments for the Learner in the same way degrees_of_complexity does: it unpacks complexity to fit the given Learner, trains the Learner, and finally returns h. I believe the algorithm resumes without any issues after that point (see the sketch after this list).
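
A sketch of what Test might look like under this scheme (the complexity formats follow the placeholder conventions above, not an agreed-upon interface):

def Test(learner, complexity, training_set):
    # "Unpack" complexity into whatever form the given Learner accepts,
    # train it on the fold's training data, and return the hypothesis h.
    if complexity is None:
        return learner(training_set)                  # nothing to tune
    if isinstance(complexity, (list, tuple)):
        return learner(training_set, *complexity)     # e.g. Perceptron: [0.01, 100]
    return learner(training_set, complexity)          # e.g. k-NN: a single k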


I don't see any loose ends with this idea, but I could be wrong. All in all, I think we should have a "symmetrical" system for handling complexity: degrees_of_complexity "packs" the parameters into a form the given Learner accepts, and Test "unpacks" them again when calling the Learner, which fits naturally with how most languages pass arguments.
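
Concretely, in Python that symmetry is just packing a tuple on one side and unpacking it with * on the other (placeholder names and values again):

def PerceptronLearner(dataset, learning_rate, epochs):
    ...                                               # stand-in for the real learner

training_set = []                                     # placeholder data
complexity = (0.01, 100)                              # "packed" by degrees_of_complexity
h = PerceptronLearner(training_set, *complexity)      # "unpacked" when calling the Learner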
